Assuming disallow-all, and some research on robots.txt in Geminispace (Was: Re: robots.txt for Gemini formalised)

Krixano krixano at protonmail.com
Thu Nov 26 06:18:48 GMT 2020


One more thing I want to point out... copyright law isn't opt-in. It's opt-out.
If you don't have a copyright statement or any other licensing information,
then "all rights reserved" is automatically assumed, afaik. You can't just copy
something just because the author didn't explicitly disallow you from doing that.

Christian Seibold

Sent with [ProtonMail](https://protonmail.com/) Secure Email.

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Thursday, November 26th, 2020 at 12:12 AM, Krixano <krixano at protonmail.com> wrote:

> I'm not sure why Internet Archive matters here. Just because they do something doesn't mean
> it's the right thing to do. Seems like an appeal to authority to me.
>
> Christian Seibold
>
> Sent with [ProtonMail](https://protonmail.com/) Secure Email.
>
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Wednesday, November 25th, 2020 at 9:10 AM, John Cowan <cowan at ccil.org> wrote:
>
>> On Wed, Nov 25, 2020 at 6:32 AM Nick Thomas <gemini at ur.gs> wrote:
>>
>>> (Received off-list, but I assume it was *meant* for the list, so
>>> replying there)
>>
>> It was, so thanks. My private messages are labeled (Private message) at the top because I make this mistake a lot.
>>
>>> Whatever the outcome of the opt-in vs opt-out part of this discussion,
>>
>> That's the only part that concerns me. A robots.txt spec is good and crawlers/archivers that respect it are fine too, though of course some won't.
>>
>> I once wrote to the author of a magazine article who had published a simple crawler, pointing out that it would hammer whatever server it was crawling, since it did not delay between requests or intersperse them with requests to other servers but simply walked the server's tree depth-first, and that it should respect robots.txt. He wrote back saying "That's the Internet today; deal with it." I could have answered (but I didn't) that hits are a cost to the server operator, and anyone running his dumb crawler was not only DDOSing, but spending my money for his own purposes.
>>
>> But I do think that once robots.txt support is in place, no robots.txt = no expressed preference.
>>
>>> If it's true for people with an explicit preference, it can also be
>>> true for people who haven't expressed a preference yet. Since Gemini
>>> has a higher standard for user privacy than the web, it can also have a
>>> higher standard for these preferences - one that does not rely on
>>> presumed consent - if we want it to.
>>
>> By this logic, nobody should be able to access a Gemini server at all unless the capsule author has expressed a preference for them to do so. But to publish is to expose your work to the public.
>>
>>> The FAQ immediately above the one you quoted reads:
>>>
>>>> Why isn't the site I'm looking for in the archive?*
>>>
>>>> Some sites may not be included because the automated crawlers were
>>>> unaware of their existence at the time of the crawl. It's also
>>>> possible that some sites were not archived because they were
>>>> password protected, blocked by robots.txt, or otherwise inaccessible
>>>> to our automated systems. Site owners might have also requested that
>>>> their sites be excluded from the Wayback Machine.
>>
>> I interpret that to mean that some sites were not crawled during the period when the Archive was paying attention to robots.txt, and so their content as of that date is unavailable. Note the past tense: "were [...] blocked by robots.txt" as opposed to "are blocked".
>>
>>> If archive.org didn't respect robots.txt at all, it would lend a lot of
>>> flavour to the "archiver" virtual user-agent idea in the companion
>>> spec, in addition to this discussion. Do you still have doubts after
>>> reading this section?
>>
>> I have no doubt whatever that the crawler doesn't respect robots.txt. I could do a little experiment, though.
>>
>> John Cowan http://vrici.lojban.org/~cowan cowan at ccil.org
>> The competent programmer is fully aware of the strictly limited size of his own
>> skull; therefore he approaches the programming task in full humility, and among
>> other things he avoids clever tricks like the plague. --Edsger Dijkstra
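
The crawler behaviour John Cowan asks for above (a delay between requests to one server, requests interleaved across servers, and respect for robots.txt) can be sketched roughly as follows. This is only an illustration under stated assumptions, not any real crawler's implementation: `plan_crawl`, `parse_robots`, the hostnames, the 2-second delay, and the `allow_when_missing` switch are all made up for the example. The switch stands in for the opt-in vs opt-out question in this thread: `True` treats a missing robots.txt as "no expressed preference", `False` assumes disallow-all. The "archiver" agent name is borrowed from the virtual user-agent idea mentioned in the thread.

```python
from collections import OrderedDict, deque
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def parse_robots(text):
    """Parse robots.txt text that has already been fetched (no network)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def plan_crawl(urls, robots_by_host, agent="archiver",
               per_host_delay=2.0, allow_when_missing=True):
    """Return a list of (earliest_start_seconds, url) pairs.

    Hypothetical helper: URLs are interleaved round-robin across hosts,
    and consecutive requests to one host are per_host_delay seconds apart.
    """
    queues = OrderedDict()
    for url in urls:
        host = urlsplit(url).netloc
        rules = robots_by_host.get(host)
        if rules is None:
            # No robots.txt known for this host: apply the chosen default.
            allowed = allow_when_missing
        else:
            allowed = rules.can_fetch(agent, url)
        if allowed:
            queues.setdefault(host, deque()).append(url)

    schedule = []
    round_no = 0
    while queues:
        # One URL per host per round, so no single server is hammered.
        for host in list(queues):
            schedule.append((round_no * per_host_delay,
                             queues[host].popleft()))
            if not queues[host]:
                del queues[host]
        round_no += 1
    return schedule


robots = {"a.example": parse_robots("User-agent: *\nDisallow: /private/\n")}
urls = [
    "gemini://a.example/1.gmi",
    "gemini://a.example/private/x.gmi",
    "gemini://a.example/2.gmi",
    "gemini://b.example/1.gmi",
]
for start, url in plan_crawl(urls, robots):
    print(start, url)
```

With these made-up inputs, the `/private/` URL is skipped, the two hosts are visited alternately, and the second request to `a.example` is scheduled 2 seconds after the first. Passing `allow_when_missing=False` would also drop `b.example`, which has no robots.txt in this example.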
