Assuming disallow-all, and some research on robots.txt in Geminispace (Was: Re: robots.txt for Gemini formalised)
John Cowan
cowan at ccil.org
Tue Nov 24 23:44:23 GMT 2020
On Tue, Nov 24, 2020 at 3:25 PM Nick Thomas <gemini at ur.gs> wrote:
> > Of the 362 hosts known to GUS, only 36 have a robots.txt file, so
> > any choice made as to what the default robots.txt should be will
> > affect around 90% of Geminispace
>
> Thanks for running the numbers on this. I agree with everything you
> said based on them. That any change affects such a large proportion of
> existing geminispace is especially worth emphasising.
>
Why is that a Good Thing? It's another piece of bureaucracy: 90% of hosts
were happy to be archived before, and under a disallow-all default they
would now have to write a robots.txt file to stay that way. Although the
burden is small for any one server operator, it is large when multiplied
by the number of servers there *will be*. "Small Internet" does not mean
"Internet with only a few servers", AFAIK.
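
To make the numbers and the two candidate defaults concrete, here is a
minimal sketch in Python. The function names and the `default_allow` flag
are hypothetical and are not taken from any Gemini specification or from
any real crawler; only the host counts (362 known to GUS, 36 with a
robots.txt) come from the quoted message.

```python
def share_without_robots(total_hosts: int, hosts_with_robots: int) -> float:
    """Return the fraction of hosts that publish no robots.txt."""
    if total_hosts <= 0:
        raise ValueError("total_hosts must be positive")
    return (total_hosts - hosts_with_robots) / total_hosts


def may_archive(has_robots_txt: bool, robots_allows: bool,
                default_allow: bool) -> bool:
    """Hypothetical archiver decision for a single host.

    If the host publishes a robots.txt, its answer wins. Otherwise the
    assumed default applies: allow-all (True) or disallow-all (False).
    """
    if has_robots_txt:
        return robots_allows
    return default_allow


# Figures from the quoted message.
TOTAL_HOSTS = 362
HOSTS_WITH_ROBOTS = 36

share = share_without_robots(TOTAL_HOSTS, HOSTS_WITH_ROBOTS)
print(f"{share:.1%} of hosts have no robots.txt")  # prints "90.1% ..."

# A host with no robots.txt is archived only under an allow-all default.
print(may_archive(False, False, default_allow=True))   # True
print(may_archive(False, False, default_allow=False))  # False
```

Under the disallow-all default, every host in that 90% would have to add
a file to get back the behaviour it has today.
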
Two things about the Internet Archive:
1) It is a U.S. public library, which gives it special rights when it comes
to making copies.
2) Though it does not respect robots.txt, it is happy to make your content
invisible to archive users by informal request (or, of course, by a DMCA
takedown notice).
John Cowan http://vrici.lojban.org/~cowan cowan at ccil.org
Gules six bars argent on a canton azure 50 mullets argent
six five six five six five six five and six
--blazoning the U.S. flag <http://web.meson.org/blazonserver>