robots.txt for Gemini formalised

Nick Thomas gemini at ur.gs
Tue Nov 24 11:42:26 GMT 2020


Hi,

On Sun, 2020-11-22 at 17:31 +0100, Solderpunk wrote:
> Hi folks,
> 
> There is now (finally!) an official reference on the use of
> robots.txt
> files in Geminispace.  Please see:
> 
> gemini://gemini.circumlunar.space/docs/companion/robots.gmi

Thanks for this. One change I'd be interested in is adding a statement
that, if a site has no `robots.txt`, crawlers assume an implicit
disallow-all for every virtual-agent except proxies.

Presumed consent, with opt-outs for the tiny minority of people who
have the time and mental space to work out how to get those opt-outs to
apply, is standard behaviour on the web, but it's not behaviour I like.
GitHub recently dumped code of mine into an arctic vault, for instance;
the archive.org snapshots of geminispace have similar dynamics. We can
do better by asking people to opt *in* to these kinds of things if they
want it, rather than to opt *out* if they don't.

I exclude Virtual-Agent: webproxy here because the likely use of such a
proxy is transient, rather than persistent. It seems odd to me that it
sits alongside indexing, archival, and research, all of which lead to
durable artifacts on success. It does complicate things a little to
treat it differently, though.
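To make the proposed default concrete, here's a minimal sketch (mine, not
part of the companion spec) of the decision a crawler would make when a
capsule has no robots.txt. The function name and the set of virtual-agent
names are assumptions based on the agents discussed above:

```python
# Hypothetical sketch of the proposed opt-in default: with no robots.txt
# present, durable virtual-agents (archiver, indexer, researcher) are
# denied, and only the transient webproxy agent is permitted.
DURABLE_AGENTS = {"archiver", "indexer", "researcher"}  # assumed names

def default_allowed(virtual_agent: str) -> bool:
    """Return True if this virtual-agent may fetch from a capsule
    that serves no robots.txt, under the proposed implicit
    disallow-all-except-proxies policy."""
    return virtual_agent == "webproxy"
```

A site that *wants* to be archived or indexed would then opt in by
publishing a robots.txt that explicitly allows those virtual-agents.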

Thoughts? I appreciate this would impact on the ability of archivists
or researchers to capture geminispace, but I see that as a feature,
rather than an unfortunate side-effect :). 

/Nick
