Assuming disallow-all, and some research on robots.txt in Geminispace (Was: Re: robots.txt for Gemini formalised)
Luke Emmet
luke at marmaladefoo.com
Thu Nov 26 10:15:49 GMT 2020
On 25-Nov-2020 00:18, Nick Thomas wrote:
>
> You're presuming consent here. We don't actually *know* that said 90%
> of hosts are happy to be archived; we only know that 90% of hosts
> haven't included a robots.txt file, which could be for any one of a
> multitude of reasons.
>
> *If* a not-insignificant proportion of those hosts without robots.txt
> files would actually prefer not to be included in archives when asked,
> the current situation is not serving their privacy well, and gemini is
> supposed to be protective of user privacy. *If* an overwhelming majority
> of them simply don't care, then sure, the argument for it starts to
> look a bit niche. Talking in IRC earlier today, I hand-waved a 5%
> threshold for the first condition and 1% for the second.
>
> A personal example: *I* didn't have a robots.txt file on my capsule
> until today, but I don't want to be included in archives for various
> reasons. Presuming consent from the lack of a robots.txt file would
> have incorrectly guessed my preference, and harmed my privacy. Who else
> in that 90% is like me? We don't know.
>
Hello all
Personally, I'm not really that interested in the legal arguments back
and forth about archiving and access. Yes, there are some legal
precedents in this area in some jurisdictions, but I would say that, by
and large, that ship has sailed. Sorry about that, folks. The web is the
de-facto baseline reference in this respect, whether we like it or not.
If you *publish* information on the internet, there *will* be actors who
re-purpose it. Gemini is no different from the web in this.
If any of us has information that needs to remain private, I cannot see
how that can be achieved by publishing it on the public internet (i.e.
on servers that do not require authentication). If you want to hide
something, use authentication or a private channel.
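Gemini already has a building block for that kind of authentication: a
server can answer a request for a protected path with a status 60
response, asking the client to present a client certificate before the
content is served. A rough sketch of such an exchange (the hostname and
path are made up for illustration):

    C: gemini://example.org/private/journal.gmi<CR><LF>
    S: 60 Client certificate required<CR><LF>

The client then retries with a certificate attached, and the server
decides whether that certificate is entitled to see the page.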
Yes, there is robots.txt, which is an opt-out mechanism from general
robot access to a server's content. It is established practice, and good
actors will respect it. But it cannot be a mechanism for preserving privacy.
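For anyone who does want to opt out, the mechanism is cheap to use. A
minimal disallow-all robots.txt, served at the root of the capsule, is
just the standard form:

    User-agent: *
    Disallow: /

or, assuming the virtual "archiver" user-agent name from the
robots.txt-for-Gemini thread is adopted, a capsule could turn away only
archivers while still allowing other robots:

    User-agent: archiver
    Disallow: /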
My take on the whole "Gemini preserves privacy better" claim is that it
is really about clients. We don't have extended headers, cookies, or
agent names in requests. So to that extent, client privacy is maintained
better than on the web, where the expectation is long-term,
cross-session tracking. Thankfully, we don't have that.
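To make that concrete (an illustrative comparison, not a quote from
either spec): a complete Gemini request is nothing more than the
requested URL followed by CRLF, whereas even a minimal HTTP request
identifies the client through headers:

    Gemini request, in its entirety:
        gemini://example.org/journal.gmi<CR><LF>

    Typical HTTP request:
        GET /journal.gmi HTTP/1.1
        Host: example.org
        User-Agent: Mozilla/5.0 (...)
        Cookie: session=...

There is simply no field in the Gemini request where a user-agent string
or a tracking cookie could live.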
I don't see it as Gemini's role to attempt to set a cultural or legal
privacy framework for servers that choose to publish on Gemini. We
cannot imagine we will break new ground in this respect. We can,
however, work towards this as a side effect of the technical design of
the protocol itself, and within the Gemini community we can look out for
risks of exposing personal information via the protocol.
If Gemini ever becomes interesting enough to the outside world that some
case goes to court (what a publicity success that would be!), surely the
established precedent will be that set by the existing public hypertext
infrastructure, namely the web.
So I support the use of robots.txt, but if none exists, the presumption,
as on the web, is that access and usage are allowed. If some actor
doesn't follow a server's robots.txt, I'm sad about it, but we should
ultimately expect it.
- Luke