Assuming disallow-all, and some research on robots.txt in Geminispace (Was: Re: robots.txt for Gemini formalised)
Luke Emmet
luke at marmaladefoo.com
Thu Nov 26 10:15:49 GMT 2020
On 25-Nov-2020 00:18, Nick Thomas wrote:
>
> You're presuming consent here. We don't actually *know* that said 90%
> of hosts are happy to be archived; we only know that 90% of hosts
> haven't included a robots.txt file, which could be for any one of a
> multitude of reasons.
>
> *If* a not-insignificant proportion of those hosts without robots.txt
> files would actually prefer not to be included in archives when asked,
> the current situation is not serving their privacy well, and gemini is
> supposed to be protective of user privacy. *If* an overwhelming majority
> of them simply don't care, then sure, the argument for it starts to
> look a bit niche. Talking in IRC earlier today, I hand-waved a 5%
> threshold for the first condition and 1% for the second.
>
> A personal example: *I* didn't have a robots.txt file on my capsule
> until today, but I don't want to be included in archives for various
> reasons. Presuming consent from the lack of a robots.txt file would
> have incorrectly guessed my preference, and harmed my privacy. Who else
> in that 90% is like me? We don't know.
>
Hello all
Personally, I'm not really that interested in the legal arguments back
and forth about archiving and access. Yes, there are some legal
precedents in this area in some jurisdictions, but I would say that, by
and large, that ship has sailed. Sorry about that, folks. The web is the
de-facto baseline reference in this respect, whether we like it or not.
If you *publish* information on the internet, there *will* be actors who
re-purpose it. Gemini is no different from the web in this.
If any of us has information that needs to remain private, I cannot see
how that can be achieved by publishing it on the public internet (i.e.
on servers that do not require authentication). If you want to hide
something, use authentication or a private channel.
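Gemini already has a building block for that kind of authentication: a
server can answer a request for a protected path with a status 60
response, asking the client to present a client certificate before the
content is served. A rough sketch of such an exchange (the hostname and
path are made up for illustration):

    C: gemini://example.org/private/journal.gmi<CR><LF>
    S: 60 Client certificate required<CR><LF>

The client then retries with a certificate attached, and the server
decides whether that certificate is entitled to see the page.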
Yes, there is robots.txt, which is an opt-out mechanism from general
robot access to a server's content. It is established practice, and good
actors will respect it. But it cannot be a mechanism for preserving privacy.
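For anyone who does want to opt out, the mechanism is cheap to use. A
minimal disallow-all robots.txt, served at the root of the capsule, is
just the standard form:

    User-agent: *
    Disallow: /

or, assuming the virtual "archiver" user-agent name from the
robots.txt-for-Gemini thread is adopted, a capsule could turn away only
archivers while still allowing other robots:

    User-agent: archiver
    Disallow: /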
My take on the whole "Gemini preserves privacy better" claim is that it
is really about clients. We don't have extended headers, cookies, or
agent names in requests. So to that extent, client privacy is maintained
better than on the web, where the expectation is long-term,
cross-session tracking. Thankfully, we don't have that.
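To make that concrete (an illustrative comparison, not a quote from
either spec): a complete Gemini request is nothing more than the
requested URL followed by CRLF, whereas even a minimal HTTP request
identifies the client through headers:

    Gemini request, in its entirety:
        gemini://example.org/journal.gmi<CR><LF>

    Typical HTTP request:
        GET /journal.gmi HTTP/1.1
        Host: example.org
        User-Agent: Mozilla/5.0 (...)
        Cookie: session=...

There is simply no field in the Gemini request where a user-agent string
or a tracking cookie could live.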
I don't see it as Gemini's role to attempt to set a cultural or legal
privacy framework for servers that choose to publish on Gemini. We
cannot imagine we will break new ground in this respect. We can,
however, work towards this as a side effect of the technical design of
the protocol itself, and within the Gemini community we can look out for
risks of exposing personal information via the protocol.
If Gemini ever becomes interesting enough to the outside world that some
case goes to court (what a publicity success that would be!), surely the
established precedent will be that set by the existing public hypertext
infrastructure, namely the web.
So I support the use of robots.txt, but if none exists, the presumption,
as on the web, is that access and usage are allowed. If some actor
doesn't follow a server's robots.txt, I'm sad about it, but we should
ultimately expect it.
- Luke