robots.txt for Gemini formalised
John Cowan
cowan at ccil.org
Mon Nov 23 00:44:40 GMT 2020
Of course bad robots can ignore the rules: that's always true, as the pre-spec
already says. The idea is to give crawlers (and other bots) that want to keep
to the rules a way to identify themselves clearly and uniquely to servers.
On Sun, Nov 22, 2020 at 7:39 PM Adnan Maolood <me at adnano.co> wrote:
> On Sun Nov 22, 2020 at 7:30 PM EST, John Cowan wrote:
> > Additionally: "Agent:" should specify a SHA-256 hash of the client cert
> > used by particular crawlers rather than a random easy-to-forge name.
> > Thus GUS should crawl using a cert and publicly post the hash of this
> > cert. Then callers with that cert are necessarily GUS, since the cert
> > itself is not published. (Of course it's still possible for a server to
> > steal GUS's client cert.)
>
> This doesn't seem very useful, as bad robots can simply ignore the rules
> in robots.txt.
>
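The "Agent:" proposal quoted above can be sketched from the server's side: hash the DER bytes of the client certificate a crawler presents and look for that digest among the "Agent:" values in robots.txt. This is a minimal Python sketch under stated assumptions, not part of any Gemini spec: it assumes "Agent:" values are SHA-256 digests written in hex (optionally colon-separated), and the helper names `cert_fingerprint`, `agent_hashes` and `is_declared_agent` are hypothetical. A server built on Python's `ssl` module could obtain the DER bytes with `getpeercert(binary_form=True)`.

```python
from __future__ import annotations

import hashlib


def cert_fingerprint(der_cert: bytes) -> str:
    """Lowercase hex SHA-256 digest of a DER-encoded client certificate."""
    return hashlib.sha256(der_cert).hexdigest()


def agent_hashes(robots_txt: str) -> set[str]:
    """Collect the values of 'Agent:' lines (hypothetical format).

    Values are normalised to lowercase hex with colon separators removed,
    so 'AB:CD:...' and 'abcd...' compare equal.
    """
    hashes: set[str] = set()
    for line in robots_txt.splitlines():
        # Split on the first colon only; the hash value may itself contain colons.
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "agent":
            hashes.add(value.strip().lower().replace(":", ""))
    return hashes


def is_declared_agent(der_cert: bytes | None, robots_txt: str) -> bool:
    """True if the presented certificate's hash appears in an 'Agent:' line."""
    if der_cert is None:  # the client sent no certificate at all
        return False
    return cert_fingerprint(der_cert) in agent_hashes(robots_txt)


if __name__ == "__main__":
    # Placeholder bytes stand in for a real DER certificate (assumption).
    fake_cert = b"not-a-real-certificate"
    robots = f"User-agent: *\nAgent: {cert_fingerprint(fake_cert)}\nDisallow: /private/\n"
    print(is_declared_agent(fake_cert, robots))  # True
    print(is_declared_agent(b"other", robots))   # False
    print(is_declared_agent(None, robots))       # False
```

The match is only meaningful because TLS client authentication requires the client to prove possession of the certificate's private key during the handshake; the published hash alone is not enough to impersonate the crawler.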
More information about the Gemini mailing list