robots.txt for Gemini formalised

Sean Conner sean at conman.org
Mon Nov 23 02:05:41 GMT 2020


It was thus said that the Great Robert khuxkm Miles once stated:
> 
> Is there any good use case for a proxy User-Agent in robots.txt, other than
> blocking web spiders from being able to crawl gemspace? If not, I would be
> in favor of dropping that part of the definition.

  I'm in favor of dropping that part of the definition as it doesn't make
sense at all.  Given a web-based proxy at <https://example.com/gemini>, web
crawlers will check <https://example.com/robots.txt> for guidance, not
<https://example.com/gemini?gemini.conman.org/robots.txt>.  Web crawlers
will not be able to crawl Gemini space for two main reasons (illustrative
sketches of both points follow the list):

        1. Most server certificates are self-signed and opt out of the CA
           business.  And even if a crawler were to accept self-signed
           (or non-standard CA signed) certificates, then---

        2. The Gemini protocol is NOT HTTP, so all such HTTP requests will
           fail anyway.
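
  To illustrate the robots.txt point, here is a minimal sketch (in
Python, purely illustrative, not from any actual crawler) of how a
standards-following web crawler resolves the robots.txt location for a
URL it fetches.  The lookup always goes to the root of the scheme and
authority, so the proxied Gemini path and query string never enter
into it:

    from urllib.parse import urlsplit, urlunsplit

    def robots_url(resource_url):
        # robots.txt lives at the root of scheme+authority, regardless
        # of the path or query string of the resource being fetched.
        parts = urlsplit(resource_url)
        return urlunsplit((parts.scheme, parts.netloc,
                           "/robots.txt", "", ""))

    print(robots_url("https://example.com/gemini?gemini.conman.org/robots.txt"))
    # -> https://example.com/robots.txt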
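
  And to see why a web crawler can't simply speak to a Gemini server
directly, here is a rough sketch (again Python, assuming a capsule
listening on the standard Gemini port 1965) of a native Gemini
request.  Note the certificate verification that has to be disabled to
talk to a self-signed capsule (point 1), and the request format, a
bare URL plus CRLF rather than an HTTP request line and headers
(point 2):

    import socket
    import ssl

    def gemini_fetch(host, path="/robots.txt"):
        # Most capsules self-sign, so default CA verification fails;
        # a Gemini client accepts the certificate TOFU-style instead.
        ctx = ssl.create_default_context()
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
        with socket.create_connection((host, 1965)) as sock:
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                # The entire Gemini request is one absolute URL ending
                # in CRLF.  An HTTP request line plus headers is not a
                # valid URL, so a Gemini server rejects it or drops
                # the connection.
                tls.sendall(("gemini://%s%s\r\n" % (host, path)).encode())
                return tls.recv(4096)

    print(gemini_fetch("gemini.conman.org").decode())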

  -spc
