robots.txt for Gemini formalised
Sean Conner
sean at conman.org
Mon Nov 23 02:05:41 GMT 2020
It was thus said that the Great Robert khuxkm Miles once stated:
>
> Is there any good use case for a proxy User-Agent in robots.txt, other than
> blocking web spiders from being able to crawl gemspace? If not, I would be
> in favor of dropping that part of the definition.
I'm in favor of dropping that part of the definition, as it makes no
sense. Given a web-based proxy at <https://example.com/gemini>, web
crawlers will check <https://example.com/robots.txt> for guidance, not
<https://example.com/gemini?gemini.conman.org/robots.txt>. Web crawlers
will also not be able to crawl Gemini space directly, for two main reasons:
1. Most server certificates are self-signed and opt out of the CA
   business. And even if a crawler were to accept self-signed
   (or non-standard-CA-signed) certificates, then:
2. The Gemini protocol is NOT HTTP, so all such HTTP requests will
   fail anyway (a minimal sketch follows this list).
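
To make point 2 concrete, here is a minimal sketch (not from the
original post, assuming Python 3.8+) of what a Gemini request looks
like on the wire: a single absolute URL terminated by CRLF, sent over
TLS on port 1965. The host is the one mentioned above; the function
name, timeout, and buffer size are illustrative assumptions.

# A minimal sketch of a Gemini fetch, for illustration only.
import socket
import ssl

def gemini_fetch(host: str, path: str = "/", port: int = 1965) -> bytes:
    # Point 1: Gemini servers commonly use self-signed certificates,
    # so this sketch opts out of CA verification entirely.  A real
    # client would pin the certificate (trust-on-first-use) instead.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE

    with socket.create_connection((host, port), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            # Point 2: the entire Gemini request is one absolute URL
            # plus CRLF.  An HTTP crawler would send a request line
            # like "GET /robots.txt HTTP/1.1" plus headers here, which
            # a Gemini server cannot parse as a URL.
            tls.sendall(f"gemini://{host}{path}\r\n".encode("utf-8"))
            chunks = []
            # The server closes the connection after the response.
            while chunk := tls.recv(4096):
                chunks.append(chunk)
            return b"".join(chunks)

# The response begins with a Gemini status line, e.g. "20 text/gemini",
# not an HTTP status line such as "HTTP/1.1 200 OK".
print(gemini_fetch("gemini.conman.org", "/robots.txt")[:200])

An off-the-shelf HTTP crawler has neither the relaxed certificate
handling nor the request format above, which is why it cannot reach a
Gemini server's robots.txt in the first place.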
-spc