Crawlers on Gemini and best practices

Stephane Bortzmeyer stephane at sources.org
Thu Dec 10 16:35:12 GMT 2020


On Thu, Dec 10, 2020 at 02:43:11PM +0100,
 Stephane Bortzmeyer <stephane at sources.org> wrote 
 a message of 26 lines which said:

> The spec is quite vague about the *order* of directives.

Another example of why you cannot rely on robots.txt files following
the spec: regexps. The official site
<http://www.robotstxt.org/robotstxt.html> is crystal-clear: "Note also
that globbing and regular expression are not supported in either the
User-agent or Disallow lines".

But in the wild you find things like
<gemini://drewdevault.com/robots.txt>:

User-Agent: gus
Disallow: /cgi-bin/web.sh?*
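
To illustrate the problem, here is a rough Python sketch of a strict
matcher in the robotstxt.org style (Disallow values compared as
literal path prefixes, no globbing); the function names are made up
for the example:

from urllib.parse import urlparse

def parse_robots(text):
    # Group "User-agent" lines with the "Disallow" lines that follow them.
    groups, agents, disallows = [], [], []
    for line in text.splitlines():
        line = line.split('#', 1)[0].strip()
        if not line:
            continue
        field, _, value = line.partition(':')
        field, value = field.strip().lower(), value.strip()
        if field == 'user-agent':
            if disallows:
                groups.append((agents, disallows))
                agents, disallows = [], []
            agents.append(value.lower())
        elif field == 'disallow':
            disallows.append(value)
    if agents or disallows:
        groups.append((agents, disallows))
    return groups

def is_allowed(robots_text, user_agent, url):
    # Simplified: every group naming the agent (or '*') is checked, and
    # Disallow values are literal path prefixes -- no globbing, no regexps.
    path = urlparse(url).path or '/'
    for agents, disallows in parse_robots(robots_text):
        if user_agent.lower() in agents or '*' in agents:
            for prefix in disallows:
                if prefix and path.startswith(prefix):
                    return False
    return True

robots = "User-Agent: gus\nDisallow: /cgi-bin/web.sh?*\n"
# The query string is not part of the path and '?*' is not a wildcard,
# so a strict matcher does NOT block this URL: prints True.
print(is_allowed(robots, 'gus', 'gemini://drewdevault.com/cgi-bin/web.sh?foo'))

Read literally, that Disallow line blocks nothing, which presumably is
not what the author intended.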

Opinion: maybe we should specify a syntax for Gemini's robots.txt
instead of relying on the broken Web one?

