Crawlers on Gemini and best practices
Stephane Bortzmeyer
stephane at sources.org
Thu Dec 10 16:35:12 GMT 2020
On Thu, Dec 10, 2020 at 02:43:11PM +0100,
Stephane Bortzmeyer <stephane at sources.org> wrote
a message of 26 lines which said:
> The spec is quite vague about the *order* of directives.
Another example of why you cannot rely on robots.txt in the wild:
regexps. The official site <http://www.robotstxt.org/robotstxt.html>
is crystal-clear: "Note also that globbing and regular expression are
not supported in either the User-agent or Disallow lines".
But in the wild you find things like
<gemini://drewdevault.com/robots.txt>:
    User-Agent: gus
    Disallow: /cgi-bin/web.sh?*
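For concreteness, here is a minimal sketch (mine, not from any spec)
of how a strict 1994-style matcher handles that rule: the Disallow
value is treated as a plain path prefix, so the "*" is a literal
character and the line does not do what the author presumably
intended.

    # Sketch of a strict robots.txt matcher: Disallow values are
    # plain path prefixes, "*" and "?" are literal characters.
    def is_allowed(disallow_prefixes, path):
        """Return False if the path starts with any Disallow prefix."""
        return not any(path.startswith(p) for p in disallow_prefixes if p)

    prefixes = ["/cgi-bin/web.sh?*"]

    # What the author presumably wanted to block is NOT blocked:
    print(is_allowed(prefixes, "/cgi-bin/web.sh?foo"))   # True (allowed)
    # Only a path literally containing "?*" would match:
    print(is_allowed(prefixes, "/cgi-bin/web.sh?*bar"))  # False (blocked)

In other words, a crawler that follows the original rules to the
letter would not exclude /cgi-bin/web.sh at all.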
Opinion: maybe we should specify a syntax for Gemini's robots.txt,
rather than relying on the broken Web one?