Crawlers on Gemini and best practices
Stephane Bortzmeyer
stephane at sources.org
Thu Dec 10 16:42:20 GMT 2020
On Thu, Dec 10, 2020 at 03:18:31PM +0100,
Petite Abeille <petite.abeille at gmail.com> wrote
a message of 16 lines which said:
> Perhaps best to look at how things are actually implemented in the
> wild :)
I have a big disagreement with this approach, both as a matter of
principle (it allows big actors to set de facto standards and forces
everyone else into a race to keep up, which is precisely what made
the Web the bloated horror it is) and because you cannot check every
possible implementation; in any case, they disagree among
themselves.
So, no, I want a clear specification of what a crawler is supposed
to do.
> Given your example (and a user agent of "archiver"), the
> robots.txt Validator and Testing Tool at
> https://technicalseo.com/tools/robots-txt/ says Disallow.
I was not able to make it work. It keeps telling me that
http://t.example/foo is an "Invalid URL", and I found no way to
enter an arbitrary User-Agent. In any case, it would not be an
official test, just one implementation with some proprietary
extensions.
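For what it is worth, Python's standard library lets you test this
locally, with the User-Agent under your control. A minimal sketch,
assuming a hypothetical robots.txt with an entry for "archiver" (the
actual file under discussion is not reproduced here):

from urllib import robotparser

# Hypothetical robots.txt standing in for the example under
# discussion; the real file may differ.
ROBOTS = """\
User-agent: archiver
Disallow: /foo
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS.splitlines())

# Unlike the Web tool, any User-Agent can be tested here:
print(rp.can_fetch("archiver", "http://t.example/foo"))  # False
print(rp.can_fetch("other", "http://t.example/foo"))     # True

Of course, urllib.robotparser is itself just one more
implementation, with its own reading of the rules, which is exactly
the problem.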