Crawlers on Gemini and best practices
Stephane Bortzmeyer
stephane at sources.org
Thu Dec 10 16:42:20 GMT 2020
On Thu, Dec 10, 2020 at 03:18:31PM +0100,
Petite Abeille <petite.abeille at gmail.com> wrote
a message of 16 lines which said:
> Perhaps best to look at how things are actually implemented in the
> wild :)
I have a big disagreement with this approach, both as a matter of
principle (it allows big actors to set de facto standards and forces
everyone else into a race to keep up, which is precisely what made
the Web the bloated horror it is) and because you cannot check every
possible implementation; in any case, they disagree among
themselves.
So, no, I want a clear specification of what a crawler is supposed
to do.
> Given your example (and a user agent of "archiver"), the
> robots.txt Validator and Testing Tool at
> https://technicalseo.com/tools/robots-txt/ says Disallow.
I was not able to make it work. It keeps telling me that
http://t.example/foo is an "Invalid URL", and I found no way to
enter an arbitrary User-Agent. In any case, it would not be an
official test, just one implementation with some proprietary
extensions.
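For what it is worth, Python's standard library lets you test this
locally, with the User-Agent under your control. A minimal sketch,
assuming a hypothetical robots.txt with an entry for "archiver" (the
actual file under discussion is not reproduced here):

from urllib import robotparser

# Hypothetical robots.txt standing in for the example under
# discussion; the real file may differ.
ROBOTS = """\
User-agent: archiver
Disallow: /foo
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS.splitlines())

# Unlike the Web tool, any User-Agent can be tested here:
print(rp.can_fetch("archiver", "http://t.example/foo"))  # False
print(rp.can_fetch("other", "http://t.example/foo"))     # True

Of course, urllib.robotparser is itself just one more
implementation, with its own reading of the rules, which is exactly
the problem.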