Crawlers on Gemini and best practices
colecmac at protonmail.com
Fri Dec 11 20:38:12 GMT 2020
(Sorry if this is the wrong place to reply.)
Why are we defining new standards and filenames? bots.txt, .well-known, etc.
We don't need this.
Gemini is based around the idea of radical familiarity. Creating a new robots
standard breaks that, and makes things more complicated. There are existing
complete robots.txt standards, are there not? I admit I'm not well-versed in
this, but let's just pick a standard that works and make it official.
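For illustration, a capsule could serve something like this at /robots.txt.
The "archiver" user-agent name and the paths are made up; only the syntax is
the standard one:

    User-agent: *
    Disallow: /cgi-bin/

    User-agent: archiver
    Disallow: /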
After doing some quick research, I found that Google has submitted a draft
spec for robots.txt to the IETF. The original draft was submitted on July 7,
2019, and the most recent one was submitted about three days ago, on December 8.
https://developers.google.com/search/reference/robots_txt
https://tools.ietf.org/html/draft-koster-rep-04
I am no big fan of Google, but they are the kings of crawling and it makes sense
to go with them here.
Many of the spec's examples reference HTTP, but note that the rules themselves
are protocol-agnostic, so they work fine for Gemini.
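To show how little is needed, here is a rough Go sketch that fetches a
capsule's robots.txt and collects its Disallow lines. The host name is a
placeholder, certificate checking is skipped (Gemini clients generally do
TOFU rather than CA verification), and a real crawler would group rules by
User-agent as the draft describes; this is only the idea, not a finished
implementation.

    package main

    import (
        "bufio"
        "crypto/tls"
        "fmt"
        "strings"
    )

    // fetchRobots retrieves robots.txt from a Gemini host. Minimal on
    // purpose: no redirects, no timeouts, and the certificate is not
    // verified (Gemini normally uses TOFU instead of a CA chain).
    func fetchRobots(host string) ([]string, error) {
        conn, err := tls.Dial("tcp", host+":1965",
            &tls.Config{InsecureSkipVerify: true})
        if err != nil {
            return nil, err
        }
        defer conn.Close()

        // A Gemini request is just the full URL followed by CRLF.
        fmt.Fprintf(conn, "gemini://%s/robots.txt\r\n", host)

        scanner := bufio.NewScanner(conn)

        // The first line is the response header, e.g. "20 text/plain".
        // Any 2x status means success.
        if !scanner.Scan() || !strings.HasPrefix(scanner.Text(), "2") {
            return nil, fmt.Errorf("no robots.txt or non-success status")
        }

        // Collect every Disallow rule. A real parser would track which
        // User-agent group each rule belongs to, per the draft.
        var disallowed []string
        for scanner.Scan() {
            line := strings.TrimSpace(scanner.Text())
            if strings.HasPrefix(strings.ToLower(line), "disallow:") {
                path := strings.TrimSpace(line[len("disallow:"):])
                disallowed = append(disallowed, path)
            }
        }
        return disallowed, scanner.Err()
    }

    func main() {
        // "example.org" is a placeholder, not a real capsule.
        rules, err := fetchRobots("example.org")
        if err != nil {
            fmt.Println("error:", err)
            return
        }
        fmt.Println("disallowed prefixes:", rules)
    }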
makeworld