Crawlers on Gemini and best practices
Stephane Bortzmeyer
stephane at sources.org
Tue Dec 8 13:36:56 GMT 2020
I have just developed a simple crawler for Gemini. Its goal is not to
build another search engine but to carry out surveys of the
geminispace. A typical result looks like this (real data, but
limited in size):
gemini://gemini.bortzmeyer.org/software/crawler/
I have not yet let it loose on the Internet, because I still have
some questions.
Is it "good practice" to follow robots.txt? The specification does not
mention it, but it could work for Gemini as well as it does for the
Web, and I notice that some programs already request this file from
my server.
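If a crawler does decide to honour robots.txt, the check is cheap to add. The following is a minimal Python sketch, not a statement of how Gemini crawlers must behave. It assumes that robots.txt is served at /robots.txt over Gemini (port 1965) and uses the same syntax as on the Web, so the standard library's urllib.robotparser can parse it. The agent name "surveybot" and all function names are hypothetical. Certificate verification is switched off because Gemini servers commonly use self-signed certificates; a real crawler would probably pin them instead (trust on first use).

```python
import socket
import ssl
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

GEMINI_PORT = 1965  # default Gemini port


def parse_header(header: str) -> tuple[int, str]:
    """Split a Gemini response header '<STATUS> <META>' into its parts."""
    status, _, meta = header.partition(" ")
    return int(status), meta


def robots_url(url: str) -> str:
    """Assumed location of robots.txt for the capsule hosting `url`."""
    return f"gemini://{urlsplit(url).netloc}/robots.txt"


def fetch_gemini(url: str, timeout: float = 10.0) -> tuple[str, bytes]:
    """Send one Gemini request and return (header line, body bytes)."""
    parts = urlsplit(url)
    host, port = parts.hostname, parts.port or GEMINI_PORT
    ctx = ssl.create_default_context()
    # Simplification: accept any certificate (self-signed certs are common).
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            # A Gemini request is the absolute URL followed by CRLF.
            tls.sendall(url.encode("utf-8") + b"\r\n")
            chunks = []
            while True:
                data = tls.recv(4096)
                if not data:
                    break
                chunks.append(data)
    header, _, body = b"".join(chunks).partition(b"\r\n")
    return header.decode("utf-8"), body


def robots_allows(robots_txt: str, agent: str, url: str) -> bool:
    """Apply Web-style robots.txt rules to a gemini:// URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def may_crawl(url: str, agent: str = "surveybot") -> bool:
    """Fetch the capsule's robots.txt and decide whether `url` may be crawled."""
    header, body = fetch_gemini(robots_url(url))
    status, _meta = parse_header(header)
    if status // 10 != 2:
        # No usable robots.txt (e.g. 51 Not Found): assume no restrictions.
        # Redirects (3x) are not followed in this sketch.
        return True
    return robots_allows(body.decode("utf-8", errors="replace"), agent, url)
```

Usage would be something like `may_crawl("gemini://gemini.bortzmeyer.org/software/crawler/")` before each fetch. Treating a missing robots.txt as "everything allowed" mirrors the usual convention on the Web; caching the parsed rules per host would avoid one extra request per URL.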
Since Gemini (and rightly so) has no User-Agent, how can a bot
advertise its policy and a point of contact?
More information about the Gemini
mailing list