Crawlers on Gemini and best practices

Solene Rapenne solene at perso.pw
Tue Dec 8 13:58:40 GMT 2020


On Tue, 8 Dec 2020 14:36:56 +0100
Stephane Bortzmeyer <stephane at sources.org>:

> I just developed a simple crawler for Gemini. Its goal is not to build
> another search engine but to perform some surveys of the
> geminispace. A typical result will be something like (real data, but
> limited in size):
> 
> gemini://gemini.bortzmeyer.org/software/crawler/
> 
> I have not yet let it loose on the Internet, because there are
> still some questions I have.
> 
> Is it "good practice" to follow robots.txt? There is no mention of it
> in the specification but it could work for Gemini as well as for the
> Web and I notice that some programs query this name on my server.
> 
> Since Gemini (and rightly so) has no User-Agent, how can a bot
> advertise its policy and a point of contact?

Depending on what you are trying to do, you can put your contact
information in the request itself.

On first contact with a new server, before you start crawling, you
could request gemini://hostname/CRAWLER_FROM_SOMEONE_AT_HOST_DOT_COM
so that the contact address shows up in the server's logs.

This is what I do for a gopher connectivity check.

I have to admit it's a rather poor solution, but I haven't found a
better way.
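
For what it's worth, a rough sketch of that idea in Python could look
like the code below. The path, function name, and contact address are
only illustrative; Gemini has no convention for announcing a crawler,
so this is an assumption, not established practice.

# Sketch: before crawling a new host, make one Gemini request whose
# path encodes a contact address, so it appears in the server
# operator's access logs. The path below is only an example.
import socket
import ssl

def announce_crawler(host, port=1965,
                     contact_path="/CRAWLER_FROM_SOMEONE_AT_HOST_DOT_COM"):
    url = f"gemini://{host}{contact_path}"

    # Gemini servers commonly use self-signed certificates (TOFU),
    # so certificate verification is disabled in this sketch.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE

    with socket.create_connection((host, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            # A Gemini request is the full URL followed by CRLF.
            tls.sendall(url.encode("utf-8") + b"\r\n")
            # Read only the response header line: "<STATUS> <META>\r\n".
            header = b""
            while not header.endswith(b"\r\n") and len(header) < 1029:
                chunk = tls.recv(1)
                if not chunk:
                    break
                header += chunk
    return header.decode("utf-8", errors="replace").strip()

The response will most likely be a "51 Not found" header, which is
fine: the request is made only so the path (and the contact address
embedded in it) ends up in the server's logs.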

