[discussion] The matter of Robots.txt
Alan Bunbury
gemini at bunburya.eu
Thu Oct 21 14:05:19 BST 2021
Why wouldn't we? We certainly have a lot of bots, so it seems reasonable
to have a robots.txt.
I learned the value of robots.txt soon after setting up Remini, my
Gemini proxy for Reddit. Many Reddit pages tend to link to a lot of
other Reddit pages, so crawlers that visited Remini were sent down a
rabbit hole which ultimately led to them trying to index all of Reddit
(which is huge) via the proxy.
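For illustration, a proxy capsule can shut crawlers out entirely by serving a robots.txt at its root. This is just a sketch of the standard exclusion format, not Remini's actual file, and the capsule address is made up:

```
# Served in response to gemini://example.org/robots.txt
# (address and rules are illustrative)
User-agent: *
Disallow: /
```

A well-behaved crawler fetches this file first and, seeing that everything is disallowed for every user-agent, never starts down the rabbit hole at all.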
That's obviously not a typical case, but I don't think it's *that*
unusual in Geminispace either. More generally, it seems obvious to me
that there should be a (mostly) agreed-upon way to direct the behaviour
of bots that visit one's capsule, so if there are good arguments against
robots.txt I'd be interested in hearing them. Strictly speaking, though,
I don't think this is a Gemini question, as the robots exclusion
standard is something quite separate from Gemini (or HTTP).
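To show how protocol-agnostic the exclusion format is: Python's standard-library urllib.robotparser, written for the HTTP world, parses the same rule syntax and can be fed any URL. The rules and capsule URLs below are invented for the example:

```python
# Sketch: a well-behaved crawler checking robots.txt rules before
# fetching. The rules and gemini:// URLs here are illustrative.
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /proxy/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# The crawler checks each candidate URL against the rules before
# requesting it; only the path component is matched.
print(parser.can_fetch("*", "gemini://example.org/proxy/r/python"))  # False
print(parser.can_fetch("*", "gemini://example.org/index.gmi"))       # True
```

The parser only ever looks at the path, which is why the same format transplants cleanly from HTTP to Gemini.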
On 21/10/2021 13:41, Andrew Singleton wrote:
>
> I'm going to lead in with a question prompted by Sean's experiences.
>
> Do we even need a robots.txt?
>
> --
> -----
> http://singletona082.flounder.online
> gemini://singletona082.flounder.online
> My online presence