robots.txt for Gemini
solderpunk
solderpunk at SDF.ORG
Thu Mar 26 19:57:17 GMT 2020
On Tue, Mar 24, 2020 at 05:35:08PM -0400, Sean Conner wrote:
> Two possible solutions for robot identification:
>
> 1) Allow IP addresses to be used where a user-agent would be specified.
> Some examples:
>
> User-agent: 172.16.89.3
> User-agent: 172.17.24.0/27
> User-agent: fde7:a680:47d3::/48
>
> Yes, I'm including CIDR (Classless Inter-Domain Routing) notation to specify
> a range of IP addresses. And for a robot, if your IP address matches an IP
> address (or range), then you must follow the rules given under that entry.
Hmm, I'm not a huge fan of this idea (although I recognise it as a valid
technical solution to the problem at hand, which is perhaps all you
meant it to be). Mostly because I don't like to encourage people to
think of IP addresses as permanently mapping to, well, just anything.
The address of a VPN running an abusive bot today might be handed out to
a different customer running a well-behaved bot next year.
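That said, the matching itself would be simple enough. A rough sketch,
using nothing but Python's standard library (the entries are just the ones
from your examples, and the robot's own address below is invented purely
for illustration):

    # Rough sketch: how a robot might check its own address against
    # CIDR-style User-agent entries in robots.txt.
    import ipaddress

    my_addr = ipaddress.ip_address("172.17.24.5")  # made-up robot address

    # Entries as they might appear after "User-agent:"
    entries = ["172.16.89.3", "172.17.24.0/27", "fde7:a680:47d3::/48"]

    for entry in entries:
        net = ipaddress.ip_network(entry, strict=False)
        # "in" is simply False when the address families differ
        if my_addr in net:
            print("rules under", entry, "apply to this robot")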
> 2) Use the fragment portion of a URL to designate a robot. The fragment
> portion of a URL has no meaning for a server (it does for a client). A
> robot could use this fact to slip its identifier in when making a request.
> The server MUST NOT use this information, but the logs could show it. For
> example, a robot could request:
>
> gemini://example.com/robots.txt#GUS
>
> A review of the logs would reveal that GUS is a robot, and the text "GUS"
> could be placed in the User-agent: field to control it. It SHOULD be the
> text the robot would recognize in robots.txt.
Hmm, nice out-of-the-box thinking. Since the suggestion has come from
you I will assume it does not violate the letter of any RFCs, even
though I can't shake a strange feeling that this is "abusing" the
fragment concept a little...
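Just to make the mechanics concrete, here is a rough sketch of what such a
crawler request could look like (the host and the "GUS" name are simply
lifted from your example, and certificate verification is skipped here for
brevity):

    # Sketch of the proposal: a crawler requests robots.txt with its
    # name in the fragment, so the server's logs reveal who it is.
    import socket, ssl

    host = "example.com"
    url = "gemini://example.com/robots.txt#GUS"

    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE   # no CA validation in this sketch

    with socket.create_connection((host, 1965)) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            tls.sendall((url + "\r\n").encode("utf-8"))
            response = tls.makefile("rb").read()

    print(response.decode("utf-8", errors="replace"))

The server never interprets the fragment; it only ever shows up in the
access log, which is the whole point of the trick.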
Cheers,
Solderpunk