Web proxies and robots.txt (was Re: Heads up about a Gemini client @ 198.12.83.123)

Sean Conner sean at conman.org
Mon Nov 30 02:25:18 GMT 2020


It was thus said that the Great Sean Conner once stated:
> 
>   It's not threatening my server or anything, but whoever is responsible
> for the client at 198.12.83.123, your client is currently stuck in the
> Redirection From Hell test and has been for some time.  From the length of
> time, it appears to be running autonomously so perhaps a leftover thread, or
> an autonomous client that doesn't read robots.txt, or didn't follow the spec
> carefully enough.
> 
>   Anyway, just a heads up.
> 
>   -spc

  So the client in question was most likely a web proxy.  I'm not sure which
site, nor what software it runs, but it did respond to a Gemini request with
"53 Proxy Request Refused", so there *is* a Gemini server there.  And the
fact that it made 137,060 requests before I shut down my own server tells me
it was an autonomous agent that no one was watching.  Usually I might see a
client make 20 or 30 requests before it stops.  Not this one.

  Now granted, my server is a bit unusual in that I have tests set up
specifically for clients to test against, and several of them involve
infinite redirects.  And yes, that was 137,060 *unique* requests: each
redirect points to a URL the client has never seen before, so even a client
that tracks visited URLs won't detect the loop.
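
  For the curious, the effect is easy to reproduce.  Here's a sketch in
Python (not my actual server code, and the hostname is made up) showing the
idea: every request to the test area gets a Gemini "31" redirect to a
brand-new random path, so a client that blindly follows redirects generates
an endless stream of unique requests:

import uuid

# Sketch only: answer every request with a redirect to a path that has
# never been served before.  Status 31 is a temporary redirect in Gemini;
# META is the URL the client should fetch next.
def redirect_from_hell(request_path: str) -> bytes:
    new_path = "/test/redirehell/" + uuid.uuid4().hex
    return ("31 gemini://example.org" + new_path + "\r\n").encode("utf-8")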

  So first up, Solderpunk, if you could please add a redirection-follow
limit to the specification and make it mandatory.  You can pick some
two-digit, heck, even three-digit, number of redirects to follow, but
please, *please*, put it in the specification itself and *not* just the
best-practices document, so that programmers are made aware of the issue.
It seems to be all too easy to overlook this potential trap (I see it often
enough).
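
  To make the point concrete, here's a minimal sketch of what such a limit
looks like on the client side.  This is Python, the function name
gemini_fetch() is my own, it skips certificate validation entirely (a real
client should do TOFU at the very least), and the cap of five is just an
example value, not a number from any spec:

import socket
import ssl
from urllib.parse import urlparse, urljoin, uses_relative, uses_netloc

# urljoin() only resolves relative URLs for schemes it knows about, so
# teach it about gemini:// first.
uses_relative.append("gemini")
uses_netloc.append("gemini")

def gemini_fetch(url, max_redirects=5):
    # Fetch a Gemini URL, following at most max_redirects redirects.
    for _ in range(max_redirects + 1):
        parsed = urlparse(url)
        context = ssl.create_default_context()
        context.check_hostname = False      # sketch only: no cert checks
        context.verify_mode = ssl.CERT_NONE
        with socket.create_connection((parsed.hostname, parsed.port or 1965)) as sock:
            with context.wrap_socket(sock, server_hostname=parsed.hostname) as tls:
                tls.sendall((url + "\r\n").encode("utf-8"))
                header = tls.makefile("rb").readline().decode("utf-8").strip()
        status, _, meta = header.partition(" ")
        if status.startswith("3"):          # 3x status: META is the new URL
            url = urljoin(url, meta)
            continue
        return status, meta
    raise RuntimeError("too many redirects, giving up at " + url)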

  Second, had the proxy in question fetched robots.txt, it would have found
this area specifically marked off:

User-agent: *
Disallow: /test/redirehell 

  I have that rule there for a reason, and had the autonomous client in
question read it, this wouldn't have happened in the first place.  Even if
you disagree with honoring robots.txt, consider that it may be difficult to
stop an autonomous agent once the user of the web proxy has dropped the web
connection.  I don't know; I haven't written a web proxy, and this is one
more thing to keep in mind when writing one.  I think it would be easier to
just follow robots.txt.
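
  Checking isn't much code, either.  Here's a sketch in Python, again with
a made-up hostname; it assumes you've already fetched the robots.txt text
over Gemini (the gemini_fetch() sketch above would do).  Python's standard
urllib.robotparser only matches on the path portion of a URL, so it doesn't
care that the scheme is gemini:// rather than http://:

from urllib.robotparser import RobotFileParser

def allowed(robots_txt, user_agent, url):
    # Parse robots.txt text and ask whether user_agent may fetch url.
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

robots = "User-agent: *\nDisallow: /test/redirehell\n"
print(allowed(robots, "webproxy", "gemini://example.org/test/redirehell"))  # False
print(allowed(robots, "webproxy", "gemini://example.org/"))                 # True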

  -spc (To the person who called me a dick for blocking a web proxy---yes,
	there *are* reasons to block them)

