Getting slammed by a client
Hannu Hartikainen
hannu.hartikainen+gemini at gmail.com
Sat Jul 25 09:51:39 BST 2020
Thanks for pointing this out. I never read logs when everything works, so...
I'm getting lots of requests to URLs like
gemini://hannuhartikainen.fi/twinwiki/Welcome,%20visitors%21/twinwiki/Welcome%252C%2520visitors%2521/_history/twinwiki/_edit/twinwiki/_create/twinwiki/_create/twinwiki/_help/twinwiki/_help/twinwiki/_index/twinwiki/_edit/twinwiki/_history/twinwiki/_edit/twinwiki/_create/twinwiki/_help/twinwiki/_history/twinwiki/_help/twinwiki/_create/twinwiki/_history/twinwiki/_history/twinwiki/_history/twinwiki/_create/twinwiki/_create/twinwiki/_index/twinwiki/_history/twinwiki/_edit/twinwiki/_edit/twinwiki/_history/twinwiki/_help/twinwiki/_help/twinwiki/_history/twinwiki/_create/twinwiki/_help/twinwiki/_edit/twinwiki/_index/twinwiki/_history
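My guess is the wiki pages emit relative links (twinwiki/_edit and
friends, without a leading slash) that compound against an ever-longer
base URL. For illustration; urljoin won't resolve relative references
against a gemini:// base since it doesn't know the scheme, so https
stands in here:

from urllib.parse import urljoin

base = "https://hannuhartikainen.fi/twinwiki/Welcome,%20visitors%21/"
print(urljoin(base, "twinwiki/_history"))
# https://hannuhartikainen.fi/twinwiki/Welcome,%20visitors%21/twinwiki/_history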
Oops, I've written bugs once again! I do have this robots.txt, though:
User-agent: gus
Allow: /
User-agent: *
Disallow: /
(I guess I should disallow even gus from twinwiki, or at least from
any non-content pages.)
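Something like this stricter version should do it (a sketch; it
assumes GUS applies per-path Disallow rules with prefix matching, the
way web crawlers do):

User-agent: gus
Disallow: /twinwiki/
Allow: /

User-agent: *
Disallow: /

Blocking all of /twinwiki/ is blunter than enumerating _history,
_edit and friends, but it doesn't depend on wildcard support, which
plain robots.txt doesn't have.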
The crawler also breaks ansi.hrtk.in (whose robots.txt disallows even
gus) for other users while it crawls. I couldn't figure out how to
make Jetforce stop streaming when the client closes the connection.
The code is here if anyone has pointers:
https://github.com/dancek/ansimirror/blob/master/ansimirror.py
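For context, the general shape I'd expect the fix to take is below.
This is plain asyncio, not a documented Jetforce API; it assumes the
handler can get at the underlying StreamWriter, which I'm not sure
Jetforce exposes (the names and the delay parameter are made up):

import asyncio

async def stream_frames(writer: asyncio.StreamWriter, frames, delay=0.1):
    # Stop as soon as the peer goes away instead of streaming into the void.
    for frame in frames:
        if writer.is_closing():  # client already hung up
            break
        writer.write(frame)
        try:
            # drain() surfaces connection errors that write() quietly buffers over
            await writer.drain()
        except (ConnectionResetError, BrokenPipeError):
            break
        await asyncio.sleep(delay)  # pacing between ANSI frames

The key point is awaiting drain() after every write: that's where a
dead socket actually raises, and it also keeps a slow client from
buffering unbounded data on our side.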
Anyone have experience fighting misbehaving crawlers? Should we
develop low-resource honeypots to exhaust crawler resources? Or start
maintaining a community blacklist?
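To make the honeypot idea concrete: an async generator that dribbles
out bytes would cost almost nothing per connection while a crawler
without a read timeout sits stuck on it. A sketch, assuming the server
can stream an async generator as a response body:

import asyncio

async def tarpit():
    # Hypothetical honeypot body: one byte every ten seconds, forever.
    # The server holds only an idle coroutine; the crawler holds a
    # connection that never finishes.
    while True:
        await asyncio.sleep(10)
        yield b"."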
-Hannu