Getting slammed by a client

Hannu Hartikainen hannu.hartikainen+gemini at gmail.com
Sat Jul 25 09:51:39 BST 2020


Thanks for pointing this out. I never read logs if everything works, so...

I'm getting lots of requests to URLs like
gemini://hannuhartikainen.fi/twinwiki/Welcome,%20visitors%21/twinwiki/Welcome%252C%2520visitors%2521/_history/twinwiki/_edit/twinwiki/_create/twinwiki/_create/twinwiki/_help/twinwiki/_help/twinwiki/_index/twinwiki/_edit/twinwiki/_history/twinwiki/_edit/twinwiki/_create/twinwiki/_help/twinwiki/_history/twinwiki/_help/twinwiki/_create/twinwiki/_history/twinwiki/_history/twinwiki/_history/twinwiki/_create/twinwiki/_create/twinwiki/_index/twinwiki/_history/twinwiki/_edit/twinwiki/_edit/twinwiki/_history/twinwiki/_help/twinwiki/_help/twinwiki/_history/twinwiki/_create/twinwiki/_help/twinwiki/_edit/twinwiki/_index/twinwiki/_history

Oops, I've written bugs once again! (It looks like relative links get
joined onto already-joined paths and percent-encoded a second time;
note the %252C and %2520.) I do have this robots.txt, though:

User-agent: gus
Allow: /

User-agent: *
Disallow: /

(I guess I should disallow even gus from twinwiki, or at least any
non-content pages.)
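
Something like this might work -- a sketch only, with two caveats:
classic robots.txt matching is prefix-only, so the mis-joined nested
URLs above would still slip past it, and Allow/Disallow precedence
varies between implementations:

User-agent: gus
Disallow: /twinwiki/_create
Disallow: /twinwiki/_edit
Disallow: /twinwiki/_help
Disallow: /twinwiki/_history
Disallow: /twinwiki/_index
Allow: /

User-agent: *
Disallow: /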

The crawler also breaks ansi.hrtk.in (whose robots.txt disallows even
gus) for other users while it crawls. I couldn't figure out how to
make Jetforce stop streaming when the client closes the connection.
The code is here if someone has pointers:
https://github.com/dancek/ansimirror/blob/master/ansimirror.py
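
For what it's worth, here is the bare asyncio pattern I think is
needed: check the transport before each write and let drain() surface
a dead peer, instead of pushing every frame into a closed socket.
Whether Jetforce actually exposes its StreamWriter (or an equivalent
transport) to the response generator is the part I haven't figured
out, so `writer` here is an assumption:

import asyncio

async def stream_frames(writer: asyncio.StreamWriter, frames, delay=0.05):
    for frame in frames:
        if writer.is_closing():      # client already hung up
            break
        writer.write(frame)
        try:
            await writer.drain()     # backpressure; raises on peer reset
        except (ConnectionResetError, BrokenPipeError):
            break
        await asyncio.sleep(delay)   # pacing between ANSI frames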

Anyone have experience fighting misbehaving crawlers? Should we
develop low-resource honeypots to exhaust crawler resources? Or start
maintaining a community blacklist?
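
To make the honeypot idea concrete: I'm imagining a tarpit that only
robots.txt violators would ever reach. It serves a valid Gemini
success header, then drips one byte per second, so the crawler ties
up one of its own connections for ten minutes while costing us one
idle coroutine. A rough standalone sketch with nothing but the
standard library; the cert paths and the port are placeholders:

import asyncio
import ssl

SSL_CTX = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
SSL_CTX.load_cert_chain("cert.pem", "key.pem")  # placeholder paths

async def handle(reader, writer):
    try:
        await reader.readline()             # consume the request URL
        writer.write(b"20 text/plain\r\n")  # valid Gemini success header
        for _ in range(600):                # ~10 minutes per connection
            writer.write(b".")
            await writer.drain()
            await asyncio.sleep(1)
    except (ConnectionResetError, BrokenPipeError):
        pass                                # crawler gave up; good
    finally:
        writer.close()

async def main():
    server = await asyncio.start_server(handle, "0.0.0.0", 1966, ssl=SSL_CTX)
    async with server:
        await server.serve_forever()

asyncio.run(main())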

-Hannu

