On Web-proxies (was Re: Assuming disallow-all, and some research on robots.txt in Geminispace (Was: Re: robots.txt for Gemini formalised))
Sean Conner
sean at conman.org
Wed Nov 25 23:36:16 GMT 2020
It was thus said that the Great Nick Thomas once stated:
> For clarity: I think it's fine to presume consent for browsing (whether
> through a proxy or not), and not fine to presume consent for archiving.
> If adopted, this represents a significant enhancement to capsule author
> privacy compared to web norms.
The issue with proxying (especially via the web) is the web side. Using a
web proxy that runs locally to browse Gemini sites via a browser is fine, but
it becomes problematic if said proxy is listening on a public IP address.
It's not a matter of *if* but *WHEN* webbots of all types start hitting it,
and *those* are a mixture of indexers, archivers, research bots and others
[1]. At the very least, any web proxy should respond to "/robots.txt": either
serve up a file, or have command line options to generate the response, and
by default send this:
User-agent: *
Disallow: /
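A minimal Python sketch of that behaviour, using only the standard library's
http.server. The option names (--robots-file, --host, --port) and the function
names are hypothetical, and the actual Gemini fetching is left as a
placeholder. It binds to 127.0.0.1 by default, so it stays local unless told
otherwise.

```python
import argparse
from http.server import BaseHTTPRequestHandler, HTTPServer

# Served when no robots.txt file is configured: ask every bot to stay out.
DISALLOW_ALL = "User-agent: *\nDisallow: /\n"


def load_robots_txt(path=None):
    """Return the configured robots.txt file's contents, else disallow-all."""
    if path is None:
        return DISALLOW_ALL
    with open(path, encoding="utf-8") as f:
        return f.read()


def make_handler(robots_body):
    """Build a request handler class that always answers /robots.txt itself."""
    payload = robots_body.encode("utf-8")

    class ProxyHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path == "/robots.txt":
                self.send_response(200)
                self.send_header("Content-Type", "text/plain; charset=utf-8")
                self.send_header("Content-Length", str(len(payload)))
                self.end_headers()
                self.wfile.write(payload)
                return
            # Placeholder: a real proxy would fetch and translate the
            # requested gemini:// resource here.
            self.send_error(501, "Gemini proxying not implemented in this sketch")

    return ProxyHandler


def build_server(argv=None):
    """Parse the (hypothetical) command line options and create the server."""
    parser = argparse.ArgumentParser(description="Gemini web proxy sketch")
    parser.add_argument("--robots-file", help="serve this file as /robots.txt")
    parser.add_argument("--host", default="127.0.0.1")
    parser.add_argument("--port", type=int, default=8080)
    args = parser.parse_args(argv)
    handler = make_handler(load_robots_txt(args.robots_file))
    return HTTPServer((args.host, args.port), handler)
```

To run it, call build_server().serve_forever(); passing --robots-file
replaces the disallow-all default with a file of your choosing. As noted
below, this only keeps out the bots that bother to read robots.txt.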
This is the crux of the disagreement between Drew and me: I didn't
explain my concerns very well, and he didn't pick up on the actual issue I
had (so the fault is mine). A web proxy can inadvertently give indexers,
archivers, researchers and others access to Gemini content.
-spc
[1] Indexers, archivers and research bots tend to respect robots.txt.
It's the "other" class that doesn't. These "other" bots are typically
looking for exploits, and there's not much you can do about them
other than outright ban the IPs they're coming from [2].
[2] And even then it's a game of "whack-a-mole". One option: if a web
proxy sees a bunch of requests from a single IP address that result in
"not found" errors from Gemini (say, a threshold of 10 such results in
a row), it could automatically ban that IP for a period of time (say,
48 hours: long enough to let the bot finish its job, but not forever,
since the list of banned IPs would otherwise keep growing).
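A minimal Python sketch of the automatic ban from footnote [2], assuming the
proxy sees the Gemini status code of each proxied request (51 is Gemini's
NOT FOUND). The class and method names and the injectable clock are
hypothetical; the threshold and ban length default to the example values in
that footnote.

```python
import time

GEMINI_NOT_FOUND = 51  # Gemini status code for "not found"


class NotFoundBanList:
    """Temporarily ban client IPs after a run of consecutive "not found" results."""

    def __init__(self, threshold=10, ban_seconds=48 * 3600, clock=time.monotonic):
        self.threshold = threshold
        self.ban_seconds = ban_seconds
        self.clock = clock
        self._misses = {}        # ip -> current run of consecutive not-found results
        self._banned_until = {}  # ip -> clock value at which the ban expires

    def is_banned(self, ip):
        expiry = self._banned_until.get(ip)
        if expiry is None:
            return False
        if self.clock() >= expiry:
            # Ban has lapsed; drop it so the table doesn't grow forever.
            del self._banned_until[ip]
            return False
        return True

    def record(self, ip, gemini_status):
        """Record one proxied request's outcome; return True if ip is now banned."""
        if gemini_status != GEMINI_NOT_FOUND:
            self._misses.pop(ip, None)  # any other result breaks the run
            return False
        run = self._misses.get(ip, 0) + 1
        if run >= self.threshold:
            self._misses.pop(ip, None)
            self._banned_until[ip] = self.clock() + self.ban_seconds
            return True
        self._misses[ip] = run
        return False
```

The proxy would check is_banned() before handling a request and call
record() once the Gemini response comes back. Expiring bans lazily keeps the
table bounded without a separate cleanup task.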