robots.txt for Gemini formalised
Drew DeVault
sir at cmpwn.com
Tue Nov 24 14:07:45 GMT 2020
On Tue Nov 24, 2020 at 9:06 AM EST, Jason McBrayer wrote:
> I believe the concern is not that a web portal will archive pages, or
> run on its own as an automated process, but that it will be used by a
> third-party web bot (i.e., one not run by the owner of the portal) to
> crawl Gemini sites and index them on the web.
Aha, this is a much better point, and one which should probably be
addressed in the robots.txt specification.
> It seems to me that the correct thing is for people that run web portals
> to have a very strong robots.txt on /their/ web site, and additionally,
> to be proactive about blocking web bots that don't observe robots.txt. I
> think people want to block web portals in their Gemini robots.txt
> because they don't trust web portal authors to do those two things. I
> understand the feeling, but they're still trusting web portal authors to
> obey robots.txt, which is honestly more work.
Web portals are users, plain and simple. Anyone who blocks a web portal
is blocking legitimate users who are engaging in legitimate activity.
This is a dick move and I won't stand up for anyone who does it.
However, the issue of web crawlers hitting geminispace through a web
portal is NOT the same thing, and I'm glad you brought it up. I'm going
to forbid web crawlers from crawling my gemini portal.
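As a rough sketch of what that could look like, the web-facing side of
a portal can serve a robots.txt that turns away every well-behaved web
crawler:

    User-agent: *
    Disallow: /

Bots that don't observe robots.txt would still have to be blocked by
other means, as Jason points out.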