[ANN] Gemini historical snapshot
Michael Lazar
lazar.michael22 at gmail.com
Wed Nov 18 03:30:31 GMT 2020
Greetings,
I'm happy to report that I have finished my effort to create a historical
snapshot of geminispace and upload it to the Internet Archive. I ended up
running three separate crawls, spaced over a few months. Across all three
crawls, 115,223 unique gemini URLs were captured. Here are some general
statistics and download links:
Crawl | September | October | November
--- | --- | --- | ---
Date | 2020-09-24 | 2020-10-31 | 2020-11-07
Size | 9.3 GB | 12.9 GB | 13.5 GB
Domains seen | 283 | 276 | 314
Total Responses | 51,995 | 71,632 | 65,347
2x (Success) Responses | 43,425 | 61,771 | 56,680
https://archive.org/details/mozz-gemini-crawl-2020-1
https://archive.org/details/mozz-gemini-crawl-2020-2
https://archive.org/details/mozz-gemini-crawl-2020-3
More information on the crawls can be found here:
gemini://mozz.us/archive/
The crawling software and related tools can be found here:
https://github.com/michael-lazar/mozz-archiver
I am also temporarily hosting a mirror of this snapshot on my gemini server.
It works using proxy URLs (which I thought was a neat idea). You can send any
request for a gemini URL to mozz.us:1966, and the server will attempt to
retrieve that URL from the snapshot.
Example using gemget:
$ gemget --proxy mozz.us:1966 -o - gemini://gemini.circumlunar.space/capcom/
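For anyone without gemget, here is a rough Python sketch of what a proxied request could look like on the wire. It assumes the mirror behaves like an ordinary Gemini proxy: the client opens a TLS connection to mozz.us:1966 but sends the full URL of the original capsule as the request line. The function names (build_request, parse_header, fetch_via_proxy) are made up for this illustration, and certificate verification is skipped for brevity, so treat it as a sketch and not a proper client.

```python
import socket
import ssl
from urllib.parse import urlsplit

# Proxy address taken from the post; the mirror is temporary and may be gone.
PROXY_HOST = "mozz.us"
PROXY_PORT = 1966


def build_request(url):
    """A Gemini request is the absolute URL followed by CRLF."""
    if urlsplit(url).scheme != "gemini":
        raise ValueError("expected a gemini:// URL")
    return url.encode("utf-8") + b"\r\n"


def parse_header(header):
    """Split a '<STATUS> <META>' response header into (int, str)."""
    line = header.decode("utf-8").rstrip("\r\n")
    status, _, meta = line.partition(" ")
    return int(status), meta


def fetch_via_proxy(url, host=PROXY_HOST, port=PROXY_PORT, timeout=30):
    """Connect to the proxy, but ask for the original capsule's URL."""
    # Assumption: verification is disabled because Gemini servers often use
    # self-signed certificates. A real client would pin them (TOFU) instead.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE

    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            tls.sendall(build_request(url))
            data = b""
            # The server closes the connection when the response is complete.
            while chunk := tls.recv(4096):
                data += chunk

    header, _, body = data.partition(b"\r\n")
    status, meta = parse_header(header)
    return status, meta, body
```

Calling fetch_via_proxy("gemini://gemini.circumlunar.space/capcom/") should then roughly correspond to the gemget command above, returning the status code, the meta field, and the raw body bytes from the snapshot.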
Best,
Michael