[ANN] Gemini historical snapshot

Michael Lazar lazar.michael22 at gmail.com
Wed Nov 18 03:30:31 GMT 2020


Greetings,

I'm happy to report that I have finished my effort to create a historical
snapshot of geminispace and upload it to the Internet Archive. I ran three
separate crawls, spaced over a few months, which together captured 115,223
unique gemini URLs. Here are some general statistics and download links:

Crawl           | September  | October    | November
---             | ---        | ---        | ---
Date            | 2020-09-24 | 2020-10-31 | 2020-11-07
Size            | 9.3 GB     | 12.9 GB    | 13.5 GB
Domains seen    | 283        | 276        | 314
Total Responses | 51,995     | 71,632     | 65,347
2x Responses    | 43,425     | 61,771     | 56,680

https://archive.org/details/mozz-gemini-crawl-2020-1
https://archive.org/details/mozz-gemini-crawl-2020-2
https://archive.org/details/mozz-gemini-crawl-2020-3

More information on the crawls can be found here:

gemini://mozz.us/archive/

The crawling software and related tools can be found here:

https://github.com/michael-lazar/mozz-archiver

I am also temporarily hosting a mirror of this snapshot on my gemini server.
It works using proxy URLs (which I thought was a neat idea). You can send any
request for a gemini URL to mozz.us:1966, and the server will attempt to
retrieve that URL from the snapshot.

Example using gemget:

$ gemget --proxy mozz.us:1966 -o - gemini://gemini.circumlunar.space/capcom/
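
For anyone without gemget, here is a rough Python sketch of the same proxy
request. It assumes the usual Gemini exchange: open a TLS connection, send the
full URL followed by CRLF, then read a "<status> <meta>" header line and the
body. The function names are made up for illustration, certificate checking is
turned off on the assumption that the server uses a self-signed certificate,
and since the mirror is temporary the host may no longer answer.

```python
import socket
import ssl
from typing import Tuple

# Host and port come from the announcement; the mirror is temporary.
PROXY_HOST = "mozz.us"
PROXY_PORT = 1966


def build_request(url: str) -> bytes:
    """A Gemini request line is the absolute URL followed by CRLF."""
    return url.encode("utf-8") + b"\r\n"


def parse_header(raw: bytes) -> Tuple[int, str]:
    """Split a '<status> <meta>' response header into (status, meta)."""
    line = raw.split(b"\r\n", 1)[0].decode("utf-8")
    status, _, meta = line.partition(" ")
    return int(status), meta


def fetch_via_proxy(url: str, host: str = PROXY_HOST, port: int = PROXY_PORT,
                    timeout: float = 10.0) -> Tuple[int, str, bytes]:
    """Ask the proxy at host:port for `url`, rather than url's own host."""
    context = ssl.create_default_context()
    # Assumption: skip certificate verification (self-signed cert expected).
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE

    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            tls.sendall(build_request(url))
            chunks = []
            while True:
                data = tls.recv(4096)
                if not data:  # server closes the connection when done
                    break
                chunks.append(data)

    header, _, body = b"".join(chunks).partition(b"\r\n")
    status, meta = parse_header(header)
    return status, meta, body


if __name__ == "__main__":
    status, meta, body = fetch_via_proxy(
        "gemini://gemini.circumlunar.space/capcom/")
    print(status, meta)
    print(body.decode("utf-8", errors="replace"))
```

The point is that the TLS connection goes to mozz.us:1966 while the request
line still carries the original URL, which is how the server knows what to
look up in the snapshot.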

Best,
Michael

