[ANN] A Gemini crawler, for statistics about the geminispace
Stephane Bortzmeyer
stephane at sources.org
Thu Dec 24 12:34:45 GMT 2020
On Thu, Dec 24, 2020 at 02:08:57AM +0100,
Philip Linde <linde.philip at gmail.com> wrote
a message of 37 lines which said:
> Could you add statistics about character encodings used for
> text/gemini responses specifically?
Only for text/gemini:
* Unspecified: 5997
* utf-8: 4619
* tcvn-5712: 2
* cp437: 2
* utf-16be: 1
* utf-16: 1
* windows-1252: 1
* utf-32le: 1
* utf-32be: 1
* utf-16le: 1
* ebcdicatde: 1
But wait, all the exotic charsets are at <gemini://egsam.pitr.ca/>,
which is a test site for various funny stuff. So it is safe to say
that not a single "real" gemtext resource uses anything other than UTF-8.
By the way, this is what RFC 5198 recommends:
<gemini://gemini.bortzmeyer.org/rfc-mirror/rfc5198.txt>
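For anyone who wants to reproduce this kind of count, a tally like the one above could be computed roughly as follows. This is only a minimal sketch, not the crawler's actual code: the function names gemtext_charset and tally, and the sample META strings, are hypothetical and chosen for illustration. It assumes you already have the META string of each successful response.

```python
from collections import Counter


def gemtext_charset(meta):
    """Classify one META string (hypothetical helper, not the crawler's code).

    Returns the lowercased charset parameter, "Unspecified" when the
    parameter is absent, or None when the media type is not text/gemini.
    """
    parts = [p.strip() for p in meta.split(";")]
    if parts[0].lower() != "text/gemini":
        return None
    for param in parts[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset":
            # Charset names are case-insensitive and may be quoted.
            return value.strip().strip('"').lower()
    return "Unspecified"


def tally(metas):
    """Count charsets over text/gemini responses only."""
    counts = Counter()
    for meta in metas:
        charset = gemtext_charset(meta)
        if charset is not None:
            counts[charset] += 1
    return counts


# Made-up sample input, for illustration only.
sample = [
    "text/gemini",
    "text/gemini; charset=utf-8",
    "text/gemini; charset=UTF-8; lang=en",
    "text/gemini;charset=utf-16be",
    "text/plain; charset=iso-8859-1",
]

for charset, count in tally(sample).most_common():
    print(f"* {charset}: {count}")
```

Note that the Gemini specification tells clients to assume UTF-8 when a text/* response gives no charset, so the "Unspecified" bucket is effectively UTF-8 as well.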
> I'd like to know if there are currently text/gemini responses in any
> other encoding than UTF-8 (or US ASCII). That would be an
> interesting topic in the IRI+IDN discussion.
I don't see the relationship. There is clearly unanimity among
geminauts that *content* should be in UTF-8 (and I would suggest that
this SHOULD could be changed to a MUST). The discussion is about
metadata: the identifier (the IRI).