[spec] Limit valid encodings of text/gemini to UTF-8
Philip Linde
linde.philip at gmail.com
Mon Dec 28 13:16:27 GMT 2020
The specification does not state this explicitly, but in practice
virtually all text/gemini documents are encoded as either UTF-8 or
US-ASCII. Stephane
Bortzmeyer compiled the following list from his crawler:
> Only for text/gemini:
>
> * Unspecified: 5997
> * utf-8: 4619
> * tcvn-5712: 2
> * cp437: 2
> * utf-16be: 1
> * utf-16: 1
> * windows-1252: 1
> * utf-32le: 1
> * utf-32be: 1
> * utf-16le: 1
> * ebcdicatde: 1
>
> But wait, all the exotic charsets are at <gemini://egsam.pitr.ca/>
> which is a test site for various funny stuff. So, it is safe to say
> that not one "real" gemtext resource uses something else than UTF-8.
While the practical impact is minimal, I suggest that the
specification reflect the much simpler situation these statistics
indicate, rather than keep itself open to the general problem of
representing text/gemini in encodings that might not even encode the
meta information characters in the same way. If IRIs are introduced,
arbitrary encodings would also raise the problem of how IRIs should be
represented in e.g. ISO-8859-1.
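
To illustrate the point about meta information characters, here is a
minimal Python sketch that prints the bytes of the gemtext link marker
"=>" in a few encodings. The helper name marker_bytes is made up for
this example, and cp500 is only a stand-in for an EBCDIC code page; it
is not necessarily the variant the crawler found.

```python
# The gemtext link-line marker as text.
MARKER = "=>"


def marker_bytes(encoding: str) -> bytes:
    """Return the bytes that represent "=>" in the given encoding."""
    return MARKER.encode(encoding)


if __name__ == "__main__":
    # cp500 is one of Python's EBCDIC codecs, used here as a stand-in.
    for name in ("ascii", "utf-8", "cp437", "utf-16le", "cp500"):
        print(f"{name:>9}: {marker_bytes(name).hex(' ')}")
```

A parser that looks for the ASCII bytes of "=>" at the start of a line
would work unchanged for the first three encodings, but not for UTF-16
or EBCDIC, where the same two characters are different byte sequences.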
I understand the need for other document types to take other character
encodings. For example, I have a collection of old text files in IBM437
encoding. For text/gemini, we pretty much have a blank slate, though,
and I see no reason that it should extend to support arbitrary
encodings when limiting to UTF-8 creates a much simpler situation for
implementers and is already the unspoken standard.
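
As a rough sketch of how simple the client side could become under
this proposal: the function below accepts a response meta string and a
body, and refuses any charset other than UTF-8 or US-ASCII. The name
decode_gemtext and the error handling are assumptions for this example
and are not taken from the specification. It also assumes that a
missing charset parameter means UTF-8.

```python
def decode_gemtext(meta: str, body: bytes) -> str:
    """Decode a text/gemini body, allowing only UTF-8 (or its ASCII subset)."""
    mime, _, params = meta.partition(";")
    if mime.strip().lower() != "text/gemini":
        raise ValueError(f"not a text/gemini response: {mime.strip()!r}")

    # Assumption: no charset parameter means UTF-8.
    charset = "utf-8"
    for param in params.split(";"):
        key, _, value = param.partition("=")
        if key.strip().lower() == "charset":
            charset = value.strip().strip('"').lower()

    if charset not in ("utf-8", "us-ascii"):
        raise ValueError(f"unsupported charset for text/gemini: {charset!r}")

    # US-ASCII is a strict subset of UTF-8, so one decoder covers both.
    return body.decode("utf-8")
```

For example, decode_gemtext("text/gemini; charset=utf-8", b"# Hello")
returns the text unchanged, while a meta of
"text/gemini; charset=utf-16" raises ValueError instead of requiring
the client to carry a table of decoders.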
There are display systems and platforms that fundamentally can't
display UTF-8 directly. For example, in the PC text modes I am limited
to IBM437. The problem of transcoding text/gemini should then lie with
the client authors for those platforms, not with every other client
author. ELinks for DOS, for example, will transcode UTF-8 (and various
other encodings) to IBM437 and use a placeholder character where no
equivalent exists.
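
A minimal sketch of that kind of client-side transcoding in Python,
using the standard cp437 codec with the "replace" error handler. This
is not how ELinks does it; the function name to_ibm437 is made up for
this example, and the placeholder is whatever Python's codec
substitutes (a question mark).

```python
def to_ibm437(utf8_bytes: bytes) -> bytes:
    """Transcode UTF-8 input to IBM437, substituting '?' where needed."""
    text = utf8_bytes.decode("utf-8")
    # errors="replace" swaps each unencodable character for b"?".
    return text.encode("cp437", errors="replace")


if __name__ == "__main__":
    sample = "=> gemini://example.org/ caf\u00e9 \u20ac5".encode("utf-8")
    print(to_ibm437(sample))
```

Characters that IBM437 does have, such as "é", map to their single
byte, and characters it lacks, such as the euro sign, become the
placeholder. Only clients on such platforms need to carry this step.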
--
Philip