[spec] Limit valid encodings of text/gemini to UTF-8
Petite Abeille
petite.abeille at gmail.com
Sun Jan 3 16:02:54 GMT 2021
> On Jan 3, 2021, at 14:46, Stephane Bortzmeyer <stephane at sources.org> wrote:
>
> UTF-8 has a quasi-monopoly.
Not quite.
For text/gemini, your stats read:
• Unspecified: 42,322
• utf-8: 6,513
• us-ascii: 3
Unspecified dominates. By far. Most likely plain ASCII in practice.
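For what it's worth, here is a rough sketch of how a single response could be sorted into one of those buckets. It assumes the raw response arrives on stdin and that its first line is the "<status> <meta>" header; the function name gemini_declared_charset is made up for illustration and is not part of any existing tool.

```shell
# Hypothetical helper: print the charset declared in a Gemini response
# header read from stdin, or "unspecified" when there is none.
gemini_declared_charset() {
    # Keep only the header line, drop the CR and any quotes, lowercase it,
    # then extract the value of a charset= parameter if one is present.
    charset=$(head -n 1 | tr -d '\r"' | tr '[:upper:]' '[:lower:]' |
        sed -n 's/.*;[[:space:]]*charset=\([^;[:space:]]*\).*/\1/p')
    printf '%s\n' "${charset:-unspecified}"
}
```

Something like this could sit at the end of the openssl pipeline used in this message, in place of file or iconv.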
Could you run file --mime-type --mime-encoding on all of these text/gemini documents?
$ openssl s_client -quiet -crlf -connect mozz.us:1965 <<< gemini://mozz.us/ 2>/dev/null | file --brief --mime-type --mime-encoding -
text/plain; charset=utf-8
Validating the encoding would be informative as well:
$ openssl s_client -quiet -crlf -connect mozz.us:1965 <<< gemini://mozz.us/ 2>/dev/null | iconv -f utf-8 -t utf-8 > /dev/null; echo $?
0
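Note that the pipeline above feeds the response header to iconv along with the body. A minimal sketch that checks the body only, assuming the raw response is on stdin and the header is exactly the first line (gemini_body_is_utf8 is a hypothetical name):

```shell
# Hypothetical helper: succeed only if the body of a raw Gemini response
# (everything after the header line) is valid UTF-8.
gemini_body_is_utf8() {
    # tail -n +2 skips the header line; iconv exits non-zero on bytes
    # that are not valid UTF-8, and that becomes the function's status.
    tail -n +2 | iconv -f utf-8 -t utf-8 > /dev/null 2>&1
}
```

Usage would be along the lines of: openssl s_client ... | gemini_body_is_utf8 && echo valid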
Ditto for guessing the actual language:
$ echo $(openssl s_client -quiet -crlf -connect mozz.us:1965 <<< gemini://mozz.us/ 2>/dev/null ) | polyglot detect | cut -d' ' -f1 | uniq
English
https://polyglot.readthedocs.io/en/latest/Detection.html
℀ ±𝟤¢