[tech] [eli5] URI = IRI = ASCII = UTF-8 = Unicode
Petite Abeille
petite.abeille at gmail.com
Tue Dec 29 12:08:04 GMT 2020
"Uniform Resource Locators were defined in RFC 1738 in 1994 by Tim Berners-Lee"
~26 years ago. We must have a good understanding of what they are by now.
A URI can -in plain US-ASCII- represent every UTF-8 byte sequence, through percent-encoding. Therefore all of Unicode.
An IRI can -in plain UTF-8- represent all of Unicode. No need for US-ASCII encoding.
They are wholly equivalent in terms of content, and differ only in terms of encoding.
The only visible difference between URI and IRI is how Unicode is conveyed: ASCII encoded in URI, UTF-8 in IRI.
They are otherwise identical in all aspects.
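To make the equivalence concrete, here is a minimal sketch of the round trip for a path component, using only Python's standard urllib.parse. The variable names are made up for illustration; the path is the one used in the example responses in this message.

```python
from urllib.parse import quote, unquote

# IRI form of the path: plain UTF-8, the character appears as-is.
iri_path = "/👹.gmi"

# URI form: each UTF-8 byte of the non-ASCII character becomes %XX.
# quote() encodes as UTF-8 and leaves "/" untouched by default.
uri_path = quote(iri_path)
print(uri_path)

# Decoding the percent-escapes gives back the very same characters.
print(unquote(uri_path) == iri_path)
```

Nothing is lost or gained in either direction; only the bytes on the wire change.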
The following Gemini response is in vanilla US-ASCII:
20 text/gemini;charset=us-ascii;
=> gemini://xn--el8h/%F0%9F%91%B9.gmi
And yet, because of the nature of URI, it contains two Unicode characters, transmitted as US-ASCII: the host as Punycode, the path as percent-encoded UTF-8.
Here is the exact same response, but in UTF-8, due to IRI in the link:
20 text/gemini;charset=utf-8;
=> gemini://🎭/👹.gmi
Both responses represent exactly the same content. They are only encoded differently.
Both contain UTF-8, and therefore Unicode. Both are identical.
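As a rough illustration, the ASCII link can be turned into the UTF-8 one mechanically. The sketch below is an assumption-laden simplification, not a complete RFC 3987 converter: uri_to_iri and label_to_unicode are hypothetical helper names, userinfo and port are ignored, and every percent-escape in the path is decoded (including reserved characters such as %2F, which a careful converter would leave alone).

```python
from urllib.parse import urlsplit, urlunsplit, unquote

def label_to_unicode(label):
    """Decode one host label if it carries the Punycode "xn--" prefix."""
    if label.lower().startswith("xn--"):
        return label[4:].encode("ascii").decode("punycode")
    return label

def uri_to_iri(uri):
    """Simplified URI -> IRI conversion (host labels and path only)."""
    parts = urlsplit(uri)
    host = ".".join(label_to_unicode(l) for l in parts.netloc.split("."))
    path = unquote(parts.path)  # percent-escapes -> UTF-8 characters
    return urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))

# The link from the US-ASCII response becomes the link from the UTF-8 one.
print(uri_to_iri("gemini://xn--el8h/%F0%9F%91%B9.gmi"))
```

The host goes through Punycode and the path through percent-encoding, but the result names the same resource either way.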
HTH.