[tech] [eli5] URI = IRI = ASCII = UTF-8 = Unicode

Petite Abeille petite.abeille at gmail.com
Tue Dec 29 12:08:04 GMT 2020


"Uniform Resource Locators were defined in RFC 1738 in 1994 by Tim Berners-Lee"

~26 years ago. We must have a good understanding of what they are by now.


A URI can, in plain US-ASCII, represent any sequence of UTF-8 bytes, and therefore all of Unicode.
An IRI can, in plain UTF-8, represent all of Unicode directly. No need for US-ASCII encoding.

They are wholly equivalent in terms of content, and differ only in terms of encoding.

The only visible difference between a URI and an IRI is how Unicode is conveyed: ASCII-encoded in a URI (Punycode for the host, percent-encoded UTF-8 elsewhere), UTF-8 in an IRI.

They are otherwise identical in all aspects.
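
As a rough illustration of that mapping, here is a minimal Python sketch using only the standard library. The name iri_to_uri is hypothetical. It assumes the authority is a bare host name (no userinfo or port) and skips the normalization steps that full IDNA processing requires, so treat it as a sketch rather than a conforming implementation.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left untouched when percent-encoding: URI delimiters plus "%",
# so escapes that are already present are not encoded a second time.
SAFE = "/?:@!$&'()*+,;=%"


def iri_to_uri(iri: str) -> str:
    """Hypothetical helper: map an IRI to an ASCII-only URI (simplified)."""
    parts = urlsplit(iri)

    # Host: every non-ASCII label becomes "xn--" followed by its Punycode form.
    # Assumption: netloc is a bare host name, with no userinfo or port.
    labels = []
    for label in parts.netloc.split("."):
        if label.isascii():
            labels.append(label)
        else:
            labels.append("xn--" + label.encode("punycode").decode("ascii"))
    host = ".".join(labels)

    # Path, query, fragment: non-ASCII characters become percent-encoded UTF-8.
    path = quote(parts.path, safe=SAFE)
    query = quote(parts.query, safe=SAFE)
    fragment = quote(parts.fragment, safe=SAFE)

    return urlunsplit((parts.scheme, host, path, query, fragment))


print(iri_to_uri("gemini://🎭/👹.gmi"))
```

An input that is already plain ASCII passes through unchanged, which is one way to see that every URI is also a valid IRI.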


The following Gemini response is in vanilla US-ASCII:

20 text/gemini;charset=us-ascii;
=> gemini://xn--el8h/%F0%9F%91%B9.gmi

And yet, because of the nature of URI, it contains two Unicode characters, transmitted as US-ASCII: the host as Punycode (xn--el8h), the path as percent-encoded UTF-8 (%F0%9F%91%B9).
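
To make that concrete, the two ASCII fragments of the link can be decoded by hand. The following is a small Python sketch using only the standard library; the variable names are illustrative.

```python
from urllib.parse import unquote_to_bytes

# Path segment: percent-escapes -> raw bytes -> UTF-8 decode -> code point.
raw = unquote_to_bytes("%F0%9F%91%B9")
path_char = raw.decode("utf-8")
print(raw.hex(), path_char, f"U+{ord(path_char):04X}")

# Host label: strip the "xn--" prefix, then decode the Punycode remainder.
label = "xn--el8h"
host_char = label[len("xn--"):].encode("ascii").decode("punycode")
print(host_char, f"U+{ord(host_char):04X}")
```

The path decodes to U+1F479 and the host label to U+1F3AD, so the ASCII-only link does carry two Unicode characters.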

Here is the exact same response, but in UTF-8, because the link is an IRI:

20 text/gemini;charset=utf-8;
=> gemini://🎭/👹.gmi


Both responses represent exactly the same content. They are only encoded differently.
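
The encoding difference can be seen at the byte level. A short Python sketch, with the two link lines copied from the responses above (variable names are illustrative):

```python
ascii_line = "=> gemini://xn--el8h/%F0%9F%91%B9.gmi"
utf8_line = "=> gemini://🎭/👹.gmi"

# The URI form fits in US-ASCII: every byte is below 0x80.
ascii_bytes = ascii_line.encode("ascii")
print(all(b < 0x80 for b in ascii_bytes))

# The IRI form does not fit in US-ASCII ...
try:
    utf8_line.encode("ascii")
    fits_in_ascii = True
except UnicodeEncodeError:
    fits_in_ascii = False
print(fits_in_ascii)

# ... but it is valid UTF-8, with each emoji taking four bytes.
utf8_bytes = utf8_line.encode("utf-8")
print(len(utf8_line), len(utf8_bytes))
```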

Both convey the same Unicode characters, and therefore the same link. Both are semantically identical.

HTH.





