Some reading on IRIs and IDNs
Jason McBrayer
jmcbray at carcosa.net
Wed Dec 9 02:49:50 GMT 2020
Hi, all. The discussion on IRIs and IDNs is a little intense, and I
thought I would take a step back and do some reading on it. I'm not
monolingual, but I am ISO-8859-1-lingual, if that makes sense, so some
of the issues are new to me.
So, there's an overview of all the issues involved here:
https://www.w3.org/International/articles/idn-and-iri/. This article
(from 2008) goes over the things you need to do to implement support for
IRIs, without going too much into the technical details. It makes things
look pretty straightforward and cut-and-dried, but...
In terms of actual standardization, things are kind of a mess. See this
page: https://www.w3.org/International/wiki/IRIStatus. It lays out the
real issues with the IRI standard.
It seems like the effort to standardize IRIs in the same framework as
URLs, URNs, and URIs fell apart in 2014. The effort was picked up by the
HTML5 WHATWG, which has its own "living standard" called URL:
http://url.spec.whatwg.org/. The URL standard focuses somewhat on
parsing/processing/serializing international URLs, which is useful to
us, but it is also *extremely* WWW-centric. It doesn't really take into
account non-HTTP(S) URLs, especially ones that are not very web-like,
like mailto or schemes where the authority field is not a domain name.
Much of the spec focuses on things like how a web browser should
represent URLs in the address bar and in text.
This *probably* contributes to the lack of IRI-parsing libraries for
various languages: there's no standard for them to implement!
Given all that... maybe we should just consider our use cases and work
out the minimum we actually have to do?
As I see it, the main requirement is that authors want to be able to use
non-ASCII characters in both the domain part and the path part of the
links in their documents, and have that work with no problems. IMO this
is a *reasonable expectation* for a retrofuturistic protocol like
Gemini.
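To make that requirement concrete, here is a rough sketch of the mapping
the W3C overview describes: Punycode (IDNA) for the domain part, and
percent-encoded UTF-8 for the path part. This is only an illustration,
not a proposal for the spec. The function name iri_to_uri is made up,
the example link is hypothetical, and Python's built-in "idna" codec
implements the older IDNA 2003 rules, so a real client might want
something newer.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def iri_to_uri(iri: str) -> str:
    """Map a link with non-ASCII characters to a plain-ASCII URI.

    Hypothetical helper for illustration only; it is not taken from
    any Gemini client or server.
    """
    parts = urlsplit(iri)

    # Domain part: convert each label to its ASCII ("xn--") form.
    # Assumption: the stdlib "idna" codec (IDNA 2003) is good enough here.
    host = parts.hostname or ""
    netloc = host.encode("idna").decode("ascii") if host else ""
    if parts.port is not None:
        netloc += f":{parts.port}"
    # Userinfo is dropped on purpose: Gemini URLs do not use it.

    # Path, query, fragment: percent-encode non-ASCII characters as
    # UTF-8. "%" stays in the safe set so existing escapes are not
    # encoded a second time.
    safe = "/%:@!$&'()*+,;="
    path = quote(parts.path, safe=safe)
    query = quote(parts.query, safe=safe + "?")
    fragment = quote(parts.fragment, safe=safe + "?")

    return urlunsplit((parts.scheme, netloc, path, query, fragment))


if __name__ == "__main__":
    # Hypothetical link an author might write in a document.
    print(iri_to_uri("gemini://bücher.example/café/menü.gmi"))
```

A link that is already plain ASCII passes through unchanged, so a client
could apply this to every link without special-casing.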
Now, what does that require of client authors and server authors?
What is the *absolute minimum* we can require of them and still have
things work?
--
+-----------------------------------------------------------+
| Jason F. McBrayer jmcbray at carcosa.net |
| A flower falls, even though we love it; and a weed grows, |
| even though we do not love it. -- Dogen |