Some reading on IRIs and IDNs

Petite Abeille petite.abeille at gmail.com
Thu Dec 10 14:36:17 GMT 2020



> On Dec 10, 2020, at 15:20, Jason McBrayer <jmcbray at carcosa.net> wrote:
> 
> non-internet domain names in Active Directory, which
> we don't have to support.

Hmmm... so... no .local queries à la Cheshire?

https://tools.ietf.org/html/rfc6762#appendix-F

Perhaps worthwhile quoting in full:

Appendix F.  Use of UTF-8

   After many years of debate, as a result of the perceived need to
   accommodate certain DNS implementations that apparently couldn't
   handle any character that's not a letter, digit, or hyphen (and
   apparently never would be updated to remedy this limitation), the
   Unicast DNS community settled on an extremely baroque encoding called
   "Punycode" [RFC3492].  Punycode is a remarkably ingenious encoding
   solution, but it is complicated, hard to understand, and hard to
   implement, using sophisticated techniques including insertion unsort
   coding, generalized variable-length integers, and bias adaptation.
   The resulting encoding is remarkably compact given the constraints,
   but it's still not as good as simple straightforward UTF-8, and it's
   hard even to predict whether a given input string will encode to a
   Punycode string that fits within DNS's 63-byte limit, except by
   simply trying the encoding and seeing whether it fits.  Indeed, the
   encoded size depends not only on the input characters, but on the
   order they appear, so the same set of characters may or may not
   encode to a legal Punycode string that fits within DNS's 63-byte
   limit, depending on the order the characters appear.  This is
   extremely hard to present in a user interface that explains to users
   why one name is allowed, but another name containing the exact same
   characters is not.  Neither Punycode nor any other of the "ASCII-
   Compatible Encodings" [RFC5890] proposed for Unicast DNS may be used
   in Multicast DNS messages.  Any text being represented internally in
   some other representation must be converted to canonical precomposed
   UTF-8 before being placed in any Multicast DNS message.
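
For anyone who wants to poke at the two encodings the appendix contrasts, here is a rough Python sketch using only the standard library. The helper names (utf8_label, ace_label, fits, ace_lengths_by_order) are made up for illustration, and ace_label is a simplification: it applies NFC and the raw "punycode" codec but skips the rest of IDNA/nameprep processing, so treat it as an approximation rather than a conforming implementation.

```python
import itertools
import unicodedata

MAX_LABEL_BYTES = 63  # DNS label limit mentioned in the quoted appendix


def utf8_label(label):
    """Precomposed (NFC) UTF-8 bytes, the form the appendix asks for in mDNS."""
    return unicodedata.normalize("NFC", label).encode("utf-8")


def ace_label(label):
    """Approximate Unicast DNS form: 'xn--' plus Punycode (no nameprep mapping)."""
    normalized = unicodedata.normalize("NFC", label)
    if normalized.isascii():
        # Pure ASCII labels are left untouched.
        return normalized.encode("ascii")
    return b"xn--" + normalized.encode("punycode")


def fits(encoded):
    """True if the encoded label is within the 63-byte limit."""
    return len(encoded) <= MAX_LABEL_BYTES


def ace_lengths_by_order(chars):
    """Set of ace_label lengths over every ordering of chars (keep input short)."""
    return {len(ace_label("".join(p))) for p in itertools.permutations(chars)}


if __name__ == "__main__":
    name = "bücher"
    print(utf8_label(name), fits(utf8_label(name)))
    print(ace_label(name), fits(ace_label(name)))  # b'xn--bcher-kva' True
```

As the appendix says, the only dependable way to learn whether a name fits in its Punycode form is to encode it and measure, which is all fits(ace_label(name)) does. ace_lengths_by_order is a brute-force way to look for the order dependence the appendix describes; it is factorial in the input length, so it is only practical for a handful of characters.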



