[spec] IRIs, IDNs, and all that international jazz
Gary Johnson
lambdatronic at disroot.org
Wed Dec 23 20:18:41 GMT 2020
Although my server is written in Clojure, I'm leveraging the Java
standard libraries in Space Age since there is little value in
reinventing the wheel here.
In the Java world, URIs can be parsed and generated with java.net.URI.
This class accepts URIs with Unicode characters in the path, query, and
fragment segments. However, it throws a java.net.URISyntaxException if
Unicode characters are included in the domain name.
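A minimal Java sketch of that behaviour. The class UriUnicodeDemo and its
parses helper are hypothetical names, and the URIs are made up for
illustration. Only java.net.URI and its documented methods are real API.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class: probes which strings java.net.URI will accept.
final class UriUnicodeDemo {

    // true if the single-argument URI constructor accepts s, false if it throws.
    static boolean parses(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file ASCII-only.
        String unicodePath = "gemini://example.com/caf\u00e9?q=\u00fc#stra\u00dfe";
        // Hebrew domain name, xn--9dbne9b.com in punycode.
        String unicodeHost = "gemini://\u05e9\u05dc\u05d5\u05dd.com/";

        System.out.println(parses(unicodePath));  // true
        System.out.println(parses(unicodeHost));  // false

        // An accepted URI can still be rendered as pure ASCII (UTF-8 percent-escapes).
        System.out.println(URI.create(unicodePath).toASCIIString());
        // gemini://example.com/caf%C3%A9?q=%C3%BC#stra%C3%9Fe
    }
}
```

The host failure comes from the authority parser, which only accepts
ASCII characters there, whereas the path, query, and fragment parsers
allow non-ASCII "other" characters.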
Conversion between Unicode and punycode can be done with java.net.IDN.
```
Clojure 1.10.1
user=> (import 'java.net.IDN)
java.net.IDN
user=> (IDN/toUnicode "xn--9dbne9b.com")
"שלום.com"
user=> (IDN/toASCII "שלום.com")
"xn--9dbne9b.com"
```
Easy peasy.
Sadly, there is no java.net.IRI.
So if we went with option 2 or 3, I would need to manually parse the
Gemini request into segments (not particularly challenging, of course).
Then I could use java.net.IDN to convert punycode to Unicode or Unicode
to punycode (depending on whether we went with option 2 or 3) so that
virtual hostname lookups (and presumably SNI verification as well) are
robust.
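A minimal Java sketch of that manual split, assuming the request line is
an absolute gemini:// URL with no userinfo and no IPv6 literal host.
RequestSplitter, Parts, asciiKey, and unicodeKey are hypothetical names;
only java.net.IDN is real API. Which key function a server would use
depends on whether option 2 or option 3 is adopted.

```java
import java.net.IDN;
import java.util.Locale;

// Hypothetical helper, not part of any library: splits a Gemini request
// line by hand so that a Unicode host never reaches java.net.URI.
final class RequestSplitter {

    // The pieces of "scheme://host[:port][/path][?query][#fragment]".
    static final class Parts {
        final String scheme;
        final String host;
        final String port;  // null when the request names no port
        final String rest;  // path, query, and fragment, untouched

        Parts(String scheme, String host, String port, String rest) {
            this.scheme = scheme;
            this.host = host;
            this.port = port;
            this.rest = rest;
        }
    }

    // Assumes an absolute URL with no userinfo and no IPv6 literal host.
    static Parts split(String requestLine) {
        String line = requestLine.endsWith("\r\n")
                ? requestLine.substring(0, requestLine.length() - 2)
                : requestLine;
        int sep = line.indexOf("://");
        if (sep < 0) {
            throw new IllegalArgumentException("not an absolute URL: " + line);
        }
        int start = sep + 3;
        int end = start;
        // The authority ends at the first '/', '?', or '#' (or end of line).
        while (end < line.length() && "/?#".indexOf(line.charAt(end)) < 0) {
            end++;
        }
        String authority = line.substring(start, end);
        int colon = authority.lastIndexOf(':');
        String host = colon < 0 ? authority : authority.substring(0, colon);
        String port = colon < 0 ? null : authority.substring(colon + 1);
        return new Parts(line.substring(0, sep), host, port, line.substring(end));
    }

    // Punycode form, lower-cased, usable as a virtual-host lookup key.
    static String asciiKey(String host) {
        return IDN.toASCII(host).toLowerCase(Locale.ROOT);
    }

    // Unicode form, for a server that keys its virtual hosts that way instead.
    static String unicodeKey(String host) {
        return IDN.toUnicode(host);
    }

    public static void main(String[] args) {
        Parts p = split("gemini://\u05e9\u05dc\u05d5\u05dd.com:1965/index.gmi\r\n");
        System.out.println(p.port);            // 1965
        System.out.println(p.rest);            // /index.gmi
        System.out.println(asciiKey(p.host));  // xn--9dbne9b.com
    }
}
```

Keying on the punycode form gives one canonical key whichever form the
client sends, because IDN.toASCII passes an all-ASCII label (including
one that is already punycoded) through unchanged.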
Finally, I'd need to use java.net.URI to combine the punycoded domain
name back with the path, query, and fragment segments into a valid URI
that I could then parse and percent-decode without throwing an
exception.
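A minimal Java sketch of that last step, assuming the host has already
been split off and the path, query, and fragment are passed through
untouched, percent-escapes and all. UriRebuilder and rebuild are
hypothetical names; IDN.toASCII and URI.create are real standard-library
calls. The sketch concatenates a string and parses it rather than using
URI's multi-argument constructors, because those constructors always
quote '%' and would double-encode input that is already escaped.

```java
import java.net.IDN;
import java.net.URI;

// Hypothetical helper: joins a punycoded host with the untouched
// path/query/fragment and lets java.net.URI parse the result.
final class UriRebuilder {

    // port < 0 means "no port in the request"; rest is the raw
    // "[/path][?query][#fragment]" text from the request line.
    static URI rebuild(String scheme, String host, int port, String rest) {
        String authority = IDN.toASCII(host) + (port >= 0 ? ":" + port : "");
        // URI.create wraps any URISyntaxException in an
        // IllegalArgumentException, e.g. for a malformed path.
        return URI.create(scheme + "://" + authority + rest);
    }

    public static void main(String[] args) {
        URI uri = rebuild("gemini", "\u05e9\u05dc\u05d5\u05dd.com", 1965,
                          "/caf%C3%A9?x=%20y");
        System.out.println(uri.getHost());     // xn--9dbne9b.com
        System.out.println(uri.getPort());     // 1965
        System.out.println(uri.getRawPath());  // /caf%C3%A9
        System.out.println(uri.getQuery());    // x= y (percent-decoded)
    }
}
```

The getRawXxx accessors return the escaped text, while getPath,
getQuery, and getFragment return it percent-decoded as UTF-8.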
All of this should be doable with a bit of custom logic wrapped around
the Java standard library, so I think either option 2 or 3 should be
technically feasible from my end (or for anyone else using a language
that compiles to Java bytecode).
Happy hacking,
Gary
--
GPG Key ID: 7BC158ED
Use `gpg --search-keys lambdatronic' to find me