[spec] IRIs, IDNs, and all that international jazz
Philip Linde
linde.philip at gmail.com
Tue Dec 22 23:09:56 GMT 2020
On Tue, 22 Dec 2020 16:13:06 +0100
"Solderpunk" <solderpunk at posteo.net> wrote:
> 1. Parse and relativise URLs with non-ASCII characters (so, yes, okay,
> technically not URLs at all, you know what I mean) in paths and/or
> domains?
> 2. Transform back and forth between URIs and IRIs?
I am using Go, which will do these things as you mentioned.
Output from net/url:
gemini://räksmörgås.example.com:3131/åäöüÿ/hej/hopp?ö=ï#ççç
Scheme: gemini
Path: /åäöüÿ/hej/hopp
EscapedPath: /%C3%A5%C3%A4%C3%B6%C3%BC%C3%BF/hej/hopp
RawQuery: ö=ï
Hostname: räksmörgås.example.com
Port: 3131
RawFragment: ççç
EscapedFragment: %C3%A7%C3%A7%C3%A7
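
For anyone who wants to reproduce the output above, here is a minimal
sketch using only net/url from the standard library. The helper names
describe and resolve, and the relative reference "../smörgås", are made
up for illustration; the EscapedFragment method assumes Go 1.15 or
later.

```go
package main

import (
	"fmt"
	"net/url"
)

// describe (hypothetical helper) parses raw with net/url and returns the
// same fields as listed above, one "Name: value" pair per line.
func describe(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf(
		"Scheme: %s\nPath: %s\nEscapedPath: %s\nRawQuery: %s\n"+
			"Hostname: %s\nPort: %s\nRawFragment: %s\nEscapedFragment: %s\n",
		u.Scheme, u.Path, u.EscapedPath(), u.RawQuery,
		u.Hostname(), u.Port(), u.RawFragment, u.EscapedFragment(),
	), nil
}

// resolve (hypothetical helper) resolves the relative reference ref
// against base and returns the unescaped path of the result.
func resolve(base, ref string) (string, error) {
	b, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	r, err := url.Parse(ref)
	if err != nil {
		return "", err
	}
	return b.ResolveReference(r).Path, nil
}

func main() {
	const raw = "gemini://räksmörgås.example.com:3131/åäöüÿ/hej/hopp?ö=ï#ççç"

	out, err := describe(raw)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Print(out)

	// Relative references with non-ASCII characters resolve as well.
	p, err := resolve(raw, "../smörgås")
	if err != nil {
		fmt.Println("resolve error:", err)
		return
	}
	fmt.Println("Resolved path:", p)
}
```

Note that net/url leaves the host name untouched here; it does not
convert it to punycode.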
> 3. Do DNS lookups of IDNs without them being punycoded first? You can
> test this with räksmörgås.josefsson.org.
Go won't do this automatically, as you mentioned, but there is a
supplementary package maintained by the Go project (outside the
standard library proper), golang.org/x/net/idna, that can assist. I
think that this is the best approach; the use of IDNA is application
dependent and IMO shouldn't be done automatically at such a low level.
Note that for Python, Python 3.x will correctly resolve your example,
but Python 2.x will not. Python 3's built-in support implements only
IDNA2003, though, not IDNA2008 (see https://bugs.python.org/issue17305),
and the two are slightly incompatible. There is a third party library
that supports IDNA2008. As a last resort, client authors should be able
to link to e.g. Libidn2, license permitting.
In my case the problem with implementing IDNA is not in my application.
My client is a browser plugin. The browser (Dillo) doesn't support IDN
and development is pretty slow on their end. My plugin inherits this
limitation.
Even then, I am for option #1 personally. IDN/IRI are presentational
problems which I think should be left to the client. IDN/IRI in
text/gemini for authors can be solved with tooling, but I am not sure
that's desirable. I've attached the source code to a text/gemini
formatter that "un-internationalizes" IRIs in a text/gemini document
passed on stdin anyway. I discovered an HTTP-ism in net/url along the
way :)
--
Philip
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gmifmt.go
Type: application/octet-stream
Size: 1733 bytes
Desc: not available
URL: <https://lists.orbitalfox.eu/archives/gemini/attachments/20201223/b244c0ac/attachment.obj>