[spec] IRIs, IDNs, and all that international jazz
mbays at sdf.org
Wed Dec 23 14:00:09 GMT 2020
* Tuesday, 2020-12-22 at 16:13 +0100 - Solderpunk <solderpunk at posteo.net>:
>What I'd be most interested in hearing, at this point, is client
>authors letting me know whether the standard library in the language
>their client is implemented in can straightforwardly:
>
>1. Parse and relativise URLs with non-ASCII characters (so, yes, okay,
> technically not URLs at all, you know what I mean) in paths and/or
> domains?
>2. Transform back and forth between URIs and IRIs?
>3. Do DNS lookups of IDNs without them being punycoded first? You can
> test this with räksmörgås.josefsson.org.
I've looked into the situation in Haskell. It isn't nearly as good as
I'd expected. The standard URI library 'network-uri' is strictly
RFC 3986. There is an 'iri' library, but it isn't widely used and
doesn't seem to be very actively maintained: I can't even get it to
install with a recent GHC (ghc-8.8.4). It only deals with parsing and
rendering; afaict there's no normalisation or "absolutising", nor
anything for transforming between URIs and IRIs.
As for question 3, the answer appears to be no. In ghci:
> :set -package network
package flags have changed, resetting and loading new packages...
> import Network.Socket
> getAddrInfo (Just $ defaultHints {addrSocketType = Stream}) (Just "räksmörgås.josefsson.org") (Just "1965")
*** Exception: Network.Socket.getAddrInfo (called with preferred socket type/protocol: AddrInfo {addrFlags = [], addrFamily = AF_UNSPEC, addrSocketType = Stream, addrProtocol = 0, addrAddress = 0.0.0.0:0, addrCanonName = Nothing}, host name: Just "r\228ksm\246rg\229s.josefsson.org", service name: Just "1965"): does not exist (Name or service not known)
So library support isn't perfect. However, converting between
UTF-8-encoded IRIs and URIs seems pretty trivial to implement by hand
(Step 2 in section 3.1 of RFC 3987, and its inverse), and there are
punycode implementations in standard Haskell libraries (e.g. in the
'encoding' package), so I am not at all scared by option 3. I'd just
convert IRIs to URIs for internal use and manipulation, then convert
back when displaying, and punycode when making requests. I may well be
being naive here; if so, someone please explain the subtleties (or
tell me to read the existing threads on this more carefully)!
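
To make the "trivial to implement by hand" claim concrete, here is a
minimal sketch of the IRI-to-URI direction, using only 'base'. The names
iriToUri, utf8Bytes and pctEncode are made up for illustration. It
assumes every non-ASCII code point should be percent-encoded, whereas
the RFC restricts this to its ucschar and iprivate ranges. It also
assumes the host has already been handled separately, since a domain
needs punycode rather than percent-encoding. The inverse direction is
left out because it needs UTF-8 validation.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (intToDigit, ord, toUpper)

-- UTF-8 encode a single code point into its byte values.
utf8Bytes :: Char -> [Int]
utf8Bytes c
  | n < 0x80    = [n]
  | n < 0x800   = [0xC0 .|. shiftR n 6, cont n]
  | n < 0x10000 = [0xE0 .|. shiftR n 12, cont (shiftR n 6), cont n]
  | otherwise   = [ 0xF0 .|. shiftR n 18, cont (shiftR n 12)
                  , cont (shiftR n 6), cont n ]
  where
    n = ord c
    -- Continuation byte: 10xxxxxx carrying the low six bits.
    cont x = 0x80 .|. (x .&. 0x3F)

-- Render one byte as %XX with uppercase hex digits.
pctEncode :: Int -> String
pctEncode b = ['%', hex (shiftR b 4), hex (b .&. 0xF)]
  where hex = toUpper . intToDigit

-- Percent-encode the UTF-8 bytes of every non-ASCII character and
-- leave ASCII (including existing %XX escapes) untouched.
iriToUri :: String -> String
iriToUri = concatMap enc
  where
    enc c
      | ord c < 0x80 = [c]
      | otherwise    = concatMap pctEncode (utf8Bytes c)
```

For example, iriToUri "/räksmörgås" gives "/r%C3%A4ksm%C3%B6rg%C3%A5s".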