Some reading on IRIs and IDNs

John Cowan cowan at ccil.org
Fri Dec 11 02:45:55 GMT 2020


On Thu, Dec 10, 2020 at 8:12 PM Gary Johnson <lambdatronic at disroot.org>
wrote:

> 1. Punycode the hostname.
>

If there is one.  You can look for "//" on the left and the next "/" on the
right, so you don't need full parsing.
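
A minimal Python sketch of that shortcut, for illustration only. The function name `punycode_host` is made up here, and the code assumes the authority carries no userinfo and no IPv6 literal, and that the first "//" really does introduce the authority. Python's built-in "idna" codec implements the older IDNA 2003 rules, which may differ from what a given client wants.

```python
def punycode_host(iri: str) -> str:
    """Return the IRI with its hostname converted to ASCII (Punycode) form."""
    start = iri.find("//")
    if start == -1:
        return iri  # no authority part, so there is no hostname to convert
    start += 2
    # The authority ends at the next "/", "?" or "#", or at the end of the string.
    end = len(iri)
    for delim in "/?#":
        pos = iri.find(delim, start)
        if pos != -1:
            end = min(end, pos)
    # Assumption: no userinfo and no IPv6 literal, only an optional ":port".
    host, sep, port = iri[start:end].partition(":")
    ascii_host = host.encode("idna").decode("ascii") if host else host
    return iri[:start] + ascii_host + sep + port + iri[end:]
```

An all-ASCII hostname comes back unchanged, so the function can be applied to every link without checking first.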

> 2. Percent-encode reserved characters and non-US-ASCII characters in the
>    path, query, and fragment components.
>

You don't want to escape the ASCII reserved characters, because they should
already be escaped.  Changing the path /foo/bar.gmi to %2Ffoo%2Fbar.gmi
would be Evil and Wrong.  If you really want that path, you have to encode
it yourself.
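
As a rough illustration in Python, `urllib.parse.quote` can be told to leave the reserved characters and "%" alone, so that only non-ASCII characters (and a few ASCII characters that are never legal in a URI, such as spaces) get escaped. The name `encode_non_ascii` and the `KEEP` constant are invented for this sketch.

```python
from urllib.parse import quote

# RFC 3986 reserved characters plus "%", so that delimiters and
# already-present escapes are passed through untouched.
KEEP = ":/?#[]@!$&'()*+,;=%"

def encode_non_ascii(ref: str) -> str:
    """Percent-encode non-ASCII characters as UTF-8, leaving delimiters alone."""
    return quote(ref, safe=KEEP)

print(encode_non_ascii("/foo/bär.gmi"))  # the "/" separators stay as they are
```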

In addition, once the hostname has been punycoded, you can safely %-encode
the non-ASCII characters of the whole IRI reference without parsing it,
since Punycode names are plain ASCII and pass through untouched.

2.5. If the IRI is a relative reference, resolve it against the URI of the
text/gemini file that contains it.
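
A sketch of that resolution step in Python. One wrinkle: `urllib.parse.urljoin` only resolves relative references for schemes it knows about, and "gemini" is not among them, so this example borrows the "http" scheme for the join and swaps it back afterwards. The function name `resolve` is hypothetical, and the base is assumed to be an absolute gemini:// URI.

```python
from urllib.parse import urljoin, urlsplit

def resolve(base_uri: str, reference: str) -> str:
    """Resolve a link reference against the URI of the page containing it."""
    if urlsplit(reference).scheme:
        return reference  # already absolute, nothing to resolve
    prefix = "gemini://"
    if not base_uri.startswith(prefix):
        raise ValueError("base must be an absolute gemini:// URI")
    # Workaround: urljoin does not treat "gemini" as a relative-capable
    # scheme, so do the join under "http" and restore the scheme afterwards.
    joined = urljoin("http://" + base_uri[len(prefix):], reference)
    return prefix + joined[len("http://"):]
```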

> 3. Make a DNS query with the punycoded hostname.
>
> 4. Send the punycode + percent-encoded URI as the request to the Gemini
>    server.
>

Note that fragments must not be sent, so if there is one, chop it off.
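
For instance, a client might build the request line roughly like this in Python. This is a sketch: `build_request` is a made-up name, and the CRLF terminator and 1024-byte limit reflect my reading of the Gemini spec rather than anything stated in this thread.

```python
def build_request(uri: str) -> bytes:
    """Build a Gemini request line from a punycoded, percent-encoded URI."""
    without_fragment = uri.split("#", 1)[0]  # the fragment stays on the client
    # encode("ascii") raises UnicodeEncodeError if anything was left unencoded.
    encoded = without_fragment.encode("ascii")
    if len(encoded) > 1024:
        raise ValueError("URI too long for a Gemini request")
    return encoded + b"\r\n"
```

The client keeps the fragment itself and applies it to the response, for example to jump to a heading.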


> 5. The server parses the URI into scheme, host, port, path, query, and
>    fragment components and then percent-decodes the path, query, and
>    fragment strings.
>

Consequently, the server will not get a fragment string.  There would be no
need for fragment strings if they were understood on the server side;
they'd just be part of the path.

Whether it %-decodes or not is up to the server.  If it's serving a
conventional file system, then it needs to document whether it does such
decoding.  If it isn't, it can do whatever it wants to with the paths.
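
If a server does choose to decode, one plausible approach (sketched in Python, with the hypothetical name `decode_path_segments`) is to split the raw path on "/" first and decode each segment afterwards, so that an escaped %2F never turns into a path separator by accident.

```python
from urllib.parse import unquote

def decode_path_segments(raw_path: str) -> list[str]:
    """Split the raw path on "/" and only then percent-decode each segment."""
    return [unquote(segment) for segment in raw_path.split("/") if segment]

print(decode_path_segments("/foo/b%C3%A4r.gmi"))
```

A server mapping these segments onto a file system would still need to reject any segment that decodes to ".." or contains a "/" before touching the disk.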


> 6. The parsed and decoded URI information can then either be used to
>    perform a file retrieval, generate a directory listing, or run a CGI
>    script, ultimately sending back a valid Gemini response to the
>    client. Redirect responses should make sure to percent-encode the
>    path, query, and fragment components of the redirected URI.
>

Except not the fragment.

> Since at least one
> poster has indicated that the widespread unevenness in DNS support for
> Unicode has led to the need to store A records in their punycoded form,
>

Indeed, I don't think that any registrar using the standard DNS root will
even register non-punycoded names.  MS Active Directory DNS servers are
another story.



John Cowan          http://vrici.lojban.org/~cowan        cowan at ccil.org
This great college [Trinity], of this ancient university [Cambridge],
has seen some strange sights. It has seen Wordsworth drunk and Porson
sober. And here am I, a better poet than Porson, and a better scholar
than Wordsworth, somewhere betwixt and between.  --A.E. Housman