Some reading on IRIs and IDNs
John Cowan
cowan at ccil.org
Fri Dec 11 02:45:55 GMT 2020
On Thu, Dec 10, 2020 at 8:12 PM Gary Johnson <lambdatronic at disroot.org>
wrote:
> 1. Punycode the hostname.
>
If there is one. You can look for "//" on the left and the next "/" on the
right, so you don't need full parsing.
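A minimal Python sketch of that shortcut, for illustration only. The function name `punycode_host` is hypothetical. It also stops the host at "?" or "#" and splits off a numeric port, which goes slightly beyond the "//" and "/" heuristic above. It relies on Python's built-in "idna" codec, which implements IDNA 2003 rather than IDNA 2008, and it ignores IPv6 literals.

```python
def punycode_host(iri: str) -> str:
    """Punycode the hostname of an IRI without fully parsing it (heuristic sketch)."""
    start = iri.find("//")
    # No "//", or "//" appears only after the path/query/fragment has begun:
    # treat the reference as having no hostname at all.
    if start == -1 or any(c in iri[:start] for c in "/?#"):
        return iri
    start += 2
    # The authority ends at the next "/", "?" or "#", or at the end of the string.
    end = len(iri)
    for delim in "/?#":
        i = iri.find(delim, start)
        if i != -1 and i < end:
            end = i
    authority = iri[start:end]
    # Split off ":port" only when it is all digits; otherwise the whole thing is the host.
    host, sep, port = authority.rpartition(":")
    if not sep or not port.isdigit():
        host, sep, port = authority, "", ""
    # Python's built-in "idna" codec (IDNA 2003) produces the xn-- form.
    ascii_host = host.encode("idna").decode("ascii")
    return iri[:start] + ascii_host + sep + port + iri[end:]
```

For example, `punycode_host("gemini://bücher.example/café.gmi")` gives `"gemini://xn--bcher-kva.example/café.gmi"`. The path is left alone; encoding it is the next step.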
> 2. Percent-encode reserved characters and non-US-ASCII characters in the
> path, query, and fragment components.
>
You don't want to escape the ASCII reserved characters, because they should
already be escaped. Changing the path /foo/bar.gmi to %2Ffoo%2Fbar.gmi
would be Evil and Wrong. If you really want that path, you have to encode
it yourself.
In addition, once the hostname is Punycoded you can safely %-encode the
whole IRI reference without parsing it, since Punycode names are pure
ASCII and therefore always safe.
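A minimal Python sketch of this rule, assuming the hostname has already been Punycoded. The name `encode_non_ascii` is hypothetical. Every ASCII character, including reserved characters and any existing %XX escapes, passes through untouched; each non-ASCII character becomes the %XX escapes of its UTF-8 bytes.

```python
def encode_non_ascii(iri: str) -> str:
    """Percent-encode only non-ASCII characters; leave all ASCII as-is."""
    out = []
    for ch in iri:
        if ord(ch) < 128:
            # ASCII (including "/", "?", "%" etc.) is assumed to be correct already.
            out.append(ch)
        else:
            # Non-ASCII: emit one uppercase %XX escape per UTF-8 byte.
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)
```

Because no parsing is involved, the same function can be run over the whole reference: `/foo/bar.gmi` comes out unchanged, while `café.gmi` becomes `caf%C3%A9.gmi`.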
2.5. If the IRI is a relative reference, resolve it against the URI of the
text/gemini file that contains it.
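A minimal Python sketch using the standard library's `urllib.parse.urljoin`. One assumption to flag: `urljoin` only resolves relative references for schemes listed in `urllib.parse.uses_relative` and `uses_netloc`, and "gemini" is not among them, so the sketch appends it to those module-level lists (a global side effect). The helper name `resolve` is hypothetical.

```python
import urllib.parse

# Workaround (assumption): register "gemini" so urljoin treats it like http.
# This mutates module-level lists shared by the whole process.
for _scheme_list in (urllib.parse.uses_relative, urllib.parse.uses_netloc):
    if "gemini" not in _scheme_list:
        _scheme_list.append("gemini")


def resolve(page_url: str, reference: str) -> str:
    """Resolve a link found in a text/gemini page against that page's URL."""
    # Absolute references come back unchanged; relative ones are merged
    # with the base path and "." / ".." segments are removed.
    return urllib.parse.urljoin(page_url, reference)
```

With a page at `gemini://example.org/docs/index.gmi`, the link `../img/a.gmi` resolves to `gemini://example.org/img/a.gmi`. A hand-written RFC 3986 resolver would avoid the global registration, at the cost of more code.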
> 3. Make a DNS query with the punycoded hostname.
>
> 4. Send the punycode + percent-encoded URI as the request to the Gemini
> server.
>
Note that fragments must not be sent, so if there is one, chop it off.
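A minimal Python sketch of building the request line after the earlier steps. The name `gemini_request` is hypothetical. It assumes the Gemini request format of a single absolute URL terminated by CRLF, with a 1024-byte limit on the URL. Cutting at the first "#" is safe because an unescaped "#" can only introduce the fragment.

```python
def gemini_request(url: str) -> bytes:
    """Build the request line for an already Punycoded and %-encoded URL."""
    # Chop off the fragment: it is for the client only and is never sent.
    url = url.partition("#")[0]
    # encode("ascii") raises UnicodeEncodeError if something non-ASCII slipped
    # through, i.e. the Punycode / percent-encoding steps were skipped.
    url_bytes = url.encode("ascii")
    if len(url_bytes) > 1024:
        raise ValueError("request URL exceeds 1024 bytes")
    return url_bytes + b"\r\n"
```

So `gemini_request("gemini://example.org/a.gmi#sec")` yields `b"gemini://example.org/a.gmi\r\n"`.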
> 5. The server parses the URI into scheme, host, port, path, query, and
> fragment components and then percent-decodes the path, query, and
> fragment strings.
>
Consequently, the server will not get a fragment string. There would be no
need for fragment strings if they were understood on the server side;
they'd just be part of the path.
Whether it %-decodes or not is up to the server. If it's serving a
conventional file system, then it needs to document whether it does such
decoding. If it isn't, it can do whatever it wants with the paths.
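For a server that does choose to decode, here is a minimal Python sketch of one possible policy (an assumption, not something any server is required to do). It splits the path on "/" first and only then %-decodes each segment, so an escaped %2F stays inside its segment instead of becoming a new directory separator. The name `decoded_path_segments` is hypothetical, and this sketch drops empty segments (from "//" or a trailing "/").

```python
from urllib.parse import unquote, urlsplit


def decoded_path_segments(request_url: str) -> list[str]:
    """Split the request path on "/" and then %-decode each segment."""
    path = urlsplit(request_url).path
    # Splitting before decoding keeps "a%2Fb.gmi" as one segment named "a/b.gmi".
    return [unquote(segment) for segment in path.split("/") if segment]
```

A file-system server would then need to decide (and document) what to do with a segment that decodes to something containing "/", such as rejecting the request.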
> 6. The parsed and decoded URI information can then either be used to
> perform a file retrieval, generate a directory listing, or run a CGI
> script, ultimately sending back a valid Gemini response to the
> client. Redirect responses should make sure to percent-encode the
> path, query, and fragment components of the redirected URI.
>
Except not the fragment.
> Since at least one
> poster has indicated that the widespread unevenness in DNS support for
> Unicode has led to the need to store A records in their punycoded form,
>
Indeed, I don't think that any registrar using the standard DNS root will
even register non-punycoded names. MS Active Directory DNS servers are
another story.
John Cowan http://vrici.lojban.org/~cowan cowan at ccil.org
This great college [Trinity], of this ancient university [Cambridge],
has seen some strange sights. It has seen Wordsworth drunk and Porson
sober. And here am I, a better poet than Porson, and a better scholar
than Wordsworth, somewhere betwixt and between. --A.E. Housman