Some reading on IRIs and IDNs

Gary Johnson lambdatronic at disroot.org
Fri Dec 11 01:12:04 GMT 2020


Sean Conner <sean at conman.org> writes:

>   Okay,  Here's a IRI:
>
> 	gemini://café.mozz.us/files/𝒻𝒶𝓃𝒸𝓎.txt
>
>   Please specify what a client and server MUST do to properly handle this.

Well, if I'm following all of these conversations correctly to date, I
believe the procedure looks like this:

1. Punycode the hostname.

2. Percent-encode reserved characters and non-US-ASCII characters (the
   latter as their UTF-8 bytes) in the path, query, and fragment
   components.

3. Make a DNS query with the punycoded hostname.

4. Send the punycoded, percent-encoded URI as the request to the Gemini
   server.

5. The server parses the URI into scheme, host, port, path, query, and
   fragment components and then percent-decodes the path, query, and
   fragment strings.

6. The parsed and decoded URI information can then be used to perform a
   file retrieval, generate a directory listing, or run a CGI script,
   ultimately sending back a valid Gemini response to the client.
   Redirect responses should percent-encode the path, query, and
   fragment components of the redirect URI.
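For illustration, client-side steps 1, 2 and 4 might look roughly like
the Python sketch below. It assumes Python's built-in "idna" codec,
which implements the older IDNA 2003 rules (IDNA 2008 needs the
third-party `idna` package), and the names iri_to_uri and SAFE are
hypothetical. IPv6 literal hosts are not handled.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left untouched: RFC 3986 sub-delims plus ":", "@", "/" and "%"
# (so delimiters and existing %XX escapes survive). quote() never encodes
# ASCII letters, digits, or "_.-~".
SAFE = "!$&'()*+,;=:@/%"


def iri_to_uri(iri: str) -> str:
    """Map a gemini:// IRI to an ASCII-only URI (steps 1 and 2)."""
    parts = urlsplit(iri)
    host = parts.hostname or ""
    # Step 1: punycode each label of the host name (IDNA 2003 via the
    # built-in codec). IPv6 literals are not handled in this sketch.
    ascii_host = host.encode("idna").decode("ascii") if host else ""
    netloc = ascii_host if parts.port is None else f"{ascii_host}:{parts.port}"
    # Step 2: quote() UTF-8 encodes non-ASCII characters and percent-encodes
    # each resulting byte, along with any ASCII character not in SAFE.
    path = quote(parts.path, safe=SAFE)
    query = quote(parts.query, safe=SAFE + "?")
    fragment = quote(parts.fragment, safe=SAFE + "?")
    return urlunsplit((parts.scheme, netloc, path, query, fragment))


# Usage: the resulting URI is what goes on the wire (step 4) as "<URI>\r\n".
uri = iri_to_uri("gemini://café.mozz.us/files/𝒻𝒶𝓃𝒸𝓎.txt")
request_line = (uri + "\r\n").encode("ascii")
```

Step 3 (the DNS query) would then use the same ASCII host name, e.g.
by handing it to socket.getaddrinfo() before opening the TLS connection.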

My Gemini server (Space Age) handles steps 5 and 6 as described here (as
I suspect most Gemini servers do). Clients should already be performing
step 2 as per the Gemini spec.
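Server-side step 5 can be sketched in a similar way, assuming the request
line has already been read off the TLS connection as raw bytes. The names
parse_request and GeminiRequest are hypothetical; the 1024-byte URL limit
and the default port 1965 come from the Gemini spec. The host is
deliberately left in its punycoded form.

```python
from typing import NamedTuple, Optional
from urllib.parse import unquote, urlsplit

MAX_URL_BYTES = 1024  # Gemini spec: request URLs are at most 1024 bytes


class GeminiRequest(NamedTuple):
    scheme: str
    host: Optional[str]  # left in punycoded (A-label) form
    port: int
    path: str
    query: str
    fragment: str


def parse_request(raw: bytes) -> GeminiRequest:
    """Parse a raw request line (URL followed by CRLF) and decode its parts."""
    if not raw.endswith(b"\r\n"):
        raise ValueError("request must end with CRLF")
    url_bytes = raw[:-2]
    if len(url_bytes) > MAX_URL_BYTES:
        raise ValueError("request URL exceeds 1024 bytes")
    parts = urlsplit(url_bytes.decode("utf-8"))
    return GeminiRequest(
        scheme=parts.scheme,
        host=parts.hostname,
        port=parts.port or 1965,  # default Gemini port
        # unquote() decodes %XX sequences as UTF-8 and, unlike
        # unquote_plus(), leaves "+" alone. Decoding a whole query string
        # at once is lossy if it contains encoded delimiters such as %26.
        path=unquote(parts.path),
        query=unquote(parts.query),
        fragment=unquote(parts.fragment),
    )
```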

I suspect the missing piece of the puzzle here is *just* having client
authors implement steps 1, 3, and 4 (for some definition of "just"). I
don't think these client changes would require any changes to the
current Gemini spec.

There is also the open question of whether servers should convert
punycoded hostnames back into Unicode hostnames for the purposes of
virtual hosting (either via SNI or post-handshake). Since at least one
poster has indicated that the widespread unevenness in DNS support for
Unicode has led to the need to store A records in their punycoded form,
this suggests to me that virtual hosting is most universally supported
by simply matching the received punycoded domain names directly.
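Matching on punycoded names could be as simple as a dictionary lookup
keyed by A-labels. RFC 6066 already specifies the SNI host name as
ASCII, so IDNs should arrive in xn-- form and the Unicode fallback below
is only defensive. The VHOSTS table, its document roots, and the
function names are hypothetical; in Python, lookup_vhost could be called
from an ssl.SSLContext.sni_callback or applied to the undecoded host
from step 5.

```python
from typing import Optional

# Hypothetical virtual-host table keyed by ASCII (A-label) host names.
VHOSTS = {
    "xn--caf-dma.mozz.us": "/srv/gemini/cafe",
    "example.org": "/srv/gemini/example",
}


def to_ascii_hostname(name: str) -> str:
    """Normalize a host name to the lowercase A-label form used as a key."""
    name = name.rstrip(".").lower()
    if not name.isascii():
        # Defensive only: SNI should already carry A-labels (xn--...).
        name = name.encode("idna").decode("ascii")
    return name


def lookup_vhost(server_name: Optional[str]) -> Optional[str]:
    """Return the document root for a requested host, or None if unknown."""
    if not server_name:
        return None
    return VHOSTS.get(to_ascii_hostname(server_name))
```

Because the table holds only punycoded names, the server never needs to
convert back to Unicode to pick a virtual host.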

Of course, YMMV.

Happy hacking,
  Gary

-- 
GPG Key ID: 7BC158ED
Use `gpg --search-keys lambdatronic' to find me
Protect yourself from surveillance: https://emailselfdefense.fsf.org
=======================================================================
()  ascii ribbon campaign - against html e-mail
/\  www.asciiribbon.org   - against proprietary attachments

Why is HTML email a security nightmare? See https://useplaintext.email/

Please avoid sending me MS-Office attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html

