What is required to be IRI compliant?

William Orr will at worrbase.com
Mon Dec 28 15:56:28 GMT 2020


Yes, that's absolutely the case, even if IRIs are not used. Hopefully URI libraries and IDNA libraries handle this correctly by normalizing before percent-encoding or punycoding, but I haven't checked any implementations personally.
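
As a rough illustration of why the order matters, here is a minimal Python sketch. The helper names encode_path and encode_label are made up for this example, and encode_label is a simplification: it only applies NFC and the raw punycode codec, whereas real IDNA processing involves additional mapping and validation steps. It shows that the composed and decomposed spellings of "écrire" encode differently unless they are normalized first.

```python
import unicodedata
from urllib.parse import quote

composed = "\u00e9crire"     # "é" as the single code point U+00E9
decomposed = "e\u0301crire"  # "e" followed by combining acute accent U+0301

def encode_path(segment: str) -> str:
    """Hypothetical helper: normalize to NFC, then percent-encode the UTF-8 bytes."""
    return quote(unicodedata.normalize("NFC", segment), safe="")

def encode_label(label: str) -> str:
    """Hypothetical helper: normalize to NFC, then punycode one host label.

    Simplified on purpose; a real IDNA library does more than this.
    """
    nfc = unicodedata.normalize("NFC", label)
    if nfc.isascii():
        return nfc  # pure-ASCII labels need no punycoding
    return "xn--" + nfc.encode("punycode").decode("ascii")

# Without normalization, the two spellings produce different ASCII forms.
print(quote(composed, safe=""))    # %C3%A9crire
print(quote(decomposed, safe=""))  # e%CC%81crire

# With normalization first, both spellings agree.
print(encode_path(composed) == encode_path(decomposed))
print(encode_label(composed) == encode_label(decomposed))
```

Whether NFC is the right form to standardize on is a separate question; the point is only that client and server need to agree on one before encoding.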

Normalization also comes up in other contexts that are likely to be common in Gemini, such as "find in page" and search indexing. Those cases may even call for other normalization forms, such as NFKC.
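
For the "find in page" case, a sketch of what that might look like in Python follows. The function names search_key and find_in_page are invented for this example, and combining NFKC with case folding is just one plausible choice of matching key, not something any Gemini client is known to do.

```python
import unicodedata

def search_key(text: str) -> str:
    """Hypothetical matching key: NFKC-normalize, then case-fold."""
    return unicodedata.normalize("NFKC", text).casefold()

def find_in_page(page: str, query: str) -> bool:
    """Return True if query occurs in page once both are reduced to search keys."""
    return search_key(query) in search_key(page)

# The page uses the "fi" ligature (U+FB01) and a decomposed "é" (e + U+0301).
page = "The \ufb01le is in the cafe\u0301."

print(find_in_page(page, "file"))       # ligature is a compatibility form of "fi"
print(find_in_page(page, "CAF\u00c9"))  # composed, upper-case query
print("file" in unicodedata.normalize("NFC", page))  # NFC alone keeps the ligature
```

NFC leaves compatibility characters such as the ligature alone, which is why a search feature might prefer NFKC even though it would be too lossy for hostnames or paths.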

28 Dec. 2020 15:01:41 Solderpunk <solderpunk at posteo.net>:

> On Mon Dec 28, 2020 at 1:12 PM CET, William Orr wrote:
> 
>> Normalization is the process of looking for all of these synonyms for
>> characters, and standardizing them to the same set of codepoints. If you
>> don't normalize, you could have a case where one user gets the intended
>> host for écrire.hostname and another user gets an NXDOMAIN, all
>> depending on the sequence of bytes their input method produced.
> 
> ...and actually, now that I think about it, this issue is not specific to
> IRI support, is it?  Even if we followed the web's lead and declared
> that Gemini requests and text/gemini links must contain ASCII-only URLs,
> and people have to do punycoding of non-ASCII hostnames and
> percent-encoding of UTF-8 representations of non-ASCII paths, it's still
> possible for the server and client to have different ideas about how a
> hostname or path are represented, right?  With one using a composed form
> and the other a decomposed form?  Whether you send a UTF-8 string as-is
> or first punycode and/or percent-encode it so it's valid ASCII is
> totally orthogonal to that question.  Or have I missed something
> important?
> 
> Cheers,
> Solderpunk

