Three possible uses for IRIs

John Cowan cowan at ccil.org
Tue Dec 8 21:45:57 GMT 2020


On Tue, Dec 8, 2020 at 4:10 PM <colecmac at protonmail.com> wrote:


> The most difficult part of what you outlined is the Unicode normalization,
> which maybe not all languages have libraries for, and would also require
> updating every so often. But it wouldn't be a requirement for clients at all,
> just something nice to have.
>

If a client has an unnormalized IRI, it needs to normalize it before
sending it to the server.  That said, a 2009 study of a sample of
700 million HTML documents found that only 0.02% were not already in NFC,
which suggests that NFC text is pretty dominant.
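
As a rough sketch of that client-side step, assuming Python and its
standard unicodedata module (the function name normalize_iri and the
example.org address are made up for illustration, and a real client
might prefer to normalize each IRI component separately):

```python
import unicodedata

def normalize_iri(iri: str) -> str:
    """Return the IRI in NFC. Text that is already NFC comes back unchanged."""
    return unicodedata.normalize("NFC", iri)

# 'e' followed by U+0301 COMBINING ACUTE ACCENT (a decomposed, non-NFC form)
decomposed = "gemini://example.org/cafe\u0301"
composed = normalize_iri(decomposed)

# The composed form uses the single code point U+00E9 instead
print(composed == "gemini://example.org/caf\u00e9")

# The Unicode data version bundled with this Python build
print(unicodedata.unidata_version)
```

The normalization tables ship with the language runtime here, so the
"updating every so often" concern mostly reduces to keeping the runtime
current.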

> I assume you mean NFC normalization?
>

Yes.  When I speak of normalization, I mean NFC normalization exclusively.

> What if the user named a domain/file/folder in a non-NFC way? Now does the
> server need to support NFC as well, and apply it to vhost recognition or
> local file paths to correctly match requests? That seems wrong. But so does
> the user entering something visually identical to what the sysadmin typed,
> and things not working.
>

I'm okay with that just failing, as file names are not really part of
text/gemini content.  The admin can spot the difference by checking the
requested URIs in the server log against the %-encoded names of the
folders.
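
To illustrate why the mismatch shows up once names are %-encoded, here
is a small Python sketch.  The folder name is an invented example, and
it assumes the names are encoded as UTF-8 before %-encoding:

```python
import unicodedata
from urllib.parse import quote

# Two spellings of the same visible name (hypothetical folder name)
name_nfc = unicodedata.normalize("NFC", "caf\u00e9")   # precomposed e-acute
name_nfd = unicodedata.normalize("NFD", "caf\u00e9")   # 'e' + combining accent

# They look identical on screen but are different code point sequences
print(name_nfc == name_nfd)   # False

# %-encoding the UTF-8 bytes makes the difference visible
print(quote(name_nfc))        # caf%C3%A9
print(quote(name_nfd))        # cafe%CC%81
```

A request logged as caf%C3%A9 against a folder whose encoded name is
cafe%CC%81 would show the non-NFC name at a glance.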



John Cowan          http://vrici.lojban.org/~cowan        cowan at ccil.org
                I am a member of a civilization. --David Brin