IDN with Gemini?
bie
bie at 202x.moe
Mon Dec 7 13:09:39 GMT 2020
> > And it is even more important for people who use scripts like Arabic,
> > Chinese, Devanagari, etc.
>
> I am yet to be convinced that unicode URLs are a good thing.
>
> And I say that as a native speaker of a language which
> includes glyphs which aren't in US ASCII.
>
> An URL is an address, in the same way that a phone
> number or an IP is an address. Ideally these are globally
> unique, unambiguous and representable everywhere.
> This address scheme should be independent of a localisation.
>
> We don't insist that phone numbers are rendered in roman
> numerals either. My dialing prefix isn't +XXVII. The
> gemini:// prefix isn't tweeling:// in Dutch.
>
> Using unicode in addresses balkanises this global space into
> separate little domains, with subtle ambiguities (is the
> Cyrillic C the same as a Latin C, who knows?), reducing
> security, and making crossover harder. If somebody points
> me at a URL in kanji or Ethiopian, I would have great
> difficulty remembering, never mind recreating, it, even if the
> photo there is useful to the rest of the world. If you
> are saying what about the guy from Ethiopia - well, I suspect he
> would have trouble with kanji too... without a common
> denominator this is an N^2 problem.
>
> I appreciate that many languages are in decline and even
> facing extinction - but interacting with the internet requires
> a jargon or specialisation anyway, in the same way that botanists
> invoke latin names, mathematicians write about eigenvectors
> and brain surgeons talk about the hippocampus, all regardless
> of which languages they speak at home.
>
> TLDR: The words after the gemini => link can be unicode, the
> link itself should not.
I mostly agree with this in the sense that the protocol and text/gemini
should stick to URLs that are URI-safe (nothing outside the safe
80-something characters).

That said, I don't think there's anything wrong with a friendly client
showing percent-decoded unicode representations of a path or
punycode-decoded representations of an international domain name in the
address bar or anywhere else in the interface.
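
To make the client side concrete, here is a minimal Python sketch of what
such a display-only decoding could look like. The names display_url and
decode_host are made up for this example and do not come from any existing
Gemini client. It assumes the standard library's "idna" codec (which follows
the older IDNA 2003 rules) is good enough for display, and that the decoded
string is only ever shown to the user, never sent back over the wire.

```python
from urllib.parse import urlsplit, unquote


def decode_host(host: str) -> str:
    """Turn a punycode (xn--) hostname into unicode for display only."""
    try:
        return host.encode("ascii").decode("idna")
    except UnicodeError:
        # Not valid ASCII/punycode: show the hostname unchanged.
        return host


def display_url(url: str) -> str:
    """Build a human-friendly form of an ASCII-only URL (hypothetical helper)."""
    parts = urlsplit(url)
    host = decode_host(parts.hostname or "")
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    # Percent-decode the path as UTF-8; invalid sequences become U+FFFD.
    shown = f"{parts.scheme}://{host}{unquote(parts.path)}"
    if parts.query:
        shown += "?" + unquote(parts.query)
    return shown
```

The request itself would still use the original ASCII URL. Decoding is lossy
(an escaped "/" becomes indistinguishable from a real one), so the decoded
form should stay a cosmetic layer in the interface.
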
In the same vein, if a server wants to be extra friendly to gmi file
authors, it can, as I suggested earlier, allow users to name and link
to files in unicode, but percent-encode everything before sending it
over the wire. I actually implemented this in my personal gemini server
today, and it was a trivial change (especially when compared to what I'd
have to do to properly validate IRIs...), allowing me to write "=> 雑念/
雑念" and have it sent to the client as "=> %e9%9b%91%e5%bf%b5/ 雑念".
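
For anyone curious what a change like that might involve, here is a rough
Python sketch. It is not the code from my server; encode_link_line and
encode_gemtext are hypothetical names for this example. It assumes link
targets are relative paths or already-ASCII hostnames (a unicode hostname
would need punycode rather than percent-encoding), and it leaves existing
%xx escapes and preformatted blocks untouched.

```python
from urllib.parse import quote

# RFC 3986 reserved characters plus "%", so existing escapes are not re-encoded.
SAFE = "/:?#[]@!$&'()*+,;=%"


def encode_link_line(line: str) -> str:
    """Percent-encode the target of a gemtext link line, keeping the label."""
    if not line.startswith("=>"):
        return line
    fields = line[2:].split(maxsplit=1)
    if not fields:
        return line  # "=>" with no target: pass through unchanged
    target = quote(fields[0], safe=SAFE)  # non-ASCII becomes UTF-8 %XX bytes
    label = f" {fields[1]}" if len(fields) > 1 else ""
    return f"=> {target}{label}"


def encode_gemtext(text: str) -> str:
    """Apply encode_link_line to every line outside preformatted blocks."""
    out = []
    preformatted = False
    for line in text.split("\n"):
        if line.startswith("```"):
            preformatted = not preformatted
        elif not preformatted:
            line = encode_link_line(line)
        out.append(line)
    return "\n".join(out)
```

Python's quote emits uppercase hex digits, so "=> 雑念/ 雑念" comes out as
"=> %E9%9B%91%E5%BF%B5/ 雑念"; RFC 3986 treats that as equivalent to the
lowercase form.
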
bie