[spec] IRIs, IDNs, and all that international jazz

Sean Conner sean at conman.org
Wed Dec 23 23:03:19 GMT 2020


It was thus said that the Great marc once stated:
> 
> It is one thing to find full i18n support in a language such as python
> (slow batteries included), but what about minorities such as tcl, lua, m4
> or sed?

  I have Lua covered.  I can't say for the others (other than, you really
use m4?  You are a better man than I am, Gunga Din).

> I think internationalisation concerns belong in the very highest level of
> a stack. You expect me to say presentation or application-level, but
> remember the OSI model is wrong (For instance, things like HTTP or gemini
> are typically lumped into one application layer, when there are many
> layers to them). The actual highest level is the naive computer user who
> gets told to "move the mouse over this and then click on this, like
> so...". At that level, it might make sense for a gemini browser to be
> fully localised, and render a url in the local language (maybe even left
> to right, or top to bottom).

  I have to deal with the telephony network at work.  It *is* the OSI seven
layer burrito [1] and even *there* there are baked in assumptions relating
to i18n [2].  Text is limited to ASCII.  Yup.  7-bit US-ASCII in all its
glory.  Anything else requires some very nasty hacks.  Even better, there
does exist a way to relate a name to a phone number, but it's restricted to
just 15 bytes of US-ASCII.  So "Rafaella Gabriela Sarsaparilla" gets cut to
"Rafaella Gabrie".  Lovely, isn't it?
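
  As a rough illustration of that 15-byte limit, here is a minimal Python
sketch.  The function name caller_name_field and the choice to replace
non-ASCII characters with '?' are assumptions made for the example, not how
any real telephony equipment is known to behave.

```python
def caller_name_field(name: str, limit: int = 15) -> bytes:
    """Fit a display name into a fixed-size US-ASCII field (hypothetical helper)."""
    # Assumption: characters outside 7-bit US-ASCII are replaced with '?'.
    ascii_bytes = name.encode("ascii", errors="replace")
    # Anything past the limit is simply dropped.
    return ascii_bytes[:limit]


print(caller_name_field("Rafaella Gabriela Sarsaparilla"))  # b'Rafaella Gabrie'
```

  The truncation counts bytes, not words, so the cut lands wherever the
fifteenth byte happens to fall.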

> But even the layer just below that (the competent user level) this starts
> leaking. A gemini url starts with "gemini://" - that is ascii text, and
> even funnier, taken from latin. If a non-english user is confused by
> english (nay, latin, with no native speakers at all) words, then surely
> "gemini://" has to be rewritten as "tweling://" or "zwilling://" or
> whatever farsi, japanese or mongolian use for "twin". If not, then a full
> ascii text url should be manageable too... an url is primarily a computer
> address.

  Sushi comes from Japanese, gesundheit from German, sauna from Finnish,
smorgasbord from Swedish, borscht from Russian and ketchup from Chinese,
what's your point?  All those are perfectly cromulent (from The Simpsons)
words.  Modern English sucks up words from all other languages.

  Also, what's the Japanese equivalent of 'https'?  I'm curious.

> Long ago I came across a version of (I think it was) Pascal
> that had been localised into french with language keywords
> like "begin" and "if" replaced.

  It wasn't Håstad [3], was it?  If it was, I made that up to make a point
about LISP.

  But yes, there have been several such localizations in the past for
various languages but they never caught on internationally for some reason. 
One language I heard about, Cornerstone, used a novel method for
identifiers---the visual representation was not part of the code but from a
map---change a variable name in one place, and every place that variable
appeared would change its name.  Pretty cool concept if you ask me.
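
  A minimal Python sketch of that idea, assuming nothing about how
Cornerstone actually implemented it: the program stores stable numeric ids,
and the names live in a separate map used only for display.  The names
`names`, `program` and `render` are hypothetical.

```python
# Display names are kept apart from the code, keyed by a stable id.
names = {1: "total", 2: "price"}

# The "code" only ever refers to ids, never to the names themselves.
program = [("add", 1, 2), ("print", 1)]


def render(program, names):
    """Produce the visual form of the program by looking up each id."""
    return [" ".join([op] + [names[i] for i in ids]) for op, *ids in program]


print(render(program, names))  # ['add total price', 'print total']

# Renaming is one change to the map; every use follows automatically.
names[1] = "sum"
print(render(program, names))  # ['add sum price', 'print sum']
```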

  -spc

[1]	And a complete pain to work with.  Fortunately, it's becoming less
	and less of an issue as things are transitioning to the Internet,
	but the phone companies are fighting and screaming all the way.

[2]	Ïñtèrñàtìòñálízâtîøñ

[3]	http://boston.conman.org/2008/01/04.1

