Proposed minor spec changes, for comment.

Sean Conner sean at conman.org
Tue May 19 01:07:53 BST 2020


It was thus said that the Great jan6 at tilde.ninja once stated:
> On Mon, May 18, 2020 at 05:03:41PM -0400, Sean Conner wrote:
> > What's a client to do if 'lang=' isn't there? Assume English? Assume nothing?
> 
> I'd think only the mimetype should be mandatory, and the rest will use defaults, when not
> specified...
> of course, spec shouldn't specify what the defaults are...
> 
> it could also attempt to auto-detect and prompt user if it matters (normal text browsers will
> probably be indifferent, but audio browser could ask, and search engines could warn, which will
> incentivize users to put a language anyway), but that's a client-specific extra...

  I thought about autodetection---Unicode is organized into blocks, and
most scripts get a block (or several) of their own.  I then realized that
many languages share the Latin blocks.  Sure, detecting Greek is easy since
it has its own alphabet, but what about Spanish, French and German?  They
use the same alphabet.

  Nice idea, but there are some tough issues to address.
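
  A rough sketch of the problem, in Python.  The function name guess_script
is made up for illustration, and using the first word of each character's
Unicode name as a stand-in for its script is a simplifying assumption, not
a real language detector:

```python
import unicodedata
from collections import Counter


def guess_script(text):
    """Return the most common script name among the letters in text.

    Hypothetical helper: it relies on Unicode character names starting
    with the script, e.g. "GREEK SMALL LETTER ALPHA".
    """
    counts = Counter()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                counts[name.split()[0]] += 1
    # No letters at all means there is nothing to guess from.
    return counts.most_common(1)[0][0] if counts else None


samples = {
    "Greek": "Καλημέρα κόσμε",
    "Spanish": "¿Dónde está la biblioteca?",
    "French": "Où est la bibliothèque ?",
    "German": "Wo ist die Bibliothek?",
}
for language, text in samples.items():
    print(language, guess_script(text))
```

  Greek stands out, but Spanish, French and German all come back as the
same script, so telling them apart would need something beyond the code
points themselves (word lists or letter statistics, say).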

> I'm not sure I see the point in the encoding part, though...
> practically everything can be converted to utf8 rather easily, making it a bit useless to
> specify...

  Think legacy documents.  And not every legacy encoding scheme can
round-trip through Unicode---I recall there being issues with several East
Asian languages (Chinese and Japanese in particular).
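
  A small Python sketch of why a declared encoding still matters for a
legacy document.  The helper name to_utf8 is hypothetical, and this only
shows the simple case where conversion works once the charset is known; it
does not reproduce the round-trip problems mentioned above:

```python
def to_utf8(raw, charset="utf-8"):
    """Decode raw bytes with the declared charset, re-encode as UTF-8."""
    return raw.decode(charset).encode("utf-8")


legacy = b"caf\xe9"  # "café" as stored by an ISO-8859-1 editor

# With the right charset the conversion is trivial.
print(to_utf8(legacy, "iso-8859-1").decode("utf-8"))

# Without it, the same bytes are not valid UTF-8 at all.
try:
    to_utf8(legacy)
except UnicodeDecodeError as err:
    print("cannot decode as UTF-8:", err.reason)

# And a wrong guess decodes silently to different text.
print(legacy.decode("cp437") == legacy.decode("iso-8859-1"))
```

  The conversion itself is easy, but only once the client knows which
encoding the bytes are in, which is the job of the charset parameter.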

> another interesting point, what specification is the lang= tag?

  Solderpunk mentioned RFC-1766, which uses the two-letter ISO 639 codes
for languages.

> it should probably be encouraged to use some special-use codes too, taking ISO 639-2 as an example
> (the standard specifying three-letter codes for languages):
> mis, for "uncoded languages";
> mul, for "multiple languages";
> und, for "undetermined";
> zxx, for "no linguistic content; not applicable";

  I buy that.
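
  For what it's worth, here is a minimal Python sketch of how a client
might read these parameters.  The names parse_meta and describe_lang are
invented for illustration, the "text/gemini; lang=...; charset=..." layout
is assumed from this thread rather than taken from the spec, and a missing
parameter is left as None so the client can pick its own default:

```python
# Special-use codes from ISO 639-2, as listed above.
SPECIAL_CODES = {
    "mis": "uncoded languages",
    "mul": "multiple languages",
    "und": "undetermined",
    "zxx": "no linguistic content; not applicable",
}


def parse_meta(meta):
    """Split a MIME type string into its type and lang/charset parameters."""
    mimetype, *params = [part.strip() for part in meta.split(";")]
    result = {"mimetype": mimetype.lower(), "lang": None, "charset": None}
    for param in params:
        key, sep, value = param.partition("=")
        key = key.strip().lower()
        # Ignore malformed or unknown parameters rather than failing.
        if sep and key in ("lang", "charset"):
            result[key] = value.strip().strip('"').lower()
    return result


def describe_lang(lang):
    """Give a readable description for a lang value, or note its absence."""
    if lang is None:
        return "not specified"
    return SPECIAL_CODES.get(lang, lang)


for meta in ("text/gemini", "text/gemini; lang=fr", "text/gemini; lang=zxx"):
    info = parse_meta(meta)
    print(info["mimetype"], "->", describe_lang(info["lang"]))
```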

  -spc

