Text reflow woes (or: I want bullets back!)

solderpunk solderpunk at SDF.ORG
Sat Jan 18 12:59:16 GMT 2020


On Sat, Jan 18, 2020 at 12:02:42AM -0500, Michael Lazar wrote:
 
> Python's textwrap module is fundamentally flawed for unicode and they have no
> intention of ever fixing it [0]. Once you start going down the rabbit hole of
> CJK characters, emojis, grapheme clusters, etc. it becomes exceedingly hard
> to figure out how to correctly determine the width of unicode text. You can
> get it working 99% of the time, but there's always those fringe cases that
> no one thinks about until somebody files a bug report.

...

God, I hate computers.

But, many thanks for bringing this to my attention.
 
> I don't know if this has any bearing on the discussion, but it's worth keeping
> in the back of your mind if you intend to make unicode a first-class citizen.

Unicode is already a first-class citizen in Gemini (text/gemini is
assumed to be UTF-8 if a different encoding is not explicitly provided
in the response header), and I don't think I have any interest in
changing that.
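
To make the default concrete, here is a minimal sketch of how a client
might pick the encoding from the MIME type in a response header. The
function name charset_from_meta and the sample header strings are
hypothetical, used for illustration only; the one thing taken from the
text above is that UTF-8 applies when no charset is given.

```python
def charset_from_meta(meta: str, default: str = "utf-8") -> str:
    """Return the charset named in a MIME-type string, or the default.

    Hypothetical helper: falls back to UTF-8 when no charset parameter
    is present, as described for text/gemini above.
    """
    # Split "text/gemini; charset=iso-8859-1" into the type and its parameters.
    _mime_type, *params = meta.split(";")
    for param in params:
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset" and value.strip():
            # Drop optional quotes and normalise case for codec lookup.
            return value.strip().strip('"').lower()
    return default


print(charset_from_meta("text/gemini"))
print(charset_from_meta("text/gemini; charset=ISO-8859-1"))
```

A client would then decode the response body with the returned name,
e.g. body.decode(charset_from_meta(meta)).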

As for the present discussion...well, it's obvious that this problem is
no less of a problem under paragraph-oriented "bidirectional" reflowing.
It's not obvious to me whether it's less of a problem under a
Gopher-style model of hard-wrapping to a pre-defined maximum width...I
suppose if the width of a line including CJK characters depends on the
combination of font and terminal being used (I don't know if it does,
but it seems probable) then it's not actually possible for a CJK-using
author to comply with a spec like "Hard-wrap all your content at X
characters"...
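
A rough sketch of why "X characters" and "X columns" diverge for CJK
text. The display_width helper is hypothetical and relies on an
assumption: that characters whose East Asian Width is "W" or "F" occupy
two terminal columns and combining marks occupy none. Real terminals and
fonts may disagree, and the heuristic ignores emoji sequences and
ambiguous-width characters entirely.

```python
import textwrap
import unicodedata


def display_width(text: str) -> int:
    """Approximate the number of terminal columns `text` occupies."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks stack on the previous character
        # Assumption: wide ("W") and fullwidth ("F") characters take two columns.
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


line = "日本語のテキスト" * 5  # 40 code points, no spaces
wrapped = textwrap.wrap(line, width=40)

# textwrap measures len(), i.e. code points, so it sees a line that "fits".
print(len(wrapped), len(wrapped[0]), display_width(wrapped[0]))
```

Under the two-column assumption, the single "40-character" line that
textwrap produces would be about twice as wide as the requested limit
once it is on screen.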

Hmm...

Solderpunk

