Supporting optional underscores for italics
Sean Conner
sean at conman.org
Fri Nov 13 02:02:30 GMT 2020
Oooh! A bike shedding thread! I know Drew DeVault might complain, but
hey, this list wouldn't be this list unless the majority of messages were
about text formatting (seriously---over half the messages are not about the
protocol at all, but about text formatting).
So with my introduction out of the way, let me nitpick [1] this proposal
with a bunch of corner cases I can already see ...
It was thus said that the Great John Cowan once stated:
> Given that we are *not* going to change the definition of text/gemini, but
                    ^^^^^ shouldn't that be _not_? Or are you going
                          for strong emphasis here?
> 1. If an underscore appears outside an emphasized text section, and is at
> the beginning of a text line or after the # characters in a header line, or
> is preceded by a whitespace character, then it marks the beginning of an
> emphasized-text section
Pattern-wise, it's like:
	(start_of_line ?('#'*) | whitespace) '_'
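A rough Python rendering of that pattern, as a sketch only. OPEN_RE and opening_positions are made-up names, "whitespace" is taken to be whatever Python's \s matches, and the check ignores whether we are already inside an emphasized section:

```python
import re

# Assumed reading of rule 1: an opening underscore sits at the start of a
# line, directly after the run of '#' characters on a header line, or
# directly after a whitespace character.
OPEN_RE = re.compile(r"(?:^#*|(?<=\s))_")

def opening_positions(line):
    """Return the index of every underscore that could open emphasis."""
    return [m.end() - 1 for m in OPEN_RE.finditer(line)]

print(opening_positions("He asked about _blandit_?"))
print(opening_positions("snake_case_variables"))
```

The closing underscore in the first line is preceded by a letter, so it is not reported as an opener, and neither is anything in the snake_case line.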
> (rendered as italics or in some other way).
Such as bold, a larger font, a smaller font, or some other way other than
using italics. Okay, got it.
> 2. If an underscore appears inside an emphasized text section, and is at
> the end of a line, or is followed by whitespace, sentence-terminating
> punctuation, or parenthesis- or quotation-terminating punctuation, then it
> marks the end of the emphasized-text section.
I've found that terminating italic sections before sentence-terminating
punctuation can lead to very ugly output. For example:
	He asked about _blandit_?
Here, the italic t will run into the trailing question mark, which I feel
looks terrible. That's why I tend to include sentence-terminating marks
within the italic section:
	He asked about _blandit?_
This is of less concern with periods and commas, since there isn't much of a
stylistic difference between a normal period and an italic period.
It also sounds like you are expecting users to write stuff like
	_lorem ipsum dolor sit amet_
or
	_lorem_ipsum_dolor_sit_amet_
Otherwise, why not just say that once in an emphasized-text section, the
next underscore ends it? That is much easier to parse, and a bit easier to
handle when wrapping text (although I suppose one can add '_' to the list of
places to break, along with whitespace and hyphens).
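To make the difference concrete, here is a small sketch that scans one line under either closing rule. Everything in it is an assumption on my part: spans and CLOSERS are made-up names, and CLOSERS is a tiny ASCII stand-in for the attached character list, which is not reproduced here.

```python
# Hypothetical stand-in for the proposal's list of terminating punctuation.
CLOSERS = set(".,;:!?)]}\"'")

def spans(line, simple_close=False):
    """Return the emphasized substrings found in a single line.

    simple_close=False follows rule 2 as quoted (the closing underscore must
    be followed by end of line, whitespace, or a character in CLOSERS).
    simple_close=True is the alternative: the next underscore always closes.
    """
    out = []
    start = None  # index just past the opening underscore, when inside emphasis
    for i, ch in enumerate(line):
        if ch != "_":
            continue
        nxt = line[i + 1] if i + 1 < len(line) else ""
        if start is None:
            # Rule 1: start of line, after the header '#'s, or after whitespace.
            if line[:i].strip("#") == "" or line[i - 1].isspace():
                start = i + 1
        elif simple_close or nxt == "" or nxt.isspace() or nxt in CLOSERS:
            out.append(line[start:i])
            start = None
    return out

print(spans("a _snake_case_ b"))
print(spans("a _snake_case_ b", simple_close=True))
```

Under rule 2 the inner underscore is skipped and the whole of snake_case is emphasized; under the simpler rule the section ends at the first inner underscore. Unclosed sections are simply dropped here, since this sketch does not cover rule 3.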
> These rules exclude underscores in things like snake_case_variables, while
> supporting most actual uses.
>
> 3. An emphasized-text section ends unconditionally at the end of a line.
Odd, but I can see why you say so, given the nature of parsing gemtext.
But someone unaware of that might end up writing:
	blah blah _lorem ipsum dolor
	sit amet_ blah blah blah
and wonder why the italicised text is all wrong.
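A sketch of what a line-at-a-time parser would make of that. It assumes an unclosed section is rendered as emphasized up to the end of its line (one reading of rule 3), and it uses the simpler "next underscore closes" rule to stay short. The name emphasized is made up for the example.

```python
def emphasized(lines):
    """Yield, for each line, the list of emphasized substrings in it."""
    for line in lines:
        found = []
        start = None  # parser state is reset for every line
        for i, ch in enumerate(line):
            if ch != "_":
                continue
            if start is None:
                # Open only at the start of the line or after whitespace.
                if i == 0 or line[i - 1].isspace():
                    start = i + 1
            else:
                found.append(line[start:i])
                start = None
        if start is not None:
            # Assumed rule 3 behaviour: the line ended, so the emphasis does too.
            found.append(line[start:])
        yield found

text = ["blah blah _lorem ipsum dolor", "sit amet_ blah blah blah"]
for line, found in zip(text, emphasized(text)):
    print(repr(line), "->", found)
```

The first line is emphasized from "lorem" to the end, and the second line gets nothing, because its underscore follows a letter and so never opens a section.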
> The attached file specifies all the Unicode whitespace and terminating
> punctuation, from the Unicode Character Database. There are quite a few,
> but you don't even need a regular expression, just a list of the characters.
All 352 of the characters.
For now.
That might be updated at the next Unicode revision.
Got it.
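For a feel of how version-dependent such a list is, here is a sketch that builds a comparable (not identical) set from whatever Unicode data ships with the running Python. It is an approximation: Python's unicodedata module gives general categories but not the Terminal_Punctuation property, so this only collects whitespace plus closing (Pe) and final-quote (Pf) punctuation, and the count will not match the attached file. closing_context_chars is a made-up name.

```python
import sys
import unicodedata

def closing_context_chars():
    """Collect whitespace plus closing/final-quote punctuation code points."""
    chars = set()
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if ch.isspace() or unicodedata.category(ch) in ("Pe", "Pf"):
            chars.add(ch)
    return chars

CHARS = closing_context_chars()
# Both values depend on the Unicode version bundled with the interpreter.
print(unicodedata.unidata_version, len(CHARS))
```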
> I hope this is helpful and/or inspirational.
-spc (Unicode is hard! Let's do rocketry!)
[1] Can we still say that term?