Color and other escape sequences in Gemini
Sean Conner
sean at conman.org
Sun Jan 19 00:10:19 GMT 2020
I'm seeing some people wanting to support some other formatting options
like color, leading to the support of ECMA-48 [1]. I would rather NOT see
this supported because of several issues.
1). Complexity. The spec is rather complex and wide ranging. There are 130
escape codes (and I'm EXCLUDING those defined by ANSI, codes 0 through
31). If you are filtering them, then five or six cases (depending on
whether you want to support one non-standard encoding used by xterm) need
to be considered (which is still four or five more than most would expect).
2). Security. Passing raw escape sequences through can not only leave the
terminal in an unknown state, but there are a few sequences that are
especially alarming:
    DCS  Device Control String
    OSC  Operating System Command
    APC  Application Program Command
which, if supported, are exactly what they say on the tin, and thus, you
*don't* want these to be processed *at all*. Personally, I haven't come
across any terminal or program that supports these.
Also, ANSI.SYS, originally for MS-DOS but maybe it still exists for
Windows? (I don't know, I don't use Windows) allows one to redefine any
key on the keyboard to send a new sequence. "DELTREE C:\*" anyone?
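As a rough illustration of the security point, here is a small Python sketch that flags the introducers of those three sequence types in already-decoded text. The function name and the two tables are my own and hypothetical; the code values are the ones that appear in the grammar later in this message (ESC 'P', ESC ']', ESC '_' and their single-character C1 forms 144, 157 and 159). It only spots introducers, it does not parse or remove anything.

```python
# Hypothetical helper, not from any Gemini client: report which of the
# alarming sequence types (DCS, OSC, APC) are introduced in a string.
# Assumes `text` is decoded text (str), so C1 controls are single characters.

ESC = "\x1b"

# Two-character forms: ESC followed by one of these characters.
ESC_FORMS = {"P": "DCS", "]": "OSC", "_": "APC"}

# Single-character C1 forms (decimal 144, 157, 159).
C1_FORMS = {"\x90": "DCS", "\x9d": "OSC", "\x9f": "APC"}


def find_dangerous(text):
    """Return the names of DCS/OSC/APC introducers in order of appearance."""
    found = []
    for i, ch in enumerate(text):
        if ch == ESC and i + 1 < len(text) and text[i + 1] in ESC_FORMS:
            found.append(ESC_FORMS[text[i + 1]])
        elif ch in C1_FORMS:
            found.append(C1_FORMS[ch])
    return found
```

A client could use something like this to refuse to display a line at all, rather than trying to clean it up.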
Below is the ABNF (RFC-5234) for ECMA-48 control sequences:
    CSI      = %d27 '['
             / %d155                                   ; ISO charset
             / %d194 %d155                             ; UTF-8

    OSC      = %d27 ']'
             / %d157                                   ; ISO
             / %d194 %d157                             ; UTF-8

    ST       = %d27 '\'
             / %d156                                   ; ISO
             / %d194 %d156                             ; UTF-8

    string   = %d27 ( 'P' / 'X' / '^' / '_')
             / (%d144 / %d152 / %d158 / %d159)         ; ISO
             / %d194 (%d144 / %d152 / %d158 / %d159)   ; UTF-8

    iso      = %d160-255
             ; the utf8 rule is any UTF-8 codepoint 160 or higher

    sequence = CSI *(%d48-63) *(%d32-47) %d64-126
             / OSC *(%d8-13 / %d32-126 / iso / utf8) (ST / %d7)
             / string *(%d8-13 / %d32-126 / iso / utf8) ST
             / %d27 (%d64-126)
             / %d128-159                               ; ISO
             / %d194 (%d128-159)                       ; UTF-8
If you are parsing an ISO charset (like ISO-8859-1) then remove the rules
marked UTF-8; similarly, if you are parsing UTF-8, then remove the rules
marked ISO.
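For anyone who wants to filter rather than support these, here is a minimal Python sketch of the grammar above as a regular expression. It makes one assumption that is mine, not part of the grammar: it works on text that has already been decoded (a Python str), so the rules marked ISO and UTF-8 collapse into the same thing, a single character in the range 128 through 159. The names SEQUENCE and strip_sequences are hypothetical, and this is not a tested, hardened filter.

```python
import re

# Sketch of the `sequence` rule, assuming decoded text (str).
# In decoded text the bytes 194 155 (UTF-8) and the byte 155 (ISO)
# both appear as the single character U+009B, and so on.
CSI = r'(?:\x1b\[|\x9b)'
OSC = r'(?:\x1b\]|\x9d)'
ST = r'(?:\x1b\\|\x9c)'
STRING = r'(?:\x1b[PX^_]|[\x90\x98\x9e\x9f])'

# Body of OSC and string sequences: 8-13, 32-126, and anything 160 or higher.
BODY = r'[\x08-\x0d\x20-\x7e\xa0-\U0010ffff]*'

SEQUENCE = re.compile(
    CSI + r'[\x30-\x3f]*[\x20-\x2f]*[\x40-\x7e]'   # CSI params, intermediates, final
    + '|' + OSC + BODY + '(?:' + ST + r'|\x07)'    # OSC ... (ST or BEL)
    + '|' + STRING + BODY + ST                     # DCS/SOS/PM/APC ... ST
    + '|' + r'\x1b[\x40-\x7e]'                     # two-character ESC sequence
    + '|' + r'[\x80-\x9f]'                         # lone C1 control
)


def strip_sequences(text):
    """Remove everything matching the grammar, leaving the rest untouched."""
    return SEQUENCE.sub('', text)
```

For example, strip_sequences("\x1b[31mred\x1b[0m") gives back "red". The alternatives are tried in the order they appear in the grammar, so an unterminated OSC or string falls through to the two-character ESC case and only its introducer is removed.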
-spc
[1] The so called 'ANSI escape codes', which is funny because they're
not defined by ANSI, but ISO, so of course they're called ECMA.