Ambiguity in spec regarding line endings
prisonpotato at tilde.team
Thu Jun 4 17:23:26 BST 2020
I disagree with this idea, as it adds a significant burden to both
server implementations and client implementations running on unix
systems.
On Thu, Jun 04, 2020 at 12:08:44PM -0400, Ryan Kavanagh wrote:
> I'm reading the current version of the spec, and have come across the
> following ambiguous paragraph in §3.3:
>
>     When in canonical form, media subtypes of the "text" type use CRLF
>     as the text line break. Gemini relaxes this requirement and allows
>     the transport of text media with plain LF alone (but NOT a plain CR
>     alone) representing a line break when it is done consistently for an
>     entire response body. Gemini clients MUST accept CRLF and bare LF
>     as being representative of a line break in text media received via
>     HTTP.
>
> How do the second and third sentences interact? In particular, how does
>
>     [...] when it is done consistently for an entire response body.
>
> interact with
>
>     Gemini clients MUST accept CRLF and bare LF as being representative
>     of a line break in text media received via HTTP.
>
> How should Gemini clients behave when both CRLF and LF appear in the
> same text/gemini transmission? Are both to be equivalently treated as
> line breaks?
>
> I've looked through the archives to see what has been said in the past
> about line breaks, and the two following messages appear most relevant:
>
> On Sat, Sep 07, 2019 at 04:30:14PM -0400, Jason McBrayer wrote:
> > IMO, it makes sense to require CRLF in the plain text parts of the
> > protocol (after requests, after the status line of a response), but I
> > don't think that the text/gemini file format needs to have CR/LF; IMO
> > clients should be prepared to accept either LF or CR/LF just as they
> > would with text/plain. And maybe if we're serious about supporting old
> > devices, clients should be prepared for bare CR, too (Classic MacOS).
> > But it's a pain in the arse to authors to have to save text documents
> > with non-native line endings, and I don't feel like servers need to be
> > in the business of reformatting the content they serve.
>
> On Sun, Sep 08, 2019 at 02:42:08PM +0000, solderpunk wrote:
> > I will admit that the current liberal use of CRLF throughout the
> > Gemini spec is the result of me blindly copying from Gopher and other
> > RFCs (as Sean mentioned, it's ubiquitous).
>
> Here's [0,1] some of the history of requiring CRLF in network protocols
> and in requiring CRLF for text/ subtypes [2] during transmission.
>
> TL;DR: every system has a different native line ending sequence (LF vs
> CR vs CRLF). To ensure all can communicate with each other (and to
> simplify parsing of communications), transmissions are required to
> represent all line endings in text formats by CRLF. Line endings used in
> the local storage of text files have *nothing to do* with the line
> endings used in transmission, and clients are expected to convert from
> CRLF to whatever local format is preferred. So indeed, servers are in
> the business of reformatting text/* content that they serve, and they do
> so to ensure interoperability between systems with different line ending
> conventions.
>
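
As a rough illustration of the conversion described in the TL;DR above,
here is a minimal sketch of what a sender and a receiver would each do.
It is not taken from the Gemini spec or from any existing server; the
helper names to_wire and from_wire are hypothetical, and the body is
assumed to be UTF-8 (or ASCII) text held as bytes.

```python
import re

# Matches any of the three common line endings. CRLF is listed first so
# that it is replaced as one unit rather than as a CR followed by an LF.
_EOL = re.compile(rb"\r\n|\r|\n")


def to_wire(body: bytes) -> bytes:
    """Rewrite every CRLF, bare CR, or bare LF as CRLF (hypothetical helper).

    Working on bytes is safe for UTF-8 input because the bytes 0x0D and
    0x0A never occur inside a multi-byte sequence.
    """
    return _EOL.sub(b"\r\n", body)


def from_wire(body: bytes, local_eol: bytes = b"\n") -> bytes:
    """Rewrite CRLF line breaks to the local convention (hypothetical helper)."""
    return body.replace(b"\r\n", local_eol)
```

Under these assumptions a Unix server would call to_wire on each text/*
body before sending, and a Unix client would call from_wire after
receiving, so a file stored with LF round-trips unchanged.
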
> I think there's a conceptual point to be made here: text/gemini files
> are not binary data, but rather, *text files*. This means that their
> transmission should not attempt to provide byte-for-byte identical
> copies of the local data, but should instead follow well-defined and
> agreed-upon representations. If your goal is to transmit a byte-for-byte
> identical copy of your file, there are other mime types you can use to
> accomplish this (e.g., application/octet-stream).
>
> The FTP protocol makes a similar conceptual distinction. It allows for
> text transmission (ASCII and EBCDIC types), where end-of-lines are
> defined to be CRLF (ASCII type) and NL (EBCDIC type). It also allows for
> a stream / binary transfer mode for transmitting text (and other data)
> without any conversion. Quoting from the RFC [4, §3.4]:
>
>     For the purpose of standardized transfer, the sending host will
>     translate its internal end of line or end of record denotation into
>     the representation prescribed by the transfer mode and file
>     structure, and the receiving host will perform the inverse
>     translation to its internal denotation. [...] Since these
>     transformations imply extra work for some systems, identical systems
>     transferring non-record structured text files might wish to use a
>     binary representation and stream mode for the transfer.
>
> However, in keeping with Postel's law, I suggest allowing clients to
> accept LF as a line ending, as is done by RFC 7230 §3.5 [3]:
>
>     Although the line terminator for the start-line and header fields
>     is the sequence CRLF, a recipient MAY recognize a single LF as a
>     line terminator and ignore any preceding CR.
>
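
The lenient rule quoted above ("recognize a single LF as a line
terminator and ignore any preceding CR") could be sketched on the client
side roughly as follows. This is only an illustration: the function name
split_lines is hypothetical, and treating CRLF and LF as equivalent even
when both appear in one body is an assumption, since that is exactly the
point the spec leaves open.

```python
from typing import List


def split_lines(body: str) -> List[str]:
    """Split on LF and drop one CR directly before it (hypothetical helper).

    A bare CR that is not followed by LF is NOT treated as a line break;
    it is left inside the line unchanged.
    """
    lines = body.split("\n")
    # A body that ends with a line break yields a final empty piece; drop it.
    if lines and lines[-1] == "":
        lines.pop()
    # Strip at most one trailing CR from each line.
    return [line[:-1] if line.endswith("\r") else line for line in lines]
```

With this approach a body that mixes CRLF and LF needs no special case,
because every line is handled independently of the others.
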
> Conclusion:
>
> To eliminate ambiguity and to make the gemini protocol consistent with
> every other text transmission protocol I know of, I propose amending the
> ambiguous paragraph in the spec as follows:
>
>     As specified in RFC 2046 §4.1.1, the canonical form of any MIME
>     "text" subtype MUST always represent a line break as a CRLF
>     sequence. For robustness, a recipient MAY recognize a single LF as
>     a line terminator and ignore any preceding CR in text media.
>
> Best,
> Ryan
>
> [0] https://www.rfc-editor.org/old/EOLstory.txt
> [1] https://tools.ietf.org/html/rfc318
> [ page 8, "End of Line Convention" ]
> [2] https://tools.ietf.org/html/rfc2046#section-4.1.1
> [3] https://tools.ietf.org/html/rfc7230#section-3.5
> [4] https://tools.ietf.org/html/rfc959
>
> --
> |)|/ Ryan Kavanagh | GPG: 4E46 9519 ED67 7734 268F
> |\|\ https://rak.ac | BD95 8F7B F8FC 4A11 C97A