Scheme Section 2 quibble
Sean Conner
sean at conman.org
Tue Nov 17 22:10:41 GMT 2020
It was thus said that the Great Ali Fardan once stated:
> On Tue, 17 Nov 2020 10:19:52 +0100
> Philip Linde <linde.philip at gmail.com> wrote:
> > With respect to RFC3986, it's not a matter of opinion.
> >
> > It's very much not an implementation specific hack. It's defined in
> > RFC 3986 as "relative-ref", a "network-path reference" specifically.
> > Non-URIs of the "example.com/hello" style on the other hand are an
> > implementation specific hack, as you've noted, discouraged by RFC 3986
> > and not specified in any of the syntaxes it defines. It's obviously
> > unsuitable for links because it's ambiguous with relative-ref.
>
> I don't know about that, section 3.2 states that authority should be
> preceded by a "//", not that it is a part of the authority component,
> also, the ABNF representation has no "//" in it.
>
> Suffix references (section 4.5) are only discouraged because of
> possible misinterpretation, however in the case of Gemini requests,
> people can write their code to handle them just like they write their
> code to handle "//example.tld", it's not that hard and looks much much
> cleaner, the argument that it could be interpreted as path should also
> apply for "//example.tld" too, because it could be interpreted as a
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> path too, however if the author decided to handle such case, it'll be
  ^^^^^^^^^
Citation needed.
I'm sorry, this just isn't the case. From the full ABNF in Appendix A:

   URI           = scheme ":" hier-part [ "?" query ] [ "#" fragment ]

   hier-part     = "//" authority path-abempty
                 / path-absolute
                 / path-rootless
                 / path-empty

   URI-reference = URI / relative-ref

   absolute-URI  = scheme ":" hier-part [ "?" query ]

   relative-ref  = relative-part [ "?" query ] [ "#" fragment ]

   relative-part = "//" authority path-abempty
                 / path-absolute
                 / path-noscheme
                 / path-empty

[ NON-PATH RELATED RULES OMITTED FOR SPACE I REPEAT NON-PATH RELATED RULES OMITTED FOR SPACE ]

   path          = path-abempty    ; begins with "/" or is empty
                 / path-absolute   ; begins with "/" but not "//"
                 / path-noscheme   ; begins with a non-colon segment
                 / path-rootless   ; begins with a segment
                 / path-empty      ; zero characters

   path-abempty  = *( "/" segment )
   path-absolute = "/" [ segment-nz *( "/" segment ) ]
   path-noscheme = segment-nz-nc *( "/" segment )
   path-rootless = segment-nz *( "/" segment )
   path-empty    = 0<pchar>

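The path rules that can appear without an authority (path-absolute,
path-noscheme, path-rootless and path-empty) allow at most a single leading
slash. Not '/'+, nor '/'*, but a single '/'; path-absolute explicitly rules
out "//". The only place where a reference may begin with more than a single
slash PER THE @#%@#$@$ ABNF is just prior to the authority, which contains
the hostname. THE ONLY PLACE!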
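To make that concrete, here is a minimal Go sketch (classifyRelativePart is a made-up name for illustration) that reports which alternative of the relative-part rule a schemeless reference falls under. It only looks at the leading characters and ignores any query or fragment, so it is a simplification rather than a validating parser, but it shows that a leading "//" can only mean an authority follows.

```go
package main

import (
	"fmt"
	"strings"
)

// classifyRelativePart reports which alternative of the RFC 3986
// relative-part rule a schemeless reference matches:
//
//	relative-part = "//" authority path-abempty
//	              / path-absolute
//	              / path-noscheme
//	              / path-empty
//
// Simplification: it inspects only the leading characters and does
// not validate the characters of the host or of the path segments.
func classifyRelativePart(ref string) string {
	// The query and fragment are not part of relative-part.
	if i := strings.IndexAny(ref, "?#"); i >= 0 {
		ref = ref[:i]
	}
	switch {
	case strings.HasPrefix(ref, "//"):
		// The only alternative allowed to start with "//"; RFC 3986
		// section 4.2 calls this a network-path reference.
		return "network-path"
	case strings.HasPrefix(ref, "/"):
		// path-absolute: a single "/", never "//" (handled above).
		return "path-absolute"
	case ref == "":
		return "path-empty"
	default:
		// path-noscheme: the first segment may not contain ':'.
		first, _, _ := strings.Cut(ref, "/")
		if strings.Contains(first, ":") {
			return "not a relative-part"
		}
		return "path-noscheme"
	}
}

func main() {
	for _, ref := range []string{
		"//example.com/path/to/resource",
		"/example.com/path/to/resource",
		"example.com/path/to/resource",
	} {
		fmt.Printf("%-32s %s\n", ref, classifyRelativePart(ref))
	}
}
```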
I will also draw your attention to the URI-reference rule, which is there
for a reason: it allows either a full URI or a RELATIVE reference, which
means that
//example.com/path/to/resource
IS A VALID URI-REFERENCE! IT IS NOT A HACK! What part of the ABNF do you
not understand?
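The URI-reference rule boils down to one question: does the string start with a scheme followed by ':'? A rough Go sketch of that check, assuming only the RFC 3986 scheme rule (scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )); referenceKind and its helpers are hypothetical names, and the rest of the string is not validated.

```go
package main

import "fmt"

// isAlpha reports whether c is an ASCII letter (ABNF ALPHA).
func isAlpha(c byte) bool {
	return ('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z')
}

// isSchemeChar reports whether c may follow the first character of a
// scheme: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
func isSchemeChar(c byte) bool {
	return isAlpha(c) || ('0' <= c && c <= '9') || c == '+' || c == '-' || c == '.'
}

// referenceKind reports which branch of
//
//	URI-reference = URI / relative-ref
//
// s falls into. It only checks for a leading scheme ended by ':'.
func referenceKind(s string) string {
	for i := 0; i < len(s); i++ {
		c := s[i]
		switch {
		case c == ':' && i > 0:
			return "URI" // a non-empty scheme ended by ':'
		case i == 0 && isAlpha(c), i > 0 && isSchemeChar(c):
			continue // still inside a possible scheme
		default:
			return "relative-ref" // a character no scheme may contain
		}
	}
	return "relative-ref" // no ':' at all, so there is no scheme
}

func main() {
	for _, s := range []string{
		"gemini://example.com/path/to/resource",
		"//example.com/path/to/resource",
		"example.com/path/to/resource",
		"example.com:1965/path/to/resource",
	} {
		fmt.Printf("%-40s %s\n", s, referenceKind(s))
	}
}
```

Note the last example: under the ABNF, "example.com:1965/path/to/resource" starts with a syntactically valid scheme ("example.com"), so it is a URI rather than a host and port. That is one way a schemeless suffix reference can be misread, whereas "//example.com:1965/path/to/resource" cannot be.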
> handled just fine, you can have your parser treat the text before the
> first occurrence of '/' as host subcomponent of authority component if
> scheme is not specified just like you have your parser treat the first
> occurrence of '/' after the "//" prefix as host subcomponent in the
> current way of handling schemeless requests in Gemini, the Gemini
> protocol requires passing full URL in requests, therefore, such should
> not be interpreted as path because Gemini requests don't allow path
> without stating host.
No, the Gemini spec allows both a full URI and a relative reference, as long
as the latter starts with '//' (that is, it has an authority component). The
wording in the spec is poor and should be changed to make this clear, but
that's the current specification.
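As a sketch of what accepting both forms might look like in a server, assuming Go's net/url package: normalizeRequest is a hypothetical helper, not taken from any existing Gemini server. It resolves the request against a base URL that has only the "gemini" scheme, so a network-path reference picks up the scheme and keeps its own host and path, while a suffix reference such as "example.com/path/to/resource" ends up with no host and is rejected.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// normalizeRequest (hypothetical) accepts an absolute URI such as
// "gemini://host/path" or a network-path reference such as "//host/path"
// and returns an absolute URL. Requests with no host are rejected.
// The scheme, host and port checks a real server needs are omitted.
func normalizeRequest(line string) (*url.URL, error) {
	ref, err := url.Parse(line)
	if err != nil {
		return nil, err
	}
	// Scheme-only base: a network-path reference keeps its own
	// authority and path and just inherits the "gemini" scheme.
	base := &url.URL{Scheme: "gemini"}
	u := base.ResolveReference(ref)
	if u.Host == "" {
		return nil, errors.New("request has no host")
	}
	return u, nil
}

func main() {
	for _, line := range []string{
		"gemini://example.com/path/to/resource",
		"//example.com/path/to/resource",
		"example.com/path/to/resource",
	} {
		u, err := normalizeRequest(line)
		if err != nil {
			fmt.Printf("%-40s rejected: %v\n", line, err)
			continue
		}
		fmt.Printf("%-40s -> %s\n", line, u)
	}
}
```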
Again,
//example.com/path/to/resource
IS NOT A HACK!
> So yeah, I'm not changing my mind, "//example.tld" is a hack because
> that is not a valid URI and "//" is supposed to be only present when
> scheme is specified, however, "example.tld" is while discouraged,
> acceptable for this use case and the RFC even acknowledged it.
>
> Let me quote to you why it is that RFC 3986 discourages its use:
>
> > Although this practice of using suffix references is common, it
> > should be avoided whenever possible and should never be used in
> > situations where long-term references are expected.
>
> In the case of Gemini requests, they are not a 'long-term' reference,
> they're one-time requests, I don't see any downside to not doing it.
>
> > Last I checked, if you connect to gemini://gemini.circumlunar.space
> > and request "gemini.circumlunar.space/" you get an error. You may
> > however request "//gemini.circumlunar.space/" and get the appropriate
> > 20 response. Should gemini.circumlunar.space be considered to be
> > running a canonical implementation of Gemini?
>
> You shouldn't look at any particular implementation as a reference for
> the spec,
I believe Philip used gemini.circumlunar.space because that's the server
run by solderpunk, the author of the specification.
> I'm assuming gemini.circumlunar.space is running molly-brown,
Also written by solderpunk. The bastard! Writing a Gemini server that
doesn't follow his specification!
> do you know that molly-brown treats single '\n' as valid request
> terminators instead of explicit '\r\n'? (see:
> https://tildegit.org/solderpunk/molly-brown/src/branch/master/handler.go#L138),
> do you know that if a transaction is finished, molly-brown waits for
> the client to close the connection instead of closing it from the
> server side, is that spec compliant?
>
> The reason I think molly-brown accepted "//example.tld" in the first
> place is because the Go standard library URL parser implementation
> accepted this, I don't know if this was a bug or it is intended design,
It's by design---see the ABNF above.
> but that's what it is, other URI parsers that are more strict with
> compliance to the RFC will refuse to parse a URI without scheme
> present,
If a parser does that, it's broken by design. Again, see the ABNF above.
> here is an excerpt from the library's documentation that might
> give an idea of how they treat URLs:
>
> > A URL represents a parsed URL (technically, a URI reference).
> >
> > The general form represented is:
> >
> > [scheme:][//[userinfo@]host][/]path[?query][#fragment]
> >
> > URLs that do not start with a slash after the scheme are
> > interpreted as:
> >
> > scheme:opaque[?query][#fragment]
>
> Notice that [scheme:] is enclosed in brackets implying that it is
> optional, while [//host] is optional too, the "//" is considered a part
> of the authority component by the Go URL parser implementation, this is
> why "//example.tld" is accepted while "example.tld" is not, try passing
> both strings to url.Parse() and see what you get.
Yes, exactly. Again, that's per the ABNF above. Why do you not get this?
Here, have one more excerpt from RFC 3986, this time from Section 3:

   The following are two example URIs and their component parts:

         foo://example.com:8042/over/there?name=ferret#nose
         \_/   \______________/\_________/ \_________/ \__/
          |           |            |            |        |
       scheme     authority       path        query   fragment
          |   _____________________|__
         / \ /                        \
         urn:example:animal:ferret:nose

and the URL parsing library I have parses those as:

   ['foo://example.com:8042/over/there?name=ferret#nose'] =
   {
     fragment = "nose",
     query = "name=ferret",
     path = "/over/there",
     scheme = "foo",
     port = 8042.000000,
     host = "example.com",
   }

   ['urn:example:animal:ferret:nose'] =
   {
     path = "example:animal:ferret:nose",
     scheme = "urn",
   }

and because I like belaboring the inanimate equus pleonastically:

   ["//example.com/path/to/resource"] =
   {
     host = "example.com",
     path = "/path/to/resource",
   }

   ["/example.com/path/to/resource"] =
   {
     path = "/example.com/path/to/resource",
   }

   ["example.com/path/to/resource"] =
   {
     path = "example.com/path/to/resource",
   }

You should try those with the Go URL parser you use and see what YOU get.
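For anyone who wants to do exactly that, here is a small Go program that runs all five strings above through url.Parse from the standard library. The parts struct and split helper are illustrative names, not part of net/url. One difference to expect: Go reports the remainder of urn:example:animal:ferret:nose in the URL's Opaque field rather than Path, since net/url treats a scheme followed by something that does not start with a slash as scheme:opaque.

```go
package main

import (
	"fmt"
	"net/url"
)

// parts (illustrative) holds the components net/url exposes.
type parts struct {
	Scheme, Host, Port, Path, Opaque, Query, Fragment string
}

// split parses s with url.Parse and collects its components.
// Host is the hostname without the port (url.URL.Hostname).
func split(s string) (parts, error) {
	u, err := url.Parse(s)
	if err != nil {
		return parts{}, err
	}
	return parts{
		Scheme:   u.Scheme,
		Host:     u.Hostname(),
		Port:     u.Port(),
		Path:     u.Path,
		Opaque:   u.Opaque,
		Query:    u.RawQuery,
		Fragment: u.Fragment,
	}, nil
}

func main() {
	for _, s := range []string{
		"foo://example.com:8042/over/there?name=ferret#nose",
		"urn:example:animal:ferret:nose",
		"//example.com/path/to/resource",
		"/example.com/path/to/resource",
		"example.com/path/to/resource",
	} {
		p, err := split(s)
		if err != nil {
			fmt.Printf("%q: error: %v\n", s, err)
			continue
		}
		fmt.Printf("%q = %+v\n", s, p)
	}
}
```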
-spc