Scheme Section 2 quibble

Sean Conner sean at conman.org
Tue Nov 17 22:10:41 GMT 2020


It was thus said that the Great Ali Fardan once stated:
> On Tue, 17 Nov 2020 10:19:52 +0100
> Philip Linde <linde.philip at gmail.com> wrote:
> > With respect to RFC3986, it's not a matter of opinion.
> > 
> > It's very much not an implementation specific hack. It's defined in
> > RFC 3986 as "relative-ref", a "network-path reference" specifically.
> > Non-URIs of the "example.com/hello" style on the other hand are an
> > implementation specific hack, as you've noted, discouraged by RFC 3986
> > and not specified in any of the syntaxes it defines. It's obviously
> > unsuitable for links because it's ambiguous with relative-ref.
> 
> I don't know about that, section 3.2 states that authority should be
> preceded by a "//", not that it is a part of the authority component,
> also, the ABNF representation has no "//" in it.
> 
> Suffix references (section 4.5) are only discouraged because of
> possible misinterpretation, however in the case of Gemini requests,
> people can write their code to handle them just like they write their
> code to handle "//example.tld", it's not that hard and looks much much
> cleaner, the argument that it could be interpreted as path should also
> apply for "//example.tld" too, because it could be interpreted as a
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> path too, however if the author decided to handle such case, it'll be
  ^^^^^^^^^
  Citation needed.

  I'm sorry, this just isn't the case.  From the full ABNF in Appendix A:

   URI           = scheme ":" hier-part [ "?" query ] [ "#" fragment ]

   hier-part     = "//" authority path-abempty
                 / path-absolute 
                 / path-rootless 
                 / path-empty

   URI-reference = URI / relative-ref
   
   absolute-URI  = scheme ":" hier-part [ "?" query ]

   relative-ref  = relative-part [ "?" query ] [ "#" fragment ]

   relative-part = "//" authority path-abempty
                 / path-absolute
                 / path-noscheme
                 / path-empty 

[ NON-PATH RELATED RULES OMITTED FOR SPACE I REPEAT NON-PATH RELATED RULES OMITTED FOR SPACE ]

   path          = path-abempty    ; begins with "/" or is empty
                 / path-absolute   ; begins with "/" but not "//"
                 / path-noscheme   ; begins with a non-colon segment
                 / path-rootless   ; begins with a segment
                 / path-empty      ; zero characters

   path-abempty  = *( "/" segment )
   path-absolute = "/" [ segment-nz *( "/" segment ) ]
   path-noscheme = segment-nz-nc *( "/" segment )
   path-rootless = segment-nz *( "/" segment )
   path-empty    = 0<pchar>

  The path rules that can start a reference allow one leading slash at most.
Not '/'+, not '/'*, but a single optional '/' (path-absolute even says so
outright: begins with "/" but not "//").  The only place where a leading
double slash is allowed PER THE @#%@#$@$ ABNF is just prior to the
authority, which contains the hostname.  THE ONLY PLACE!
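To make that concrete, here is a rough sketch in Go of how the relative-part alternatives quoted above get selected.  The type and function names (relativePart, classifyRelative) are made up for illustration.  It only looks at structure and does not validate individual characters (pchar, reg-name and so on), so treat it as a sketch, not a conforming parser.

```go
package main

import (
	"fmt"
	"strings"
)

// relativePart records which alternative of the RFC 3986 relative-part
// rule a reference matched. Hypothetical type, for illustration only.
type relativePart struct {
	Rule      string // "authority", "path-absolute", "path-noscheme" or "path-empty"
	Authority string // only set for the "//" authority path-abempty branch
	Path      string
}

// classifyRelative picks the relative-part alternative by looking at how
// the reference starts. It does not check individual characters.
func classifyRelative(ref string) (relativePart, error) {
	// relative-part ends at the first '?' (query) or '#' (fragment).
	if i := strings.IndexAny(ref, "?#"); i >= 0 {
		ref = ref[:i]
	}
	switch {
	case strings.HasPrefix(ref, "//"):
		// "//" authority path-abempty: the authority runs to the next '/'.
		rest := ref[2:]
		if i := strings.IndexByte(rest, '/'); i >= 0 {
			return relativePart{Rule: "authority", Authority: rest[:i], Path: rest[i:]}, nil
		}
		return relativePart{Rule: "authority", Authority: rest}, nil
	case strings.HasPrefix(ref, "/"):
		// path-absolute: begins with "/" but not "//".
		return relativePart{Rule: "path-absolute", Path: ref}, nil
	case ref == "":
		return relativePart{Rule: "path-empty"}, nil
	default:
		// path-noscheme: a colon in the first segment would make it a scheme.
		first, _, _ := strings.Cut(ref, "/")
		if strings.Contains(first, ":") {
			return relativePart{}, fmt.Errorf("%q: first segment contains ':'", ref)
		}
		return relativePart{Rule: "path-noscheme", Path: ref}, nil
	}
}

func main() {
	for _, ref := range []string{
		"//example.com/path/to/resource",
		"/example.com/path/to/resource",
		"example.com/path/to/resource",
	} {
		p, err := classifyRelative(ref)
		fmt.Printf("%q => %+v (err: %v)\n", ref, p, err)
	}
}
```

Only the leading "//" selects the authority branch; everything else falls through to one of the path rules, and a first segment containing a colon is refused because it would be read as a scheme.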

  I will also draw your attention to the URI-reference rule, which is there
for a reason: it allows either a full URI or a RELATIVE URI, which means
that

		//example.com/path/to/resource

IS A VALID URI-REFERENCE!  IT IS NOT A HACK!  What part of the ABNF do you
not understand?
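Deciding which branch of URI-reference a string falls into comes down to whether it starts with something matching the scheme rule (ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )) followed by ':'.  A small Go sketch of that check; splitScheme is a hypothetical name, not from any library:

```go
package main

import "fmt"

// splitScheme reports whether ref begins with an RFC 3986 scheme followed
// by ':'. If so, ref can match the URI branch of URI-reference; if not,
// it can only match relative-ref. Hypothetical helper for illustration.
func splitScheme(ref string) (scheme string, ok bool) {
	for i := 0; i < len(ref); i++ {
		c := ref[i]
		switch {
		case 'a' <= c && c <= 'z', 'A' <= c && c <= 'Z':
			// letters are allowed anywhere in a scheme
		case '0' <= c && c <= '9', c == '+', c == '-', c == '.':
			if i == 0 {
				return "", false // a scheme must start with a letter
			}
		case c == ':':
			if i == 0 {
				return "", false // empty scheme
			}
			return ref[:i], true
		default:
			return "", false // '/', '?', '#' etc. rule out a scheme
		}
	}
	return "", false
}

func main() {
	for _, ref := range []string{
		"gemini://gemini.circumlunar.space/",
		"urn:example:animal:ferret:nose",
		"//example.com/path/to/resource",
		"example.com/path/to/resource",
		"example.com:1965/",
	} {
		scheme, ok := splitScheme(ref)
		fmt.Printf("%q: scheme=%q, URI branch: %v\n", ref, scheme, ok)
	}
}
```

So "//example.com/path/to/resource" has no scheme and can only be a relative-ref, which URI-reference accepts.  It also shows one trouble with suffix references: "example.com:1965/" matches the scheme rule, so it reads as scheme "example.com", not as a host and port.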

> handled just fine, you can have your parser treat the text before the
> first occurrence of '/' as host subcomponent of authority component if
> scheme is not specified just like you have your parser treat the first
> occurrence of '/' after the "//" prefix as host subcomponent in the
> current way of handling schemeless requests in Gemini, the Gemini
> protocol requires passing full URL in requests, therefore, such should
> not be interpreted as path because Gemini requests don't allow path
> without stating host.

  No, the Gemini spec allows both the full URI and a relative reference, as
long as the latter starts with '//' (that is, it has the authority
section).  The wording in the spec is bad and should be changed to clarify
this, but that's the current specification.

  Again,

		//example.com/path/to/resource

IS NOT A HACK!
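As a rough illustration of that reading (a full gemini:// URI, or a relative reference that starts with '//'), here is one way a Go server could check a request line with net/url.  requestHost is a hypothetical helper, not taken from molly-brown or any other server; it assumes the line still carries its CRLF terminator, and it rejects anything without an authority.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// requestHost (hypothetical) accepts "gemini://host/path" or
// "//host/path" and returns the host. References without an
// authority, such as "host/path", are rejected.
func requestHost(line string) (string, error) {
	// This sketch insists on the CRLF terminator.
	if !strings.HasSuffix(line, "\r\n") {
		return "", errors.New("request not terminated by CRLF")
	}
	u, err := url.Parse(strings.TrimSuffix(line, "\r\n"))
	if err != nil {
		return "", err
	}
	// url.Parse lowercases the scheme; an empty scheme means a relative reference.
	if u.Scheme != "" && u.Scheme != "gemini" {
		return "", fmt.Errorf("unsupported scheme %q", u.Scheme)
	}
	// No authority means no host, whatever the path looks like.
	if u.Host == "" {
		return "", errors.New("request has no authority")
	}
	return u.Hostname(), nil
}

func main() {
	for _, line := range []string{
		"gemini://gemini.circumlunar.space/\r\n",
		"//gemini.circumlunar.space/\r\n",
		"gemini.circumlunar.space/\r\n",
		"https://example.com/\r\n",
	} {
		host, err := requestHost(line)
		fmt.Printf("%q -> host=%q err=%v\n", line, host, err)
	}
}
```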

> So yeah, I'm not changing my mind, "//example.tld" is a hack because
> that is not a valid URI and "//" is supposed to be only present when
> scheme is specified, however, "example.tld" is while discouraged,
> acceptable for this use case and the RFC even acknowledged it.
> 
> Let me quote to you why it is that RFC 3986 discourages its use:
> 
> > Although this practice of using suffix references is common, it
> > should be avoided whenever possible and should never be used in
> > situations where long-term references are expected.
> 
> In the case of Gemini requests, they are not a 'long-term' reference,
> they're one-time requests, I don't see any downside to not doing it.
> 
> > Last I checked, if you connect to gemini://gemini.circumlunar.space
> > and request "gemini.circumlunar.space/" you get an error. You may
> > however request "//gemini.circumlunar.space/" and get the appropriate
> > 20 response. Should gemini.circumlunar.space be considered to be
> > running a canonical implementation of Gemini?
> 
> You shouldn't look at any particular implementation as a reference for
> the spec, 

  I believe Philip used gemini.circumlunar.space because that's the server
written by solderpunk, author of the specification.  

> I'm assuming gemini.circumlunar.space is running molly-brown,

  Also written by solderpunk.  The bastard!  Writing a Gemini server that
doesn't follow his specification!

> do you know that molly-brown treats single '\n' as valid request
> terminators instead of explicit '\r\n'? (see:
> https://tildegit.org/solderpunk/molly-brown/src/branch/master/handler.go#L138),
> do you know that if a transaction is finished, molly-brown waits for
> the client to close the connection instead of closing it from the
> server side, is that spec compliant?
> 
> The reason I think molly-brown accepted "//example.tld" in the first
> place is because the Go standard library URL parser implementation
> accepted this, I don't know if this was a bug or it is intended design,

  It's by design; see the ABNF above.

> but that's what it is, other URI parsers that are more strict with
> compliance to the RFC will refuse to parse a URI without scheme
> present, 

  If a parser does that, it's broken by design.  Again, see the ABNF above.

> here is an excerpt from the library's documentation that might
> give an idea of how they treat URLs:
> 
> > A URL represents a parsed URL (technically, a URI reference).
> >
> > The general form represented is:
> >
> > [scheme:][//[userinfo@]host][/]path[?query][#fragment]
> >
> > URLs that do not start with a slash after the scheme are
> > interpreted as:
> >
> > scheme:opaque[?query][#fragment]
> 
> Notice that [scheme:] is enclosed in brackets implying that it is
> optional, while [//host] is optional too, the "//" is considered a part
> of the authority component by the Go URL parser implementation, this is
> why "//example.tld" is accepted while "example.tld" is not, try passing
> both strings to url.Parse() and see what you get.

  Yes, exactly.  Again, that's per the ABNF above.  Why do you not get this? 
Here, have one more excerpt from RFC 3986, this time from section 3:

   The following are two example URIs and their component parts:

         foo://example.com:8042/over/there?name=ferret#nose
         \_/   \______________/\_________/ \_________/ \__/
          |           |            |            |        |
       scheme     authority       path        query   fragment
          |   _____________________|__
         / \ /                        \
         urn:example:animal:ferret:nose

and the URL parsing library I have parses those as:

['foo://example.com:8042/over/there?name=ferret#nose'] =
{
  fragment = "nose",
  query = "name=ferret",
  path = "/over/there",
  scheme = "foo",
  port = 8042.000000,
  host = "example.com",
}

['urn:example:animal:ferret:nose'] =
{
  path = "example:animal:ferret:nose",
  scheme = "urn",
}

and because I like belaboring the inanimate equus pleonastically:

["//example.com/path/to/resource"] =
{
  host = "example.com",
  path = "/path/to/resource",
}

["/example.com/path/to/resource"] =
{
  path = "/example.com/path/to/resource",
}

["example.com/path/to/resource"] =
{
  path = "example.com/path/to/resource",
}

  You should try those with the Go URL parser you use and see what YOU get.
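To save some typing, here is a short Go program that runs the same five strings through net/url's url.Parse and keeps only the non-empty fields, roughly in the shape of the dumps above.  The components helper is made up for this example.  Two differences are naming only: net/url puts the remainder of the urn: example in Opaque rather than Path, and it reports the port as the string "8042".

```go
package main

import (
	"fmt"
	"net/url"
)

// components (hypothetical helper) parses ref with net/url and returns
// only the non-empty parts, loosely mirroring the Lua table dumps.
func components(ref string) (map[string]string, error) {
	u, err := url.Parse(ref)
	if err != nil {
		return nil, err
	}
	m := map[string]string{}
	for k, v := range map[string]string{
		"scheme":   u.Scheme,
		"opaque":   u.Opaque,
		"host":     u.Hostname(),
		"port":     u.Port(),
		"path":     u.Path,
		"query":    u.RawQuery,
		"fragment": u.Fragment,
	} {
		if v != "" {
			m[k] = v
		}
	}
	return m, nil
}

func main() {
	for _, ref := range []string{
		"foo://example.com:8042/over/there?name=ferret#nose",
		"urn:example:animal:ferret:nose",
		"//example.com/path/to/resource",
		"/example.com/path/to/resource",
		"example.com/path/to/resource",
	} {
		m, err := components(ref)
		fmt.Printf("%q => %v (err: %v)\n", ref, m, err)
	}
}
```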

  -spc

