URLs in request lines
Sean Conner
sean at conman.org
Sat Sep 14 10:50:26 BST 2019
It was thus said that the Great plugd once stated:
> Hi again Sean,
Hello, plugd.
> Sean Conner writes:
> > we have the 'scheme' portion, then the two '//' which means we're following
> > the first rule in 'hier-part'. 'authority' is the host part (which I didn't
> > include) followed by a 'path-abempty', of which there can be 0 or more of,
> > so that's a perfectly cromulent URL. It's the responsibility of the
> > *server* to handle the situation, not the client.
>
> I just read over this again and realised I'd been too hasty in my
> earlier response. You point out that according to the URI RFC an empty
> path is a valid URL, and while this is good to know, does the following
> necessarily follow?
>
> > Semantically speaking, these:
> >
> > gemini://example.com
> > gemini://example.com/
> >
> > are the same.
>
> For gopher, gopher://example.com/1 and gopher://example.com/1/ are not
> semantically the same. (Although they are often - but not always -
> treated as such.) Section 6.2.3 on scheme-based normalization notes
> that http://example.com and http://example.com/ are semantically
> equivalent, and goes on to suggest that URIs of other schemes _should_
> follow this example. So I suppose we now say that gemini does?
The URL spec is RFC-3986. Gopher gets its own URL RFC, RFC-4266. One
major difference is in the query portion. To send a "query" string with
a non-gopher URL, you do:
http://example.com/?search%20for%20me (yes, this is valid)
The same example for Gopher would be:
gopher://example.com/7search%09look%20for%20me
It does NOT use the normal query syntax for URLs. In fact, RFC-4266 even
states:
A Gopher URL takes the form:
gopher://<host>:<port>/<gopher-path>
...
Within the <gopher-path>, no characters are reserved.
So the intent (in my opinion) is that one can take the <gopher-path>
portion, decode any URL-encoded characters (which means that %09 is
translated to an ASCII HT, horizontal tab), and pass the result (minus the
first character) verbatim to a gopher server. Had Gopher been more in line
with RFC-3986, then a gopher URL might be more like:
gopher://example.com/7search?look%20for%20me
but I suspect this wasn't done because of Gopher+, which is covered in
RFC-4266, but I don't know of *any* servers today that support it (although
I'm willing to be corrected on that). The Gopher+ information is, of
course, separated from the search portion by another %09 in the URL (see
RFC-4266 section 2.9 for a crazy example of that).
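The decoding described above can be sketched in a few lines of Python. This
is only an illustration, not code from any real client: the function name
gopher_request_line is made up, the URL is split by hand because no
character (not even '?') is reserved within the <gopher-path>, and the
default item type of "1" for an empty path is an assumption taken from a
reading of RFC-4266.

```python
from urllib.parse import unquote


def gopher_request_line(url):
    """Return (item_type, line_to_send) for a gopher URL.

    Hypothetical helper for illustration only.
    """
    prefix = "gopher://"
    if not url.lower().startswith(prefix):
        raise ValueError("not a gopher URL: " + url)

    # Everything after the first '/' following the authority is the
    # <gopher-path>.  It is split by hand because '?' and '#' are not
    # reserved here and must not be treated as query/fragment markers.
    _authority, _, gopher_path = url[len(prefix):].partition("/")

    # Decode URL-encoded characters; %09 becomes an ASCII HT (tab).
    decoded = unquote(gopher_path)

    if not decoded:
        # Assumption: an empty path means item type "1" (a menu) and an
        # empty selector.
        return "1", "\r\n"

    # The first character is the item type; the rest goes to the server
    # verbatim, terminated by CRLF.
    return decoded[0], decoded[1:] + "\r\n"


if __name__ == "__main__":
    print(gopher_request_line("gopher://example.com/7search%09look%20for%20me"))
```

Note that the tab separating the selector from the search string is just
part of the decoded data; the client never has to understand it.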
So, the upshot (as I see it) is that the gopher URL format is divorced
from the RFC-3986 URL and is its own thing. You can't really say they have
the same semantic rules. This is also reflected in the caps.txt file you
will sometimes find on gopher servers to address the bit in RFC-1436 that
gopher selectors are opaque and *no* meaning is to be inferred by the
client.
As far as Gemini goes, I've been parsing Gemini URLs under RFC-3986, just
like http:, https:, ftp: and file:.
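As a rough sketch of what treating the two forms as the same could look
like, here is one way to do it with Python's urllib.parse. The choice of
library and the function name normalize_gemini are assumptions for the
sake of the example, not anything Gemini mandates; the only normalization
shown is replacing an empty path with "/".

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_gemini(url):
    """Treat an empty path as "/" (hypothetical helper)."""
    parts = urlsplit(url)
    # An empty path after the authority is valid ('path-abempty' allows
    # zero segments); map it to "/" so both spellings compare equal.
    path = parts.path or "/"
    return urlunsplit((parts.scheme, parts.netloc, path,
                       parts.query, parts.fragment))


if __name__ == "__main__":
    a = normalize_gemini("gemini://example.com")
    b = normalize_gemini("gemini://example.com/")
    print(a, b, a == b)
```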
-spc (Did that answer your question?)