Scheme Section 2 quibble
Sean Conner
sean at conman.org
Wed Nov 18 08:42:57 GMT 2020
It was thus said that the Great Sudipto Mallick once stated:
> While you are discussing about the specs, please have a look at how
> the servers are currently responding to the edge cases.
>
> http://ix.io/2EyQ
>
> Request -> Response (first line only)
> The list of known servers from gemini://gus.guru/known-hosts : removed
> all non existent servers and *.flounder.online
> Test yourself: http://ix.io/2Etk
>
> And if you can, forgive my madness.
Thank you for running this and reporting the results. I can explain why
you got these results for my server, gemini.conman.org:
gemini.conman.org -> 59 Bad Request
gemini.conman.org/ -> 59 Bad Request
gemini.conman.org// -> 59 Bad Request
These fail because there is neither a scheme nor an authority section (the
'//' is missing), so my server marks them as a bad request.
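To illustrate with a generic parser (Python's urllib.parse.urlsplit, not the LPEG parser this server uses): without a leading '//', the host name lands in the path, and both scheme and authority come back empty. The components() helper is a hypothetical name used only for this sketch.

```python
from urllib.parse import urlsplit


def components(request):
    """Return (scheme, authority, path) for a request line (CRLF already stripped)."""
    parts = urlsplit(request)
    return parts.scheme, parts.netloc, parts.path


# The first grouping from the test: no scheme and no leading "//".
for req in ("gemini.conman.org", "gemini.conman.org/", "gemini.conman.org//"):
    scheme, authority, path = components(req)
    # With no "//", there is no authority: the host name is parsed as a path.
    print(f"{req!r}: scheme={scheme!r} authority={authority!r} path={path!r}")
```

Adding the '//' changes the picture: components("//gemini.conman.org") yields an authority of 'gemini.conman.org' and an empty path, which is the second grouping.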
//gemini.conman.org -> 20 text/gemini
//gemini.conman.org/ -> 20 text/gemini
//gemini.conman.org// -> 59 Bad Request
These are missing the scheme, but have an authority section [1]. The URL
parser I use supplies a '/' for the path when no path is present, which is
why my server does not send a 31 redirect when the trailing '/' is missing.
The trailing double slash is checked by a modified path-abempty rule.
The ABNF from the RFC is:
  path-abempty = *( "/" segment )
while the URL parser I'm using is doing:
  path_abempty <- {~ ( '/' segment)+ ~}
                / '' -> '/'
The parsing code is written in LPEG [2] and is equivalent to this ABNF

  path-abempty = 1*( "/" segment )
               / 0<pchar>            ; and return a '/'

It was written that way to work around an issue between the ABNF's
"0<pchar>" (which matches the empty string) and the way LPEG parses. I can
go into the details of LPEG if anyone is interested, but suffice it to say
that my LPEG path_abempty rule differs from the RFC's ABNF for a good
reason, and that difference is why a trailing '//' after the authority
section fails to parse.
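For comparison, here is a rough sketch in Python of the unmodified RFC 3986 rule, using a regular expression rather than LPEG. Under the RFC grammar a path of '//' is valid (two empty segments), and that is the point where the LPEG rule described above departs from it. The names is_path_abempty() and normalize_path() are hypothetical. normalize_path() only mirrors the "'' -> '/'" behavior by supplying '/' for an empty path, so no 31 redirect is needed. It does not attempt to reproduce the rest of the LPEG parser.

```python
import re
from typing import Optional

# RFC 3986: pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
PCHAR = r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})"
# segment = *pchar  (a segment may be empty)
SEGMENT = PCHAR + "*"
# path-abempty = *( "/" segment )
PATH_ABEMPTY = re.compile("(?:/" + SEGMENT + ")*")


def is_path_abempty(path: str) -> bool:
    """True if `path` matches the RFC 3986 path-abempty rule exactly."""
    return PATH_ABEMPTY.fullmatch(path) is not None


def normalize_path(path: str) -> Optional[str]:
    """Validate against path-abempty and supply "/" for an empty path.

    Hypothetical helper: returns None when the path is not valid.
    """
    if not is_path_abempty(path):
        return None
    return path or "/"
```

With this strict RFC reading, is_path_abempty("//") is True, while a path containing a raw space is rejected because a space is not a pchar.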
gemini://gemini.conman.org -> 20 text/gemini
gemini://gemini.conman.org/ -> 20 text/gemini
gemini://gemini.conman.org// -> 59 Bad Request
These are the more typical requests, and the same explanation applies. No
surprises for my server (at least, not to me). More interesting are the
responses from blekksprut.net and cadence.moe:
blekksprut.net -> 20 text/gemini
blekksprut.net/ -> 20 text/gemini
blekksprut.net// -> 20 text/gemini
//blekksprut.net -> 51 not found
//blekksprut.net/ -> 51 not found
//blekksprut.net// -> 51 not found
gemini://blekksprut.net -> 20 text/gemini
gemini://blekksprut.net/ -> 20 text/gemini
gemini://blekksprut.net// -> 20 text/gemini
cadence.moe -> 20 text/gemini; charset=utf-8; lang=en
cadence.moe/ -> 20 text/gemini; charset=utf-8; lang=en
cadence.moe// -> 20 text/gemini; charset=utf-8; lang=en
//cadence.moe -> 50 Bliz server: Not found: //cadence.moe
//cadence.moe/ -> 50 Bliz server: Not found: //cadence.moe/
//cadence.moe// -> 50 Bliz server: Not found: //cadence.moe//
gemini://cadence.moe -> 20 text/gemini; charset=utf-8; lang=en
gemini://cadence.moe/ -> 20 text/gemini; charset=utf-8; lang=en
gemini://cadence.moe// -> 20 text/gemini; charset=utf-8; lang=en
These results probably stem from the same issue, though possibly in
different server software. From a quick look through the results, servers
that had no problem with the first grouping (just the domain name) seem to
*have* a problem with the second grouping (leading '//'). Odd.
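For contrast, the stricter policy floated in footnote [1] (accept ONLY an absolute URL) would treat every schemeless form the same way. Below is a minimal sketch in Python. check_request() is a hypothetical function, and the policy it encodes is an assumption for illustration: accept only the gemini scheme with a non-empty host. It covers scheme and authority only, not path validation.

```python
from typing import Optional
from urllib.parse import urlsplit


def check_request(request: str) -> Optional[str]:
    """Return an error status line, or None if the request is acceptable.

    Hypothetical policy: only absolute gemini:// URLs with a host are
    accepted. Assumes the trailing CRLF has already been stripped.
    """
    parts = urlsplit(request)
    if parts.scheme != "gemini" or not parts.hostname:
        return "59 Bad Request"
    return None


for req in ("cadence.moe", "//cadence.moe/", "gemini://cadence.moe//"):
    print(repr(req), "->", check_request(req) or "accepted")
```

Under this policy the first two groupings both get a 59, so a leading '//' can no longer be mistaken for part of a path.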
Again, thanks for this.
-spc
[1] I've been debating whether I should mark a missing scheme as a "bad
request", since I've come around to the view that a Gemini server should
ONLY accept an absolute URL. I haven't ... yet.
[2] Lua Parsing Expression Grammar