Query Strings
colecmac at protonmail.com
Mon May 25 16:54:32 BST 2020
I think it might just make the most sense to say in the spec that
encoding is required, and should be done with percent signs, for
spaces too. Like in Sean's message:
?query=what%20is%20this%20madness&lang=en
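
For illustration, here is a minimal Python sketch of a client producing that exact query string. The field names are just the ones from Sean's example, and using the standard library's urllib.parse is my assumption, not something the spec prescribes.

```python
from urllib.parse import quote, urlencode

# Field names and values taken from the example URL above (illustrative only).
fields = {"query": "what is this madness", "lang": "en"}

# urlencode() defaults to quote_plus(), which would turn spaces into '+'.
# Passing quote_via=quote makes it emit %20 for spaces instead.
query_string = urlencode(fields, quote_via=quote)

print("?" + query_string)
```

The only deliberate choice here is quote_via=quote: without it, Python follows the web-form convention and writes '+' for spaces.
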
makeworld
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Sunday, May 24, 2020 5:28 PM, Sean Conner <sean at conman.org> wrote:
> It was thus said that the Great Brian Evans once stated:
>
> > Greetings,
> > I got a bug report recently for Bombadillo about how I have been handling
> > query strings.
>
> [ snip ]
>
> > I think it would be good to clearly state what is expected of clients and
> > servers regarding the escaping of querystring values for gemini.
>
> There are three standards being conflated here. They are:
>
> [CGI] RFC-3875
> [URI] RFC-3986
> [WEBFORM] https://www.w3.org/TR/html401/interact/forms.html
>
> I'm going to try to do a summary here (if anyone is interested in the gory
> details, check the docs listed above). To encode a URL (per [URI]), the
> following characters can be used AS IS:
>
> ALPHA DIGIT - . _ ~
>
> and the following characters MUST always be encoded [1]:
>
> % < > [ ] { } | \ ^ SPACE CONTROL NON-ASCII
>
> Whether the characters not included in either list need encoding depends
> upon where in the URL they appear (more on that below).
>
> Encoding a character means converting it to its hex value and preceding
> it with a '%':
>
> ##% -> %23%23%25
>
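The rule just quoted can be sketched by hand in a few lines of Python. This is only an illustration: the name percent_encode and its extra_safe parameter are made up for this example, and encoding non-ASCII text as UTF-8 bytes is an assumption on my part.

```python
# Characters that can be used as-is anywhere: ALPHA DIGIT - . _ ~
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)


def percent_encode(text: str, extra_safe: str = "") -> str:
    """Percent-encode every character outside the unreserved set.

    extra_safe (hypothetical parameter) lists additional ASCII characters
    that the caller wants left alone in a particular part of the URL.
    """
    safe = UNRESERVED | set(extra_safe)
    out = []
    for ch in text:
        if ch in safe:
            out.append(ch)
        else:
            # Assumption: characters are converted to UTF-8 bytes first,
            # then each byte becomes '%' plus two uppercase hex digits.
            out.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
    return "".join(out)


print(percent_encode("##%"))
```

With the input from the example above, '#' is hex 23 and '%' is hex 25, which is where the three escapes come from.
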
> Each section of a URL (scheme, authority [2], path, query, fragment)
> allows certain characters that would otherwise be encoded to NOT be encoded.
> I'll concentrate on the query portion since that's the part under question.
> The query portion allows the following characters to appear non-encoded:
>
> ALPHA DIGIT - . _ ~ / ? : @
>
> The '=' and '&' are used as sub-delimiters (to separate name and value,
> and to separate name/value pairs). If a '=' or '&' appears in a name or a
> value, it has to be encoded.
>
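A short Python sketch of that rule, assuming the standard library's urllib.parse.quote is acceptable for the job. The helper name build_query and the sample pairs are made up for the example; the safe list mirrors the query characters given above.

```python
from urllib.parse import quote

# Characters the summary above allows unencoded in a query, beyond the
# unreserved set that quote() already leaves alone.
QUERY_SAFE = "/?:@"


def build_query(pairs):
    """Join (name, value) pairs with '=' and '&', encoding each part first.

    Hypothetical helper: because names and values are encoded before being
    joined, any '=' or '&' inside them cannot be mistaken for a delimiter.
    """
    return "&".join(
        quote(name, safe=QUERY_SAFE) + "=" + quote(value, safe=QUERY_SAFE)
        for name, value in pairs
    )


print(build_query([("expr", "1+1=2"), ("tag", "R&D")]))
```

Everything outside the safe list comes out percent-encoded, so the only raw '=' and '&' left in the result are the real delimiters.
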
> The '+' sign is listed as a sub-delimiter in [URI], but [URI] otherwise
> says nothing about it. [CGI] and [WEBFORM] define it differently. [CGI]
> allows it, but only if '=' and '&' aren't used (section 4.4):
>
> ...?one+two+three        '+' ALLOWED
> ...?one+two=3&three=3    '+' DISALLOWED
>
> And in this case, the '+' is to be treated as a space. In any other case,
> the space needs to be encoded:
>
> ...?query=what%20is%20this%20madness&lang=en    DEFINED
> ...?query=what+is+this+madness&lang=en          UNDEFINED
>
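The difference shows up on the decoding side too. A small Python illustration, assuming the standard library's urllib.parse; the variable names are made up for the example.

```python
from urllib.parse import unquote, unquote_plus

plus_form = "what+is+this+madness"
percent_form = "what%20is%20this%20madness"

# unquote() only decodes %XX escapes, so a '+' stays a literal plus sign.
literal = unquote(plus_form)

# unquote_plus() follows the web-form convention and turns '+' into a space.
as_space = unquote_plus(plus_form)

# The %20 form decodes to spaces with either function.
decoded = unquote(percent_form)

print(repr(literal))
print(repr(as_space))
print(repr(decoded))
```

So a server that decodes with plain percent-decoding and a client that sends '+' for spaces will disagree, while %20 comes out the same either way.
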
> [WEBFORM] defines the '+' to be a space, but only when the data is being
> sent as part of a POST, and the content type is
> "application/x-www-form-urlencoded". This doesn't apply at all to Gemini.
>
> Now, it could be that there are webservers (or CGI scripts) that convert
> '+' to spaces regardless. I'm just saying ...
>
> Hopefully, this clears it all up (said as he wipes the mud off his face).
>
> -spc (Don't hesitate to ask any questions ... )
>
> [1] You'd be hard pressed to see these listed in [URI] since they aren't
> listed! RFC-1738 lists those characters explicitly, so that's four
> references. Sorry.
>
> [2] [URI] calls the host portion "authority".