Unicode vs. the World

Jason McBrayer jmcbray at carcosa.net
Thu Dec 17 13:31:38 GMT 2020


Björn Wärmedal <bjorn.warmedal at gmail.com> writes:

> Because — as I tried to point out — there is no reasonably simple
> heuristic for determining whether a URL is already percent encoded or
> not. And percent encoding a URL that is already percent encoded
> exchanges all % characters with %25.

It's not that hard. All you have to do is percent-decode the path *first*,
then percent-encode it. Consider this URL, which is a worst case for
what you're talking about, since it mixes raw Unicode with an existing
percent escape:

gemini://example.com/🐇%20🥕.gmi

Unquoting (percent-decoding) the path gives you
'gemini://example.com/🐇 🥕.gmi', of course. Quoting (percent-encoding)
it again then gives you

'gemini://example.com/%F0%9F%90%87%20%F0%9F%A5%95.gmi'

which decodes back to the intended path, with no %25 double-encoding.
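A minimal sketch of that decode-then-encode step, assuming Python's
standard urllib.parse, whose unquote() and quote() match the
"unquoting"/"quoting" wording here. The name normalize_url is
hypothetical, and only the path component is touched:

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Hypothetical helper: percent-decode the path, then percent-encode it.

    Scheme, host, query and fragment are passed through unchanged.
    """
    parts = urlsplit(url)
    # Step 1: decode any existing %XX escapes (UTF-8 by default).
    decoded = unquote(parts.path)
    # Step 2: re-encode; quote() leaves '/' alone by default (safe='/').
    encoded = quote(decoded)
    return urlunsplit(parts._replace(path=encoded))


print(normalize_url("gemini://example.com/🐇%20🥕.gmi"))
# gemini://example.com/%F0%9F%90%87%20%F0%9F%A5%95.gmi
```

Feeding the output back into normalize_url() returns it unchanged,
because the first step decodes the escapes before they are re-encoded.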

Unquoting a plain ASCII path that contains no percent escapes does
nothing to it, so running it through the same two steps is safe.
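To make the contrast with naive re-encoding concrete, here is a small
Python sketch, again assuming urllib.parse. The helper normalize_path
and the path /docs/index.gmi are hypothetical examples:

```python
from urllib.parse import quote, unquote


def normalize_path(path: str) -> str:
    """Hypothetical helper: decode first, then encode, so %XX is not doubled."""
    return quote(unquote(path))


already_encoded = "/%F0%9F%90%87%20%F0%9F%A5%95.gmi"
plain_ascii = "/docs/index.gmi"  # hypothetical path with no '%' in it

# Naive re-encoding turns every '%' into '%25':
print(quote(already_encoded)[:11])                         # /%25F0%259F
# Decode-then-encode leaves an already-encoded path as it was:
print(normalize_path(already_encoded) == already_encoded)  # True
# A plain ASCII path with no escapes passes through untouched:
print(normalize_path(plain_ascii))                         # /docs/index.gmi
```

One thing this sketch does not address: decoding also turns an encoded
slash (%2F) into a real '/', so a path that relies on %2F inside a
single segment would come out with different segments.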

-- 
Jason McBrayer      | “Strange is the night where black stars rise,
jmcbray at carcosa.net | and strange moons circle through the skies,
                    | but stranger still is lost Carcosa.”
                    | ― Robert W. Chambers, The King in Yellow


More information about the Gemini mailing list