Unicode vs. the World
Côme Chilliet
come at chilliet.eu
Tue Dec 15 20:00:33 GMT 2020
On Tuesday, 15 December 2020, 20:11:12 CET, PJ vM wrote:
> So that would define a special percent-encoding for clients, where
> they'd encode everything except percent signs, right? So in this link:
> =>gemini://example.com/🐰🥕🐇🐰-why-space-is-%20-in-urls.gmi
> , a client would have to percent-encode the emojis, but leave the "%20"
> bit alone? This seems very confusing; it's also not one-to-one (encoding
> then decoding "%20" gives " " back)... And if you just skip
> percent-encoding when the only "encodable" characters in the path are
> percent signs, that's confusing too. That rule also doesn't work on
> =>gemini://example.com/%XY.gmi
That is because this is not a valid link: "%XY" is not a percent sign followed by two hexadecimal digits, so it is neither a valid URI nor a valid IRI.
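To make the rule concrete, here is a minimal sketch of what a client could do with a link path, assuming it treats the path as an IRI: reject malformed escapes, encode non-ASCII characters, and leave existing "%XX" escapes alone. The helper name iri_path_to_uri_path is hypothetical and not part of any Gemini specification.

```python
import re
from urllib.parse import quote

# Matches a "%" that is NOT followed by two hexadecimal digits.
_BAD_ESCAPE = re.compile(r"%(?![0-9A-Fa-f]{2})")


def iri_path_to_uri_path(path: str) -> str:
    """Hypothetical helper: percent-encode an IRI path into a URI path.

    Existing "%XX" escapes are kept as they are; a malformed escape
    such as "%XY" is rejected instead of being guessed at.
    """
    if _BAD_ESCAPE.search(path):
        raise ValueError(f"malformed percent-escape in {path!r}")
    # safe="/%" keeps path separators and existing escapes untouched;
    # everything else outside the unreserved set is UTF-8 percent-encoded.
    return quote(path, safe="/%")
```

With this approach "%20" passes through unchanged while the emojis are encoded, and "/%XY.gmi" is refused outright.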
> Also, if an author wants to link to "why-space-is-%20-in-urls.gmi" at
> example.com, the only option would be to write
> =>gemini://example.com/why-space-is-%2520-in-urls.gmi
> This introduces a pitfall for authors: they never have to think about
> percent-encoding, *except* when there are percent signs in the path.
Yes, and also when there are spaces or delimiter characters, such as "/".
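As an illustration of which characters an author has to escape when a literal file name contains them, here is a small sketch using Python's urllib.parse. The helper name encode_segment is hypothetical; it only shows how one file name would be turned into a single path segment.

```python
from urllib.parse import quote, unquote


def encode_segment(name: str) -> str:
    """Hypothetical helper: escape a literal file name as one path segment."""
    # safe="" means "%", "/", "?" and spaces are all percent-encoded;
    # ASCII letters, digits and "-._~" are left alone.
    return quote(name, safe="")


literal = "why-space-is-%20-in-urls.gmi"
encoded = encode_segment(literal)  # the "%" becomes "%25"
decoded = unquote(encoded)         # decoding once gives the literal name back
```

The same function turns "a b/c" into "a%20b%2Fc", which is the space and delimiter case.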
> How is this better than agreeing that link paths in gemtext are always
> completely percent-encoded? In that case, clients can percent-decode the
> path and display that. Authors could use a tool that 'fully' (as in, it
> also turns every "%" into "%25") percent-encodes a link for them.
Because a completely percent-encoded link is hell to read and to write. For instance:
gemini://gemini.circumlunar.space/%64%6f%63%73/%66%61%71%2e%67%6d%69
So I think you do not mean «completely percent-encoded»; you mean percent-encoding only the non-ASCII, non-reserved text, and you feel this is better because you are used to English and ASCII.
But you will always need to remember which characters you need to percent-encode. You will never be able to use "/" in a file name without percent-encoding it. The same goes for "?".
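For comparison, this sketch shows what "completely percent-encoded" means when taken literally: every byte is escaped, including plain ASCII letters. The function name encode_every_byte is made up for this example.

```python
from urllib.parse import unquote


def encode_every_byte(text: str) -> str:
    """Hypothetical helper: escape every UTF-8 byte of text as "%xx"."""
    return "".join(f"%{byte:02x}" for byte in text.encode("utf-8"))


# The two path segments of the example link, decoded back to readable text.
readable = unquote("%64%6f%63%73/%66%61%71%2e%67%6d%69")
```

Decoding the example path gives "docs/faq.gmi", which is what a human would actually want to type.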
> Counterintuitively, in this way I think mandating completely
> percent-encoded paths in gemtext link lines might actually result in
> easier linking for authors.
No, it is just a different set of characters to percent-encode.
> The same (clients may/should display, authors use tool) could be done
> with internationalised domain names (could be the same tool that does
> the percent-encoding), but crucially there is no ambiguity there,
> because an ascii domain name with "xn--" is unrepresentable in punycode
> and disallowed (I think). On the other hand, allowing anything
> whatsoever in the domain name and nothing in the path would be strange
> and a bit inconsistent.
Yes, IDNs are covered by Punycode, but the question remains whether I am allowed to use the Unicode form in a link line.
=> gemini://gémeaux.example.com Is that legal?
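If the Unicode form were allowed, a client would have to convert the host name to its Punycode form before resolving it. A rough sketch, assuming Python's built-in "idna" codec (which implements the older IDNA 2003 rules) is good enough; the helper names host_to_ascii and host_to_unicode are hypothetical.

```python
def host_to_ascii(host: str) -> str:
    """Hypothetical helper: Unicode host name to its ASCII ("xn--") form."""
    return host.encode("idna").decode("ascii")


def host_to_unicode(host: str) -> str:
    """Hypothetical helper: ASCII ("xn--") host name back to Unicode."""
    return host.encode("ascii").decode("idna")


ascii_host = host_to_ascii("gémeaux.example.com")
display_host = host_to_unicode(ascii_host)
```

The first direction is what every client would need when following a link; the second is what a fancy client would use for display.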
> Assuming we don't do IRI paths in gemtext link lines, I don't really
> have a clear opinion regarding IDNs, the choice is between:
> * all clients need to convert to punycode when following a link, authors
> can easily link to IDNs without a tool (though they're already using a
> tool for unicode paths), somewhat inconsistent/strange
> * fancy clients will convert from punycode when displaying a link,
> authors need a tool to be able to easily make links to IDNs (though
> they're already using a tool for unicode paths)
Yes.
I am for IDNs in link lines, and I am also in favor of IRIs in link lines.
For that matter, I would also support using IRIs in request lines and in redirect responses.
Côme