Unicode vs. the World
PJ vM
pjvm742 at disroot.org
Tue Dec 15 19:11:12 GMT 2020
On 12/15/20 4:41 PM, Côme Chilliet wrote:
> So, if you see a "%", it is percent encoding. If you want to link to
> a path containing a percent, you have to percent encode the percent,
> resulting in %25.
>
> As a result, percent encoding twice does not break the link, as you
> only percent encode what is not percent encoded already.
So that would define a special percent-encoding for clients, where
they'd encode everything except percent signs, right? So in this link:
=>gemini://example.com/🐰🥕🐇🐰-why-space-is-%20-in-urls.gmi
, a client would have to percent-encode the emojis, but leave the "%20"
bit alone? This seems very confusing; it's also not one-to-one (encoding
then decoding "%20" gives " " back, not "%20")... And if you just skip
percent-encoding when the only "encodable" characters in the path are
percent signs, that's confusing too. That rule also doesn't work on
=>gemini://example.com/%XY.gmi
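A minimal Python sketch of that rule may make the problem concrete. The name lenient_encode is hypothetical, and marking "%" as safe in urllib.parse.quote is only an assumption about how a client might implement "encode everything except percent signs"; the link is shortened to one emoji for readability.

```python
from urllib.parse import quote, unquote


def lenient_encode(path: str) -> str:
    """Hypothetical client rule: percent-encode what needs it, leave "%" alone."""
    return quote(path, safe="/%")


original = "/🐰-why-space-is-%20-in-urls.gmi"
encoded = lenient_encode(original)
decoded = unquote(encoded)

print(encoded)              # the emoji is encoded, "%20" is left as-is
print(decoded == original)  # False: "%20" came back as a space

# An invalid escape such as "%XY" passes through untouched,
# so the result is not a valid percent-encoded path.
print(lenient_encode("/%XY.gmi"))
```

Under this assumed rule, a literal "%20" in a file name and an already-encoded space are indistinguishable once the link has been encoded.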
Also, if an author wants to link to "why-space-is-%20-in-urls.gmi" at
example.com, the only option would be to write
=>gemini://example.com/why-space-is-%2520-in-urls.gmi
This introduces a pitfall for authors: they never have to think about
percent-encoding, *except* when there are percent signs in the path.
How is this better than agreeing that link paths in gemtext are always
completely percent-encoded? In that case, clients can percent-decode the
path and display that. Authors could use a tool that 'fully' (as in, it
also turns every "%" into "%25") percent-encodes a link for them.
Counterintuitively, I think mandating completely percent-encoded paths
in gemtext link lines might actually make linking easier for authors.
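As an illustration, such an authoring tool and the matching client-side display step could be sketched in Python as follows. The names full_encode and display_path are hypothetical, and treating "/" as the only character left unencoded is an assumption of this sketch, not something the proposal specifies.

```python
from urllib.parse import quote, unquote


def full_encode(path: str) -> str:
    """Hypothetical authoring helper: encode everything, including "%"."""
    return quote(path, safe="/")


def display_path(encoded: str) -> str:
    """What a client could show the reader: the decoded path."""
    return unquote(encoded)


name = "/why-space-is-%20-in-urls.gmi"
link = "=>gemini://example.com" + full_encode(name)

print(link)
print(display_path(full_encode(name)) == name)  # True: the round trip is lossless
```

Because every "%" in the file name becomes "%25", decoding always recovers the original name, which is the one-to-one property the partial rule lacks.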
The same (clients may/should display, authors use tool) could be done
with internationalised domain names (could be the same tool that does
the percent-encoding), but crucially there is no ambiguity there,
because an ASCII domain name with "xn--" is unrepresentable in Punycode
and disallowed (I think). On the other hand, allowing arbitrary Unicode
in the domain name and none in the path would be strange and a bit
inconsistent.
Assuming we don't do IRI paths in gemtext link lines, I don't really
have a clear opinion regarding IDNs. The choice is between:
* all clients need to convert to Punycode when following a link, authors
can easily link to IDNs without a tool (though they're already using a
tool for Unicode paths), somewhat inconsistent/strange
* fancy clients will convert from Punycode when displaying a link,
authors need a tool to be able to easily make links to IDNs (though
they're already using a tool for Unicode paths)
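Either direction of the conversion is small. A rough Python sketch, using the built-in "idna" codec (which implements the older IDNA 2003 rules, so a real client might want something stricter); the function names and the host "bücher.example" are made up for illustration:

```python
def to_punycode(host: str) -> str:
    """Option 1: what every client would do before resolving a Unicode host."""
    return host.encode("idna").decode("ascii")


def from_punycode(host: str) -> str:
    """Option 2: what a fancy client would do before displaying a link."""
    return host.encode("ascii").decode("idna")


ascii_host = to_punycode("bücher.example")
print(ascii_host)                 # xn--bcher-kva.example
print(from_punycode(ascii_host))  # bücher.example
print(to_punycode("example.com")) # plain ASCII names pass through unchanged
```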
--
pjvm