Asian Text in URLs
Sean Conner
sean at conman.org
Sat Oct 3 08:53:17 BST 2020
It was thus said that the Great mieum once stated:
> Hi Everyone,
>
> I recently decided to start an experimental Gemlog written in Korean. I
> found, however, that some clients do not like having Asian text in the
> path of the URL---either as the name of a folder or a file.
>
> What is the general consensus about non-Latin characters in URLs?
  The actual specification for URLs is RFC 3986, and it lists the valid
characters for the path portion of a URL, which are:
abcdefghijklmnopqrstuvwxyz
ABCDEFGHIJKLMNOPQRSTUVWXYZ
0123456789
-._~
!$&'()*+,;=:@
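  ('/' separates the path segments.)  ANY OTHER CHARACTER has to be
percent-encoded: convert it to UTF-8, then write each byte as '%' followed
by two hexadecimal digits.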
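A minimal Python sketch of that rule. It assumes the path is held as a Unicode string and that '/' should stay literal as the segment separator; the name encode_path is hypothetical:

```python
# Bytes that may appear literally in a URL path: the RFC 3986 "pchar" set
# (unreserved, sub-delims, ':' and '@'), plus '/' as the segment separator.
ALLOWED = frozenset(
    b"abcdefghijklmnopqrstuvwxyz"
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"0123456789"
    b"-._~"
    b"!$&'()*+,;=:@"
    b"/"
)


def encode_path(path: str) -> str:
    """Percent-encode a path: UTF-8 first, then %XX for every disallowed byte."""
    out = []
    for byte in path.encode("utf-8"):
        if byte in ALLOWED:
            out.append(chr(byte))
        else:
            # Uppercase hex digits, as RFC 3986 recommends for consistency.
            out.append(f"%{byte:02X}")
    return "".join(out)


print(encode_path("/쌍록/"))  # /%EC%8C%8D%EB%A1%9D/
```

Note that '%' itself is not in the allowed set, so a literal '%' in a file name becomes "%25".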
> For an example, you can point your client to gemini://namu.blue/쌍록/
It should be:
gemini://namu.blue/%EC%8C%8D%EB%A1%9D/
The server will then have to decode the percent-encoded data to get the
proper file to use.
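A rough Python sketch of that decoding step. decode_path and to_relative_file are hypothetical helpers, and rejecting ".." is an assumption about what a server would want, not something RFC 3986 requires:

```python
from pathlib import PurePosixPath
from urllib.parse import unquote


def decode_path(url_path: str) -> str:
    # errors="strict" makes invalid UTF-8 raise UnicodeDecodeError
    # instead of silently substituting U+FFFD.
    return unquote(url_path, encoding="utf-8", errors="strict")


def to_relative_file(url_path: str) -> PurePosixPath:
    """Decode a request path into a path relative to the document root."""
    decoded = decode_path(url_path)
    # Drop the leading '/' (PurePosixPath already collapses '.' segments).
    parts = [p for p in PurePosixPath(decoded).parts if p != "/"]
    if ".." in parts:
        raise ValueError("path escapes the document root")
    return PurePosixPath(*parts)


print(to_relative_file("/%EC%8C%8D%EB%A1%9D/"))  # 쌍록
```

Because this decodes before splitting, an encoded "%2F" ends up acting as a separator; a stricter server could split on '/' first and decode each segment on its own.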
> Anyway, I just wanted to see what everyone thinks about the best
> practice in this situation.
Ensure you are using UTF-8 on the server, and percent-encode the path when
generating the URL.
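For example, a gemtext page could build its link lines ("=> URL label") with Python's urllib.parse.quote, marking the path characters listed above as safe. make_link is a hypothetical helper, and the label is left as plain UTF-8 text since it is not part of the URL:

```python
from urllib.parse import quote

# The RFC 3986 path characters to leave alone; quote() never encodes
# letters, digits, or "_.-~", and '/' keeps the segments intact.
PATH_SAFE = "/!$&'()*+,;=:@"


def make_link(base: str, path: str, label: str) -> str:
    """Return a gemtext link line with a UTF-8, percent-encoded path."""
    return f"=> {base}{quote(path, safe=PATH_SAFE)} {label}"


print(make_link("gemini://namu.blue", "/쌍록/", "쌍록"))
# => gemini://namu.blue/%EC%8C%8D%EB%A1%9D/ 쌍록
```

quote() also turns any '%' into "%25", so pass it the raw path, not one that is already encoded.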
-spc