Asian Text in URLs

Sean Conner sean at conman.org
Sat Oct 3 08:53:17 BST 2020


It was thus said that the Great mieum once stated:
> Hi Everyone,
> 
> I recently decided to start an experimental Gemlog written in Korean. I
> found, however, that some clients do not like having Asian text in the
> path of the URL---either as the name of a folder or a file.
> 
> What is the general consensus about non-latin characters in URLs? 

  The actual specification for URLs is RFC 3986 (the generic URI syntax), and
it lists the characters that may appear unencoded in a path segment ('/'
separates the segments), which are:

	abcdefghijklmnopqrstuvwxyz
	ABCDEFGHIJKLMNOPQRSTUVWXYZ
	0123456789
	-._~
	!$&'()*+,;=:@

  ANY OTHER CHARACTER has to be percent-encoded: each byte of the character's
encoding (UTF-8 here) is written as '%' followed by two hex digits.
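
  To make the rule concrete, here is a minimal sketch in Python that applies
it byte by byte.  The names ALLOWED and encode_path are made up for this
example, and it assumes the path is encoded as UTF-8 and that '/' is only
ever used as the segment separator:

```python
# Characters RFC 3986 allows unencoded in a path segment (the list above).
UNRESERVED = (
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
    "-._~"
)
SUB_DELIMS_ETC = "!$&'()*+,;=:@"
ALLOWED = frozenset((UNRESERVED + SUB_DELIMS_ETC).encode("ascii"))


def encode_path(path: str) -> str:
    """Percent-encode a path, leaving '/' alone as the segment separator."""
    out = []
    for byte in path.encode("utf-8"):  # assumption: UTF-8 on the wire
        if byte in ALLOWED or byte == ord("/"):
            out.append(chr(byte))
        else:
            # Everything else, including a literal '%', becomes %XX.
            out.append(f"%{byte:02X}")
    return "".join(out)


print(encode_path("/쌍록/"))
```

  Each Korean syllable here is three bytes in UTF-8, so each one turns into
three %XX triplets.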


> For an example, you can point your client to gemini://namu.blue/쌍록/

  It should be:

	gemini://namu.blue/%EC%8C%8D%EB%A1%9D/

  The server then has to decode the percent-encoded path to find the proper
file to serve.
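
  On the server side, the decoding step might look something like this
sketch.  It uses Python's urllib.parse.unquote; the function name decode_path
is just for illustration, and it assumes the client encoded the path as
UTF-8:

```python
from urllib.parse import unquote


def decode_path(raw_path: str) -> str:
    """Turn the percent-encoded path from a request back into text."""
    # errors="strict" makes invalid UTF-8 raise UnicodeDecodeError
    # rather than being silently replaced with U+FFFD.
    return unquote(raw_path, encoding="utf-8", errors="strict")


print(decode_path("/%EC%8C%8D%EB%A1%9D/"))
```

  Keep in mind that the decoded result can contain anything the client chose
to encode (%2F and %2E decode to '/' and '.'), so a real server should still
validate the path before touching the filesystem.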
  
> Anyway, I just wanted to see what everyone thinks about the best 
> practice in this situation. 

  Ensure you are using UTF-8 on the server, and percent-encode the path when
generating the URL.
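
  If you generate your links with a script, the standard library can do the
encoding for you.  This is only a sketch: make_url is a hypothetical helper,
it assumes a plain ASCII host name, and the "safe" list is my reading of the
characters RFC 3986 allows in a path (quote() already leaves letters, digits
and "-._~" alone):

```python
from urllib.parse import quote

# '/' plus the extra characters RFC 3986 allows unencoded in a path.
PATH_SAFE = "/!$&'()*+,;=:@"


def make_url(host: str, path: str) -> str:
    """Build a gemini:// URL with a percent-encoded UTF-8 path."""
    encoded = quote(path, safe=PATH_SAFE, encoding="utf-8")
    return f"gemini://{host}{encoded}"


print(make_url("namu.blue", "/쌍록/"))
```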

  -spc


