Unicode vs. the World

Gary Johnson lambdatronic at disroot.org
Fri Dec 18 17:16:43 GMT 2020


Katarina Eriksson <gmym at coopdot.com> writes:
>
> Anyway, the request reaches the server. "%20" becomes a space and "%2b"
> becomes a plus. I see no reason why it would be hard to also convert
> "%F0%9F%90%87" into bytes, so I will assume it isn't and wait for server
> software programmers to tell me how wrong I am.
>
> So now we have a string of bytes that we can use to fetch the bunny file.
> Wait. What about the case where the bunny isn't %-encoded? Why can't
> servers just blindly accept non-ASCII bytes as is? Is it a library
> thing? Anyway, I really should test this in a bunch of languages, but I'm
> writing this on my phone on my way to work, so instead I present you with
> this pseudocode:
>
> *ELIDED TEXT HERE*
>
> If these 3 lines are all true for the server software, I see no reason to
> %-encode those non-ASCII bytes in the client or anywhere else. Surely I
> have missed something obvious somewhere. Can anyone help me?

The Space Age server uses java.net.URI to parse incoming URI strings
into their component parts. It can accept URIs with unencoded UTF-8
path, query, and fragment parts (except that spaces must be
percent-encoded as %20). Unicode is not allowed in the hostname part.
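
For anyone who wants to poke at this behaviour themselves, here is a
minimal sketch using nothing but java.net.URI. It is not Space Age's
actual code: the class name UriUnicodeDemo, the helper methods, and the
example.org URIs are made up for illustration. It parses a path given as
raw UTF-8 and as percent-encoded UTF-8, shows that an unencoded space is
rejected, and prints what getHost() reports for a non-ASCII hostname.

```java
import java.net.URI;
import java.net.URISyntaxException;

class UriUnicodeDemo {
    // Decoded path component, or null if java.net.URI rejects the string.
    static String pathOf(String s) {
        try {
            return new URI(s).getPath();
        } catch (URISyntaxException e) {
            return null;
        }
    }

    // All-ASCII form of the URI (non-ASCII characters percent-encoded
    // as UTF-8), or null if the string does not parse.
    static String asciiOf(String s) {
        try {
            return new URI(s).toASCIIString();
        } catch (URISyntaxException e) {
            return null;
        }
    }

    // Host component as java.net.URI sees it, or null if there is none
    // or the string does not parse.
    static String hostOf(String s) {
        try {
            return new URI(s).getHost();
        } catch (URISyntaxException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String bunny = "\uD83D\uDC07"; // U+1F407 RABBIT as a surrogate pair

        // Raw UTF-8 and percent-encoded UTF-8 should decode to the same path.
        System.out.println(pathOf("gemini://example.org/" + bunny + ".gmi"));
        System.out.println(pathOf("gemini://example.org/%F0%9F%90%87.gmi"));

        // The all-ASCII form percent-encodes the bunny.
        System.out.println(asciiOf("gemini://example.org/" + bunny + ".gmi"));

        // An unencoded space does not parse; %20 does.
        System.out.println(pathOf("gemini://example.org/two words.gmi"));
        System.out.println(pathOf("gemini://example.org/two%20words.gmi"));

        // Non-ASCII hostname: expected to print null, since the authority
        // cannot be parsed as a server-based host.
        System.out.println(hostOf("gemini://ex\u00e4mple.org/"));
    }
}
```

Returning null from the helpers instead of propagating
URISyntaxException just keeps the demo short; a real server would
presumably map that exception to an error response of some kind.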

One more data point for you,
  Gary

-- 
GPG Key ID: 7BC158ED
Use `gpg --search-keys lambdatronic' to find me
Protect yourself from surveillance: https://emailselfdefense.fsf.org
=======================================================================
()  ascii ribbon campaign - against html e-mail
/\  www.asciiribbon.org   - against proprietary attachments

Why is HTML email a security nightmare? See https://useplaintext.email/

Please avoid sending me MS-Office attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html

