[spec] IRIs, IDNs, and all that international jazz
Omar Polo
op at omarpolo.com
Thu Dec 24 12:39:16 GMT 2020
bie <bie at 202x.moe> writes:
>
> My server doesn't have to know anything about unicode to serve a text
> file, just like it doesn't have to be able to parse JPEGs to serve
> images. IRIs means it *does* have to know something about unicode, which
> ucs characters are valid IRI characters, that the "private" UCS are only
> valid in the query part etc etc.
>
> bie
I think we're in the same boat, as I have also written my server from
scratch using only stuff that's in base on OpenBSD.
Initially I was totally for option #3 (though I have to say that I've
only just finished skimming through the RFC), but after reading your
messages I was a little scared of the consequences.
Today I did some light testing, and it seems (IF I'm understanding
everything correctly -- please correct me otherwise) that option #3 is
actually simpler for us.
Current state of affairs: Lagrange (0.13.1), amfora and elpher all
encode "gemini.omarpolo.com/cafè.gmi" as
"gemini.omarpolo.com/caf%C3%A8.gmi". Obviously open("caf%C3%A8.gmi")
fails, so my server returns 51 because the actual file name is
"cafè.gmi". I have to write code that decodes parts of the request if
I want to serve a file named like that (spoiler: I'm not gonna write it).
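
For reference, the decoding step in question would look roughly like
the sketch below. It is only an illustration: hexval() and pct_decode()
are names made up here, not taken from any existing server, and it
assumes the path has already been cut out of the request into a
writable, NUL-terminated buffer.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int
hexval(unsigned char c)
{
	if (isdigit(c))
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Decode %XX escapes in path, in place.  Returns 0 on success, -1 on
 * a malformed escape or on %00 (which would truncate the file name).
 */
static int
pct_decode(char *path)
{
	char *r, *w;
	int hi, lo;

	assert(path != NULL);
	for (r = w = path; *r != '\0'; ++r) {
		if (*r != '%') {
			*w++ = *r;	/* ordinary byte: copy as-is */
			continue;
		}
		if ((hi = hexval(r[1])) == -1 || (lo = hexval(r[2])) == -1)
			return -1;
		if (hi == 0 && lo == 0)
			return -1;
		*w++ = (char)(hi << 4 | lo);
		r += 2;			/* skip the two hex digits */
	}
	*w = '\0';
	return 0;
}
```

With this, "caf%C3%A8.gmi" becomes the bytes of "cafè.gmi" in UTF-8,
which is what open(2) needs to see.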
With IRI: the request becomes "gemini://gemini.omarpolo.com/cafè.gmi",
so open("cafè.gmi") doesn't fail. I think that we can continue to treat
the request as a bytestring, extract the path and try to open(2) it.
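
A minimal sketch of that "extract the path" step, assuming the request
line has already been read into a writable, NUL-terminated buffer.
request_path() is a hypothetical helper named here for illustration; it
never looks inside the bytes, so a UTF-8 file name passes through
unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return a pointer to the path inside a request line such as
 * "gemini://host/dir/file.gmi\r\n", without the leading slash, or
 * NULL if there is no "://".  The buffer is modified in place: the
 * trailing CRLF and any "?query" or "#fragment" are cut off.
 */
static char *
request_path(char *req)
{
	char *p;

	assert(req != NULL);
	req[strcspn(req, "\r\n")] = '\0';	/* strip CRLF */
	if ((p = strstr(req, "://")) == NULL)
		return NULL;
	p += 3;					/* skip "://" */
	p += strcspn(p, "/?#");			/* skip host[:port] */
	if (*p != '/') {
		*p = '\0';			/* no path at all */
		return p;
	}
	p++;					/* drop leading slash */
	p[strcspn(p, "?#")] = '\0';		/* drop query/fragment */
	return p;
}
```

The result could then be handed to open(2) relative to the document
root. Note that the sketch does no other checking: it does not reject
".." components or map an empty path to an index file.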
I know that what I'm proposing is really a poor man's solution, because
whether we choose option #1, #2 or #3 we can't really treat the path in
the URL/IRI as a bytestring and call it a day. UNIX file names are real
bytestrings with only two forbidden octets (NUL and '/'); URLs/IRIs
aren't.
So, if I'm not missing anything, I'm all in for option #3.