[spec] IRIs, IDNs, and all that international jazz

Omar Polo op at omarpolo.com
Thu Dec 24 12:39:16 GMT 2020

bie <bie at 202x.moe> writes:

>
> My server doesn't have to know anything about Unicode to serve a text
> file, just like it doesn't have to be able to parse JPEGs to serve
> images. IRIs mean it *does* have to know something about Unicode: which
> UCS characters are valid IRI characters, that the "private" UCS
> characters are only valid in the query part, etc.
>
> bie
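To make the quoted concern concrete, here is a minimal sketch of the two code-point classes RFC 3987 defines for non-ASCII characters: `ucschar`, allowed anywhere in an IRI, and `iprivate`, allowed only in the query. The function names are hypothetical, and the sketch assumes the request has already been decoded from UTF-8 into code points; ASCII characters fall under separate rules and are not handled here.

```c
#include <stdbool.h>
#include <stdint.h>

/* RFC 3987 ucschar: non-ASCII code points allowed anywhere in an IRI. */
static bool
is_ucschar(uint32_t cp)
{
	if (cp >= 0xA0 && cp <= 0xD7FF)
		return true;
	if (cp >= 0xF900 && cp <= 0xFDCF)
		return true;
	if (cp >= 0xFDF0 && cp <= 0xFFEF)
		return true;
	/* Planes 1 to 13: U+n0000 to U+nFFFD, excluding U+nFFFE and U+nFFFF. */
	if (cp >= 0x10000 && cp <= 0xDFFFD)
		return (cp & 0xFFFF) <= 0xFFFD;
	if (cp >= 0xE1000 && cp <= 0xEFFFD)
		return true;
	return false;
}

/* RFC 3987 iprivate: private-use code points, allowed only in the query. */
static bool
is_iprivate(uint32_t cp)
{
	return (cp >= 0xE000 && cp <= 0xF8FF) ||
	    (cp >= 0xF0000 && cp <= 0xFFFFD) ||
	    (cp >= 0x100000 && cp <= 0x10FFFD);
}

/* Hypothetical helper: may this non-ASCII code point appear here? */
static bool
iri_nonascii_ok(uint32_t cp, bool in_query)
{
	return is_ucschar(cp) || (in_query && is_iprivate(cp));
}
```

Even this sketch skips UTF-8 decoding and validation, which is exactly the extra Unicode knowledge a server would need on top of open(2).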

I think we're in the same boat, as I too have written my server from
scratch using only what's in base on OpenBSD.

Initially I was totally for option #3 (although I have to admit I'd only
just finished skimming through the RFC), but reading your messages made
me a little scared of the consequences.

Today I did some light testing, and it seems (IF I'm understanding
everything correctly; please correct me otherwise) that option #3 is
actually simpler for us.

Current state of affairs: Lagrange (0.13.1), amfora and elpher all
encode "gemini.omarpolo.com/cafè.gmi" as
"gemini.omarpolo.com/caf%C3%A8.gmi".  Obviously open("caf%C3%A8.gmi")
fails, so my server returns 51 (not found), because the actual file name
is "cafè.gmi".  I would have to write code that percent-decodes parts of
the request if I wanted to serve a file with a name like that (spoiler:
I'm not gonna write it).
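For reference, a minimal sketch of that percent-decoding step, working in place so that "caf%C3%A8.gmi" becomes "cafè.gmi". `hexval` and `pct_decode` are hypothetical names, and rejecting escapes that decode to NUL or '/' is an assumption of this sketch (a NUL would truncate the C string, and a decoded '/' would silently add a path segment), not something any spec mandates.

```c
/* Hypothetical helper: numeric value of one hex digit, or -1. */
static int
hexval(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Hypothetical: decode %XX escapes in place.  Returns 0 on success and
 * -1 on a malformed escape or one that decodes to NUL or '/'; on
 * failure the buffer is left partially rewritten.
 */
static int
pct_decode(char *s)
{
	char	*out = s;
	int	 hi, lo, byte;

	while (*s != '\0') {
		if (*s != '%') {
			*out++ = *s++;
			continue;
		}
		/* s[2] is only read if s[1] is a hex digit, hence not NUL. */
		if ((hi = hexval(s[1])) == -1 || (lo = hexval(s[2])) == -1)
			return -1;
		byte = hi * 16 + lo;
		if (byte == 0 || byte == '/')
			return -1;
		*out++ = (char)byte;
		s += 3;
	}
	*out = '\0';
	return 0;
}
```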

With IRIs, the request becomes "gemini://gemini.omarpolo.com/cafè.gmi",
so open("cafè.gmi") succeeds.  I think we can keep treating the request
as a bytestring: extract the path and try to open(2) it.
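A minimal sketch of that bytestring approach, assuming a request line of the form "gemini://host/path\r\n". `request_path` and `open_request` are hypothetical names; the sketch ignores the host, drops any query or fragment, and assumes the process is already confined to its content directory (for example with chroot(2) or unveil(2)), so it does nothing about ".." segments.

```c
#include <fcntl.h>
#include <string.h>

/*
 * Hypothetical: return the path of a raw request (without the leading
 * '/'), or NULL.  The request buffer is modified in place.
 */
static const char *
request_path(char *req)
{
	static const char	 scheme[] = "gemini://";
	char			*p;

	if (strncmp(req, scheme, sizeof(scheme) - 1) != 0)
		return NULL;
	req += sizeof(scheme) - 1;

	/* Cut at the CRLF terminator, or earlier at a query or fragment. */
	req[strcspn(req, "\r\n?#")] = '\0';

	/* Skip host[:port]; the path starts after the first '/'. */
	if ((p = strchr(req, '/')) == NULL || p[1] == '\0')
		return NULL;	/* no path: a real server would serve an index */
	return p + 1;
}

/* Hypothetical: open the requested file as-is, bytes and all. */
static int
open_request(char *req)
{
	const char	*path;

	if ((path = request_path(req)) == NULL)
		return -1;
	return open(path, O_RDONLY);
}
```

No byte of the path is interpreted, so "cafè.gmi" reaches open(2) exactly as the client sent it.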

I know that what I'm proposing is a really poor man's solution: whether
we choose option #1, #2 or #3, we can't really treat the path in the
URL/IRI as a bytestring and call it a day.  UNIX file names are real
bytestrings with only two forbidden octets ('/' and NUL); URLs and IRIs
aren't.
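The mismatch also runs in the other direction: a perfectly good UNIX file name such as "what?.gmi" has to be escaped before it can appear in a link, with URLs and IRIs alike. A minimal sketch, with hypothetical names, of percent-encoding one file name component as a path segment following RFC 3986's `pchar` rule. In `iri` mode bytes >= 0x80 are copied verbatim, which assumes the name is valid UTF-8 made only of characters IRIs allow.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* RFC 3986 pchar, minus pct-encoded: unreserved, sub-delims, ':', '@'. */
static bool
is_pchar(unsigned char c)
{
	if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
	    (c >= '0' && c <= '9'))
		return true;
	/* Guard against NUL: strchr would match the terminator. */
	return c != '\0' && strchr("-._~!$&'()*+,;=:@", c) != NULL;
}

/*
 * Hypothetical: encode one file name component into out (outlen bytes).
 * Returns 0, or -1 if out is too small.
 */
static int
encode_segment(const char *name, bool iri, char *out, size_t outlen)
{
	static const char	 hex[] = "0123456789ABCDEF";
	size_t			 i = 0;
	unsigned char		 c;

	for (; *name != '\0'; name++) {
		c = (unsigned char)*name;
		if (is_pchar(c) || (iri && c >= 0x80)) {
			if (i + 1 >= outlen)	/* keep room for the NUL */
				return -1;
			out[i++] = (char)c;
		} else {
			if (i + 3 >= outlen)
				return -1;
			out[i++] = '%';
			out[i++] = hex[c >> 4];
			out[i++] = hex[c & 0xF];
		}
	}
	if (i >= outlen)
		return -1;
	out[i] = '\0';
	return 0;
}
```

With `iri` false this reproduces what the clients above send ("caf%C3%A8.gmi"); with `iri` true "cafè.gmi" passes through untouched, but '?', '#', '%' and spaces still need escaping.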

So, if I'm not missing anything, I'm all in for option #3.

More information about the Gemini mailing list