[spec] IRIs, IDNs, and all that international jazz

bie bie at 202x.moe
Thu Dec 24 13:36:43 GMT 2020


On Thu, Dec 24, 2020 at 01:39:16PM +0100, Omar Polo wrote:
> I think we're in the same boat, as I have written from scratch my server
> using only stuff that's in base on OpenBSD too.
> 
> Initially I was totally for option #3 (though I've only just finished
> skimming through the RFC), but after reading your messages I was a little
> scared of the consequences.
> 
> Today I did some light testing, and it seems (IF I'm understanding
> everything correctly -- please correct me otherwise) that option #3 is
> actually simpler for us.
> 
> Current state of affairs: Lagrange (0.13.1), amfora and elpher all
> encode "gemini.omarpolo.com/cafè.gmi" as
> "gemini.omarpolo.com/caf%C3%A8.gmi".  Obviously open("caf%C3%A8.gmi")
> fails, so my server returns 51 because the actual file name is
> "cafè.gmi".  I would have to write code that decodes parts of the request
> if I wanted to serve a file named like that (spoiler: I'm not gonna write
> it).
> 
> With IRI: the request becomes "gemini://gemini.omarpolo.com/cafè.gmi",
> so open("cafè.gmi") doesn't fail.  I think that we can continue to treat
> the request as a bytestring, extract the path and try to open(2) it.
> 
> I know that what I'm proposing is a real poor man's solution, because
> whether we choose option #1, #2 or #3, we can't really treat
> the path in the URL/IRI as a bytestring and call it a day.  UNIX file
> names are real bytestrings with only two forbidden octets; URLs/IRIs
> aren't.
> 
> So, if I'm not missing anything, I'm all in for option #3.

You're kind of correct in the sense that if we just treat the request as
arbitrary bytes and not as an IRI (no validation, no handling at all),
it's simple, but I don't think that's the right way to look at this
issue. Instead, it's about the complexity of proper URI handling vs
proper IRI handling. Not to mention that IRIs can still have
percent-encoded characters!
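
To illustrate that last point, here is a minimal Python sketch (not taken
from any existing server; the function name request_path_bytes and the
choice to reject NUL are my own assumptions). It shows that a server has
to percent-decode the path whether the client sends a URI or an IRI,
since both forms can carry %XX escapes and both should end up as the same
file name bytes:

```python
from urllib.parse import urlsplit, unquote_to_bytes


def request_path_bytes(request: str) -> bytes:
    """Return the percent-decoded path of a request URL as raw bytes.

    Hypothetical helper: it does no IRI validation and no "../"
    traversal checks, which a real server would still need.
    """
    path = urlsplit(request).path
    # Non-ASCII characters are encoded as UTF-8 and %XX escapes are
    # decoded, so the URI form and the IRI form yield the same octets.
    decoded = unquote_to_bytes(path)
    # Assumption: refuse NUL, which can never appear in a UNIX file name.
    if b"\x00" in decoded:
        raise ValueError("NUL octet in decoded path")
    return decoded


uri_form = request_path_bytes("gemini://gemini.omarpolo.com/caf%C3%A8.gmi")
iri_form = request_path_bytes("gemini://gemini.omarpolo.com/caf\u00e8.gmi")
assert uri_form == iri_form
```

Note that a decoded %2F also turns into "/", so decoding before splitting
the path into segments changes its meaning; the sketch deliberately
ignores that.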

After thinking about this for a while, the biggest issue for me is that
this is a breaking change. Breaking in the sense that it breaks *every
single compliant server we already have*! If Gemini, which has been
surprisingly good at resisting breaking spec changes, accepts this, I
don't see any reason to believe that it won't happen again and again,
for equally silly reasons.

bie

