[spec] IRIs, IDNs, and all that international jazz

Sean Conner sean at conman.org
Sat Dec 26 23:22:36 GMT 2020


It was thus said that the Great Petite Abeille once stated:
> > On Dec 26, 2020, at 22:10, John Cowan <cowan at ccil.org> wrote:
> > 
> > the whole URI can have its non-ASCII characters %-encoded all at once
> 
> Right. But that was not Stephane problematic, which was related to how to
> encode Reserved Characters gen-delims "/" in a path.
> 
> Consider the following 3 path segments: "Research", "A/B Testing",
> "Results".
> 
> Stephane asserts the following encodings are equivalent:
> 
> Research%2FA%2FB%20Testing%2FResults
> 
> vs.
> Research/A%2FB%20Testing/Results
> 
> They are clearly not. The first variant will result in one path segment,
> with data loss. While the second one will preserve the original semantic,
> with 3 segments, individually encoded, and intact.
> 
> They are not equivalent path. Try it in your favorite library.

  It was interesting to see the Go URL library you linked to.  For your two
examples, it will return the following structures:

	{
	  Path    = "Research/A/B Testing/Results",
	  RawPath = "Research%2FA%2FB%20Testing%2FResults",
	}

	{
	  Path    = "Research/A/B Testing/Results",
	  RawPath = "Research/A%2FB%20Testing/Results",
	}

and it's up to the client to check RawPath if it's *really* necessary to
make the distinction (meaning---the client *still* has to parse the path).
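
  To make "the client still has to parse the path" concrete, here is a
minimal sketch in Go of what that parsing could look like: split the
still-encoded path on "/" first, and only then percent-decode each segment.
The helper name splitSegments is my own invention for illustration and not
part of net/url; it leans on URL.EscapedPath and url.PathUnescape from the
standard library.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// splitSegments is a hypothetical helper (not part of net/url). It splits
// the still-encoded path on "/" and only then percent-decodes each piece,
// so an encoded %2F stays inside the segment it came from.
func splitSegments(raw string) ([]string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return nil, err
	}
	// EscapedPath returns RawPath when it is a valid encoding of Path,
	// otherwise the default encoding of Path.
	parts := strings.Split(u.EscapedPath(), "/")
	for i, p := range parts {
		dec, err := url.PathUnescape(p)
		if err != nil {
			return nil, err
		}
		parts[i] = dec
	}
	return parts, nil
}

func main() {
	for _, raw := range []string{
		"Research%2FA%2FB%20Testing%2FResults",
		"Research/A%2FB%20Testing/Results",
	} {
		segs, err := splitSegments(raw)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		// Print the segment count followed by the decoded segments.
		fmt.Printf("%d %q\n", len(segs), segs)
	}
}
```

  With this approach the first sample comes back as a single segment and the
second as three, which is exactly the distinction that Path alone loses.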

  A more normal example like "Research/ABTesting/Results" will result in:

	{
	  Path    = "Research/ABTesting/Results",
	  RawPath = "",
	}

so it's not like RawPath will always have the path; it is only set when the
original encoding differs from Go's default encoding of Path.
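
  For anyone who wants to check these results, a small Go sketch along the
following lines should do it.  The helper pathFields is a name I made up for
this example; the only library call it relies on is url.Parse.

```go
package main

import (
	"fmt"
	"net/url"
)

// pathFields is a hypothetical helper that parses raw and returns the
// Path and RawPath fields of the resulting url.URL.
func pathFields(raw string) (path, rawPath string, err error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", "", err
	}
	return u.Path, u.RawPath, nil
}

func main() {
	samples := []string{
		"Research%2FA%2FB%20Testing%2FResults",
		"Research/A%2FB%20Testing/Results",
		"Research/ABTesting/Results",
	}
	for _, raw := range samples {
		path, rawPath, err := pathFields(raw)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("Path = %q, RawPath = %q\n", path, rawPath)
	}
}
```

  The first two samples share the same Path and differ only in RawPath,
while the third leaves RawPath empty.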

  For the record, my own URL parsing library will just return 

	Research/A/B Testing/Results

for both samples.  I found it easier to work with that than what I was doing
at the time (pedantically correct, painfully hard to use in practice). You
would be hard pressed to actually create a file named "A/B Testing" on any
file system I know of (and not have it be "B Testing" in the "A" directory). 
If there *is* a file system that allows slashes in a filename (and not just
as a separator between directories) then I might revisit my decision, but until
then ...

  -spc

