[spec] IRIs, IDNs, and all that international jazz
Sean Conner
sean at conman.org
Sat Dec 26 23:52:14 GMT 2020
It was thus said that the Great Petite Abeille once stated:
>
>
> > On Dec 27, 2020, at 00:22, Sean Conner <sean at conman.org> wrote:
> >
> > For the record, my own URL parsing library will just return
> >
> > Research/A/BTesting/Results
>
> Tragic. I take back my assessment of your LPEG grammar. It's clearly
> wrong. Oh well.
Okay, given your two examples:
Research%2FA%2FB%20Testing%2FResults
Research/A%2FB%20Testing/Results
what should a "proper" URL parser return? And how should client code handle
such a construct? Perhaps you could even attempt to write a URL (or IRI)
parser yourself?
At one point, my URL parser would return the following for these:
{
path =
{
"Research/A/B Testing/Results",
}
}
{
path =
{
"Research",
"A/B Testing",
"Results",
}
}
but I found working with such paths to be painful. First off, how do I
distinguish between
Research/A%2FB%20Testing/Results
and
/Research/A%2FB%20Testing/Results
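To make the two representations concrete, here is a minimal sketch in Python (not the LPEG-based library discussed in this thread). The function names decode_whole and split_then_decode are made up for illustration. It assumes the path has already been isolated from the rest of the URL. Note that splitting on "/" yields an empty first segment for an absolute path, which is one possible way to tell the two cases apart; a segment list without that marker, like the one shown above, cannot.

```python
from urllib.parse import unquote

def decode_whole(path):
    # Hypothetical helper: decode the entire path as one string.
    # An encoded "/" (%2F) becomes indistinguishable from a real delimiter.
    return unquote(path)

def split_then_decode(path):
    # Hypothetical helper: split on literal "/" first, then decode each
    # segment, so an encoded "/" survives inside its segment.
    return [unquote(segment) for segment in path.split("/")]

relative = "Research/A%2FB%20Testing/Results"
absolute = "/Research/A%2FB%20Testing/Results"

print(decode_whole(relative))       # Research/A/B Testing/Results
print(split_then_decode(relative))  # ['Research', 'A/B Testing', 'Results']
print(split_then_decode(absolute))  # ['', 'Research', 'A/B Testing', 'Results']
```

The string form keeps the leading "/" for free; the list form needs the empty first segment (or a separate flag) to carry the same information.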
How would I specify that any URL with a path starting with "/foo" be
redirected to a path starting with "/bar"?
/foo/this -> /bar/this
/foobar -> /barbar
And how would I deal with this in the code?
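With the path kept as a single decoded string, the redirect rule above reduces to a plain string prefix test. The following is only a sketch in Python under that assumption; redirect_prefix is a hypothetical name, not part of any library mentioned here.

```python
from urllib.parse import unquote

def redirect_prefix(path, old, new):
    # Hypothetical helper: decode the path, then rewrite a matching prefix.
    # Returns the new path, or None when the rule does not apply.
    decoded = unquote(path)
    if decoded.startswith(old):
        return new + decoded[len(old):]
    return None

print(redirect_prefix("/foo/this", "/foo", "/bar"))  # /bar/this
print(redirect_prefix("/foobar", "/foo", "/bar"))    # /barbar
print(redirect_prefix("/other", "/foo", "/bar"))     # None
```

The cost is that the result is decoded: an encoded "/" in the original can no longer be told apart from a real delimiter. With a list of segments, the "/foobar" case would need a partial match inside the first segment, which is where the segment form gets awkward.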
Yes, you can say I ruined the purity of my URL parser with an ugly
pragmatic approach (keep the path as a single decoded string and ignore the
semantics of encoded delimiters), but there's also the saying, "Perfect is
the enemy of good." [1]
-spc
[1] https://en.wikipedia.org/wiki/Perfect_is_the_enemy_of_good