What is required to be IRI compliant?

Petite Abeille petite.abeille at gmail.com
Mon Dec 28 12:22:27 GMT 2020



> On Dec 28, 2020, at 13:04, Petite Abeille <petite.abeille at gmail.com> wrote:
> 
> This is what people mean by "normalization": everyone needs to agree on how to encode "é" the same way, so everyone understands what "é" is.

See https://en.wikipedia.org/wiki/Unicode_equivalence#Normalization for an introduction.

This is not an abstract problem: it can lead to unwelcome outcomes (see #Errors_due_to_normalization_differences on the same page), or simply to misunderstandings.
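
To make the "é" example concrete, here is a minimal sketch in Python, assuming the standard unicodedata module is available in your stack. The helper name same_text is made up for illustration, and NFC is just one reasonable choice of normalization form; what matters is that both sides agree on the same one.

```python
import unicodedata

# Two different code point sequences that both render as "é".
composed = "\u00e9"      # LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT, two code points


def same_text(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC.

    Hypothetical helper for illustration: NFC is an assumed choice here,
    any form works as long as both sides use the same one.
    """
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# A naive comparison sees two different strings...
naive_equal = composed == decomposed

# ...while a comparison after normalization sees the same text.
normalized_equal = same_text(composed, decomposed)
```

Here naive_equal is False and normalized_equal is True, which is the whole problem in a nutshell: two requests for what looks like the same "é" may or may not match, depending on whether anyone normalized.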

Furthermore, if you really want to go all the way, you need to validate the UTF-8 byte sequences as well.

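Just as Unicode offers different ways to represent the very same character, there are several ways to spell out a given code point as bytes in a Unicode Transformation Format (UTF), not all of them valid. Some of the invalid ones, such as overlong UTF-8 sequences, can be used maliciously to slip past checks.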

See https://en.wikipedia.org/wiki/UTF-8#Invalid_sequences_and_error_handling for an introduction.
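
As a rough sketch of what such validation can look like, assuming Python: the built-in strict UTF-8 decoder already rejects overlong forms, encoded surrogates and truncated sequences, so the check can be as small as attempting to decode. The function name is_valid_utf8 is hypothetical, not something from any Gemini library.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data is a well-formed UTF-8 byte sequence.

    Hypothetical helper for illustration: it relies entirely on
    Python's strict decoder to reject malformed input.
    """
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


well_formed = "é".encode("utf-8")  # b"\xc3\xa9", the regular two-byte form
overlong = b"\xc0\xaf"             # overlong (two-byte) encoding of "/"
surrogate = b"\xed\xa0\x80"        # UTF-8 style encoding of surrogate U+D800
truncated = b"\xc3"                # lead byte with no continuation byte

results = [is_valid_utf8(b) for b in (well_formed, overlong, surrogate, truncated)]
```

Only the first of the four sequences passes. The overlong "/" is the classic malicious case: a lax decoder would turn it into a path separator after any filtering on the raw bytes has already happened.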

In your case, considering your stack, you can choose to ignore such potential issues, or delegate them to some external libraries.

Your choice, ultimately.

In any case, well done on getting it going with a minimal amount of work. This is the way. ✌︎






