Canonicalize (AKA Normalize)
Canonicalization, which is also called normalization, transforms a resource into a “standard” representation.
The details depend on the type of resource, but canonicalization typically includes steps such as:
- Formatting based on a fixed standard
- Sorting keys whose order does not (or should not) change the semantics
- Rewriting literal values
- Using a fixed encoding
This is useful, for example, in testing, to compare output against a fixed expected result.
It also has applications in cryptography, such as when “signing” things.
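As a rough illustration of these steps (a sketch, not Enola's implementation), Python's standard json module can already produce a fixed representation: parsing and re-serializing rewrites literal values (4.50 becomes 4.5, "\u0042" becomes "B"), sort_keys fixes the key order, compact separators fix the formatting, and encoding to UTF-8 fixes the bytes. Hashing those bytes, as a signature scheme would before signing, then yields the same digest for differently formatted but equivalent inputs. The names canonical_bytes and fingerprint are hypothetical. Python sorts keys by code point and formats numbers its own way, so this is not RFC 8785 compliant.

```python
import hashlib
import json


def canonical_bytes(doc) -> bytes:
    """Serialize a parsed JSON value to one fixed byte representation."""
    text = json.dumps(
        doc,
        sort_keys=True,          # key order no longer depends on the input
        separators=(",", ":"),   # fixed formatting: no optional whitespace
        ensure_ascii=False,      # keep non-ASCII characters literal...
    )
    return text.encode("utf-8")  # ...and fix the byte encoding to UTF-8


def fingerprint(doc) -> str:
    """SHA-256 of the canonical form; compare or sign this instead of raw input."""
    return hashlib.sha256(canonical_bytes(doc)).hexdigest()


a = json.loads('{"b": 1, "a": [true, null]}')
b = json.loads('{\n  "a": [ true, null ],\n  "b": 1\n}')
print(canonical_bytes(a))                # b'{"a":[true,null],"b":1}'
print(fingerprint(a) == fingerprint(b))  # True
```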
JSON
enola canonicalize for JSON transforms, for example, this canonicalize.json:
{
"numbers": [333333333.33333329, 1E30, 4.50,
2e-3, 0.000000000000000000000000001],
"string": "\u20ac$\u000F\u000aA'\u0042\u0022\u005c\\\"\/",
"literals": [null, true, false],
"\u20ac": "Euro Sign",
"\r": "Carriage Return",
"\ufb33": "Hebrew Letter Dalet With Dagesh",
"1": "One",
"\ud83d\ude00": "Emoji: Grinning Face",
"\u0080": "Control",
"\u00f6": "Latin Small Letter O With Diaeresis"
}
into this, using an algorithm inspired by the RFC 8785 JSON Canonicalization Scheme (JCS), though currently not fully compliant with it:
$ ./enola canonicalize --load=test/canonicalize.json
{"\r":"Carriage Return","1":"One","literals":[null,true,false],"numbers":[3.333333333333333E8,1.0E30,4.5,0.002,1.0E-27],"string":"€$\u000f\nA'B\"\\\\\"/","":"Control","ö":"Latin Small Letter O With Diaeresis","€":"Euro Sign","😀":"Emoji: Grinning Face","דּ":"Hebrew Letter Dalet With Dagesh"}
or, more readably formatted with --pretty:
$ ./enola canonicalize --pretty --load=test/canonicalize.json
{
"\r": "Carriage Return",
"1": "One",
"literals": [
null,
true,
false
],
"numbers": [
3.333333333333333E8,
1.0E30,
4.5,
0.002,
1.0E-27
],
"string": "€$\u000f\nA'B\"\\\\\"/",
"": "Control",
"ö": "Latin Small Letter O With Diaeresis",
"€": "Euro Sign",
"😀": "Emoji: Grinning Face",
"דּ": "Hebrew Letter Dalet With Dagesh"
}
Note how the keys are reordered, numbers are rewritten (e.g. 4.50 becomes 4.5), and escape sequences such as \u0042 are resolved (to B), among other changes.
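The key order above is consistent with the RFC 8785 rule of sorting keys by their UTF-16 code units: under that rule the emoji (stored as the surrogate pair U+D83D U+DE00) sorts before U+FB33, even though its code point U+1F600 is larger. RFC 8785 also requires numbers to be serialized the way ECMAScript's Number toString does, for example 333333333.3333333 and 1e+30, whereas the output above shows 3.333333333333333E8 and 1.0E30. The following Python sketch implements both rules; it illustrates the RFC and is not Enola's code, and jcs and jcs_number are hypothetical names.

```python
import json
import math
from decimal import Decimal


def jcs_number(x: float) -> str:
    """Format a double like ECMAScript Number toString, as RFC 8785 requires."""
    if math.isnan(x) or math.isinf(x):
        raise ValueError("NaN and Infinity are not allowed in JSON")
    if x == 0:
        return "0"  # also covers -0.0
    # repr() gives the shortest decimal string that round-trips to the same double
    t = Decimal(repr(abs(x))).normalize().as_tuple()
    digits = "".join(map(str, t.digits))
    k = len(digits)
    n = t.exponent + k  # decimal point position: value = 0.<digits> x 10^n
    if k <= n <= 21:
        s = digits + "0" * (n - k)             # integer, padded with zeros
    elif 0 < n <= 21:
        s = digits[:n] + "." + digits[n:]     # point inside the digits
    elif -6 < n <= 0:
        s = "0." + "0" * -n + digits          # small fraction, no exponent
    else:
        e = n - 1
        mantissa = digits if k == 1 else digits[0] + "." + digits[1:]
        s = f"{mantissa}e{'+' if e >= 0 else '-'}{abs(e)}"
    return ("-" if x < 0 else "") + s


def jcs(value) -> str:
    """Serialize a parsed JSON value in an RFC 8785-style canonical form."""
    if value is None or isinstance(value, bool):
        return json.dumps(value)  # null, true, false
    if isinstance(value, (int, float)):
        return jcs_number(float(value))  # JCS treats all numbers as doubles
    if isinstance(value, str):
        # Escapes only '"', '\\' and control characters below U+0020
        return json.dumps(value, ensure_ascii=False)
    if isinstance(value, list):
        return "[" + ",".join(jcs(v) for v in value) + "]"
    if isinstance(value, dict):
        # Sort by UTF-16 code units, not by code points as plain sorted() would
        keys = sorted(value, key=lambda key: key.encode("utf-16-be"))
        return "{" + ",".join(jcs(key) + ":" + jcs(value[key]) for key in keys) + "}"
    raise TypeError(f"not a JSON value: {type(value).__name__}")


print(jcs(json.loads('{"b": 4.50, "a": 1E30}')))  # {"a":1e+30,"b":4.5}
```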
JSON-LD
enola canonicalize for JSON-LD transforms this canonicalize.jsonld:
[
{
"@id": "http://example.enola.dev/Picasso",
"https://schema.org/name": [
{
"@value": "Pablo Picasso"
}
]
},
{
"https://schema.org/name": [
{
"@value": "Salvador Domingo Felipe Jacinto Dalí"
}
],
"@id": "http://example.enola.dev/Dalí"
}
]
into this (note how the 🎨 painters’ order was swapped: not only were all map keys sorted, but the list itself was also ordered alphabetically by @id):
$ ./enola canonicalize --pretty --load=test/canonicalize.jsonld --output=test/canonicalize.jsonld.expected
[
{
"@id": "http://example.enola.dev/Dalí",
"https://schema.org/name": [
{
"@value": "Salvador Domingo Felipe Jacinto Dalí"
}
]
},
{
"@id": "http://example.enola.dev/Picasso",
"https://schema.org/name": [
{
"@value": "Pablo Picasso"
}
]
}
]
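The reordering can be approximated in a few lines of Python, assuming (hypothetically) that only the top-level array of node objects is sorted by @id and that every map's keys are sorted; canonicalize_jsonld_nodes is an illustrative name, not part of Enola. Unlike full RDF Dataset Canonicalization, this sketch does nothing about blank node identifiers or about equivalent graphs written in different JSON-LD forms.

```python
import json


def canonicalize_jsonld_nodes(nodes: list) -> str:
    """Hypothetical helper: order top-level node objects by @id, then sort keys."""
    # Assumption: nodes without an "@id" sort first (empty string)
    ordered = sorted(nodes, key=lambda node: node.get("@id", ""))
    # indent=2 pretty-prints; sort_keys orders the keys of every nested map
    return json.dumps(ordered, indent=2, sort_keys=True, ensure_ascii=False)


painters = [
    {"@id": "http://example.enola.dev/Picasso",
     "https://schema.org/name": [{"@value": "Pablo Picasso"}]},
    {"https://schema.org/name": [{"@value": "Salvador Domingo Felipe Jacinto Dalí"}],
     "@id": "http://example.enola.dev/Dalí"},
]
print(canonicalize_jsonld_nodes(painters))
```

Its output has the same shape as the expected listing above, with Dalí first and "@id" before "https://schema.org/name" in each node.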
Future versions may implement full RDF Dataset Canonicalization.