Canonicalize (AKA Normalize)

Canonicalization, which is also called normalization, transforms a resource into a “standard” representation.

The process depends on the type of the resource, but typically includes:

  • Formatting according to a fixed standard
  • Sorting keys whose order does not (or should not) affect semantics
  • Rewriting literal values into a single form
  • Using a fixed encoding

This is useful, for example, in tests that compare output against a fixed expected result.

It also has applications in cryptography: “signing” a resource only works reliably if equivalent inputs always serialize to the same bytes.
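As a rough illustration of both uses (this is not how Enola implements it), the Python sketch below serializes two equivalent JSON objects into one fixed form and hashes the result. The helper name `canonical_bytes` and the sample data are assumptions made for this example only.

```python
import hashlib
import json


def canonical_bytes(obj) -> bytes:
    """Serialize obj with sorted keys, fixed separators and UTF-8 encoding."""
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


# Two objects with the same content but a different key order.
a = {"name": "Pablo", "born": 1881}
b = {"born": 1881, "name": "Pablo"}

# Without canonicalization the serialized text differs...
assert json.dumps(a) != json.dumps(b)

# ...but the canonical bytes, and therefore the digests, are identical.
digest_a = hashlib.sha256(canonical_bytes(a)).hexdigest()
digest_b = hashlib.sha256(canonical_bytes(b)).hexdigest()
assert digest_a == digest_b
```

A test can compare `canonical_bytes(actual)` with a stored expected file, and a signature can be computed over the same bytes.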

RDF

enola canonicalize for RDF orders statements by predicate IRI, for example:

$ ./enola canonicalize --load=test/picasso.ttl
@prefix ex: <http://example.enola.dev/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix schema: <https://schema.org/> .

ex:Dalí a ex:Artist;
  foaf:firstName "Salvador", "Domingo", "Felipe", "Jacinto";
  schema:birthDate "1904-05-11"^^schema:Date .

ex:Picasso a ex:Artist;
  ex:homeAddress [
      ex:city "Barcelona";
      ex:street "31 Art Gallery"
    ];
  <http://www.w3.org/ns/locn#location> "Spain"@en;
  foaf:firstName "Pablo" .
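The ordering visible in this output can be approximated with the following Python sketch. It is not Enola's implementation: representing statements as plain tuples, sorting subjects, and placing rdf:type (written `a` in Turtle) first are all assumptions inferred from the example output.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def statement_key(triple):
    """Sort key: subject, then rdf:type first, then the full predicate IRI."""
    subject, predicate, _obj = triple
    return (subject, predicate != RDF_TYPE, predicate)


def order_statements(triples):
    # sorted() is stable, so objects of the same predicate keep their input order.
    return sorted(triples, key=statement_key)


EX = "http://example.enola.dev/"
triples = [
    (EX + "Picasso", "http://xmlns.com/foaf/0.1/firstName", '"Pablo"'),
    (EX + "Picasso", "http://www.w3.org/ns/locn#location", '"Spain"@en'),
    (EX + "Picasso", RDF_TYPE, EX + "Artist"),
    (EX + "Dalí", "https://schema.org/birthDate", '"1904-05-11"'),
    (EX + "Dalí", "http://xmlns.com/foaf/0.1/firstName", '"Salvador"'),
    (EX + "Dalí", "http://xmlns.com/foaf/0.1/firstName", '"Domingo"'),
]

for subject, predicate, obj in order_statements(triples):
    print(subject, predicate, obj)
```

Note that the comparison uses the full IRI, not the prefixed name, which is why `<http://www.w3.org/ns/locn#location>` sorts before `foaf:firstName` above.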

Future versions may implement full RDF Dataset Canonicalization; see also the W3C RDF Dataset Canonicalization and Hash Working Group.

JSON

enola canonicalize for JSON transforms, for example, this canonicalize.json:

{
  "numbers": [333333333.33333329, 1E30, 4.50,
    2e-3, 0.000000000000000000000000001],
  "string": "\u20ac$\u000F\u000aA'\u0042\u0022\u005c\\\"\/",
  "literals": [null, true, false],

  "\u20ac": "Euro Sign",
  "\r": "Carriage Return",
  "\ufb33": "Hebrew Letter Dalet With Dagesh",
  "1": "One",
  "\ud83d\ude00": "Emoji: Grinning Face",
  "\u0080": "Control",
  "\u00f6": "Latin Small Letter O With Diaeresis"
}

into this, using an algorithm inspired by RFC 8785, the JSON Canonicalization Scheme (JCS), but currently not fully compliant with it:

$ ./enola canonicalize --load=test/canonicalize.json
{"\r":"Carriage Return","1":"One","literals":[null,true,false],"numbers":[3.333333333333333E8,1.0E30,4.5,0.002,1.0E-27],"string":"€$\u000f\nA'B\"\\\\\"/","\u0080":"Control","ö":"Latin Small Letter O With Diaeresis","€":"Euro Sign","😀":"Emoji: Grinning Face","דּ":"Hebrew Letter Dalet With Dagesh"}

or, formatted more readably with --pretty:

$ ./enola canonicalize --pretty --load=test/canonicalize.json
{
  "\r": "Carriage Return",
  "1": "One",
  "literals": [
    null,
    true,
    false
  ],
  "numbers": [
    3.333333333333333E8,
    1.0E30,
    4.5,
    0.002,
    1.0E-27
  ],
  "string": "€$\u000f\nA'B\"\\\\\"/",
  "\u0080": "Control",
  "ö": "Latin Small Letter O With Diaeresis",
  "€": "Euro Sign",
  "😀": "Emoji: Grinning Face",
  "דּ": "Hebrew Letter Dalet With Dagesh"
}

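Note how the keys are reordered, among other changes: numbers are rewritten (4.50 becomes 4.5, 2e-3 becomes 0.002) and escape sequences such as \u0042 are replaced by the literal character (B).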
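The key ordering can be reproduced with a short Python sketch. It is a simplified illustration, not Enola's code and not a complete RFC 8785 implementation: it sorts keys by UTF-16 code units, as RFC 8785 specifies, but leaves number formatting to Python's defaults. The function names are made up for this example.

```python
import json


def utf16_key(s: str) -> bytes:
    # Big-endian UTF-16 bytes compare in the same order as UTF-16 code units.
    return s.encode("utf-16-be")


def reorder(value):
    """Recursively rebuild dicts with keys in UTF-16 code unit order."""
    if isinstance(value, dict):
        return {k: reorder(value[k]) for k in sorted(value, key=utf16_key)}
    if isinstance(value, list):
        return [reorder(v) for v in value]  # array order is significant, keep it
    return value


def canonicalize_json(text: str) -> str:
    # Compact separators and unescaped non-ASCII characters; numbers are NOT
    # formatted as RFC 8785 requires.
    return json.dumps(reorder(json.loads(text)), ensure_ascii=False, separators=(",", ":"))


doc = '{"\\ufb33": "Dalet", "\\ud83d\\ude00": "Emoji", "\\u20ac": "Euro", "1": "One", "\\r": "CR"}'
print(canonicalize_json(doc))
```

Sorting by UTF-16 code units rather than by code point is what places the emoji (a surrogate pair starting with D83D) before U+FB33, matching the output above.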

JSON-LD

enola canonicalize for JSON-LD transforms this canonicalize.jsonld:

[
  {
    "@id": "http://example.enola.dev/Picasso",
    "https://schema.org/name": [
      {
        "@value": "Pablo Picasso"
      }
    ]
  },
  {
    "https://schema.org/name": [
      {
        "@value": "Salvador Domingo Felipe Jacinto Dalí"
      }
    ],
    "@id": "http://example.enola.dev/Dalí"
  }
]
$ ./enola canonicalize --pretty --load=test/canonicalize.jsonld --output=test/canonicalize.jsonld.expected

into this. Note how the order of the 🎨 painters was swapped: not only were all map keys sorted, but the list itself was also ordered alphabetically by @id:

[
  {
    "@id": "http://example.enola.dev/Dalí",
    "https://schema.org/name": [
      {
        "@value": "Salvador Domingo Felipe Jacinto Dalí"
      }
    ]
  },
  {
    "@id": "http://example.enola.dev/Picasso",
    "https://schema.org/name": [
      {
        "@value": "Pablo Picasso"
      }
    ]
  }
]
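The reordering shown above can be imitated with the Python sketch below. It is only an approximation under stated assumptions, not Enola's algorithm: it sorts a list only when every element is an object with an @id, and it relies on `json.dumps(sort_keys=True)` for the map keys. The function names are hypothetical.

```python
import json


def sort_nodes(value):
    """Recursively sort lists of node objects by their @id."""
    if isinstance(value, list):
        items = [sort_nodes(v) for v in value]
        # Only reorder lists made up entirely of objects that carry an @id.
        if all(isinstance(v, dict) and "@id" in v for v in items):
            items.sort(key=lambda node: node["@id"])
        return items
    if isinstance(value, dict):
        return {k: sort_nodes(v) for k, v in value.items()}
    return value


def canonicalize_jsonld(text: str) -> str:
    # sort_keys=True orders the map keys; sort_nodes orders the lists.
    return json.dumps(sort_nodes(json.loads(text)), sort_keys=True, ensure_ascii=False, indent=2)


doc = """[
  {"@id": "http://example.enola.dev/Picasso",
   "https://schema.org/name": [{"@value": "Pablo Picasso"}]},
  {"https://schema.org/name": [{"@value": "Salvador Domingo Felipe Jacinto Dalí"}],
   "@id": "http://example.enola.dev/Dalí"}
]"""
print(canonicalize_jsonld(doc))
```

Lists of plain values, such as the `@value` objects here, are left in their original order because their elements have no @id.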