The latest version of the specification is v2.

Item

  1. Conventional attributes
  2. Canonicalisation
  3. Hash

An item is an unordered set of attribute-value pairs (associative array) constrained by the schema.

An item is identified by the hash calculated from its contents.

type Item =
  Dict Name String

type Items =
  Dict Hash Item

For example, given a schema defining attributes name, x and y with datatypes String, Integer and Integer respectively, we can define an item as follows:

Item
  [ ("name", "Foo")
  , ("x", "0")
  , ("y", "1")
  ]

It can also be represented as a Bar entity:

Bar
  { name = Just "Foo"
  , x = Just 0
  , y = Just 1
  }

Note that every attribute holds an optional value (hence Just), as explained in the Evolution section.

The item can be serialised in JSON as:

{
  "name": "Foo",
  "x": "0",
  "y": "1"
}

Or in CSV as:

name,x,y
Foo,0,1
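As a rough illustration only, the two serialisations above can be produced with Python's standard json and csv modules. This is a sketch, not a reference implementation: the function names to_json and to_csv are hypothetical, and the column order passed to to_csv is assumed to come from the schema.

```python
import csv
import io
import json

# The example item from the text: every value is held as a string.
item = {"name": "Foo", "x": "0", "y": "1"}


def to_json(item: dict[str, str]) -> str:
    """Serialise an item as a JSON object, keeping every value as a string."""
    return json.dumps(item, indent=2)


def to_csv(item: dict[str, str], fields: list[str]) -> str:
    """Serialise a single item as CSV: a header row, then one data row.

    `fields` fixes the column order (assumed here to come from the schema).
    Attributes missing from the item are written as empty cells.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerow(item)
    return buffer.getvalue()


print(to_json(item))
print(to_csv(item, ["name", "x", "y"]), end="")
```

Because every value stays a string, neither serialisation needs to know the datatypes; that information lives in the schema.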

In the example above, the JSON serialisation uses the string representation of each value, so the schema is needed to cast the values back to their datatypes. See the Serialisation and Schema sections for more details.
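A minimal sketch of that cast, assuming the example schema (name: String, x: Integer, y: Integer) is available as a hypothetical SCHEMA mapping from attribute name to a parsing function. Attributes absent from the item become None, standing in for the optional values mentioned above; Bar is modelled as a Python dataclass purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical schema for the example: attribute name -> function that parses
# the string representation into the attribute's datatype.
SCHEMA: dict[str, Callable[[str], object]] = {
    "name": str,
    "x": int,
    "y": int,
}


@dataclass
class Bar:
    # Every attribute is optional; None plays the role of Nothing.
    name: Optional[str] = None
    x: Optional[int] = None
    y: Optional[int] = None


def cast_item(item: dict[str, str]) -> Bar:
    """Cast the string values of an item to the datatypes given by SCHEMA."""
    unknown = set(item) - set(SCHEMA)
    if unknown:
        raise ValueError(f"attributes not in schema: {sorted(unknown)}")
    typed = {attr: SCHEMA[attr](value) for attr, value in item.items()}
    return Bar(**typed)


print(cast_item({"name": "Foo", "x": "0", "y": "1"}))
# Bar(name='Foo', x=0, y=1)
```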

Conventional attributes

This section is non-normative.

By convention, most registers provide a few common attributes with particular meanings. These are:

  • start-date: (Datetime) The date the element came into existence in the world. This is not the same as the Entry timestamp.
  • end-date: (Datetime) The date the element ceased to exist in the world.
  • name: (String) The common name for the element.

For example, a register could record the element with the key DD (the former ISO 3166-1 alpha-2 code for the "German Democratic Republic") with the data:

Item
  [ ("start-date", "1949")
  , ("end-date", "1990-10-02")
  , ("official-name", "Germany Democratic Republic")
  , ("name", "East Germany")
  ]

The item can nonetheless be added to the register much later, for example in 2016, as recorded by the entry's timestamp:

Entry
  { number = 3
  , key = ID "DD"
  , timestamp = Timestamp (2016, 4, 5, 13, 23, 5, Utc)
  , item = [Hash::Sha256 "e1357671d0da24668952373d0cdf9f7659a1b155e45c8fb3c2f24331e46edc26"]
  }

Canonicalisation

The canonicalisation algorithm is as follows:

  • The data blob MUST be a valid JSON object according to RFC8259.
  • All insignificant whitespace according to RFC8259 MUST be removed.
  • The JSON object keys MUST be valid attribute names. In addition to being valid JSON keys, they MUST start with a lower-case letter followed only by lower-case letters, digits and hyphens ([a-z][a-z0-9-]*).
  • The JSON object keys MUST be sorted into lexicographical order.
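  • Unicode escape sequences (\uXXXX) MUST use upper-case hexadecimal digits.
  • The forward slash or solidus (/) MUST be unescaped.
  • Non-control characters (i.e. those outside the range \u0000..\u001F) MUST be unescaped, except for the quotation mark (") and reverse solidus (\), which JSON requires to be escaped.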
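The rules above can be sketched in Python with the standard json and re modules. This is an illustrative approximation, not a conforming implementation: the name canonicalise is hypothetical, and Python's short escapes for some control characters (such as \n) are kept as-is because the rules do not say whether \u000A is required instead.

```python
import json
import re

# Attribute names: a lower-case letter, then lower-case letters, digits or hyphens.
KEY_PATTERN = re.compile(r"[a-z][a-z0-9-]*")

# Matches either an escaped backslash (so it is skipped over as a unit)
# or a \uXXXX escape, capturing its four hex digits.
ESCAPE_PATTERN = re.compile(r"\\\\|\\u([0-9a-fA-F]{4})")


def _reject_constant(name: str) -> None:
    # NaN and Infinity are accepted by Python's parser but are not RFC 8259 JSON.
    raise ValueError(f"invalid JSON constant: {name}")


def _upper_escape(match: re.Match) -> str:
    if match.group(1) is None:
        return match.group(0)  # an escaped backslash: leave it untouched
    return "\\u" + match.group(1).upper()


def canonicalise(blob: str) -> str:
    """Return the canonical JSON text of an item given as a JSON blob."""
    data = json.loads(blob, parse_constant=_reject_constant)
    if not isinstance(data, dict):
        raise ValueError("the data blob must be a JSON object")
    for key in data:
        if not KEY_PATTERN.fullmatch(key):
            raise ValueError(f"invalid attribute name: {key!r}")
    # separators=(",", ":") removes insignificant whitespace, sort_keys=True
    # orders the keys, and ensure_ascii=False leaves non-control characters
    # unescaped. Python's json module never escapes the solidus "/".
    text = json.dumps(data, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    # Python writes \uXXXX escapes with lower-case hex digits; upper-case them.
    return ESCAPE_PATTERN.sub(_upper_escape, text)
```

For instance, canonicalise('{"foo": "abc", "bar": "xyz"}') returns '{"bar":"xyz","foo":"abc"}'.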

For example, take an item with two attributes foo and bar with values abc and xyz respectively. This can be expressed as JSON:

{
  "foo": "abc",
  "bar": "xyz"
}

This can then be canonicalised as:

{"bar":"xyz","foo":"abc"}

Then hashed with SHA-256:

$ printf '%s' '{"bar":"xyz","foo":"abc"}' | shasum -a 256
5dd4fe3b0de91882dae86b223ca531b5c8f2335d9ee3fd0ab18dfdc2871d0c61  -

Finally, the digest is prefixed with the name of the hashing algorithm and a colon:

sha-256:5dd4fe3b0de91882dae86b223ca531b5c8f2335d9ee3fd0ab18dfdc2871d0c61

Hash

The hash is the identity of an item, computed from its content. Because the item hash is part of an entry, it is also included in the input to the entry hash function.

The function takes an item and a hashing algorithm and returns a Hash datatype.

itemHash : Item -> Alg -> Hash

The sha-256 hash is computed by serialising the item to canonical JSON, as described in the Canonicalisation section, and then computing the SHA-256 hash of the resulting serial form, as defined in the Secure Hash Standard (FIPS 180-4).
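Putting the pieces together, a sketch of itemHash in Python might look like this. It assumes the canonical JSON is encoded as UTF-8 before hashing (the text above does not state an encoding), supports only sha-256 through a hypothetical ALGORITHMS table, and uses a simplified canonicaliser that does not enforce every Canonicalisation rule. It returns the prefixed string form of the hash rather than a structured Hash value.

```python
import hashlib
import json

# Hypothetical table of supported algorithms: prefix -> hashlib constructor.
ALGORITHMS = {"sha-256": hashlib.sha256}


def canonical_json(item: dict[str, str]) -> str:
    # Simplified canonical form: compact, key-sorted JSON with non-ASCII
    # characters left unescaped. Key validation and upper-casing of \uXXXX
    # escapes are omitted in this sketch.
    return json.dumps(item, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def item_hash(item: dict[str, str], alg: str = "sha-256") -> str:
    """Return the item's identity as "<alg>:<hex digest>"."""
    if alg not in ALGORITHMS:
        raise ValueError(f"unsupported hashing algorithm: {alg}")
    # Assumption: the canonical text is hashed as UTF-8 bytes.
    digest = ALGORITHMS[alg](canonical_json(item).encode("utf-8")).hexdigest()
    return f"{alg}:{digest}"


# Items maps each item hash to its item (type Items = Dict Hash Item).
item = {"foo": "abc", "bar": "xyz"}
items = {item_hash(item): item}
```

Since the keys are sorted before hashing, two items with the same attribute-value pairs in a different order receive the same hash and therefore share one slot in items.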

© Crown copyright released under the Open Government Licence.