Structured Data Templates

Last update: May 12, 2022

Structured data templates are a templating engine that takes a simplified set of input parameters and transforms them into a complex structured data output. Both the inputs and outputs can be validated against a schema.

The goals of this project are to:

Provide a simple format: it's just JSON/YAML!
Give enough tools to be useful:
- Interpolation ${my_value} & ${num / 2 >= 5}
- Branching (if/then/else)
- Looping (for/each)
Guarantee structural correctness
- The structured data template is valid JSON / YAML
- The input parameters are valid JSON / YAML
- The rendered output is guaranteed to be valid JSON
Provide tools for semantic correctness via schemas
- The input types and values pass the schema
- The template is checked to produce output that should pass the schema
- The output of the template after rendering passes the schema

Structure

A structured data template document is made of two parts: schemas and a template. The schemas define the allowable input/output structure while the template defines the actual rendered output. An example document might look like:

schemas:
  # Dialect selects the default JSON Schema version
  dialect: openapi-3.1
  input:
    # Input schema goes here
    type: object
    properties:
      name:
        type: string
        default: world
  output:
    # Output schema goes here, also supports refs:
    $ref: https://api.example.com/openapi.json#/components/schemas/Greeting
template:
  # Templated output structure goes here
  greeting: Hello, ${name}!
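To make the two-part layout concrete, here is a minimal Go sketch that decodes the JSON equivalent of the document above into two types, Document and Schemas. These type names are hypothetical and are not sdt's API. The sketch uses JSON instead of YAML so that it needs only the standard library. It shows that the schemas are plain JSON Schema objects and that the template can be any JSON/YAML value.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Document is a hypothetical Go model of a structured data template
// document; the type and field names are illustrative, not sdt's API.
type Document struct {
	Schemas  Schemas `json:"schemas"`
	Template any     `json:"template"` // any JSON/YAML value
}

// Schemas holds the optional dialect plus the input and output JSON Schemas.
type Schemas struct {
	Dialect string         `json:"dialect,omitempty"`
	Input   map[string]any `json:"input,omitempty"`
	Output  map[string]any `json:"output,omitempty"`
}

// parseDocument decodes the JSON form of a document. JSON keeps the sketch
// stdlib-only; a real loader would also accept YAML.
func parseDocument(src []byte) (Document, error) {
	var doc Document
	err := json.Unmarshal(src, &doc)
	return doc, err
}

// exampleDoc is the JSON equivalent of the YAML document above.
const exampleDoc = `{
  "schemas": {
    "dialect": "openapi-3.1",
    "input": {
      "type": "object",
      "properties": {"name": {"type": "string", "default": "world"}}
    },
    "output": {"$ref": "https://api.example.com/openapi.json#/components/schemas/Greeting"}
  },
  "template": {"greeting": "Hello, ${name}!"}
}`

func main() {
	doc, err := parseDocument([]byte(exampleDoc))
	if err != nil {
		panic(err)
	}
	fmt.Println(doc.Schemas.Dialect) // openapi-3.1
	fmt.Println(doc.Template)        // map[greeting:Hello, ${name}!]
}
```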

Example

You can run the example like so:

$ go run ./cmd/sdt ./samples/greeting.yaml <./samples/params.yaml
{
  "greeting": "Hello, SDT!"
}

Input params for rendering can be passed via stdin as JSON/YAML and/or as command-line arguments using CLI shorthand syntax.

Schemas

JSON Schema is used for all schemas. The default is JSON Schema 2020-12, which can be overridden via the $schema key or via the dialect key in the structured data template document, as shown above. Available dialects:

openapi-3.0
openapi-3.1
https://json-schema.org/draft/2020-12/schema
https://json-schema.org/draft/2019-09/schema
https://json-schema.org/draft-07/schema
https://json-schema.org/draft-06/schema
https://json-schema.org/draft-04/schema

The input schema describes the input parameters, and the template will not render unless the passed parameters validate against it. The input schema also lets you set defaults for input parameters; a parameter that is neither passed nor given a default is nil.
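The following Go sketch shows one way input defaults could be applied before rendering. It fills in missing parameters from the default values of an object schema's properties. The helper name applyDefaults is hypothetical, and the sketch leaves out validation entirely. It only shows two things: passed values take precedence over defaults, and a parameter with no value and no default stays unset, so reading it yields nil.

```go
package main

import "fmt"

// applyDefaults is a hypothetical helper (not sdt's implementation) that
// copies `default` values from an object schema's properties into params
// for any key that was not passed. Keys with neither a value nor a default
// are left unset, so looking them up yields nil.
func applyDefaults(params map[string]any, schema map[string]any) map[string]any {
	out := make(map[string]any, len(params))
	for k, v := range params {
		out[k] = v
	}
	props, _ := schema["properties"].(map[string]any)
	for name, raw := range props {
		prop, ok := raw.(map[string]any)
		if !ok {
			continue
		}
		if _, passed := out[name]; passed {
			continue // explicitly passed values win over defaults
		}
		if def, hasDefault := prop["default"]; hasDefault {
			out[name] = def
		}
	}
	return out
}

func main() {
	schema := map[string]any{
		"type": "object",
		"properties": map[string]any{
			"name": map[string]any{"type": "string", "default": "world"},
			"age":  map[string]any{"type": "integer"}, // no default
		},
	}
	fmt.Println(applyDefaults(map[string]any{}, schema))                // map[name:world]
	fmt.Println(applyDefaults(map[string]any{"name": "Alice"}, schema)) // map[name:Alice]
}
```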

The output schema describes the template's output structure. The validator is capable of understanding branches & loops to ensure that the output is semantically valid regardless of which path is taken during rendering.

Template Language Specification

A template is just JSON/YAML. For example:

hello: world

That is a valid static template. Nothing will change when rendered, which is not very useful. Normally, when a template is rendered, it is passed parameters, and these are used for interpolation, branching, and looping. These features all make use of a basic expression language.

Expressions

String interpolation, branching conditions, and loop variable selection all use an expression language. This allows you to make simple comparisons of the parameter context data. Examples:

foo > 50
len(item.bars) <= 5 || my_override
name contains "sdt"
name startsWith "sdt"
"foo" in ["foo", "bar"]
loop.index + 1

See the antonmedv/expr language definition for details.

String Interpolation

String interpolation is the act of replacing the contents of ${...} within strings, where ... corresponds to an expression that makes use of input parameters. For example:

hello: ${name}

If passed {"name": "Alice"} as parameters this would render:

{
  "hello": "Alice"
}

When a string consists of exactly one ${...} expression and nothing else, the rendered value keeps whatever type the expression evaluates to, so you are not limited to strings. If the expression result is nil, the property or array item is omitted from the rendered output.

It's also possible to add static text or multiple interpolation expressions in a single value:

hello: Greetings, ${name}!

Given the same input that would result in:

{
  "hello": "Greetings, Alice!"
}

Tricks

Force a string output by using more than one expression: ${my_number}${""}
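The interpolation rules above, including the forced-string trick, can be sketched in Go as follows. This is not sdt's implementation. The eval function is a hypothetical stand-in for the antonmedv/expr engine and only understands bare parameter names and quoted string literals. The ${...} matcher also does not handle expressions that contain a closing brace.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// exprPattern matches ${...} blocks. As a simplification it does not
// support a "}" inside the expression itself.
var exprPattern = regexp.MustCompile(`\$\{([^}]*)\}`)

// eval is a hypothetical stand-in for a real expression engine such as
// antonmedv/expr. It only understands bare parameter names and quoted
// string literals like "". Unknown names evaluate to nil.
func eval(expr string, params map[string]any) any {
	expr = strings.TrimSpace(expr)
	if s, err := strconv.Unquote(expr); err == nil {
		return s
	}
	return params[expr]
}

// interpolate renders one template string. When the whole string is a
// single ${...} expression, the value keeps its own type; otherwise each
// expression is formatted with fmt.Sprint and spliced into the text.
// (How sdt formats a nil inside mixed text is not covered by this sketch.)
func interpolate(tmpl string, params map[string]any) any {
	m := exprPattern.FindStringSubmatchIndex(tmpl)
	if m != nil && m[0] == 0 && m[1] == len(tmpl) {
		return eval(tmpl[m[2]:m[3]], params)
	}
	return exprPattern.ReplaceAllStringFunc(tmpl, func(match string) string {
		return fmt.Sprint(eval(match[2:len(match)-1], params))
	})
}

// renderObject interpolates each value and omits keys whose result is nil.
func renderObject(obj map[string]string, params map[string]any) map[string]any {
	out := make(map[string]any, len(obj))
	for k, v := range obj {
		if r := interpolate(v, params); r != nil {
			out[k] = r
		}
	}
	return out
}

func main() {
	params := map[string]any{"name": "Alice", "my_number": 42}
	fmt.Printf("%#v\n", interpolate("${name}", params))             // "Alice"
	fmt.Printf("%#v\n", interpolate("Greetings, ${name}!", params)) // "Greetings, Alice!"
	fmt.Printf("%#v\n", interpolate("${my_number}", params))        // 42 (still an int)
	fmt.Printf("%#v\n", interpolate(`${my_number}${""}`, params))   // "42" (forced string)
	fmt.Println(renderObject(map[string]string{"hello": "${name}", "nick": "${nickname}"}, params)) // map[hello:Alice]
}
```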

Branching

Branching allows one of multiple paths to be followed in the template at rendering time based on the result of an expression. The special properties $if, $then, and $else are used for this. For example:

foo:
  $if: ${value > 5}
  $then: I am big
  $else: I am small

If rendered with {"value": 1} the result will be:

{
  "foo": "I am small"
}

Notice that the special properties are completely removed and replaced with the contents of either the $then or $else clause. So although foo is an object in the template, the rendered foo is a string and would pass an output schema that expects a string there.

If the expression is false and no $else is given, the property is removed from the result.
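Here is a minimal Go sketch of how a $if/$then/$else node might be resolved. To keep it short, it assumes the $if value has already been evaluated to a boolean. It also returns the chosen branch as-is instead of rendering it recursively. The function name renderBranch is hypothetical.

```go
package main

import "fmt"

// renderBranch is a hypothetical sketch of resolving a $if/$then/$else
// node. It assumes "$if" already holds an evaluated bool (a real renderer
// would evaluate the ${...} expression first) and returns the chosen clause
// unrendered. The second result is false when the property should be
// dropped because the selected clause is missing.
func renderBranch(node map[string]any) (any, bool) {
	cond, _ := node["$if"].(bool)
	key := "$else"
	if cond {
		key = "$then"
	}
	v, ok := node[key]
	return v, ok
}

func main() {
	foo := map[string]any{"$if": false, "$then": "I am big", "$else": "I am small"}
	fmt.Println(renderBranch(foo)) // I am small true

	noElse := map[string]any{"$if": false, "$then": "I am big"}
	fmt.Println(renderBranch(noElse)) // <nil> false (property removed)
}
```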

Looping

Looping allows an array of inputs to be expanded into the rendered output using a per-item template. The $for, $as, and $each special properties are used for this. For example:

squares:
  $for: ${numbers}
  $each: ${item * item}

If rendered with {"numbers": [1, 2, 3]} the result will be:

{
  "squares": [1, 4, 9]
}
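Looping can be sketched in Go as building one child scope per array item and rendering the per-item template against that scope. In the hypothetical renderFor below, the $each template is modeled as a Go function rather than a parsed expression. The item variable is named item unless a $as name is given.

```go
package main

import "fmt"

// renderFor is a hypothetical sketch of $for/$each. For each element of the
// $for array it builds a child scope (the outer params plus the current
// item, named "item" unless $as overrides it) and renders the per-item
// template, modeled here as a Go function instead of a ${...} expression.
func renderFor(params map[string]any, items []any, as string, each func(scope map[string]any) any) []any {
	if as == "" {
		as = "item"
	}
	out := make([]any, 0, len(items))
	for _, it := range items {
		scope := make(map[string]any, len(params)+1)
		for k, v := range params {
			scope[k] = v
		}
		scope[as] = it
		out = append(out, each(scope))
	}
	return out
}

func main() {
	params := map[string]any{"numbers": []any{1, 2, 3}}
	squares := renderFor(params, params["numbers"].([]any), "", func(s map[string]any) any {
		n := s["item"].(int)
		return n * n // stands in for $each: ${item * item}
	})
	fmt.Println(squares) // [1 4 9]
}
```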

The $as property sets the name of the variable holding the current item, which defaults to item. A local variable named loop is also set; it holds the current index and whether the item is the first or last in the array. When $as is used, the loop variable is instead named loop_ followed by the $as value (for example, loop_thing). This allows nested loops to access both their own and the outer scope's loop variables. For example:

things:
  $for: ${things}
  $as: thing
  $each:
    id: ${loop_thing.index}-${thing.name}
    tags:
      $for: ${tags}
      $as: tag
      $each: ${loop_thing.index}-${loop_tag.index}-${tag}

Given:

{
  "things": [{ "name": "Alice" }, { "name": "Bob" }],
  "tags": ["big", "small"]
}

You would get as output:

{
  "things": [
    {
      "id": "0-Alice",
      "tags": ["0-0-big", "0-1-small"]
    },
    {
      "id": "1-Bob",
      "tags": ["1-0-big", "1-1-small"]
    }
  ]
}

Multiple Outputs

If the result of the $each template is an array, then each item of that array is individually appended to the overall result. This allows one input to generate multiple output entries in the final array.

things:
  $for: ${things}
  $each:
    - name: ${item.name} 1
    - name: ${item.name} 2

With the same input as above you'd get:

{
  "things": [
    {
      "name": "Alice 1"
    },
    {
      "name": "Alice 2"
    },
    {
      "name": "Bob 1"
    },
    {
      "name": "Bob 2"
    }
  ]
}

If you need to create arrays of arrays, wrap the array returned by $each in another array to get around this behavior.
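The append-or-spread rule for $each results can be sketched as a small Go helper. The name appendEach is hypothetical and used only for illustration. The example also shows why an extra wrapping array keeps an inner array intact.

```go
package main

import "fmt"

// appendEach is a hypothetical sketch of how a $for loop collects $each
// results: a non-array result is appended as one entry, while an array
// result has each of its items appended individually.
func appendEach(out []any, result any) []any {
	if arr, ok := result.([]any); ok {
		return append(out, arr...)
	}
	return append(out, result)
}

func main() {
	var out []any
	for _, name := range []string{"Alice", "Bob"} {
		// $each yields two objects per input item.
		out = appendEach(out, []any{
			map[string]any{"name": name + " 1"},
			map[string]any{"name": name + " 2"},
		})
	}
	fmt.Println(len(out), out) // 4 [map[name:Alice 1] map[name:Alice 2] map[name:Bob 1] map[name:Bob 2]]

	// To keep an array as a single entry (arrays of arrays), wrap it once more.
	var nested []any
	nested = appendEach(nested, []any{[]any{1, 2}})
	fmt.Println(nested) // [[1 2]]
}
```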

Open Questions

Should we support macros? Could be done with $ref in the template, and we could add a top-level macros or definitions for document-local refs. They would be drop-in only, no calling with arguments, but would render based on the current params context.
Should nil results from interpolation be rendered in the final output? For example, given name: ${name}, what should happen if name is nil?
Support for constants? Values that should always be present in the params and can contain complex, reusable data for the template?

Comments

feat: measure and warn if template complexity is high

This uses each branch, loop, and interpolation to calculate a complexity value for a template, and then writes a warning to os.Stderr when that value is high (>50). This should help act as a warning to the user that maybe the template needs to be split into two separate templates (i.e. too many use-cases are being handled).

feat: add explicit $flatten operation

Rather than having magic behavior for $for loops returning arrays, this removes that "feature" in favor of an explicit $flatten operation that enables things like:

Prepending and appending default values to arrays

Zipper-merging $for loops that output arrays of items

This simplifies validation and rendering logic a bit. Docs are also updated.

feat: validate expr result type; better error filenames

This enables the validator to make sure the result of an expression is the right type based on the output schema. Previously, this was only possible at rendering time, not at validation time.

It also fixes the filenames in errors and allows the original filename to include a JSON path such as #/foo/bar, which is appended to. This is useful for documents loaded from within other documents, as is done with the test fixtures.

build failed

$ go get -u github.com/danielgtaylor/sdt

github.com/danielgtaylor/sdt

../../go/pkg/mod/github.com/danielgtaylor/sdt@…/validate.go:137:123: invalid operation: match[0] + err.Offset() (mismatched types int and uint16)
../../go/pkg/mod/github.com/danielgtaylor/sdt@…/validate.go:150:132: cannot use err.Offset() + 2 (type uint16) as type int in argument to ctx.AddErrorOffset
../../go/pkg/mod/github.com/danielgtaylor/sdt@…/validate.go:179:133: cannot use err.Offset() + 2 (type uint16) as type int in argument to ctx.WithPath("$if").AddErrorOffset
../../go/pkg/mod/github.com/danielgtaylor/sdt@…/validate.go:206:135: cannot use err.Offset() + 2 (type uint16) as type int in argument to ctx.WithPath("$for").AddErrorOffset
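These errors come from Go refusing to mix numeric types implicitly. Per the log, err.Offset() yields a uint16, while the surrounding code expects int. Below is a minimal illustration of the kind of explicit conversion that compiles. The helper offsetFrom is hypothetical and is not the actual upstream fix.

```go
package main

import "fmt"

// offsetFrom is a hypothetical helper illustrating the conversion only.
// Go never converts between int and uint16 implicitly, so the commented-out
// line fails to compile exactly as in the log above.
func offsetFrom(matchStart int, errOffset uint16) int {
	// return matchStart + errOffset // invalid operation: mismatched types int and uint16
	return matchStart + int(errOffset)
}

func main() {
	fmt.Println(offsetFrom(10, 3)) // 13
}
```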

Owner

Daniel G. Taylor

Similar Resources

Graphoscope: a solution to access multiple independent data sources from a common UI and show data relations as a graph

A tree like tool help you to explore data structures in your redis server

Bitset data structure

Probabilistic set data structure

Probabilistic data structures for processing continuous, unbounded streams.

Data structure and algorithm library for go, designed to provide functions similar to C++ STL

Gota: DataFrames and data wrangling in Go (Golang)

A simple Set data structure implementation in Go (Golang) using LinkedHashMap.

Data structure and relevant algorithms for extremely fast prefix/fuzzy string searching.