A CUE-based framework for portable, evolvable schema

Last update: Dec 24, 2022

Scuemata

Scuemata is a system for writing schemas. Like JSON Schema or OpenAPI, it is general-purpose and most obviously useful as an IDL.

Unlike JSON Schema, OpenAPI, or any other extant schema system, Scuemata's chief focus is on the evolution of schema. Rather than "one file/logical structure, one schema," Scuemata is "one file/logical structure, all the schema for a given kind of object, and logic for translating between them."

The effect of encapsulating schema definition, evolution, and translation into a single, portable, machine-verifiable logical structure is transformative. Taken together, these pieces allow systems that rely on schemas as the contracts for their communication to decouple and evolve independently - even across breaking changes to those schema.

Learn more in our (TODO) docs, or in this overview video!

Maturity

Scuemata is in early adolescence: it's mostly formed, but there are still some crucial undeveloped parts. Specifically, there are two planned changes that we are almost certain will cause breakages for users of scuemata:

Once these changes are finalized, however, we aim to treat the CUE and Go APIs as stable, scrupulously avoiding any breaking changes.

Owner

Grafana Labs

Grafana Labs is behind leading open source projects Grafana and Loki, and the creator of the first open & composable observability platform.

https://github.com/grafana/scuemata

Comments

Improve OpenAPI generator

Refactor codegen to no longer rely on eval and printing, taking advantage of the ability to pass a cue.Value directly to the OpenAPI package in CUE's standard library.

Also introduce support for a group flag, which allows translating only the children of the root schema to OpenAPI.

And introduce a long-overdue txtar-based test framework.

cc @IfSentient
testing framework

Hello,

In my research on Scuemata, I haven't seen any mention of a test framework.

It would be great to be able to provide some data and validate that schema migrations and lacunas are generated accordingly. Rather than everyone implementing their own, an opinionated framework built into Scuemata would be useful.
Add `thema lineage gen` subcommand to CLI
Experience with using Thema within Grafana Labs has clearly shown that having a strong code generation story is the way to get to dopamine fastest with Thema.

Now that we have generics and in particular ConvergentLineage, it's feasible to fashion a simple, opinionated code generator as part of the CLI - thema lineage gen - that should be sufficient for anyone to:

Write a lineage

thema lineage gen ... to create Go bindings with a ConvergentLineage

Get a vmuxer with a single line of handwritten code, and put it to work immediately

Having this will also mean we can eliminate 80% of the existing tutorials, as all of those needs will be automated away.

Initial targets for CLI codegen: OpenAPI, JSON Schema, Go, TypeScript

https://github.com/grafana/thema/issues/71#tasklist-block-a98478df-2878-4873-ae12-3988f2427b33
Convert `thema.Library` to `*thema.Runtime`
thema.Library should be renamed to thema.Context. Its methods should also be converted to pointer receivers, which means all existing function signatures must switch to *thema.Context.

I originally named it Library because i thought this was a nice nod to how we basically load a bunch of CUE "functions" in from the thema CUE package, then call them from the various Go methods within thema. And i wanted it to be a value, not a pointer, because i was hoping to make the zero value useful, similar to e.g. sync.Mutex.

However:

Library contains a cue.Context, and shuffling that thing around has ended up feeling like the main job of Library when actually using the Thema go package.

I'm not thrilled with calling it thema.Context because it sorta muddies the water by not having cancellation capabilities. But my real objection there is with how Go stdlib already muddied those waters by conflating variable bags together with cancellation. Asking people to learn a new term, Library, is a heavy lift for just resolving that ambiguity. Better to just fall in line with CUE itself and use thema.Context.

Implementing thema within Grafana has made it pretty clear that allowing an explicit nil is preferable for centralizing use of a single shared context. Adding nil to the accepted value space of a function signature allows the caller to make it clear that they are not specifying a value and expect the func impl to grab the central one instead. This could be accomplished without pointers by passing the zero value - thema.Library{} - but that feels like undocumented magic; expressing the absence of a value is, unfortunately, one of the main use cases for nil pointers in Go.

Thus: thema.Library -> *thema.Context.
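The nil-means-"use the shared one" convention described above can be sketched as follows. This is a minimal illustration under stated assumptions: `Context`, `shared`, `resolve`, and `LoadLineage` are hypothetical names, not thema's actual API.

```go
package main

import "fmt"

// Context is a hypothetical stand-in for thema.Context; it is NOT the real
// thema type, just enough structure to show the nil-pointer convention.
type Context struct {
	name string
}

// shared is the single, centrally managed context callers fall back to.
var shared = &Context{name: "shared"}

// resolve returns ctx unchanged, or the shared context when the caller
// passes an explicit nil to say "I'm not specifying one".
func resolve(ctx *Context) *Context {
	if ctx == nil {
		return shared
	}
	return ctx
}

// LoadLineage is a hypothetical API function that accepts a *Context.
func LoadLineage(path string, ctx *Context) string {
	ctx = resolve(ctx)
	return fmt.Sprintf("loading %s with %s context", path, ctx.name)
}

func main() {
	fmt.Println(LoadLineage("lineage.cue", nil))
	fmt.Println(LoadLineage("lineage.cue", &Context{name: "custom"}))
}
```

Compared with accepting a value and treating `Context{}` as a sentinel, the nil check makes "no value supplied" explicit in the signature itself.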
thema: Introduce [de]hydrate API

First pass at a de/hydration API.

Initially, i'd been thinking that it would work well to turn hydration into a formatting option. And that still might be worth doing. But i think this is cleanest and simplest for now - just have Hydrate() and Dehydrate() methods on Instance that apply the transformation and return a new instance. (Much category, very morphism!)
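A rough sketch of the "return a new instance" shape, assuming hydration means filling in schema defaults and dehydration means removing values that equal those defaults. `Instance` here is a hypothetical flat map of fields plus defaults; real schemas are nested, and thema's actual types differ.

```go
package main

import (
	"fmt"
	"reflect"
)

// Instance is a hypothetical, simplified stand-in for a Thema instance:
// a flat map of field values plus the (read-only) schema defaults.
// This sketch only handles top-level fields.
type Instance struct {
	data     map[string]any
	defaults map[string]any
}

// copyMap returns a shallow copy so neither method mutates the receiver.
func copyMap(m map[string]any) map[string]any {
	out := make(map[string]any, len(m))
	for k, v := range m {
		out[k] = v
	}
	return out
}

// Hydrate returns a new Instance with every missing field filled from defaults.
func (i Instance) Hydrate() Instance {
	data := copyMap(i.data)
	for k, v := range i.defaults {
		if _, ok := data[k]; !ok {
			data[k] = v
		}
	}
	return Instance{data: data, defaults: i.defaults}
}

// Dehydrate returns a new Instance with every field equal to its default removed.
func (i Instance) Dehydrate() Instance {
	data := copyMap(i.data)
	for k, v := range i.defaults {
		if cur, ok := data[k]; ok && reflect.DeepEqual(cur, v) {
			delete(data, k)
		}
	}
	return Instance{data: data, defaults: i.defaults}
}

func main() {
	in := Instance{
		data:     map[string]any{"title": "CPU", "refresh": "5s"},
		defaults: map[string]any{"refresh": "5s", "editable": true},
	}
	fmt.Println(in.Hydrate().data)   // map[editable:true refresh:5s title:CPU]
	fmt.Println(in.Dehydrate().data) // map[title:CPU]
}
```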

(cc @ying-jeanne - really this just takes her code and adapts it to this new environment! 🎉 )

Fixes #43

edit: hydration is basically working, but i've futzed something up with dehydration. Going to circle back on it a little later.

Also, it may be preferable to append a suffix to the name of the input src/file (e.g. input.json becomes input-hydrated.json) when calling Hydrate() or Dehydrate(). That way, any later errors on that instance make it clear that they're happening against a modified object.
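The renaming rule suggested here (input.json becomes input-hydrated.json) is small enough to sketch directly; `withSuffix` is a hypothetical helper name, not part of thema.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// withSuffix (hypothetical) inserts "-<suffix>" before the file extension,
// so withSuffix("input.json", "hydrated") yields "input-hydrated.json".
// Names without an extension just get the suffix appended.
func withSuffix(name, suffix string) string {
	ext := filepath.Ext(name)
	return strings.TrimSuffix(name, ext) + "-" + suffix + ext
}

func main() {
	fmt.Println(withSuffix("input.json", "hydrated"))
	fmt.Println(withSuffix("dir/input.json", "dehydrated"))
}
```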
Plan out a path for converting a Thema lineage to a CRD

The Thema tutorials lay out how to map a Thema lineage to a LineageFactory, which forms a bridge from the literal lineage written in CUE to what's written in Go. The tutorials then continue on to create an InputKernel, which is one way of using Thema from Go programs.

However, that's not terribly helpful for the use case of Thema-as-Operator-framework - or at least, it doesn't seem that way right now. In that case, we want to go from a Lineage to a Go expression of a CRD, plus the necessary components on the Go side (controllers, Go types...?). i'm reasonably sure this can be done generically, which is why this issue is here rather than in grafana/grafana.

This is a bit of a counterpart to grafana/grafana#44242, but that's just the tracker for where we're prototyping.

Would welcome input or just a "hey i'd like to help figure this out" from anyone so inclined :)
cli: Generate files adjacent to input, with sane cuefs

Numerous fixes to the Go code generator, both for generating output adjacent to the input lineage file and for generating a sane cueFS that'll actually WORK given cue.mod environs.

(still finishing the latter part)

Fixes #81 Fixes #77
cli: Generate Go types and bindings in an adjacent file

Currently, the Go types and bindings generators in the CLI write to stdout. This isn't terribly idiomatic - proto sets the general pattern here by generating an adjacent file. It's probably more convenient for callers, anyway.

This is especially problematic for bindings, though, because generating an embed.FS that will reliably work when passed to load.InstancesWithThema (or any such function we could write) requires being able to inspect the disk environment where the generated file will land. That's impossible on stdout.

So we'll have both of those generators default to generating an adjacent file (<name>_types_gen.go, <name>_bindings_gen.go), and if bindings are forced to stdout, just not allow an embed to be generated at all - it's up to the caller to make it sane.
Quickstart guide for using Thema with Go

This PR contains a quickstart guide for using Thema in Go. The quickstart focuses on how a developer would approach using Thema with Go: writing a schema in CUE, generating Go types and bindings using the CLI, and finally testing the schema against test data within Go.

Incorporates all changes from https://github.com/grafana/thema/pull/75

Fixes #74
codegen: Introduce codegen CLI cmd and Go/TS gen packages
This PR introduces a thema lineage gen subcommand, with further subcommands for generating output for:

OpenAPI (3.0)

JSON Schema (Draft 4)

Go types

Go bindings

TS types

Fixes #71
Make `thema` CLI dynamically support `github.com/grafana/thema` imports
Currently, running thema anywhere other than the thema repo itself or a cue.mod module whose cue.mod/pkg dir contains the github.com/grafana/thema codebase will result in

Error: import failed: cannot find package "github.com/grafana/thema"

This is clearly a big problem, as it means the thema CLI only works when the user has already set up their fs "correctly"...in a way that is difficult to even explain how to do correctly.
Clean up lineage/txtar testing framework

The test framework itself is also in a messy, halfway state. In retrospect, i realize what i was really trying to do was graft a subtest layer on top of the txtar-specific test runner. The framework itself already creates a subtest per txtar, but i was effectively trying to make each provided subtest executed from there be able to construct its own subtests in a way that's basically similar to stdlib *testing.T.Run.

Given that the central object in the framework is the lineage, this is crucial for enabling more standard helpers that, for example, execute a particular test case across all schemas in a lineage while keeping each subtest's output tidily namespaced.

Originally posted by @sdboyer in https://github.com/grafana/thema/issues/90#issuecomment-1356770807

This messiness needs to be resolved, in the direction of actually making what's in the lineage test harness shaped like subtesting.

I can imagine that we may want to re-copy the original upstream CueTxTar testing framework again to create another, less one-lineage-centric version of this framework. But for now, we already have five packages within encoding, and i think this approach is most productive for testing at least all of those, and probably also the CLI.
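The "run one case across every schema in a lineage, namespaced per subtest" helper this issue describes could look roughly like this. To keep it runnable outside `go test`, `Harness` is a toy stand-in for `*testing.T` that only records the slash-joined names stdlib subtests get from `t.Run`; `Version`, `Harness`, and `ForEachSchema` are all hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// Version is a hypothetical [major, minor] schema version.
type Version [2]uint

// Harness is a toy stand-in for *testing.T that records the slash-joined
// names subtests would receive ("Parent/child/grandchild").
type Harness struct {
	name string
	log  *[]string
}

// Run executes fn in a child harness named parent/name, mirroring how
// stdlib subtests namespace their output.
func (h *Harness) Run(name string, fn func(h *Harness)) {
	child := &Harness{name: h.name + "/" + name, log: h.log}
	*child.log = append(*child.log, child.name)
	fn(child)
}

// ForEachSchema runs the same test case once per schema version, each in
// its own subtest named after that version.
func ForEachSchema(h *Harness, versions []Version, fn func(h *Harness, v Version)) {
	for _, v := range versions {
		v := v
		h.Run(fmt.Sprintf("v%d.%d", v[0], v[1]), func(h *Harness) { fn(h, v) })
	}
}

func main() {
	var log []string
	root := &Harness{name: "TestLineage", log: &log}
	ForEachSchema(root, []Version{{0, 0}, {0, 1}, {1, 0}}, func(h *Harness, v Version) {
		h.Run("roundtrip", func(*Harness) {})
	})
	fmt.Println(strings.Join(log, "\n"))
}
```

In a real test file, the same loop would call `t.Run` directly, and `go test -run 'TestLineage/v1.0'` could then select a single schema's subtests.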
Remove CUE replace statement once openapi changes are merged

I had to fork CUE for some fixes - there were a couple key cases that weren't covered (conjuncts in numeric value types, selectors on object types). This substantially impacted Go codegen elsewhere, being the difference between obviously incorrect and probably correct.

Originally posted by @sdboyer in https://github.com/grafana/thema/issues/90#issuecomment-1356770807
Introduce additional name fields
Currently, #Lineage has just one name field - name. This was fine as a starting point, but as we explore more code generation within Grafana, it's clearly becoming insufficient. Without this information, generators under thema/encoding can't really have sane defaults, because we have to make bad choices about what names to use.

#66 talks about adding some constraints on that field, which is a first step - it clarifies the intended usage of name. As soon as we start clarifying usage expectations, though, it quickly becomes clear that a single name field is unlikely to be sufficient. Rather, it seems likely that we need four:

The name representation we have today, which should probably become enforced as all-lowercase.

A PascalCase name representation. To be able to generate code in Go, this is necessary, and can't be inferred from a lower-case representation. (AlertingRule can become alertingrule, but not vice-versa). Snake case can be inferred from this.

Plural representations for each of the above.
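To make the inference argument concrete: the lowercase and snake_case forms can be derived mechanically from the PascalCase form, but not the other way around, and plurals need their own fields. The `Names` struct and its field names below are hypothetical, not a proposed #Lineage schema.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// Names is a hypothetical Go mirror of the four proposed name fields.
// Plurals are explicit because English plurals can't be derived
// mechanically (e.g. "Policy" -> "Policies").
type Names struct {
	Name         string // all-lowercase, e.g. "alertingrule"
	PascalName   string // e.g. "AlertingRule"
	PluralName   string // e.g. "alertingrules"
	PluralPascal string // e.g. "AlertingRules"
}

// lower derives the all-lowercase form; word boundaries are lost, so the
// reverse derivation is impossible.
func lower(pascal string) string {
	return strings.ToLower(pascal)
}

// snake derives snake_case by inserting '_' before each interior uppercase
// letter. It is naive about acronyms ("HTTPRoute" -> "h_t_t_p_route").
func snake(pascal string) string {
	var b strings.Builder
	for i, r := range pascal {
		if unicode.IsUpper(r) {
			if i > 0 {
				b.WriteByte('_')
			}
			r = unicode.ToLower(r)
		}
		b.WriteRune(r)
	}
	return b.String()
}

func main() {
	n := Names{Name: lower("AlertingRule"), PascalName: "AlertingRule",
		PluralName: "alertingrules", PluralPascal: "AlertingRules"}
	fmt.Println(n.Name, snake(n.PascalName)) // alertingrule alerting_rule
}
```

These map naturally onto a Kubernetes CRD's `spec.names` block, which takes a `kind` (AlertingRule), a `singular` (alertingrule), and a `plural` (alertingrules).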

This will also make conversion to CRDs straightforward.

We've already done this in Grafana. I suspect what we do here can look similar.

None of this should preclude the eventual addition of a uri or similar field.
Use "major version" and "minor version" for syntactic versions

This PR uses major and minor version instead of the Sequence and Schema version naming where appropriate. Variables and fields are renamed from seqv and schv to majv and minv.

Resolves https://github.com/grafana/thema/issues/54
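For readers new to the naming, a syntactic version here is a [major, minor] pair ordered by major first, then minor. The `Version` type below is a hypothetical illustration using the new majv/minv terms, not thema's own version type.

```go
package main

import "fmt"

// Version is a hypothetical [major, minor] pair.
type Version [2]uint

// String formats the version as "major.minor".
func (v Version) String() string {
	return fmt.Sprintf("%d.%d", v[0], v[1])
}

// Less orders versions by major version first, then minor version.
func (v Version) Less(o Version) bool {
	if v[0] != o[0] {
		return v[0] < o[0]
	}
	return v[1] < o[1]
}

func main() {
	majv, minv := uint(1), uint(2)
	v := Version{majv, minv}
	fmt.Println(v, v.Less(Version{2, 0})) // 1.2 true
}
```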
Add Protocol Buffers as a schema input and output target

Thema has an increasingly robust set of lang/schema expression inputs and outputs at this point - OpenAPI, JSON Schema, Go, TypeScript (though the latter two are only outputs).

It's about time we add Protocol Buffers, especially since it should be pretty easy given CUE's stdlib support.

...or at least, that'll help us with proto->CUE - seems there isn't a CUE->proto generator in the other direction yet. That's not optimal, but we can do the same as we did with Go for a while - given a .proto, do a runtime check that verifies it aligns with a particular Thema schema.
