Common Expression Language -- specification and binary representation

Last update: Jan 8, 2023

Common Expression Language

The Common Expression Language (CEL) implements common semantics for expression evaluation, enabling different applications to more easily interoperate.

Key Applications

Security policy: organizations have complex infrastructure and need common tooling to reason about the system as a whole.
Protocols: expressions are a useful data type and require interoperability across programming languages and platforms.

Guiding philosophy:

Keep it small & fast.
- CEL evaluates in linear time, is mutation free, and not Turing-complete. This limitation is a feature of the language design, which allows the implementation to evaluate orders of magnitude faster than equivalently sandboxed JavaScript.
Make it extensible.
- CEL is designed to be embedded in applications. It is extensible through its context, which lets the embedding software provide functions and data.
Developer-friendly.
- The language is approachable to developers. The initial spec was based on the experience of developing Firebase Rules and usability testing many prior iterations.
- The library itself and its accompanying tooling should be easy to adopt by teams that seek to integrate CEL into their platforms.

The required components of a system that supports CEL are:

The textual representation of an expression as written by a developer. Its syntax is similar to that of expressions in C/C++/Java/JavaScript.
A binary representation of an expression. It is an abstract syntax tree (AST).
A compiler library that converts the textual representation to the binary representation. This can be done ahead of time (in the control plane) or just before evaluation (in the data plane).
A context containing one or more typed variables, often protobuf messages. Most use cases will use attribute_context.proto.
An evaluator library that takes the binary format in the context and produces a result, usually a Boolean.

Example of boolean conditions and object construction:

// Condition
account.balance >= transaction.withdrawal
    || (account.overdraftProtection
    && account.overdraftLimit >= transaction.withdrawal  - account.balance)

// Object construction
common.GeoPoint{ latitude: 10.0, longitude: -5.5 }
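
To make the condition above concrete, here is a minimal sketch of the same logic in plain Python. It does not use any CEL library; the `Account` and `Transaction` classes and the `may_withdraw` function are hypothetical stand-ins for the typed variables an embedding application would supply through the context.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Account:
    # Hypothetical context variable; field names mirror the CEL example.
    balance: int
    overdraftProtection: bool
    overdraftLimit: int


@dataclass(frozen=True)
class Transaction:
    # Hypothetical context variable.
    withdrawal: int


def may_withdraw(account, transaction):
    """Mirror of the CEL condition: enough balance, or overdraft covers the gap."""
    return (account.balance >= transaction.withdrawal
            or (account.overdraftProtection
                and account.overdraftLimit
                >= transaction.withdrawal - account.balance))
```

For instance, an account with a balance of 100, overdraft protection, and a limit of 50 can cover a withdrawal of 140 but not one of 200.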

For more detail, see:

A dashboard that shows results of conformance tests on current CEL implementations can be found here.

Released under the Apache License.

Disclaimer: This is not an official Google product.

Owner

Google

Google ❤️ Open Source

https://github.com/google/cel-spec https://opensource.google.com/projects/cel

Comments

Heterogeneous equality

Since [1, "foo"] is a valid list and _==_ is defined on lists, [1, "foo"] == ["foo", 1] should evaluate to false. It would be least surprising if list equality was determined by elementwise equality. Equivalently, [x] == [y] should have the same meaning as x == y. Therefore, _==_ should have signature A x B --> bool.

Similarly, _!=_ should work on heterogeneous types.

Restricting equality to homogeneous types was meant to prevent user errors. After all, under heterogeneous equality 1 == 1u evaluates, surprisingly, to false. However, the type checker can still work with a stricter A x A --> bool signature for equality, catching these errors.

We might want to make the order operators (_<_ and friends) heterogeneous too. We'll want an ordering across all values for deterministic map comprehensions, so why not expose it to users? The surprising consequences (e.g. if int < uint, then 1u < 2) can again be mitigated by having the type checker work homogeneously.
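
The proposal can be illustrated with a small sketch in Python. This is not how any CEL implementation is written; the `Int` and `Uint` wrappers and the `cel_equals` function are hypothetical names, used only to model typed values and an equality with signature A x B --> bool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Int:
    # Hypothetical wrapper modelling a CEL int value.
    value: int


@dataclass(frozen=True)
class Uint:
    # Hypothetical wrapper modelling a CEL uint value.
    value: int


def cel_equals(a, b):
    """Heterogeneous equality: values of different types are simply unequal."""
    if type(a) is not type(b):
        return False
    if isinstance(a, list):
        # List equality is elementwise equality, so [x] == [y] means x == y.
        return len(a) == len(b) and all(
            cel_equals(x, y) for x, y in zip(a, b))
    return a == b
```

Under this model, `cel_equals([Int(1), "foo"], ["foo", Int(1)])` is false rather than an error, and `cel_equals(Int(1), Uint(1))` is false as well, which is the surprising case a stricter type checker would flag.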

Clarify how self_eval_int_negative_min test will be passed

It seems like the literal value in the expression -9223372036854775808 is outside the range of positive int64 values: the magnitude 9223372036854775808 exceeds the int64 maximum of 9223372036854775807. It's not clear that this int64 value can exist as a CEL value before the negation function (-_) is applied to it.

It seems this should be a range error when creating the initial integer value.

If it's not an immediate range error, this seems to imply the range check is deferred until the final value is returned. If negation is allowed on out-of-range int64 values, how much other computation can occur on out-of-range values?
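
The two readings of the literal can be contrasted with a short Python sketch. It makes no claim about how any CEL parser actually handles this case; `parse_then_negate` and `parse_signed_literal` are hypothetical names for the two strategies.

```python
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def check_int64(value):
    """Raise if value does not fit in a signed 64-bit integer."""
    if not INT64_MIN <= value <= INT64_MAX:
        raise OverflowError(f"{value} is out of int64 range")
    return value


def parse_then_negate(text):
    """Range-check the magnitude first, then apply unary minus to it."""
    negative = text.startswith("-")
    digits = text[1:] if negative else text
    magnitude = check_int64(int(digits))
    return check_int64(-magnitude if negative else magnitude)


def parse_signed_literal(text):
    """Treat a leading minus as part of the literal and range-check once."""
    return check_int64(int(text))
```

With the first strategy, "-9223372036854775808" fails because the magnitude alone does not fit in int64. With the second, the same text yields the minimum int64 value directly.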

String.size() definition is ambiguous

Standard Definitions states that string.size() should be the string length, but is it the number of UTF-8 bytes or the number of Unicode code points?

It's a bit confusing because the values section describes String as "Strings of UTF-8 code points" but later defines strings as "sequences of Unicode code points". Should strings be treated as UTF-8 bytes or Unicode code points? I'd like to hope that it's Unicode code points, but the Go implementation uses Go's len() function, which counts bytes (demo); Go strings usually hold UTF-8 but are really just arbitrary byte containers. Check out this blog post and this Stack Overflow post for details about Go. C++ is the same with respect to string encodings, but I couldn't find the size() function in a quick look.

/cc @TristonianJones @ryanpbrewster
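
The difference between the two candidate definitions is easy to see in Python, where `len()` on a `str` counts code points and `len()` on the encoded bytes counts UTF-8 bytes. The function names below are illustrative only and are not part of any CEL API.

```python
def size_in_code_points(s):
    """Length as the number of Unicode code points."""
    return len(s)


def size_in_utf8_bytes(s):
    """Length as the number of bytes in the UTF-8 encoding."""
    return len(s.encode("utf-8"))


# "h\u00e9llo" is "héllo" with a precomposed e-acute (U+00E9).
sample = "h\u00e9llo"
code_points = size_in_code_points(sample)
utf8_bytes = size_in_utf8_bytes(sample)
```

For pure ASCII input the two definitions agree, so the ambiguity only surfaces once a string contains characters outside that range.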

Packing conformance test structures instead

The conformance tests send very deeply nested structures to test the depth of 32 required by the spec. For some cases, this unfortunately causes Java gRPC to run into parsing stack limits. A suggestion would be to pack values into an Any instead for conformance, to make life easier for gRPC implementations with such limits (C# is the same, for example, as I understand it).

Add notation and/or constructors for durations in days, hours, or minutes

The current duration constructor takes only a decimal string with a number of seconds, e.g. duration("5184000s"). There is a desire for handier units, e.g. duration("60d") (assuming we ignore leap seconds).

The underlying protobuf well-known type goes down to nanoseconds, so we could potentially have "ms", "µs", and "ns" too.

A week is unambiguous, but month, quarter, year, etc. seem error-prone.

We could go full ISO 8601 and allow durations like "2h30m45s", but it's easy enough to compose them with addition: duration("2h") + duration("30m") + duration("45s"), although it's clunkier.

We could alternatively have separate constructors, e.g. hours(7), maybe in an extension library.
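
As a rough sketch of what such a constructor might accept, the Python function below parses the unit suffixes mentioned above into a nanosecond count. The function name, the exact suffix set (including the ASCII spelling "us"), and the choice to accept compound strings are all assumptions made for illustration, not part of the CEL spec.

```python
import re

# Nanoseconds per unit; a day is taken as exactly 86,400 seconds.
NANOS_PER_UNIT = {
    "ns": 1,
    "us": 1_000,
    "µs": 1_000,
    "ms": 1_000_000,
    "s": 1_000_000_000,
    "m": 60 * 1_000_000_000,
    "h": 3_600 * 1_000_000_000,
    "d": 86_400 * 1_000_000_000,
}

# Two-letter units are listed before "s" and "m" so "5ms" is not read as "5m".
_TERM = re.compile(r"(\d+)(ns|us|µs|ms|s|m|h|d)")


def parse_duration_nanos(text):
    """Parse strings such as "60d" or "2h30m45s" into nanoseconds."""
    pos, total = 0, 0
    while pos < len(text):
        match = _TERM.match(text, pos)
        if match is None:
            raise ValueError(f"invalid duration: {text!r}")
        total += int(match.group(1)) * NANOS_PER_UNIT[match.group(2)]
        pos = match.end()
    if pos == 0:
        raise ValueError("empty duration")
    return total
```

With this sketch, "60d" and "5184000s" parse to the same value, and the compound form "2h30m45s" equals the sum of its three parts parsed separately.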

Optional Values Support

CEL supports presence testing using the has macro, which can be useful for building ternary expressions that provide alternative values when constructing map and protobuf object literals. However, there's no way to indicate that the value should be unset at the end of the computation. For example, it is possible to construct a protobuf message literal where the field is set to the default value, but once the field is referenced it cannot be marked as conditionally "unset" based on the result of the field initialization expression.

Optional values should leverage CEL's opaque types to be able to specify that a value with a strong type "might" exist, and if it does, to perform some sort of computation on it. This is akin to the feature requested in #245 where keys in a map are optionally set.

The following would indicate that if the event.user value is present, it should be set to the value of "user" in the map literal:

{ ?"user": event.?user }

This is roughly equivalent to the following:

{ ?"user": has(event.user) ? optional.of(event.user) : optional.none() }

The CEL grammar and type-checker would need to be updated to support this feature, and it should probably be opt-in as it expands the expressive power of CEL in ways that users may not have anticipated.
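
The intended behaviour of optional map entries can be modelled in a few lines of Python. This is only a sketch of the semantics described above; the `Opt` class and the `select_optional` and `build_map` helpers are hypothetical names and do not correspond to any CEL library API.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Opt:
    # Hypothetical model of an optional value: either present or absent.
    present: bool
    value: Any = None

    @staticmethod
    def of(value):
        return Opt(True, value)

    @staticmethod
    def none():
        return Opt(False)


def select_optional(obj, field):
    """Models event.?user: an optional that is present only if the field is set."""
    return Opt.of(obj[field]) if field in obj else Opt.none()


def build_map(optional_entries):
    """Models { ?"key": opt }: an entry is kept only when its value is present."""
    return {key: opt.value
            for key, opt in optional_entries.items() if opt.present}
```

Given an event with a "user" entry, `build_map({"user": select_optional(event, "user")})` contains that entry. Given an event without one, the result is an empty map rather than a map holding a default value.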

Configuration support for CEL?

Hi experts,

CEL is great, and we are moving some server functions to CEL. Some of these functions depend on the user's environment configuration: their behavior and output depend on user-specific settings.

In a regular application, we can have a configuration for the app, which the application reads at start time.

Is there any way to do the same thing in CEL? Thanks.

Syntax: has with map keys

CEL treats map lookups and field dereferences similarly in many ways, except for the has() macro. In some cases, we have to use bracket notation because the key contains characters that are illegal in identifiers. This prevents the use of the has macro on such keys. For example:

has(request.headers['x-clientid'])

Because headers use dashes, we cannot use field notation.

The workaround is to use the in operator, but I wonder if permitting has(a[f]) would be better.

Fix map comparison and make fixed tests live.

This PR closes https://github.com/google/cel-go/issues/156 by fixing map comparisons in the conformance driver.

Additionally, this PR moves the broken tests fixed by this change and by https://github.com/google/cel-go/pull/218 from broken.textproto to basic.textproto

Conformance test driver

Driver for conformance tests.

CEL implementations will write a server for the ConformanceService. Here in the cel-spec repository we have the conformance tests themselves as data files plus a ConformanceService client which interprets the files and talks to the implementation-dependent server. An example will be given in cel-go for how to set this up as a bazel test for the implementation.

Suggestion: Become a subset of ECMAScript

I'm sorry, this is just feedback. Since the Issues feature is disabled, I created a dummy PR to post it instead.

I was looking for a cross-language expression language and CEL seems to be a great match.

I noticed that it borrows some JavaScript/ECMAScript-like syntax but not always. For example: Account{name: "Eamonn"}.

I was wondering if CEL's syntax could be a subset of ECMAScript while remaining specific to its restricted use case as an expression language. For example, Account{name: "Eamonn"} may become new Account({name: "Eamonn"}), which would be valid ES syntax while (I think) not making things harder for a CEL parser.

A nice benefit is that any ECMAScript parser can be used to parse CEL, even if not explicitly supported. For example I'm looking for an EL that is implemented in Python, and either @Kronuz's https://github.com/Kronuz/esprima-python or @jeffkistler's https://github.com/jeffkistler/BigRig would work well here, even if cel-python is not yet developed.

BTW I submitted a Wikipedia article on CEL: https://en.wikipedia.org/wiki/Common_Expression_Language

Thank you!

Map to Map/List conversion

The spec says the map/filter macros only support the list type. How can the map/filter macros be applied to a map, so that we can transform a map into another map or a list?

Expected use cases:

map to list

{ "k1": "v1", "k2": "v2" }.map({k, v}, {k: v}) ==> [ {"k1": "v1"}, {"k2": "v2"} ]

list to map

map([ ["k1", "v1"], ["k2", "v2"] ]) ==> { "k1": "v1", "k2": "v2" }
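
For reference, the two requested transformations look like this in plain Python. The sketch only illustrates the expected inputs and outputs; the function names are hypothetical, and rejecting duplicate keys is an assumption rather than something the request specifies.

```python
def map_to_list(m):
    """Turn {"k1": "v1", "k2": "v2"} into [{"k1": "v1"}, {"k2": "v2"}]."""
    return [{key: value} for key, value in m.items()]


def pairs_to_map(pairs):
    """Turn [["k1", "v1"], ["k2", "v2"]] into {"k1": "v1", "k2": "v2"}."""
    result = {}
    for key, value in pairs:
        if key in result:
            # Assumption: a repeated key is treated as an error.
            raise ValueError(f"duplicate key: {key!r}")
        result[key] = value
    return result
```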

CEL spec is too protobuf centric?

We're using CEL in Kubernetes to integrate with OpenAPIv3 schema types. When providing developers with the CEL spec as reference, it's a bit difficult to explain how our "object with fields" type maps to CEL types, because the spec doesn't have a term for this type that is independent of protobuf. For now we say in our documentation that our "object with fields" type maps to "message", but this gets a bit confusing.

Convenience feature: make map values available to predicates in macros

I'm new to CEL, but it looks like when you use one of the macros that evaluate predicates (currently all, exists, and exists_one) on a map, only the current key is made available to the predicate. With the key you can get the value, of course, but this may be tedious in programs such as

some.very.deeply.nested.message.some_map_field.exists(k, some.very.deeply.nested.message.some_map_field[k] == 42)

where you have to repeat a long path to the map field. Would it make sense to have, in addition to the two-argument form of these macros (not counting the map receiver), a three-argument form that conveniently includes the value as well? The above example would then become

some.very.deeply.nested.message.some_map_field.exists(k, v, v == 42)

which, for me at least, is easier to read.
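
The difference between the two forms can be sketched in Python as follows. These helpers only model the requested semantics on a plain dict; `exists_key` and `exists_entry` are hypothetical names, not CEL macros.

```python
def exists_key(m, predicate):
    """Two-argument form: the predicate sees only the key."""
    return any(predicate(key) for key in m)


def exists_entry(m, predicate):
    """Proposed three-argument form: the predicate sees key and value."""
    return any(predicate(key, value) for key, value in m.items())


some_map_field = {"a": 1, "b": 42}

# Key-only form: the map has to be named again inside the predicate.
found_by_key = exists_key(some_map_field, lambda k: some_map_field[k] == 42)

# Key-and-value form: no second reference to the map is needed.
found_by_entry = exists_entry(some_map_field, lambda k, v: v == 42)
```

Both calls give the same answer; the second simply avoids repeating the path to the map.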

Widen duration to match google.protobuf.Duration from protocol buffer messages

Currently CEL's duration is limited to a range of +/- 290 years while timestamps support 10000 years. This makes it awkward when calculating the difference between two timestamps. We should widen CEL's duration to match google.protobuf.Duration.
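
The mismatch can be checked with some quick arithmetic. The sketch below assumes the roughly 290-year limit comes from storing the duration as a signed 64-bit count of nanoseconds, which is an assumption about the implementation rather than something stated above. The google.protobuf.Duration bound of 315,576,000,000 seconds is taken from the protobuf documentation.

```python
from datetime import datetime

INT64_MAX = 2**63 - 1
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # 31,557,600 seconds

# Assumed current representation: int64 nanoseconds.
int64_nanos_seconds = INT64_MAX / 1e9
int64_nanos_years = int64_nanos_seconds / SECONDS_PER_YEAR

# google.protobuf.Duration allows +/- 315,576,000,000 seconds.
PROTO_DURATION_MAX_SECONDS = 315_576_000_000
proto_duration_years = PROTO_DURATION_MAX_SECONDS / SECONDS_PER_YEAR

# Largest difference between two timestamps in the years 0001 to 9999.
timestamp_span_seconds = (
    datetime(9999, 12, 31, 23, 59, 59) - datetime(1, 1, 1)).total_seconds()

fits_in_int64_nanos = timestamp_span_seconds <= int64_nanos_seconds
fits_in_proto_duration = timestamp_span_seconds <= PROTO_DURATION_MAX_SECONDS
```

Under these assumptions the int64-nanosecond range works out to a little over 292 years, so the difference between the earliest and latest timestamps cannot be represented, while it does fit within the protobuf Duration range.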

Add examples in the "standard definitions" table

Every time I reference the spec's language definition doc, I have a hard time figuring out how each of the standard definitions can actually be used, because the entries just describe an operator or function without context.

https://github.com/google/cel-spec/blob/master/doc/langdef.md#list-of-standard-definitions

I'd like to see examples for as many of the entries in that table as possible, in the description column. For example, it took me a while to figure out that a string -> int cast is done with int("string"), when I might have guessed (int) "string", as in some C-like languages.

For the symbols column I think code styling should be used (backticks in markdown) to get a monospaced font, because some of the entries like _[_] are hard to read on their own; it would look better as `_[_]`.

A note that _ means "where the operands go" would help clarify as well; I didn't find that immediately obvious.
