We went to great lengths in the zng architecture to create a human-readable text form (tzng) suitable for hand editing, tests, and demos. This is especially important because understanding the binary zng format in detail takes a developer-level commitment.
But for broader audiences (and hence broader acceptance), tzng doesn't have the greatest ergonomics. The syntax is compact and cryptic, and the type structure is separate from the DFS-ordered list of values, so it's not the best format for illustrating the zng data model. zjson suffers from this too.
Also, when we first developed tzng, we were concerned about parsing performance. We've since come to realize that tzng is never used in a performance-sensitive context, so here we focus on ergonomics over performance.
The intent is to adopt a "pretty" version of zng that would have broader appeal and be easier to understand. I'll leave open whether pretty-zng replaces or augments tzng (though I'm now leaning toward replacement).
We want it to share some of the look and feel of JSON without JSON's shortcomings, and, of course, it needs to represent the full gamut of zng binary capabilities.
Proposal
Here is a proposal open for discussion.
Whitespace
All whitespace is equivalent, so arbitrary pretty printing is possible. There is no notion of newline-terminated boundaries, so values must be parsed to find their end.
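To illustrate that value boundaries come from parsing rather than from newlines, here is a minimal Go sketch that splits a stream of pretty-zng values into top-level value strings. It assumes the stream holds only values (no type or alias declarations), that a type suffix such as :log contains no whitespace, and that strings use backslash escapes; splitValues is a hypothetical helper, not part of any zng library.

```go
package main

import (
	"fmt"
	"unicode"
)

// splitValues splits a stream of pretty-zng values into top-level value
// strings. Newlines carry no meaning: any whitespace outside brackets and
// quoted strings separates values, and brackets are matched to find where
// a record or array value ends.
func splitValues(s string) []string {
	r := []rune(s)
	var out []string
	i := 0
	for i < len(r) {
		// Spaces, tabs, and newlines between values are all equivalent.
		if unicode.IsSpace(r[i]) {
			i++
			continue
		}
		start := i
		depth := 0
		inString := false
	scan:
		for ; i < len(r); i++ {
			c := r[i]
			switch {
			case inString:
				if c == '\\' && i+1 < len(r) {
					i++ // skip the escaped character
				} else if c == '"' {
					inString = false
				}
			case c == '"':
				inString = true
			case c == '{' || c == '[':
				depth++
			case c == '}' || c == ']':
				depth--
			case unicode.IsSpace(c) && depth == 0:
				break scan // whitespace at depth 0 ends the value
			}
		}
		out = append(out, string(r[start:i]))
	}
	return out
}

func main() {
	stream := "{ a:1,\n  b:[ 2, 3 ] }\n\n{ c:\"} \" }:foo 42"
	for _, v := range splitValues(stream) {
		fmt.Printf("%q\n", v)
	}
}
```

The first value spans a newline and the second contains a brace inside a string, yet both are delimited correctly, because the end of each value is found by parsing rather than by line breaks.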
Typedefs
Typedefs are similar to those in tzng but can have arbitrary names. Typedefs are optional, since values can also be fully specified with embedded or implied types. A typedef has the form:
type <name>=<type>
where <name> is an identifier or an integer and <type> is a zng/zql-style type definition. (Type strings are not yet implemented in zql, but they will be.) Type names can be redefined within a stream so that files can be concatenated and still be correct.
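Here is a minimal Go sketch of the typedef rule, assuming a parser sees one declaration at a time as a string. It accepts type names that are identifiers or integers and lets a later definition of a name replace an earlier one, which is what keeps concatenated files valid. The typeTable type and its define method are hypothetical, and the type definition itself is stored as an unparsed string.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// validName reports whether name is an identifier or an integer, the two
// forms of type name the proposal allows.
func validName(name string) bool {
	if name == "" {
		return false
	}
	isInt, isIdent := true, true
	for i, c := range name {
		if !unicode.IsDigit(c) {
			isInt = false
		}
		if !(c == '_' || unicode.IsLetter(c) || (i > 0 && unicode.IsDigit(c))) {
			isIdent = false
		}
	}
	return isInt || isIdent
}

// typeTable maps pretty-zng type names to their (unparsed) definitions.
type typeTable map[string]string

// define parses a typedef of the form "type <name>=<type>" and records it.
// A later definition of the same name simply replaces the earlier one.
func (t typeTable) define(line string) error {
	rest, ok := strings.CutPrefix(strings.TrimSpace(line), "type")
	if !ok || rest == "" || !unicode.IsSpace(rune(rest[0])) {
		return fmt.Errorf("not a typedef: %q", line)
	}
	name, typ, ok := strings.Cut(rest, "=")
	if !ok {
		return fmt.Errorf("missing '=' in %q", line)
	}
	name, typ = strings.TrimSpace(name), strings.TrimSpace(typ)
	if !validName(name) || typ == "" {
		return fmt.Errorf("bad typedef: %q", line)
	}
	t[name] = typ
	return nil
}

func main() {
	types := typeTable{}
	// Lines from two concatenated files that both define "log".
	lines := []string{
		"type log = record[ts:time,addr:ip,msg:string]",
		"type 0=record[s:string]",
		"type log=record[ts:time,msg:string]",
	}
	for _, line := range lines {
		if err := types.define(line); err != nil {
			fmt.Println(err)
		}
	}
	fmt.Println(types["log"]) // record[ts:time,msg:string]
	fmt.Println(types["0"])   // record[s:string]
}
```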
An alias has a similar form:
alias <name>=<type>
The difference between an alias and a type is that aliases are encoded into the corresponding zng input/output as zng aliases, whereas type names are only a convenience tag in pretty-zng for referring to a native zng type and are local to the pretty-zng file.
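The alias/type distinction can be sketched in Go as a resolution step, assuming each declaration is stored with a flag saying whether it came from alias or type. When a name is resolved for zng encoding, type names are replaced by their definitions (they never reach the zng output), while alias names are kept because they become zng aliases. The binding type and resolve function are hypothetical, and the sketch resolves only a top-level name, not names nested inside a type string.

```go
package main

import "fmt"

// binding is one pretty-zng declaration: "type <name>=<type>" or
// "alias <name>=<type>".
type binding struct {
	def     string
	isAlias bool
}

// resolve returns what a name stands for in the zng encoding. Type names
// are local shorthand and are replaced by their definitions; alias names
// are kept since they are encoded as zng aliases. Undeclared names (such
// as primitive types) are returned unchanged, and a cycle stops resolution.
func resolve(decls map[string]binding, name string) string {
	seen := map[string]bool{}
	for {
		b, ok := decls[name]
		if !ok || b.isAlias || seen[name] {
			return name
		}
		seen[name] = true
		name = b.def
	}
}

func main() {
	decls := map[string]binding{
		"port":   {def: "uint16", isAlias: true},
		"zeekID": {def: "record[orig_h:ip,orig_p:port]"},
		"p":      {def: "port"},
	}
	fmt.Println(resolve(decls, "zeekID")) // record[orig_h:ip,orig_p:port]
	fmt.Println(resolve(decls, "p"))      // port
	fmt.Println(resolve(decls, "port"))   // port
}
```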
Implied-type Values
There are 11 "Implied-type" values:
- string a double-quoted string literal
- int64 as in an integer text string, e.g., 123, -1
- float64 as in a float text string, e.g., 123.0 or -1e6
- time unquoted string that parses as an ISO time
- bool the unquoted identifier true or false
- ip unquoted numeric IP address (i.e., in a format parseable by the Go standard library)
- net unquoted numeric IP subnet address in CIDR form (i.e., in a format parseable by the Go standard library)
- array as in a JavaScript-style array, where the type is implied by the elements or, if the element types are mixed, is an array of unions
- record as in JavaScript-style object { ... } notation, where the record type of the object is implied by the key names and the types of the elements
- map syntax TBD
- null the unquoted identifier null
Note: I believe all of the above values can be parsed unambiguously, in the same way that the zql parser understands that 10.0.0.1 is an IP but "10.0.0.1" is a string.
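As a rough check of that belief, here is a minimal Go sketch that infers the implied type of a single primitive literal using only Go standard library parsers. It assumes "ISO time" means RFC 3339 as accepted by time.Parse with time.RFC3339Nano, and it tries the parsers from most to least specific so that 123 is an int64 rather than a float64. The impliedType function is hypothetical; arrays, records, and maps are not covered.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
	"strings"
	"time"
)

// impliedType returns the implied zng type of a primitive pretty-zng
// literal, or "" if the literal is not recognized.
func impliedType(lit string) string {
	switch {
	case lit == "null":
		return "null"
	case lit == "true" || lit == "false":
		return "bool"
	case len(lit) >= 2 && strings.HasPrefix(lit, `"`) && strings.HasSuffix(lit, `"`):
		return "string"
	}
	// Integers are checked before floats so "123" is not a float64.
	if _, err := strconv.ParseInt(lit, 10, 64); err == nil {
		return "int64"
	}
	if _, err := strconv.ParseFloat(lit, 64); err == nil {
		return "float64"
	}
	if net.ParseIP(lit) != nil {
		return "ip"
	}
	if _, _, err := net.ParseCIDR(lit); err == nil {
		return "net"
	}
	if _, err := time.Parse(time.RFC3339Nano, lit); err == nil {
		return "time"
	}
	return ""
}

func main() {
	for _, lit := range []string{`"hello, world"`, "123", "-1e6", "192.168.1.1", "10.0.0.0/8", "1970-01-01T00:00:00.000000001Z"} {
		fmt.Printf("%s -> %s\n", lit, impliedType(lit))
	}
}
```

Note that ParseFloat also accepts spellings such as Inf and NaN, so a real grammar would need to decide whether those count as float64 literals.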
So here are some example implied-type values with their implied types:
"hello, world" -> string
123 -> int64
192.168.1.1 -> ip
[ 10, 11, 12 ] -> array[int64]
[ 10, "hello", 1.0, 2 ] -> array[union[int64,string,float64]]
[ "hello", "world" ] -> array[string]
{ foo:10, bar:"hello, world" } -> record[foo:int64,bar:string]
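The composite examples above can be reproduced with a small Go sketch, assuming the implied types of the elements are already known. Identical element types give array[T], mixed types give an array of a union, and a record type follows the key order of the literal. Listing union members in first-seen order is an assumption that matches the example above; the proposal does not specify the type of an empty array, so the sketch reports it as untyped. arrayType, recordType, and field are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// arrayType builds the implied type of an array from its element types:
// one distinct type gives array[T]; mixed types give array[union[...]]
// with members in first-seen order.
func arrayType(elemTypes []string) (string, bool) {
	if len(elemTypes) == 0 {
		return "", false // empty array: type not specified by the proposal
	}
	var members []string
	seen := map[string]bool{}
	for _, t := range elemTypes {
		if !seen[t] {
			seen[t] = true
			members = append(members, t)
		}
	}
	if len(members) == 1 {
		return "array[" + members[0] + "]", true
	}
	return "array[union[" + strings.Join(members, ",") + "]]", true
}

// field is one key of a record literal together with its value's type.
type field struct {
	name, typ string
}

// recordType builds the implied record type, preserving field order.
func recordType(fields []field) string {
	parts := make([]string, len(fields))
	for i, f := range fields {
		parts[i] = f.name + ":" + f.typ
	}
	return "record[" + strings.Join(parts, ",") + "]"
}

func main() {
	t, _ := arrayType([]string{"int64", "string", "float64", "int64"})
	fmt.Println(t) // array[union[int64,string,float64]]
	fmt.Println(recordType([]field{{"foo", "int64"}, {"bar", "string"}}))
	// record[foo:int64,bar:string]
}
```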
Explicit-type Values
To be able to represent all of the zng types, values can be explicitly typed with a :<type> suffix, following the zql type cast syntax; the same suffix applies to records and arrays.
- Primitive values have the form
<value>:<type>
where value is the tzng value and type is the primitive type, e.g., uint8, bytes, etc.
- Enum has the form
<value>:<type>
- Union has the form
<value>:<type><selector>
where type is the union type and selector is the integer position of the value's type within the union
- Array has the form
[ elements... ]:<type>
- Record has the form
{ elements... }:<type>
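One parsing question these forms raise is how to tell 8080:port from an implied-type time such as 1970-01-01T00:00:00Z, which also contains colons. Here is a minimal Go sketch, assuming the type is taken from the last colon that is outside brackets and quoted strings and is accepted only if it names a known type. splitCast and the isType callback are hypothetical; the union form with a trailing selector is not handled, and since type names may be integers, a real parser would need care with literals like ::1.

```go
package main

import "fmt"

// splitCast splits an explicitly typed literal such as "8080:port" or
// "{ 1, 2 }:log" into value and type. The candidate type follows the last
// colon outside any brackets or quoted string; if isType rejects it, the
// literal is treated as implied-type and ok is false.
func splitCast(s string, isType func(string) bool) (value, typ string, ok bool) {
	depth := 0
	inString := false
	last := -1
	for i := 0; i < len(s); i++ {
		c := s[i]
		switch {
		case inString:
			if c == '\\' {
				i++ // skip the escaped byte
			} else if c == '"' {
				inString = false
			}
		case c == '"':
			inString = true
		case c == '{' || c == '[':
			depth++
		case c == '}' || c == ']':
			depth--
		case c == ':' && depth == 0:
			last = i
		}
	}
	if last < 0 || !isType(s[last+1:]) {
		return s, "", false
	}
	return s[:last], s[last+1:], true
}

func main() {
	// Hypothetical set of known names: some primitives plus declared names.
	known := map[string]bool{"uint8": true, "uint16": true, "time": true, "port": true, "log": true}
	isType := func(s string) bool { return known[s] }
	for _, lit := range []string{"8080:port", "{ 1, 2 }:log", "1970-01-01T00:00:00Z", `"a:b"`} {
		v, t, ok := splitCast(lit, isType)
		fmt.Printf("%q -> value %q, type %q, explicit %v\n", lit, v, t, ok)
	}
}
```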
Examples
Ex 1
#0:record[s:string]
0:[hello,world;]
is the same as
{s:"hello,world"}
Ex 2
#0:record[ts:time,addr:ip,msg:string]
0:[1;10.0.0.1;hello, world;]
0:[2;192.168.1.1;here is a message;]
0:[3;10.0.0.2;here is another message;]
is the same as
{ ts:1970-01-01T00:00:00.000000001Z, addr:10.0.0.1, msg:"hello, world" }
{ ts:1970-01-01T00:00:00.000000002Z, addr:192.168.1.1, msg:"here is a message" }
{ ts:1970-01-01T00:00:00.000000003Z, addr:10.0.0.2, msg:"here is another message" }
is the same as
type log = record[ts:time,addr:ip,msg:string]
{ 1970-01-01T00:00:00.000000001Z, 10.0.0.1, "hello, world" }:log
{ 1970-01-01T00:00:00.000000002Z, 192.168.1.1, "here is a message" }:log
{ 1970-01-01T00:00:00.000000003Z, 10.0.0.2, "here is another message" }:log
is the same as
type log = record[ts:time,addr:ip,msg:string]
{ 1970-01-01T00:00:00.000000001Z, 10.0.0.1, "hello, world" }:log
{ ts:1970-01-01T00:00:00.000000002Z, addr:192.168.1.1, msg:"here is a message" }
{ 1970-01-01T00:00:00.000000003Z, 10.0.0.2, "here is another message" }:log
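The equivalence between the positional form { v1, v2, v3 }:log and the keyed form can be sketched in Go by pairing each positional value with the corresponding field name of the declared record type. This assumes a flat record type with no nested records or arrays in its field types, and that the values have already been split out of the { ... } literal; fieldNames and bindPositional are hypothetical helpers.

```go
package main

import (
	"fmt"
	"strings"
)

// fieldNames extracts field names from a flat record type string such as
// "record[ts:time,addr:ip,msg:string]". Nested field types are not handled.
func fieldNames(recType string) ([]string, error) {
	inner, ok := strings.CutPrefix(recType, "record[")
	if !ok || !strings.HasSuffix(inner, "]") {
		return nil, fmt.Errorf("not a record type: %q", recType)
	}
	inner = strings.TrimSuffix(inner, "]")
	var names []string
	for _, f := range strings.Split(inner, ",") {
		name, _, ok := strings.Cut(f, ":")
		if !ok {
			return nil, fmt.Errorf("bad field %q", f)
		}
		names = append(names, strings.TrimSpace(name))
	}
	return names, nil
}

// bindPositional pairs positional values with the record type's field
// names, producing the equivalent keyed record literal.
func bindPositional(recType string, values []string) (string, error) {
	names, err := fieldNames(recType)
	if err != nil {
		return "", err
	}
	if len(names) != len(values) {
		return "", fmt.Errorf("type has %d fields but value has %d", len(names), len(values))
	}
	parts := make([]string, len(names))
	for i, n := range names {
		parts[i] = n + ":" + values[i]
	}
	return "{ " + strings.Join(parts, ", ") + " }", nil
}

func main() {
	logType := "record[ts:time,addr:ip,msg:string]"
	keyed, err := bindPositional(logType, []string{"1970-01-01T00:00:00.000000001Z", "10.0.0.1", `"hello, world"`})
	if err != nil {
		panic(err)
	}
	fmt.Println(keyed)
	// { ts:1970-01-01T00:00:00.000000001Z, addr:10.0.0.1, msg:"hello, world" }
}
```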
Ex 3
#port=uint16
#0:record[id:record[orig_h:ip,orig_p:port]]
0:[[10.0.0.1;8080;]]
is the same as
alias port=uint16
{id: {orig_h:10.0.0.1, orig_p:8080:port}}
is the same as
alias port=uint16
type zeekID = record[orig_h:ip,orig_p:port]
{ id:{10.0.0.1,8080}:zeekID }
Ex 4
#vector=array[uint16]
#0:record[x:int32,point:vector]
0:[1;[2;3;4;]]
0:[2;[23;18;]]
0:[3;[99;1024;0;1;6;]]
is the same as
alias vector=array[uint16]
type pointVector=record[x:int32,point:vector]
{ 1, [ 2, 3, 4 ] }:pointVector
{ 2, [ 23, 18 ] }:pointVector
{ 3, [ 99, 1024, 0, 1, 6 ] }:pointVector