Search and analysis tooling for structured logs

Last update: Jan 5, 2023

Comments: 16

Zed

The Zed system provides an open-source, cloud-native, and searchable data lake for semi-structured and structured data.

Zed lakes utilize a superset of the relational and JSON document data models yet require no up-front schema definitions to insert data. They also provide transactional views and time travel by leveraging a git-like design pattern based on a commit journal. Using this mechanism, a lake's (optional) search indexes are transactionally consistent with its data.
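To make the commit-journal idea concrete, here is a minimal sketch in Go. It is not Zed's actual journal format: the Commit and Journal types and the string object IDs are all hypothetical. It only illustrates how replaying an append-only log up to a chosen version gives a consistent view of the lake at that point ("time travel").

```go
package main

import "fmt"

// Commit is a hypothetical journal entry that adds and removes data objects.
type Commit struct {
	Adds    []string
	Deletes []string
}

// Journal is an append-only list of commits; entry i produces version i+1.
type Journal struct {
	commits []Commit
}

// Append records a commit and returns the new head version.
func (j *Journal) Append(c Commit) int {
	j.commits = append(j.commits, c)
	return len(j.commits)
}

// Snapshot replays the journal up to and including version at and returns
// the set of object IDs that were live at that version.
func (j *Journal) Snapshot(at int) (map[string]bool, error) {
	if at < 0 || at > len(j.commits) {
		return nil, fmt.Errorf("version %d out of range", at)
	}
	live := make(map[string]bool)
	for _, c := range j.commits[:at] {
		for _, id := range c.Adds {
			live[id] = true
		}
		for _, id := range c.Deletes {
			delete(live, id)
		}
	}
	return live, nil
}

func main() {
	var j Journal
	v1 := j.Append(Commit{Adds: []string{"a", "b"}})
	v2 := j.Append(Commit{Adds: []string{"c"}, Deletes: []string{"a"}})
	old, _ := j.Snapshot(v1)
	head, _ := j.Snapshot(v2)
	fmt.Println("v1 has a:", old["a"], "head has a:", head["a"])
}
```

Because readers only ever replay a prefix of the journal, an index committed in the same entry as its data would become visible at the same version, which is one way to read the consistency claim above.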

At Zed's foundation lies a new family of self-describing data formats based on the Zed data model, which unifies the highly structured approach of dataframes and relational tables with the loosely structured document model of JSON.

While the Zed system is built around its family of data formats, it is also interoperable with popular data formats like CSV, (ND)JSON, and Parquet.

This repository contains tools and components used to organize, search, analyze, and store Zed data, including:

The zed command line tool for managing, searching, and querying a Zed lake
The Zed language documentation
The Zed formats specifications and documentation

The previously released zq tool is now packaged as a command-line shortcut for the zed query command.

Installation

To install zed or any other tool from this repo, you can either clone the repo and compile from source, or use a pre-compiled release, available for Windows, macOS, and Linux.

If you don't have Go installed, download and install it from the Go downloads page. Go version 1.16 or later is required.

To install the binaries in $GOPATH/bin, clone this repo and execute make install:

git clone https://github.com/brimdata/zed
cd zed
make install

Contributing

See the contributing guide on how you can help improve Zed!

Join the Community

Join our Public Slack workspace for announcements, Q&A, and to trade tips!

Owner

Brim

https://github.com/brimsec/zq

Comments

Put query is failing on large json zng file

@nwt: This is related to #3881. Large JSON files shown in this issue are present in #3881.

I am using zq built from the main branch, which already contains the change for #3881. Unfortunately, -version reports unknown.

Update.zed:

over arr
| switch id (
    case 2 => put level0.level1.value:="2-changed"
    default => pass
  )
| merge id
| arr:=collect(this)

$ ~/go/bin/zq -I update.zed 500mb.zng | zq -Z 'over arr | id==2' -
stdio:stdin: format detection error
    zeek: line 1: bad types/fields definition in zeek header
    zjson: line 1: invalid character 'O' looking for beginning of value
    zson: zson syntax error
    zng: zngio: uncompressed length exceeds MaxSize
    zng21: zng type ID out of range
    csv: record on line 1023: wrong number of fields
    json: invalid character 'O' looking for beginning of value
    parquet: auto-detection not supported
    zst: auto-detection not supported

The same issue occurs with a 100MB file too. However, the above query works for a 10MB zng file (attached: 10mb.zng.zip).

Interestingly, a cut operation like the one below works on a 1GB zng file too.

$ ~/go/bin/zq -z -i zng 'over arr | where id==101 | cut level0.level1' 1gb.zng

Are there any command-line parameters I can tweak to make this query work on large files?

refactor package expr

My profuse apologies up front for this massive PR. I really thought long and hard about breaking this into smaller chunks and made some failed attempts, but the interdependencies between the refactoring changes were just too much so I went for the whole shebang.

THIS COMMIT BREAKS BACKWARD COMPAT WITH THE ZNG FILE FORMAT.

Now that we're addressing changes to the ZNG spec to bring it to beta status, I thought this PR would be a good time to introduce breaking changes, shortly after our previous release on Friday. I think we have a handle on the breaking changes we want to make and I think I can get them all in by the next release, so after this, hopefully we won't have any more backward-compat changes (though we will have a few more additions that don't break compat as we go from beta to stable of the ZNG format).

===

This commit refactors the expr package to be more extensible and performant. The zngnative package has been eliminated. It had good intentions, but every expression resulted in the allocation of an empty interface to hold a Go native value, resulting in lots of traffic to the GC. The new implementation keeps scratch buffers around and reuses them and, aside from strings and IPs, does not allocate any memory to evaluate expressions. There is a "keep" flag passed to expr.CompileExpr() that says whether you can accept values that get overwritten on subsequent calls to Eval() (e.g., OK for filtering and aggregations) or whether you want to hold onto the values (like groupby and sort do with field keys).

We also changed the field implementation and cleaned up the AST. There was overlap between the old FieldCall stuff and the generic expression code, which is now eliminated. This also makes the grammar a bit easier to reason about as we improve and extend it. There is now one interface type, expr.Evaluator, to take over for all the function closures we previously had. This makes it easier to cookie-cutter and add new expression functionality. Also, the field expressions are now just an instance of expr.Evaluator so everything is now clean and uniform.
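As a rough illustration of the single-interface design and the "keep" flag, here is a sketch in Go. The Evaluator signature, the Record type, and the fieldExpr type are assumptions made for this example; the real expr.Evaluator and expr.CompileExpr() may look quite different.

```go
package main

import (
	"fmt"
	"strconv"
)

// Record is a toy record type used only for this sketch.
type Record map[string]int64

// Evaluator is a hypothetical stand-in for the single interface described
// above; the real expr.Evaluator signature may differ.
type Evaluator interface {
	Eval(rec Record) []byte
}

// fieldExpr evaluates a field reference, encoding the value as text into a
// scratch buffer that is reused across calls.
type fieldExpr struct {
	name    string
	keep    bool
	scratch []byte
}

func (f *fieldExpr) Eval(rec Record) []byte {
	f.scratch = strconv.AppendInt(f.scratch[:0], rec[f.name], 10)
	if f.keep {
		// The caller holds on to results, so hand out a private copy.
		return append([]byte(nil), f.scratch...)
	}
	// The caller consumes the result before the next Eval, so the scratch
	// buffer is returned directly and no copy is made.
	return f.scratch
}

func main() {
	var reuse Evaluator = &fieldExpr{name: "x"}
	a := reuse.Eval(Record{"x": 1})
	reuse.Eval(Record{"x": 2}) // overwrites the bytes that a refers to

	var keep Evaluator = &fieldExpr{name: "x", keep: true}
	b := keep.Eval(Record{"x": 1})
	keep.Eval(Record{"x": 2}) // b is a copy and is unaffected

	fmt.Printf("reused=%s kept=%s\n", a, b)
}
```

The trade-off is the one the description names: a filter can read the shared buffer and move on, while an operator that stores keys (groupby, sort) has to pay for a copy.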

While we were changing the coercion logic around and getting rid of zngnative, we laid out the ZNG type codes a bit differently and updated the spec. Since we were making incompatible changes, we changed the "byte" type to "uint8" and added "int8" so all the integer types are now orthogonal. Also, the integers are all encoded using the same varint logic now, which greatly simplifies value encoding and decoding in the coercion engine.
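The benefit of one varint routine for every integer width can be sketched with Go's standard encoding/binary package. This uses Go's zigzag varint as a stand-in; it is an assumption for illustration and not a copy of the encoding in the ZNG spec. The function names are hypothetical.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// encodeInt encodes a signed integer of any width (int8..int64) using Go's
// zigzag varint. This is a stand-in for the ZNG encoding, not a copy of it.
func encodeInt(v int64) []byte {
	buf := make([]byte, binary.MaxVarintLen64)
	return buf[:binary.PutVarint(buf, v)]
}

// encodeUint does the same for unsigned widths (uint8..uint64).
func encodeUint(v uint64) []byte {
	buf := make([]byte, binary.MaxVarintLen64)
	return buf[:binary.PutUvarint(buf, v)]
}

// decodeInt reverses encodeInt and rejects empty or malformed input.
func decodeInt(b []byte) (int64, error) {
	v, n := binary.Varint(b)
	if n <= 0 {
		return 0, errors.New("malformed varint")
	}
	return v, nil
}

// decodeUint reverses encodeUint and rejects empty or malformed input.
func decodeUint(b []byte) (uint64, error) {
	v, n := binary.Uvarint(b)
	if n <= 0 {
		return 0, errors.New("malformed varint")
	}
	return v, nil
}

func main() {
	// An int8 and an int64 go through exactly the same code path.
	small, _ := decodeInt(encodeInt(int64(int8(-128))))
	big, _ := decodeInt(encodeInt(-1 << 62))
	u, _ := decodeUint(encodeUint(uint64(uint8(255))))
	fmt.Println(small, big, u)
}
```

With this shape, a coercion engine only needs to widen to int64 or uint64 and range-check against the target type, rather than carrying a separate decoder per width.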

Most of the tests that were updated simply accommodate the change to the zng binary format.

Get "chunk span does intersect provided span" on query of zar archive

This occurred with commit 78cc64ccecc2dac44dd69a51235cbbdec8b86151 On branch master, tag was v0.24.0-19-g78cc64cc

I created a 14 GB zar archive with the command:

[ec2-user@ip-172-31-19-39 ~]$ time zar import -bufsize 1000MB -s 100MB -R s3://zqd-demo-1/mark/zeek-logs/conn-try3 s3://brim-sampledata/wrccdc/zeek-logs/conn.log.gz

real 93m36.353s
user 126m19.417s
sys 2m37.429s

(Note that this took over 80 minutes on an ec2 m5d.large)

Then on my laptop, I created an archive with zapi:

zapi new -k archivestore -d s3://zqd-demo-1/mark/zeek-logs/conn-try3 conn-100

And ran the following queries:

$ time zapi ls
conn-100
zapi ls 0.02s user 0.01s system 0% cpu 20.577 total

$ time zapi -s conn-100 get -t head 5
#port=uint16
#zenum=string
#0:record[_path:string,ts:time,uid:bstring,id:record[orig_h:ip,orig_p:port,resp_h:ip,resp_p:port],proto:zenum,service:bstring,duration:duration,orig_bytes:uint64,resp_bytes:uint64,conn_state:bstring,local_orig:bool,local_resp:bool,missed_bytes:uint64,history:bstring,orig_pkts:uint64,orig_ip_bytes:uint64,resp_pkts:uint64,resp_ip_bytes:uint64,tunnel_parents:set[bstring]]
0:[conn;1521940133.838568;CWMAhm3fIlJPE8Rp8c;[10.230.73.251;38491;10.47.6.30;443;]tcp;-;-;-;-;S0;-;-;0;S;1;44;0;0;-;]
0:[conn;1521940133.765872;C0JVnY3iTsTikg2rH5;[10.0.0.111;43416;10.47.5.208;80;]tcp;-;-;-;-;S0;-;-;0;S;1;60;0;0;-;]
0:[conn;1521940133.738418;CfFhAb3ii6EHIrCo3c;[10.230.73.251;38478;10.47.6.30;111;]tcp;-;-;-;-;S0;-;-;0;S;1;44;0;0;-;]
0:[conn;1521940133.710309;C0Yha7UpCgYes56g4;[10.0.0.111;35357;10.47.5.142;80;]tcp;-;-;-;-;S0;-;-;0;S;1;60;0;0;-;]
0:[conn;1521940133.638178;Cvg3tt1l4jL7hl4MEf;[10.230.73.251;38477;10.47.6.30;111;]tcp;-;-;-;-;S0;-;-;0;S;1;44;0;0;-;]
zapi -s conn-100 get -t head 5 0.02s user 0.01s system 0% cpu 22.977 total

$ time zapi -s conn-100 get -workers 10 -t "73841"
chunk span does intersect provided span
zapi -s conn-100 get -workers 10 -t "73841" 0.03s user 0.01s system 0% cpu 27.384 total

Note that only the last query got an error. I included the "head" query to show that the archive is readable for some queries.
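The error message suggests a consistency check between a chunk's time span and the span being queried. As a sketch of what such a check involves, here is a small Go example. The Span type, its half-open semantics, and selectChunks are assumptions for illustration and are not taken from the zar code.

```go
package main

import "fmt"

// Span is a hypothetical half-open time range [Ts, Ts+Dur) in nanoseconds.
type Span struct {
	Ts  int64
	Dur int64
}

// End returns the first instant after the span.
func (s Span) End() int64 { return s.Ts + s.Dur }

// Overlaps reports whether two half-open spans share at least one instant.
func (s Span) Overlaps(o Span) bool {
	return s.Ts < o.End() && o.Ts < s.End()
}

// selectChunks returns the chunk spans that overlap the query span.
func selectChunks(chunks []Span, query Span) []Span {
	var out []Span
	for _, c := range chunks {
		if c.Overlaps(query) {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	chunks := []Span{{Ts: 0, Dur: 10}, {Ts: 10, Dur: 10}, {Ts: 20, Dur: 10}}
	fmt.Println(selectChunks(chunks, Span{Ts: 5, Dur: 10}))
}
```

If chunk selection and the later per-chunk check disagree on boundary handling (for example inclusive versus exclusive end times), a chunk can be selected and then rejected, which is one possible source of an error like the one shown.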

And here are some useful comments from Al:

"Hmm, that's bad: it means we incorrectly selected a chunk for a given time range. It would take some digging to see what's wrong; it could be the chunk metadata that's wrong, or something about the selected span for the timerange that's off."
Make space available for querying during ingest

This PR adjusts the ingest process following @mccanne's design to make data available for querying as it is processed, rather than waiting for everything to land before allowing anything to be queried.

The main changes are:

Indexing and zeek processing of the pcap now happen sequentially. Indexing is first; an HTTP 2xx is returned upon successful index creation.

Instead of waiting for all zeek logs to be ready, and then running a one-time bzng conversion job after that, we now take "snapshots" of the zeek logs as they are being generated, and make these snapshots available for regular querying while zeek processing continues. A final snapshot is done after all logs have been processed.

@mccanne, I did not implement the "stable" scheme in this PR, as I had a question that I raised offline (on the issue). I can add it here once that's resolved or in a different PR.

This change needs tests. I'm waiting for #399 to get in and will add those after that.

out of memory error when fusing many types

This error occurs with Zed commit 0143b5a.

The attached data_4500_types.zson.gz contains 4500 records, each with its own type. When I try to fuse these into one type, I get an out of memory error:

$ zq -version
Version: v0.29.0-413-g0143b5a7
$ zq -o test.zng 'fuse' data_4500_types.zson.gz
fatal error: runtime: out of memory
...

Fuse works without error with fewer types:

$ zq -z 'head 4000 | fuse | by typeof(this) | count()' data_4500_types.zson.gz
{count:1 (uint64)} (=0)

Parsimonious use of zng.Record.TypeCheck()

We currently run TypeCheck() on every record read by a zio reader, which for some workloads adds significant overhead (see #1181). Yet for data that we've imported and validated along the way (specifically, zar and zqd), this check is unneeded.

This commit addresses the above by making the TypeCheck() configurable via a flag in detector.OpenConfig.

zar import and zqd log ingest enable it. zq enables it and also exposes the knob through a new -check CLI option. It is disabled in other uses, such as zar zq or zqd queries.

closes #1181

Packet Post Endpoint: Stream status updates

Posting large pcap files to a space can often take a long time as zeek slowly chugs away. When a pcap is posted to a space via the POST /space/:space/packet endpoint, the API now provides streaming updates to give clients an idea of how far along the ingest process is:

When the request is initiated, a zeek subprocess is spawned, and the contents of the posted pcap are piped sequentially into zeek stdin and a pcap index writer. Assuming this all starts with no errors, a 200 OK status message is returned once the first zeek log files in the temp zeek log file directory have been written to disk. A TaskStart response is transmitted over the response stream. Whatever data has been written is also transformed to sorted bzng. At this point the space is queryable.

On a hardwired two-second interval, the server streams back piped JSON status update payloads that contain the start timestamp of ingest, the update timestamp, the total size of the pcap file, and the number of bytes read from the pcap file. From this information a client should be able to approximate the percentage completion of the ingest process (this won't be entirely accurate, however, since it doesn't account for the time it will take to transform the zeek logs into time-sorted bzng at the end).
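A client-side progress estimate from one of these status payloads could be sketched as follows in Go. The JSON key names and the status struct are assumptions made for this example, since the payload schema is not shown here; only the four pieces of information listed above are taken from the description.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// status mirrors the four values listed above. The JSON key names are
// assumptions, not the endpoint's documented schema.
type status struct {
	StartTime     string `json:"start_time"`
	UpdateTime    string `json:"update_time"`
	PcapTotalSize int64  `json:"pcap_total_size"`
	PcapReadSize  int64  `json:"pcap_read_size"`
}

// percentDone estimates ingest progress from a single status payload.
// It ignores the final bzng sort, so it tends to run ahead near the end.
func percentDone(payload []byte) (float64, error) {
	var s status
	if err := json.Unmarshal(payload, &s); err != nil {
		return 0, err
	}
	if s.PcapTotalSize <= 0 {
		return 0, fmt.Errorf("total pcap size unknown")
	}
	return 100 * float64(s.PcapReadSize) / float64(s.PcapTotalSize), nil
}

func main() {
	payload := []byte(`{"pcap_total_size": 1000, "pcap_read_size": 250}`)
	pct, err := percentDone(payload)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%.0f%% read\n", pct)
}
```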

A TaskEnd message is sent at the completion of ingest. If TaskEnd.Error is not null users will know that an error occurred during ingest. If an error occurs during ingest, the process is aborted and the space is reset.

Also:

Add api.Stream/JSONPipe functionality to make it easier to read/write piped json files over the wire.

zq command fails with "zngio: uncompressed length exceeds MaxSize" for large files

Hello,

I am trying to query a 100MB JSON file converted to ZNG format. However, the query fails with a "zngio: uncompressed length exceeds MaxSize" error. I tried setting -readsize and -readmax to higher values (1GiB), but the error still persists.

Querying the JSON file directly instead of the ZNG file gave the expected results, but it took more time to complete. I am trying to optimize it with the ZNG format.

Also, the same query worked well for smaller files (about 10MB JSON)

Errors:

$ zq -z -i zng 'over arr | where id==10001 | cut level0.level1' 100mb.zng
100mb.zng: zngio: uncompressed length exceeds MaxSize

$ zq -z -i zng -readmax 1GiB -readsize 1GiB 'over arr | where id==10001 | cut level0.level1' 100mb.zng
100mb.zng: zngio: uncompressed length exceeds MaxSize

How files were converted from JSON to ZNG format:

$ zq -f zng -i json 100mb.json > 100mb.zng

$ ll -h
total 1.7G
drwxrwxr-x 2 user user 4.0K May 11 02:09 ./
drwxrwxr-x 9 user user 4.0K May 6 10:52 ../
-rw-rw-r-- 1 user user 102M May 11 01:25 100mb.json
-rw-rw-r-- 1 user user 728K May 11 01:26 100mb.zng
-rw-rw-r-- 1 user user 11M May 11 01:24 10mb.json
-rw-rw-r-- 1 user user 60K May 11 01:26 10mb.zng

Can you suggest how to solve this problem?

pretty zng - aka ZSON

We went to great lengths in the zng architecture to create a human-readable text form (tzng) suitable for hand editing, tests, and demo. This is especially important because understanding the binary zng format takes developer-level commitment to get into it and understand the details.

But for broader audiences (and hence broader acceptance), tzng doesn't have the greatest ergonomics. The syntax is compact and cryptic and the type structure is separate from the DFS-ordered list of values so it's not the best format to illustrate the zng data model. And zjson suffers from this too.

Also, when we first developed tzng, we were concerned about parsing performance. We've now come to realize that tzng is never used in a performance-sensitive context so here we are now focused on ergonomics over perform-ity.

The intent is to adopt a "pretty" version of zng that would have broader appeal and be easier to understand. I will leave it open as to whether pretty-zng replaces or augments text-zng (though I am now leaning toward replacement).

We want it to have some overlap in the look and feel of JSON without its shortcomings. And, of course, it needs to represent the full gamut of zng binary capabilities.

Proposal

Here is a proposal open for discussion.

Whitespace

All whitespace is equivalent so arbitrary pretty printing is possible. There is no notion of newline-terminated boundaries so values must be parsed to find their end.

Typedefs

Types are similar to tzng but can have arbitrary names. Types are optional as values can also be fully specified with embedded or implied types. A typedef has the form:

type <name>=<type>

where name is an identifier or an integer and <type> is a zng/zql style type definition. (Type strings are not yet implemented in zql but they will be.) Type names can be redefined in a stream so that files can be concatenated and still be correct.

An alias has a similar form:

alias <name>=<type>

The difference between an alias and a type is that aliases are encoded into the corresponding zng input/output as zng aliases, whereas type names are just a convenience tag in pretty-zng to refer to a native zng type and are local to the pretty-zng file.

Implied-type Values

There are 11 "Implied-type" values:

string a double-quoted string literal

int64 as in an integer text string, e.g., 123, -1

float64 as in a float text string, e.g., 123.0 or -1e6

time unquoted string that parses as an ISO time

bool unquoted identifier true or false

ip unquoted numeric IP address (i.e., format parseable by go library)

net unquoted numeric IP subnet address in cidr form (i.e., format parseable by go library)

array as in a javascript style array where the type is implied by the elements or if the types are mixed then an array of unions

record as in javascript-style object '{ ... }' notation where the record type of the object is implied by the key names and types of the elements

map syntax TBD

null the unquoted identifier null

Note: I believe all of the above values are unambiguously parseable like the zql parser understanding that 10.0.0.1 is an IP but "10.0.0.1" is a string.
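To check the claim that the scalar cases are unambiguous, here is a small Go sketch that classifies a single token into its implied type. It covers only the scalar rules from the list (string, int64, float64, time, bool, ip, net, null), leaves out array, record, and map, and is not the zql parser. The function name impliedType and the order of the checks are assumptions.

```go
package main

import (
	"fmt"
	"net/netip"
	"strconv"
	"strings"
	"time"
)

// impliedType guesses the implied type of one scalar token. The order of
// the checks matters: integers are tried before floats, and both before IPs.
func impliedType(tok string) string {
	switch {
	case tok == "null":
		return "null"
	case tok == "true" || tok == "false":
		return "bool"
	case len(tok) >= 2 && strings.HasPrefix(tok, `"`) && strings.HasSuffix(tok, `"`):
		return "string"
	}
	if _, err := strconv.ParseInt(tok, 10, 64); err == nil {
		return "int64"
	}
	if _, err := strconv.ParseFloat(tok, 64); err == nil {
		return "float64"
	}
	if _, err := netip.ParseAddr(tok); err == nil {
		return "ip"
	}
	if _, err := netip.ParsePrefix(tok); err == nil {
		return "net"
	}
	if _, err := time.Parse(time.RFC3339Nano, tok); err == nil {
		return "time"
	}
	return "unknown"
}

func main() {
	for _, tok := range []string{`"10.0.0.1"`, "10.0.0.1", "10.0.0.0/8", "123", "-1e6"} {
		fmt.Printf("%s -> %s\n", tok, impliedType(tok))
	}
}
```

A dotted quad such as 10.0.0.1 fails both number parsers because of its extra dots, so it falls through to the IP check, while the quoted form is caught earlier as a string.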

So here are some example implied-type values with their implied types:

"hello, world" -> string
123 -> int64
192.168.1.1 -> ip
[ 10, 11, 12 ] -> array[int64]
[ 10, "hello", 1.0, 2 ] -> array[union[int64,string,float64]]
[ "hello", "world" ] -> array[string]
{ foo:10, bar:"hello, word" } -> record[foo:int64,bar:string]

Explicit-type Values

To be able to represent all of the types of zng, values can be explicitly typed using the zql type cast syntax and a prefix syntax for records and arrays.

Primitive values have the form <value>:<type> where value is the tzng value and type is the type, e.g., uint8, bytes, etc.

Enum has the form <value>:<type>

Union has the form <value>:<type><selector> where type is a value and selector the integer position of the type in the union

Array has the form [ elements... ]:<type>

Record has the form { elements... }:<type>

Examples

Ex 1

#0:record[s:string]
0:[hello,world;]

is the same as

{s:"hello,world"}

Ex 2

#0:record[ts:time,addr:ip,msg:string]
0:[1;10.0.0.1;hello world;]
0:[2;192.168.1.1;here is a message;]
0:[3;10.0.0.2; here is another message;]

is the same as

{ ts:1970-01-01T00:00:00.000000001Z, addr:10.0.0.1, msg:"hello, world" }
{ ts:1970-01-01T00:00:00.000000002Z, addr:192.168.1.1, msg:"here is a message" }
{ ts:1970-01-01T00:00:00.000000003Z, addr:10.0.0.2, msg:"here is another message" }

is the same as

type log = record[ts:time,addr:ip,msg:string]
{ 1970-01-01T00:00:00.000000001Z, 10.0.0.1, "hello, world" }:log
{ 1970-01-01T00:00:00.000000002Z, 192.168.1.1, "here is a message" }:log
{ 1970-01-01T00:00:00.000000003Z, 10.0.0.2, "here is another message" }:log

is the same as

type log = record[ts:time,addr:ip,msg:string]
{ 1970-01-01T00:00:00.000000001Z, 10.0.0.1, "hello, world" }:log
{ ts:1970-01-01T00:00:00.000000002Z, addr:192.168.1.1, msg:"here is a message" }
{ 1970-01-01T00:00:00.000000003Z, 10.0.0.2, "here is another message" }:log

Ex 3

#port=uint16
#0:record[id:record[orig_h:ip,orig_p:port]]
0:[[10.0.0.1;8080]]

is the same as

alias port=uint16
{id: {orig_h:10.0.0.1, orig_p:8080:port}}

is the same as

alias port=uint16
type zeekID = record[orig_h:ip,orig_p:port]
{ id:{10.0.0.1,8080}:zeekID }

Ex 4

#vector=array[uint16]
#0:record[x:int32,point:vector]
0:[1;[2;3;4;]]
0:[2;[23;18;]]
0:[3;[99;1024;0;1;-6;]]

is the same as

alias vector=array[uint16]
type pointVector=record[x:int32,point:vector]
{ 1, [ 2, 3, 4 ] }:pointVector
{ 2, [ 23, 18 ] }:pointVector
{ 3, [ 99, 1024, 0, 1, -6 ] }:pointVector

occasional file i/o "access is denied" errors during pcap ingest in windows CI with brim

Several tests for Brim / zq integration involve uploading a pcap through Brim and then doing something with it. In one run, there were i/o errors that prevented ingest from succeeding.

One was

CreateFile C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\scoped_dir596_1117112332\\upload596_235124938\\sample.pcap.brim\\.tmp.ingest\\capture_loss.log: Access is denied.

The other was

rename C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\scoped_dir940_1959449935\\upload940_1015111360\\sample.pcap.brim\\all.bzng.tmp C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\scoped_dir940_1959449935\\upload940_1015111360\\sample.pcap.brim\\all.bzng: Access is denied

The bug is filed here as the error is happening in zqd and being propagated to Brim.

The file being ingested is sample.pcap from brim.git.

The logs of the run are available here.

brim was at https://github.com/brimsec/brim/commit/1890b07c91cf63f8d64fdb5c764229046ee6bc8e . zq was at tag v0.11.1. Note that windows integration testing is quite new; we don't have many past data points. But so far this is the only known instance.
BZNG canonical set representation — PROD-1296

Define and enforce a normal form for BZNG sets. This makes it possible to compare sets for equality by comparing their BZNG byte sequences. It also fixes incorrect zng.Value.ContainerLength results caused by duplicate elements.

Tests are coming.

replace zio.WarningReader with service.warningsReader

zio.WarningReader is used only in service.handleBranchLoad, for which it's overgeneralized. It also formats its wrapped zio.Reader with %s and prepends the result to warnings regardless of whether the reader implements fmt.Stringer, with ugly and unhelpful results in handleBranchLoad since a reader from anyio.NewReaderWithOpts doesn't implement fmt.Stringer.
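The %s problem is easy to reproduce outside of Zed. In the Go sketch below, plainReader and namedReader are hypothetical types standing in for a reader without and with a String method; only the fmt behavior is the point.

```go
package main

import "fmt"

// plainReader is a hypothetical reader type that does not implement
// fmt.Stringer, like the reader described above.
type plainReader struct{ offset int }

// namedReader implements fmt.Stringer, so %s has something sensible to print.
type namedReader struct{ name string }

func (n *namedReader) String() string { return n.name }

// warn prefixes a warning with the %s rendering of the reader, mirroring
// the behavior the PR removes.
func warn(r interface{}, msg string) string {
	return fmt.Sprintf("%s: %s", r, msg)
}

func main() {
	fmt.Println(warn(&namedReader{name: "conn.log"}, "bad line"))
	fmt.Println(warn(&plainReader{}, "bad line"))
}
```

The first call prints a readable prefix; the second falls back to fmt's struct dump with bad-verb markers, which is the kind of noise the PR describes.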

Replace it with service.warningsReader, which is simpler and doesn't prepend the %s-formatted zio.Reader.

Mention comments in Language Overview doc

A community zync user recently remarked:

I recently discovered we can use comments // in Zed query code. I guess I should have read the ZSON docs more throughly and figure this out sooner...

ZSON and the Zed language have some overlap but are in many ways wholly separate topics. I can see that comments are not currently mentioned at all in the Language Overview doc where they surely should be. In a quick scan my first instinct is to put it at the end of the Introduction right after it's shown how the language can extend from simple one-liner searches to longer source files that one might maintain in GitHub.

zson: tidy marshal.go

Use a type switch in (*MarshalZNGContext).encodeAny and (*UnmarshalZNGContext).decodeAny

Remove unnecessary "if zv.Bytes == nil {...}" statements in (*UnmarshalZNGContext).decodeAny, .decodeNewipAddr, and .decodeNetIp

Shorten indirect a little

add VNG segment compression

In vng/vector:

Add a Segment.CompressionFormat field and accompanying CompressionFormatNone and CompressionFormatLZ4 constants

Add a Segment.MemLength field indicating the segment's in-memory (i.e., uncompressed) length (in contrast with Segment.Length, its in-file length)

Add (*Segment).Read, which can read both compressed and uncompressed segments

Update (*Spiller).Write to write a compressed segment if compression is effective and an uncompressed segment if not
"zed serve" flag to specify additional AllowedOrigins

As described in the README for the prototype Zed data source Grafana plugin, such a plugin cannot currently query an out-of-the-box Zed service because the CORS configuration does not permit requests from the http://localhost:3000 origin.

As an interim hack, I was able to get it working in branch by adding the additional entry for http://localhost:3000 here:

https://github.com/brimdata/zed/blob/52c5061b6fbbe9abd1dd754d240104cca647c5aa/service/middleware.go#L33

In a discussion with the team, we agreed that we should introduce a flag to allow this, e.g., zed serve -origin http://localhost:3000. It seems like a flag that the user should be able to invoke multiple times on the same command line so multiple additional origins can be specified, similar to what we do with multiple includes on zed query -I.
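A repeatable flag of this kind can be built with the standard flag package by implementing flag.Value. The sketch below is one possible shape, not the zed serve implementation; the originList type, the parseOrigins helper, and the flag name are taken from the proposal or invented for illustration.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// originList collects every occurrence of a repeated flag.
type originList []string

// String renders the collected values; the flag package uses it for defaults.
func (o *originList) String() string { return strings.Join(*o, ",") }

// Set is called once per occurrence of the flag on the command line.
func (o *originList) Set(v string) error {
	if v == "" {
		return fmt.Errorf("origin must not be empty")
	}
	*o = append(*o, v)
	return nil
}

// parseOrigins parses a hypothetical "serve" command line and returns the
// additional origins in the order given.
func parseOrigins(args []string) ([]string, error) {
	fs := flag.NewFlagSet("serve", flag.ContinueOnError)
	var origins originList
	fs.Var(&origins, "origin", "additional allowed CORS origin (repeatable)")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	return origins, nil
}

func main() {
	origins, err := parseOrigins([]string{
		"-origin", "http://localhost:3000",
		"-origin", "http://localhost:8080",
	})
	fmt.Println(origins, err)
}
```

The collected slice could then be appended to the service's default allowed origins when the CORS middleware is constructed.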