Pure Go implementation of jq

Last update: Jan 9, 2023

Comments: 11

gojq


This is an implementation of the jq command written in Go. You can also embed gojq as a library in your Go products.

Usage

 $ echo '{"foo": 128}' | gojq '.foo'
128
 $ echo '{"a": {"b": 42}}' | gojq '.a.b'
42
 $ echo '{"id": "sample", "10": {"b": 42}}' | gojq '{(.id): .["10"].b}'
{
  "sample": 42
}
 $ echo '[{"id":1},{"id":2},{"id":3}]' | gojq '.[] | .id'
1
2
3
 $ echo '{"a":1,"b":2}' | gojq '.a += 1 | .b *= 2'
{
  "a": 2,
  "b": 4
}
 $ echo '{"a":1} [2] 3' | gojq '. as {$a} ?// [$a] ?// $a | $a'
1
2
3
 $ echo '{"foo": 4722366482869645213696}' | gojq .foo
4722366482869645213696  # keeps the precision of large numbers
 $ gojq -n 'def fact($n): if $n < 1 then 1 else $n * fact($n - 1) end; fact(50)'
30414093201713378043612608166064768844377641568960512000000000000 # arbitrary-precision integer calculation

Nice error messages.

 $ echo '[1,2,3]' | gojq '.foo & .bar'
gojq: invalid query: .foo & .bar
    .foo & .bar
         ^  unexpected token "&"
 $ echo '{"foo": { bar: [] } }' | gojq '.'
gojq: invalid json: <stdin>
    {"foo": { bar: [] } }
              ^  invalid character 'b' looking for beginning of object key string

Installation

Homebrew

brew install gojq

Build from source

go install github.com/itchyny/gojq/cmd/gojq@latest

Docker

docker run -i --rm itchyny/gojq
docker run -i --rm ghcr.io/itchyny/gojq

Difference to jq

gojq is implemented purely in Go and is completely portable. jq depends on the C standard library, so the availability of math functions depends on that library. jq also depends on a regular expression library, which makes its build scripts complex.
gojq implements nice error messages for invalid queries and JSON input. With jq's error messages, it is sometimes difficult to tell where to fix the query.
gojq does not keep the order of object keys. I understand this might cause problems for some scripts, but basically, we should not rely on the order of object keys. Due to this limitation, gojq does not have the keys_unsorted function or the --sort-keys (-S) option. I would implement them if an ordered map were added to the Go standard library, but I'm not strongly motivated. Also, gojq accepts only valid JSON input, while jq handles some JSON extensions such as NaN and Infinity.
gojq supports arbitrary-precision integer calculation while jq does not. This is important for keeping the precision of numeric IDs or nanosecond values. You can also use gojq to solve mathematical problems that require big integers. Note that mathematical functions convert integers to floating-point numbers; only addition, subtraction, multiplication, modulo, and division (when divisible) keep integer precision. To calculate integer division of big integers, use def intdiv($x; $y): ($x - $x % $y) / $y; instead of $x / $y.
gojq fixes some bugs in jq. gojq correctly deletes array elements with |= empty (jq#2051). gojq fixes update assignments that include the try or // operator (jq#1885, jq#2140). gojq consistently counts by characters (not bytes) in the index, rindex, and indices functions; "１２３４５" | .[index("３"):] results in "３４５" (jq#1430, jq#1624). gojq can handle %f in strftime and strptime (jq#1409).
gojq supports reading from YAML input while jq does not. gojq also supports YAML output.

Color configuration

The gojq command automatically disables colored output when the output is not a tty. To force colored output, specify the --color-output (-C) option. When the NO_COLOR environment variable is present or the --monochrome-output (-M) option is specified, gojq disables colored output.

Use the GOJQ_COLORS environment variable to configure individual colors. The variable is a colon-separated list of ANSI escape sequences for null, false, true, numbers, strings, object keys, arrays, and objects. The default configuration is 90:33:33:36:32:34;1.

Usage as a library

You can use the gojq parser and interpreter from your Go products.

package main

import (
	"fmt"
	"log"

	"github.com/itchyny/gojq"
)

func main() {
	query, err := gojq.Parse(".foo | ..")
	if err != nil {
		log.Fatalln(err)
	}
	input := map[string]interface{}{"foo": []interface{}{1, 2, 3}}
	iter := query.Run(input) // or query.RunWithContext
	for {
		v, ok := iter.Next()
		if !ok {
			break
		}
		if err, ok := v.(error); ok {
			log.Fatalln(err)
		}
		fmt.Printf("%#v\n", v)
	}
}

Firstly, use gojq.Parse(string) (*Query, error) to get the query from a string.
Secondly, get the result iterator:
- using query.Run or query.RunWithContext,
- or, alternatively, compile the query using gojq.Compile and then call code.Run or code.RunWithContext. You can reuse the *Code against multiple inputs to avoid compiling the same query repeatedly.
- In either case, you cannot use custom type values as the query input. The type should be []interface{} for an array and map[string]interface{} for a map (just like values decoded into an interface{} using the encoding/json package). You can't use []int or map[string]string, for example. If you want to query your custom struct, marshal it to JSON, unmarshal it into an interface{}, and use that as the query input.
Thirdly, iterate through the results using iter.Next() (interface{}, bool). The iterator can emit an error, so make sure to handle it. Termination is signaled by the second return value of Next(). The return type is not (interface{}, error) because the iterator can emit multiple errors and you can continue after an error.

gojq.Compile accepts the following compiler options.

gojq.WithModuleLoader allows loading modules. By default, the module feature is disabled. If you want to load modules from the file system, use gojq.NewModuleLoader.
gojq.WithEnvironLoader allows configuring the environment variables referenced by env and $ENV. By default, OS environment variables are not accessible for security reasons. You can use gojq.WithEnvironLoader(os.Environ) if you want.
gojq.WithVariables allows configuring the variables that can be used in the query. Pass the values of the variables to code.Run in the same order.
gojq.WithFunction allows adding a custom internal function. An internal function returns a single value (which can be an error) per invocation. To add a jq function (which may use the comma operator to emit multiple values, call the empty function, accept a filter as its argument, or call another built-in function), use LoadInitModules of the module loader.
gojq.WithInputIter enables the input and inputs functions. By default, these functions are disabled.

Bug Tracker

Report bugs on the GitHub issue tracker (Issues · itchyny/gojq).

Author

itchyny (https://github.com/itchyny)

License

This software is released under the MIT License, see LICENSE.

Owner

itchyny

Professional of jq.

https://github.com/itchyny/gojq

Comments

ER: basic TCO

Basic tail-call optimization (TCO) support would substantially improve gojq's speed and memory efficiency for certain jq programs.

This note first considers a simple and direct test of recursion limits (see [*Recursion] below), and then a more practical test involving an optimized form of walk and a well-known large JSON file https://github.com/ryanwholey/jeopardy_bot/blob/master/JEOPARDY_QUESTIONS1.json

[*Recursion]

for jq in jq gojq ; do
    echo $jq ::
    /usr/bin/time -lp $jq --argjson max 100000000 -n '
    def zero_arity:
      if . == $max then "completed \($max)"
      else if (. % 100000 == 0) then . else empty end, ((.+1)| zero_arity)
      end;
    1 | zero_arity'
done

In abbreviated form, the output for jq is:

"completed 100000000"
user        90.48
sys          0.24
   1880064  maximum resident set size

For gojq, the program fails to complete in a reasonable amount of time. In fact, it takes many hours to reach 600,000.

Setting max to 100,000 gives these performance indicators for gojq:

user        78.83
sys          0.65
  47173632  maximum resident set size

[*walk]

#!/bin/bash

for jq in jq gojq ; do
    echo $jq
  /usr/bin/time -lp $jq '
  def walk(f):
  def w:
    if type == "object"
    then . as $in
    | reduce keys[] as $key
        ( {}; . + { ($key):  ($in[$key] | w) } ) | f
    elif type == "array" then map( w ) | f
    else f
    end;
  w;

  walk(if type == "string" then 0 else 1 end)
  ' jeopardy.json > /dev/null

done

Results:

jq
user         6.44
sys          0.11
 226742272  maximum resident set size

gojq
user         9.99
sys          0.85
1861201920  maximum resident set size

Add custom iterator function support which enables implementing a REPL in jq
Hi! This is more of a feature request than a PR, but I chose to use a PR to show some proof-of-concept code.

The background is that I'm working on a tool based on gojq that has an interactive CLI REPL. The REPL used to be implemented in Go, with quite a lot of messy code to make it "feel" like jq. A few days ago I realized that much of this could probably be solved if the REPL itself were written in jq. After some thinking, I realized that adding support for custom iterator functions would make it possible to implement eval as a custom function.

With read and print as custom functions you can implement a REPL like this:

def repl:
  def _wrap: if (. | type) != "array" then [.] end;
  def _repl:
    try read("> ") as $e
    | (try (.[] | eval($e)) catch . | print), _repl;
  _wrap | _repl;

It is a bit more complicated than def repl: read | eval(.) | print, repl; repl in order to make it more user-friendly.

Example usage showing basic expressions, nested REPL and errors:

$ go run cmd/repl/main.go
> 2+2
4
> 1 | repl
> .+2
3
> ^D
> [1,2,3] | repl
> .+10
11
12
13
> ^D
> 123
123
> undefined
function not defined: undefined/0
> [1,2,3] | repl
> undefined
function not defined: undefined/0
> ^D
> ^D
$

Some implementation notes:

The repl takes an array as input and wraps non-array values in an array. This feels natural, but maybe there are better solutions?

How should 1, 2, 3 | repl be handled? The current behavior of starting multiple REPLs feels OK, I think.

I'm unsure how to handle env.paths in *env.Next(). I guess it's related to assignment and paths? I haven't looked into how this could affect it.

The code implementing the iterator could probably be much clearer and nicer.

What do you think? Could this be useful?
int(math.Inf(1)) in funcOpMul is incorrect
This was caught by a test failure on mips64:

--- FAIL: TestCliRun (0.86s)
    --- FAIL: TestCliRun/multiply_strings (0.00s)
        cli_test.go:87: standard output:
              (
                """
                ... // 8 identical lines
                "abcabc"
                null
            -   null
            +   ""
                """
              )
FAIL
FAIL    github.com/itchyny/gojq/cli    0.872s

The result of int(math.Inf(1)) depends on the architecture: the Go specification leaves the result of an out-of-range float-to-integer conversion implementation-dependent, so the conversion is not correct in general.

On s390x and ppc64le, it returns 9223372036854775807, which would have failed as well were it not for the limit condition.

A simple way to fix this is to check:

math.IsNaN(cnt - cnt)

which catches both infinities (positive and negative) as well as NaN, since Inf - Inf and NaN - NaN are NaN while any finite value minus itself is 0.
[ER]: capturing the output of a subprocess
The main need is to be able to write a pipeline along the lines of:

"abc" | run("md5")

with the result: "0bee89b07a248e27c83fc3d5951213c1"

Ideally, you could use this in conjunction with try/catch:

"abc" | try system("md5") catch .

This fits in nicely with jq's pipes-and-filters, but of course a reasonable alternative would be to follow the lead of Go's exec.Command(), and have the result be a JSON object with various keys holding the results.

This gojq issue is related to the jq issue at https://github.com/stedolan/jq/issues/147

My impression is that progress on this functionality has stagnated at stedolan/jq mainly because the issue became entangled with numerous other potential enhancements (see especially https://github.com/stedolan/jq/pull/1843).

So my suggestion would be to keep gojq's initial support for "shelling out" quite simple.

Thank you again.
Support for functions written in Go when used as a library

Hello! Would it be possible to add support for custom "internal" functions written in Go when using gojq as a library? I had a quick look at the code and it seemed not that far away, but maybe I'm missing something? If you think it's a good idea, I would even be happy to try implementing it if you give me some pointers.
improve performance of join by making it internal

Avoids quadratic string appends

Before:
BenchmarkJoinShort-4     26947      44399 ns/op
BenchmarkJoinLong-4        145    7295904 ns/op

After:
BenchmarkJoinShort-4     81789      14456 ns/op
BenchmarkJoinLong-4      22002      53791 ns/op
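The idea behind an internal join can be sketched in Go with strings.Builder, which appends into one growing buffer instead of copying the accumulated string at every step. The element conversion rules here (null becomes "", booleans and numbers are stringified, anything else is an error) are an assumption modeled loosely on jq and are not taken from gojq's source.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// join concatenates values with sep in a single pass, so the total work is
// linear in the output size.
func join(values []interface{}, sep string) (string, error) {
	var sb strings.Builder
	for i, v := range values {
		if i > 0 {
			sb.WriteString(sep)
		}
		switch v := v.(type) {
		case nil:
			// null contributes nothing
		case string:
			sb.WriteString(v)
		case bool:
			sb.WriteString(strconv.FormatBool(v))
		case int:
			sb.WriteString(strconv.Itoa(v))
		case float64:
			sb.WriteString(strconv.FormatFloat(v, 'g', -1, 64))
		default:
			return "", fmt.Errorf("cannot join: unsupported element type %T", v)
		}
	}
	return sb.String(), nil
}

func main() {
	s, err := join([]interface{}{"a", 1, nil, true, 2.5}, ", ")
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```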

index/1 and rindex/1 are inefficient

Currently, gojq's builtin.jq follows jq's inefficient defs of index/1 and rindex/1, though jq's builtin.jq has a "TODO:" note about optimization.

The following is a "jq-only" approach to rectifying the problem logically (i.e., without regard to implementation issues arising from memory management). It is relatively complex because it attempts to replicate the existing functionality exactly, and to replicate or improve jq's error messages.

For example, consider the bizarre error message produced by:

jq -n '{a:10} | index("a")'
jq: error (at <unknown>): Cannot index number with number

The new error message would be:

jq: error (at <unknown>): cannot index object with string "a"

index/1

# Both input and $x are assumed to be of the same type, either strings or arrays
def _index_in($x):
  . as $in
  | ($x | length) as $xl
  | first( if type == "string"
           then if $x == "" then null # for conformity with jq
                else range(0; 1 + length - $xl) | select( $in[. : ] | startswith($x))
                end
           else      range(0; 1 + length - $xl) | select( $in[. : .+$xl ] == $x)
           end )
    // null;

def index($x):
  def s: # for the error message
    if type == "object" or type == "array" then ""
    elif type == "string" then " \"\(.)\""
    else " \(.)"
    end;
  def ix($y):
    . as $in
    | first( range(0; length) | select($in[.] == $y) ) // null;

  ($x|type) as $xt
  | type as $t
  | if $xt == $t
    then if ($xt == "array" and ($x|length) == 1) then ix($x[0])
         elif $t == "string" or $t == "array" then _index_in($x)
         else "cannot index \($t) with \($xt)\($x|s)" | error
         end
    elif $t == "array" then ix($x)
    else "cannot index \($t) with \($xt)\($x|s)" | error
    end ;
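The key efficiency point, stopping at the first match rather than computing every index first (jq's builtin.jq defines index($i) as indices($i) | .[0]), can be sketched for arrays in Go. indexOf is a hypothetical helper and needs Go 1.18+ for generics; like _index_in above for arrays, an empty sub matches at position 0.

```go
package main

import "fmt"

// indexOf returns the position of the first occurrence of sub in xs, or -1.
// It returns as soon as a match is found instead of collecting all matches.
func indexOf[T comparable](xs, sub []T) int {
outer:
	for i := 0; i+len(sub) <= len(xs); i++ {
		for j := range sub {
			if xs[i+j] != sub[j] {
				continue outer // mismatch: try the next start position
			}
		}
		return i
	}
	return -1
}

func main() {
	fmt.Println(indexOf([]int{0, 1, 2, 3, 4}, []int{2, 3}))
	fmt.Println(indexOf([]int{0, 1, 2}, []int{2, 3}))
}
```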

Testing and comparing

Here are some test cases, which should be run after changing def index above to def myindex.

def test($in; $x):
  ([$in,$x] | debug) as $debug
  | try ($in | index($x)) catch . ,
    try ($in | myindex($x)) catch .,
    "" ;

test([0,1,2];2),
test([0,1,2];[2]),
test([0,1,2,3,4];[2,3]),
test([0,1,2];[2,3]),
test("abcdef"; []),
test("abcdef"; ""),
test("abcdef"; "c"),
test("abcdef"; "cd"),
test({a:10}; 10),

# jq gives a bizarre error message:
test({a:10}; "a")

[bug]: include .... {"source": ...}; [discrepancy] default value of -L path

Unfortunately, the jq specifications regarding modules are quite complex at best, so here are the relevant bits regarding the default value of the -L path:

The default search path is the search path given to the -L command-line option, else ["~/.jq", "$ORIGIN/../lib/jq", "$ORIGIN/../lib"].

where:

"For paths starting with "$ORIGIN/", the path of the jq executable is substituted for "$ORIGIN"."

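The two substitutions in that rule can be sketched in Go. This is an illustration of the quoted manual text, not jq's or gojq's code; expandSearchPath is a hypothetical helper, and it interprets "the path of the jq executable" as the directory containing it, which is what "$ORIGIN/../lib" suggests.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// expandSearchPath expands a leading "~/" to home and a leading "$ORIGIN/"
// to origin (the executable's directory). filepath.Join also cleans "..".
func expandSearchPath(p, home, origin string) string {
	switch {
	case strings.HasPrefix(p, "~/"):
		return filepath.Join(home, p[len("~/"):])
	case strings.HasPrefix(p, "$ORIGIN/"):
		return filepath.Join(origin, p[len("$ORIGIN/"):])
	}
	return p
}

func main() {
	home, err := os.UserHomeDir()
	if err != nil {
		panic(err)
	}
	exe, err := os.Executable()
	if err != nil {
		panic(err)
	}
	origin := filepath.Dir(exe)
	for _, p := range []string{"~/.jq", "$ORIGIN/../lib/jq", "$ORIGIN/../lib"} {
		fmt.Println(expandSearchPath(p, home, origin))
	}
}
```
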
To see the issues, consider this transcript:

$ cd ~/github

$ cat lib/jq/foo.jq
def foo: "This is ~/github/lib/jq/foo.jq speaking";

$ cat lib/gojq/foo.jq
def foo: "This is ~/github/lib/gojq/foo";

$ cat lib/gojq/foo.gojq
def foo: "This is ~/github/lib/gojq/foo.gojq";

jq works as advertised:

$ jqMaster/jq -nM 'include "foo"; foo'
"This is ~/github/lib/jq/foo.jq speaking"

$ jqMaster/jq -nM 'include "foo" {"search": "~/github/lib/jq"}; foo'
"This is ~/github/lib/jq/foo.jq speaking"

But ...

$ gojq/gojq -nM 'include "foo"; foo'
gojq: compile error: module not found: "foo"

$ gojq/gojq -nM 'include "foo" {"search": "~/github/lib/gojq"}; foo'
gojq: compile error: module not found: "foo"

Checking the files are still there:

$ file lib/gojq/foo.gojq
lib/gojq/foo.gojq: ASCII text

$ file lib/gojq/foo.jq
lib/gojq/foo.jq: ASCII text

So it looks like the "bug" regarding source is real, but maybe gojq's default search path is intentionally different?

[ER]: integer arithmetic builtins: isqrt and friends

The following function definitions exploit the unbounded-precision of integer arithmetic in gojq but are compatible with jq.

That is, if they were added as built-ins to gojq, a stedolan/jq user could simply copy-and-paste them into their programs.

Also, chances are that if filters with the same names and arities as these were added to stedolan/jq in the future, they would have essentially the same semantics, so it is unlikely there would ever be much of a conflict between jq and gojq.

If a user's jq program defines conflicting filters, and if that user used gojq, then the following defs would simply be overridden, so there would be no issue in that case either. Even if such a user wanted to use their private defs and the same-name-same-arity gojq-defined def in the same program, that would be possible, with a tiny amount of effort.

# If $j is 0, then an error condition is raised;
# otherwise, assuming infinite-precision integer arithmetic,
# if the input and $j are integers, then the result will be an integer.
def idivide($j):
  . as $i
  | ($i % $j) as $mod
  | ($i - $mod) / $j ;

# input should be a non-negative integer for accuracy
# but may be any non-negative finite number
def isqrt:
  def irt:
  . as $x
    | 1 | until(. > $x; . * 4) as $q
    | {$q, $x, r: 0}
    | until( .q <= 1;
        .q |= idivide(4)
        | .t = .x - .r - .q
        | .r |= idivide(2)
        | if .t >= 0
          then .x = .t
          | .r += .q
          else .
          end)
    | .r ;
  if type == "number" and (isinfinite|not) and (isnan|not) and . >= 0
  then irt
  else "isqrt requires a non-negative integer for accuracy" | error
  end ;

# It is assumed that $n >= 0 is an integer
# . ^ $n
def power($n):
  . as $a
  | {p:1, $a, $n}
  | until (.n == 0;
       if (.n % 2) == 1 then .p = (.p*.a) else . end
       | .n |= idivide(2)
       | .a |= ((.*.)) )
  | .p;

Array slice difference to jq when using non-integer indexes

I found this difference by accident. It seems that jq floors the start index and ceils the stop index, while gojq floors both.

# version 728837713fb78c6ab6d83180e0c7659693f84d25
$ gojq -n '[0,1,2,3][1.9:2.9]'
[
  1
]

# version jq-1.6-159-gcff5336
$ jq -n '[0,1,2,3][1.9:2.9]'
[
  1,
  2
]

Behavior difference in using keywords as function arguments

jq allows us to use a keyword as a (non-variable) function argument, although I'm not sure if there's a way to use that. gojq, on the other hand, throws an error.

$ jq -n 'def f(true): true; . | f(.)'
true
$ gojq -n 'def f(true): true; . | f(.)'
gojq: invalid query: def f(true): true; . | f(.)
    def f(true): true; . | f(.)
          ^  unexpected token "true"
[2]    519266 exit 3     gojq -n 'def f(true): true; . | f(.)'
$ gojq --version
gojq 0.12.6 (rev: HEAD/go1.17.6)
