Golang port of simdjson: parsing gigabytes of JSON per second

simdjson-go

Introduction

This is a Golang port of simdjson, a high performance JSON parser developed by Daniel Lemire and Geoff Langdale. It makes extensive use of SIMD instructions to achieve parsing performance of gigabytes of JSON per second.

Performance-wise, simdjson-go runs on average at about 40% to 60% of the speed of simdjson. Compared to Golang's standard package encoding/json, simdjson-go is about 10x faster on average (between 2.71x and 16.45x on the benchmark files listed below).

Documentation

Features

simdjson-go is a validating parser: among other things, it validates and checks numerical values, booleans, etc. As a result, these values are available in their appropriate int and float64 representations after parsing.

Additionally, simdjson-go has the following features:

  • No 4 GB object limit
  • Support for ndjson (newline delimited json)
  • Pure Go (no need for cgo)

Requirements

simdjson-go has the following requirements for parsing:

A CPU with both AVX2 and CLMUL is required (for Intel, Haswell (2013) or later should do; for AMD, a Ryzen/EPYC CPU (Q1 2017) or later should be sufficient). Support can be checked using the provided SupportedCPU() function.

The package does not provide a fallback for unsupported CPUs, but serialized data can be deserialized on an unsupported CPU.

When built with gccgo, SupportedCPU() will always report an unsupported CPU, since gccgo cannot compile the Go assembly.

Usage

Run the following command to install simdjson-go:

go get -u github.com/minio/simdjson-go

To parse a JSON byte stream, call either simdjson.Parse() or, for newline delimited JSON files, simdjson.ParseND(). Both functions return a ParsedJson struct; call its Iter() method to obtain an iterator for navigating the JSON document.

With an Iter, you can call Advance() to iterate over the tape, like so:

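// iter is obtained from ParsedJson.Iter(); key holds the key to look up.
var (
    tmp  *simdjson.Iter
    obj  *simdjson.Object
    elem simdjson.Element
    err  error
)
for {
    typ := iter.Advance()

    switch typ {
    case simdjson.TypeRoot:
        // Step into the root element.
        if typ, tmp, err = iter.Root(tmp); err != nil {
            return
        }

        if typ == simdjson.TypeObject {
            if obj, err = tmp.Object(obj); err != nil {
                return
            }

            // FindKey returns nil if the key is not present.
            e := obj.FindKey(key, &elem)
            if e != nil && e.Type == simdjson.TypeString {
                v, _ := e.Iter.StringBytes()
                fmt.Println(string(v))
            }
        }

    default:
        // TypeNone (end of tape) or an unexpected type.
        return
    }
}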

Advancing the Iter returns the type of the next queued element.

Each type has helpers to access its data. Once you have a type, use the following to access the data:

Type        Action on Iter
TypeNone    Nothing follows. Iter done
TypeNull    Null value
TypeString  String()/StringBytes()
TypeInt     Int()/Float()
TypeUint    Uint()/Float()
TypeFloat   Float()
TypeBool    Bool()
TypeObject  Object()
TypeArray   Array()
TypeRoot    Root()

You can also get the next value as an interface{} using the Interface() method.

Note that arrays and objects that are null are always returned as TypeNull.

The complex types (objects and arrays) return helpers for parsing each of their underlying structures.

It is up to you to keep track of the nesting level you are operating at.

For any Iter it is possible to marshal the recursive content of the Iter using MarshalJSON() or MarshalJSONBuffer(...).

Currently, it is not possible to unmarshal into structs.

Parsing Objects

If you are only interested in one key in an object, you can use FindKey to select it quickly.

An object can be traversed manually using NextElement(dst *Iter) (name string, t Type, err error). It returns the key of the element as a string and the type of the value, and the provided Iter is set to an iterator that gives access to the value's content.

NextElementBytes provides the same functionality without the need to allocate a string for the key.

All elements of the object can be retrieved using the lightweight Parse method, which provides a map of all keys and all elements as a slice.

All elements of the object can be returned as map[string]interface{} using the Map method on the object. This will naturally perform allocations for all elements.

Parsing Arrays

Arrays in JSON can have mixed types. To iterate over an array with mixed types, use the Iter method to get an iterator.

There are also methods that retrieve all elements as a single type: []int64, []uint64, []float64 and []string.

Number parsing

Numbers in JSON are untyped and are returned according to the following rules, applied in order:

  • If there is any floating-point notation, such as an exponent or a decimal point, the number is always returned as a float64.
  • If the number is a pure integer and fits within an int64, it is returned as such.
  • If the number is a pure positive integer and fits within a uint64, it is returned as such.
  • Otherwise, if the number is valid, it is returned as a float64.

If a number was converted from integer notation to a float because it did not fit inside an int64 or uint64, the FloatOverflowedInteger flag is set; it can be retrieved using the (Iter).FloatFlags() method.
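To make the order of the rules concrete, here is a small stdlib-only sketch that applies them to a number literal assumed to be already validated. It uses strconv and is not simdjson-go's number parser; the function name classifyNumber is hypothetical. The second return value marks the case that the FloatOverflowedInteger flag reports.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// classifyNumber applies the number rules, in order, to an already-validated
// JSON number literal. Illustrative sketch only (hypothetical helper).
// overflowed is true when an integer literal had to fall back to float64.
func classifyNumber(lit string) (kind string, overflowed bool) {
	// Rule 1: a decimal point or an exponent always yields a float.
	if strings.ContainsAny(lit, ".eE") {
		return "float64", false
	}
	// Rule 2: a pure integer that fits in int64.
	if _, err := strconv.ParseInt(lit, 10, 64); err == nil {
		return "int64", false
	}
	// Rule 3: a pure non-negative integer that fits in uint64.
	// ParseUint rejects a leading '-', so negative values skip this rule.
	if _, err := strconv.ParseUint(lit, 10, 64); err == nil {
		return "uint64", false
	}
	// Rule 4: any other valid number becomes a float64.
	return "float64", true
}

func main() {
	for _, lit := range []string{"42", "-7", "1e3", "18446744073709551615", "27670116110564327426"} {
		kind, overflowed := classifyNumber(lit)
		fmt.Println(lit, kind, overflowed)
	}
}
```

The last literal is larger than the maximum uint64 (18446744073709551615), so it lands in rule 4 with the overflow marker set.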

JSON numbers follow JavaScript’s double-precision floating-point format. They:

  • Are represented in base 10 with no superfluous leading zeros (e.g. 67, 1, 100).
  • Include digits between 0 and 9.
  • Can be negative (e.g. -10).
  • Can be a fraction (e.g. 0.5); a digit is required before the decimal point, so .5 is not valid.
  • Can have an exponent of 10, prefixed by e or E with an optional plus or minus sign to indicate positive or negative exponentiation.
  • Cannot be written in octal or hexadecimal format.
  • Cannot have a value of NaN (Not A Number) or Infinity.

Parsing an NDJSON stream

Newline delimited JSON is sent as packets, with each line being a root element.

Here is an example that counts the number of "Make": "HOND" entries in NDJSON similar to this:

{"Age":20, "Make": "HOND"}
{"Age":22, "Make": "TLSA"}

import (
	"bytes"
	"fmt"
	"io"
	"log"

	"github.com/minio/simdjson-go"
)

// findHondas counts the records whose "Make" key equals "HOND".
func findHondas(r io.Reader) {
	// Temporary values, reused across records to avoid allocations.
	var tmpO simdjson.Object
	var tmpE simdjson.Element
	var tmpI simdjson.Iter
	var nFound int

	// Communication: parsed blocks arrive on res; processed blocks are
	// handed back on reuse so their memory can be recycled.
	reuse := make(chan *simdjson.ParsedJson, 10)
	res := make(chan simdjson.Stream, 10)

	simdjson.ParseNDStream(r, res, reuse)
	// Read results in blocks...
	for got := range res {
		if got.Error != nil {
			if got.Error == io.EOF {
				break
			}
			log.Fatal(got.Error)
		}

		all := got.Value.Iter()
		// NDJSON is separated by root objects.
		for all.Advance() == simdjson.TypeRoot {
			// Read inside root.
			t, i, err := all.Root(&tmpI)
			if err != nil {
				log.Println("got err", err)
				continue
			}
			if t != simdjson.TypeObject {
				log.Println("got type", t.String())
				continue
			}

			// Prepare object.
			obj, err := i.Object(&tmpO)
			if err != nil {
				log.Println("got err", err)
				continue
			}

			// Find Make key; FindKey returns nil if the key is absent.
			elem := obj.FindKey("Make", &tmpE)
			if elem == nil || elem.Type != simdjson.TypeString {
				log.Println("Make key missing or not a string")
				continue
			}

			// Get value as bytes.
			asB, err := elem.Iter.StringBytes()
			if err != nil {
				log.Println("got err", err)
				continue
			}
			if bytes.Equal(asB, []byte("HOND")) {
				nFound++
			}
		}
		// Hand the block back for reuse.
		reuse <- got.Value
	}
	fmt.Println("Found", nFound, "Hondas")
}

More examples can be found in the examples subdirectory and further documentation can be found at godoc.

Serializing parsed json

It is possible to serialize parsed JSON for more compact storage and faster load time.

To create a new serializer, use NewSerializer. A serializer can be reused for several JSON blocks.

The serializer provides string deduplication and compression of elements. This can be fine-tuned using the CompressMode setting.

To serialize a block of parsed data, use the Serialize method.

To read the data back, use the Deserialize method. The compression mode does not need to match when deserializing, since it is read from the stream.

Example of speed for serializer/deserializer on parking-citations-1M.

Compress Mode   % of JSON size   Serialize Speed   Deserialize Speed
None            177.26%          425.70 MB/s       2334.33 MB/s
Fast             17.20%          412.75 MB/s       1234.76 MB/s
Default          16.85%          411.59 MB/s       1242.09 MB/s
Best             10.91%          337.17 MB/s        806.23 MB/s

Depending on the input, the differences in speed and compression can be larger.

Performance vs simdjson

Based on the same set of JSON test files, the graph below shows a comparison between simdjson and simdjson-go.

(Figure: simdjson vs simdjson-go comparison)

These numbers were measured on a MacBook Pro equipped with a 3.1 GHz Intel Core i7. Also, to make it a fair comparison, the constant GOLANG_NUMBER_PARSING was set to false (default is true) in order to use the same number parsing function (which is faster at the expense of some precision; see more below).

In addition, the constant ALWAYS_COPY_STRINGS was set to false (default is true) for non-streaming use cases where the full JSON message is kept in memory (similar to the simdjson behaviour).

Performance vs encoding/json and json-iterator/go

Below is a performance comparison to Golang's standard package encoding/json based on the same set of JSON test files.

$ benchcmp                    encoding_json.txt      simdjson-go.txt
benchmark                     old MB/s               new MB/s         speedup
BenchmarkApache_builds-8      106.77                  948.75           8.89x
BenchmarkCanada-8              54.39                  519.85           9.56x
BenchmarkCitm_catalog-8       100.44                 1565.28          15.58x
BenchmarkGithub_events-8      159.49                  848.88           5.32x
BenchmarkGsoc_2018-8          152.93                 2515.59          16.45x
BenchmarkInstruments-8         82.82                  811.61           9.80x
BenchmarkMarine_ik-8           48.12                  422.43           8.78x
BenchmarkMesh-8                49.38                  371.39           7.52x
BenchmarkMesh_pretty-8         73.10                  784.89          10.74x
BenchmarkNumbers-8            160.69                  434.85           2.71x
BenchmarkRandom-8              66.56                  615.12           9.24x
BenchmarkTwitter-8             79.05                 1193.47          15.10x
BenchmarkTwitterescaped-8      83.96                  536.19           6.39x
BenchmarkUpdate_center-8       73.92                  860.52          11.64x

In addition, simdjson-go uses less additional memory and makes fewer allocations.

Here is another benchmark comparison to json-iterator/go:

$ benchcmp                    json-iterator.txt      simdjson-go.txt
benchmark                     old MB/s               new MB/s         speedup
BenchmarkApache_builds-8      154.65                  948.75           6.13x
BenchmarkCanada-8              40.34                  519.85          12.89x
BenchmarkCitm_catalog-8       183.69                 1565.28           8.52x
BenchmarkGithub_events-8      170.77                  848.88           4.97x
BenchmarkGsoc_2018-8          225.13                 2515.59          11.17x
BenchmarkInstruments-8        120.39                  811.61           6.74x
BenchmarkMarine_ik-8           61.71                  422.43           6.85x
BenchmarkMesh-8                50.66                  371.39           7.33x
BenchmarkMesh_pretty-8         90.36                  784.89           8.69x
BenchmarkNumbers-8             52.61                  434.85           8.27x
BenchmarkRandom-8              85.87                  615.12           7.16x
BenchmarkTwitter-8            139.57                 1193.47           8.55x
BenchmarkTwitterescaped-8     102.28                  536.19           5.24x
BenchmarkUpdate_center-8      101.41                  860.52           8.49x

AVX512 Acceleration

Stage 1 has been optimized using AVX512 instructions. Under full CPU load (8 threads), the AVX512 code is about 1 GB/sec (15%) faster than the AVX2 code.

benchmark                                   AVX2 MB/s    AVX512 MB/s     speedup
BenchmarkFindStructuralBitsParallelLoop      7225.24      8302.96         1.15x

These benchmarks were generated on a c5.2xlarge EC2 instance with a Xeon Platinum 8124M CPU at 3.0 GHz.

Design

simdjson-go follows the same two-stage design as simdjson. During the first stage, the structural elements ({, }, [, ], :, and ,) are detected and forwarded as offsets in the message buffer to the second stage. The second stage builds a tape format of the structure of the JSON document.

Note that, in contrast to simdjson, simdjson-go passes uint32 increments (rather than absolute offsets) to the second stage. This allows arbitrarily large JSON files to be parsed, as long as a single (string) element does not exceed 4 GB.

For better performance, both stages also run concurrently as separate goroutines, with a Go channel used to communicate between them.

Stage 1

Stage 1 has been converted from the original C code (containing the SIMD intrinsics) to Go assembly using c2goasm. It consists of five steps:

  • find_odd_backslash_sequences: detect backslash characters used to escape quotes
  • find_quote_mask_and_bits: generate a mask with bits turned on for characters between quotes
  • find_whitespace_and_structurals: generate a mask for whitespace plus a mask for the structural characters
  • finalize_structurals: combine the masks computed above into a final mask where each active bit represents the position of a structural character in the input message.
  • flatten_bits_incremental: output the active bits in the final mask as incremental offsets.

For more details you can take a look at the various test cases in find_subroutines_amd64_test.go to see how the individual routines can be invoked (typically with a 64 byte input buffer that generates one or more 64-bit masks).

There is one final routine, find_structural_bits_in_slice, that ties it all together and is invoked with a slice of the message buffer in order to find the incremental offsets.

Stage 2

During Stage 2 the tape structure is constructed. It is essentially a single function that jumps around as it finds the various structural characters and builds the hierarchy of the JSON document that it processes. The values of the JSON elements such as strings, integers, booleans etc. are parsed and written to the tape.

Any errors (such as an array not being closed or a missing closing brace) are detected and reported back as errors to the client.
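One class of these errors, unbalanced or mismatched brackets, can be detected with a simple stack over the structural characters that stage 1 delivers. The sketch below (checkNesting is a hypothetical name) is a simplified illustration and not simdjson-go's stage 2 state machine, which validates far more than nesting.

```go
package main

import "fmt"

// checkNesting reports the first close that does not match the innermost
// open, or an open that is never closed. Illustration only.
func checkNesting(structurals string) error {
	var stack []byte // expected closing characters, innermost last
	for i := 0; i < len(structurals); i++ {
		switch c := structurals[i]; c {
		case '{':
			stack = append(stack, '}')
		case '[':
			stack = append(stack, ']')
		case '}', ']':
			if len(stack) == 0 || stack[len(stack)-1] != c {
				return fmt.Errorf("unexpected %q at structural %d", c, i)
			}
			stack = stack[:len(stack)-1]
		}
	}
	if len(stack) > 0 {
		return fmt.Errorf("missing %q at end of input", stack[len(stack)-1])
	}
	return nil
}

func main() {
	fmt.Println(checkNesting("{:[,]}")) // well formed
	fmt.Println(checkNesting("{:[,}"))  // array closed by a brace
	fmt.Println(checkNesting("{:["))    // array and object never closed
}
```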

Tape format

Similarly to simdjson, simdjson-go parses the structure onto a 'tape' format. With this format it is possible to skip over arrays and (sub)objects as the sizes are recorded in the tape.

simdjson-go format is exactly the same as the simdjson tape format with the following 2 exceptions:

  • In order to support ndjson, it is possible to have more than one root element on the tape. Also, to allow for fast navigation over root elements, a root points to the next root element (and as such the last root element points 1 index past the length of the tape).

  • Strings are handled differently: unlike simdjson, the string size is not prepended in the String buffer but is added as an additional element to the tape itself (much like integers and floats).

    • In case ALWAYS_COPY_STRINGS is false: Only strings that contain special characters are copied to the String buffer in which case the payload from the tape is the offset into the String buffer. For string values without special characters the tape's payload points directly into the message buffer.
    • In case ALWAYS_COPY_STRINGS is true (default): Strings are always copied to the String buffer.

For more information, see TestStage2BuildTape in stage2_build_tape_test.go.

Non streaming use cases

The best performance is obtained by keeping the JSON message fully mapped in memory and setting the ALWAYS_COPY_STRINGS constant to false. This prevents duplicate copies of string values from being made, but mandates that the original JSON buffer is kept alive until the ParsedJson object is no longer needed (i.e., iteration over the tape format has been completed).

In case the JSON message buffer is freed earlier (or for streaming use cases where memory is reused) ALWAYS_COPY_STRINGS should be set to true (which is the default behaviour).
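Why the zero-copy mode requires the buffer to stay untouched can be shown with plain slices: a string value that points into the message buffer changes as soon as a streaming reader reuses that buffer, while a copied value does not. This stdlib sketch (with a hypothetical function, reuseDemo) only models the two ALWAYS_COPY_STRINGS behaviours; it does not call simdjson-go.

```go
package main

import "fmt"

// reuseDemo contrasts a zero-copy view into the message buffer (like
// ALWAYS_COPY_STRINGS=false) with an independent copy (like the default).
func reuseDemo() (viewed, copied string) {
	msg := []byte(`{"Make":"HOND"}`)
	view := msg[9:13]                            // points into msg
	owned := append([]byte(nil), msg[9:13]...)   // separate allocation

	// A streaming reader reuses the buffer for the next message.
	copy(msg, []byte(`{"Make":"TLSA"}`))

	return string(view), string(owned)
}

func main() {
	v, c := reuseDemo()
	fmt.Println("view:", v, "copy:", c)
}
```

The view now reports the second message's value, while the copy still holds the original one.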

Fuzz Tests

simdjson-go has been extensively fuzz tested to ensure that input cannot generate crashes and that output matches the standard library.

The fuzzers and corpus are contained in a separate repository at github.com/minio/simdjson-fuzz

The repo contains information on how to run them.

License

simdjson-go is released under the Apache License v2.0. You can find the complete text in the file LICENSE.

Contributing

Contributions are welcome, please send PRs for any enhancements.

If your PR includes parsing changes, please run the fuzzers for a couple of hours.

Owner
High Performance, Kubernetes Native Object Storage
Comments
  • checkptr: unsafe pointer arithmetic

    I found a previous issue with this error that was marked as resolved: https://github.com/minio/simdjson-go/issues/13.

    I've recently introduced simdjson-go to parse LSIF input for Sourcegraph in https://github.com/sourcegraph/sourcegraph/pull/11703. Running go test -race (in the enterprise/cmd/precise-code-intel-worker/internal/correlation directory) on this PR results in a checkptr exception:

    $ go test -race
    fatal error: checkptr: unsafe pointer arithmetic
    
    goroutine 53 [running]:
    runtime.throw(0x13b23cc, 0x23)
    	/usr/local/go/src/runtime/panic.go:1116 +0x72 fp=0xc000c95a78 sp=0xc000c95a48 pc=0x10749a2
    runtime.checkptrArithmetic(0xc000d80002, 0x0, 0x0, 0x0)
    	/usr/local/go/src/runtime/checkptr.go:43 +0xb5 fp=0xc000c95aa8 sp=0xc000c95a78 pc=0x1047115
    github.com/minio/simdjson-go.parse_string_simd_validate_only(0xc000d80001, 0x12a7, 0xa003ff, 0xc000c95df8, 0xc000c95b58, 0xc000c95b4f, 0x0)
    	/Users/efritz/go/pkg/mod/github.com/minio/[email protected]/parse_string_amd64.go:39 +0x6d fp=0xc000c95b08 sp=0xc000c95aa8 pc=0x126ccdd
    github.com/minio/simdjson-go.parse_string(0xc000d20000, 0x1, 0x4, 0x0)
    	/Users/efritz/go/pkg/mod/github.com/minio/[email protected]/stage2_build_tape_amd64.go:90 +0x1a9 fp=0xc000c95de8 sp=0xc000c95b08 pc=0x127c349
    github.com/minio/simdjson-go.unified_machine(0xc000d80000, 0x12a8, 0xa00400, 0xc000d20000, 0x0)
    	/Users/efritz/go/pkg/mod/github.com/minio/[email protected]/stage2_build_tape_amd64.go:249 +0x2775 fp=0xc000c95f68 sp=0xc000c95de8 pc=0x1280185
    github.com/minio/simdjson-go.(*internalParsedJson).parseMessageInternal.func2(0xc000d20000, 0xc0000a4200, 0xc00001a130)
    	/Users/efritz/go/pkg/mod/github.com/minio/[email protected]/parse_json_amd64.go:89 +0x67 fp=0xc000c95fc8 sp=0xc000c95f68 pc=0x12828c7
    runtime.goexit()
    	/usr/local/go/src/runtime/asm_amd64.s:1373 +0x1 fp=0xc000c95fd0 sp=0xc000c95fc8 pc=0x10a7f31
    created by github.com/minio/simdjson-go.(*internalParsedJson).parseMessageInternal
    	/Users/efritz/go/pkg/mod/github.com/minio/[email protected]/parse_json_amd64.go:88 +0x2c4
    
    goroutine 1 [chan receive]:
    testing.(*T).Run(0xc0000ee120, 0x13abca8, 0xd, 0x13b7120, 0x1)
    	/usr/local/go/src/testing/testing.go:1043 +0x699
    testing.runTests.func1(0xc0000ee120)
    	/usr/local/go/src/testing/testing.go:1284 +0xa7
    testing.tRunner(0xc0000ee120, 0xc0000c7d50)
    	/usr/local/go/src/testing/testing.go:991 +0x1ec
    testing.runTests(0xc0000be3e0, 0x15e9360, 0x9, 0x9, 0x0)
    	/usr/local/go/src/testing/testing.go:1282 +0x528
    testing.(*M).Run(0xc000138000, 0x0)
    	/usr/local/go/src/testing/testing.go:1199 +0x300
    main.main()
    	_testmain.go:60 +0x224
    
    goroutine 25 [chan receive]:
    github.com/sourcegraph/sourcegraph/enterprise/cmd/precise-code-intel-worker/internal/correlation.correlateFromReader(0x1408c00, 0xc000151740, 0x13a9e94, 0x4, 0x0, 0x0, 0x0)
    	/Users/efritz/dev/sourcegraph/sourcegraph/enterprise/cmd/precise-code-intel-worker/internal/correlation/correlate.go:58 +0x6f9
    github.com/sourcegraph/sourcegraph/enterprise/cmd/precise-code-intel-worker/internal/correlation.TestCorrelate(0xc0000ee900)
    	/Users/efritz/dev/sourcegraph/sourcegraph/enterprise/cmd/precise-code-intel-worker/internal/correlation/correlate_test.go:19 +0x1a4
    testing.tRunner(0xc0000ee900, 0x13b7120)
    	/usr/local/go/src/testing/testing.go:991 +0x1ec
    created by testing.(*T).Run
    	/usr/local/go/src/testing/testing.go:1042 +0x661
    
    goroutine 29 [chan receive]:
    github.com/minio/simdjson-go.ParseNDStream.func3(0xc00016a840, 0xc00016a900)
    	/Users/efritz/go/pkg/mod/github.com/minio/[email protected]/simdjson_amd64.go:128 +0x107
    created by github.com/minio/simdjson-go.ParseNDStream
    	/Users/efritz/go/pkg/mod/github.com/minio/[email protected]/simdjson_amd64.go:123 +0x186
    
    goroutine 30 [chan send]:
    github.com/minio/simdjson-go.queueError(...)
    	/Users/efritz/go/pkg/mod/github.com/minio/[email protected]/simdjson_amd64.go:209
    github.com/minio/simdjson-go.ParseNDStream.func4(0xc00016a900, 0xc000151770, 0xc00016a8a0, 0xc0000b0180)
    	/Users/efritz/go/pkg/mod/github.com/minio/[email protected]/simdjson_amd64.go:199 +0x409
    created by github.com/minio/simdjson-go.ParseNDStream
    	/Users/efritz/go/pkg/mod/github.com/minio/[email protected]/simdjson_amd64.go:142 +0x1c9
    
    goroutine 50 [chan receive]:
    github.com/sourcegraph/sourcegraph/enterprise/cmd/precise-code-intel-worker/internal/correlation/lsif.Read.func1(0xc000d00060, 0xc00016a840, 0x140c360, 0xc0000999c0)
    	/Users/efritz/dev/sourcegraph/sourcegraph/enterprise/cmd/precise-code-intel-worker/internal/correlation/lsif/reader.go:37 +0x5bb
    created by github.com/sourcegraph/sourcegraph/enterprise/cmd/precise-code-intel-worker/internal/correlation/lsif.Read
    	/Users/efritz/dev/sourcegraph/sourcegraph/enterprise/cmd/precise-code-intel-worker/internal/correlation/lsif/reader.go:34 +0xe9
    
    goroutine 51 [semacquire]:
    sync.runtime_Semacquire(0xc00001a138)
    	/usr/local/go/src/runtime/sema.go:56 +0x42
    sync.(*WaitGroup).Wait(0xc00001a130)
    	/usr/local/go/src/sync/waitgroup.go:130 +0xd4
    github.com/minio/simdjson-go.(*internalParsedJson).parseMessageInternal(0xc000d20000, 0xc000d80000, 0x12a9, 0xa00400, 0x1, 0x0, 0x0)
    	/Users/efritz/go/pkg/mod/github.com/minio/[email protected]/parse_json_amd64.go:98 +0x2d2
    github.com/minio/simdjson-go.(*internalParsedJson).parseMessageNdjson(...)
    	/Users/efritz/go/pkg/mod/github.com/minio/[email protected]/parse_json_amd64.go:56
    github.com/minio/simdjson-go.ParseNDStream.func4.1(0xc0000b0180, 0xc000151770, 0xc000d80000, 0x12a9, 0xa00400, 0xc000022060)
    	/Users/efritz/go/pkg/mod/github.com/minio/[email protected]/simdjson_amd64.go:180 +0xf2
    created by github.com/minio/simdjson-go.ParseNDStream.func4
    	/Users/efritz/go/pkg/mod/github.com/minio/[email protected]/simdjson_amd64.go:168 +0x348
    exit status 2
    FAIL	github.com/sourcegraph/sourcegraph/enterprise/cmd/precise-code-intel-worker/internal/correlation	0.237s
    

    Go version:

    $ go version
    go version go1.14.3 darwin/amd64
    

    I'm fairly certain that I'm not misusing the library. At first I thought it may be a concurrency issue, but after simplifying the reading off of the parsed stream the issue still occurs. The files enterprise/cmd/precise-code-intel-worker/internal/correlation/lsif/reader.go and enterprise/cmd/precise-code-intel-worker/internal/correlation/lsif/unmarshal.go are the only files with code relevant to this failure. (cc @klauspost - this project was recommended from this reddit post).

    Let me know if I can provide additional context.

  • Benchmarks are misleading

    The benchmarks in benchmarks_test.go aren't really comparing apples to apples: the simdjson benchmark is only parsing the JSON, while the other benchmarks are parsing it and building an interface{}-typed object to represent the data.

  • overflowing uint64 being parsed as float64

    I'm using 3d975b7 as the last commit and here's how to reproduce the issue:

    package main
    
    import (
    	"fmt"
    
    	"github.com/minio/simdjson-go"
    )
    
    func main() {
    	data := []byte(`{
    		"number": 27670116110564327426
    	}`)
    
    	parsed, err := simdjson.Parse(data, nil)
    	if err != nil {
    		panic(err)
    	}
    
    	iter := parsed.Iter()
    	iter.Advance()
    
    	tmp := &simdjson.Iter{}
    	obj := &simdjson.Object{}
    
    	_, tmp, err = iter.Root(tmp)
    	if err != nil {
    		panic(err)
    	}
    
    	obj, err = tmp.Object(obj)
    	if err != nil {
    		panic(err)
    	}
    
    	// convert to map[string]interface{}
    	m, err := obj.Map(nil)
    	if err != nil {
    		panic(err)
    	}
    
    	// prints: 2.7670116110564327e+19
    	//
    	// should overflow uint64
    	fmt.Println(m["number"])
    }
    

    Interestingly, I found an article from 2018 where Dgraph was making the same mistake.

  • Build error with gccgo (10.2.0)

    When trying to build the current (ef1a2b46ad3790965aad804c5b82bab2033cacce) revision:

    $ go version
    go version go1.14.4 gccgo (GCC) 10.2.0 linux/amd64
    
    $ go get -v github.com/minio/simdjson-go
    github.com/minio/simdjson-go
    # github.com/minio/simdjson-go
    ../../../../../Go/src/github.com/minio/simdjson-go/finalize_structurals_amd64.s: Assembler messages:
    ../../../../../Go/src/github.com/minio/simdjson-go/finalize_structurals_amd64.s:6: Error: no such instruction: `text ·_finalize_structurals(SB),$0-48'
    ../../../../../Go/src/github.com/minio/simdjson-go/finalize_structurals_amd64.s:8: Error: junk `(FP)' after expression
    ../../../../../Go/src/github.com/minio/simdjson-go/finalize_structurals_amd64.s:8: Error: too many memory references for `movq'
    ../../../../../Go/src/github.com/minio/simdjson-go/finalize_structurals_amd64.s:9: Error: junk `(FP)' after expression
    ../../../../../Go/src/github.com/minio/simdjson-go/finalize_structurals_amd64.s:9: Error: too many memory references for `movq'
    ../../../../../Go/src/github.com/minio/simdjson-go/finalize_structurals_amd64.s:10: Error: junk `(FP)' after expression
    ../../../../../Go/src/github.com/minio/simdjson-go/finalize_structurals_amd64.s:10: Error: too many memory references for `movq'
    ../../../../../Go/src/github.com/minio/simdjson-go/finalize_structurals_amd64.s:11: Error: junk `(FP)' after expression
    ../../../../../Go/src/github.com/minio/simdjson-go/finalize_structurals_amd64.s:11: Error: too many memory references for `movq'
    ../../../../../Go/src/github.com/minio/simdjson-go/finalize_structurals_amd64.s:12: Error: junk `(FP)' after expression
    ../../../../../Go/src/github.com/minio/simdjson-go/finalize_structurals_amd64.s:12: Error: too many memory references for `movq'
    ../../../../../Go/src/github.com/minio/simdjson-go/finalize_structurals_amd64.s:14: Error: junk `(SB)' after expression
    ../../../../../Go/src/github.com/minio/simdjson-go/finalize_structurals_amd64.s:16: Error: too many memory references for `movq'
    ../../../../../Go/src/github.com/minio/simdjson-go/finalize_structurals_amd64.s:19: Error: no such instruction: `text ·__finalize_structurals(SB),$0'
    ../../../../../Go/src/github.com/minio/simdjson-go/finalize_structurals_amd64.s:21: Error: too many memory references for `andn'
    ../../../../../Go/src/github.com/minio/simdjson-go/finalize_structurals_amd64.s:22: Error: too many memory references for `or'
    ../../../../../Go/src/github.com/minio/simdjson-go/finalize_structurals_amd64.s:23: Error: too many memory references for `movq'
    ../../../../../Go/src/github.com/minio/simdjson-go/finalize_structurals_amd64.s:24: Error: too many memory references for `or'
    ../../../../../Go/src/github.com/minio/simdjson-go/finalize_structurals_amd64.s:25: Error: junk `(AX*1)' after expression
    ../../../../../Go/src/github.com/minio/simdjson-go/finalize_structurals_amd64.s:25: Error: too many memory references for `lea'
    ../../../../../Go/src/github.com/minio/simdjson-go/finalize_structurals_amd64.s:26: Error: too many memory references for `or'
    ../../../../../Go/src/github.com/minio/simdjson-go/finalize_structurals_amd64.s:28: Error: too many memory references for `movq'
    ../../../../../Go/src/github.com/minio/simdjson-go/finalize_structurals_amd64.s:30: Error: too many memory references for `andn'
    ../../../../../Go/src/github.com/minio/simdjson-go/finalize_structurals_amd64.s:31: Error: too many memory references for `and'
    ../../../../../Go/src/github.com/minio/simdjson-go/finalize_structurals_amd64.s:32: Error: too many memory references for `or'
    ../../../../../Go/src/github.com/minio/simdjson-go/finalize_structurals_amd64.s:34: Error: too many memory references for `or'
    ../../../../../Go/src/github.com/minio/simdjson-go/finalize_structurals_amd64.s:35: Error: too many memory references for `and'
    ../../../../../Go/src/github.com/minio/simdjson-go/finalize_structurals_amd64.s:38: Error: no such instruction: `text ·__finalize_structurals_avx512(SB),$0'
    ../../../../../Go/src/github.com/minio/simdjson-go/finalize_structurals_amd64.s:40: Error: too many memory references for `kmovq'
    ../../../../../Go/src/github.com/minio/simdjson-go/finalize_structurals_amd64.s:41: Error: too many memory references for `kmovq'
    ../../../../../Go/src/github.com/minio/simdjson-go/finalize_structurals_amd64.s:42: Error: too many memory references for `kmovq'
    ../../../../../Go/src/github.com/minio/simdjson-go/finalize_structurals_amd64.s:43: Error: too many memory references for `andn'
    ../../../../../Go/src/github.com/minio/simdjson-go/finalize_structurals_amd64.s:44: Error: too many memory references for `or'
    ../../../../../Go/src/github.com/minio/simdjson-go/finalize_structurals_amd64.s:45: Error: too many memory references for `movq'
    ../../../../../Go/src/github.com/minio/simdjson-go/finalize_structurals_amd64.s:46: Error: too many memory references for `or'
    ../../../../../Go/src/github.com/minio/simdjson-go/finalize_structurals_amd64.s:47: Error: junk `(AX*1)' after expression
    ../../../../../Go/src/github.com/minio/simdjson-go/finalize_structurals_amd64.s:47: Error: too many memory references for `lea'
    ../../../../../Go/src/github.com/minio/simdjson-go/finalize_structurals_amd64.s:48: Error: too many memory references for `or'
    ../../../../../Go/src/github.com/minio/simdjson-go/finalize_structurals_amd64.s:50: Error: too many memory references for `movq'
    ../../../../../Go/src/github.com/minio/simdjson-go/finalize_structurals_amd64.s:52: Error: too many memory references for `andn'
    ../../../../../Go/src/github.com/minio/simdjson-go/finalize_structurals_amd64.s:53: Error: too many memory references for `and'
    ../../../../../Go/src/github.com/minio/simdjson-go/finalize_structurals_amd64.s:54: Error: too many memory references for `or'
    ../../../../../Go/src/github.com/minio/simdjson-go/finalize_structurals_amd64.s:56: Error: too many memory references for `or'
    ../../../../../Go/src/github.com/minio/simdjson-go/finalize_structurals_amd64.s:57: Error: too many memory references for `and'
    
    $ cat ../../../../../Go/src/github.com/minio/simdjson-go/finalize_structurals_amd64.s
    //+build !noasm !appengine gc
    // AUTO-GENERATED BY C2GOASM -- DO NOT EDIT
    
    #include "common.h"
    
    TEXT ·_finalize_structurals(SB), $0-48
    
        MOVQ structurals_in+0(FP), DI
        MOVQ whitespace+8(FP), SI
        MOVQ quote_mask+16(FP), DX
        MOVQ quote_bits+24(FP), CX
        MOVQ prev_iter_ends_pseudo_pred+32(FP), R8
    
        CALL ·__finalize_structurals(SB)
    
        MOVQ AX, structurals+40(FP)
        RET
    
    TEXT ·__finalize_structurals(SB), $0
    
        ANDNQ DI, DX, DI             // andn    rdi, rdx, rdi
        ORQ  CX, DI                  // or    rdi, rcx
        MOVQ DI, AX                  // mov    rax, rdi
        ORQ  SI, AX                  // or    rax, rsi
        LEAQ (AX)(AX*1), R9          // lea    r9, [rax + rax]
        ORQ  (R8), R9                // or    r9, qword [r8]
        SHRQ $63, AX                 // shr    rax, 63
        MOVQ AX, (R8)                // mov    qword [r8], rax
        NOTQ SI                      // not    rsi
        ANDNQ SI, DX, AX             // andn    rax, rdx, rsi
        ANDQ R9, AX                  // and    rax, r9
        ORQ  DI, AX                  // or    rax, rdi
        NOTQ CX                      // not    rcx
        ORQ  DX, CX                  // or    rcx, rdx
        ANDQ CX, AX                  // and    rax, rcx
        RET
    
    TEXT ·__finalize_structurals_avx512(SB), $0
    
        KMOVQ K_WHITESPACE,  SI
        KMOVQ K_QUOTEBITS,   CX
        KMOVQ K_STRUCTURALS, DI
        ANDNQ DI, DX, DI             // andn    rdi, rdx, rdi
        ORQ  CX, DI                  // or    rdi, rcx
        MOVQ DI, AX                  // mov    rax, rdi
        ORQ  SI, AX                  // or    rax, rsi
        LEAQ (AX)(AX*1), R9          // lea    r9, [rax + rax]
        ORQ  (R8), R9                // or    r9, qword [r8]
        SHRQ $63, AX                 // shr    rax, 63
        MOVQ AX, (R8)                // mov    qword [r8], rax
        NOTQ SI                      // not    rsi
        ANDNQ SI, DX, AX             // andn    rax, rdx, rsi
        ANDQ R9, AX                  // and    rax, r9
        ORQ  DI, AX                  // or    rax, rdi
        NOTQ CX                      // not    rcx
        ORQ  DX, CX                  // or    rcx, rdx
        ANDQ CX, AX                  // and    rax, rcx
        RET
    

    CPU:

    Model name:                      Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
    

    Any more info I could provide?
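The `//+build !noasm !appengine gc` header at the top of that listing may explain why gccgo tries to assemble the file at all. In the legacy `+build` syntax, space-separated terms are ORed and comma-separated terms are ANDed, so the header reads as `!noasm || !appengine || gc`, which is satisfied under gccgo. The sketch below checks this with the standard `go/build/constraint` package (Go 1.16+); the tag set used for gccgo is a simplified assumption, not the exact list the toolchain sets.

```go
package main

import (
	"fmt"
	"go/build/constraint"
)

// evalLine parses a single build-constraint line and evaluates it
// against the given set of enabled tags.
func evalLine(line string, tags map[string]bool) (bool, error) {
	expr, err := constraint.Parse(line)
	if err != nil {
		return false, err
	}
	return expr.Eval(func(tag string) bool { return tags[tag] }), nil
}

func main() {
	// Simplified tag set for a gccgo build on linux/amd64:
	// "gccgo" is set, "gc" is not.
	gccgo := map[string]bool{"gccgo": true, "linux": true, "amd64": true}

	lines := []string{
		// Spaces mean OR: true as soon as noasm is not set.
		"// +build !noasm !appengine gc",
		// Commas mean AND: false because the gc tag is absent.
		"// +build !noasm,!appengine,gc",
	}
	for _, line := range lines {
		ok, err := evalLine(line, gccgo)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Printf("%-32s -> %v\n", line, ok)
	}
}
```

In the newer syntax, an AND of all three terms would be written `//go:build !noasm && !appengine && gc`.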

  • all: run asmfmt

    Run github.com/klauspost/asmfmt/cmd/asmfmt.

    Note that only the find_odd_backslash_sequences_amd64.s file was edited by hand after running asmfmt, because it uses Y8 and Y9 in place of the memory operands (DI) and (SI), marked with inline hints:

    	VPCMPEQB  Y8/*(DI)*/, Y0, Y1  // vpcmpeqb    ymm1, ymm0, yword [rdi]
    	VPMOVMSKB Y1, CX              // vpmovmskb    ecx, ymm1
    	VPCMPEQB  Y9/*(SI)*/, Y0, Y0  // vpcmpeqb    ymm0, ymm0, yword [rsi]
    
  • How to handle values containing escaped JSON

    Hi, I'm trying to parse JSON with the following structure:

    {
      "key" :  val,
      "list": [
                  {"key1": {\"key2\":val1}}
                  ...
              ]
    }
    

    I can't seem to find out how to get val1.
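As written, the sample is not valid JSON (`val` is unquoted, and `\"` cannot appear outside a string). Assuming the real input stores the inner document as a JSON string, e.g. `"key1": "{\"key2\":\"val1\"}"`, one way to reach `val1` is to parse twice: read the string value, which the first pass returns unescaped, then parse that string as JSON. The sketch below uses the standard library's encoding/json as a stand-in; it does not use simdjson-go, and the field names are taken from the question.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// nestedValue decodes doc, takes list[0].key1 as a string holding escaped
// JSON, and decodes that string a second time to read key2.
func nestedValue(doc []byte) (string, error) {
	var outer struct {
		List []map[string]string `json:"list"`
	}
	if err := json.Unmarshal(doc, &outer); err != nil {
		return "", err
	}
	if len(outer.List) == 0 {
		return "", errors.New("list is empty")
	}
	embedded, ok := outer.List[0]["key1"]
	if !ok {
		return "", errors.New("key1 not found")
	}
	// The first pass has already removed the escaping, so embedded is
	// plain JSON text such as {"key2":"val1"}.
	var inner map[string]string
	if err := json.Unmarshal([]byte(embedded), &inner); err != nil {
		return "", err
	}
	return inner["key2"], nil
}

func main() {
	doc := []byte(`{"key": 1, "list": [{"key1": "{\"key2\":\"val1\"}"}]}`)
	v, err := nestedValue(doc)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(v) // val1
}
```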

  • Modifying fields

    Can I modify a field value in the underlying byte slice if I know that it will have the same length or shorter? I have a field in JSON:

    "tmax": 120,
    

    and I need to reduce it by 30 ms before sending the modified JSON on:

    "tmax": 90,
    

    Here the new value is shorter (two digits instead of three), so I could just add a space after the comma to keep the JSON valid without allocating a new buffer and copying. As far as I understand, I might be able to do this with the raw tape, but I'm not sure.
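The padding idea can be sketched on a plain byte slice, independently of simdjson-go. This assumes the start and end offsets of the original number token are already known; how to obtain them from the parser is not shown, and the offsets in the example are hard-coded. Instead of moving the comma, the sketch pads with spaces directly after the new digits, which JSON allows just as well.

```go
package main

import (
	"fmt"
	"strconv"
)

// overwriteInt writes v as a decimal integer into buf[start:end], which is
// assumed to hold the original number token, and fills any leftover bytes
// with spaces. JSON permits whitespace between a value and the following
// ',' or '}', so the document stays valid and buf is never reallocated.
// It returns false, leaving buf untouched, if v needs more bytes than the
// original token occupied or the offsets are out of range.
func overwriteInt(buf []byte, start, end int, v int64) bool {
	if start < 0 || end > len(buf) || start > end {
		return false
	}
	digits := strconv.AppendInt(nil, v, 10)
	if len(digits) > end-start {
		return false
	}
	n := copy(buf[start:end], digits)
	for i := start + n; i < end; i++ {
		buf[i] = ' '
	}
	return true
}

func main() {
	doc := []byte(`{"tmax": 120, "id": "x"}`)
	// Hypothetical offsets of the token "120" in doc.
	start, end := 9, 12
	if overwriteInt(doc, start, end, 120-30) {
		fmt.Println(string(doc)) // {"tmax": 90 , "id": "x"}
	}
}
```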

  • Fix number parsing

    Remove GOLANG_NUMBER_PARSING, drop the imprecise parsing, and fix up the actual number parsing in Go.

    Previously, everything that looked like a number was accepted by default, and many errors were not caught.

    Uints are now actually used for numbers that are above the maximum int64, within the uint64 range, and have no floating-point markers.

    Even with all the additional checks, we are still faster:

    λ benchcmp before.txt after.txt
    benchmark                               old ns/op     new ns/op     delta
    BenchmarkParseNumber/Pos/63bit-32       91.9          75.9          -17.41%
    BenchmarkParseNumber/Neg/63bit-32       106           77.2          -27.17%
    BenchmarkParseNumberFloat-32            190           72.5          -61.84%
    BenchmarkParseNumberFloatExp-32         212           98.6          -53.49%
    BenchmarkParseNumberBig-32              401           175           -56.36%
    BenchmarkParseNumberRandomBits-32       420           230           -45.24%
    BenchmarkParseNumberRandomFloats-32     305           172           -43.61%
    

    ... and the full benchmarks:

    benchmark                                             old ns/op      new ns/op      delta
    BenchmarkApache_builds-32                             137091         139556         +1.80%
    BenchmarkCanada-32                                    30705862       19000003       -38.12%
    BenchmarkCitm_catalog-32                              1921474        2093471        +8.95%
    BenchmarkGithub_events-32                             77611          77873          +0.34%
    BenchmarkGsoc_2018-32                                 1220291        1215097        -0.43%
    BenchmarkInstruments-32                               366747         374568         +2.13%
    BenchmarkMarine_ik-32                                 27410259       18343775       -33.08%
    BenchmarkMesh-32                                      8200018        5896043        -28.10%
    BenchmarkMesh_pretty-32                               9793413        6947830        -29.06%
    BenchmarkNumbers-32                                   1967319        1213924        -38.30%
    BenchmarkRandom-32                                    1072071        1042956        -2.72%
    BenchmarkTwitter-32                                   645530         645529         -0.00%
    BenchmarkTwitterescaped-32                            1014456        1022548        +0.80%
    
  • Removing array elements

    Hi,

    We are currently looking at this library to achieve the following:

    • Quickly inspect the JSON
    • Remove elements from an inner array based on certain business rules
    • Emit the JSON buffer minus the filtered array elements

    Is this possible with this library? So far I have not found any methods to nil out an object or array element.

    Best,

    Dirk

  • Impact of GOAMD64 on simdjson-go

    Hi!

    Go 1.18 added the GOAMD64 build environment variable (https://tip.golang.org/doc/go1.18#amd64, https://github.com/golang/go/wiki/MinimumRequirements#amd64).

    On the surface, it would seem that GOAMD64=v3 or higher should affect the performance of this package, since those levels enable a number of additional SIMD instructions.

    Could you please verify that assumption? Is there any relationship between the environment variable and this project?

    Thanks,

  • Parse from io.Reader

    I'm using simdjson-go for the first time to parse a very big array. It's great and gave me a big performance boost. One question, though: is it a design decision to accept only a complete byte slice and not an io.Reader?

    Compared to encoding/json it is significantly faster, but it uses more memory, since the entire document has to be held in memory. It would be interesting to have a way around this, and I wanted to know whether that is even possible with how the library currently works.

  • Request to add more details to parsing errors

    It would be useful to get more debugging information when parsing fails, such as byte offsets or the affected section of the input. Errors like:

    Failed to find all structural indices for stage 1
    

    are not really helpful when parsing large amounts of data.

  • minor comment issue

    I was browsing the source and found a comment that seems to be cut off. It is a minor issue, but the explanation is incomplete: https://github.com/minio/simdjson-go/blob/9743702f0b3d3cdc35176eccb52dcdcc3dd4b807/stage2_build_tape_amd64.go#L93

Easy JSON parsing, stringifying, and accessing

Nov 23, 2021
Get JSON values quickly - JSON parser for Go

get json values quickly GJSON is a Go package that provides a fast and simple way to get values from a json document. It has features such as one line

Dec 28, 2022
JSON diff library for Go based on RFC6902 (JSON Patch)

jsondiff jsondiff is a Go package for computing the diff between two JSON documents as a series of RFC6902 (JSON Patch) operations, which is particula

Dec 4, 2022
Fast JSON encoder/decoder compatible with encoding/json for Go

Jan 6, 2023
Package json implements encoding and decoding of JSON as defined in RFC 7159

Package json implements encoding and decoding of JSON as defined in RFC 7159. The mapping between JSON and Go values is described in the documentation for the Marshal and Unmarshal functions

Jun 26, 2022
Json-go - CLI to convert JSON to go and vice versa

Json To Go Struct CLI Install Go version 1.17 go install github.com/samit22/js

Jul 29, 2022
JSON Spanner - A Go package that provides a fast and simple way to filter or transform a json document

JSON SPANNER JSON Spanner is a Go package that provides a fast and simple way to

Sep 14, 2022
Abstract JSON for golang with JSONPath support

Abstract JSON Abstract JSON is a small golang package that provides a parser for JSON with support of JSONPath, in case when you are not sure in its struct

Jan 5, 2023
JSON query in Golang

gojq JSON query in Golang. Install go get -u github.com/elgs/gojq This library serves three purposes: makes parsing JSON configuration file much easie

Dec 28, 2022
Automatically generate Go (golang) struct definitions from example JSON

gojson gojson generates go struct definitions from json or yaml documents. Example $ curl -s https://api.github.com/repos/chimeracoder/gojson | gojson

Jan 1, 2023
Arbitrary transformations of JSON in Golang

kazaam Description Kazaam was created with the goal of supporting easy and fast transformations of JSON data with Golang. This functionality provides

Dec 18, 2022
Fast JSON serializer for golang.

easyjson Package easyjson provides a fast and easy way to marshal/unmarshal Go structs to/from JSON without the use of reflection. In performance test

Jan 4, 2023
Fastest JSON interpreter for golang

Welcome To JIN "Your wish is my command" Fast and Easy Way to Deal With JSON Jin is a comprehensive JSON manipulation tool bundle. All functions teste

Dec 14, 2022
Fast Color JSON Marshaller + Pretty Printer for Golang

ColorJSON: The Fast Color JSON Marshaller for Go What is this? This package is based heavily on hokaccha/go-prettyjson but has some noticeable differen

Dec 19, 2022
Kazaam was created with the goal of supporting easy and fast transformations of JSON data with Golang

kazaam Description Kazaam was created with the goal of supporting easy and fast transformations of JSON data with Golang. This functionality provides

Sep 17, 2021
Copy of Golang's json library with IsZero feature

json Copy of Golang's json library with IsZero feature from CL13977 Disclaimer It is a package primarily used for my own projects, I will keep it up-to-

Oct 9, 2021
Golang JSON decoder supporting case-sensitive, number-preserving, and strict decoding use cases

Dec 9, 2022
Benchmark of Golang JSON Libraries

Introduction This is a benchmark for the json packages. You are welcome to open an issue if you find anything wrong or out of date. TL;DR In conclusio

Nov 3, 2022
Senml-go - a Golang module for the JSON-based SenML sensor data format

ThingWave SenML module for Golang This is a Golang module for the JSON-based Sen

Jan 2, 2022