µjson - A fast and minimal JSON parser and transformer that works on unstructured JSON

µjson

GoDoc License Go Report Card Build Status Github code coverage

A fast and minimal JSON parser and transformer that works on unstructured JSON. It works by parsing input and calling the given callback function when encountering each item.

Read more on the introduction article.

Motivation

Sometimes we just want to make some minimal changes to a JSON document, or do some generic transformations without fully unmarshalling it. For example, removing blacklist fields from response JSON. Why spend all the cost on unmarshalling into a map[string]interface{} just to immediate marshal it again.

The following code is taken from StackOverflow:

{
  "responseHeader": {
    "status": 0,
    "QTime": 0,
    "params": {
      "q": "solo",
      "wt": "json"
    }
  },
  "response": {
    "numFound": 2,
    "start": 0,
    "docs": [
      { "name": "foo" },
      { "name": "bar" }
    ]
  }
}

With µjson, we can quickly write a simple transformation to remove "responseHeader" completely from all responses, once and forever:

func removeBlacklistFields(input []byte) []byte {
    blacklistFields := [][]byte{
		[]byte(`"responseHeader"`), // note the quotes
	}
	b := make([]byte, 0, 1024)
	err := ujson.Walk(input, func(_ int, key, value []byte) bool {
		if len(key) != 0 {
			for _, blacklist := range blacklistFields {
				if bytes.Equal(key, blacklist) {
					// remove the key and value from the output
					return false
				}
			}
		}
		// write to output
		if len(b) != 0 && ujson.ShouldAddComma(value, b[len(b)-1]) {
			b = append(b, ',')
		}
		if len(key) > 0 {
			b = append(b, key...)
			b = append(b, ':')
		}
		b = append(b, value...)
		return true
	})
	if err != nil {
		panic(err)
	}
	return b
 }

The original scenario that leads me to write the package is because of int64. When working in Go and PostgreSQL, I use int64 (instead of string) for ids because it’s more effective and has enormous space for randomly generated ids. It’s not as big as UUID, 128 bits, but still big enough for production use. In PostgreSQL, those ids can be stored as bigint and being effectively indexed. But for JavaScript, it can only process integer up to 53 bits (JavaScript has BigInt but that’s a different story, and using it will make things even more complicated).

So we need to wrap those int64s into strings before sending them to JavaScript. In Go and PostgreSQL, the JSON is {"order_id": 12345678} but JavaScript will see it as {"order_id": "12345678"} (note that the value is quoted). In Go, we can define a custom type and implement the json.Marshaler interface. But in PostgreSQL, that’s just not possible or too complicated. I wrote a service that receives JSON from PostgreSQL and converts it to be consumable by JavaScript. The service also removes some blacklisted keys or does some other transformations (for example, change orderId to order_id).

Example use cases:

  1. Walk through unstructured JSON:

  2. Transform unstructured JSON:

without fully unmarshalling it into a map[string]interface{}.

See usage and examples on pkg.go.dev.

Important: Behaviour is undefined on invalid JSON. Use on trusted input only. For untrusted input, you may want to run it through json.Valid() first.

Usage

The single most important function is Walk(input, callback), which parses the input JSON and call callback function for each key/value pair processed.

The callback function is called when an object key/value or an array key is encountered. It receives 3 params in order: level, key and value.

  • level is the indentation level of the JSON, if you format it properly. It starts from 0. It increases after entering an object or array and decreases after leaving.

  • key is the raw key of the current object or empty otherwise. It can be a double-quoted string or empty.

  • value is the raw value of the current item or a bracket. It can be a string, number, boolean, null, or one of the following brackets: { } [ ].

It’s important to note that key and value are provided as raw. Strings are always double-quoted. It’s there for keeping the library fast and ignoring unnecessary operations. For example, when you only want to reformat the output JSON properly; you don’t want to unquote those strings and then immediately quote them again; you just need to output them unmodified. And there are ujson.Unquote() and ujson.AppendQuote() when you need to get the original strings.

For valid JSON, values will never be empty. We can test the first byte of value (value[0]) to get its type:

  • n: Null (null)
  • f, t: Boolean (false, true)
  • 0-9, -: Number
  • ": String, see Unquote()
  • [, ]: Array
  • {, }: Object

When processing arrays and objects, first the open bracket ([, {) will be provided as value, followed by its children, and finally the close bracket (], }). When encountering open brackets, You can make the callback function return false to skip the array/object entirely.

Examples

1. Print all keys and values in order

This example gives a quick idea about how µjson works.

 input := []byte(`{  
     "id": 12345,    
     "name": "foo",  
     "numbers": ["one", "two"],  
     "tags": {"color": "red", "priority": "high"},   
     "active": true  
 }`) 
 ujson.Walk(input, func(level int, key, value []byte) bool { 
     fmt.Printf("%2v% 12s : %s\n", level, key, value)    
     return true 
 })
{
   "id": 12345,
   "name": "foo",
   "numbers": ["one", "two"],
   "tags": {"color": "red", "priority": "high"},
   "active": true
}

Calling Walk() with the above input will produce:

level key value
0 {
1 "id" 12345
1 "name" "foo"
1 "numbers" [
2 "one"
2 "two"
1 ]
1 "tags" {
2 "color" "red"
2 "priority" "high"
1 }
1 "active" true
0 }

0. The simplest examples

To easily get an idea on level, key and value, here are the simplest examples:

 input0 := []byte(`true`)
 ujson.Walk(input0, func(level int, key, value []byte) bool {
     fmt.Printf("level=%v key=%s value=%s\n", level, key, value)
     return true
 })
 // output:
 //   level=0 key= value=true

 input1 := []byte(`{ "key": 42 }`)
 ujson.Walk(input1, func(level int, key, value []byte) bool {
     fmt.Printf("level=%v key=%s value=%s\n", level, key, value)
     return true
 })
 // output:
 //   level=0 key= value={
 //   level=1 key="key" value=42
 //   level=0 key= value=}

 input2 := []byte(`[ true ]`)
 ujson.Walk(input2, func(level int, key, value []byte) bool {
     fmt.Printf("level=%v key=%s value=%s\n", level, key, value)
     return true
 })
 // output:
 //   level=0 key= value=[
 //   level=1 key= value=true
 //   level=0 key= value=]

In the first example, there is only a single boolean value. The callback function is called once with level=0, key is empty and value=true.

In the second example, the callback function is called 3 times. Two times for open and close brackets with level=0, key is empty and value is the bracketed character. The other time for the only key with level=1, key is "key" and value=42. Note that the key is quoted and you need to call ujson.Unquote() to retrieve the unquoted string.

The last example is like the second, but with an array instead. Keys are always empty inside arrays.

2. Reformat input

In this example, the input JSON is formatted with correct indentation. As processing the input key by key, the callback function reconstructs the JSON. It outputs each key/value pair in its own line, prefixed with spaces equal to the param level. There is a catch, though. Valid JSON requires commas between values in objects and arrays. So there is ujson.ShouldAddComma() for checking whether a comma should be inserted.

 input := []byte(`{"id":12345,"name":"foo","numbers":["one","two"],"tags":{"color":"red","priority":"high"},"active":true}`)

 b := make([]byte, 0, 1024)
 err := ujson.Walk(input, func(level int, key, value []byte) bool {
     if len(b) != 0 && ujson.ShouldAddComma(value, b[len(b)-1]) {
         b = append(b, ',')
     }
     b = append(b, '\n')
     for i := 0; i < level; i++ {
         b = append(b, '\t')
     }
     if len(key) > 0 {
         b = append(b, key...)
         b = append(b, `: `...)
     }
     b = append(b, value...)
     return true
 })
 if err != nil {
     panic(err)
 }
 fmt.Printf("%s\n", b)

Output:

{
	"id": 12345,
	"name": "foo",
	"numbers": [
		"one",
		"two"
	],
	"tags": {
		"color": "red",
		"priority": "high"
	},
	"active": true
}

There is a built-in method ujson.Reconstruct() when you want to remove all the whitespaces.

3. Remove blacklisted keys

This example demonstrates removing some keys from the input JSON. The key param is compared with a pre-defined list. If there is a match, the blacklisted key and its value are dropped. The callback function returns false for skipping the entire value (which may be an object or array). Note that the list is quoted, i.e. "numbers" and "active" instead of number and active. For more advanced checking, you may want to run ujson.Unquote() on the key.

 input := []byte(`{
     "id": 12345,
     "name": "foo",
     "numbers": ["one", "two"],
     "tags": {"color": "red", "priority": "high"},
     "active": true
 }`)

 blacklistFields := [][]byte{
     []byte(`"numbers"`), // note the quotes
     []byte(`"active"`),
 }
 b := make([]byte, 0, 1024)
 err := ujson.Walk(input, func(_ int, key, value []byte) bool {
     for _, blacklist := range blacklistFields {
         if bytes.Equal(key, blacklist) {
             // remove the key and value from the output
             return false
         }
     }

     // write to output
     if len(b) != 0 && ujson.ShouldAddComma(value, b[len(b)-1]) {
         b = append(b, ',')
     }
     if len(key) > 0 {
         b = append(b, key...)
         b = append(b, ':')
     }
     b = append(b, value...)
     return true
 })
 if err != nil {
     panic(err)
 }
 fmt.Printf("%s\n", b)

Output:

{"id":12345,"name":"foo","tags":{"color":"red","priority":"high"}}

4. Wrap int64 in string

This is the original motivation behind µjson. The following example finds keys ending with _id" ("order_id", "item_id", etc.) and converts their values from numbers to strings, by simply wrapping them in double-quotes.

For valid JSON, values will never be empty. We can test the first byte of value (value[0]) to get its type:

  • n: Null (null)
  • f, t: Boolean (false, true)
  • 0-9, -: Number
  • ": String, see Unquote()
  • [, ]: Array
  • {, }: Object

In this case, we check value[0] within 09 to see whether it’s a number, then insert double-quotes.

 input := []byte(`{"order_id": 12345678901234, "number": 12, "item_id": 12345678905678, "counting": [1,"2",3]}`)

 suffix := []byte(`_id"`) // note the ending quote "
 b := make([]byte, 0, 256)
 err := ujson.Walk(input, func(_ int, key, value []byte) bool {
     // Test for keys with suffix _id" and value is an int64 number. For valid json,
     // values will never be empty, so we can safely test only the first byte.
     shouldWrap := bytes.HasSuffix(key, suffix) && value[0] > '0' && value[0] <= '9'

     // transform the input, wrap values in double quotes
     if len(b) != 0 && ujson.ShouldAddComma(value, b[len(b)-1]) {
         b = append(b, ',')
     }
     if len(key) > 0 {
         b = append(b, key...)
         b = append(b, ':')
     }
     if shouldWrap {
         b = append(b, '"')
     }
     b = append(b, value...)
     if shouldWrap {
         b = append(b, '"')
     }
     return true
 })
 if err != nil {
     panic(err)
 }
 fmt.Printf("%s\n", b)

Output:

{"order_id":"12345678901234","number":12,"item_id":"12345678905678","counting":[1,"2",3]}

After processing, the numbers in "order_id" and "item_id" are quoted as strings. And JavaScript should be happy now! 🎉 🎉

License

MIT

Similar Resources

APFS parser written in pure Go

[WIP] go-apfs 🚧 APFS parser written in pure Go Originally from this ipsw branch Install go get github.com/blacktop/go-apfs apfs cli Install go instal

Dec 24, 2022

A golang tag key value parser

tag_parser A golang tag key value parser Installation go get github.com/gvassili/tag_parser Example package main import ( "fmt" "github.com/gvass

Nov 24, 2021

A Protobuf parser

A Protobuf parser for Go This package contains a cleanroom Protobuf parser for Go using Participle. This was originally an example within Participle.

Nov 9, 2022

Config File Parser

Config File Parser Speed It was Implemented by binary tree and only suitable for small project. Ignore Any line starting with specific prefix will be

May 20, 2022

Golisp-wtf - A lisp interpreter (still just a parser) implementation in golang. You may yell "What the fuck!?.." when you see the shitty code.

R6RS Scheme Lisp dialect interpreter This is an implementation of a subset of R6RS Scheme Lisp dialect in golang. The work is still in progress. At th

Jan 7, 2022

Ghissue - This repo contains a github issue parser, that is useful for Enterprise Github accounts.

Ghissue - This repo contains a github issue parser, that is useful for Enterprise Github accounts. Sometimes is needed to parse the content of the issue for some data extraction or statistics purposes.

Feb 6, 2022

News-parser-cli - Simple CLI which allows you to receive news depending on the parameters passed to it

News-parser-cli - Simple CLI which allows you to receive news depending on the parameters passed to it

news-parser-cli Simple CLI which allows you to receive news depending on the par

Jan 4, 2022

Golang parser for Intuit Interchange Format (.IIF) files

Intuit Interchange Format (.IIF) Parser Install go get github.com/joshuaslate/iif Usage iiifile, err := os.Open("./transactions.iif") if err != nil {

Jan 15, 2022

minigli is a tiny command argument parser for Go.

minigli is a tiny command argument parser for Go.

Jan 29, 2022
Yaf - Yet another system fetch that is minimal and customizable
Yaf - Yet another system fetch that is minimal and customizable

Yaf - Yet Another Fetch [Support] [Installation] [Usage] Brief Yet Another Fetch

Oct 11, 2022
A minimal CLI tool to enable (or disable) dependabot for all your repositories

Enable Dependabot A minimal CLI tool to enable (or disable) dependabot for all your repositories. Installation Install via Go go get -v github.com/RiR

Feb 10, 2022
CONTRIBUTIONS ONLY: A Go (golang) command line and flag parser

CONTRIBUTIONS ONLY What does this mean? I do not have time to fix issues myself. The only way fixes or new features will be added is by people submitt

Dec 29, 2022
Brigodier is a command parser & dispatcher, designed and developed for command lines such as for Discord bots or Minecraft chat commands. It is a complete port from Mojang's "brigadier" into Go.

brigodier Brigodier is a command parser & dispatcher, designed and developed to provide a simple and flexible command framework. It can be used in man

Dec 15, 2022
YANG parser and compiler to produce Go language objects

Current support for goyang is for the latest 3 Go releases. goyang YANG parser and compiler for Go programs. The yang package (pkg/yang) is used to co

Dec 12, 2022
go command line option parser

go-flags: a go library for parsing command line arguments This library provides similar functionality to the builtin flag library of go, but provides

Jan 4, 2023
Fully featured Go (golang) command line option parser with built-in auto-completion support.

go-getoptions Go option parser inspired on the flexibility of Perl’s GetOpt::Long. Table of Contents Quick overview Examples Simple script Program wit

Dec 14, 2022
Kong is a command-line parser for Go
Kong is a command-line parser for Go

Kong is a command-line parser for Go Introduction Help Help as a user of a Kong application Defining help in Kong Command handling Switch on the comma

Dec 27, 2022
HAProxy configuration parser
HAProxy configuration parser

HAProxy configuration parser autogenerated code if you change types/types.go you need to run go run generate/go-generate.go $(pwd) Contributing For co

Dec 14, 2022
A simple command line time description parser

Zeit Zeit is an extremely simple command line application to read a natural language time description and output it as a timestamp. The main usecase f

Aug 21, 2021