
PEG, an Implementation of a Packrat Parsing Expression Grammar in Go


A Parsing Expression Grammar (hence peg) is a way to create grammars similar in principle to regular expressions, but which allow better code integration. Specifically, peg is an implementation of the Packrat parser generator originally implemented as peg/leg by Ian Piumarta in C. A Packrat parser is a recursive descent parser capable of backtracking and negative look-ahead assertions, both of which are problematic for regular expression engines.

Installing

go get -u github.com/pointlander/peg

Building

Using Pre-Generated Files

go install

Generating Files Yourself

You should only need to do this if you are contributing to the library, or if something gets messed up.

go run build.go or go generate

With tests:

go run build.go test

Usage

peg [<option>]... <file>

Usage of peg:
  -inline
      parse rule inlining
  -noast
      disable AST
  -output string
      specify name of output file
  -print
      directly dump the syntax tree
  -strict
      treat compiler warnings as errors
  -switch
      replace if-else if-else like blocks with switch blocks
  -syntax
      print out the syntax tree
  -version
      print the version and exit
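
A typical invocation, combining the optimization flags listed above, might look like this (the file names are only an example):

```shell
peg -inline -switch -output grammar.go grammar.peg
```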
	  

Sample Makefile

This sample Makefile will convert any file ending with .peg into a .go file with the same name. Adjust as needed.

.SUFFIXES: .peg .go

.peg.go:
	peg -noast -switch -inline -strict -output $@ $<

all: grammar.go

Use caution when picking your names to avoid overwriting existing .go files. Since only one PEG grammar is currently allowed per Go package, the name grammar.peg is suggested as a convention:

grammar.peg
grammar.go
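
Since peg supports go generate, the same conversion can instead be driven from a directive in a Go source file; this sketch reuses the flags from the sample Makefile above:

```go
//go:generate peg -noast -switch -inline -strict -output grammar.go grammar.peg
```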

PEG File Syntax

First declare the package name and any import(s) required:

package <package name>

import <import name>

Then declare the parser:

type <parser name> Peg {
	<parser state variables>
}
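
For example, a hypothetical calculator parser that keeps a running result in its state might be declared like this (the names here are illustrative, not part of the library):

```
package calculator

type Calculator Peg {
	result int
}
```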

Next declare the rules. The main rules are described below; they are based on the peg/leg rules, which provide additional documentation.

The first rule is the entry point into the parser:

<rule name> <- <rule body>

The first rule should probably end with !. to indicate that no more input follows:

first <- . !.

This is often set to END to make PEG rules more readable:

END <- !.

The expression . matches any single character. For zero or more character matches, use:

repetition <- .*

For one or more character matches, use:

oneOrMore <- .+

For an optional character match, use:

optional <- .?

If specific characters are to be matched, use single quotes:

specific <- 'a'* 'bc'+ 'de'?

This will match the string "aaabcbcde".

For choosing between different inputs, use alternates:

prioritized <- 'a' 'a'* / 'bc'+ / 'de'?

This will match "aaaa" or "bcbc" or "de" or "". The matches are attempted in order.

If the characters are case insensitive, use double quotes:

insensitive <- "abc"

This will match "abc" or "Abc" or "ABc" and so on.

For matching a set of characters, use a character class:

class <- [a-z]

This will match "a" or "b" or all the way to "z".

For an inverse character class, start with a caret:

inverse <- [^a-z]

This will match anything but "a" or "b" or all the way to "z".

If the character class is case insensitive, use double brackets:

insensitive <- [[A-Z]]

(Note that this is not available in regular expression syntax.)

Use parentheses for grouping:

grouping <- (rule1 / rule2) rule3

For a positive look-ahead match (a predicate), use:

lookAhead <- &rule1 rule2

For a negative (inverse) look-ahead, use:

inverse <- !rule1 rule2

Use curly braces for Go code:

gocode <- { fmt.Println("hello world") }

For string captures, use angle brackets (less than and greater than):

capture <- <'capture'> { fmt.Println(text) }

This will print "capture". The captured string is stored in buffer[begin:end].
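
The constructs above can be combined into a small complete grammar. The following sketch counts and prints the words in the input; the package, parser, and rule names are hypothetical, not taken from the peg distribution:

```
package words

import "fmt"

type WordCounter Peg {
	count int
}

first <- (word / skip)* END
word  <- <[[a-z]]+> { p.count++; fmt.Println(text) }
skip  <- (![[a-z]] .)+
END   <- !.
```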

Testing Complex Grammars

Testing a grammar usually requires more than average unit testing, with multiple inputs and outputs. Grammars are also usually not tied to just one language implementation. Consider maintaining a list of inputs with expected outputs in a structured file format such as JSON or YAML and parsing it for testing, or use one of the available Go options such as Rob Muhlestein's tinout package.

Files

  • bootstrap/main.go - bootstrap syntax tree of peg
  • tree/peg.go - syntax tree and code generator
  • peg.peg - peg in its own language

Author

Andrew Snodgrass

Issues

Here are some reported issues, which provide further examples of PEG grammars and common pitfalls:
  • C grammar fails with trivial but legal C snippet

    int main() { (a)||1; }

    In general, the two-place operator symbols (||, &&, ->, >>, <<) fail to parse in this configuration, as well as the two-place postfix operators ("(a)--", "(a)++"). However, I did notice that ">" and "<" also fail.

    The key here is the LPAR and RPAR wrapping the expression. This seems to trigger it.

    I'm rather losing my mind over the bug. Any help very much appreciated!

  • Parser very slow

    The parser allocates an enormous amount of memory

    var tree tokenTree = &tokens32{tree: make([]token32, math.MaxInt16)}
    

    And then uses its own vector-doubling scheme:

    func (t *tokens32) Expand(index int) tokenTree {
            tree := t.tree
            if index >= len(tree) {
                    expanded := make([]token32, 2*len(tree))
                    copy(expanded, tree) 
                    t.tree = expanded
            }
            return nil
    }
    

    Both of these cause the parser to be very slow, because they generate a large amount of garbage. This should probably be optimized.

  • Parser completely breaks without warning if you have more than 65536 tokens

    I am parsing a medium sized file (60 kB) and the parser breaks if more than 16 bits worth of tokens are parsed. The AST will be completely wrong and missing tokens, because token indices cannot exceed 16 bits.

    I made it work correctly by manually editing the generated file and changing int16 to int32. However, it also looks like slices are preallocated like make([]int32, 1, math.MaxInt16), which cannot simply be changed to 0, and cannot be changed to math.MaxInt32 because no one has that much memory. So I changed it to 18 bits, but this obviously will not work for files with more tokens.

    I don't feel comfortable submitting a patch for this myself because it looks like this is used in quite a few places and will probably require a significant change to remove these static limits.

  • Bug: an error occurs when importing a repository that contains numeric character

    The following peg file causes an error:

    parse error near PegText (line 3 symbol 9 - line 3 symbol 25): "github.com/hachi"

    package grammar
    
    import "github.com/hachi8833/sample/token"
    
    type Parser Peg {
      token.Program
    }
    
    Program <-
        expression EOF
        / expression <.+> {p.Err(begin, buffer)} EOF
        / <.+> {p.Err(begin, buffer)} EOF
    
    expression <-
        additive
    
    additive <-
        multitive (
            '+' multitive {p.PushOpe("+")}
          / '-' multitive {p.PushOpe("-")}
        )*
    
    multitive <-
        value (
            '*' value {p.PushOpe("*")}
          / '/' value {p.PushOpe("/")}
        )*
    
    value <-
        <[0-9]+> {p.PushDigit(text)}
        / '(' expression ')'
    
    EOF                 <- !.
    

    Just changing the repository name to like import "github.com/hachi/sample/token" works fine. (My GitHub user name contains some numeric characters😅)

  • using peg with io.RuneReader

    Hi, could you explain how to use peg-generated code against text coming from a stream? If this is not supported, could you outline how you would want it implemented?

  • Parser hangs forever

    grammar.peg

    package main
    
    import "os/exec"
    
    type Prog Peg {
         Cmd *exec.Cmd
         In io.Writer
    }
    
    Command <- <(!nl)*> nl eof  
    
    eof <- !. {logln("end of file");p.In.Write([]byte{'\x03'})}
    
    nl <- "\n"
    

    main.go:

    // patmatch is a tool for pattern matching in bash scripts
    package main
    
    import (
    	"os"
    	"os/exec"
    	"log"
    )
    
    const Shell = "/bin/bash"
    
    //go:generate peg grammar.peg
    
    func main() {
    	src := `echo test
    `
    	p := Prog{Cmd:&exec.Cmd{
    		Path:Shell,
    		Stdout:os.Stdout,
    	},Buffer:src}
    	var err error
    	p.In, err = p.Cmd.StdinPipe()
    	fatal(err)
    	logln("starting shell")
    	err = p.Cmd.Start()	
    	fatal(err)
    	logln("initializing parser")
    	p.Init()
    	fatal(err)
    	logln("parsing")
    	err = p.Parse()
    	fatal(err)
    	logln("executing")
    	p.Execute()
    	logln("done")
    }
    
    func fatal(err error) {
    	if err != nil {panic(err)}
    } 
    
    func logln(args ...interface{}) {
    	log.Println(args...)
    }
    

    peg -version: version: unknown-5cdb3adc061370cdd20392ffe2740cc8db104126

  • bootstrap with smaller grammar

    The language needed to define peg in peg is a subset of the language defined by peg.peg. To minimize the hardcoded grammar in the bootstrap phase, bootstrap with a smaller grammar, then build the full peg language.

  • Parse and Reset as struct methods instead of struct fields

    Parse and Reset are currently defined as fields in the generated parser struct, not methods on the struct. One consequence of this choice is that we cannot define an interface over the generated struct, because interfaces cannot contain fields. This comes up when one wants to abstract over multiple parsers.

    My current workaround is to define wrappers for each generated struct, say:

    func (p *Listing) ParseIntf() error {
    	return p.Parse()
    }
    

    and define the interface as (simplified example):

    type Listing interface {
    	Init()
    	ParseIntf() error
    	Execute()
    }
    

    I wish you would consider lifting Parse and Reset to methods. Thanks!

  • Parse Tree

    Hi, we are trying to write a parser for a grammar that needs more lookahead than the LR(1) parsing that http://code.google.com/p/gocc/ provides. I am pretty sure we will be able to PEGify the grammar, but I was wondering whether we can access the parse tree in some way, or whether it is easy to build an AST bottom-up in this PEG implementation?

    We have really easy to use SDT rules in gocc to build up an AST in a bottom up way.

    I have not used PEG before, but I have read an article and I am really amped :)

    Please help, Thank you Walter Schulze

  • [bug] undefined: RulePegText

    ./m2.peg.go:630: undefined: RulePegText. What is RulePegText, and how do I define it?

    ./m2.peg

    package main
    
    type JsonParser Peg{
      Json
    }
    json <- may_space (json_object / json_array / json_string / json_number / json_true / json_false / json_null) may_space
    json_object <- '{' may_space '}' / '{' (json_object_pair ',')* json_object_pair  '}'
    json_object_pair <- may_space json_string may_space ':' json
    json_array <- '[' may_space ']' / '[' (json ',')* json ']'
    json_true <- 'true' { p.addJson(buffer[begin:end]) }
    json_false <- 'false'
    json_null <- 'null'
    json_string <- '"' json_double_char* '"'
    json_double_char <- [^"\\] / '\\' ["\\/bfnrt] / '\\u' json_hex_char json_hex_char json_hex_char json_hex_char
    json_hex_char <- [0-9a-fA-F]
    json_number <- '-'? ('0' / [1-9][0-9]*) ('.' [0-9]+)? ([eE][+-]?[0-9]+)? may_space
    
    space_char <- [ \n\r\t]
    #space <- space_char+
    may_space <- space_char*
    
    

    ./main.go

    package main
    
    import (
      "fmt"
      "io/ioutil"
      "launchpad.net/goyaml"
    )
    type Json struct{
    }
    func (j *Json) addJson(json string){
      fmt.Println(json)
    }
    type JsonTest map[string]string
    func main(){
      test_yaml_string,err:= ioutil.ReadFile("json_test.yml")
      if err!=nil{
        fmt.Println(err)
        return
      }
      json_test_data:=make(JsonTest)
      err = goyaml.Unmarshal(test_yaml_string,json_test_data)
      if err!=nil{
        fmt.Println(err)
        return
      }
      for test_name,test_data:=range json_test_data{
        parser:= &JsonParser{Buffer:test_data}
        parser.Init()
        err := parser.Parse()
        if err!=nil{
          fmt.Println("FAIL "+test_name+" ",err)
        }else{
          fmt.Println("PASS "+test_name)
        }
      }
      fmt.Println("success")
    }
    
  • case insensitive grammars

    Hi,

    I'm quite interested in using this, but I've found a fairly major sticking point. There doesn't appear to be any easy way to parse case insensitive grammars, at least that I can see. Given how prevalent case insensitive language grammars are, it'd be nice if peg supported an easier way to parse them.

    I've done some searching, and it appears that this is a common problem with things based on peg. I see some discussion on the pegjs project about using a "characters"i syntax to denote case-insensitive character chunks:

    https://github.com/dmajda/pegjs/issues/34

    I'm not sure if you like that syntax or not, but something similar to ease case insensitive grammars would be super useful.

  • Unpredictable "Code generated by" comment when invoking peg with "go run"

    Since go run has been made module aware, it is convenient to use go run with //go:generate directives, so that your project is able to trivially use a fixed version of its external dependencies.

    I want to write my go:generate directive like this:

    $ git grep go:generate
    peg.go://go:generate go run github.com/pointlander/peg -inline -switch query.peg
    

    But go run builds the target binary in a temporary directory, and main.go passes the entirety of os.Args to the template, such that os.Args[0] contains the full path to the built peg binary in a random temporary directory: https://github.com/pointlander/peg/blob/e7588a89197f28bc2191a42a0562d77b257e20fe/main.go#L87

    This results in a diff every time go generate has been run:

    $ go generate && git grep 'Code generated by'
    query.peg.go:// Code generated by /var/folders/_m/h25_32y958gbgk67m97141400000gq/T/go-build1253021897/b001/exe/peg -inline -switch query.peg DO NOT EDIT.
    
    $ go generate && git grep 'Code generated by'
    query.peg.go:// Code generated by /var/folders/_m/h25_32y958gbgk67m97141400000gq/T/go-build3041684327/b001/exe/peg -inline -switch query.peg DO NOT EDIT.
    

    (The go-build portion of the directory is different on each invocation above.)

    It would be nice if os.Args[0] was just set to peg by default, but if it is important to maintain backwards compatibility, you could add a new flag to peg. I would lean towards something like -fixedname to mean "just set os.Args[0] to peg regardless of its actual value". Another option would be something like -arg0name=peg, but I doubt anyone would need it customized to anything than some arbitrarily fixed name, hence my preference to a simple boolean flag.

    In the meantime, I can work around this by changing my //go:generate to just build the binary into a fixed directory:

    $  git grep go:generate
    peg.go://go:generate go build -o ./.bin/peg github.com/pointlander/peg
    peg.go://go:generate ./.bin/peg -inline -switch query.peg
    

    Then, the comment does not change on subsequent go generate calls.

    $ go generate && git grep 'Code generated by'
    query.peg.go:// Code generated by ./.bin/peg -inline -switch query.peg DO NOT EDIT.
    
    $ go generate && git grep 'Code generated by'
    query.peg.go:// Code generated by ./.bin/peg -inline -switch query.peg DO NOT EDIT.
    
  • tree: avoid using strings.Builder

    strings.Builder was introduced in Go 1.10. Since all other code generated by peg is compatible with Go versions older than that, it would be nice not to require Go 1.10 just for writing the AST to a string.

  • Rule redeclaration causes segmentation fault

    Expected behavior

    In case of rule redeclaration the generator should return an error:

    package main
    
    type parser Peg {
    }
    
    main <- (a)+
    a <- 'a'
    a <- 'a'
    

    Actual behavior

    The invalid grammar crashes the generator due to a segmentation fault:

    romanscharkov@RomMac pegbug % peg grammar.peg
    panic: runtime error: invalid memory address or nil pointer dereference
    [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1164e55]
    
    goroutine 1 [running]:
    github.com/pointlander/peg/tree.(*Tree).Compile(0xc000074000, 0xc00001a0e0, 0xe, 0xc00000c060, 0x2, 0x2, 0x1200e60, 0xc00000e030, 0x0, 0x0)
            /Users/romanscharkov/go/src/github.com/pointlander/peg/tree/peg.go:1506 +0x1475
    main.main()
            /Users/romanscharkov/go/src/github.com/pointlander/peg/main.go:87 +0x575
    
  • README.md needs updating

    • [ ] update usage to include new cli flags
    • [ ] acknowledge the fact that converted files are now .peg.go instead of .go
    • [ ] acknowledge go generate support, although people should be able to just run go install