Command pigeon generates parsers in Go from a PEG grammar.

pigeon - a PEG parser generator for Go

The pigeon command generates parsers based on a parsing expression grammar (PEG). Its grammar and syntax are inspired by the PEG.js project, while the implementation is loosely based on the "parsing expression grammar for C# 3.0" article. It parses Unicode text encoded in UTF-8.

See the godoc page for detailed usage. Also have a look at the Pigeon Wiki for additional information about Pigeon and PEG in general.

Releases

  • v1.0.0 is the tagged release of the original implementation.
  • Work has started on v2.0.0 with some planned breaking changes.

GitHub user @mna created the package in April 2015, and @breml has been the package's maintainer since May 2017.

Breaking Changes since v1.0.0

  • Removed support for Go < v1.11 in order to use Go modules for dependency tracking.

  • Removed support for Go < v1.9 because of the dependency on golang.org/x/tools/imports, which was updated to reflect changes in recent versions of Go. This complies with the Go Release Policy and the Go release maintenance rules, which state that each major release is supported until there are two newer major releases.

Installation

Provided you have Go correctly installed with the $GOPATH and $GOBIN environment variables set, run:

$ go get -u github.com/mna/pigeon

This will install or update the package, and the pigeon command will be installed in your $GOBIN directory. Neither this package nor the parsers generated by this command require any third-party dependency, unless such a dependency is used in the code blocks of the grammar.

Basic usage

$ pigeon [options] [PEG_GRAMMAR_FILE]

By default, the input grammar is read from stdin and the generated code is printed to stdout. You may save it in a file using the -o flag.

Example

Given the following grammar:

{
// part of the initializer code block omitted for brevity

var ops = map[string]func(int, int) int {
    "+": func(l, r int) int {
        return l + r
    },
    "-": func(l, r int) int {
        return l - r
    },
    "*": func(l, r int) int {
        return l * r
    },
    "/": func(l, r int) int {
        return l / r
    },
}

func toIfaceSlice(v interface{}) []interface{} {
    if v == nil {
        return nil
    }
    return v.([]interface{})
}

func eval(first, rest interface{}) int {
    l := first.(int)
    restSl := toIfaceSlice(rest)
    for _, v := range restSl {
        restExpr := toIfaceSlice(v)
        r := restExpr[3].(int)
        op := restExpr[1].(string)
        l = ops[op](l, r)
    }
    return l
}
}


Input <- expr:Expr EOF {
    return expr, nil
}

Expr <- _ first:Term rest:( _ AddOp _ Term )* _ {
    return eval(first, rest), nil
}

Term <- first:Factor rest:( _ MulOp _ Factor )* {
    return eval(first, rest), nil
}

Factor <- '(' expr:Expr ')' {
    return expr, nil
} / integer:Integer {
    return integer, nil
}

AddOp <- ( '+' / '-' ) {
    return string(c.text), nil
}

MulOp <- ( '*' / '/' ) {
    return string(c.text), nil
}

Integer <- '-'? [0-9]+ {
    return strconv.Atoi(string(c.text))
}

_ "whitespace" <- [ \n\t\r]*

EOF <- !.

The generated parser can parse simple arithmetic operations, e.g.:

18 + 3 - 27 * (-18 / -3)

=> -141

More examples can be found in the examples/ subdirectory.

See the godoc page for detailed usage.

Contributing

See the CONTRIBUTING.md file.

License

The BSD 3-Clause license. See the LICENSE file.

Comments
  • errors during compilation of grammar.go are reported as grammar.go rather than grammar.peg

    This is a nuisance when used, e.g., with Emacs compilation mode.

    lex/yacc/byacc/bison et al. use line-number preprocessor directives in the generated output to refer back to the original source file, even when it is the C compiler that reports the error. I don't know how to do that in Go, but I presume that it must be possible. If not, it should be.

  • Unable to generate optimized grammar

    Hello,

    First, thanks a lot for maintaining this project, it's a great library! I'm currently using it in https://github.com/bytesparadise/libasciidoc and it's working really well 🙌

    However, since https://github.com/mna/pigeon/commit/9fec3898cef80afe60fbe5df398fceca513566b8 was merged, I've been getting the following error when running the command below in the project's root:

    $ pigeon -optimize-grammar -alternate-entrypoints PreparsedDocument,InlineElementsWithoutSubtitution,VerbatimBlock -o ./pkg/parser/asciidoc_parser.go  ./pkg/parser/asciidoc-grammar.peg
    panic: runtime error: invalid memory address or nil pointer dereference [recovered]
            panic: runtime error: invalid memory address or nil pointer dereference
    [signal SIGSEGV: segmentation violation code=0x1 addr=0x40 pc=0x1284d49]
    
    goroutine 1 [running]:
    main.main.func1(0x13d3500, 0xc000072020, 0xc0005fde50)
            /Users/xcoulon/code/go/src/github.com/mna/pigeon/main.go:87 +0x13d
    panic(0x12f2440, 0x15f2fd0)
            /usr/local/Cellar/go/1.12.1/libexec/src/runtime/panic.go:522 +0x1b5
    github.com/mna/pigeon/ast.(*grammarOptimizer).optimizeRule(0xc0004bfec0, 0x13d0d80, 0xc0003e2300, 0xc00045d080, 0xc0003e4d20)
            /Users/xcoulon/code/go/src/github.com/mna/pigeon/ast/ast_optimize.go:243 +0x309
    github.com/mna/pigeon/ast.(*grammarOptimizer).optimize(0xc0004bfec0, 0x13d0d60, 0xc0003e4f00, 0x13d0e40, 0xc0004bfec0)
            /Users/xcoulon/code/go/src/github.com/mna/pigeon/ast/ast_optimize.go:182 +0x27ec
    github.com/mna/pigeon/ast.(*grammarOptimizer).Visit(0xc0004bfec0, 0x13d0d60, 0xc0003e4f00, 0xc0003e4d20, 0xc0004bfec0)
            /Users/xcoulon/code/go/src/github.com/mna/pigeon/ast/ast_optimize.go:39 +0x3e
    github.com/mna/pigeon/ast.Walk(0x13d0e40, 0xc0004bfec0, 0x13d0d60, 0xc0003e4f00)
            /Users/xcoulon/code/go/src/github.com/mna/pigeon/ast/ast_walk.go:20 +0x55
    github.com/mna/pigeon/ast.Walk(0x13d0e40, 0xc0004bfec0, 0x13d0c80, 0xc0004dba90)
            /Users/xcoulon/code/go/src/github.com/mna/pigeon/ast/ast_walk.go:41 +0x43b
    github.com/mna/pigeon/ast.Optimize(0xc0004dba90, 0xc0001044e0, 0x3, 0x3)
            /Users/xcoulon/code/go/src/github.com/mna/pigeon/ast/ast_optimize.go:464 +0x14e
    main.main()
            /Users/xcoulon/code/go/src/github.com/mna/pigeon/main.go:120 +0x106f
    

    The grammar in my project is already quite big: https://github.com/bytesparadise/libasciidoc/blob/master/pkg/parser/asciidoc-grammar.peg and unfortunately the stack trace does not give much information about the rule(s) that cause the error, so I can't really narrow down the grammar to a simpler form :/

    Note: building and running pigeon with the previous commit works like a charm.

  • Should not allow left recursion grammar to pass conversion

    eg:

    {
    //------ start
    package main
    
    func main() {
        if len(os.Args) != 2 {
            log.Fatal("Usage: calculator 'EXPR'")
        }
        got, err := ParseReader("", strings.NewReader(os.Args[1]))
        if err != nil {
            log.Fatal(err)
        }
        fmt.Printf("%#v\n", got)
    }
    
    // ------ end
    }
    
    Input <- expr:Expr EOF {
        return expr, nil
    }
    
    Expr <- _ Expr _ LogicOp _ Expr _/ _ Value _
    
    LogicOp <- ("and" / "or") {
        return string(c.text), nil
    }
    
    Value <- [0-9]+ {
        return string(c.text),nil
    }
    
    _ "whitespace" <- [ \n\t\r]*
    
    EOF <- !.
    
    

    go run main.go "1 and 1"

    This causes an infinite loop.

  • [Feature Request] Operator Precedence Climbing

    {
    //------ start
    package main
    
    type CompExpr struct {
        left string
        op string
        right string
    }
    
    type LogicExpr struct {
        left interface{}
        op string
        right interface{}
    }
    
    func main() {
        if len(os.Args) != 2 {
            log.Fatal("Usage: calculator 'EXPR'")
        }
        got, err := ParseReader("", strings.NewReader(os.Args[1]))
        if err != nil {
            log.Fatal(err)
        }
        fmt.Printf("%#v\n", got)
    }
    
    // ------ end
    }
    
    Input <- expr:Expr EOF {
        return expr, nil
    }
    
    Expr <- LogicExpr / Atom
    
    LogicExpr <- _ atom:Atom _ op: LogicOp _ expr: Expr _ {
        return  LogicExpr {left : atom, op : op.(string), right: expr}, nil
    }
    
    Atom <- '(' expr:Expr ')' {
        return expr, nil
    } / _ field: Ident _ op:BinOp _ value:Value _{
        return CompExpr{left : field.(string), op: op.(string), right : value.(string)}, nil
    }
    
    LogicOp <- ("and" / "or"){
        return string(c.text), nil
    }
    
    BinOp <- ("!=" / ">=" / "<=" / "=" / "<>" / ">" / "<") {
        return string(c.text),nil
    }
    
    Ident <- [a-zA-Z][a-zA-Z0-9]* {
        return string(c.text),nil
    }
    
    Value <- [0-9]+ {
        return string(c.text),nil
    }
    
    _ "whitespace" <- [ \n\t\r]*
    
    EOF <- !.
    
    

    A right-recursive grammar like this one automatically produces right-associative results.

    It would be very convenient if pigeon could implement precedence climbing or something similar.

  • Certain inputs take an extremely long time to parse

    Hello!

    First of all, thank you very much for maintaining this project!

    I'm hoping that someone can provide a bit of guidance. I apologize in advance for not having a minimal test case to reproduce this issue.

    The issue

    I've been doing some fuzz testing on OPA and I ran into one case where certain inputs would cause the program to hang and then crash. Here's a snippet of the crash:

    program hanged (timeout 10 seconds)
    
    SIGABRT: abort
    PC=0x45766f m=0 sigcode=0
    
    goroutine 1 [running]:
    runtime.aeshashbody()
            /tmp/go-fuzz-build473666132/goroot/src/runtime/asm_amd64.s:917 +0x5f fp=0xc0041a78f8 sp=0xc0041a78f0 pc=0x45766f
    runtime.mapassign_faststr(0x764fc0, 0xc0041a7a20, 0xc0005d1fb0, 0x3, 0xc0118f8d68)
            /tmp/go-fuzz-build473666132/goroot/src/runtime/map_faststr.go:202 +0x62 fp=0xc0041a7960 sp=0xc0041a78f8 pc=0x4135d2
    github.com/open-policy-agent/opa/ast.(*parser).parse(0xc00049a180, 0xa53e00, 0x0, 0x0, 0x0, 0x0)
            /tmp/go-fuzz-build473666132/gopath/src/github.com/open-policy-agent/opa/ast/parser.go:4362 +0x272 fp=0xc0041a7b50 sp=0xc0041a7960 pc=0x6dc782
    github.com/open-policy-agent/opa/ast.Parse(0x0, 0x0, 0xc0001c95e8, 0x8, 0x8, 0xc000527cb8, 0x2, 0x2, 0xc0002f4460, 0x0, ...)
            /tmp/go-fuzz-build473666132/gopath/src/github.com/open-policy-agent/opa/ast/parser.go:3784 +0x98 fp=0xc0041a7ba8 sp=0xc0041a7b50 pc=0x6da558
    github.com/open-policy-agent/opa/ast.ParseStatements(0x0, 0x0, 0xc0001c95e0, 0x8, 0xc0001c95e0, 0x8, 0x200000003, 0xc000000300, 0xc000022000, 0xc000527df8, ...)
            /tmp/go-fuzz-build473666132/gopath/src/github.com/open-policy-agent/opa/ast/parser_ext.go:468 +0x173 fp=0xc0041a7d50 sp=0xc0041a7ba8 pc=0x6e62b3
    github.com/open-policy-agent/fuzz-opa.Fuzz(0x7f734c798000, 0x8, 0x200000, 0x3)
    

    The crash above occurs here: https://github.com/open-policy-agent/opa/blob/master/ast/parser.go#L4362

    I modified the code to print the size of the maxFailExpected slice and found that it grew to very large sizes in pathological cases. For example the input {{{{{{{{ takes 3.5s to parse (error) and the slice holds around 3,000,000 elements.

    Expected behaviour

    It's not clear whether much can be done about this. In the case of OPA, we don't display the expected values (because we found them too noisy to be helpful), so disabling the code that generates them is an option. However, I'm not sure that would resolve the problem, because valid inputs with a similar structure also take a very long time to parse (e.g., {{{{{{{{}}}}}}}} takes ~1.5s before succeeding).

    The PEG file is here: https://github.com/open-policy-agent/opa/blob/master/ast/rego.peg

    The vendored version is bb0192cfc2ae6ff30b9726618594b42ef2562da5.

    Any suggestions would be appreciated.
