PHP parser written in Go

This project uses the goyacc and ragel tools to generate a PHP parser. It parses source code into an AST and can be used to write static analysis, refactoring, metrics, and code style formatting tools.

Try it online: demo

Features:

  • Full support for PHP 5 and PHP 7 syntax
  • Abstract syntax tree (AST) representation
  • Traversing AST
  • Resolving namespaced names
  • Parsing syntax-invalid PHP files
  • Saving and printing free-floating comments and whitespaces

Who Uses

VKCOM/noverify - NoVerify is a pretty fast linter for PHP

quasilyte/phpgrep - phpgrep is a tool for syntax-aware PHP code search

Usage example

package main

import (
	"log"
	"os"

	"github.com/z7zmey/php-parser/pkg/cfg"
	"github.com/z7zmey/php-parser/pkg/errors"
	"github.com/z7zmey/php-parser/pkg/parser"
	"github.com/z7zmey/php-parser/pkg/version"
	"github.com/z7zmey/php-parser/pkg/visitor/dumper"
)

func main() {
	src := []byte(`<? echo "Hello world";`)

	// Error handler

	var parserErrors []*errors.Error
	errorHandler := func(e *errors.Error) {
		parserErrors = append(parserErrors, e)
	}

	// Parse

	rootNode, err := parser.Parse(src, cfg.Config{
		Version:          &version.Version{Major: 5, Minor: 6},
		ErrorHandlerFunc: errorHandler,
	})

	if err != nil {
		log.Fatal("Error:" + err.Error())
	}

	// Dump

	goDumper := dumper.NewDumper(os.Stdout).
		WithTokens().
		WithPositions()

	rootNode.Accept(goDumper)
}

Roadmap

  • Control Flow Graph (CFG)
  • PHP8

Install

go get github.com/z7zmey/php-parser/cmd/php-parser

CLI

php-parser [flags] <path> ...

flag     type    description
-p       bool    print filepath
-e       bool    print errors
-d       bool    dump in golang format
-r       bool    resolve names
-prof    string  start profiler: [cpu, mem, trace]
-phpver  string  php version (default: 7.4)

Namespace resolver

Namespace resolver is a visitor that resolves the fully qualified names of nodes and saves them into a map[node.Node]string structure.

  • For Class, Interface, Trait, Function, and Constant nodes, it saves the name prefixed with the current namespace.
  • For Name, Relative, and FullyQualified nodes, it resolves use aliases and saves the fully qualified name.
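
The two rules above can be illustrated with a minimal, self-contained sketch. It does not use the library's resolver: the Scope type and its Declare and Resolve methods are hypothetical names introduced here, names are plain strings rather than AST nodes, and alias lookup is case-sensitive for simplicity.

```go
package main

import (
	"fmt"
	"strings"
)

// Scope is a hypothetical stand-in for what a resolver tracks while
// walking one file: the current namespace and the `use` aliases.
type Scope struct {
	Namespace string            // e.g. `App\Model`; empty for the global namespace
	Aliases   map[string]string // alias -> fully qualified name
}

// Declare returns the name of a declared class, interface, trait,
// function, or constant, prefixed with the current namespace.
func (s Scope) Declare(name string) string {
	if s.Namespace == "" {
		return name
	}
	return s.Namespace + `\` + name
}

// Resolve returns the fully qualified form of a referenced name.
func (s Scope) Resolve(name string) string {
	switch {
	case strings.HasPrefix(name, `\`):
		// Fully qualified: only the leading backslash is dropped.
		return strings.TrimPrefix(name, `\`)
	case strings.HasPrefix(name, `namespace\`):
		// Relative: anchored at the current namespace.
		return s.Declare(strings.TrimPrefix(name, `namespace\`))
	}
	// Plain name: the first segment may be a `use` alias.
	parts := strings.SplitN(name, `\`, 2)
	if target, ok := s.Aliases[parts[0]]; ok {
		if len(parts) == 2 {
			return target + `\` + parts[1]
		}
		return target
	}
	return s.Declare(name)
}

func main() {
	s := Scope{
		Namespace: `App\Model`,
		Aliases:   map[string]string{"Repo": `App\Repository\UserRepository`},
	}
	fmt.Println(s.Declare("User"))
	fmt.Println(s.Resolve("Repo"))
	fmt.Println(s.Resolve(`\Vendor\Lib`))
	fmt.Println(s.Resolve(`namespace\Sub\Item`))
	fmt.Println(s.Resolve("Unknown"))
}
```

A real resolver also has to distinguish function and constant aliases from class aliases, which this sketch leaves out.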

Comments

  • Does not work correctly if used from several goroutines

    The parser sometimes reports a lot of strange errors (see below). When I parse files using a single goroutine, it works just fine.

    The function that does the parsing does not rely on any global state:

    func parse(filename string) {
    	fp, err := os.Open(filename)
    	if err != nil {
    		log.Fatalf("Could not open file %s: %s", filename, err.Error())
    	}
    
    	defer fp.Close()
    
    	var b bytes.Buffer
    
    	conv := transform.NewReader(fp, charmap.Windows1251.NewDecoder())
    	parser := php7.NewParser(io.TeeReader(conv, &b), filename)
    	parser.Parse()
    
    	for _, e := range parser.GetErrors() {
    		fmt.Printf("ERROR: parsing %s: %s", filename, e)
    	}
    
    	rootNode := parser.GetRootNode()
    
    	if rootNode == nil {
    		log.Printf("Could not parse %s at all due to errors", filename)
    		return
    	}
    
    	rootNode.Walk(&rootWalker{
    		w:         os.Stdout,
    		filename:  filename,
    		comments:  parser.GetComments(),
    		positions: parser.GetPositions(),
    		lines:     bytes.Split(b.Bytes(), []byte("\n")),
    	})
    }
    

    Errors example:

    syntax error: unexpected T_ENCAPSED_AND_WHITESPACE at line 409
    syntax error: unexpected '}' at line 480
    syntax error: unexpected T_STRING, expecting T_VARIABLE or T_ENCAPSED_AND_WHITESPACE or T_DOLLAR_OPEN_CURLY_BRACES or T_CURLY_OPEN at line 605
    ...
    
  • Performance suggestion: reduce allocations

    The parser is not as fast as it could be. For comparison, PHP 7+ also builds an AST in order to execute a file:

    $ php -n -r '$start = microtime(true); require("some_big_file.php"); echo "time: " . (microtime(true) - $start) . " sec\n";'
    time: 0.0185 sec
    
    $ go run test.go some_big_file.php
    Errors count:  0
    Parser time: 167.636796ms
    

    test.go.zip

    I made a few (very hacky) patches that significantly reduce the allocation count in some critical parts, which reduced parsing time by a factor of about 1.5:

    # after patches
    $ go run test.go some_big_file.php
    Errors count:  0
    Parser time: 109.522377ms
    

    The profiler still shows plenty more allocations; my patches are just a proof of concept. speedup.patch.txt
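
The attached patch is not reproduced here, so the following is only a generic sketch of one common way to cut allocation counts in Go: carving many small structs out of preallocated chunks instead of allocating each one separately. The Position and PositionPool types are hypothetical and are not taken from the library or from the patch.

```go
package main

import "fmt"

// Position is a hypothetical stand-in for a small per-node record.
type Position struct {
	StartLine, EndLine int
	StartPos, EndPos   int
}

// PositionPool hands out *Position values carved from chunks, so that
// size records cost one allocation instead of size allocations.
type PositionPool struct {
	chunk []Position
	size  int
}

// NewPositionPool returns a pool that allocates size records per chunk.
func NewPositionPool(size int) *PositionPool {
	if size < 1 {
		size = 1
	}
	return &PositionPool{size: size}
}

// Get returns a pointer to a zeroed Position. Earlier pointers stay
// valid because an exhausted chunk is replaced, never reused.
func (p *PositionPool) Get() *Position {
	if len(p.chunk) == 0 {
		p.chunk = make([]Position, p.size)
	}
	pos := &p.chunk[0]
	p.chunk = p.chunk[1:]
	return pos
}

func main() {
	pool := NewPositionPool(256)
	first := pool.Get()
	first.StartLine, first.EndLine = 1, 1
	second := pool.Get()
	second.StartLine, second.EndLine = 2, 4
	fmt.Println(*first, *second)
}
```

The trade-off is that a whole chunk stays reachable as long as any record in it is referenced, so whether this helps should be confirmed with a profiler.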

  • No reliable way to work with parentheses

    Proposal: add WithParens, by analogy with func (l *Parser) WithFreeFloating(). That function would ask the parser to preserve information about parentheses.

    It can either introduce AST nodes that represent parentheses or make them implicit, but position info should respect them.

    It's almost impossible to handle parentheses in the source code right now. We can't reliably count the number of ( and ) surrounding an expression.

    Imagine this binary expression:

    (($x) + ((($y))))
    

    If we use GetPosition and take the slice src[pos.StartPos-1 : pos.EndPos], the result would be:

    $x) + ((($y
    

    If we scan to the left and to the right, we'll count 2 parens, but the binary expression itself has only 1 surrounding pair, while $y is surrounded by 3 pairs of ().

    With the proposed feature, I would expect the binary expression's position to capture the entire (($x) + ((($y)))). Sub-expression 1 should be ($x) and sub-expression 2 should be ((($y))).

    For a clone expression like clone ($x), we get a position that encloses clone ($x.
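
The miscount described above can be reproduced with a short sketch. It assumes a naive strategy of counting "(" directly to the left and ")" directly to the right of the reported range; countSurroundingParens is a hypothetical helper, and the offsets are 0-based and chosen to match the slice quoted in this issue.

```go
package main

import "fmt"

// countSurroundingParens naively counts "(" immediately left of start
// and ")" immediately right of end, and returns the smaller count.
func countSurroundingParens(src string, start, end int) int {
	left := 0
	for i := start - 1; i >= 0 && src[i] == '('; i-- {
		left++
	}
	right := 0
	for i := end; i < len(src) && src[i] == ')'; i++ {
		right++
	}
	if left < right {
		return left
	}
	return right
}

func main() {
	src := `(($x) + ((($y))))`
	// Half-open byte range covering what the position info reports
	// for the binary expression.
	start, end := 2, 13

	fmt.Printf("%q\n", src[start:end])                    // "$x) + ((($y"
	fmt.Println(countSurroundingParens(src, start, end)) // 2
}
```

The count is 2 because the opening paren of ($x) and one closing paren of ((($y))) are attributed to the binary expression, which is really wrapped by a single pair.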

  • NamespaceResolver should remove unresolved names

    I'm having issues with the namespace resolver. Its result contains unresolved names like void, true and null. Shouldn't these be removed when they are not resolved?

    test.php

    <?php
    
    declare(strict_types=1);
    
    namespace App\Domain\Handler\Cart;
    
    use SimpleBus\Message\Recorder\RecordsMessages;
    use App\Domain\Command\ChangeCurrencyCommand;
    use App\Domain\Repository\CartRepository;
    use App\Domain\Event\CurrencyChangedEvent as CurrencyChangedEventWithAlias;
    
    class ChangeCurrencyHandler
    {
        /**
         * @var CartRepository
         */
        private $cartRepository;
    
        /**
         * @var RecordsMessages
         */
        private $eventRecorder;
    
        public function __construct(
            CartRepository $cartRepository,
            RecordsMessages $eventRecorder
        ) {
            $this->cartRepository   = $cartRepository;
            $this->eventRecorder    = $eventRecorder;
        }
    
        public function __invoke(ChangeCurrencyCommand $command) : void
        {
            if (true === $command->getBool()) {
                // Do something
            }
            
            if (null !== $command->getNull()) {
                // Do something
            }
    
            $this->eventRecorder->record(new CurrencyChangedEventWithAlias());
        }
    }
    
    

    main.go

    package main
    
    import (
    	"fmt"
    	"github.com/z7zmey/php-parser/php7"
    	"github.com/z7zmey/php-parser/visitor"
    	"os"
    	"reflect"
    )
    
    func main() {
    	for _, file := range os.Args[1:] {
    		fmt.Printf("Checking %s\n", file)
    
    		checkFile(file)
    	}
    }
    
    func checkFile(file string) {
    	src, err := os.Open(file)
    	if err != nil {
    		panic(err)
    	}
    
    	parser := php7.NewParser(src, file)
    	parser.Parse()
    
    	for _, e := range parser.GetErrors() {
    		fmt.Println(e)
    	}
    
    	nsResolver := visitor.NewNamespaceResolver()
    	parser.GetRootNode().Walk(nsResolver)
    
    	for n, fqcn := range nsResolver.ResolvedNames {
    		fmt.Printf("Found %s: %s\n", reflect.TypeOf(n), fqcn)
    	}
    }
    

    output

    Checking ./test.php
    Found *name.Name: SimpleBus\Message\Recorder\RecordsMessages
    Found *name.Name: App\Domain\Command\ChangeCurrencyCommand
    Found *name.Name: void
    Found *name.Name: true
    Found *name.Name: null
    Found *name.Name: App\Domain\Event\CurrencyChangedEvent
    Found *stmt.Class: App\Domain\Handler\Cart\ChangeCurrencyHandler
    Found *name.Name: App\Domain\Repository\CartRepository
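
Until the resolver handles this itself, a caller could filter the result after the fact. The sketch below assumes the resolved names are available as a map of strings; the string keys stand in for AST nodes, and the reservedNames list and helper names are hypothetical and not exhaustive.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// reservedNames lists PHP 7 built-in type names and keywords that are
// not class names. The list is an assumption and is not exhaustive.
var reservedNames = map[string]bool{
	"array": true, "bool": true, "callable": true, "false": true,
	"float": true, "int": true, "iterable": true, "null": true,
	"object": true, "parent": true, "self": true, "static": true,
	"string": true, "true": true, "void": true,
}

// isReserved reports whether name is a reserved word. PHP treats these
// names case-insensitively, so the lookup is lowercased.
func isReserved(name string) bool {
	return reservedNames[strings.ToLower(name)]
}

// filterResolved returns a copy of resolved without reserved names.
func filterResolved(resolved map[string]string) map[string]string {
	out := make(map[string]string, len(resolved))
	for node, fqn := range resolved {
		if !isReserved(fqn) {
			out[node] = fqn
		}
	}
	return out
}

func main() {
	resolved := map[string]string{
		"name#1": `App\Domain\Repository\CartRepository`,
		"name#2": "void",
		"name#3": "true",
		"name#4": "null",
	}
	filtered := filterResolved(resolved)

	keys := make([]string, 0, len(filtered))
	for k := range filtered {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("Found %s: %s\n", k, filtered[k])
	}
}
```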
    
  • Calling parser concurrently

    I wanted to call the php7 parser concurrently, but because rootnode, comments and positions are defined as php7 module-level variables, it does not work. To solve this, I moved those structures into the lexer struct (https://github.com/z7zmey/php-parser/blob/master/scanner/lexer.go#L438) and updated the yacc parser file, and it worked.

    Did I miss something that would make it work with goroutines?

    If not, do you have a different idea for handling such a case, or would you consider a pull request?

  • Parser fails on (valid) string interpolation code

    This is a really interesting project, thanks for working on it. I've run into what looks like an erroneous parse failure with regard to string interpolation code.

    <?php
    $filename = "something.txt";
    @header("Content-Disposition: attachment; filename=\"$filename\"");
    

    This fails due to parse errors of various descriptions, depending on what the surrounding code looks like. When using the php-parser binary, this dumps out

    $ php-parser /tmp/brokenparse.php
    ==> /private/tmp/brokenparse.php
    syntax error: unexpected $end, expecting ')'
      | [*stmt.StmtList]
      |   "Stmts":
    

    PHP itself doesn't complain about this code and parses it just fine. Running php -l /tmp/brokenparse.php results in no errors. I believe this is related to a flaw in the grammar, because if $filename is followed by a space things work fine:

    <?php
    $filename = "something.txt";
    @header("Content-Disposition: attachment; filename=\"$filename \"");
    
    ➜  analyze php-parser /tmp/brokenparse.php
    ==> /private/tmp/brokenparse.php
      | [*stmt.StmtList]
      |   "Position": Pos{Line: 3-4 Pos: 8-104};
      |   "Stmts":
      |     [*stmt.Expression]
    
    ... 52 lines elided ...
    
  • Empty array elements and trailing comma in array literals

    At some point, the PHP parser started to parse [1, 2,] as an array of 3 elements, where the last element has key=nil and val=nil.

    If we dig further, it also parses something like [,,,] as an array of 4 (empty) elements.

    Is this intended behavior?

  • ignore everything after __halt_compiler()

    fixes #56

    halt_compiler needs to be pulled out to the top level for this, so that return 0 doesn't skip unwinding the stack.

    __halt_compiler() may only be used in the outermost scope, so this removes any reference to it within blocks and pulls it out of the function in the tests.

  • printer: Does not keep formatting as-is

    What I expected: when using the package simply labelled printer, I assumed it would print the file with all the formatting it previously had. However, it seems it is just meant to be a pretty printer.

    What's the plan for retaining formatting, if any? My use case is that I'd like to make something that resolves all my PHP namespaces in Sublime Text when you hit save.

    For saving the data back out, I'm sure I could do a sort of hack where I only modify the lines affected, but I'm hoping retention of formatting is doable and not too difficult, so I can avoid that effort.

    At the very least, can the Printer struct perhaps just be renamed to PrettyPrinter?
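
The "only modify what changed" hack mentioned in this issue could look roughly like the sketch below, which splices replacement text into the original bytes and leaves everything else untouched. The Edit type and ApplyEdits function are hypothetical and independent of the library; in practice the byte offsets would come from the parser's position info.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// Edit replaces the half-open byte range [Start, End) of the source.
type Edit struct {
	Start, End int
	Text       string
}

// ApplyEdits returns a copy of src with all edits applied. Bytes outside
// the edited ranges, including whitespace and comments, are kept as-is.
func ApplyEdits(src []byte, edits []Edit) ([]byte, error) {
	sorted := append([]Edit(nil), edits...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Start < sorted[j].Start })

	out := make([]byte, 0, len(src))
	cursor := 0
	for _, e := range sorted {
		if e.Start < cursor || e.End < e.Start || e.End > len(src) {
			return nil, fmt.Errorf("invalid or overlapping edit [%d, %d)", e.Start, e.End)
		}
		out = append(out, src[cursor:e.Start]...)
		out = append(out, e.Text...)
		cursor = e.End
	}
	return append(out, src[cursor:]...), nil
}

func main() {
	src := []byte("<?php\n$x = new   Foo( );\n")

	// Located by search here purely for the demonstration.
	start := bytes.Index(src, []byte("Foo"))
	out, err := ApplyEdits(src, []Edit{
		{Start: start, End: start + len("Foo"), Text: `\App\Foo`},
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(string(out))
}
```

The unusual spacing in the input survives because only the edited range is rewritten.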

  • Runtime exception when using "go get github.com/z7zmey/php-parser"

    I'm on (go version go1.14 linux/amd64)

    go get github.com/z7zmey/php-parser
    

    Shows

    # github.com/z7zmey/php-parser
    runtime.main_main·f: function main is undeclared in the main package
    
  • [dev] syntax error: unexpected T_STRING at line 5

    Found a problem with the following code using the dev branch:

    <?php
    
    declare(strict_types=1);
    
    $a = "JSON_MERGE('{\"vat\"}')";
    

    results in:

    syntax error: unexpected T_STRING at line 5
    