An implementation of sed in Go. Just because!

Sed-Go

An implementation of sed in Go. Just because!

Status

  • Command-Line processing: Done. It accepts '-e', '-f', '-n' and long versions of the same. It takes '-help'.
  • Lexer: Complete.
  • Parser/Engine: Has every command in a typical sed now. It has: a, i, c, d, D, p, P, g, G, x, h, H, r, w, s, y, b, t, :label, n, N, q, =.

This sed engine can be embedded in your program, wrapping any io.Reader so that the stream is lazily processed as you read from it. Of course I also have a command-line driver program here (in package github.com/rwtodd/Go.Sed/cmd/sed-go).

Differences from Standard Sed

Regexps: The only thing you really have to keep in mind when using go-sed, is I use Go's "regexp" package. Therefore, you have to use that syntax for the regular expressions. The main differences I've noticed in practice are:

Go-sed Traditional RE Notes
s/a(bc*)d/$1/g s/a\(bc*\)d/\1/g Don't escape (); Use $1, $2, etc.
s/(?s).// s/.// If you want dot to match \n, use (?s) flag.

Go's regexps have many rich options, which you can see here.

There are a few niceties though, such as I interpret '\t' and '\n' in replacement strings:

s/\w+\s*/\t$0\n/g

You can also escape the newline like in a typical sed, if you want.

Slightly Friendlier Syntax: Go-sed is a little more user-friendly when it comes to syntax. In a normal sed, you have to use one (and ONLY one) space between a r or w and the filename. Go-sed eats whitespace until it sees the filename.

Also, in a typical sed, you need seemingly-extraneous semicolons like the one after the d below:

sed -e '/re/ { p ; d; }' < in > out

... but go-sed is nicer about it:

sed-go -e '/re/ { p ; d }' < in > out 

Unicode: Original sed had no idea about unicode, but modern ones do, and Go-sed is unicode-friendly:

sed-go -e 'y/go/世界/' < in > out

Embed in your Code

I built the program as a library so that sed can be embedded into programs, wrapping any available Reader. The library processes the input lazily as you read bytes from the wrapped Reader.

Obviously, custom string-processing code is faster, but for quick and one-off tasks, this can be just what you need. For example, you want to strip out unix-style comments from a file, but can't be bothered to write the Go code at the moment:

engine, err := sed.New(strings.NewReader(`/^#/d  s/ *#.*//`))
n, err := io.Copy(myOutput, engine.Wrap(myInput))

If your input is a string, and you just want to get a processed string back, there is RunString:

output, err := engine.RunString(inString)

Note that, if you want an engine that emulates sed's -n quiet mode, use NewQuiet instead of New.

Go Get

You can get the code/executable by saying:

go get github.com/rwtodd/Go.Sed/cmd/sed-go

And, if you want to embed a sed engine in your own program, you can import:

import "github.com/rwtodd/Go.Sed/sed"

(2019-01-03: I added a go.mod file to the repo so it can be built outside of $GOPATH)

Implementation Notes

I have never looked at how a "real" implementation of sed is done. I'm just going by the sed man pages and tutorials. I will note that in speed comparisons, go-sed outperforms Mac OS X's sed on my iMac, as long as the input isn't tiny. So, I think the architecture here is pretty good.

The library is spread out among several files:

  • lex.go: Lexes the input into tokens. Skips over comments. These are pretty large-grained tokens. For example, when it reads a 's'ubstitution command, it pulls in the arguments and modifiers and packages them into a single token. This makes the parser simpler. The tokens are sent over a channel so that the lexer can run concurrently with the parser. This is more of a design win than a performance win, but it is more than fast enough.
  • parse.go: Takes tokens from the lexer and parses the sed program. It's really a parser+compiler, becuase the output is an array of instructions for the VM to interpret. Because the tokens are designed to be pretty self-contained, this parser doesn't ever need to backtrack. I always like it when I can achieve that.

When branch targets (e.g., :loop) are parsed, a branch instruction to that location is stored off, along with the name. Then, after the initial parse, each branch is fixed up against the proper target.

  • instructions.go: This file holds all of the VM instructions except for substitution and translation, which are in substitution.go. An instruction for the VM I've built is just a func (svm *vm) error. This turned out to be a very flexible arrangement. Most instructions have a 1:1 mapping to sed commands, but others (like the 'n' command) are broken up at parse time into multiple engine instructions.

Simple instructions, like cmd_get (which handles the g sed command), are package functions, which can get inserted into the instruction stream context-free. A few instructions have state, such as the string to insert in an i\ command. For those, a closure holds the state:

  func cmd_newInserter(text string) instruction {
       return func(svm *vm) error {
           vm.ip++
           _,err := svm.output.WriteString(text)
           return err
       }
  }

A couple commands have so much state that a simple closure would be unwieldy, so those get a struct and an associated run method. That run method pointer becomes the instruction.

You can see in the example above that each instruction is responsible for incrementing the IP (instruction pointer) in the VM. That's flexible because many of the instructions branch, and they can set the IP to whatever they need. However, this was the number one cause of bugs during development: I'd add a new command, and forget to increment the IP, leading to an infinite loop on that instruction. So, I possibly should have had the VM auto-increment the IP, and have the branching instructions account for that when setting IP. It was a trade-off between keeping the inner loop as tight as possible and keeping the instructions as simple as possible. I might have made the wrong choice there.

  • engine.go: This is the sed-VM, and this file also has the entire public interface to the library. It is arranged for simplicity. You have one function to create an Engine from a sed program, and you can use that Engine to wrap an io.Reader. The same engine can be re-used against multiple inputs.

The inner loop of the interpreter very compact:

  for err == nil {
     err = svm.ins[svm.ip](svm)
  }
  • conditions.go: Conditions are what I call the guards around commands (like the 1,10 in 1,10d). The sed man pages act like the conditions are part of the command, but in my engine VM, they are commands themselves. In instructions.go, simplecond and twocond are the commands that make use of the conditions. The condition interface defines just one methon, isMet, which can inspect an engine and determine if the condition in question is met or not. The code in instructions.go does the rest.
Similar Resources

Reference go implementation of globalDCE protocol

globalDCE-go This is the reference implementation of the command line interface of globalDCE coded in the go programming language. This project is sti

Nov 15, 2021

Reference implementation of globaldce protocol coded in go

globaldce This is the reference implementation of the command line interface of globaldce coded in the go programming language. This project is still

Nov 15, 2021

A go language implementation of a CLI, based on input directory files

A go language implementation of a CLI, based on input directory files, using tree output of all file names, based on input database links, dynamic reading of database table information, based on input swagger files to automate the generation of the RESTFUL API.

Nov 20, 2021

Simple trie based auto-completion engine implementation in golang.

Simple trie based auto-completion engine implementation in golang.

Simple auto-complete engine implementation in golang. Quick start $ git clone https://github.com/benbarron/trie-auto-completion-engine $ cd trie-auto-

Jul 12, 2022

Golang implementation of the research by @jonaslyk and the drafted PoC from @LloydLabs

Doge-SelfDelete Golang implementation of the research by @jonaslyk and the drafted PoC from @LloydLabs Golang 实现的文件自删除,来自@jonaslyk和@LloydLabs etc add

Jun 22, 2022

Golang implementation of Reflective load PE from memory

🐸 Frog For Automatic Scan 🐶 Doge For Defense Evasion & Offensive Security Doge-MemX Golang implementation of Reflective load PE from memory Only Sup

Jul 25, 2022

Nano API Implementation in Golang

nanoapi Nano API Implementation in GO TL;DR The idea is to create a very simple

Jan 9, 2022

An experimental AOT implementation of PHP

Trunk An experimental PHP implementation that transpiles PHP code into Go code.

Jul 29, 2022

An implementation of the Nano cryptocurrency protocol in golang

Go Nano An implementation of the Nano protocol written from scratch in Go (golang). About the Project A crypto currency has to be resilient to survive

Mar 21, 2022
Comments
  • How do I write GNU sed equivalent for this?

    How do I write GNU sed equivalent for this?

    Hi,

    ➤ echo '200' | sed 's/\(.\).*/\1xx/'
    2xx
    ➤ echo '200' | sed-go 's/(.)../$1 xx/'
    2 xx
    ➤ echo '200' | sed-go 's/(.)../$1xx/'
    
    ➤
    

    How do I write an expression such that I don't have a space between 2 and xx? If I remote the space, it doesn't seem to work.

    Thanks in advance!

  • sed.NewQuiet doesn't work as expected

    sed.NewQuiet doesn't work as expected

    Hello

    When I run the following code:

    package main
    
    import (
    	"log"
    	"strings"
    
    	"github.com/rwtodd/Go.Sed/sed"
    )
    
    func main() {
    	engine, err := sed.NewQuiet(strings.NewReader(`s/^(.*)$/$1/`))
    	if err != nil {
    		log.Fatal(err)
    	}
    	output, err := engine.RunString("hello world")
    	if err != nil {
    		log.Fatal(err)
    	}
    	log.Printf("Result: '%s'", output)
    }
    

    I get the following result:

    2022/07/09 23:49:44 Result: ''
    

    but I would have expected:

    2022/07/09 23:49:44 Result: 'hello world'
    

    It works as it should when the engine is created with sed.New, but then with a newline that has to be stripped off afterwards.

    The same bug can be reproduced using command line:

    $ echo "red" | ./sed-go  's:e:a:g'
    rad
    $ echo "red" | ./sed-go -n 's:e:a:g'
    $ 
    
  • there is no T  branch command

    there is no T branch command

    'T branch' command is the reverse of command 't branch'.

    see the doc: gnu-sed

    t label (test) Branch to label only if there has been a successful substitution since the last input line was read or conditional branch was taken. The label may be omitted, in which case the next cycle is started.

    T label (test) Branch to label only if there have been no successful substitutions since the last input line was read or conditional branch was taken. The label may be omitted, in which case the next cycle is started.

V2 because I wasn't happy with the way things were going in V1

Pokerchips V2 A client/server app that simulates poker chips so you and your friends can play poke without physical chips! Client The client is writte

Jan 11, 2022
Golisp-wtf - A lisp interpreter (still just a parser) implementation in golang. You may yell "What the fuck!?.." when you see the shitty code.

R6RS Scheme Lisp dialect interpreter This is an implementation of a subset of R6RS Scheme Lisp dialect in golang. The work is still in progress. At th

Jan 7, 2022
Just a simple CLI tool to group dependabot PRs by dependency and merge them.
Just a simple CLI tool to group dependabot PRs by dependency and merge them.

Dependabotbot Have you been the victim of a lodash update? Has your notification page in Github been assaulted by needing to update a patch version of

Jun 30, 2022
It is an easy and fast tool to install your packages with just one command.

Trouxa It is an easy and fast tool to install your packages with just one command. What means "Trouxa"? In portuguese, Trouxa means something like a "

Feb 13, 2022
Get Brew Packages to update, just like you've experienced from Ubuntu
Get Brew Packages to update, just like you've experienced from Ubuntu

Get the number of Brew Packages to update, just like you've experienced from Ubuntu BrewUpdate is a simple utility written in Go, notify you how many

Nov 6, 2021
react, fiber, api just for saying hello

say-hello Introduction Sometimes we need to have a frontend besides the backend. This repository uses React to show how we can server a React build an

Jun 18, 2022
Count once - Just once? no, when appear many it run once, but it can run many times

countOnce just once? no, when appear many it run once, but it can run many times

Jan 29, 2022
Turn .mp3 files in current directory to a podcast feed just one command.

dir2cast Turn .mp3 files in current directory to a podcast feed just one command. Then you can subscribe to it with your favorite podcast client, down

Jun 27, 2022
Rule engine implementation in Golang
Rule engine implementation in Golang

"Gopher Holds The Rules" Grule-Rule-Engine import "github.com/hyperjumptech/grule-rule-engine" Rule Engine for Go Grule is a Rule Engine library for t

Aug 5, 2022
Readline is a pure go(golang) implementation for GNU-Readline kind library
Readline is a pure go(golang) implementation for GNU-Readline kind library

A powerful readline library in Linux macOS Windows Solaris Guide Demo Shortcut Repos using readline Feedback If you have any questions, please submit

Aug 1, 2022