A simple virtual machine - compiler & interpreter - written in golang

Go Report Card license Release

go.vm

This project is a golang based compiler and interpreter for a simple virtual machine. It is a port of the existing project:

You can get a feel for what it looks like by referring to either the parent project, or the examples contained in this repository.

This particular virtual machine is intentionally simple, but despite that it is hopefully implemented in a readable fashion. "Simplicity" here means that we support only a small number of instructions, and the 16-registers the virtual CPU possesses can store strings and integers, but not floating-point values.

If you want to see a real virtual machine, interpreting a scripting language, which you can embed inside your Golang applications:

Installation

There are two ways to install this project from source, which depend on the version of the go version you're using.

If you prefer you can fetch a binary from our release page. Currently there is only a binary for Linux (amd64) due to the use of cgo in our dependencies.

Build without Go Modules (Go before 1.11)

go get -u github.com/skx/go.vm

Build with Go Modules (Go 1.11 or higher)

git clone https://github.com/skx/go.vm ;# make sure to clone outside of GOPATH
cd go.vm
go install

Usage

Once installed there are three sub-commands of interest:

  • go.vm compile $file.in
    • Compiles the given program into bytecode.
  • go.vm execute $file.raw
    • Given the path to a file of bytecode, then interpret it.
  • go.vm run $file.in
    • Compiles the specified program, then directly executes it.

So to compile the input-file examples/hello.in into bytecode:

 $ go.vm compile examples/hello.in

Then to execute the resulting bytecode:

 $ go.vm execute examples/hello.raw

Or you can handle both steps at once:

 $ go.vm run examples/hello.in

Opcodes

The virtual machine has 16 registers, each of which can store an integer or a string. For example to set the first two registers you might write:

 store #0, "This is a string"
 store #1, 0xFFFF

In addition to this there are several mathematical operations which have the general form:

 $operation $result, $src1, $src2

For example to add the contents of register #1 and register #2, storing the result in register #0 you would write:

 add #0, #1, #2

Strings and integers may be displayed to STDOUT via:

 print_str #1
 print_int #3

Control-flow is supported via call, ret (for subroutines) and jmp for absolute jumps. You can also use the Z-flag which is set by comparisons and the inc/dec instructions and make conditional jumps:

    store #1, 0x42
    cmp #1, 0x42
    jmpz ok

    store #1, "Something weird happened!\n"
    print_str #1
    exit
  :ok
    store #1, "Comparing register #01 to 0x42 succeeded!\n"
    print_str #1
    exit

Further instructions are available and can be viewed beneath examples/. The instruction-set is pretty limited, for example there is no notion of reading from STDIN - however this is supported via the use of traps, as documented below.

Notes

Some brief notes on parts of the code / operation:

The compiler

The compiler is built in a traditional fashion:

  • Input is split into tokens via lexer.go
    • This uses the token.go for the definition of constants.
  • The stream of tokens is iterated over by compiler.go
    • This uses the constants in opcode.go for the bytecode generation.

The approach to labels is the same as in the inspiring-project: Every time we come across a label we output a pair of temporary bytes in our bytecode. Later, once we've read the whole program and assume we've found all existing labels, we go back up and fix the generated addresses.

You can use the dump command to see the structure the lexer generates:

 $ go.vm dump ./examples/hello.in
 {STORE store}
 {IDENT #1}
 {, ,}
 {STRING Hello, World!
 }
 {PRINT_STR print_str}
 {IDENT #1}
 {EXIT exit}

The interpreter

The core of the interpreter is located in the file cpu.go and is as simple and naive as you would expect. There are some supporting files in the same directory:

Changes

Compared to the original project there are two main changes:

  • The DB/DATA operation allows storing string data directly in the generated bytecode.
  • There is a notion of traps.
    • Rather than defining opcodes for complex tasks it is now possible to callback into the CPU-emulator to do work.

DB/DATA Changes

For example in simple.vm project this is possible:

DB 0x01, 0x02,

But this is not:

 DB "This is a string, with terminator to follow"
 DB 0x00

go.vm supports this, and it is demonstrated in examples/peek-strlen.in.

Traps

The instruction int can be used to call back to the emulator to do some work on behalf of a program. The following traps are currently defined & available:

  • int 0x00
    • Set the contents of the register #0 with the length of the string in register #0.
  • int 0x01
  • int 0x02
    • Update the (string) contents of register #0 to remove any trailing newline.
    • See examples/trap.box.in.

Adding your own trap-functions should be as simple as editing cpu/traps.go.

Fuzzing

Fuzz-testing is a powerful technique to discover bugs, in brief it consists of running a program with numerous random inputs and waiting for it to die.

I've fuzzed this repository repeatedly via go-fuzz and fixed a couple of minor issues.

Note however that fuzzing will trigger some expected failures. Our virtual CPU has only 16 registers, so for example a program that tries to set register #30 to a particular value is invalid, and will terminate the virtual machine.

Because fuzzing involves using "random" input it is possible there are bugs lurking in the virtual-machine which I've not been lucky enough to catch, so if you wish to fuzz this is how you do it. First of all install the tool:

 $ go get github.com/dvyukov/go-fuzz/go-fuzz
 $ go get github.com/dvyukov/go-fuzz/go-fuzz-build

Now you can build the interpreter using it:

 $ go-fuzz-build github.com/skx/go.vm/fuzz

Finally you can launch the fuzzer:

 $ go-fuzz -nprocs=1 -bin=fuzz-fuzz.zip -workdir=workdir

Interesting results will appear in workdir/crashers/ some crashes will be invalid and you can see that via the *.output files which contain STDOUT from the run (more or less). For example this is an "expected" failure:

 $ cat workdir/crashers/92108737efbd0ac6b42ae4473db5a257314b36cf.output
 Register 99 out of range
 exit status 1

See-Also

The original virtual-machine compiler and interpreter is available from the following repository:

In addition to that you can find a real virtual-machine inside the embedded scripting engine I wrote, also for golang. In that case a scripting language is parsed and converted into a series of bytecode instructions, which are executed by a virtual machine. Similar to this project, but the bytecode operations are more complex and high-level:

If you're interested in compilers, and interpreters, generally you might enjoy these other projects too:

Github Setup

This repository is configured to run tests upon every commit, and when pull-requests are created/updated. The testing is carried out via .github/run-tests.sh which is used by the github-action-tester action.

Releases are automated in a similar fashion via .github/build, and the github-action-publish-binaries action.

Steve

Owner
Comments
  • I should investigage fuzzing.

    I should investigage fuzzing.

    The original c intepreter was fuzzed with AFL, which resulted in a couple of bug-fixes.

    It would be worth fuzzing this application with a go-based fuzzer. I'm not yet familiar with any, but no doubt they exist.

  • Registers overflow - two possible solutions!

    Registers overflow - two possible solutions!

    This is supposed to be a 16-bit system, as our address-space is capped at 64k, and our registers hold (integer) values in the range 0x0000-0xffff only.

    However our registers overflow:

     store #0, 0xFEFE
     store #1, 0xFEFE
     add #3, #0, #1
     print_int #3
    

    The result of this program is 0x1FDFC. Instead we should be % 0xFFFF. We handled the case of inc and dec wrapping correctly in #2 but we need to cap registers more generally.

    A simple solution would be to update SetInt to add & 0xFFFF which would limit the value. However a better solution might be to test for this overflow in all the math-operations and set a (new) [C]arry flag.

    To do this we'd need to:

    • Add the new flag.
    • Set it on all suitable math-operations.
    • Add JMP_C and JMP_NC operations.

    I could go either way, since this machine is posted for learning it could make sense to either:

    • Simplify because it isn't real.
    • Add the opcodes because they demonstrate something useful.

    Feedback welcome...

  • inc + dec don't wrap

    inc + dec don't wrap

    Since our registers hold integers in the range 0x0000 0xFFFF they should wrap when they reach the limits. This example:

        store #1,0
        dec #1 
    

    Should result in register #1 containing 0xFFFF.

  • If this were a real 80s CPU ..

    If this were a real 80s CPU ..

    We have a lot of special purpose instructions/opcodes which are fun to play with but which wouldn't exist in a real processor, for example:

    • MEMCPY
      • Could be implemented via PEEK/POKE and a loop
    • SYSTEM
      • Can't be implemented by hand, but isn't so terribly useful
      • At least in part because the results are output to the console and not stored in a register.

    We're also missing the ability to carry out some operations, either in a simple fashion, or at all:

    • We can't convert a string to uppercase easily
      • Though we could iterate over some memory and cmp >= 'a' && <= 'z'
      • We can't iterate over the characters in a string stored in a register though, or modify them in-place.
    • We can't input string from the user into a register/RAM.
    • We can't draw to the console.
      • Meaning we can't write "tetris" or other simple games.

    If this were a real CPU we'd call into fixed addresses into ROM to get support for things, or we'd have an interrupt function. For example we could define the INT 0xNN instruction. Then using that we could provide facilities

    • INT 0x00
      • Read a string from the user, store in register #0.
    • INT 0x01
      • Output a single character, as stored in #0
    • INT 0x02
      • Return the length of the string stored in #0
    • INT 0x03
      • Draw the character stored in #0 at coordinates #1,#2
    • ..

    In short we could add higher-level useful utilities in a way that didn't require the addition of more single-purpose instructions. I don't have a strong view of what things are missing, because I've never tried to write "complex" programs for this CPU, but if we're all about the learning then it should be possible to do so!

  • Fix stack removal.

    Fix stack removal.

    Stacks should be implemented as LIFO, but ours was accidentally FIFO which is very wrong. This would have caused nested-calls to be handled badly.

    Fixed implementation, added test-cases, and now this closes #12.

  • Errors

    Errors

    This pull-request overhauls our CPU execution to improve error-handling:

    • The CPU will no longer call os.Exit
    • The fuzz-testing has been updated to use the native facilities available in go 1.18+

    As part of this update I've added SetContext to allow catching infinite loops, and fixed numerous errors when handling malformed programs.

  • 9 new workflow

    9 new workflow

    This pull-request ports the github-actions from the old HCL-based syntax to using the newer YAML flavour.

    As part of that I've added support for staticcheck.

    This closes #9.

  • Linting

    Linting

    This pull request:

    • Updates the README to document go modules + add a table of contents.
    • Adds more linting and checks as part of the github workflow.
ReCT-Go-Compiler - A compiler for the ReCT programming language written in Golang

ReCT-Go-Compiler A compiler for the ReCT programming language written in Golang

Nov 30, 2022
Interpreter - The Official Interpreter for the Infant Lang written in Go

Infant Lang Interpreter Infant Lang Minimalistic Less Esoteric Programming Langu

Jan 10, 2022
Virtual-Operating-System - Virtual Operating System Using Golang And Fyne Implemented Gallery app
Virtual-Operating-System - Virtual Operating System Using Golang And Fyne Implemented Gallery app

Virtual Operating System Virtual Operating System Using Golang And Fyne Implemen

Jan 1, 2022
A customisable virtual machine written in Go

== About GoLightly == GoLightly is a lightweight virtual machine library implemented in Go, designed for flexibility and reuse. Traditionally popular

Nov 16, 2022
Expr – a tiny stack-based virtual machine written in Go

Expr – a tiny stack-based virtual machine written in Go The executor is designed to interpret a simple expression language and it's useful in delegati

Nov 11, 2022
Go compiler made from scratch, which can compile itself. It's going to be the smallest and simplest go compiler in the world.

Babygo, a go compiler made from scratch Babygo is a small and simple go compiler. (Smallest and simplest in the world, I believe.) It is made from scr

Jan 8, 2023
Monkey programming language project from 'Writing An Interpreter In Go'and 'Writing A Compiler In Go' Books
Monkey programming language project from 'Writing An Interpreter In Go'and 'Writing A Compiler In Go' Books

Monkey Monkey programming language ?? project from "Writing An Interpreter In Go

Dec 16, 2021
A bytecode-based virtual machine to implement scripting/filtering support in your golang project.

eval-filter Implementation Scripting Facilities Types Built-In Functions Conditionals Loops Functions Case/Switch Use Cases Security Denial of service

Dec 30, 2022
Forth virtual machine in Go

forego - A Forth implementation in Go ===================================== Why? ---- For ego. This is me learning the language. Both of them. So

Sep 9, 2022
Weave Ignite is an open source Virtual Machine (VM) manager with a container UX and built-in GitOps management.
Weave Ignite is an open source Virtual Machine (VM) manager with a container UX and built-in GitOps management.

Weave Ignite is an open source Virtual Machine (VM) manager with a container UX and built-in GitOps management.

Nov 16, 2021
Simple-lisp - A Simple Lisp Interpreter written in Go

Simple Lisp A simple Lisp interpreter written in Go. The fixed-precision-numbers

Jun 21, 2022
Gentee - script programming language for automation. It uses VM and compiler written in Go (Golang).

Gentee script programming language Gentee is a free open source script programming language. The Gentee programming language is designed to create scr

Dec 15, 2022
Scriptable interpreter written in golang
Scriptable interpreter written in golang

Anko Anko is a scriptable interpreter written in Go. (Picture licensed under CC BY-SA 3.0, photo by Ocdp) Usage Example - Embedded package main impor

Dec 23, 2022
A BASIC interpreter written in golang.
A BASIC interpreter written in golang.

05 PRINT "Index" 10 PRINT "GOBASIC!" 20 PRINT "Limitations" Arrays Line Numbers IF Statement DATA / READ Statements Builtin Functions Types 30 PRINT "

Dec 24, 2022
interpreter for the basic language written in golang
interpreter for the basic language written in golang

jirachi interpreter for the basic language written in golang The plan supports the following functions: Arithmetic Operations (+, -, *, /, ^) Comparis

Sep 17, 2022
A multi-pass compiler written in Go comprised of scanner, recursive-descent parser, generation of AST, intermediate representation (ILOC), and code generation (Armv8).

GoLite Project - Go Huskies! This is a project conducted and led in the course MPCS 51300 Compilers at the University of Chicago. In a group of two, w

Jan 10, 2022
A POSIX-compliant AWK interpreter written in Go

GoAWK: an AWK interpreter written in Go AWK is a fascinating text-processing language, and somehow after reading the delightfully-terse The AWK Progra

Dec 31, 2022
Mini lisp interpreter written in Go.

Mini Go Lisp Mini lisp interpreter written in Go. It is implemented with reference to the d-tsuji/SDLisp repository written in Java. Support System Fu

Nov 25, 2022
◻️ A Whitespace interpreter written in Go.

whitespace-go A Whitespace interpreter written in Go. Whitespace is a esoteric programming language made up entirely of spaces, tabs, and newlines. Th

May 18, 2022