Mimic - a eBPF virtual machine and emulator which runs in userspace

Mimic (eBPF userspace emulator)

Go Reference

Mimic is a eBPF virtual machine and emulator which runs in userspace. Mimic attempts to 'mimic' the eBPF machinery we find in the Linux kernel, as well as other possible implementation/environments.

Goals / use cases

Mimics main purpose is to serve as a backend for the edb eBPF debugger, which is needed since we can't interrupt the kernel and "step" through its execution like we can in userspace.

Secondly, running eBPF programs typically requires Linux, so developing on a machine which itself doesn't have eBPF support makes testing difficult. Since Mimic is written in Go, developers should be able to run eBPF programs on their own machine.

Third, Mimic might be useful for embedding external/untrusted programs in Go applications, or as a plugin system. This is what eBPF is designed to do after all, except in the Linux kernel. It should be noted that this is not a JIT-ed implementation, so it is not as fast as native code. But perhaps it can compete with WASM, JS and LUA

Getting started

Just like running eBPF in a Linux kernel, we need to do a few steps to setup, most of which will be familiar.

// Create a new Linux emulator, which handles some Linux specific eBPF features like maps.
emu := &mimic.LinuxEmulator{}
// Create a VM, which can spawn multiple processes, we pass the emulator to the VM, the VM will call the emulator
// at certain points, namely when loading programs and for helper functions.
vm := mimic.NewVM(mimic.VMOptEmulator(emu))

// Parse an ELF file containing programs and maps
coll, err := ebpf.LoadCollectionSpec("./example-ebpf-elf-file")
if err != nil {
    panic(err)
}

// Get a map specification from the collection(this is just information about the map type)
exampleMapSpec := coll.Maps["example_map"]
// Get an emulated map based on the spec(which can actually store data)
exampleMap, err := mimic.MapSpecToLinuxMap(exampleMapSpec)
if err != nil {
    panic(err)
}

// Register the emulated map with the emulator.
err = emu.AddMap(exampleMap)
if err != nil {
    panic(err)
}

// Note: After registering the map, it will be initialized, you can now modify it.

// Get and load the program specification.
prog := coll.Programs["example_prog"]
progID, err := vm.AddProgram(prog)
if err != nil {
    panic(err)
}

// Open the program context file.
f, err := os.Open("./example.ctx.json")
if err != nil {
    panic(err)
}

// Unmarshal the JSON into an actual context object, more about what a context is and its JSON format in the
// "Context" readme section.
ctx, err := mimic.UnmarshalContextJSON(f)
if err != nil {
    panic(err)
}

// Start a new process by providing the unique ID of the loaded program and a context. This links `ctx` to the process
// until p is cleaned up and can't be used by other processes.
p, err := vm.NewProcess(progID, ctx)
if err != nil {
    panic(err)
}

// Run the instance of the program. Alternatively, one can manually execute each program instruction via the p.Step()
// function.
err = p.Run(context.Background())

// Note: after running a process, one can inspect maps, memory, registers and the context to get the program results

// Cleanup the process, this frees memory associated with the process, p should not be stored or used after calling
// cleanup. The `ctx` that was attached is also cleaned up and can now be re-used.
err = p.Cleanup()

VM vs Emulator

eBPF is no longer Linux specific(eBPF for windows). Now the eBPF instruction set is well defined as must be constant across all platforms(perhaps with the exception of legacy cBPF instructions for packet access). However, the Linux eBPF subsystem makes use of a lot of maps, some if which are generic, but most of which are very specialized and aimed at Linux specific features like cGroups or LSM.

To be able to support all use cases and emulator multiple platforms/environments, Mimic splits functionality into two parts(reals). The "Virtual Machine" which is responsible for all generic features(Instructions, Registers, Memory, Processes). The "Emulator" is interchangeable and provides platform/environment specific features(Helper functions/Maps/Some other construct), besides a Linux emulator we can have a Windows emulator and users of Mimic might even implement their own emulator with custom helper functions specific to their needs.

Thread safety / concurrent execution

Mimic considers two different levels of safety and race-conditions. The first is race-conditions in Go-land, meaning in the VM or emulator itself, cases like concurrent access to maps. The second level is in eBPF-land, which would be concurrent reads and writes to the same map key. The emulator allocates memory in the form of []byte which will not change in size during their lifetime, therefor race-conditions in the contents of these allocations are considered "safe" from a emulator perspective. This is very much like the behavior of the Linux kernel itself.

So the vm/emulator is thread safe, but the eBPF programs not necessary, running multiple processes concurrently might give cause race-conditions in the eBPF programs, eBPF programs are responsible for their own race-conditions and must guard against them using eBPF mechanism like per-CPU maps and emulated spin locks.

Contexts

When the Linux kernel invokes a eBPF program it will pass it arguments also called a context. The type of the context depends on the program type and even where it is attached. An XDP program will get a *xdp_md passed while a socket program will get a *sk_buff. So when we start a eBPF program we also need to pass in a context.

Just passing in some arguments is enough for some programs, but more complex programs often make extensive use of helper functions such as bpf_get_current_pid_tgid or bpf_probe_read which return memory from the kernel. Therefor the context also contains data which will be used to return environment data to the eBPF calls.

Any Go struct which implements mimic.Context can be used as a context. The mimic.UnmarshalContextJSON function can be used to unmarshal JSON files into the contexts types provided by mimic. The mimic.RegisterContextUnmarshaller function can be used to register additional context types, so all context parsing can work via a single function.

Generic context

The goal of the generic context type is to provide a context which should cover all situations. It enabled users to craft any data structure. This makes it also very verbose, which is why for some program types, more specialized contexts types are implemented.

{
    "name": "{name of the context}",
    "type": "generic",
    "ctx": {
        "registers": {
            "r1": "{name of a memory section}",
            "r2": "{name of a memory section}",
            "r3": "{name of a memory section}",
            "r4": "{name of a memory section}",
            "r5": "{name of a memory section}"
        },
        "memory": [
            {
                "name": "{Name of this piece of memory}",
                "type": "{block|ptr|struct|int}",
                "value": "{type dependant value}"
            },
            {
                "name": "{block of memory is just a collection of bytes}",
                "type": "block",
                "value": {
                    "value": "{base64 encoded string}",
                    "byteorder": "{optional byte order(big-endian|be|little-endian|le)}"
                }
            },
            {
                "name": "{pointer points at an offset within another memory section}",
                "type": "ptr",
                "value": {
                    "memory": "{name of the memory block we point to}",
                    "offset": 20,
                    "size": 32
                }
            },
            {
                "name": "{struct is a memory structure}",
                "type": "struct",
                "value": [
                    {
                        "name": "{field name}",
                        "memory": "{name of other memory}"
                    },
                    {
                        "name": "{field name}",
                        "memory": "{name of other memory}"
                    }
                ]
            },
            {
                "name": "{Name of this integer}",
                "type": "int",
                "value": {
                    "value": 123,
                    "size": 8
                }
            }
        ],
        "emulator": {
            "key": "this can be anything. The schema will be determined by the specific emulator",
            "other-key": "these k/v pairs are not loaded into the VM's memory, but can reference it"
        }
    }
}

XDP_MD

Features / TODO

eBPF is complex, the Linux kernel is complex, thus so is an emulator. This is a list of features which we might want to add but haven't yet. Contributions are welcome.

Note: Marked list items were on the list, but have since been implemented.

Project quality

  • Linting
  • CI
  • Decent unit test coverage
  • Comments on all exported structs for GoDoc generation
  • Usage examples
  • Linux selftest mirroring (if we execute the linux eBPF selftests and get a positive result, can can conclude that the implementation matches)
  • Benchmarking

Generic features

  • Process schedulers
    • Per-CPU scheduler. Running eBPF processes on a worker pool no larger than the amount of logical CPU's. Theoretically the fastest for CPU heavy workloads.
    • Naive scheduling. Running each process in a Goroutine, mostly useful for programs which don't rely on Per-CPU maps and are I/O heavy or blocking.
  • sk_buff context. The Linux sk_buff structure is very complex, and would be hard to craft with the generic context. Making a purpose built context type for it seems logical.

eBPF Instructions

  • Atomic add
  • Atomic and
  • Atomic or
  • Atomic xor
  • Atomic exchange
  • Atomic compare and exchange
  • Packet access (To be implemented as a special helper function in emulator)

Mechanisms

  • BPF-to-BPF function calls
  • Tailcalling (Switching of the current program)

Linux helper functions

  • bpf_lookup_element
  • bpf_update_element
  • bpf_delete_element
  • bpf_probe_read
  • bpf_ktime_get_ns
  • bpf_trace_printk
  • bpf_get_prandom_u32
  • bpf_get_smp_processor_id
  • bpf_skb_store_bytes
  • bpf_l3_csum_replace
  • bpf_l4_csum_replace
  • bpf_tail_call
  • bpf_clone_redirect
  • bpf_get_current_pid_tgid
  • bpf_get_current_uid_gid
  • bpf_get_current_comm
  • bpf_get_cgroup_classid
  • bpf_skb_vlan_push
  • bpf_skb_vlan_pop
  • bpf_skb_get_tunnel_key
  • bpf_skb_set_tunnel_key
  • bpf_perf_event_read
  • bpf_redirect
  • bpf_get_route_realm
  • bpf_perf_event_output
  • bpf_skb_load_bytes
  • bpf_get_stackid
  • bpf_csum_diff
  • bpf_skb_get_tunnel_opt
  • bpf_skb_set_tunnel_opt
  • bpf_skb_change_proto
  • bpf_skb_change_type
  • bpf_skb_under_cgroup
  • bpf_get_hash_recalc
  • bpf_get_current_task
  • bpf_probe_write_user
  • bpf_current_task_under_cgroup
  • bpf_skb_change_tail
  • bpf_skb_pull_data
  • bpf_csum_update
  • bpf_set_hash_invalid
  • bpf_get_numa_node_id
  • bpf_skb_change_head
  • bpf_xdp_adjust_head
  • bpf_probe_read_str
  • bpf_get_socket_cookie
  • bpf_get_socket_uid
  • bpf_set_hash
  • bpf_setsockopt
  • bpf_skb_adjust_room
  • bpf_redirect_map
  • bpf_sk_redirect_map
  • bpf_sock_map_update
  • bpf_xdp_adjust_meta
  • bpf_perf_event_read_value
  • bpf_perf_prog_read_value
  • bpf_getsockopt
  • bpf_override_return
  • bpf_sock_ops_cb_flags_set
  • bpf_msg_redirect_map
  • bpf_msg_apply_bytes
  • bpf_msg_cork_bytes
  • bpf_msg_pull_data
  • bpf_bind
  • bpf_xdp_adjust_tail
  • bpf_skb_get_xfrm_state
  • bpf_get_stack
  • bpf_skb_load_bytes_relative
  • bpf_fib_lookup
  • bpf_sock_hash_update
  • bpf_msg_redirect_hash
  • bpf_sk_redirect_hash
  • bpf_lwt_push_encap
  • bpf_lwt_seg6_store_bytes
  • bpf_lwt_seg6_adjust_srh
  • bpf_lwt_seg6_action
  • bpf_rc_repeat
  • bpf_rc_keydown
  • bpf_skb_cgroup_id
  • bpf_get_current_cgroup_id
  • bpf_get_local_storage
  • bpf_sk_select_reuseport
  • bpf_skb_ancestor_cgroup_id
  • bpf_sk_lookup_tcp
  • bpf_sk_lookup_udp
  • bpf_sk_release
  • bpf_map_push_elem
  • bpf_map_pop_elem
  • bpf_map_peek_elem
  • bpf_msg_push_data
  • bpf_msg_pop_data
  • bpf_rc_pointer_rel
  • bpf_spin_lock
  • bpf_spin_unlock
  • bpf_sk_fullsock
  • bpf_tcp_sock
  • bpf_skb_ecn_set_ce
  • bpf_get_listener_sock
  • bpf_skc_lookup_tcp
  • bpf_tcp_check_syncookie
  • bpf_sysctl_get_name
  • bpf_sysctl_get_current_value
  • bpf_sysctl_get_new_value
  • bpf_sysctl_set_new_value
  • bpf_strtol
  • bpf_strtoul
  • bpf_sk_storage_get
  • bpf_sk_storage_delete
  • bpf_send_signal
  • bpf_tcp_gen_syncookie
  • bpf_skb_output
  • bpf_probe_read_user
  • bpf_probe_read_kernel
  • bpf_probe_read_user_str
  • bpf_probe_read_kernel_str
  • bpf_tcp_send_ack
  • bpf_send_signal_thread
  • bpf_jiffies64
  • bpf_read_branch_records
  • bpf_get_ns_current_pid_tgid
  • bpf_xdp_output
  • bpf_get_netns_cookie
  • bpf_get_current_ancestor_cgroup_id
  • bpf_sk_assign
  • bpf_ktime_get_boot_ns
  • bpf_seq_printf
  • bpf_seq_write
  • bpf_sk_cgroup_id
  • bpf_sk_ancestor_cgroup_id
  • bpf_ringbuf_output
  • bpf_ringbuf_reserve
  • bpf_ringbuf_submit
  • bpf_ringbuf_discard
  • bpf_ringbuf_query
  • bpf_csum_level
  • bpf_skc_to_tcp6_sock
  • bpf_skc_to_tcp_sock
  • bpf_skc_to_tcp_timewait_sock
  • bpf_skc_to_tcp_request_sock
  • bpf_skc_to_udp6_sock
  • bpf_get_task_stack
  • bpf_load_hdr_opt
  • bpf_store_hdr_opt
  • bpf_reserve_hdr_opt
  • bpf_inode_storage_get
  • bpf_inode_storage_delete
  • bpf_d_path
  • bpf_copy_from_user
  • bpf_snprintf_btf
  • bpf_seq_printf_btf
  • bpf_skb_cgroup_classid
  • bpf_redirect_neigh
  • bpf_per_cpu_ptr
  • bpf_this_cpu_ptr
  • bpf_redirect_peer
  • bpf_task_storage_get
  • bpf_task_storage_delete
  • bpf_get_current_task_btf
  • bpf_bprm_opts_set
  • bpf_ktime_get_coarse_ns
  • bpf_ima_inode_hash
  • bpf_sock_from_file
  • bpf_check_mtu
  • bpf_for_each_map_elem
  • bpf_snprintf
  • bpf_sys_bpf
  • bpf_btf_find_by_name_kind
  • bpf_sys_close
  • bpf_timer_init
  • bpf_timer_set_callback
  • bpf_timer_start
  • bpf_timer_cancel
  • bpf_get_func_ip
  • bpf_get_attach_cookie
  • bpf_task_pt_regs
  • bpf_get_branch_snapshot
  • bpf_trace_vprintk
  • bpf_skc_to_unix_sock
  • bpf_kallsyms_lookup_name
  • bpf_find_vma
  • bpf_loop
  • bpf_strncmp
  • bpf_get_func_arg
  • bpf_get_func_ret
  • bpf_get_func_arg_cnt

Linux maps

  • BPF_MAP_TYPE_HASH
  • BPF_MAP_TYPE_ARRAY
  • BPF_MAP_TYPE_PROG_ARRAY
  • BPF_MAP_TYPE_PERF_EVENT_ARRAY
  • BPF_MAP_TYPE_PERCPU_HASH
  • BPF_MAP_TYPE_PERCPU_ARRAY
  • BPF_MAP_TYPE_STACK_TRACE
  • BPF_MAP_TYPE_CGROUP_ARRAY
  • BPF_MAP_TYPE_LRU_HASH
  • BPF_MAP_TYPE_LRU_PERCPU_HASH
  • BPF_MAP_TYPE_LPM_TRIE
  • BPF_MAP_TYPE_ARRAY_OF_MAPS
  • BPF_MAP_TYPE_HASH_OF_MAPS
  • BPF_MAP_TYPE_DEVMAP
  • BPF_MAP_TYPE_SOCKMAP
  • BPF_MAP_TYPE_CPUMAP
  • BPF_MAP_TYPE_XSKMAP
  • BPF_MAP_TYPE_SOCKHASH
  • BPF_MAP_TYPE_CGROUP_STORAGE
  • BPF_MAP_TYPE_REUSEPORT_SOCKARRAY
  • BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE
  • BPF_MAP_TYPE_QUEUE
  • BPF_MAP_TYPE_STACK
  • BPF_MAP_TYPE_SK_STORAGE
  • BPF_MAP_TYPE_DEVMAP_HASH
  • BPF_MAP_TYPE_STRUCT_OPS
  • BPF_MAP_TYPE_RINGBUF
  • BPF_MAP_TYPE_INODE_STORAGE
  • BPF_MAP_TYPE_TASK_STORAGE
  • BPF_MAP_TYPE_BLOOM_FILTER

Misc

  • (optional) real map backing. The idea being that instead of emulating a map, we can use actual BPF maps of the host. It may not be able to simulate all features, disabling syscall writes on a map would also block access for the eBPF program in this case. But baring some limitations, this could allow someone to share a map between the emulator and a loader program, or a real and emulated program.
  • emulator helper customization. Allowing users to change or extend the existing Linux emulator. For example to replace the bpf_probe_read callback so the Host program can return its own custom memory objects, thus being able integrate the VM/Emulator with the rest of the host application.
  • high-security settings on the vm. There are moments when we reuse memory(deleting map keys, tailcalls, bpf-to-bpf calls), but don't zero it out before reuse. There might be use cases, especially when running foreign code in a multi tenant environment where this might lead to security issues. Zeroing out memory can be costly, especially if the security use-case doesn't apply, so being able to enable or disable this feature with a VM setting would be nice. (Note: we currently don't zero-memory at all). (Note2: zeroing out memory should also enforce correctness for programs)
Similar Resources

GC2 is a Command and Control application that allows an attacker to execute commands on the target machine using Google Sheet and exfiltrate data using Google Drive.

GC2 is a Command and Control application that allows an attacker to execute commands on the target machine using Google Sheet and exfiltrate data using Google Drive.

GC2 GC2 (Google Command and Control) is a Command and Control application that allows an attacker to execute commands on the target machine using Goog

Dec 13, 2022

archy is an static binary to determine current kernel and machine architecture, with backwards compatible flags to uname, and offers alternative output format of Go runtime (i.e. GOOS, GOARCH).

archy archy is an simple binary to determine current kernel and machine architecture, which wraps uname and alternatively can read from Go runtime std

Mar 18, 2022

Run your MapReduce workloads as a single binary on a single machine with multiple CPUs and high memory. Pricing of a lot of small machines vs heavy machines is the same on most cloud providers.

gomap Run your MapReduce workloads as a single binary on a single machine with multiple CPUs and high memory. Pricing of a lot of small machines vs he

Sep 16, 2022

👀 A modern watch command. Time machine and pager etc.

👀 A modern watch command. Time machine and pager etc.

Viddy Modern watch command. Viddy well, gopher. Viddy well. Demo Features Basic features of original watch command. Execute command periodically, and

Jan 2, 2023

This is a command line application to manage and fine-tune Time Machine exclude paths.

heptapod This is a command line application to manage and fine-tune Time Machine exclude paths. This repository is a WIP! The advertised functionality

Dec 28, 2022

Another CLI framework for Go. It works on my machine.

Another CLI framework for Go. It works on my machine.

Command line interface framework Go framework for rapid command line application development

Dec 30, 2022

A small CLI tool to check connection from a local machine to a remote target in various protocols.

CHK chk is a small CLI tool to check connection from a local machine to a remote target in various protocols.

Oct 10, 2022

The Keel CLI allows you to setup Keel on your local dev machine or on a Kubernetes cluster

The Keel CLI allows you to setup Keel on your local dev machine or on a Kubernetes cluster

keel-cli What is keel-cli The Keel CLI allows you to setup Keel on your local dev machine or on a Kubernetes cluster, launches and manages Keel instan

Oct 7, 2021

The Dapr CLI allows you to setup Dapr on your local dev machine or on a Kubernetes cluster

Dapr CLI The Dapr CLI allows you to setup Dapr on your local dev machine or on a

Dec 23, 2021
Comments
  • LPMTrie support

    LPMTrie support

    We support a fair amount of linux map types already, however BPF_MAP_LPM_TRIE is missing. This map type is a trie(prefix tree) with some unique features to complement its intended use, best described in the code itself. TLDR.: It is a tree with keys that contain both a value and prefix/mask length. It does bit level comparison on the key value and picks the next child node to traverse based on the value of the bit, like a binary tree.

    I was unable to find a library which is both generic(works with []byte up to 256 bytes long), does longest prefix matching, compares on bits and not whole bytes and has a sparse data representation. So we will have to write our own based on the C code of linux.

    Another issue to take into account is how we will be registering the nodes in the memory controller, there are two approaches here:

    • Register each individual node as its own memory entry, easy to implement, makes the memory overview very noisy and makes a lot of records.
    • Take the hash approach, maintain a PlainMemory buffer which will be used as an array, maintain a freelist of addresses to be used and store the addresses in the tree. This is more overhead and complexer to implement, it also makes the memory controller listing and fragmentation less.
Related tags
A simple go program which checks if your websites are running and runs forever (stop it with ctrl+c). It takes two optional arguments, comma separated string with urls and an interval.

uptime A simple go program which checks if your websites are running and runs forever (stop it with ctrl+c). It takes two optional arguments: -interva

Dec 15, 2022
Darktile is a GPU rendered terminal emulator designed for tiling window managers.
Darktile is a GPU rendered terminal emulator designed for tiling window managers.

Darktile is a GPU rendered terminal emulator designed for tiling window managers.

Jan 3, 2023
being a gh extension that runs animated terminal "screensavers"

gh-screensaver being a gh extension that runs animated terminal "screensavers" usage gh screensaver run a random screensaver gh screensaver -s pipes r

Nov 29, 2022
ntest is a cross-platform cli app that runs multiple tests against any address.
ntest is a cross-platform cli app that runs multiple tests against any address.

ntest ntest is a cross-platform cli app that runs multiple tests against any address. About ntest Having the ability to run common tests against any d

Jan 3, 2022
Clirunner - Package clirunner runs a legacy shell-style CLI as if a human were running it.

clirunner Package clirunner runs a legacy shell-style command-line interpreter (CLI) as if a human were running it. A shell-style CLI offers a prompt

Jan 4, 2022
Jsos - A operating system that runs system-level javascript, based on the Linux kernel

JsOS ?? An linux-based operating system that runs Javascript code at the system-

Jan 6, 2023
Lightweight CLI tool to programmatically rescale your Hetzner virtual server daily to optimize your budget spending

Lightweight CLI tool to programmatically rescale your Hetzner virtual server daily to optimize your budget spending

Nov 28, 2022
Run your workloads on ephemeral Virtual Machines

vm-spinner Run your workloads on ephemeral Virtual Machines. Descriprion A simple tool that spawns an arbitrary number of VMs in parallel, runs the sa

Jan 21, 2022
Tabouli: a TUI for interacting with firmware/embedded devices that support a CLI via serial interface/virtual COM Port
Tabouli: a TUI for interacting with firmware/embedded devices that support a CLI via serial interface/virtual COM Port

Tabouli Information Tabouli is a TUI for interacting with firmware/embedded devi

Apr 2, 2022