Skynet 1M threads microbenchmark


Creates an actor (goroutine, whatever), which spawns 10 new actors, each of which spawns 10 more actors, and so on, until one million actors are created on the final level. Then each actor returns its ordinal number (from 0 to 999999); these are summed on the previous level and sent back upstream, until the total reaches the root actor. (The answer should be 499999500000.)
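
For reference, the expected answer is just the sum 0 + 1 + … + 999999 = 999999 × 1000000 / 2 = 499999500000. A minimal sketch of the idea in Go (goroutines and channels) is shown below; it is illustrative only and may differ in details from the go/skynet.go in this repository.

    package main

    import (
        "fmt"
        "time"
    )

    // skynet covers `size` consecutive ordinals starting at `num`. A leaf
    // (size == 1) sends back its own ordinal; otherwise it spawns `div`
    // children, sums their replies, and sends the sum upstream on c.
    func skynet(c chan int, num, size, div int) {
        if size == 1 {
            c <- num
            return
        }
        rc := make(chan int)
        for i := 0; i < div; i++ {
            go skynet(rc, num+i*(size/div), size/div, div)
        }
        sum := 0
        for i := 0; i < div; i++ {
            sum += <-rc
        }
        c <- sum
    }

    func main() {
        c := make(chan int)
        start := time.Now()
        go skynet(c, 0, 1000000, 10)
        result := <-c
        fmt.Printf("Result: %d in %v.\n", result, time.Since(start))
    }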

Results (on my shitty Macbook 12" '2015, Core M, OS X):

Actors

  • Scala/Akka: 6379 ms.
  • Erlang (non-HIPE): 4414 ms.
  • Erlang (HIPE): 3999 ms.

Coroutines / fibers / channels

  • Haskell (GHC 7.10.3): 6181 ms.
  • Go: 979 ms.
  • Quasar fibers and channels (Java 8): TODO

Futures / promises

  • .NET Core: 650 ms.
  • RxJava: 219 ms.

Results (i7-4770, Win8.1):

Actors

  • Scala/Akka: 4419 ms
  • Erlang (non-HIPE): 1700 ms.

Coroutines / fibers / channels

  • Haskell (GHC 7.10.3): 2820 ms.
  • Go: 629 ms.
  • F# MailboxProcessor: 756ms. (should be faster?..)
  • Quasar fibers and channels (Java 8): TODO

Futures / promises

  • .NET Core (Async, 8 threads): 290 ms.
  • Node-bluebird (Promise): 285 ms / 195 ms (after warmup).
  • .NET Full (TPL): 118 ms.

Results (i7-4771, Ubuntu 15.10):

  • Scala/Akka: 1700-2700 ms
  • Haskell (GHC 7.10.3): 41-44 ms
  • Erlang (non-HIPE): 700-1100 ms
  • Erlang (HIPE): 2100-3500 ms
  • Go: 200-224 ms
  • LuaJit: 297 ms

How to run

Scala/Akka

Install latest Scala and SBT.

Go to scala/, then run sbt compile run.

Java/Quasar

Install the Java 8 SDK.

Go to java-quasar/, then run ./gradlew.

Go

Install latest Go compiler/runtime.

In go/, run go run skynet.go.

Pony

Install latest Pony compiler.

In pony/, run ponyc -b skynet && ./skynet.

Erlang

Install latest Erlang runtime.

In erlang, run erl +P 2000000 (to raise process limit), then compile:

  • For non-HIPE: c(skynet).
  • For HIPE (if supported on your system): hipe:c(skynet).

Then, run:

skynet:skynet(1000000,10).

.NET Core

Install latest version of .NET Core.

Go to dnx/, run dotnet restore (first time), then dotnet run --configuration Release.

Haskell

Install Stack

In haskell/, run stack build && stack exec skynet +RTS -N

Node (bluebird)

Install node.js

In node-bluebird/, run npm install, then node skynet.

FSharp

Install FSharp Interactive

Run fsi skynet.fsx, or run fsi and paste the code in (it runs faster this way).

Crystal

Install latest version of Crystal.

Go to crystal/, then run crystal build skynet.cr --release and ./skynet.

.NET/TPL

Build the solution with VS2015. Windows only :(


Java

Install the Java 8 SDK.

Go to java/, then run ./gradlew :run.

Rust (with coroutine-rs)

cd ./rust-coroutine
cargo build --release
cargo run --release

LuaJIT

Install luajit

Run luajit luajit/skynet.lua

Scheme/Guile Fibers

Install Guile, Guile fibers, and wisp; for example via guix package -i guile guile-fibers guile-wisp.

Go to guile-fibers, then run ./skynet.scm.

Comments
  • OCaml version

    This version is here to prove a point that was actually made here (and in many other places). This uses Lwt.

    node for comparison:
      regular:   488.650ms
      warmed-up: 367.576ms

    ocaml lwt:          0.035144s
    ocaml direct:       0.020431s
    ocaml trunk lwt:    0.025163s
    ocaml trunk direct: 0.018966s

    These numbers are not averaged or anything; I didn't bother, since the whole thing is already comparing apples to oranges. Comparing vastly different threading techniques is not useful.

    Side note: the code is not optimized at all, it's the most naive thing you would do with lwt.

  • Make Haskell Use Control.Parallel

    This needs the parallel package, but most Haskellers doing any kind of parallel work would likely have it. See Parallel and Concurrent Haskell.

    The 10 child threads are spawned by parMap.

    Here is the run I got:

    $ ghc -O2 Skynet.hs -o skynet -threaded
    [1 of 1] Compiling Main             ( Skynet.hs, Skynet.o )
    Linking skynet ...
    
    
    $ ./skynet +RTS -N2
    Result: 499999500000 in 0.141922s
    Result: 499999500000 in 0.063772s
    Result: 499999500000 in 0.069988s
    Result: 499999500000 in 0.046569s
    Result: 499999500000 in 0.05068s
    Result: 499999500000 in 0.045631s
    Result: 499999500000 in 0.044842s
    Result: 499999500000 in 0.044538s
    Result: 499999500000 in 0.046612s
    Result: 499999500000 in 0.044391s
    

    Given that my computer was literally unable to run the old version, I would say this is faster. (It seems like the first run is always slower for some reason; I guess it's the parallel runtime warming up or something.)

  • Added node-bluebird

    .NET on my machine:

    24906 % dotnet run
    Project dnx (DNXCore,Version=v5.0) was previously compiled. Skipping compilation.
    499999500000
    Sync sec: 0.162
    499999500000
    Async sec: 0.482
    

    node-bluebird:

    24908 % node skynet
    499999500000
    regular: 365.884ms
    499999500000
    warmed-up: 261.911ms
    
  • Skynet Scheme Guile Wisp

    This ports the skynet test to Guile fibers. Test result:

    serial: Result: 499999500000 in 0.064945607 seconds
    4: Result: 499999500000 in 8.707639791 seconds, speedup of 0.0074584627475204205
    3: Result: 499999500000 in 7.834723658 seconds, speedup of 0.00828945727188276
    2: Result: 499999500000 in 6.29440976 seconds, speedup of 0.01031798206286462
    1: Result: 499999500000 in 6.254672627 seconds, speedup of 0.010383534178854474
    
  • Use Async library.  Compile, run as a threaded application.

    The following pull:

    • Uses the Async library to abstract away the channel, MVar, and forkIO boilerplate.
    • Compiles and runs the example as a multithreaded application using all CPUs.
    • Adds suggested compile and run commands to the file.
  • Scala Future monad

    Fair async version without Future.successful:

    [info] Running skynet.SkynetAsync
    0. Result: 499999500000 in 647 ms.
    1. Result: 499999500000 in 377 ms.
    2. Result: 499999500000 in 372 ms.
    3. Result: 499999500000 in 371 ms.
    4. Result: 499999500000 in 342 ms.
    5. Result: 499999500000 in 347 ms.
    6. Result: 499999500000 in 367 ms.
    7. Result: 499999500000 in 321 ms.
    8. Result: 499999500000 in 308 ms.
    9. Result: 499999500000 in 320 ms.
    10. Result: 499999500000 in 338 ms. Best time 308 ms.

    Sync version for comparison:

    [info] Running skynet.SkynetSync
    0. Result: 499999500000 in 97 ms.
    1. Result: 499999500000 in 60 ms.
    2. Result: 499999500000 in 61 ms.
    3. Result: 499999500000 in 67 ms.
    4. Result: 499999500000 in 56 ms.
    5. Result: 499999500000 in 64 ms.
    6. Result: 499999500000 in 63 ms.
    7. Result: 499999500000 in 62 ms.
    8. Result: 499999500000 in 64 ms.
    9. Result: 499999500000 in 63 ms.
    10. Result: 499999500000 in 66 ms. Best time 56 ms.
  • Scala Future monad

    Scala benchmark with the Future monad, in a separate folder 'scala-future' (keeps the original 'scala' folder with the Akka implementation).

    Results:

    scala-future $ sbt run
    [info] Running Skynet
    0. Result: 499999500000 in 776 ms.
    1. Result: 499999500000 in 320 ms.
    2. Result: 499999500000 in 247 ms.
    3. Result: 499999500000 in 233 ms.
    4. Result: 499999500000 in 225 ms.
    5. Result: 499999500000 in 210 ms.
    6. Result: 499999500000 in 224 ms.
    7. Result: 499999500000 in 209 ms.
    8. Result: 499999500000 in 198 ms.
    9. Result: 499999500000 in 211 ms.
    10. Result: 499999500000 in 202 ms. Best time 198 ms.
  • Scala Future monad

    Scala (Future monad): 0.2 s (219 ms)

    scala $ sbt run
    [info] Running Skynet
    0. Result: 499999500000 in 1306 ms.
    1. Result: 499999500000 in 238 ms.
    2. Result: 499999500000 in 266 ms.
    3. Result: 499999500000 in 230 ms.
    4. Result: 499999500000 in 221 ms.
    5. Result: 499999500000 in 249 ms.
    6. Result: 499999500000 in 233 ms.
    7. Result: 499999500000 in 219 ms.
    8. Result: 499999500000 in 230 ms.
    9. Result: 499999500000 in 229 ms.
    10. Result: 499999500000 in 222 ms. Best time 219 ms.
  • Concurrent Haskell Version using MVars

    This adds a concurrent Haskell version using MVars.

    On my machine

    • Intel(R) Core(TM) i7-5500U CPU @ 2.40GHz
    • Linux gghf 4.2.0-27-generic #32-Ubuntu SMP Fri Jan 22 04:49:08 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

    this gives me (it runs 10 times, run using make run) while watching Netflix O.o:

    ./skynet +RTS -N8 -H4G -RTS
    Result: 499999500000 in 0.660476s
    Result: 499999500000 in 0.486071s
    Result: 499999500000 in 0.435496s
    Result: 499999500000 in 0.381848s
    Result: 499999500000 in 2.047777s
    Result: 499999500000 in 0.371246s
    Result: 499999500000 in 1.425728s
    Result: 499999500000 in 0.372302s
    Result: 499999500000 in 1.379577s
    Result: 499999500000 in 0.400494s
    

    which is better than the Chan or Async version on my machine.

  • Add C with libmill example.

    This isn't much more than a curiosity at the moment, as libmill's current behaviour isn't suitable for this task; it works, but not amazingly. It will be more interesting to see if libmill can and should be modified to make this work well.

    I put some info in c_libmill/skynet.c to explain the situation further.

  • Faster dotnetcore, add parallel, better timing

    Added Parallel, single-threaded Async, multithreaded Async, and ValueTask Async variants.

    If you use the Release config

    dotnet run --configuration Release
    

    or compile it in release mode...

    dotnet compile --configuration Release -o bin
    bin\dnx
    

    The results are (Win10 on iMac Intel i5 @ 2.9GHz)

    Arch 64 bit - Cores 4
    499999500000
    1 Thread - Sync: 5.756ms
    499999500000
    1 Thread - Async: 146.579ms
    499999500000
    1 Thread - ValueTask Async: 16.917ms
    499999500000
    Parallel Async: 86.191ms
    499999500000
    Parallel - ValueTask Async: 4.988ms
    499999500000
    Parallel Sync: 1.661ms
    

    Debug mode matches the current results

    Arch 64 bit - Cores 4
    499999500000
    1 Thread - Sync: 42.653ms
    499999500000
    1 Thread - Async: 186.333ms
    499999500000
    1 Thread - ValueTask Async: 64.792ms
    499999500000
    Parallel Async: 84.977ms
    499999500000
    Parallel - ValueTask Async: 19.903ms
    499999500000
    Parallel Sync: 13.490ms
    
  • Crystal version doesn't compile with 1.4.1

    As in the subject, the Crystal version doesn't compile with version 1.4.1; something has changed. This is how it should be:

    def skynet(c, num, size, div)
      if size == 1
        c.send num
      else
        rc = Channel(Int64|Float64).new
        sum = 0_i64
        div.times do |i|
          sub_num = num + i*(size/div)
          spawn skynet(rc, sub_num, size/div, div)
        end
        div.times do
          sum += rc.receive
        end
        c.send sum
      end
    end

    c = Channel(Int64|Float64).new
    start_time = Time.local
    spawn skynet(c, 0_i64, 1_000_000, 10)
    result = c.receive
    end_time = Time.local
    puts "Result: #{result} in #{(end_time - start_time).total_milliseconds.to_i}ms."

    Note for Linux users: the number of allowed threads has to be raised with

    sysctl -w vm.max_map_count=10000000

  • Haskell Stack swallows RTS flags

    I ran into an issue when I tried to get Stack to generate .prof files with the +RTS flags. It seems like stack/ghc on Windows swallows the +RTS flags meant for the application! This bug means the Windows code never even got the -N flag and so wasn't parallelized at all. One workaround, until this is resolved, is to run the benchmarks against the executable directly without calling stack (somewhere in .stack-work\install\xxxxxxx\bin).

  • Can't compile `rust-coroutine` with Rust 1.8.0

    With Rust 1.8.0 installed by Linuxbrew on Linux Mint 17.4 Rosa x64 (based on Trusty), when running cargo build --release:

    /home/fabio/.cargo/registry/src/github.com-88ac128001ac3a9a/fs2-0.2.2/src/unix.rs:63:67: 63:79 error: mismatched types:
     expected `i64`,
        found `u64` [E0308]
    /home/fabio/.cargo/registry/src/github.com-88ac128001ac3a9a/fs2-0.2.2/src/unix.rs:63     let ret = unsafe { libc::posix_fallocate(file.as_raw_fd(), 0, len as off_t) };
                                                                                                                                                           ^~~~~~~~~~~~
    

    Sorry, I'm not knowledgeable enough in Rust to open a PR, but I wanted to give it a try nevertheless.
