Sequence-based Go-native audio mixer for music apps

Last update: Dec 1, 2022

Mix

See demo/demo.go:

package main

import (
  "fmt"
  "os"
  "time"
  
  "github.com/go-mix/mix"
  "github.com/go-mix/mix/bind"
)

var (
  sampleHz = float64(48000)
  spec     = bind.AudioSpec{
    Freq:     sampleHz,
    Format:   bind.AudioF32,
    Channels: 2,
  }
  bpm     = 120
  step    = time.Minute / time.Duration(bpm*4) // four steps per beat
  loops   = 16
  prefix  = "sound/808/"
  kick1   = "kick1.wav"
  kick2   = "kick2.wav"
  marac   = "maracas.wav"
  snare   = "snare.wav"
  hitom   = "hightom.wav"
  clhat   = "cl_hihat.wav"
  pattern = []string{
    kick2,
    marac,
    clhat,
    marac,
    snare,
    marac,
    clhat,
    kick2,
    marac,
    marac,
    hitom,
    marac,
    snare,
    kick1,
    clhat,
    marac,
  }
)

func main() {
  defer mix.Teardown()    
  
  mix.Debug(true)
  mix.Configure(spec)
  mix.SetSoundsPath(prefix)
  mix.StartAt(time.Now().Add(1 * time.Second))

  t := 2 * time.Second // padding before music
  for n := 0; n < loops; n++ {
    for s := 0; s < len(pattern); s++ {
      mix.SetFire(pattern[s], t+time.Duration(s)*step, 0, 1.0, 0)
    }
    t += time.Duration(len(pattern)) * step
  }

  fmt.Printf("Mix, pid:%v, spec:%v\n", os.Getpid(), spec)
  for mix.FireCount() > 0 {
    time.Sleep(1 * time.Second)
  }
}

Play this Demo from the root of the project, with no actual audio playback:

make demo

Or export WAV via stdout > demo/output.wav:

make demo.wav

Credit

Charney Kaye

XJ Music Inc.

What?

Game audio mixers are designed to play audio spontaneously, but when the timing is known in advance (e.g. sequence-based music apps) there is a demand for much greater accuracy in playback timing.

Read the API documentation at godoc.org/github.com/go-mix/mix

Mix seeks to solve the problem of audio mixing for the playback of sequences where the audio files and their playback timing are known in advance.

Mix stores and mixes audio in native Go []float64 and natively implements Paul Vögler's "Loudness Normalization by Logarithmic Dynamic Range Compression" (details below).

Best efforts will be made to preserve each API version in a release tag that can be parsed, e.g. github.com/go-mix/mix

Why?

Even after selecting a hardware interface library such as PortAudio or SDL 2.0, there remains a critical design problem to be solved.

This design is a music application mixer. Most available options are geared towards Game development.

Game audio mixers offer playback timing accuracy of +/- 2 milliseconds. That is unacceptable for music, specifically sequence-based sample playback.

The design pattern particular to Game design is that the timing of the audio is not known in advance; the timing that really matters is that which is assembled in near-real-time in response to user interaction.

In the field of Music development, the timing is often known in advance. A sequencer, for example, composes music by specifying exactly how, when and which audio files will be played relative to the beginning of playback.

Ergo, mix seeks to solve the problem of audio mixing for the playback of sequences where the audio files and their playback timing are known in advance. It seeks to do this with the absolute minimum of logical overhead on top of the audio interface.

Mix takes maximum advantage of Go by storing and mixing audio in native Go []float64 and natively implementing Paul Vögler's "Loudness Normalization by Logarithmic Dynamic Range Compression".

Time

To the Mix API, time is specified as a time.Duration-since-epoch, where the epoch is the moment that mix.Start() was called.

Internally, time is tracked as samples-since-epoch at the master out playback frequency (e.g. 48000 Hz). This is most efficient because source audio is pre-converted to the master out playback frequency, and all audio maths are performed in terms of samples.

The Mixing Algorithm

Inspired by the theory paper "Mixing two digital audio streams with on the fly Loudness Normalization by Logarithmic Dynamic Range Compression" by Paul Vögler, 2012-04-20. A PDF of the paper, as originally published, is included in this repository.
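
The README does not reproduce the paper's curve, so the following is only a generic sketch of logarithmic dynamic range compression, not the paper's exact formula and not necessarily what Mix implements. Samples below a threshold pass through unchanged, and louder sums are squeezed logarithmically so that the largest possible sum maps to full scale. The function name compress and the values of threshold and alpha are illustrative assumptions.

```go
package main

import (
	"fmt"
	"math"
)

const (
	threshold = 0.6 // assumed: magnitudes at or below this pass unchanged
	alpha     = 8.0 // assumed: curvature of the logarithmic section
)

// compress maps a summed sample in [-n, n] into [-1, 1], where n is the
// number of full-scale streams that were summed (n must exceed threshold).
// Illustrative only; not the paper's exact curve.
func compress(x, n float64) float64 {
	mag := math.Abs(x)
	if mag <= threshold {
		return x
	}
	if mag > n {
		mag = n
	}
	over := (mag - threshold) / (n - threshold) // 0..1 above the threshold
	out := threshold + (1-threshold)*math.Log1p(alpha*over)/math.Log1p(alpha)
	return math.Copysign(out, x)
}

func main() {
	a, b := 0.9, 0.8 // two loud samples whose sum would clip
	fmt.Println("raw sum:", a+b, "compressed:", compress(a+b, 2))
	fmt.Println("quiet sum:", compress(0.3+0.2, 2))
}
```

Compared with hard clipping, a curve like this keeps quiet passages untouched while avoiding flat-topped peaks when many sources coincide.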

Usage

There's a demo implementation of mix included in the demo/ folder in this repository. Run it using the defaults:

cd demo && go get && go run demo.go

Or specify options, e.g. write WAV bytes to stdout for playback (piped to system-native aplay):

go run demo.go --out wav | aplay

To show the help screen:

go run demo.go --help

Owner

https://github.com/go-mix/mix https://gopkg.in/mix.v0

Comments

Proposal: let the client specify an `io.Writer` instead of assuming `os.Stdout`

Hello, this looks like a very cool library. I'm excited to jump in and play with it.

Would it be a good idea to let the client choose their own io.Writer for audio data to be written to? It would be nice to let the client flexibly stream over TCP, write to a file, or pipe to another process, all using the same underlying mixing implementation with different writers.

It does seem like it would require a change to the public interface, so that's something to consider.

Thoughts?

Allow output to io.Writer and implement teardown

Allows specifying the location to write data to (write to any io.Writer).

Implements Teardown in mix to reset fires and outputToDur to allow subsequent outputs to work.

SetFire() blocks process.

Tested on Ubuntu with output to SDL:

1. 500 sounds are queued to ontomix.

2. During playback, 500 more sounds are queued.

3. Expect that playback will be uninterrupted. Actually, playback goes silent momentarily. Queuing a larger number of sounds during step 2 results in longer playback gaps.

"Get More Fires" Callback function in API

The use case is a Music app that implements ontomix for

indefinitely long real-time playback, or

output to file (likely to happen faster-than-real-time)

The Music app will call the new Ontomix API method, SetNextFiresCallback, and pass in a func() which in turn lets the Music app know that Ontomix is ready for more fires.

The Music app will call the new Ontomix API method, SetNextFiresBufferDuration, in order to configure the amount of time that Ontomix considers enough "buffer" before sending the next callback message.

Is the PortAudio playback binding unstable?

Early testing on Ubuntu 14 has crashed often. Relates to #17

So far, I've discovered no mis-implementation in the portaudio bindings. Comments welcome.

ALSA #21 and SDL #22 enable more options for playback bindings, to choose the most stable option for any given system.

Add go-sox loader interface

Use the github.com/krig/go-sox package to support the input formats that sox supports. Unfortunately it requires linking to libsox via cgo, which is quite a heavy dependency.

Mix cycle includes garbage collection of unused sources.

Each mix cycle:

Make a new empty map[string]*Source, e.g. keepSources

While iterating over the ready & active fires (see issues #11 and #18; implemented as of pull #29) copy any used *Source to the new keepSources

Replace the mixSources with keepSources
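
The three steps above can be sketched as follows. Source and Fire here are simplified stand-ins for the mixer's types, and collectSources is a hypothetical name.

```go
package main

import "fmt"

// Source and Fire are simplified stand-ins for the mixer's own types.
type Source struct{ URL string }
type Fire struct{ Source string }

// collectSources returns a new map holding only the sources referenced by
// the given fires; everything else becomes eligible for garbage collection
// once the old map is dropped.
func collectSources(mixSources map[string]*Source, fires []Fire) map[string]*Source {
	keepSources := make(map[string]*Source)
	for _, f := range fires {
		if src, ok := mixSources[f.Source]; ok {
			keepSources[f.Source] = src
		}
	}
	return keepSources
}

func main() {
	mixSources := map[string]*Source{
		"kick1.wav": {URL: "kick1.wav"},
		"snare.wav": {URL: "snare.wav"},
	}
	fires := []Fire{{Source: "snare.wav"}}
	mixSources = collectSources(mixSources, fires)
	fmt.Println("sources kept:", len(mixSources))
}
```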

Deeper native implementation of go-riff with custom WAV binding

Having researched the available options, I'm disappointed with what's available for WAV file decoding (specifically, integer vs. float files and parsing extended metadata), so I'm rolling my own wav binding on top of youpy/go-riff.

Is it better to track mix-output frequency as int or float64?

Currently the mix-output frequency is tracked as float64.

There is debate whether a "sample rate" is always an integer. I could make an argument for using int instead:

Assuming the mixing frequency will be:

greater than zero

in the hundreds of thousands, at the largest.

But perhaps it's better to leave it as-is, a float64, because it works at present and allows an extremely precise rate of samples per second to be specified, should that ever be desirable.

Deprecate hardware playback functionality for an agnostic next-bytes-provider function

The concept of hardware playback is ultimately outside of the focus of this project.

Depending on how a developer wants to implement this library in their project, they might:

Pipe the stdout bytes to a .WAV file, or system-native aplay

Implement hardware playback using the library of their choice, and retrieve bytes on callback from an agnostic next-bytes-provider function

Audio time-scale/pitch modification

From https://en.wikipedia.org/wiki/Audio_time-scale/pitch_modification

Time stretching is the process of changing the speed or duration of an audio signal without affecting its pitch. Pitch scaling or pitch shifting is the opposite: the process of changing the pitch without affecting the speed. Similar methods can change speed, pitch, or both at once, in a time-varying way.

These processes are used, for instance, to match the pitches and tempos of two pre-recorded clips for mixing when the clips cannot be reperformed or resampled. (A drum track containing no pitched instruments could be moderately resampled for tempo without adverse effects, but a pitched track could not). They are also used to create effects such as increasing the range of an instrument (like pitch shifting a guitar down an octave).

Concurrent processor usage for mixing

Especially in a situation where the output is being written directly to stdout, it would be optimal to be able to distribute the load of mathematical operations to multiple processors.

Examine the sample.OutNext() binding:

// OutNext to mix the next sample for all channels, in []float64
func OutNext() []Value {
  return outNextCallback()
}

If we assign some number n to represent the number of concurrent mix processes, then usage of this sample.OutNext() could be batched such that instead of retrieving one sample at a time, the samples are retrieved in a single map-reduce sweep of n goroutines performing the math. Ergo, the final function signature might look more like:

// OutNextConcurrent() to concurrently mix the next samples for all channels, in []float64
func OutNextConcurrent() []Sample {
  // TODO: map-reduce goroutines of n mix-samples
}
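
A rough sketch of the batching idea, under the assumption that each output sample can be computed independently from its index: a batch is split into contiguous chunks, one goroutine per chunk, and each goroutine writes only to its own region of the result. The name mixBatch and the mixAt parameter are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// mixBatch computes count samples starting at index start, spreading the
// work over up to the given number of goroutines. mixAt stands in for the
// per-sample mix function and must be safe to call concurrently.
func mixBatch(start, count, workers int, mixAt func(int) float64) []float64 {
	out := make([]float64, count)
	if workers < 1 {
		workers = 1
	}
	chunk := (count + workers - 1) / workers // ceiling division
	var wg sync.WaitGroup
	for lo := 0; lo < count; lo += chunk {
		hi := lo + chunk
		if hi > count {
			hi = count
		}
		wg.Add(1)
		go func(lo, hi int) {
			defer wg.Done()
			for i := lo; i < hi; i++ {
				out[i] = mixAt(start + i) // disjoint indices, so no locking
			}
		}(lo, hi)
	}
	wg.Wait()
	return out
}

func main() {
	// A placeholder mix function: a simple ramp over the sample index.
	ramp := func(i int) float64 { return float64(i) / 48000 }
	batch := mixBatch(0, 1024, 4, ramp)
	fmt.Println("mixed", len(batch), "samples; last =", batch[len(batch)-1])
}
```

Whether this is faster than a single loop depends on how expensive each sample is to mix; for very cheap per-sample work the goroutine overhead can dominate.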

Evaluate and write tests around WAV binding reading & writing

Is it enough to support only 8- and 16-bit integer and 32- and 64-bit float audio?

Is WAV reading correct for signed vs. unsigned integer audio?

Is WAV writing correct for signed vs. unsigned integer audio?
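
For context on the signed vs. unsigned question: in WAV PCM data, 8-bit samples are unsigned with a midpoint of 128, while 16-bit samples are signed two's-complement little-endian. A minimal sketch of decoding both to float64 is shown below; the function names are hypothetical and the scaling convention (dividing by 128 and 32768) is one common choice, not necessarily the one Mix uses.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// fromU8 converts an unsigned 8-bit PCM sample (midpoint 128) to [-1, 1).
func fromU8(b byte) float64 {
	return (float64(b) - 128) / 128
}

// fromS16LE converts a signed 16-bit little-endian PCM sample to [-1, 1).
// b must hold at least two bytes.
func fromS16LE(b []byte) float64 {
	return float64(int16(binary.LittleEndian.Uint16(b))) / 32768
}

func main() {
	fmt.Println("8-bit silence:", fromU8(128))
	fmt.Println("8-bit minimum:", fromU8(0))
	fmt.Println("16-bit minimum:", fromS16LE([]byte{0x00, 0x80}))
	fmt.Println("16-bit maximum:", fromS16LE([]byte{0xFF, 0x7F}))
}
```

Tests for the binding could round-trip boundary values such as these through the reader and writer.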

Write tests to ensure respect for implicit panning of source channels, e.g. 2 channels = stereo L/R

2-channel source audio playing through a 2-channel output does in fact successfully carry the stereo assumption from source to output. However, this issue will officially persist until there are at least tests to prove otherwise.

In mix.go, func mixNextSample() []float64 returns one sample per channel; hence the returned array of samples []float64 is "one" sample with multiple channels.

It will be necessary for the mixer to respect the implied panning of certain channel layouts, e.g. "Stereo" = 2 channels = channel 1 pan left + channel 2 pan right

For now, the only implied panning assumption we will implement is Stereo.

In the future, it's conceivable that more complex panning assumptions could be implemented, e.g. "Surround Sound"
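
To make the stereo assumption concrete, here is a small sketch of mixing one frame (one sample per channel) from several sources. A source whose channel count matches the output maps channel-for-channel, so stereo left stays left and right stays right. Sending any other source equally to every output channel is an assumption made for this sketch, as is the name mixFrame.

```go
package main

import "fmt"

// mixFrame sums one frame from each source into a frame with outChannels
// channels. Matching channel counts map one-to-one (implied stereo
// panning); other sources are averaged to mono and sent to all channels.
func mixFrame(frames [][]float64, outChannels int) []float64 {
	out := make([]float64, outChannels)
	for _, f := range frames {
		if len(f) == 0 {
			continue
		}
		if len(f) == outChannels {
			for c, v := range f {
				out[c] += v
			}
			continue
		}
		var sum float64
		for _, v := range f {
			sum += v
		}
		mono := sum / float64(len(f))
		for c := range out {
			out[c] += mono
		}
	}
	return out
}

func main() {
	stereo := []float64{0.5, -0.25} // left, right
	mono := []float64{0.125}
	fmt.Println(mixFrame([][]float64{stereo, mono}, 2))
}
```

A test for the implied panning could feed a source that is silent on one channel and assert that the same output channel stays silent.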
