Community contributions have been a staple of this open source project since its inception. However, as balenaSound grew in features, it also grew in complexity. It's currently a multi-container application with four core services and as many plugin services. This documentation section aims to provide an overview of balenaSound's architecture, with the intention of lowering the barrier to entry for folks out there wanting to contribute. If you are interested in contributing and still have questions after reading this guide, please reach out and we'll gladly help.
balenaSound services can be divided into three groups:
This is the heart of balenaSound as it contains the most important services:
audio block is a balena block that provides an easy way to work with audio applications in containerized environments such as balenaOS. You can read more about it here. In a nutshell, the audio block is the main "audio router". It connects to all audio sources and sinks and handles audio routing, which changes depending on the mode of operation (multi-room vs standalone), the output interface selected (onboard audio, HDMI, DAC, USB soundcard), etc. The audio block allows you to build complex audio applications such as balenaSound without having to deep dive into ALSA or PulseAudio configuration. One of the key features for balenaSound is that it allows us to define input and output audio layers and then do all the complex audio routing without knowing or caring where the audio is being generated or where it should go. The audio routing section below covers this process in detail.
sound-supervisor, as its name suggests, is the service that orchestrates all the others. It's not really involved in the audio routing, but it does a few key things that enable the other services to be simpler. Here are some of its most important features:
sound-supervisor ensures that all devices on the same local network agree on which is the master device. To achieve this, sound-supervisor services on different devices exchange event messages constantly.
sound-supervisor and not on environment variables. At this moment, all of the services support this behaviour, but their configuration is mostly static: you set it at startup via environment variables and that's it. However, there are experimental endpoints in the API to update configuration values, and all of the services support it already. There's even a secret UI that allows for some configuration changes at runtime; it's located at
Multi-room services provide multi-room capabilities to balenaSound.
This service runs a Snapcast server, which is responsible for broadcasting (and syncing) audio from the audio service to Snapcast clients. Clients can be running on the same device or on separate devices.
Runs the client version of Snapcast. It needs to connect to a Snapcast server (can be a separate device) to receive audio packets. It will then forward the audio back into the
Plugins are the audio sources that generate the audio to be streamed/played (e.g. Spotify). Refer to the plugins section below for pointers on how to add new plugins.
Audio routing is the most crucial part of balenaSound, and it changes significantly depending on the current configuration, with the biggest change being the mode of operation (multi-room vs standalone). There are two services controlling the audio routing:
audio block is the key one as it's the one actually routing audio, so we'll zoom into it in the sections below.
sound-supervisor, on the other hand, is responsible for changing the routing according to the current mode. It modifies how sinks are internally connected depending on the mode of operation.
Note: audio routing relies mainly on routing PulseAudio sinks. Here is an awesome resource on PulseAudio in case you are not familiar with it.
One of the advantages of using the audio block is that, since it's based on PulseAudio, we can use all the audio processing tools and tricks that are widely available; in this particular case, virtual sinks. PulseAudio clients can send audio to sinks. Soundcards usually have a sink that represents them, so sending audio to the audio jack sink will result in that audio coming out of the audio jack. Virtual sinks are virtual nodes that can be used to route audio in and out of them.
For balenaSound we use two virtual sinks in order to simplify how audio is being routed:
Creation and configuration scripts for these virtual sinks are located at
balena-sound.input acts as an input audio multiplexer/mixer. It's the default sink on balenaSound, so all plugins that send audio to the audio block will send it to this sink by default. This allows us to route audio internally without worrying about where it came from: any audio generated by a plugin will pass through the balena-sound.input sink, so by controlling where it sends its audio we are effectively controlling all plugins at the same time.
balena-sound.output, on the other hand, is the output audio multiplexer/mixer. This one is pretty useful in scenarios where there are multiple soundcards available (onboard, DAC, USB, etc). balena-sound.output is always wired to whatever the desired soundcard sink is. So even if we dynamically change the output selection, sending audio to balena-sound.output will always result in audio going to the current selection. Again, this is useful to route audio internally without worrying about user selection at runtime.
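The indirection provided by the two virtual sinks can be illustrated with a small toy model. This is only a sketch: the `Sink` class is not a PulseAudio API, and the hardware sink names `headphone-jack` and `dac` are hypothetical placeholders. It shows that rewiring balena-sound.output is enough to redirect everything that enters through balena-sound.input.

```python
class Sink:
    """Toy stand-in for a PulseAudio sink that forwards audio to one target."""

    def __init__(self, name, target=None):
        self.name = name
        self.target = target  # next Sink in the chain, or None for a hardware sink

    def path(self):
        """Return the names of every sink the audio passes through, in order."""
        names = [self.name]
        node = self.target
        while node is not None:
            names.append(node.name)
            node = node.target
        return names


# Hypothetical hardware sinks; real sink names depend on the device.
headphones = Sink("headphone-jack")
dac = Sink("dac")

# The two virtual sinks described above.
output = Sink("balena-sound.output", target=headphones)
input_ = Sink("balena-sound.input", target=output)

print(input_.path())

# Changing the output selection only touches balena-sound.output;
# plugins keep sending audio to balena-sound.input as before.
output.target = dac
print(input_.path())
```

Plugins never need to know which soundcard is selected, which is the property the two-layer design relies on.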
Standalone mode is easy to understand. You just pipe balena-sound.input into balena-sound.output and that's it. Audio coming in from any plugin will find its way to the selected output. If this were the only mode, we could simplify the setup and use a single sink. Having the two layers, however, is important for the next mode, which is more complicated.
The multi-room feature relies on Snapcast to broadcast the audio to multiple devices. Snapcast has two binaries working alongside each other: server and client.
Snapcast server expects audio to be written into a FIFO file, so we create an additional sink (snapcast sink) that routes audio from balena-sound.input into said FIFO file. The server will then read the file and use TCP packets to broadcast audio to all clients that are connected to it, whether they run on the same device or on others. Note that when writing into the FIFO file the audio is "exiting" the audio block and is no longer under PulseAudio's control.
Snapcast client receives the audio from the server and sends it back into the audio block, in particular to the balena-sound.output sink, which will in turn send the audio to whatever output was selected by the user.
This setup allows us to decouple the multi-room feature from the audio block while retaining its advantages.
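The two modes can be compared side by side by listing the hops audio takes in each. The following is a rough sketch: the `route` function and the mode strings are illustrative and are not taken from the balenaSound code base. Each hop is flagged with whether it lives inside PulseAudio (that is, inside the audio block).

```python
from typing import List, Tuple

Hop = Tuple[str, bool]  # (hop name, True if the hop is inside PulseAudio)


def route(mode: str, soundcard: str = "soundcard sink") -> List[Hop]:
    """Return the hops audio takes from a plugin to the soundcard (illustrative)."""
    if mode == "standalone":
        return [
            ("plugin", False),
            ("balena-sound.input", True),
            ("balena-sound.output", True),
            (soundcard, True),
        ]
    if mode == "multi-room":
        return [
            ("plugin", False),
            ("balena-sound.input", True),
            ("snapcast sink", True),
            ("FIFO file", False),        # audio "exits" the audio block here
            ("snapcast server", False),  # broadcasts over TCP
            ("snapcast client", False),  # may run on another device
            ("balena-sound.output", True),
            (soundcard, True),
        ]
    raise ValueError(f"unknown mode: {mode!r}")


def outside_pulseaudio(hops: List[Hop]) -> List[str]:
    """Names of the hops between plugin and soundcard that PulseAudio does not control."""
    return [name for name, inside in hops[1:-1] if not inside]


print(outside_pulseaudio(route("standalone")))
print(outside_pulseaudio(route("multi-room")))
```

In both modes the path starts at balena-sound.input and ends at balena-sound.output; only the middle section differs, which is why the Snapcast services can be swapped in without plugins or output selection being affected.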
As described above, plugins are the services generating the audio to be streamed/played. Plugins are responsible for sending the audio into the audio block, particularly into the balena-sound.input sink. There are two alternatives for how this can be accomplished. A detailed explanation can be found here; in our case:
Most audio applications support using PulseAudio as an audio backend. This means the application was coded to allow sending audio directly to PulseAudio (and hence the audio block). This is usually configurable via a CLI flag or configuration files. Check your application's documentation to find out whether this is the case.
If the application supports a PulseAudio backend, the only configuration you need is to specify where the PulseAudio server can be located. This can be done by setting the PULSE_SERVER environment variable; we recommend doing it in the
ALSA bridge: if your application does not have built-in PulseAudio support, you can create a bridge to it by using ALSA. Setting this up by hand is not trivial, so we wrote a small script that does the work for you:
ENV PULSE_SERVER=tcp:localhost:4317
RUN curl -sL https://raw.githubusercontent.com/balenablocks/audio/master/scripts/alsa-bridge/debian-setup.sh | sh
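When debugging a plugin it can help to confirm what the PULSE_SERVER value actually points to. Below is a minimal sketch that splits a `tcp:host:port` server string into its parts. The `parse_pulse_server` helper is hypothetical (it is not part of balenaSound or PulseAudio), and it only handles the explicit `tcp:host:port` form used above, not other PulseAudio server string formats.

```python
from typing import Tuple


def parse_pulse_server(value: str) -> Tuple[str, int]:
    """Split a 'tcp:host:port' server string into (host, port).

    Hypothetical helper for illustration; other PulseAudio server string
    forms (for example unix sockets) are deliberately rejected.
    """
    scheme, _, rest = value.partition(":")
    if scheme != "tcp" or not rest:
        raise ValueError(f"unsupported server string: {value!r}")
    host, _, port = rest.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"expected tcp:host:port, got: {value!r}")
    return host, int(port)


# Same value as in the Dockerfile line above.
host, port = parse_pulse_server("tcp:localhost:4317")
print(host, port)
```

In a container you would typically read the value from `os.environ["PULSE_SERVER"]` instead of hardcoding it.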
Note that you still need to set the