Tuesday, November 15, 2022

Long Live 16 Color Terminals

This blog entry is about the process I went through while designing my own 16-color terminal scheme as an improvement to "Solarized light". Since I invested some time in it, I decided to document it somewhere, in case I need to go back and revisit things later.

What Is This All About

I need to make some introduction into terminals to ensure that I'm on the same page with readers. Terminals were one of the first ways to establish truly interactive communication between people and computers. You type a command, and the computer prints the result, or vice versa—the computer asks you "do you really want to delete this file?", and you type "y" or "n". The first terminals were essentially electric typewriters—noisy and slow, thus the conversation between computers and humans was really terse. However, even then interactive text editors had become technically feasible—take a look at the "standard text editor" of UNIX: ed. Later, so-called "glass terminals" (CRT monitors with keyboards) arrived, enabling more "visual" and thus more productive interaction, and the "Editor war" began.

And basically, these visual terminals are what is still being emulated by all UNIX derivatives these days: the "text mode" of Linux, the XTerm program, the macOS Terminal app, countless 3rd-party terminals, even browser-based terminals—these can run on any desktop OS. In fact, I use hterm for hosting the editor in which I'm preparing this text.

As terminal technology evolved over time, it became more sophisticated. Capabilities of teletype terminals were very basic: print a character, move the caret left or right, go to the next line. "Glass terminals" enabled arbitrary cursor positioning, and then, with the advent of new hardware technologies, color was added. Since the evolution of hardware took time, color capabilities developed in steps: monochrome, 8 colors, 16 colors, 256 colors, and finally these days—"truecolor" (24-bit color). Despite all the crowd excitement about the latter, I believe that "less is more," and the use of a restricted color set in text-based programs still has some benefits.

Before I go into details, one thing I would like to clarify is the difference between the number of colors available to programs running under a terminal emulator (console utilities, editors with text UIs, etc.) and the number of colors used by the terminal emulator program itself. The terminal program is in fact only limited by the color capabilities of the display. Even when a console utility outputs monochrome text, the terminal emulator can still use the full color capabilities of the display for nice-looking font rendering and for displaying semi-opaque overlays—the cursor being the simplest example. Thus, setting the terminal to the 16-color mode does not mean we go back to the 1980s in terms of picture quality. And unless one runs console programs that, for example, attempt to display full-color images using pseudo-graphic characters, or wants to use gradient backgrounds, it may go unnoticed that a 16-color terminal mode is in fact being used.

Getting Solarized

I remember the trend popular among computer users of avoiding being exposed to the blue light from computer displays—maybe it is still a thing?—even Apple products offer the "Night Shift" feature. Users of non-Apple products got themselves yellow-tinted "computer" glasses or followed advice to turn down the blue component in the color settings of their monitors. The resulting image looks more like a page of a printed book being read outdoors (if the tuning is done sensibly, not to the point where white becomes bright yellow), and probably puts less strain on the eyes. The same result can be achieved on a terminal emulator, without any hardware tweaks, by applying a popular 16-color theme called "Solarized light" by Ethan Schoonover.

I remember being hooked by the "signature" yellowish background color of this theme (and, glancing over my colleagues' shoulders, a lot of people are). I never liked the "dark" version because it does not look like a paper page at all. So I set up all my terminal emulators to use "Solarized light" and was quite happy with the result.

However, at some point I noticed that color-themed code in my Emacs editor—I run it in "non-windowed", that is, text mode, under the aforementioned hterm terminal emulator—does not look like the screenshots on Ethan's page. Instead, C++ code, for example, looked like this (using some code from the Internet as an example):

I started digging for the cause and discovered that every "mature" mode of Emacs basically declares 4 different color schemes: for 16-color terminals and for 256-color (actually, >88-color) terminals, each in a version for dark and for light terminal backgrounds. Sometimes a scheme specific to 8-color terminals is also added. Below is an example from font-lock.el:

(defface font-lock-function-name-face
  '((((class color) (min-colors 88) (background light)) :foreground "Blue1")
    (((class color) (min-colors 88) (background dark))  :foreground "LightSkyBlue")
    (((class color) (min-colors 16) (background light)) :foreground "Blue")
    (((class color) (min-colors 16) (background dark))  :foreground "LightSkyBlue")
    (((class color) (min-colors 8)) :foreground "blue" :weight bold)
    (t :inverse-video t :weight bold))
  "Font Lock mode face used to highlight function names."
  :group 'font-lock-faces)

Thus, in reality, in my C++ example only the foreground and background text colors originate from the Solarized theme; all other colors come from the 256-color scheme of the Emacs C++ mode. The color names used in this case (like "LightSkyBlue" above) come from the "X11 palette", and there are many gradations and tints to choose from for every basic color.

In fact, this is one of the drawbacks of the 256- and true-color modes (in my opinion, of course)—apps have too much control over colors, and this leads to inconsistency. For me, too much effort would be required to go over all the Emacs modes that I use and ensure that their use of colors is mutually consistent. Whereas in the 16-color mode, not only do apps have to use a restricted set of colors, but the set itself is in fact a terminal-controlled palette. Thus, the app only specifies the name of the color it wants to use, for example "red", and then the terminal setup defines which exact tint of red to use. So, one day I switched my terminal to only allow 16 colors, and restarted Emacs...
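This division of labor is visible in the escape sequences themselves: a program only names a color slot by its number, and the terminal's palette decides the exact tint. A minimal sketch (the numbers are the standard SGR foreground codes):

```python
# Print the 8 normal and 8 "bright" ANSI colors, referring to them only
# by number; the terminal's own palette decides the exact tint of each.
NAMES = ["black", "red", "green", "yellow", "blue", "magenta", "cyan", "white"]

for i, name in enumerate(NAMES):
    normal = f"\033[{30 + i}m"   # SGR 30-37: normal foreground colors
    bright = f"\033[{90 + i}m"   # SGR 90-97: "bright" foreground colors
    reset = "\033[0m"
    print(f"{normal}{name:8}{reset} {bright}bright {name}{reset}")
```

Running this under two different terminal themes produces two visibly different outputs from the very same bytes—which is exactly the consistency lever I wanted.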

...And I did not like the result at all! Yes, now I could see I was indeed using the palette of the "Solarized light" theme, but the result looked quite bleak. I took another look at the screenshots on Ethan's page and realized that to me the colors of the Solarized palette look more engaging on a dark background. I read that the point of Ethan's design was to allow switching between dark and light backgrounds with minimal reshuffling of colors while still having "identical readability." However, to my eyes "readable" wasn't the same as "attractive."

As I tried using the Solarized light palette for doing my usual tasks in Emacs, I found that it has a couple more shortcomings. Let's look at the palette:

One thing that bugged me is that the orange color does not look much different from red. I can see that even with color blocks, and with text the similarity goes up to the point that when I was looking at text colored in orange, I could not stop perceiving it as red. People are not very good at recalling how "absolute" colors look; we are much better at comparing them when they are side by side.

Another serious problem was the lack of enough "background" colors for my text highlighting needs. I'm not sure about Vim users, but in Emacs I have a lot of uses for background highlights. I can enumerate them:

  • highlighting the current line;
  • text selection;
  • the current match of interactive search;
  • all other matches of interactive search;
  • character differences in diffs (highlighted over line differences);
  • highlighting of "ours", "theirs", and patch changes in a 3-way merge;
  • and so on.

Most of those highlights must have a color of their own so they don't hide each other when combined, and they must not make any of the text colors unreadable due to poor contrast. As an example, if I have colorized source code and I'm selecting text, I should still be able to clearly see every symbol of it. This is where the Solarized palette falls short, and I can easily explain why.

Color Engineering

One of the defining features of the Solarized palette is that it was created using the Lab color space. Previously, 16-color palettes usually assigned colors based on mapping the color number in binary form, from 0000 to 1111, onto a tuple of (intensity, red, green, blue) bits, without caring too much about how the resulting colors look to users. The Lab color space, in contrast, is modeled after human perception of color, and can help in achieving results which are more consistent and thus more aesthetically pleasing.

The first number in the Lab triad is the luminosity of the color. Let's look at the "official" palette definition in this model:

--------- ----------
base03    15 -12 -12
base02    20 -12 -12
base01    45 -07 -07
base00    50 -07 -07
base0     60 -06 -03
base1     65 -05 -02
base2     92 -00  10
base3     97  00  10
yellow    60  10  65
orange    50  50  55
red       50  65  45
magenta   50  65 -05
violet    50  15 -45
blue      55 -10 -45
cyan      60 -35 -05
green     60 -20  65

We can see that most colors have luminosity in the range between 45 and 65, the only exceptions having either low luminosity (base03 and base02) or high luminosity (base2 and base3). Thus, these four colors are the only ones that can serve as backgrounds that work with any text color. Since one of those 4 background colors is the actual background, only 3 remain—certainly not enough for my use case.
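The observation is easy to check mechanically. Here is a small sketch using the L values from the table above; the ±20 luminosity margin is my own rough threshold for "far enough from every text color", not something from the Solarized design:

```python
# L (luminosity) values copied from the Solarized palette table.
palette_l = {
    "base03": 15, "base02": 20, "base01": 45, "base00": 50,
    "base0": 60, "base1": 65, "base2": 92, "base3": 97,
    "yellow": 60, "orange": 50, "red": 50, "magenta": 50,
    "violet": 50, "blue": 55, "cyan": 60, "green": 60,
}

# Text colors all live in the 45-65 luminosity band; a background works
# with any of them only if its own luminosity is well outside that band.
MARGIN = 20  # assumed contrast margin, in L units
candidates = [n for n, l in palette_l.items() if l < 45 - MARGIN or l > 65 + MARGIN]
print(sorted(candidates))  # ['base02', 'base03', 'base2', 'base3']
```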

After considering these shortcomings, I decided to tweak the "Solarized light" palette to better suit my needs. Below is the list of my goals:

  1. Use colors that look more vivid with a light background.
  2. Make sure that no two colors look alike when used for text.
  3. Provide more background colors.

And the list of my non-goals, compared to Ethan's goals:

  1. No need to use text colors with a dark background.
  2. Can consider bold text as yet another text color.

In the design process I also used the Lab color space. Thanks to non-goal 1, I was able to lower the minimum luminosity down to 35. I made some of the colors more vivid by increasing color intensities—as a starting point I took some of the colors used by the 256-color scheme of the C++ mode in Emacs.

In order to make the orange color visually different from red, I created a gradient between red and yellow and picked the orange tint that I perceived as the "dividing" point between those two, in order to guarantee that it is the most distant tint from both red and yellow.
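A sketch of that gradient, interpolating linearly in Lab space between the red and yellow values from my final palette; the interpolation is an illustration of the idea, not the exact procedure I followed:

```python
def lerp_lab(c1, c2, t):
    """Linearly interpolate between two Lab colors; t=0.5 is the midpoint."""
    return tuple(a + (b - a) * t for a, b in zip(c1, c2))

red = (40, 55, 40)     # Lab values of "red" from my palette
yellow = (60, 10, 65)  # Lab values of "yellow" from my palette

# Walk the gradient and eyeball each step; the midpoint is the tint
# equally distant from both ends.
for t in (0.25, 0.5, 0.75):
    print(t, lerp_lab(red, yellow, t))
```

The midpoint comes out at (50, 32.5, 52.5), which is indeed close to the (55, 35, 50) orange I settled on after some manual adjustment.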

I decided to reduce the number of gray shades in the palette. For non-colored text, I planned to use the following monotones:

  • base1 for darker than normal text;
  • base2 for lighter-than-normal, less readable text—I moved it to the "bright black" position in the palette;
  • bold normal text for emphasis.

And here comes a hack! I moved the normal text color (base00 in the "Solarized light" theme) out of the palette and made it the "text color" of the terminal. Remember when I said that the terminal emulator program does not have to restrict itself to 16 colors? Most contemporary terminal emulators allow defining at least 3 additional colors which do not have to coincide with any of the colors from the primary palette: the text color, the background color, and the cursor color. The first two are used "by default" when the program running in the terminal does not make any explicit color choice. Also, any program that does use colors can always reset the text color to this default.
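In xterm-compatible terminals these three extra colors can even be set programmatically, via the OSC 10, 11, and 12 escape sequences (support varies between emulators, and some only expose them in the preferences UI). A sketch using the RGB values from my palette:

```python
import sys

def set_default_colors(fg, bg, cursor):
    """Set the terminal's default text, background, and cursor colors
    using xterm OSC sequences; not every emulator honors all three."""
    sys.stdout.write(f"\033]10;{fg}\007")      # OSC 10: default foreground
    sys.stdout.write(f"\033]11;{bg}\007")      # OSC 11: default background
    sys.stdout.write(f"\033]12;{cursor}\007")  # OSC 12: cursor color
    sys.stdout.flush()

# The "text color" and background from my palette table, plus a cursor tint.
set_default_colors("#657b83", "#fdf6e4", "#3657cb")
```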

Let's pause for a moment and do some accounting for colors that I have already defined in the 16 color palette:

  • 2 text colors (plus the text color in the terminal app);
  • 8 accent colors: red, orange, yellow, green, cyan, blue, magenta, and violet;
  • 1 background color (this is used when one needs to print text on a dark background, without dealing with "reverse" text attribute which usually looks like a disaster).

Thus we have 16 − 11 = 5, which means there are 5 color slots left for highlights—that's 2 more than in the "Solarized light" theme, and they are real colors, not shades of gray! Since I removed or moved away the shades of gray used by the original Solarized palette, I placed the highlights where the grays used to be, as "bright" versions of the corresponding colors.

When choosing color values for the highlights, I deliberately made them very bright (high luminosity) to ensure good contrast with any color used for text. One difficulty with very bright colors is making them visually distinctive, to avoid confusing "light cyan" with "light gray", for example.

This is the palette I ended up with, and its comparison with "Solarized light" (on the left):

And below is the comparison of the Lab values, along with a "web color" RGB triplet. Compared to the initial table I took from the Solarized page, I have rearranged colors in the palette order:

-------------- --------- ----------  ----------  -------
Black          base02    20 -12 -12  20 -12 -12  #043642
Red            red       50  65  45  40  55  40  #b12621
Green          green     60 -20  65  50 -45  40  #1f892b
Yellow         yellow    60  10  65  60  10  65  #b68900
Blue           blue      55 -10 -45  55 -10 -45  #268bd2
Magenta        magenta   50  65 -05  35  50 -05  #94245c
Cyan           cyan      60 -35 -05  55 -35  00  #249482
Light Gray     base2     92  00  10  92  00  10  #eee8d6
Bright Black   base03    15 -12 -12  65 -05 -02  #93a1a1
Bright Red     orange    50  50  55  55  35  50  #c7692b
Bright Green   base01    45 -07 -07  96 -10  25  #eef9c2
Bright Yellow  base00    50 -07 -07  94  03  20  #ffeac7
Bright Blue    base0     60 -06 -03  90  00 -05  #dfe3ec
Bright Magenta violet    50  15 -45  40  20 -65  #3657cb
Bright Cyan    base1     65 -05 -02  94 -08  00  #ddf3ed
White          base3     97  00  10  97  00  10  #fdf6e4
Text Color                           50 -07 -07  #657b83
Cursor Color                     Bright Magenta, opacity 40%

(Note that even for colors that retain their Lab values from Solarized, I may have provided slightly different RGB values compared to those you can find on Ethan's page. This could be due to small discrepancies in the color profiles used for conversion, and is unlikely to produce noticeable differences.)

Compared to the "Solarized light" palette, I have redefined 6 accent colors, and thrown away 2 "base" colors. I decided to name my palette "Colorized," both as a nod to "Solarized" which it is based on, and as a reference to the fact that it looks more colorful than its parent.

Emacs Customizations

Besides defining my own palette, I also had to make some tweaks in Emacs in order to use it to its full extent. While customizing the colors of the C/C++ mode, I made it visually similar to the 256-color scheme I was using before, but more well-tempered:

Shell Mode

It's a well known trick to enable interpretation of ANSI escape sequences for setting color in the "shell" mode of Emacs:

(require 'ansi-color)
(add-hook 'shell-mode-hook 'ansi-color-for-comint-mode-on)

What is less known is that we can then properly advertise this capability to terminal applications by setting the TERM variable to dumb-emacs-ansi. This is a valid termcap / terminfo entry; you can find it in the official terminfo source from the ncurses package.

Besides that, it's also possible to map these ANSI color sequences to terminal colors arbitrarily. For example, I mapped "bright" colors onto bold text. This comes in handy both for the original Solarized palette and for my Colorized one, because the "bright" colors in them are in reality not bright versions of the first 8 colors, so when apps try to use them the resulting output looks unreadable.

The full list of Emacs customizations is in this setup file. It's awesome that when using it, I naturally forget that only 16 colors (OK, to be fair, 17, if you recall the terminal text color hack) are being used. This way, I have proven to myself that use of a "true color" or even a 256-color terminal is not required for making terminal applications look good.


Big kudos to Ethan Schoonover for creating the original Solarized theme and explaining the rationale behind it. The theme is minimalist yet attractive, and proves that it's possible to achieve more with less.

Monday, September 5, 2022

MOTU: Multichannel Volume Control

Going beyond simple 2-channel volume control still presents a challenge, unfortunately. The traditional design is to view a multichannel audio interface as a group of multiple stereo outputs, and provide an independent volume control for each group, without an option to "gang" multiple outputs together. True multichannel output devices are normally associated with A/V playback, and indeed modern AVRs do present flexible options for controlling the volume, including support even for active crossovers. For example, the Marantz AV7704 which I was using for some time has this option. However, AVRs usually have a large footprint in terms of consumed space.

Computer-based solutions are even more flexible, and in recent years they come in compact forms and fanless cases, making them a more attractive alternative to AVRs. I also used a PC running AcourateConvolver for a long time. I didn't mind that it applies attenuation in the digital domain, because it does that correctly, with proper dithering. However, Windows does not appear to me as a hassle-free platform, because it always unceremoniously wants to update itself and restart exactly when you don't want it to.

After the Windows computer I was using for AcourateConvolver broke, the solution I switched to was the RCU-VCA6A by RDL (Radio Design Labs), which seems to work reliably, does not want to update itself, and does not introduce any audible degradation into the audio path (at least, compared to the VCA built into the QSC amplifier). But still, it's an extra analog unit—could we get rid of it?

Turns out, the solution was right there all the time since I started my audio electronics hobby. It can be trivially done using the ingenious control software of the "MOTU Pro Audio" line of audio interfaces, which includes my MOTU Ultralite AVB. Interestingly, the first thing that I noticed when I bought this audio interface is that the control app runs in the browser, talking to a web server running on the card. This was in sharp contrast to the traditional approach of installing native apps on the host Mac or PC.

What I failed to realize, though, is that the control app actually has two layers. There is the visible UI part, but also an invisible server which provides a tree-like list of the audio card's resources. Thus, in order to automate anything related to the audio interface management there is no need to work "through" the web app—one can talk to the server part directly. This is great, because any automation which tries to manipulate the UI is fragile by definition.

I've found the reference manual for MOTU's AVB Datastore API, which is indeed very simple; any CLI web client can work with it. Another useful fact that I discovered by accident is that although the datastore server reports the range of accepted values for the trim level of analog outputs 1–6 as being only from -24 dB to 0 dB, it happily accepts lower values, thus the effective trim range of all analog outputs is the same, going down to -127 dB.
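To give an idea of how simple the API is: values are read with plain GET requests and changed by POSTing a `json=` form field. A sketch in Python; the host name and the datastore path below are illustrative, not copied from my setup—see MOTU's API manual for the real resource tree:

```python
import urllib.parse
import urllib.request

BASE = "http://motu-ultralite.local/datastore"  # hypothetical device host name

def get_value(path):
    """Read one datastore value, e.g. the trim of an analog output."""
    with urllib.request.urlopen(f"{BASE}/{path}") as resp:
        return resp.read().decode()

def set_value(path, value):
    """Write one datastore value by POSTing a 'json=' form field."""
    data = urllib.parse.urlencode({"json": '{"value":"%s"}' % value}).encode()
    urllib.request.urlopen(f"{BASE}/{path}", data=data)

# Illustrative path; the reported range is -24..0 dB, but the server
# accepts lower values too:
# set_value("ext/obank/2/ch/0/stereoTrim", -40)
```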

I decided to re-purpose some of the physical controls on the card itself to serve my needs. Since I have a dedicated headphone amplifier, I never use the phones output, thus the rotary knob and the associated trim level for the phones output can instead be used to control the trim level of the analog outputs. When turning the knob, the current volume level is displayed on the audio interface's LCD screen. This is needed because the knob is just a rotary encoder, not a real attenuator, thus there is no association between the current level and the knob's position. This fact actually makes it much less comfortable than the VCA volume pot which I made myself to use with the RCU-VCA6A. With the encoder it's convenient to perform small adjustments—a couple of dBs up or down—but turning the volume all the way down requires too many turns. Thus, I also wanted to have a mute button. I decided that since I have never used the second mic input, I can use its "Pad" button to fulfill this role. The "Pad" button has a state, and it's lit when turned on.

In order to implement these ideas, I had to write a script. I decided not to use Python, to avoid over-designing the solution; instead I turned to something really simple, stemming from the original UNIX systems of the '70s—a bash script. In fact, this solution is truly portable, as a POSIX subsystem is part of any modern operating system, including even Windows (via WSL).

The logic of the script is simple. I use the volume of the phones output as the source value for all analog outputs driving my desktop speakers: outputs 1–5. The volume of output 6 is used to store the mute level. Thus, in practice it can be a full mute, or it can be just a "dim" (for example, -40 dB). The value of the pad for Mic 2, as I've mentioned before, is used to switch between the normal and mute trim levels. This way, the control logic can be described as follows:

  1. Read the current values of the "main" and "mute" trims, and the value of the mute toggle switch.
  2. Depending on whether the device is supposed to be muted according to the switch, swap the values as needed.
  3. Apply the volume to the analog outputs.
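These three steps can be sketched as follows (in Python here for brevity, while the actual script is in bash; the key names are made up, standing in for the real datastore paths):

```python
def control_cycle(read, write):
    """One pass of the control logic above; 'read' and 'write' stand in
    for datastore accesses, and the key names are hypothetical."""
    main = read("phones_trim")         # step 1: the knob-controlled value
    mute_level = read("output6_trim")  # the stored mute (or "dim") level
    muted = read("mic2_pad")           # the mute toggle (Pad button)
    level = mute_level if muted else main  # step 2: swap as needed
    for out in range(1, 6):                # step 3: apply to outputs 1-5
        write(f"output{out}_trim", level)

# A dry run against an in-memory "device" instead of the real datastore:
state = {"phones_trim": -12, "output6_trim": -40, "mic2_pad": True}
control_cycle(state.get, state.__setitem__)
print(state["output1_trim"])  # -40, because the Pad (mute) is on
```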

Then the script uses the ETag polling technique to ask the server to report back when any of the values have changed as a result of a user's action (this is also described in MOTU's manual). Then everything goes back to the start.
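The polling itself amounts to repeating a GET with an `If-None-Match` header carrying the last seen ETag; the server holds the request until a value changes (a 200 response with a new ETag) or a timeout expires (304). A sketch, with MOTU's exact timeout behavior simplified:

```python
import urllib.request
from urllib.error import HTTPError

def poll_forever(url):
    """Long-poll a datastore URL, yielding the body every time it changes."""
    etag = None
    while True:
        req = urllib.request.Request(url)
        if etag:
            req.add_header("If-None-Match", etag)  # "only answer on change"
        try:
            with urllib.request.urlopen(req) as resp:
                etag = resp.headers.get("ETag")
                yield resp.read()      # something changed: react to it
        except HTTPError as e:
            if e.code != 304:          # 304 = no change yet, poll again
                raise
```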

The full script code is here on GitHub—it's only about 70 lines. If needed, this way of controlling the MOTU interface can be extended to be fully remote.

Sunday, August 14, 2022

Modular Audio Processing System, Part III

Finally, after Part I and Part II, we are getting to the last part of my audio system description. First I'll say a few words about the power unit, and then get into details about iterating on the digital audio path, also presenting some measurements.

The Power Unit

When there is a bunch of hardware units stacked in a rack, each requiring a power outlet, a natural desire occurs to have only a single power cord for the entire rack. For a long time I was using a simple metal-housed power strip which I bolted to the side wall of the rack. At some point I decided that I want to provide a more serious level of protection for the equipment. I also wanted the power unit to be implemented in the same half-rack form factor as the rest of the equipment, and of course I wanted it to have no fans.

Around that time I also learned about a principle with a somewhat spiritually sounding name—the principle of "non-sacrificial" power protection. There is nothing supernatural in it, though. Most power filter and protector strips used at home employ electrical elements that are intended to take the impact of a power surge and thus "sacrifice" themselves, protecting the equipment this way. The element used for this noble role is called a "metal oxide varistor" (MOV). The problem is that power protectors never indicate how many of the MOVs contained in them are still in good shape, thus it's always a lottery when such a protector will fail, possibly taking the downstream equipment down with it.

Whereas the power protector I bought—the Middle Atlantic PD-415R-SP—was intentionally built using a MOV-free design. Another feature advertised by the manufacturer is EMI filtering between the sockets, which is nice to have, at least in theory, when one has to mix digital and analog equipment and use switching power supplies. I must admit, I partially defeated the last feature by using power socket splitters, because the power unit unfortunately only has 4 sockets, while I have 7 pieces of equipment to power. However, since there is a lot of equipment and wires packed into a compact rack, there is a lot of EMI "flying" around, thus filtering just at connection points is not enough anyway.

To close the topic of the power unit, another drawback besides not having enough sockets is its relatively high price—around $350–$500, depending on the dealer. However, if we divide this price per socket and compare it to the price of the equipment it protects, it seems like a reasonable investment.

The Digital Path

Finally, the fun part. My aim was to ensure that practically any digital source of audio could be connected to the input of the DSP. This is because I don't want to limit myself to the use of certain streaming services or stick to media software like Roon. I'm a long-time Google Play / YouTube Music user, I have ripped my audio CDs, and I also might want to play something via a browser. In addition, recently I decided to subscribe to Apple Music because they have switched to lossless streaming and even offer "high resolution" versions of many popular albums—with a monthly fee that is less than the cost of a typical CD, this was an easy decision.

In order to be able to use a wide variety of software-based audio sources, one needs to use a real computer, or at least a mobile device. Initially I tried using the same Mac Mini which runs my DSP; however, the performance of this 8-year-old machine is clearly not enough to avoid glitches when running a browser alongside Reaper. Also, I realized that I want a device with its own screen and keyboard, so I can use it while I work. So I took an old MacBook Air off the shelf and connected it directly to the MOTU card by Ethernet via the AVB protocol. After a short period of use it became obvious that modern browsers can pose a heavy load for any old computer—after half an hour of streaming YouTube Music the MacBook Air was always turning its fan on and ruining the listening experience.

I firmly decided that I need a fanless device, so I restored another "Air" device—this time an iPad Air. It had its screen broken, and I worked around that with the help of an adhesive "screen protector" film. Then I started considering options for getting digital audio out of the iPad (mine has a 3.5 mm analog output, but...) and realized there are plenty of ways:

  • AirPlay, which can be used either over wireless or over wired network connection. Since iPad needs to be connected to a charger, the wired option seems to be more appropriate, especially if an Ethernet dongle with PoE (Power-over-Ethernet) is used—just a single wire needs to go into the device!

  • HDMI output via a dongle—since it's an old iPad model, it has a Lightning output, thus use of an Apple-made dongle is preferred.

  • USB output via a different type of dongle. Obviously, this dongle needs a power input, too. Unfortunately, USB audio interfaces that can provide power are harder to find than I would like.

Let's compare these options more thoroughly.

Comparison of Digital Output Alternatives for iPad


The AirPlay protocol—it's not a secret that it is based on the open RTSP streaming protocol, and once the encryption key that Apple uses was extracted, plenty of open-source clients appeared. I have a Raspberry Pi lying around (naturally, I have amassed a lot of old computing devices), and I found the DigiOne SPDIF "hat" for the RPi from a company that seems to care about audio quality—Allo.com. Another option is to connect the Pi to a USB Audio Class compliant sound card.

I decided to try the shairport-sync AirPlay client. After going through its docs I realized that unlike the AVB protocol, AirPlay does not have a notion of a "master clock," which means that the sender and the receiver of audio essentially run "freewheeled." Thus, even if both use the same nominal sampling rate (the AirPlay protocol always uses the 44.1 kHz sampling rate), due to the difference between the effective sampling rates (for example, the sender ends up running at 44099 Hz and the receiver at 44101 Hz), frames have to be dropped or zero frames have to be stuffed into the stream, thus glitches are unavoidable without special precautions. In order to avoid glitches, shairport-sync resamples the received audio to the sampling rate of the playback device. The effective sampling rate of the sender is discovered from timestamps sent together with the audio data.
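To put numbers on it: with the example rates above, the clocks disagree by only about 45 ppm, yet without resampling that already means a dropped or stuffed frame roughly every half a second:

```python
sender_hz = 44099    # effective rate the sender's clock actually produces
receiver_hz = 44101  # effective rate the receiver's clock actually consumes

deficit = receiver_hz - sender_hz       # 2 samples/s the receiver lacks
seconds_per_glitch = 1 / deficit        # a correction needed every ~0.5 s
drift_ppm = deficit / 44100 * 1e6       # ~45 ppm total clock mismatch
print(seconds_per_glitch, round(drift_ppm))
```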

After playing with shairport-sync, I must admit that I'm impressed with the efforts of its author, Mike Brady, to make software that "just works." However, since I didn't really have to transport audio far away or over wireless networks, I decided that perhaps all this complexity of re-synchronization on the receiver side could be avoided. Another important shortcoming of shairport-sync is that it only supports sampling rates that are multiples of 44.1 kHz, and according to this answer by Mike, use of another base rate (that is, 48 kHz) is not possible without a substantial rework of the software.


The second option was to use the HDMI output. I dismissed HDMI on the grounds that I only need a stereo output and have no interest in multichannel encoded content. Also, use of HDMI has some additional shortcomings:

  • iOS always uses the 48 kHz sampling rate (at least, with the HDMI audio splitter that I have), which enforces resampling at playback time of all the content that Apple Music offers: the majority of albums on Music use the 44.1 kHz sampling rate (so far I've only encountered one album in 48 kHz—it was "Waiting for Cousteau" by J.M. Jarre), and the "Hi-Res" content uses either 96 kHz or 192 kHz. Note that Apple Music may still display the "Hi-Res Lossless" logo while resampling the "Hi-Res" content down to 48 kHz, which is clearly misleading.

  • No volume control on the digital side. Since iPad assumes it plays on some TV or an AVR which offers its own volume control, it always outputs at digital full scale. This obviously leads to clipping of intersample peaks.

  • iPad bears extra load because it also streams its screen along with the audio signal. The HDMI dongle gets warm pretty fast, too.

Thus, use of HDMI output is not an optimal solution for my scenario. This leaves us with the USB output option via "camera kit."

Dealing with USB Output Reliability Issues

The camera kit has a fat, ugly wire which goes into the iPad, and needs two incoming wires: one for the USB device, and one for power. I really wanted to hide the dongle inside the equipment rack, so I started looking for a Lightning extender. This revealed an interesting fact—there exist no "MFi certified" (that is, approved by Apple) Lightning extenders. The extenders which claim to be "certified" are absent from Apple's database. Apple does not make them either, nor does Belkin (the only accessories manufacturer which I would trust). Nevertheless, I still tried to use an extender wire which at least was shielded (a lot of extenders sold on Amazon are not even shielded, making them suitable only for charging), and it was mostly working—except when it didn't. From time to time Apple Music would stop playing in the middle of a song—the playback was still "going" on the screen, but there was no sound until the next song.

Finding the source of this instability turned out to be a challenging task. Besides trying different Lightning extender wires, I also tried 3 different USB transports: Douk Audio U2 Pro, Xing AF200, and finally RME FireFace. None of them worked reliably, including the FireFace, and this was really suspicious, knowing that RME is usually rock-solid. Luckily, RME provides an iOS app which allows checking the state of the audio card mixer, and there I could see that whenever audio stopped playing, it actually just stopped coming from the software, despite the fact that the software (for example, Apple Music) was happily showing that it was playing. Also, while configuring the FireFace to work as a USB Audio Class device, I read some insightful information in its manual regarding connection to Apple devices: RME strongly recommends connecting the USB device to the Apple device directly. This is how I ended up connecting my dongle, and it solved the stability problem for all the USB transports I tried.

I chose Xing as my USB transport because it has a screen which shows the current sample rate, attenuation, and playback state. Also, it offers the best variety of digital outputs, including an AES3 balanced digital output which I connected to the input of the sample rate converter.

Power Sources for iPad and Xing AF200

I need to mention that the difficulty of finding the source of the iPad's playback instability was exacerbated by the need to find the right power supply for it and for the USB transport. I thought I could just plug the iPad into any USB power outlet and be done with it. However, life is not so easy. First, the iPad is picky about power sources—there are various proprietary charging protocols used by Apple, and apparently the iPad has some expectations. Obviously, Apple's own charger is accepted; however, I found that it creates a voltage offset between the ground of the output signal and the power ground.

The voltage offset results from a combination of unwanted AC and DC voltages. The AC part is usually some harmonics of the 60 Hz from the power outlet, or oscillations produced by the conversion circuitry. Having an offset (either DC or AC) is undesirable because if the USB transport is powered from the same charger, this offset propagates to the "ground" wire of any unbalanced electrical output, and this can easily cause instability of reception on the input side.

I tried an Anker PowerPort 6 charging unit; however, its output offset from the power ground is just enormous, around 37 V, and this is clearly problematic. I guess nobody at Anker was envisioning the use of this charger in an AV setup.

Finally, I ended up using one of the USB ports on the Mac Mini. The iPad has no problem charging itself from this port, and the port has no significant voltage offset. However, its output power is limited, and I have to power both the iPad and the USB transport. Luckily, the Xing USB transport can also accept an external DC power input, leaving the USB connection for data transfer only. Unfortunately, it cannot send power to the iPad; if it could, that would obviate the need for an extra USB power wire.

Since all these wires going back and forth and making loops between devices can easily become sources of noise voltages due to differences in ground potentials, I was looking for something flexible when choosing a power supply for the Xing AF200. Thankfully, Allo.com has that covered as well, offering an excellent low-noise 5 V power supply called "Nirvana" which provides a "ground lift" switch for the DC output, as well as a ground connector. Thus, one can always configure it in a way which eliminates differences in ground potentials.

Later I found that a powered USB hub by Amazon Basics is also properly engineered, having only a negligible voltage offset on its USB outputs (relative to the power ground); however, it is not as flexible as the Nirvana SMPS. All in all, the resulting connection scheme looks like this:

Needless to say, after all these adventures I'm not a big fan of consumer digital audio equipment. Although I have found a stable configuration, it still feels a bit fragile: for example, once I tried to use a different USB-A/C cable between the "camera kit" dongle and the Xing, and this immediately broke playback stability. Apparently, consumer-grade equipment is not designed for use in complex AV systems.

Mutec MC-6 Sample Rate Converter

A quick note on the company name: please don't confuse "Mutec" with "Mytek"—a company with a similar-sounding name which also makes audio equipment.

In a world where every digital device is capable of sample rate conversion, and the state of modern software converters is really good, why would one still need a hardware sample rate converter? For me, this is just a matter of convenience. The MC-6 unit has 4 inputs of various formats: AES3 (balanced XLR), AES3id (BNC), SPDIF (RCA), and TOSLink (optical), as well as the same outputs, plus BNC ins and outs for word clock. In normal SRC mode, only one of the inputs is active. The ability to choose among multiple inputs places the MC-6 into the same role that a pre-amp plays in "classic" hi-fi setups.

I also need to mention that the unit is perfectly engineered—switching between inputs and locking onto the source, as well as losing sync when the source gets disconnected, does not cause any audible pops. Unlike consumer devices like iPads, this unit is built for constant 24/7 use and works absolutely reliably. I prefer to use the XLR and TOSLink ports because they have good tolerance for surrounding EMI and for differences between the ground potentials of units. As we have seen in the section about power sources, the use of consumer-oriented power supplies can easily result in huge voltage offsets.

Regarding the clipping of intersample peaks: this is something I always try to prevent, because I often listen to recordings which accidentally or deliberately leave no digital headroom. Unfortunately, the MC-6 does not provide any headroom (this is not really possible with digital-to-digital conversion done in the integer domain), thus it clips intersample peaks. Being aware of that, in order to avoid clipping on albums mastered without any digital headroom, I use attenuation on the USB transport (Xing). I usually set it to -4 dB, switching to unattenuated output for classical recordings, which are usually mastered the right way.
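
To illustrate why such headroom is needed, below is a small numeric sketch (plain Python; the Fs/4 tone is a textbook worst case, not a real recording). A sine at a quarter of the sampling rate, sampled at its 45-degree phase points, has samples that never exceed 0 dBFS, yet the band-limited reconstruction between the samples peaks about 3 dB higher:

```python
import math

N = 2048            # number of samples of the test tone
PHASE = math.pi / 4 # worst-case phase for a tone at Fs/4

# An Fs/4 sine, normalized so that the largest *sample* is exactly 1.0 (0 dBFS)
raw = [math.sin(math.pi * n / 2 + PHASE) for n in range(N)]
scale = max(abs(v) for v in raw)
samples = [v / scale for v in raw]

# Band-limited interpolation halfway between samples, using a
# Hann-windowed sinc kernel (an approximation of ideal reconstruction)
TAPS = 256
def interp(t):
    acc = 0.0
    for k in range(int(t) - TAPS + 1, int(t) + TAPS + 1):
        x = t - k
        if 0 <= k < N and abs(x) < TAPS:
            win = 0.5 * (1 + math.cos(math.pi * x / TAPS))  # Hann window
            acc += samples[k] * win * math.sin(math.pi * x) / (math.pi * x)
    return acc

# True (reconstructed) peak between samples, around the middle of the tone
peak = max(abs(interp(n + 0.5)) for n in range(N // 2, N // 2 + 8))
print(f"largest sample: 1.000; intersample peak: {peak:.3f} "
      f"({20 * math.log10(peak):+.2f} dBFS)")
```

The reconstructed peak comes out close to the theoretical factor of sqrt(2), which is why a few dB of attenuation before an integer-domain converter is a safe default.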

Measurements

Measuring digital paths is trickier than analog ones. In order to look at a digital signal in the analog domain, which can reveal issues with cables, one needs a wide bandwidth oscilloscope, which I don't have. Another "classical" test is the J-Test, which can reveal issues in digital paths purely from the digital side. The idea of the J-Test is that it creates a certain pattern of bits which can provoke unwanted modulations in the digital signal while it is being transmitted. These bit patterns are specific to the sampling rate being used; thus, when a sample rate converter is inserted, the bit patterns no longer work as intended when we look at the output from the converter.
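
For reference, the J-Test signal is commonly described (following Julian Dunn's definition) as an undithered sine at Fs/4 combined with a low-level square wave at Fs/192 which toggles the least significant bits. A rough sketch of generating a 48 kHz / 16-bit variant might look like this (note that this is my reconstruction from public descriptions, not a reference implementation):

```python
import math

FS, BITS = 48000, 16
N = FS                      # one second of signal
FULL = 2 ** (BITS - 1) - 1  # 32767

# Fs/4 sine at roughly -3 dBFS; at this frequency the samples form the
# exact repeating pattern 0, +A, 0, -A, so the tone itself is bit-exact
A = round(FULL / math.sqrt(2))
tone = [round(A * math.sin(math.pi * n / 2)) for n in range(N)]

# Fs/192 square wave (period of 192 samples) at the level of one LSB
square = [1 if (n // 96) % 2 == 0 else 0 for n in range(N)]

jtest = [t + s for t, s in zip(tone, square)]
```

One can see why this is rate-specific: the sine hits exact sample-grid points only at Fs/4, and any resampling destroys the carefully crafted bit pattern.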

I decided to apply a different approach. For sample rate converters, there is a set of tests proposed by the Infinite Wave team which evaluates how well the low-pass filters of a converter suppress aliased frequencies, and also characterizes the properties of the filters. As can be seen from the results page, the primary application of these tests is software sample rate converters. A hardware implementation can introduce more nonlinearity, and can also have an "imperfect" effective sample rate which deviates from the nominal one. Thus, in addition to the tests done by Infinite Wave, I also did THD measurements using Acourate.

Since the team at Infinite Wave uses specialized software, I investigated how I could make similar tests using Audacity and REW. I used a regular log sweep for checking aliases with the spectral analysis in REW. Note that this measurement log sweep has a regular frequency change rate, unlike the more specialized sweep used by Infinite Wave; this explains the difference in curve shapes, while the idea of the test is preserved. For characterizing the low-pass filter, I simply passed a Dirac pulse through the chain. Since the chain is digital, we get a transfer system which is very close to textbook behavior.
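
Here is a minimal sketch of generating such test signals with Python's standard library (the file names, sweep length, and level are my own choices, not the exact parameters used in these measurements):

```python
import math
import struct
import wave

FS = 44100
F1, F2, DUR = 20.0, 20000.0, 10.0   # exponential sweep, 20 Hz .. 20 kHz
K = DUR / math.log(F2 / F1)

def write_wav(name, samples):
    """Write a mono 16-bit WAV file."""
    with wave.open(name, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(FS)
        frames = b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
            for s in samples)
        w.writeframes(frames)

# Exponential (log) sweep at -6 dBFS (0.5 in linear scale)
N = int(FS * DUR)
sweep = [0.5 * math.sin(2 * math.pi * F1 * K * (math.exp(n / FS / K) - 1))
         for n in range(N)]
write_wav("logsweep.wav", sweep)

# Dirac pulse: one full-scale sample in a second of silence
pulse = [0.0] * FS
pulse[FS // 2] = 1.0
write_wav("dirac.wav", pulse)
```

The sweep's instantaneous frequency grows exponentially from F1 to F2 over the duration, which is what makes aliases show up as distinct sloped lines on a spectrogram.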

My primary interest was to check what happens to a digital signal played at 44.1 kHz from the iPad while it passes through my digital equipment chain to the MOTU interface, which runs its DSP at a 96 kHz sampling rate. I also tried sending test signals via AirPlay to a Raspberry Pi equipped with a DigiOne SPDIF "hat" and running shairport-sync.

Below is a diagram of both chains. It demonstrates the similarities and differences between them. The difference in the resulting sampling rates (96 kHz vs. 88.2 kHz) can be ignored. Note that I used a wired Ethernet connection between the iPad and the Raspberry Pi in order to avoid possible packet losses over WiFi. It was the same iPad in both cases, and I used VLC to play the test signals.

Mutec MC-6

Let's start with the impulse response of the MC-6, obtained by passing a Dirac pulse through it:

We can see that MC-6 uses a linear phase low-pass filter. Let's look at its passband and transition bands (this is the same frequency response graph, just framed differently):

We can see that the response is flat up to 20 kHz, then it abruptly goes down, essentially trimming out everything past 24 kHz. I suppose the ripples on the passband graph are artifacts of REW's own processing. Note two interesting points:

  • The low-pass filter does not sufficiently attenuate frequencies in the region from 22 kHz to about 23.5 kHz. I think this is an engineering trade-off to limit pre-ringing.

  • The phase does not stay flat and lags, which I find surprising for a linear phase filter. I will dig deeper into the reason behind this; it can be caused by REW's processing, or it can be an actual lag due to asynchronous sample rate conversion.
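
As a sanity check of what "flat phase" should look like, here is a generic sketch with a synthetic symmetric FIR filter (not the MC-6's actual filter): for a linear phase filter, once the constant group delay is removed, the residual phase in the passband is zero, so any remaining lag in a measurement points at the measurement chain or at asynchronous rate conversion:

```python
import cmath
import math

# Symmetric (hence linear phase) FIR: Hann-windowed sinc lowpass, cutoff 0.25*Fs
TAPS = 101
M = (TAPS - 1) // 2            # group delay in samples
h = []
for n in range(TAPS):
    x = n - M
    s = math.sin(0.5 * math.pi * x) / (math.pi * x) if x else 0.5  # sinc, 2*fc at x=0
    win = 0.5 * (1 + math.cos(math.pi * x / (M + 1)))              # Hann window
    h.append(s * win)

# Phase of H(e^jw) after removing the pure delay of M samples:
# for a symmetric FIR it is zero throughout the passband.
residuals = []
for freq in (0.05, 0.10, 0.15, 0.20):   # fractions of the sampling rate
    w = 2 * math.pi * freq
    H = sum(c * cmath.exp(-1j * w * n) for n, c in enumerate(h))
    residuals.append(cmath.phase(H * cmath.exp(1j * w * M)))
    print(f"{freq:.2f}*Fs: residual phase = {residuals[-1]:+.2e} rad")
```

All the residual phases come out at essentially zero (floating point noise), which is exactly what a clean linear phase measurement should show after delay compensation.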

After figuring out the time- and frequency-domain properties of the filter we can now explain the spectrogram of a log sweep:

We can see that near the end there is some aliasing. I can explain that by the fact that aliases which appeared after converting from 44.1 kHz were not sufficiently attenuated by the low-pass filter. I think the approach to filtering used by the MC-6 is similar to the "allow aliasing" option of the sample rate converter in the SoX toolkit.
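
To make the aliasing mechanism concrete (simple arithmetic, not a measurement): during upsampling, spectral images of the original material appear mirrored around the original Nyquist frequency of 22.05 kHz, so the images of the topmost sweep frequencies land exactly in the 22-24 kHz region where the filter's attenuation is still incomplete:

```python
FS_SRC = 44100          # original sampling rate
NYQUIST = FS_SRC / 2    # 22050 Hz

# First spectral image of a tone at f Hz when upsampling from FS_SRC
# with an imperfect anti-imaging filter
def first_image(f):
    return FS_SRC - f

for f in (20000, 21000, 21500, 22000):
    print(f"{f} Hz -> image at {first_image(f)} Hz")
```

This matches the earlier observation about the filter's insufficient attenuation between 22 kHz and about 23.5 kHz: that is precisely where the images of the 20.6-22 kHz content fall.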

Finally, let's look at the distortion for a 1 kHz sine wave. Below are graphs for the sine at -0.1 dBFS and -60 dBFS:

We can see that the -60 dBFS tone is very clean, whereas the "almost full scale" wave exhibits some non-linearity. So even digital paths are not completely free from level-dependent non-linearities. However, the measured distortion and noise levels are far below audibility thresholds:

And just to confirm to myself that my choice of protocols and equipment for the digital audio path was correct, I made a couple of measurements of the AirPlay path via shairport-sync running on the Raspberry Pi (recall the diagram above).

AirPlay via shairport-sync

After I looked at the waveform of the recorded log sweep as produced by shairport-sync, I already had some doubts:

Note the spurious "hairs" above the 0.5 mark—those were not present in the original signal, as the entire sweep is at -6 dBFS (that's about 0.5 on this scale). We can see that the frequency response graph is also rather jagged:

Unsurprisingly, the spectrogram of the sweep shows a lot of artifacts as well:

Note that shairport-sync was built with support for resampling via SoX, and I enabled it in the config file. Just to make sure that the rest of the chain (Raspberry Pi, DigiOne, and RME FireFace) works correctly, I pushed the same log sweep file onto the Raspberry Pi and played it using SoX, also with resampling—and there were no artifacts on the recorded sine wave or on the spectrogram. That means the artifacts we see with shairport-sync are caused either by the AirPlay protocol itself (maybe it's not actually lossless after all?), or by the resynchronization done by shairport-sync. In any case, this confirmed that my choice of digital output over USB was the right one.

Conclusions

Building an audio processing and playback chain is not a trivial task, even when using completely off-the-shelf components. It's not only the performance of each individual component that matters, but also the way they are connected together. Even in a chain where audio is transmitted predominantly via digital paths, there can still be non-linear effects and losses of signal. In my opinion, this illustrates very well the idea which Rod Elliott expressed in his article: that "digital" is just an abstraction—a very powerful one, but still an abstraction—and that "analog" aspects like voltages and currents must nevertheless always be taken into account.