Saturday, December 8, 2018

Automatic Estimation of Signal Round Trip Delay, Part 1

When dealing with audio processing modules it might be important to know how much delay they introduce. This parameter is often called "latency." Typically we need to care about latency when using processing modules for real-time performance, or when they need to be synchronized with other audiovisual streams. Examples from my everyday applications are:
  • my DSP computer running AcourateConvolver that I use as multichannel volume control with loudness normalization. Here I need to ensure that the delay introduced by AC does not exceed 100 ms to avoid noticeable lipsync issues while watching movies (yes, I could allow a longer delay and compensate for it on the video player side, but I use a lot of different sources: various players on the computer, BD player, XBox, and not all of them provide video delay);
  • speaker synchronization in a multi-channel setup. In the simplest case, it's possible just to measure the distance from each speaker to the listening position, but if speakers use DSP (like LXmini), the processing delay must be taken into account, too.
  • mobile devices and computers when used as real-time effect boxes for live instruments. In this case, the latency has to be quite low, ideally not exceeding 20 ms.
The module's delay is called "round trip" because the audio signal entering the processing module must eventually return back. With digital signal processing, the typical sources for delays are filters and various buffers that are used to reduce processing load and prevent glitching.

Measuring the round trip delay manually is a relatively easy task. The typical approach is to send a pulse through the processing box, capture it on the output, and somehow lay out the input pulse and the output pulse on the same timeline for measurement. This can be done either by using an oscilloscope, or with audio recording software, like Audacity. Below is an example of input and output impulses as seen on MOTU Microbook's bundled digital oscilloscope:


Here, by eye we can estimate the delay to be about 25 ms. Needless to say, we need to use a pulse which doesn't get filtered out or distorted severely by the processing box. We also need to check the group delay of the box for uniformity, otherwise measuring latency at one particular frequency would not reveal the whole picture.

However, the manual approach is not always convenient, and I've spent some time researching automated solutions. From my experience with Android, I'm aware of several mobile applications: Dr. Rick'o'Rang Loopback app, AAudio loopback command-line app, and Superpowered Audio Latency Test app. On computers, there is a latency tester for the Jack framework: jack_delay. All these apps come with source code. Interestingly, they all use different approaches for performing measurements.

Yet another automatic delay measurement is bundled into ARTA and RoomEQ Wizard (REW), but their source code is not open. At least, for ARTA it's known that the delay estimation is based on cross-correlation between reference and measured channels.

I decided to compare different automatic approaches. The purpose is to figure out how reliable they are, and how robust they are when encountering typical kinds of distortions occurring in audio equipment: noise, DC offset, echoes, non-uniform group delay, signal truncation or distortion.

Superpowered Audio Latency Test app


Let's start with the app that uses the most straightforward approach for round trip latency measurement. The source code and the description of the algorithm are located in this file. I'm referring to the state of the code tagged as "Version 1.7". The algorithm is designed to measure latency on the acoustical audio path of the device, from the speaker to the microphone. It can also be used with an electrical or a digital loopback.

At first, the algorithm must measure the average noise level of the environment. It does so over a 1 second interval (and for some reason, in the code the average of absolute sample values is called "energy", although in fact energy is defined in time domain as a sum of squares of sample values). The noise level is then translated into decibels, padded by 24 dB, and the resulting value is translated back into a 16-bit sample value, which is called the threshold.


Then the program outputs a pulse formed by a ramped down 1 kHz sine wave, 20 ms duration, with maximum loudness (the output level is set up manually via media volume on the device). On input, the algorithm waits for the first block of data where the average of absolute sample values exceeds the threshold, and within that block, finds the first sample exceeding the threshold. The index of this sample from the moment when the test pulse has been emitted is considered to be the round trip latency (in frames).

This process is repeated 10 times, the current minimum and maximum of the measured latency are tracked, and the measurement is abandoned if the maximum exceeds twice the minimum. Otherwise, the resulting latency is calculated as the average of all measurements.
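The steps described above can be sketched in a few lines of Python. This is only my reading of the description, not the app's actual code: the function names, the block size of 256 frames, and the use of floating point samples instead of 16-bit values are all assumptions made for illustration.

```python
import math

def noise_threshold(noise, pad_db=24.0):
    """Mean absolute level of the noise capture, raised by pad_db decibels."""
    mean_abs = sum(abs(s) for s in noise) / len(noise)
    # Guard against log10(0) for a perfectly silent capture.
    level_db = 20.0 * math.log10(max(mean_abs, 1e-9))
    return 10.0 ** ((level_db + pad_db) / 20.0)

def first_crossing(capture, threshold, block=256):
    """Index of the first sample above the threshold, searched block by block.

    The block size is an arbitrary choice for this sketch.
    Returns None if no block has a mean absolute level above the threshold.
    """
    for start in range(0, len(capture), block):
        chunk = capture[start:start + block]
        if sum(abs(s) for s in chunk) / len(chunk) > threshold:
            for i, s in enumerate(chunk):
                if abs(s) > threshold:
                    return start + i
    return None

def combine(latencies):
    """Average of all runs, or None if the maximum exceeds twice the minimum."""
    if max(latencies) > 2 * min(latencies):
        return None
    return sum(latencies) / len(latencies)
```

Here `first_crossing` is given the samples captured from the moment the pulse was emitted, so the returned index is the latency in frames, and `combine` is fed the results of the repeated runs.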

Leaving the implementation details aside, what can we say about the approach? The idea here is to reduce the test signal to a burst of energy and try to find it in time domain. What are potential issues with this?
  • any other signal with sufficient energy and similar duration arriving earlier than the test pulse can be mistaken for it. This signal can be a "thump" sound of an amplifier powering on, for example, and if it happens every time the program starts playback, the "statistical" correctness checking will be fooled as well;
  • for this method to have a good resolution (for the purposes of speaker alignment it needs to be at least 0.25 ms), the first quarter of the sinewave must fit into this period, which means the period of the test signal must be at most 1 ms, that is, its frequency must be at least 1 kHz. If we are doing subwoofer alignment, the top frequency that can be used for the test signal is somewhere around 100 Hz, thus the resolution will be 10 times worse: 2.5 ms, which is not acceptable.
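The arithmetic behind the resolution estimate is simple enough to check with a tiny sketch. The function names are mine, and the "quarter of a period" criterion is taken from the reasoning above rather than from the app's code.

```python
def quarter_period_ms(freq_hz):
    """Time from a zero crossing to the first peak of a sine, in milliseconds."""
    return 1000.0 / (4.0 * freq_hz)

def min_frequency_hz(resolution_ms):
    """Lowest test frequency whose quarter period fits into resolution_ms."""
    return 1000.0 / (4.0 * resolution_ms)

print(quarter_period_ms(1000.0))  # 1 kHz test tone
print(quarter_period_ms(100.0))   # 100 Hz tone for a subwoofer
print(min_frequency_hz(0.25))     # frequency needed for 0.25 ms resolution
```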
What about resilience to distortions?
  • noise—since the algorithm measures the system's noise floor first, it will adapt itself to any value of it, except if it's too high and does not provide a sufficient dynamic range for the test signal;
  • DC offset—negative DC offset shifts the sinewave down, decreasing the values of the positive cycles of the sinewave, and it's possible that only the negative cycles will reach the detection threshold (see the illustration below). This can be worked around by ensuring that half cycle of the test pulse (instead of a quarter) fits into the required resolution interval, by doubling the frequency of the test pulse;
  • echoes—do not cause any problems, unless they manage to arrive before the test pulse;
  • non-uniform group delay—it's a generic problem for any algorithm that uses a single frequency signal for latency detection. I guess, the transfer function needs to be measured before commencing latency testing;
  • signal truncation—if the system "ramps up" the test signal, the algorithm will find the triggering threshold later, reporting an excessive latency.

In fact, the last issue is a serious one. When doing manual testing, I always check the returned pulse visually, but the algorithm is "blind" to signal being ramped up. And ramping up can actually happen in mobile devices, where sophisticated processing is used for power saving and speaker protection purposes. Note that the algorithm can't use a "warm up" signal to put the system under measurement into a steady state because the warm up signal could be mistaken for the test pulse.

So, although a straightforward time domain approach has its strengths, it can be fooled, and a manual check of the results is required anyway.

I'm going to consider the methods used by other apps in the following posts.

Monday, July 23, 2018

Recreating miniDSP filters with Acourate

I'm getting ready to build a second pair of Linkwitz LXmini—this time for rear channels. The original design of LXminis uses miniDSP processors for implementing the crossover and speaker linearization. I use a miniDSP 2x4 HD for the first pair of LXminis, but I decided I don't want to buy a second one. The reason is that 2x4 HD has unbalanced line out connections, but for the rear speakers I would like to put the amplifier further away and would prefer to use balanced lines between the DSP and the power amplifier.

There are some balanced miniDSP units: 2x4 Bal, 4x10 HD, and 10x10 HD, but their form factors do not fit into my half-rack stack. So I decided to go another way—build a dedicated mini PC to run Acourate Convolver via my MOTU UltraLite AVB card. Another reason for choosing Acourate over miniDSP is that the former offers practically unlimited abilities to build filters, because it's all software.

Thus, my first task was to re-create LXmini's DSP crossovers and filters using Acourate. For starters, I decided to follow the original design of the filters as closely as possible (which means replicating their phase in addition to amplitude). The end result that I want to achieve is doing all the necessary speaker processing: crossovers, time alignment, speaker linearization, and room correction in one unit: the software DSP. Thus my Oppo BD unit would be only left with the tasks of decoding Dolby and DTS streams, and upmixing stereo into multichannel.

miniDSP 2x4 HD

Let's briefly describe the capabilities and structure of the miniDSP unit. It has a stereo (2 channel) input (switchable between analog, TOSLink, and USB), and 4 channels of analog output. Here is how the processing and routing chain is organized:

When connected to USB, besides 2 output channels the unit also offers 4 input channels that allow capturing processed audio data. This is in fact a very useful feature for our task.

The DSP in "HD" products operates at 96 kHz sampling rate. If digital input arrives at a different rate, it automatically gets resampled. The DSP implements 10 biquad IIR filters per input channel, then 18 biquads for EQ and crossovers per output channel. It also allows a total of 4096 taps of FIR filtering to be arbitrarily distributed over all 4 output channels (with a limitation that a single channel can't have more than 2048 taps).

That means, the processing in this miniDSP has low latency (due to low number of taps), but minimum phase and thus non-constant group delay. The FIR filter section has limited applicability due to short filter length, which gives relatively low resolution in frequency domain. This fact makes me think that miniDSP is optimized for Audio-Video applications where low latency is required, and the quality of the filters can be sacrificed, because when watching movies we normally pay more attention to the picture than to the sound.

miniDSP units are configured using specialized software called "plugins". They can work even without a board connected, which makes them very useful for studying a provided processing configuration: it's more convenient than trying to decipher the contents of config files manually.

Acourate and Acourate Convolver

"Acourate" is a family of products developed by Audio-Vero company (which as I understand consists of one man—Dr. Ulrich Brüggemann). Acourate is a filter creation tool which also has macro procedures for developing room correction filters. Then there are several variants of software that applies the filters created. Acourate Convolver is designed as a real-time audio processor for Windows, using ASIO interface for low latency access to sound card. Thus, running Convolver on a Windows PC with a good multichannel soundcard effectively turns it into a custom-built DSP box.

Acourate was created with critical listening in mind, so it allows creating linear phase FIR filters with a large number of taps. However, it's also possible to cut filters to a desired length, trading filter quality for lower latency. As Convolver supports several configurations, you can have separate setups for A/V and audio-only scenarios. It's definitely more flexible than hardware-backed miniDSP boxes. Also, by choosing appropriate PC hardware and the soundcard, the software DSP box can be scaled to the required number of audio channels. And they can grow quickly in number when the active crossover approach (as in Linkwitz speakers) is employed for creating a surround sound setup.

The Method for Filter Re-creation

In a nutshell, there are two approaches for re-creating an existing filter with Acourate. If the filter is already implemented in software or hardware, you can measure it with Acourate or ARTA (or any other compatible analyzer), and proceed based on the measurement results. However, there are some caveats. First, even if the filter can be captured fully in digital domain, there is still a possibility of noisy behavior, especially at high frequencies. Thus, some smoothing will be required.

Second, since the filter has some delay, it will manifest itself as phase shift in the measurement. It's easy to understand that by looking at the picture below:


Here we have got two sine waves of the same frequency, but the blue one is lagging behind the red one. If we capture a piece of each wave at the same moment in time and take a Fourier transform (this is what analyzers do), the frequency response will come out the same, but the phase components will be shifted relative to one another. That means, in order to obtain an exact phase response of the system being measured, we will have to compensate for the processing time delay by shifting the phase back.
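To put numbers on this, here is a minimal sketch of the relation between a pure time delay and the phase shift it produces, and of shifting the phase back. It assumes the delay is already known (for example, from a latency measurement); the function names are made up for this illustration and have nothing to do with Acourate's own functions.

```python
def delay_phase_deg(freq_hz, delay_s):
    """Phase lag in degrees (negative) that a pure delay adds at freq_hz."""
    return -360.0 * freq_hz * delay_s

def compensate(phase_deg, freq_hz, delay_s):
    """Remove the delay-induced lag from a measured phase, wrapped to [-180, 180)."""
    p = phase_deg - delay_phase_deg(freq_hz, delay_s)
    return (p + 180.0) % 360.0 - 180.0

# A 0.25 ms delay is a quarter of a period at 1 kHz, that is, a 90 degree lag.
print(delay_phase_deg(1000.0, 0.00025))
# A filter with a true phase of -30 degrees would be measured at about -120.
print(compensate(-120.0, 1000.0, 0.00025))
```

Note that the lag grows linearly with frequency, so an uncompensated delay shows up on the phase graph as a slope that gets steeper towards high frequencies.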

The second approach is to use pure math and re-create the filter from its parameters using Acourate as an editor. This way it's possible to obtain the filter with exactly the same amplitude and phase characteristics. Also, it will be more precise than a captured one because Acourate calculates in 64-bit (double precision) floating point, whereas the capture will be in 32-bit (single precision) floating point at best. However, it will still help to capture the existing filter in order to verify the analytically obtained one against it, see the "Verification" section below.

Re-creating a Biquad

The LXmini configuration for miniDSP only uses biquad IIR filters. Thus, it's crucial to understand how to re-create them with Acourate. For EQ filters, there are two ways. One is to use the filter parameters: type (shelving or peak), frequency, gain, and Q. They are displayed by the miniDSP plugin app in Basic mode:


And we can enter the same parameters into Acourate's Generate > IIR-Filter dialog and then press Calculate:

And we get the same filter:

Sometimes the definition of the "Q" parameter doesn't match between different DSP vendors, but luckily miniDSP and Acourate use the same definition. What's also convenient about this approach is that it doesn't depend on the sampling rates used. So we can use any target sampling rate in Acourate, and the filter will still affect the same frequency.

There is also another way for re-creating a biquad filter—use the filter coefficients directly. In miniDSP plugin, they are displayed in Advanced mode in a text box:


The box is quite small and doesn't fit all the parameters on this picture. There are 5 of them: b0, b1, b2, a1, and a2. They define the filter completely, but in normalized radian frequency range: from 0 to π. The actual angle depends on the sampling rate used. So for example, at 96000 Hz sampling rate the frequency of 960 Hz is π / 50, but it becomes π / 25 at 48 kHz. That's why when re-creating filters using biquad coefficients the sampling rate at the source and the target must match. Since miniDSP HD uses 96 kHz sampling rate, the same rate must be set for the project in Acourate.

Another thing that needs to be taken care of is the sign of a1 and a2 coefficients. Acourate and miniDSP use different conventions, and thus the signs of a1 and a2 coefficients taken from miniDSP must be negated when being entered into Acourate's dialog box. Acourate also asks for a0 parameter which must always be set to 1:

Assuming that sampling rates match between the miniDSP plugin and Acourate, this should create the same filter.
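As a sanity check outside of both programs, the coefficients can also be evaluated directly. The sketch below assumes that the sign-negated form is the textbook transfer function H(z) = (b0 + b1 z^-1 + b2 z^-2) / (a0 + a1 z^-1 + a2 z^-2); the helper names are hypothetical, and the coefficient values in the example are made up rather than taken from the LXmini configuration.

```python
import cmath
import math

def to_standard_form(b0, b1, b2, a1, a2):
    """Negate a1 and a2 and prepend a0 = 1, per the sign convention described above."""
    return (b0, b1, b2), (1.0, -a1, -a2)

def response_db(b, a, freq_hz, fs_hz=96000.0):
    """Magnitude in dB of H(z) evaluated on the unit circle at freq_hz."""
    z1 = cmath.exp(-1j * 2.0 * math.pi * freq_hz / fs_hz)  # this is z^-1
    num = b[0] + b[1] * z1 + b[2] * z1 * z1
    den = a[0] + a[1] * z1 + a[2] * z1 * z1
    # Clamp to avoid log10(0) exactly at the bottom of a notch.
    return 20.0 * math.log10(max(abs(num / den), 1e-300))

# Made-up coefficients in the "feedback terms are added" convention.
b, a = to_standard_form(1.0, 0.0, 0.0, 0.5, 0.0)
print(response_db(b, a, 0.0))      # gain at DC
print(response_db(b, a, 48000.0))  # gain at Nyquist for 96 kHz
```

Evaluating the same coefficients with a different `fs_hz` moves every feature of the curve to a different frequency, which is the reason the sampling rates have to match.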

The second way seems to be more involved and requires great care. Why use it at all? I would use it when for some reason Acourate does not produce the desired filter from a high level definition.

Joining Filters

Now we know how to re-create each EQ filter. The next step is joining them. miniDSP does this automatically. E.g. if we define two EQ filters, the resulting graph will show the result of applying both of them. Here I've added a second EQ notch filter to the previous one:


In Acourate, after re-creating this filter in another curve, we need to apply an operation of convolution (TD-Functions > Convolution) and save the result either in a third curve or overwrite one of the previous curves:

The result is the same curve as we had with miniDSP:

Do not confuse convolution with addition, however. Addition of filters happens when they run in parallel and then their results get summed. This is different from running filters in sequence. Sometimes adding filters may produce a result that looks similar to convolution, but it's in fact not the same.

If all ten EQ filters are engaged in miniDSP, the process of recreating them with Acourate might get tedious: we will need to perform the convolution operation 9 times. It's better to save each individual EQ curve in case a mistake has been made while generating it. Note that convolution is a commutative operation, thus the order in which the convolutions are made doesn't matter.
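For those who prefer to see it in code, below is a minimal sketch of joining filters by convolving their impulse responses. It uses plain Python lists and a direct (slow) convolution, which is fine for illustration but not how a real convolver works; the two short impulse responses are made up.

```python
def convolve(x, h):
    """Full linear convolution of two impulse responses given as lists."""
    out = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            out[i + j] += xi * hj
    return out

def cascade(filters):
    """Join any number of filters running in sequence into a single one."""
    result = [1.0]  # a unit impulse leaves the signal unchanged
    for h in filters:
        result = convolve(result, h)
    return result

eq1 = [1.0, 0.5]
eq2 = [1.0, -0.25, 0.125]
print(convolve(eq1, eq2))
print(convolve(eq2, eq1))  # same result: the order does not matter
print([p + q for p, q in zip(eq1 + [0.0], eq2)])  # addition gives something else
```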

Crossovers

Here we have a difference between the capabilities of miniDSP and Acourate. miniDSP HD plugin offers the following types of crossovers:
Acourate has all of these plus Neville-Thiele and Horbach-Keele crossovers. It can also generate them either with minimum phase (as in miniDSP) or with linear phase, see Generate > Crossover menu.

Besides using the Crossover dialog, it's also possible to use an alternative approach for entering biquad coefficients directly and then convolving intermediate curves. In miniDSP, a crossover can consist of up to 8 biquads, and their coefficients are listed on the Advanced tab of the plugin's Xover dialog. Remember that in this case the project sampling rate in Acourate must match the sampling rate of miniDSP HD: 96 kHz.

Combining It All Together

After input and output EQ filters and the crossover filter have been created they need to be joined using the convolution operation. Again, the order in which the convolutions are performed doesn't matter.

Note that it's not possible to re-create a compressor using FIR or IIR filters because its behavior is amplitude-dependent. However, at least for Linkwitz speakers the compressor is not used.

Polarity (Phase), Delay, Gain

In miniDSP, any output channel can be delayed, attenuated, and have its polarity inverted. If Acourate Convolver is used for processing, these settings can be set in it directly:

However, it's also possible to use Acourate in order to modify the filter:

  • Gain: use TD-Functions > Gain;
  • Polarity: use TD-Functions > Change Polarity;
  • Delay: use TD-Functions > Rotation or Leading/Trailing Zeros. The difference between them is that Rotation preserves filter length, but the filter must have enough zero samples at the end prior to this operation.
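What these operations do to the filter's impulse response can be sketched as follows. This is my understanding of the operations expressed on plain Python lists, not Acourate's implementation; in particular, I'm assuming that Rotation is a circular shift, which matches the requirement of having zero samples at the end.

```python
def apply_gain_db(h, gain_db):
    """Scale every tap by the linear factor that corresponds to gain_db."""
    g = 10.0 ** (gain_db / 20.0)
    return [g * s for s in h]

def invert_polarity(h):
    """Flip the sign of every tap."""
    return [-s for s in h]

def delay_with_zeros(h, n):
    """Delay by n samples by prepending zeros; the filter grows by n taps."""
    return [0.0] * n + list(h)

def delay_by_rotation(h, n):
    """Delay by n samples keeping the length; the last n taps must be zero."""
    if not 0 <= n <= len(h):
        raise ValueError("n must be between 0 and the filter length")
    tail = h[len(h) - n:]
    if any(s != 0.0 for s in tail):
        raise ValueError("not enough zero samples at the end of the filter")
    return list(tail) + list(h[:len(h) - n])

h = [1.0, 0.5, 0.0, 0.0]
print(delay_by_rotation(h, 2))  # same length, impulse moved right
print(delay_with_zeros(h, 2))   # two taps longer
```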

Cutting Filter Length

By default, Acourate generates very long FIR filters, typically consisting of 131072 taps. They create a noticeable delay: e.g. for 96 kHz sampling rate it will be 1.365 seconds. It's OK for audio only applications: who cares if the play / pause button does not react immediately. But for audio-video that's a lot: imagine having a delay of more than 1 second between an actor opening their mouth on the screen and us actually hearing their voice.

Thus, for A/V scenarios we need to cut the filters to usable length. Depending on what other processing stages (e.g. surround decoding) are in the chain, the time "budget" for filtering can be from 20 to 60 milliseconds before the audio delay becomes noticeable. For 96 kHz processing sampling rate, this translates into FIR filter length of 2048 or 4096 taps. More taps is better because this increases filter frequency resolution. The resolution of a 2048 taps minimum phase FIR filter at 96 kHz is ~47 Hz, and for a 4096 taps filter it's twice as fine: about 23.5 Hz. The resolution is especially important for bass equalization, where spacing between notes is only 1–2 Hz!
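The figures above follow from two simple ratios, shown in the sketch below. It treats the whole filter length as the delay, the same way the estimates in this section do, and the function names are only for illustration.

```python
def filter_delay_ms(taps, fs_hz):
    """Duration of a FIR filter of the given length, in milliseconds."""
    return 1000.0 * taps / fs_hz

def frequency_resolution_hz(taps, fs_hz):
    """Spacing between the frequency points a filter of this length can resolve."""
    return fs_hz / taps

for taps in (131072, 4096, 2048):
    print(taps, filter_delay_ms(taps, 96000), frequency_resolution_hz(taps, 96000))
```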

Acourate has TD-Functions > Cut'N Window function for cutting filters to length. Cutting is some sort of an engineering black art, because the result depends on the interaction between the filter and the windowing function being used for cutting. By default, Acourate uses "Blackman Optimal" window when cutting. In order to use any other function, it is possible to cut first without any windowing, and then apply the desired window via TD-Functions > Windows... dialog.
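To illustrate the idea of cutting with a window, here is a minimal sketch that keeps the first taps of a filter and fades them out with the right half of a classic Blackman window. This is not Acourate's "Blackman Optimal" window, whose definition I don't know, and the approach only makes sense for filters whose main peak is at the very beginning (as with minimum phase filters). The function names are made up.

```python
import math

def blackman_tail(n):
    """Right half of a Blackman window: 1.0 at index 0, falling to about 0 at n - 1."""
    if n == 1:
        return [1.0]
    out = []
    for i in range(n):
        x = math.pi * i / (n - 1)  # goes from 0 to pi across the kept taps
        out.append(0.42 + 0.5 * math.cos(x) + 0.08 * math.cos(2.0 * x))
    return out

def cut_and_window(h, taps):
    """Keep the first `taps` samples of h and fade them out with the half window."""
    if not 1 <= taps <= len(h):
        raise ValueError("taps must be between 1 and the filter length")
    return [s * w for s, w in zip(h[:taps], blackman_tail(taps))]
```

The window brings the last kept tap down to practically zero, so the cut filter does not end with an abrupt step.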


I've noticed that for filters having bass equalization, it may be helpful before cutting to move the impulse start a bit to the right using TD-Functions > Leading/Trailing Zeroes function. But remember that this introduces a delay which also needs to be added to other channels.

Verification

After re-creating a miniDSP configuration in Acourate we need to verify that our filter indeed replicates the original. This can be done in a lot of ways. We can choose to only use Acourate, and in that case what we need to do is to analyze the transfer function of the miniDSP configuration. As I've mentioned in the beginning, miniDSP also has USB inputs that are in fact returns of the processed signals. So we can open Acourate's LogSweep > LogSweep Recorder, choose the ASIO driver for miniDSP, specify input and output channels, and also make sure there are no fade-ins and fade-outs, and no peak optimization in the test signal (they are not needed for digital measurements):


Alternatively, we can also use other analyzer programs like FuzzMeasure or RoomEQ Wizard. Both allow analyzing a measurement recorded "offline"—outside of the app. So we can save the measurement log sweep, use Acourate's FIR-Functions > WAV Player in order to process the log sweep with the filter, and load the result back into FM or REW for analysis and comparison with the signal recorded from miniDSP.

Finally, we can use Acourate Convolver looped back through a sound card that has routing controls and check the filters using any analyzer app, even with those that don't offer offline processing, like ARTA. This approach is useful if we do final adjustments to filter's gain and polarity in Acourate Convolver.

When comparing filter phases, depending on the analyzer app it might be needed to calculate minimum phase first, otherwise it will not look like the actual phase of the filter. In Acourate this can be achieved using TD-Functions > Phase Extraction dialog. Also note that due to the processing delay, the phase may appear shifted (recall the sine waves picture at the beginning of the section).

Conclusion

There are several reasons for going with a fully software DSP solution. I certainly like modularity of this approach—you choose the form factor for the PC, and a soundcard with required number of channels and desired quality for DACs. Then you can have different configurations for audio only and AV scenarios, free from any limits of the hardware, and only being constrained by actual physical limit of the filters' time delay.

Also, what I've scooped up in this post is just the tip of what Acourate can do. I will certainly examine linear phase crossovers and room correction soon. One thing I'm missing in Acourate Convolver is IIR filters which could help with achieving required processing latency. However, I do have them on MOTU UltraLite AVB card, so it's not a problem.

Saturday, July 7, 2018

My Setup for Headphone Listening, Part 2

Continuing the topic of my desktop setup for headphone listening, let's recap what we had covered in Part 1. We have set up a transparent hardware chain at moderate cost, and decided to make all the necessary adjustments on the software side using DSP plugins. In order to route audio from any player program via the DSP processing chain, on Mac we use Audio Hijack, and on Windows—a combination of virtual loopback cables and a plugin host program. I'm not covering Linux and mobile platforms here, sorry.

The Processing Chain

I don't believe in the existence of a perfect playback chain that would suit all commercial recordings. Even if the chain itself is transparent, the combination of recording's frequency balance and the headphone's frequency curve may not suit your taste. Also, due to non-linearity of human hearing, even changing playback volume affects perceived tonality. So clearly, an ability to tweak tonal balance is required.

Also, when using closed headphones the reproduction sounds unnatural due to super-stereo effect—each ear can hear its own channel only. This is especially noticeable on recordings that employ some form of "spatial" processing intended for speakers.

So our goals are pretty clear: being able to easily adjust levels of high and low frequencies, and have a crossfeed. In addition, we can try adding some psycho-acoustic enhancement by injecting 2nd or 3rd order harmonics (this is roughly equivalent to using a tube amplifier). Previously, I was also enthusiastic about the idea of headphone frequency response normalization. Now I'm less excited, and I will explain why later. But if headphones used are known to have some particular tonal issue, like the 6 kHz bump of Sennheiser HD800, adding a "normalizing" plugin could be a good idea.

So here is a conceptual diagram of the DSP chain I use:
First comes a simple 2- or 3-band equalizer employing Baxandall curves. I find these to be more pleasant sounding than typical shelving filters of multi-band parametric equalizers.

The next block adds harmonic distortions. It helps to liven up some recordings if they sound too dry and lack "dimension". I think, in small controlled quantities harmonics sometimes can help. However, I prefer to add them with a DSP plugin rather than with an amplifier.

Then comes a crossfeed plugin. An alternative is to use the crossfeed feature of the headphone amplifier or DAC, if it has one. But using a plugin allows having crossfeed on any DAC / amp, so it's more versatile. Also, if crossfeed is implemented as a plugin, it's possible to add a headphone "normalization" plugin after it. I think that having crossfeed after normalization defeats the purpose of the latter since crossfeed will most likely change the carefully tuned frequency response.

I run my chain at 96 kHz, even when the source material is at 44.1 kHz. Use of higher sampling rates is common in the music production world, as they allow using smoother antialiasing filters during processing, and also help reduce the quantization noise. Going up to 192 kHz or higher will consume more CPU resources, and considering the modest amount of effects used, I don't think it's really needed.

At first, I was hesitating a bit whether I should use an integer multiple of the source sampling rate, that is, 88.2 kHz instead of 96 kHz, but then I realized that converting from 44.1 kHz to 48 kHz can be expressed conceptually as first upsampling with a multiplier of 160, and then downsampling by 147, both being integer multipliers (44100 * 160 / 147 = 48000). Also, a lot of good DAC units have an upsampling DSP processor connected before the DAC chip, for upsampling to 192 kHz (TEAC UD-x01), 384 kHz (Cambridge Audio DacMagic Plus and Azur), 768 kHz (!) (Pro-ject DacBox DS2 Ultra), or even an "odd" value of 110 kHz (Benchmark DAC1). So the DAC chip would never "know" what was the track's original sampling rate.
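The pair of integer factors for any two sampling rates can be found by reducing their ratio to lowest terms. Below is a small sketch using Python's standard `fractions` module; the function name is mine.

```python
from fractions import Fraction

def resample_factors(src_hz, dst_hz):
    """Return (upsample, downsample) integer factors for going from src_hz to dst_hz."""
    ratio = Fraction(dst_hz, src_hz)  # reduced to lowest terms automatically
    return ratio.numerator, ratio.denominator

print(resample_factors(44100, 48000))
print(resample_factors(44100, 96000))
print(resample_factors(44100, 88200))
```

Real resamplers do not necessarily work by literally upsampling and then downsampling, so this is only the conceptual view used in the paragraph above.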

Thus, there is no reason to worry about going from 44.1 kHz to 96 kHz on the processing chain side, as modern software resamplers should be transparent. This is assuming that the input signal doesn't have intersample peaks. And we took care of this by lowering the digital volume of the player (see Part 1), giving some headroom to the audio signal before it gets upsampled.

Measurements

DSP plugins still need to be measured, even though their effects are usually better documented than those of hardware units. Why? Because there can be surprises or discoveries, as we will see. Also, some plugins for some reason have uncalibrated sliders, labelled in a very generic fashion like "0..10", and it's not clear what changes each step introduces. So unless you have a very trained ear, it's better to measure first.

And the audio transport channels that we use, despite being fully digital and thus supposedly "bit-perfect" still can introduce distortions, or cause losses in audio resolution. This is an imperfect world, and we need to be prepared.

The Empty Chain

As an example, let's measure an empty processing chain consisting of Windows 10 Pro (Build 17134.rs4_release.180410-1804), Hi-Fi Virtual Cable (from player to the effects host), VB-Audio Cable (from the effects host to the analyzer input), and DDMF EffectRack (64-bit version). The virtual stream on the EffectRack uses "Windows Audio Exclusive mode".

One problem that I've noticed is that using a test sine signal at 0 dBFS causes distortion:
But after lowering the input signal level by just 0.1 dB it's gone. I've double checked that the test signal is not overshooting 0 dBFS. I've also checked with Bitter plugin that the signal is not clipping:
Given that a lot of modern recordings have peaks normalized to 0 dBFS, the advice I gave in Part 1 about lowering the digital volume on the player by at least 3.5 dB seems especially useful in this case.

I'm not sure where this distortion is happening—it could be anywhere in the kernel audio transport, in virtual cables, or in EffectRack. However, I've also tried this experiment with DDMF virtual streams, and with another effects host: PedalBoard 2, and the result was the same, so I'm suspecting Windows audio chain. But I must note that 0 dBFS sine plays fine via lots of sound cards' physical loopback, thus most likely it's a combination of Windows and virtual cable drivers that causes this behavior.

The lesson from this measurement is that I should not use a 0 dBFS test signal when testing the processing chain.

Another curious thing I've found is that with some virtual cables EffectRack causes distortion when there are no effects in the chain (stream's audio input is directly shorted to audio output), but the distortion is gone as soon as I insert a processing plugin, even if it does nothing to the audio stream (all processing knobs are at zero position).

By the way, using Bitter plugin is also helpful for verifying the actual bit resolution of the processing chain. As we have seen on the screenshot above, on Windows I do actually have 24-bit resolution. It's interesting that on Mac with Audio Hijack the resolution seems to be even better, 32 bits:

The Equalizer

There is no shortage of equalizer plugins. My long time favorite was basiQ by Kuassa because it's free, simple to use, and it implements a good old 3-band Baxandall equalizer. Typically I used it in very moderate amounts, never going more than 5 or 6 steps from the zero settings. This is how the equalization curves look:

Note that even in "all zeroes" setting the frequency response isn't entirely flat. I'm not sure if it's intentional, but it's better to be aware of this (that's why we measure!) Also note that the amount of correction resulting from the same number of knob steps is not the same for low, mid, and high frequencies. I think, this is to account for the fact that human ear is less sensitive to changes in bass frequencies.

Another important thing is that when boosting any frequencies, the resulting increase in the sound power must be compensated by decreasing the output level of the plugin (the small knob at the bottom), otherwise clipping may occur on loud music passages.

But I've said that basiQ used to be my favorite plugin. I've migrated to Tone Control by GoodHertz, and this is why. Although it only has 2 bands (vs 3 on basiQ), Tone Control has an interesting option of using a linear phase filter. That means constant group delay: all frequencies are delayed by the same amount, so no frequency group gets delayed relative to the others. This is important because some musical sounds (e.g. a hi-hat crash) are wide band signals, and delaying some parts of such a signal "smears" the sound in the time domain, which according to some theories affects its localization by the brain.
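To illustrate what linear phase buys, here is a small sketch that computes the group delay of two short FIR filters in plain Python. It assumes a symmetric FIR, which is one common way to obtain linear phase; the tap values are made up for the example and have nothing to do with how Tone Control is actually implemented.

```python
import cmath
import math

def group_delay(taps, omega):
    """Group delay in samples at normalized angular frequency omega (rad/sample).

    Uses the identity: delay = Re( sum(k * h[k] * e^(-j*omega*k)) / H(omega) ).
    """
    h = sum(t * cmath.exp(-1j * omega * k) for k, t in enumerate(taps))
    kh = sum(k * t * cmath.exp(-1j * omega * k) for k, t in enumerate(taps))
    return (kh / h).real

# Hypothetical tap sets, chosen only for illustration.
symmetric_taps = [0.1, 0.2, 0.4, 0.2, 0.1]      # symmetric, hence linear phase
asymmetric_taps = [0.4, 0.3, 0.15, 0.1, 0.05]   # not symmetric

freqs = [0.1 * math.pi, 0.3 * math.pi, 0.5 * math.pi]
linear_delays = [group_delay(symmetric_taps, w) for w in freqs]
other_delays = [group_delay(asymmetric_taps, w) for w in freqs]

print("symmetric: ", [round(d, 3) for d in linear_delays])
print("asymmetric:", [round(d, 3) for d in other_delays])
```

The symmetric filter delays every frequency by the same number of samples (half its length minus one half), while the asymmetric one delays different frequencies by different amounts.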

Tone Control isn't free, and actually it's quite expensive for a 2-band equalizer ($95!). However, using it I could easily replicate my setups in basiQ, and Tone Control can create web shortcuts for them to use anywhere. Before that I tried DDMF LP10 equalizer, which also offers linear phase filters, but replicating delicate tone curves of basiQ with it was very hard, so I decided to pay a bit more for Tone Control.

Harmonics Enhancement

I decided to experiment with adding harmonics after reading this post by Bob Katz. I've found Fielding DSP Reviver plugin, which costs just $29. I measured it in order to calibrate the scales of its controls—they just go from "0" to "100", and also to verify that they don't have aliasing problems.

After measuring the levels of THD of Reviver, I decided never to go higher than the "5" mark for both 2nd and 3rd harmonics. For reference, the "1" mark adds the 2nd harmonic at ~0.2% THD (about -55 dBFS), and for "5" it's a bit higher than 1% THD (about -40 dBFS). For the 3rd harmonic the figures are somewhat lower, so when both sliders are at "5", this creates a natural harmonics picture:

(I've put the cursor over the 2nd harmonic to show that it's at -39.95 dBFS, while the 3rd as we can see is lower than -40 dBFS.)
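The relation between a THD percentage and a harmonic level in dB is simple arithmetic, sketched below. The helper names are mine, and the result is in dB relative to the fundamental, so it only equals dBFS under the assumption that the fundamental itself sits at 0 dBFS.

```python
import math

def thd_percent_to_db(thd_percent):
    """Level of the distortion product relative to the fundamental, in dB."""
    return 20.0 * math.log10(thd_percent / 100.0)

def db_to_thd_percent(level_db):
    """Inverse conversion: relative level in dB to a percentage."""
    return 100.0 * 10.0 ** (level_db / 20.0)

for percent in (0.2, 1.0):
    print(f"{percent}% -> {thd_percent_to_db(percent):.1f} dB")

# The cursor reading quoted above, converted back to a percentage.
print(f"-39.95 dB -> {db_to_thd_percent(-39.95):.3f}%")
```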

Turning on "Serial" mode also adds 4th and 5th harmonics:

Subjectively, adding harmonics may add "dimension" to the sound and make it a bit "fatter". It in fact helps some recordings from the 70s and 80s to sound better. My hypothesis is that tube amplifiers were used in the studio during their production, so they sounded "rich" there. But when played via a transparent solid state chain on headphones, they sound more "bleak" and "flat". So adding back some distortion helps.

However, I would not recommend abusing the harmonics plugin because, unfortunately, adding those "euphonic" harmonic distortions also brings in unpleasant non-harmonic ones. Dr. Uli Brüggemann of AudioVero explains the reasons in his article. And indeed, if we look at IMD SMPTE and CCIF measurements for Reviver, the level of SMPTE-measured distortions is quite high. So use it with caution: keeping it turned on all the time defeats the purpose of having a transparent reproduction chain. The effect of non-linear distortions can also be seen on the frequency response graph, which becomes noticeably "fuzzier":


Crossfeed and Headphone Normalization

I covered both Redline Monitor and a couple of headphone normalization plugins in my earlier posts. For headphone normalization I would also prefer a plugin that has "linear phase" mode.

Now I would like to explain why I'm actually not currently using headphone normalization. From my experience, normalization indeed makes the headphones sound different from their original tuning, which can be exciting at first. But is it really a setup that you would want to use all the time? I doubt that. I actually have doubts that normalization can serve as a "reference", here is why.

There are several factors that can affect the headphone normalization process: first, the same model of headphones isn't necessarily consistent from instance to instance, and besides that, pad wear can affect bass response. OK, some companies offer measuring your headphones, and imagine we have done that. Then the second factor comes in—your head. The dummy heads used in measurement use statistical averages for head, ear pinna, and ear canal dimensions. But they are obviously not the same as your head, and this will affect the shape of the frequency response at the ear drum (see this interesting thesis work for details). And finally, the target response is not set in stone. There are several versions of Harman target curve, diffuse field curve, and your actual room curve.

So, there are just a lot of variables in the normalization process. There is a solution that takes them all into account, the Smyth Realizer, but it's too expensive for ordinary folks. Thus, since we are not interested in music production, but only in pleasantly sounding reproduction, I've found that simply using tone controls delivers a desired sound with much less effort.

Conclusion

For me, using a simple DSP processing chain and a transparent reproduction chain has become a flexible and not too expensive way to enjoy my music in headphones. This setup offers endless ways to experiment with tonalities, "warmth", and soundstage perception while staying with the same hardware.

Wednesday, June 27, 2018

My Setup for Headphone Listening, Part 1

I listen to music on headphones a lot. This is my retreat from distracting noise that often surrounds me at work and at home. I don't normally use portable audio players or a mobile phone, instead I have what is called a "desktop" system: a computer, a desktop DAC, a desktop headphone amp, and closed over-ear headphones. At home, I also use a couple of pairs of open over-ears when it's quiet around.

When listening on headphones, I can notice more issues with the reproduction chain and in the recording, compared to listening on speakers. That's why I pay a lot more attention to details for the headphone setup. My goal here is to be able to relax and enjoy the music on headphones the same way I can enjoy it on speakers.

Hardware


The hardware part of the chain consists of three components: USB DAC, headphone amplifier, and headphones. The criterion for choosing them is easy to formulate: be as transparent as possible. That means adding as little distortion and coloration as possible, and having precise inter-channel balance and low crosstalk levels. I tend to avoid doing any sound processing in the analog domain, relying on DSP plugins running on the computer instead.

DAC

As far as electronic components are concerned, it's quite easy to fulfill the transparency requirements. Any modern DAC with the price starting from $200 does the job. Here are some not very expensive DACs that I'm familiar with.

Cambridge Audio DacMagic series. The entry level models, DacMagic 100 and DacMagic Plus, used to be expensive at the time of their introduction, but have now become cheaper because they don't handle DSD and MQA. So for people not interested in those formats these DACs now represent a good deal, especially DacMagic Plus with its internal operating sampling rate of 384 kHz and selectable output filters. Ken Rockwell has published a very thorough review of this unit. Note that due to the high impedance of the headphone output (50 Ohm), DacMagic Plus should not be used as a headphone amplifier, but rather only considered as a DAC.


E-MU 0404. This legendary external sound card of the past is now a bargain because it's not USB Audio Class compatible, and E-MU / Creative Labs have abandoned updating drivers for it, so it's not usable on modern OS versions. However, it has an SPDIF input, so it can be used as an SPDIF DAC driven by a USB Audio Class compliant pro audio card, or the optical output of the computer. For example, I connect it to my MOTU Microbook IIc which has a coaxial SPDIF output. 0404 only supports sampling rates up to 96 kHz over SPDIF. The other caveat is that an instance of an old OS (e.g. WinXP running in a virtual machine) is still needed in order to set up the sampling rate of this card.


JDS Labs EL DAC. Haven't tried it personally, but the price fits the budget. JDS Labs generally follow the principle of designing transparent equipment with good objective characteristics. The measurements are published here.

TEAC UD-301. A cheaper option than the UD-5xx series. The UD-501 model was measured by Archimago and looks really solid. UD-301 uses the same DAC chip, but doesn't have the option for selecting the type of the output filter.


Headphone Amplifier

Decent transparent headphone amplifiers are not hard to find either, as we can see from my previous post on measurements of AMB M3 and SPL Phonitor Mini. I also use the desktop version of Objective2 headphone amplifier.


Headphones

Headphones are more tricky, as there are no clear objective criteria for how headphones should sound. I'm aware of the Harman target equalization curve, but first, it's still under development, and second, not every headphone manufacturer follows it. Anyway, the frequency balance of the headphones can be corrected in the software chain, so the main requirement is low distortion levels. Personally, I've settled on Shure SRH1540. I was also enjoying Beyerdynamic T5p until they broke. Both of those are closed over-ear headphones.

I've got some open over-ears as well: Beyerdynamic T90, Massdrop Sennheiser HD6xx, and AKG K240. These are all different sounding, with K240 being the most uncolored but also adding the most distortions, T90 sounding the most "airy", and adding extra high frequencies, with HD6xx being somewhere in the middle.

To summarize, I strive to have the reproduction chain as transparent as possible. I do not use tube amplifiers, for example. I know they can sound nice, but the distortions they add can't be taken out if needed. On the other hand, if the chain is transparent, it's easy to add any tweaks and "euphonic" distortions at the prior stage—on the computer.

Software


The Player

The software chain starts with the music player. I'm not very picky about them. My primary sources are Google Play Music for streamed content, and Foobar2000 or VLC for grabbed CDs and high resolution (24/96) files.

The only thing I need to tweak in the player is to set its output level so it has headroom for intersample peaks. As was demonstrated in that post, a digital sound file can contain encoded sound waves that, when converted to analog, exceed the nominal level of 0 dBFS. Thus, having slightly more than 3 dB of headroom is recommended. For the Play Music player this means setting the volume control two steps below the maximum output volume:


This provides 6 dB of attenuation (one step attenuates by 3 dB), which is more than enough.

For VLC I settled on 82% of output volume (about 4 dB of attenuation), and for Foobar2000, setting the volume control to -3.5 dB provides the necessary headroom. This is a very important step, as any further sound processing step could result in clipping, and distortions caused by clipping can't be removed afterwards.
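Where the "slightly more than 3 dB" figure comes from can be shown with a textbook worst case: a sine at a quarter of the sampling rate whose samples all fall 45 degrees away from the waveform's crests. The sketch below is only an illustration of that case in plain Python; the 44.1 kHz rate and the variable names are assumptions, and real music rarely hits this exact condition.

```python
import math

SAMPLE_RATE = 44100          # assumed sampling rate
FREQ = SAMPLE_RATE / 4.0     # sine at fs/4
PHASE = math.pi / 4.0        # samples land 45 degrees away from the crests

def waveform(t, gain):
    """The underlying continuous (band-limited) sine."""
    return gain * math.sin(2.0 * math.pi * FREQ * t + PHASE)

# Normalize so that the largest *sample* sits exactly at full scale (1.0).
raw = [waveform(n / SAMPLE_RATE, 1.0) for n in range(64)]
gain = 1.0 / max(abs(s) for s in raw)
sample_peak = max(abs(waveform(n / SAMPLE_RATE, gain)) for n in range(64))

# Evaluate the same waveform on a 32x denser grid to find its true peak.
dense_rate = SAMPLE_RATE * 32
true_peak = max(abs(waveform(n / dense_rate, gain)) for n in range(64 * 32))

overshoot_db = 20.0 * math.log10(true_peak / sample_peak)
print(f"sample peak: {sample_peak:.3f}, true peak: {true_peak:.3f}")
print(f"intersample overshoot: {overshoot_db:.2f} dB")
```

Although every sample is at or below full scale, the reconstructed waveform goes about 3 dB above it, which is the headroom the player volume setting has to provide.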


Plugin Host and Audio Capture

The most important component of the processing chain is the plugin host. I use hosts that allow intercepting system audio or audio from a specific application. On Mac I use Audio Hijack. This is an easy to use and stable application that includes a kernel module for capturing sound output. I think it can only host AudioUnit plugins, but generally it's not a problem since all the plugin makers provide their modules in different formats.

On Windows things are more complicated. There is a free open-source app called Equalizer APO which installs itself as a filter for the selected audio interface. It can host VST plugins. However, I've got a couple of issues with it. First, it doesn't allow VST plugins to show their meters. Second, it crashed when I was attempting to add Redline Monitor, my current favorite crossfeed plugin. Since Equalizer APO is open source it should be possible to fix both of these annoyances, but I haven't gotten to this yet.

Instead, I found another plugin host app called "Virtual Audio Stream". It allows using 4 independent effect racks. In order to capture applications or system sound output, VAS provides virtual audio devices, but they are limited to 44.1 kHz. However, any other "virtual cable" device can be used instead. I use "Virtual Audio Cable", where the "Hi-Fi" version supports sampling rates up to 384 kHz.


The next big topic is the list of plugins that I use with these hosts, and their settings. This will be covered in the next post.

Thursday, June 14, 2018

Measuring AMB M3 vs. SPL Phonitor Mini

I built AMB's M3 headphone amplifier more than a year ago and I have enjoyed it all this time. Judging purely from listening experience, I was quite sure that my build doesn't have any major flaws. Also, I was confident in the M3 measurements that Bob Katz has done for his unit. However, I decided to perform some of my own. As my measurement rig is not super precise, I decided to measure M3 side by side with a commercial headphone amp to have a better grip on reality. I've chosen SPL Phonitor Mini because I think it's in the same "weight category" as M3.

AMB M3 is a two stage amplifier, with the first stage based on opamps, and the second stage on MOSFET transistors (with big heat sinks!) Although formally the amp has Class AB topology, its enormous power allows it to stay in Class A for most of the use cases. Another distinguishing feature of M3 is "active ground"—that is, the ground channel also goes through the same amplification stages as left and right channels. Personally, I'm on the same side with NwAvGuy who said that it's not a good idea. But it's interesting to see what practical consequences this design choice actually has.

SPL Phonitor Mini has been one of my favorite headphone amplifiers for a long time. Initially this was due to its awesome crossfeed implementation. Now that I've found some comparable DSP implementations, this is of less importance. But I still enjoy Phonitor for its power, reliability, and the fact that it has both unbalanced and balanced inputs. Besides crossfeed, Phonitor has another feature, a high-voltage rail (120 VDC), which helps to achieve a low noise floor.

Notes on Measurements

From my previous experiments, I've found that I can trust my measurements of frequency response, THD, channel balance, and output impedance. I put less faith into my IMD measurements, but for comparison between two amplifiers this should be OK. So this is the set I decided to stick with.

I learned that when measuring an amp with a driven ground like in M3 (or a fully balanced amp), the ground channel of the probe must be left floating. I'm not entirely sure about whether this only applies to mains-powered measuring equipment (mine isn't) or not. But just in case, I decided to stick to this method. And for consistency I decided to measure both amplifiers the same way. Also for consistency, I was using unbalanced inputs on Phonitor (that's the only input on my M3).

I let both amps warm up for an hour before measuring them. For most of the measurements, I set the volume level on both amplifiers to output 400 mV into a 33 Ohm resistive load using a 1 kHz sine wave. The line output of MOTU Microbook IIc was attenuated to -3 dBFS.

Results

THD

Here I was pleasantly surprised by the superiority of M3. Below is the graph of THD for a 1 kHz sine; M3 is in the front, in orange, and Phonitor is in the back, in cyan:

It can be seen that M3 has almost no harmonic distortion from the test signal, and its 60 Hz hum spike is at noise level. Here is the same graph with M3 alone:

That's very impressive. Even considering that Phonitor's harmonics are below audible threshold, on M3 they are practically absent. It's interesting that on Bob Katz's graphs (here and here) the 60 Hz spike is more prominent. Perhaps that depends on the power supply?

The results for a 20 Hz sine are also very good, this is M3:


And this is Phonitor Mini:


Apparently, 400 mV output level is a piece of cake for both amplifiers. I decided to crank them both up to produce 3 V RMS into 33 Ohm load. Distortion levels are now noticeably higher in both amps, but still below audibility (again, M3 is orange, Phonitor is cyan):

It's interesting to note that the THD level of M3 at 3 V output, 0.0061%, is still lower than Phonitor's THD level at 400 mV, 0.0074%. That demonstrates how much power M3 has. And just a reminder, please don't rely on THD+N numbers on these graphs: they are quite high due to the relatively high noise floor of my measurement rig.
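To put the two test levels into perspective, here is the power each of them delivers into the 33 Ohm load, computed as P = V²/R. This is plain arithmetic on the figures quoted in this post; the function name is mine.

```python
def power_mw(v_rms, load_ohm):
    """Power dissipated in a resistive load, in milliwatts."""
    return 1000.0 * v_rms ** 2 / load_ohm

LOAD_OHM = 33.0

for v_rms in (0.4, 3.0):
    print(f"{v_rms} V RMS into {LOAD_OHM:.0f} Ohm: {power_mw(v_rms, LOAD_OHM):.1f} mW")
```

Going from 400 mV to 3 V raises the delivered power from about 5 mW to about 270 mW, more than a fifty-fold increase.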


IMD

Here M3 also demonstrated better performance. Here is SMPTE IMD for M3:

And for Phonitor Mini:

As we can see, there are a lot more sidebands on the 7 kHz signal caused by 60 Hz signal played along with it.

And here is CCIF IMD for M3:

And for Phonitor Mini:

The 1 kHz signal, the result of interaction between the 19 kHz and 20 kHz signals, is more visible, although its level is at -100 dBFS, which is inaudible.
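For reading these graphs it helps to know where the intermodulation products are expected to show up. The sketch below lists them for the two test signals mentioned here (60 Hz + 7 kHz, and 19 kHz + 20 kHz). The function names and the choice to stop at the third sideband order are my assumptions for illustration.

```python
def smpte_sidebands(low_hz=60, high_hz=7000, max_order=3):
    """Sideband frequencies around the high tone, spaced by the low tone."""
    return sorted(
        high_hz + sign * n * low_hz
        for n in range(1, max_order + 1)
        for sign in (-1, 1)
    )

def ccif_products(f1_hz=19000, f2_hz=20000):
    """Low-order intermodulation products of a twin-tone signal."""
    return {
        "second_order_difference": f2_hz - f1_hz,
        "third_order_lower": 2 * f1_hz - f2_hz,
        "third_order_upper": 2 * f2_hz - f1_hz,
    }

print("SMPTE sidebands, Hz:", smpte_sidebands())
print("CCIF products, Hz:", ccif_products())
```

The second-order difference product of the twin-tone test lands at 1 kHz, which is the component discussed above.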


Frequency Response

I would not expect anything but a ruler flat response from both of these amplifiers, and indeed this was the case:

The channel balance is also exemplary. It's 0.078 dB for Phonitor, and 0.061 dB for M3. And remember that I've built M3 by hand!


Stereo Separation (Crosstalk)

It's the only measurement where M3 has shown worse results than Phonitor. Here is Phonitor:

The crosstalk level stays at -74 dBFS until 1 kHz, and then climbs up to -64 dBFS. It's definitely better than Behringer UCA202 was showing. Now let's look at M3:

Here the variation is smaller, within 4 dB, but the overall level is higher, at -60.5 dBFS. Why is that? Bob Katz obtained a similarly high figure: -42 dBFS into a 20 Ohm load. But the crosstalk was improving (becoming lower) as the load impedance was growing higher. Bob explains this by the fact that the driven (active) ground of M3 has output impedance.

Considering the worse absolute value, Bob says that anything better than -30 dBFS is insignificant. Thus, -60 dBFS isn't a big deal.
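The dependence of crosstalk on load impedance can be sketched with a very simplified model: both loads return through one shared ground path with some impedance, and the voltage developed across that path leaks into the undriven channel. Under that assumption the crosstalk ratio is Zg / (Zg + Rload). The 0.05 Ohm ground impedance below is a hypothetical value picked for illustration, not a measurement of M3, and the model ignores everything else that contributes to crosstalk in a real amplifier.

```python
import math

def shared_ground_crosstalk_db(ground_ohm, load_ohm):
    """Crosstalk from a shared ground return, in dB relative to the driven channel."""
    return 20.0 * math.log10(ground_ohm / (ground_ohm + load_ohm))

GROUND_OHM = 0.05  # hypothetical ground path impedance, not measured

for load_ohm in (20.0, 33.0, 300.0):
    level = shared_ground_crosstalk_db(GROUND_OHM, load_ohm)
    print(f"{load_ohm:5.0f} Ohm load: {level:6.1f} dB")
```

The only point of the sketch is the trend: the higher the load impedance, the lower the crosstalk, which matches the behavior described above.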


Output Impedance

Another victory of Phonitor: 0.06 Ohm of output impedance versus 0.11 Ohm on M3. Although, I'm not sure this measurement has the same meaning considering the driven ground of M3.

Conclusions

M3 is a transparent amplifier. The absence of distortions is due to its enormous power capacity. I actually doubt that the driven ground has much influence on its performance. It would be interesting to build a version of M3 with a classical passive ground channel. Like LXmini speakers, it's a great design that can be reliably built and provide a consistent level of performance.

Thursday, June 7, 2018

Linkwitz LXmini—First Impressions

Initially I was planning to do the next post about my measurements of FiiO E5 headphone amplifier—another attempt to "calibrate" my measurement rig against NwAvGuy's Prism dScope, but I've got sidetracked by another project.

It was a long time ago that I learned about loudspeakers designed by Siegfried Linkwitz. His promise is to deliver great sound in untreated domestic rooms. Sounds challenging, and his speaker designs depart greatly from traditional "boxes." One particular model, LXmini, looked very unusual: made from plastic drain pipes, with drivers positioned orthogonal to each other. I got a chance to attend a demo at the Burning Amp festival, and they indeed sounded quite nice to me.


I bought build plans for LXmini, but was endlessly procrastinating actually building them. Finally, I've made an effort—ordered the kit from Madisound and bought the rest of parts at Home Depot. Building took several days mainly because I had to paint the parts, wait until they dry out, then glue them together, then wait again. I've made the speakers in black. Here they are:


Compared to powered monitors on stands they look less bulky, giving back to the room the sense of space.

Choosing the Amplifier

All of Linkwitz designs use active crossovers based traditionally on miniDSP boards (there are of course variations due to DIY nature of this project). With this approach, each speaker driver requires its own amplifier, so for the pair of speakers I had to provide 4 channels of amplification.

I decided to look for an amplifier in a half-rack width body so I can fit it into my gear rack. The woofer driver of LXmini is rated for 8 Ohm impedance and "long term power handling" for 80 Watts. The second—full range driver is 4 Ohm and requires less power. So I decided to look for a 4 channel amplifier rated for 100 Watts into 8 Ohms to have some headroom.
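The numbers behind this choice are easy to check. The sketch below computes the RMS voltage that corresponds to 100 W into 8 Ohm and the headroom of such an amplifier over the woofer's 80 W rating. It is plain arithmetic on the figures above, with function names of my own.

```python
import math

def rms_voltage(power_w, load_ohm):
    """RMS voltage needed to deliver power_w into a resistive load."""
    return math.sqrt(power_w * load_ohm)

def headroom_db(amp_power_w, driver_power_w):
    """Power headroom of the amplifier over the driver rating, in dB."""
    return 10.0 * math.log10(amp_power_w / driver_power_w)

amp_voltage = rms_voltage(100.0, 8.0)
woofer_headroom = headroom_db(100.0, 80.0)

print(f"100 W into 8 Ohm needs {amp_voltage:.1f} V RMS")
print(f"headroom over the 80 W woofer rating: {woofer_headroom:.2f} dB")
```

So the extra 20 W amounts to about 1 dB of headroom.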

The choice of half-rack width amplifiers has turned out to be not very wide. I've found some models from pro audio equipment makers: Atlas, Crestron, Parasound, QSC, and Stewart Audio. All of them were class D—not surprising because heat sinks that are required for delivering this amount of power via class AB would never fit into half-rack format. But I wasn't afraid of class D amp, as my JBL LSR305 monitors use them, and I can't see any difference from class AB amplifiers in KRK Rokit G5.

I've chosen SPA4-100 from QSC. It was matching my requirements exactly, and the specs state very flat frequency response. It isn't cheap though—costing above $800, but QSC is a well known brand of pro amplifiers, so my hopes were for good quality and long term reliability.

This is how it looks in my rack:


Initial Setup and Check

I decided to put LXmini at the front, replacing my KRKs. I also decided to try to get rid of my center channel because the speaker (E3c) had a non-uniform frequency response that was quite hard to correct, and I could always distinguish it by ear from other speakers. This left me with a bit unusual 4.1 configuration. However, it's not the infamous "quadro" setup, but rather the traditional 5.1 layout, just without the center.

Since LXmini require an additional DSP which has a non-negligible delay, I used REW to make sure the speakers are time aligned with each other. This process is based upon frequency response measurement, and when I looked at the measured FR I was pleasantly surprised how well the speakers are matched:


The graphs use "psychoacoustic" smoothing. Obviously, the irregularities at low frequencies are due to room modes. Judging by the right channel (red), the natural roll off of the speakers starts at 50 Hz. BTW, I've configured my audio chain so that I can drive LXminis either on their own in stereo configuration, or as part of a surround setup with a subwoofer. These graphs are for the stereo configuration.

The interesting thing about LXmini is that unlike traditional designs, they have quite low crossover point—around 700 Hz, and the full frequency range above is covered by the top speaker. Thus, the top speaker in LXminis is properly called "full range", not "tweeter."

Then I ran the LEDR test. In short, it's a synthesized signal that exploits HRTF to achieve 3D positioning of the test sound, which helps in evaluating "imaging." In a room-speaker system with tamed early reflections and reasonably flat FR, playing this test signal produces a remarkable effect of a sound moving in an arc in different planes, including the vertical one.

Previously I tried this test with my JBL and KRK speakers set as fronts. The KRKs were producing a more realistic picture, although the perception of vertical movement was quite weak. With JBL, everything was smeared. In fact, even a simple test of playing pink noise through both stereo channels wasn't producing any sensible phantom center image with JBLs. That's why some time ago I put them to rear channels position where they do their job better.

Directionality and Energy Time Curve

The key to understanding why all those speakers have different ability to resolve the sound stage in my room lies in the character of their interaction with it. From my previous comparison of my JBLs with KRKs I know that JBLs have wider dispersion, due to the construction of their high frequency horn. And LXmini has the narrowest radiation pattern—dipole (figure 8).

My listening area is not symmetric, with a wall and a large book shelf on the left side. I do have space behind the speakers, and on the right side. Due to the reflective surfaces being close on the left, I always have to compensate for additional sound energy there by slightly reducing the volume level of the left speakers (yes, for the rear one, too). Another issue with the room is that the ceiling is quite low: 2.4 meters (8'). However, there is a large sofa with a cloth cover and the floor is carpeted, creating some natural sound absorption.

Apparently, there are lots of reflections in my room. The question is, how harmful are they for the sound localization cues. A good hint for answering this question is provided by the Energy Time Curve graph. Here is a very good introduction from Gik Acoustics on how to interpret it. Bob Katz's book "Mastering Audio" also contains useful information about the ETC.

Let's look at the ETC graphs for LXminis in my room (listening position, the first 30 milliseconds):



I would say, they look really good. The initial impulse decays to almost -20 dB during the first millisecond. And all the reflections arriving within the first 30 ms are below -15 dB. That's an exemplary performance for an untreated room. For comparison, this is how ETC graph looks for my KRKs:



I left the LXmini graphs as shadowed plots for comparison. Here we see much stronger reflections arriving within the first 5 ms. They must be caused by wider radiation pattern. Some of the sound radiated to the sides immediately reflects from a closely positioned surface and reaches the listening position almost together with the main impulse. For the JBLs the situation is even worse:



Here we see a series of strong reflections arriving within the first 5 ms, and also that later reflections are stronger. I'm pretty sure this is due to the much wider radiation pattern of JBLs.
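The time axis of an ETC graph can be translated into distances, which helps to guess which surface a reflection comes from. The sketch below converts arrival delays into extra path length and dB levels into amplitude ratios. The speed of sound of 343 m/s is an assumed room temperature value, and the helper names are mine.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at about 20 degrees C

def extra_path_m(delay_ms):
    """Extra distance travelled by a reflection arriving delay_ms after the direct sound."""
    return SPEED_OF_SOUND_M_S * delay_ms / 1000.0

def db_to_amplitude_ratio(level_db):
    """Convert a relative level in dB to a linear amplitude ratio."""
    return 10.0 ** (level_db / 20.0)

for delay_ms in (1.0, 5.0, 30.0):
    print(f"{delay_ms:4.1f} ms -> {extra_path_m(delay_ms):5.2f} m of extra path")

for level_db in (-15.0, -20.0):
    ratio = db_to_amplitude_ratio(level_db)
    print(f"{level_db:5.1f} dB -> {ratio:.3f} of the direct sound amplitude")
```

With these assumptions, a reflection arriving 5 ms late has travelled roughly 1.7 m farther than the direct sound, which is consistent with a surface located close to the speaker.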

But don't get me wrong, I'm not saying that JBL LSR305 is a bad speaker—no, it's in fact a good one, especially considering its price. It has a flat frequency response and very good directionality. It's just not for a room where reflective surfaces are located close to it. I'm sure, in a more spacious room, or in an acoustically treated room where strong early reflections are eliminated, it will sound great and will not have any problems with imaging.

In fact, even in my room these JBLs work great as rear speakers, due to their proximity to the listening area. In this case, their direct sound dominates over any reflections and they sound very true to life.

Conclusions

LXmini is a fantastic speaker for a small untreated room due to its narrow radiation pattern. The phantom center image created by a stereo pair of LXminis is so strong that I've got rid of my center speaker in surround configuration.