Saturday, October 28, 2017

Re-creating Phonitor Mini with Software DSP

If you have seen my previous posts, you might remember that my plan was to recreate the Phonitor Mini crossfeed within the miniDSP HA-DSP. However, while trying to do that I encountered several technical difficulties. I would like to explain them first.

First, the hardware of HA-DSP looks good on paper, but flaws in the implementation can easily be detected even with an inexpensive MOTU Microbook IIc. For starters, there is some noise:
Yes, it's at a microscopic level, but I don't see anything like that on MOTU cards, which also employ DSPs. And the resampler (recall that the DSP in HA-DSP operates at 96 kHz) adds some wiggles when working with 48 kHz signals:

Finally, I've experienced stability issues when connecting HA-DSP to Android and Linux hosts. I raised the last two issues with miniDSP support, but got no resolution.

Another technical problem came from the FIR filter design software. I use FIRdesigner, which is a powerful and versatile tool. However, it has one serious drawback in the context of my scenario. The Phonitor crossfeed filters are quite delicate, with an amplitude of 3 dB at most, so when modeling them every fraction of a decibel counts. But since FIRdesigner is first and foremost designed for speaker builders, it only offers 0.1 dB precision when manipulating the source signals, and that was causing non-negligible deviations of the designed FIR filters' frequency response curves from the original curves of the analog Phonitor filters.

I've been wrestling with these issues for a while, then thought the situation over, and decided to sell my HA-DSP. Having parted with it, I turned back to my initial approach of performing filtering in software.

I had already done some work on generating IIR filter coefficients to fit a given frequency response using the cepstral method (based on this post by R. G. Lyons). So I dusted off my Matlab / Octave code and prepared it for action.
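For the curious, the core of the cepstral method can be sketched as follows: given only a desired magnitude response, fold the real cepstrum to obtain a minimum-phase spectrum, whose inverse FFT is the impulse response to fit an IIR model to. This is a Python / NumPy illustration of the general technique from Lyons' post, not my actual Octave code, and the target magnitude curve here is made up:

```python
import numpy as np

def minimum_phase_ir(magnitude):
    """Reconstruct a minimum-phase impulse response from a magnitude
    response sampled on a full (symmetric) FFT grid, via the real
    cepstrum."""
    n = len(magnitude)
    log_mag = np.log(np.maximum(magnitude, 1e-10))
    cep = np.real(np.fft.ifft(log_mag))        # real cepstrum (even sequence)
    # Fold the anti-causal half onto the causal half
    w = np.zeros(n)
    w[0] = 1.0
    w[1:n // 2] = 2.0
    if n % 2 == 0:
        w[n // 2] = 1.0
    # exp() of the folded cepstrum's spectrum is the minimum-phase spectrum
    min_phase_spectrum = np.exp(np.fft.fft(w * cep))
    return np.real(np.fft.ifft(min_phase_spectrum))

# Made-up target: a gentle low-frequency emphasis
n = 512
freqs = np.fft.fftfreq(n)
mag = 1.0 + 0.4 / (1.0 + (np.abs(freqs) / 0.05) ** 2)
h = minimum_phase_ir(mag)
```

The resulting impulse response has the requested magnitude response with its energy packed at the start (the minimum-phase property), which makes it a convenient starting point for fitting IIR coefficients.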

However, one important thing had to be done to the analog filter measurements. Since I was performing them using the MOTU Microbook IIc, which lacks an ideally flat frequency response, I needed to remove the frequency response roll-offs it introduced from these measurements. In DSP parlance this process is called deconvolution: "reversing" the application of a filter.

I've learned that it's quite easy to perform deconvolution with REW, although the command is buried deep in the UI. Big thanks to REW's author John Mulcahy for pointing this out to me!

In order to perform deconvolution in REW, two measurements first need to be imported. In our case the first measurement is the Phonitor's filter curve recorded via the Microbook, and the second one is a loopback measurement of the Microbook on the same output and input ports. Then, on the All SPL tab, one needs to open the drop-down menu (the cog button), select these measurements, and choose the division (/) operation. Division in the frequency domain is equivalent to deconvolution performed in the time domain.
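Under the hood this operation is simple: dividing the complex spectra undoes the convolution with the sound card's response. A minimal Python / NumPy sketch (not REW's actual implementation; the regularization constant is my own addition to keep the division stable near spectral nulls):

```python
import numpy as np

def deconvolve(measured_ir, reference_ir, eps=1e-8):
    """Remove the sound card's own response from a measurement by
    dividing the spectra (frequency-domain deconvolution)."""
    n = len(measured_ir) + len(reference_ir) - 1
    H_meas = np.fft.rfft(measured_ir, n)
    H_ref = np.fft.rfft(reference_ir, n)
    # Regularized division: avoids blow-ups near nulls of the reference
    H = H_meas * np.conj(H_ref) / (np.abs(H_ref) ** 2 + eps)
    return np.fft.irfft(H, n)

# Sanity check: deconvolving a response by itself should give back
# (approximately) a unit impulse
ref = np.array([1.0, 0.5, 0.25, 0.125])
out = deconvolve(ref, ref)
```

In a real session `measured_ir` would be the Phonitor-through-Microbook impulse response and `reference_ir` the Microbook loopback.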

One important thing to remember is to always save all impulse responses with 32-bit resolution because quantization errors can cause quite visible deviations of the calculated frequency responses.

Now, having nice deconvolved impulse responses, I was ready for action. Since this time I was creating a software implementation of the filter, I had more freedom in choosing its parameters: I was no longer constrained by the sampling rate of the DSP or the number of taps on it. So I chose to create a 24th-order IIR filter operating at a 44.1 kHz sampling rate.

The resulting model turned out to be remarkably close to the target filter. Note that on the plot, the offset between the curves for the direct and the opposite channels isn't correct; this is because during deconvolution the amplitudes of the filters were normalized by REW. Never mind, it's easy to fix.

In order to figure out the offset between the direct and the opposite channel filters, I used a nice feature of FuzzMeasure's interactive plots: after clicking a point with the "Option" key pressed, FM shows both the variable value (frequency in this case) and the corresponding values of the displayed graphs, with a precision of 3 digits after the decimal point. So it was quite easy to find the difference between the filter curves for the direct and the opposite channels.
Using this information, I was able to fix the offset of my curves, and was finally ready to process some real audio. I chose a short excerpt from Roger Waters' "Amused to Death" album, where many different kinds of sounds are present: male and female vocals, bass, percussion, guitars, etc. The recording quality is outstanding; plus, the final result was rendered using QSound technology, which uses binaural cues for extending the stereo base when listening on stereo speakers. It's interesting that when listening on headphones without crossfeed, all these cues do not work due to the "super stereo" effect. But they start working again with crossfeed applied.

For blind tests, I passed a fragment of the song "Perfect Sense, Pt. II" through a rig of the Phonitor Mini connected via the Microbook: first in its original form, with Phonitor's crossfeed applied, and then after processing via the IIR filter I had created, with the crossfeed on the Phonitor turned off. This way, the only difference between these recordings was in the implementation of the crossfeed effect.
Below are the links to the original recording, the one processed with Phonitor's own crossfeed, and with my digital re-implementation of it (make sure to listen in headphones):

Original | Phonitor crossfeed | IIR implementation

In my blind test, I couldn't distinguish between the original crossfeed and my model of it (the last two samples).

Then I wanted to process a couple of full-length albums. Here I ran into a little problem -- the way sound processing is organized by default in Matlab and Octave doesn't really scale. There is a function called audioread for reading a portion of an audio file into memory in uncompressed form (and there must be a contiguous region of memory available for allocating the matrix that contains the wave samples). And there is a complementary function called audiowrite which writes the result back to disk. However, it would require writing custom code to read the input file in fragments, process each fragment with the filter, and write it back.
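For what it's worth, in an environment with explicit filter state such chunked code is not hard to write. Here is a sketch in Python / SciPy (with a stand-in Butterworth filter instead of my crossfeed IIR): the key point is carrying the filter state across chunk boundaries, so that chunked processing matches filtering the whole signal at once.

```python
import numpy as np
from scipy.signal import butter, lfilter, lfilter_zi

# A stand-in 4th-order IIR filter (not my actual crossfeed filter)
b, a = butter(4, 0.2)
x = np.random.default_rng(0).standard_normal(100_000)

# Filter in 8192-sample chunks, carrying the filter state zi across
# chunk boundaries
zi = lfilter_zi(b, a) * x[0]
chunks = []
for start in range(0, len(x), 8192):
    y_chunk, zi = lfilter(b, a, x[start:start + 8192], zi=zi)
    chunks.append(y_chunk)
y_chunked = np.concatenate(chunks)

# Same filter applied to the whole signal at once, for comparison
y_whole, _ = lfilter(b, a, x, zi=lfilter_zi(b, a) * x[0])
```

With the state threaded through, the chunked output is numerically identical to the one-shot result, so only one chunk ever needs to be in memory.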

I decided to do something different. Since I was planning to apply a headphone normalization filter in Audacity anyway, I thought it would be convenient to perform the crossfeed processing in Audacity as well.

There is an entire scripting language in Audacity hidden behind the Effects > Nyquist Prompt... menu item. The script "sees" the whole input wave as a single object, but behind the scenes Audacity feeds the script bite-sized chunks of the input file. That's the abstraction I wanted. So I wrote a Matlab script that transforms my high-order IIR filter into a sequence of biquads and generates a Nyquist Prompt script that performs equivalent processing.
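The factoring step is standard: the roots of the numerator and denominator polynomials are grouped into conjugate pairs, each pair forming one biquad. In SciPy (as a stand-in for my Matlab script) this is tf2sos; a sketch with a hypothetical 12th-order filter:

```python
import numpy as np
from scipy.signal import butter, freqz, sosfreqz, tf2sos

# A hypothetical 12th-order IIR filter standing in for my 24th-order
# crossfeed model
b, a = butter(12, 0.3)

# Factor the transfer function into 6 second-order sections (biquads)
sos = tf2sos(b, a)

# The cascade of biquads must have the same response as the original
w, h_tf = freqz(b, a, worN=1024)
_, h_sos = sosfreqz(sos, worN=1024)
```

Each row of `sos` holds (b0, b1, b2, a0, a1, a2) for one biquad; emitting those rows as Lisp biquad calls for the Nyquist Prompt is then just string formatting.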

Since the biquad filters in Nyquist Prompt are implemented in native code, even a sequence of 12 biquads is applied quite quickly, and an entire CD is processed on my modest Mac Mini 2014 in slightly more than a minute. The generated Nyquist Prompt script is here. Note that you need to enable "Use legacy (version 3) syntax" to work with Lisp code.

One caveat was avoiding precision loss while applying the sequence of filters. My initial mistake was exporting the biquad coefficients with just 6 digits after the decimal point; the processed file sounded awful. Then I increased the precision and diffed the sound wave processed in Audacity against the same wave processed in Octave. The diff wave contained only some noise below -100 dBFS, and the two processed audio samples were now indistinguishable in a blind test.
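The precision issue is easy to reproduce. In this Python / SciPy sketch (with a stand-in filter whose poles sit close to the unit circle, which is where rounding hurts most), the frequency-response error caused purely by truncating coefficients to 6 decimal places is computed:

```python
import numpy as np
from scipy.signal import butter, tf2sos, sosfreqz

# A stand-in filter with poles close to the unit circle
# (low cutoff at 0.05 of Nyquist), where rounding hurts most
b, a = butter(4, 0.05)
sos = tf2sos(b, a)

# Export with only 6 digits after the decimal point (my initial
# mistake) vs. keeping full double precision
sos_rounded = np.round(sos, 6)

w, h = sosfreqz(sos, worN=4096)
_, h_rounded = sosfreqz(sos_rounded, worN=4096)

# Frequency-response deviation caused purely by coefficient truncation
err_db = 20 * np.log10(np.maximum(np.abs(h_rounded), 1e-12)) \
       - 20 * np.log10(np.maximum(np.abs(h), 1e-12))
```

The lower the filter's corner frequencies relative to the sampling rate, the more digits the coefficients need; exporting in full double precision sidesteps the problem entirely.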

I have mentioned headphone linearization before. With ToneBoosters Morphit, performing linearization is straightforward, assuming that measurements of your headphones are in Morphit's database. My first impression after listening to processed audio samples was that Morphit thins out the bass considerably. I compared Morphit's equalization curves with the Harman Listener Target Curve for headphones and found that the former lacks the bass bump featured in the latter.

So I switched to custom mode in Morphit and compensated for the shaved-off bass with a shelf filter:

The resulting audio sample sounded much better than the version processed with the crossfeed filter alone (due to more prominent vocals and percussion), and definitely better than the initial linearization with the factory Morphit filter for the Shure SRH1540.
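For reference, such a bass shelf can be expressed as a standard low-shelf biquad from the RBJ Audio-EQ-Cookbook. This is a generic sketch in Python; the corner frequency and gain here are made-up values, not the actual settings I dialed in within Morphit:

```python
import math

def low_shelf_biquad(fs, f0, gain_db, S=1.0):
    """Low-shelf biquad from the RBJ Audio-EQ-Cookbook.
    Returns normalized (b, a) coefficient lists."""
    A = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / 2.0 * math.sqrt((A + 1.0 / A) * (1.0 / S - 1.0) + 2.0)
    cw = math.cos(w0)
    b0 = A * ((A + 1) - (A - 1) * cw + 2 * math.sqrt(A) * alpha)
    b1 = 2 * A * ((A - 1) - (A + 1) * cw)
    b2 = A * ((A + 1) - (A - 1) * cw - 2 * math.sqrt(A) * alpha)
    a0 = (A + 1) + (A - 1) * cw + 2 * math.sqrt(A) * alpha
    a1 = -2 * ((A - 1) + (A + 1) * cw)
    a2 = (A + 1) + (A - 1) * cw - 2 * math.sqrt(A) * alpha
    return [b0 / a0, b1 / a0, b2 / a0], [1.0, a1 / a0, a2 / a0]

# Example: +4 dB shelf below roughly 150 Hz at 44.1 kHz (made-up values)
b, a = low_shelf_biquad(fs=44100, f0=150.0, gain_db=4.0)
dc_gain_db = 20 * math.log10(sum(b) / sum(a))   # equals gain_db at DC
```

The shelf boosts everything below the corner by `gain_db` while leaving the gain at Nyquist untouched, which is exactly the shape needed to restore a bass bump.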

At the end of the day, I created processed wave files of some well-recorded albums I have on CD, and uploaded them into my Play Music locker for further evaluation. I didn't notice any harm from the compression to 320 kbps. Now, if I find I enjoy this crossfeed + Morphit processing on the albums I know well, I will look for a way to apply the processing in real time to all sound output on my Linux workstation.

Monday, September 11, 2017

Challenges While Performing Crossfeed Measurements

Since I've started working on the SPL Phonitor Mini crossfeed filter replication, I've learned so many things that I will need several posts in order to describe them all. Let's start with the analog part.

Channel Imbalance

As I've mentioned in the previous post, I have a goal of replicating the Phonitor Mini crossfeed settings using the DSP in HA-DSP. The first task I faced was performing accurate measurements of Phonitor's filters. They are quite delicate: the amplitude of the filters doesn't exceed 3 dB, and the attenuation of the direct vs. the opposite channel needs to be replicated precisely. Not surprisingly, this task required a lot of attention to detail, especially when using budget hardware (I use a MOTU MicroBook IIc as my capturing tool), which has limits on the precision it can provide.

In addition to being delicate, crossfeed filters involve signal summing due to partially mixing the signals of the left and right input channels. This can be a problem when using an inexpensive sound card for measurement, due to slight channel imbalance. As an example, below are the measurements of all the combinations of "Main" output and "Line" input channels on the MicroBook:

It's easy to see that no pair matches exactly in recorded signal level. What that means for our crossfeed measurements is that even if we ignore Phonitor Mini's own channel imbalance (which is also present in reality), the recordings of the sum of the left and right channels (from crossfeed) captured by the left and right inputs of the MOTU will have different levels.

Unfortunately, the input and output attenuation controls on the MicroBook, despite being digital, do not provide the resolution required to compensate for these offsets. This also means that one needs to be careful when calibrating the sound card: there needs to be a separate calibration profile per input / output channel combination.

I wasn't particularly happy with the discovered lack of balance, so I decided to try other inputs and outputs, as the MicroBook has several of them. This time I used the unbalanced line output and the "Guitar" input. The latter is mono, so I used a Y-cable (3.5 mm TRS into two 1/4" TS), connecting each leg in turn. This time the channel balance was much better, with just a 0.003 dB offset:

It may appear as if the second graph has a much steeper roll-off at the ends, but this is only because this time I've magnified the vertical scale more. When placed next to each other, the graphs do not show such a dramatic difference:

Using the "Line to Guitar" configuration, I connected MOTU's unbalanced line out to Phonitor's unbalanced input, and one channel of Phonitor's headphone output again via a Y-cable to the guitar input. This is how "flat" Phonitor (no crossfeed applied) measures in this setup:

Note that the channel offset is now about 0.015 dB instead of 0.003 dB of the loopback connection, due to slight imbalance introduced by Phonitor.

On headphone amplifiers, the channel balance usually skews as the volume level changes (unless it's a super expensive, super precise amplifier; apparently the Mini isn't one). It's unknown anyway how the crossfeed circuit affects channel balance, so I decided not to hunt for a volume knob position with a smaller offset, given that this position gives me the highest signal level without the need to engage the amplifier on the guitar input, which would contribute non-linearity and noise.

It's worth mentioning that the Phonitor didn't change the shape of the curve. Plotted on the same graph, its response covers the loopback response exactly. That's great: this is what you expect from "transparent" audio equipment.


Another problem I faced was noise. Since for measurements I have to connect several pieces of electrical equipment together, this creates opportunities for ground loops to appear. There is a great article by Bill Whitlock on the origins of ground loops, available for free download here (one just needs to complete a free registration first).

For example, when measuring Phonitor (which is a stationary amplifier), I had to disconnect the laptop running measurement software from AC power, since otherwise a ground loop was created via two connections to mains power.

More challenging was avoiding ground loops when measuring HA-DSP. It runs on battery power, but unlike the Phonitor it contains its own DAC, which means I had to connect it to a USB port of the same laptop. This creates a ground loop via the two USB connections (from the MOTU and HA-DSP). In fact, the noise level introduced by this loop was about 30 dB(Z), so I had to get rid of it. I considered several ways to do that:

  1. Instead of USB, use the optical input on HA-DSP. The issue with this approach is that the MicroBook doesn't provide an optical output, only coaxial, so I would have had to add another piece of equipment to do the conversion. Also, my preliminary experiments using the optical output of a Mac Mini have shown that HA-DSP has an additional brickwall filter at about 24 kHz on this signal path (even if the TOSLINK connection is operated at 96 kHz).
  2. Instead of USB or optical input, use the analog output on MOTU connected to analog input on HA-DSP. Since MOTU provides a good separation of audio and USB grounds, this connection doesn't introduce a ground loop. But the obvious drawback is that it involves additional A/D conversion on HA-DSP, and it's also limited to 24 kHz frequency top range.
  3. Use USB, but insert an isolating transformer after the analog output of HA-DSP. The issue with isolating transformers is that they are non-linear, except for really expensive ones. Since we are doing audio measurements, it would be unwise to insert a distorting component.
  4. Use USB via a USB isolator. Obviously, this doesn't introduce any analog distortion, but the issue with the majority of USB isolators is that they are based on the Analog Devices ADuM3160 and ADuM4160 chips, which are limited to USB "Full Speed". In theory, Full Speed should provide enough bandwidth to pass a 96 kHz / 24-bit stream, but in practice, High Speed USB DAC chips fall back to 48 kHz / 24-bit if the connection can't provide enough bandwidth for 192 kHz / 24-bit. This is true for HA-DSP. There are a couple of High Speed USB isolators, but they cost about four times as much as a normal Full Speed isolator.
I decided to go with option 4 and bought a Full Speed isolator from USConverters. After all, a 48 kHz impulse response can easily be upsampled to 96 kHz (the operating rate of HA-DSP).
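That 2x upsampling is a benign integer-ratio polyphase resampling. A sketch in Python / SciPy (with a placeholder impulse response instead of my actual measurement):

```python
import numpy as np
from scipy.signal import resample_poly, unit_impulse

# Placeholder for a measured 48 kHz impulse response
ir_48k = unit_impulse(256)

# 48 kHz -> 96 kHz is a clean 2:1 ratio: insert zeros between samples,
# then apply a polyphase anti-imaging lowpass; no fractional resampling
# is involved
ir_96k = resample_poly(ir_48k, up=2, down=1)
```

Because the ratio is an exact integer, the even-indexed output samples coincide with the original samples, and only the in-between values are interpolated.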

Below is the plot of measured channel imbalance of HA-DSP output from USB, at 48 kHz, with DSP bypassed:

As can be seen, the channel offset is about 0.04 dB, considerably more than the Phonitor's. Another issue is increased noise, which is visible on an unsmoothed plot, and that's even after making 5 consecutive measurements and averaging them.

Yet another issue is different roll-off at high frequencies:

As can be seen, the difference at 20 kHz is about 0.9 dB! Recall that the same input was used on the MOTU, so the difference is definitely due to HA-DSP. Fortunately, this can be compensated for as part of the filter design process.

UPDATE 9/14/2017: Turns out the early roll-off for HA-DSP is an artefact of averaging of measurements. In fact it's not that bad, I'll publish updated measurement graphs in the next post.


I think that's enough for now, and it's just the beginning of my findings! Will publish more soon.

Thursday, August 24, 2017

MiniDSP HA-DSP First Impressions

As an audio geek, I couldn't ignore this promising gadget from MiniDSP. To me, MiniDSP products were always associated with loudspeaker and room digital correction, but unexpectedly they have tried their design skills in the relatively niche area of portable DACs and headphone amplifiers. And thanks to their vast DSP experience, the resulting product has come out very interesting and unique.

There is a very good description of HA-DSP from the manufacturer. I will give my own version of it, based on the range of functions this device can perform:
  1. HA-DSP can be used as a regular portable headphone amplifier taking analog input. Unfortunately, it can't also be used as a power bank, despite the presence of a deceptive-looking USB-A port on it.
  2. Since HA-DSP has USB inputs and is USB Audio Class compliant, it can also be used as a portable DAC for mobile devices and computers. However, I've encountered issues when using it with some Android devices, leading to mobile device reboots.
  3. The 3.5 mm analog input port of HA-DSP also accepts a mini-TOSLINK jack, allowing HA-DSP to be connected to audio gear that has optical output. Note that both the analog and optical inputs have a cutoff frequency at 22050 Hz, even if the source device provides a full-range signal.
However, you can find numerous other portable DACs that fulfill all these scenarios without the drawbacks I've mentioned. What makes HA-DSP unique is the "DSP" part of its name. If you know how to create FIR and IIR filters, this little box offers a lot of room for audio experiments, for example:
  1. Make precise output frequency response adjustments using 10 bi-quad IIR filters per channel.
  2. Create arbitrary crossfeed effects with two parallel blocks of FIR filters and cross-commutation.
  3. Apply headphone target curve correction with FIR filters.
  4. Switch between 4 configurations of the filters that can be tied to different headphone models and crossfeed settings.
My own plan for this box is to replicate Phonitor Mini crossfeed filter, and also to experiment with headphone correction based on the headphone measurements from Inner Fidelity database and Isone MorphIt filter.

I've already mentioned some drawbacks of HA-DSP. Another issue I discovered while making measurements is some non-flatness of the frequency response. My measurements of HA-DSP using the MOTU Microbook IIc have shown an early roll-off in the high frequency range. Below is the frequency response of HA-DSP (blue) compared to the Microbook's own (orange):
On this graph, the HA-DSP's response plot uses a Microbook calibration obtained from a loopback measurement. The good part, however, is that this roll-off can be corrected by the on-board DSP as part of the FIR filter design.


HA-DSP is definitely a device targeted at audio geeks. One would probably need to consider carefully whether they would be able to use it to its full extent. If designing filters isn't your hobby, there are definitely better alternatives in terms of cost and quality.

Wednesday, July 5, 2017

Wiistar 5.1 Audio Decoder Teardown

For some experiments that require hardware decoding of Dolby Digital I've acquired a cheap Chinese 5.1 decoder on Amazon -- it costs just $24 so there was not much hesitation while buying it.

The good news is that it's indeed a proper Dolby Digital (AC3) decoder, which also supports upmixing of stereo channels into 5.1 (probably using Dolby Prologic). The bad news is that the quality of the audio output is... consistent with the price of the device.

I've found a post by Alexander Thomas describing previous versions of this device. Compared to what Alexander had observed, the hardware I've bought seems to be somewhat newer:
  1. Instead of the CS4985 decoder chip, it uses an unidentified DSP chip in a square package.
  2. There is no filtering of the output signals or any "bass management" (sinking of low frequencies from the main channels into the subwoofer's channel).
  3. The unit is powered from a 5 V source instead of 9 V.
  4. The unit provides a 5 V USB power outlet.

There are still some similarities though:
  1. The LFE channel lacks the +10 dB boost expected by the DD spec.
  2. The board's ground is not connected to the case.

Hardware Teardown

Now let's take a screwdriver and see what's inside the box. This is what the board looks like:

Most of the components are mounted on the top side. Some of the major components can be identified:
  • [1] 4558D is a stereo opamp; this one is made by Japan Radio Company (JRC);
  • [4] ES7144LV is a stereo DAC -- the board employs three DAC / opamp pairs;
  • [7] 25L6405D is a flash memory chip;
  • [6] NXP 74HC04D is a hex inverter chip;
  • [2] AMS1117 is a power regulator.
There are two mystery chips:
  • [5] the big one labelled VA669 -- I suppose that's the decoder DSP, given that there are traces running from it to the DACs, but the actual make and model of the chip are unknown;
  • [3] the one labelled "78345 / 8S003F3P6 / PHL 636 Y" -- judging by its position on the board, it could be a microcontroller handling input selection and "5.1 / 2.1" switches.
And this is the bottom view:

One interesting thing to note is that the labels and holes suggest that this board can be equipped with one RCA output jack per channel, as an alternative to the three 3.5 mm stereo jacks and the 5 V USB outlet. This is confirmed by the manual:


I was wondering whether this device could be used in any serious setup, so I hooked it up to the inputs of a MOTU UltraLite AVB audio interface.

I needed a test sound file that is AC3-encoded and contains measurement sweeps in all 6 channels. For that purpose, I took the measurement sweep file generated by FuzzMeasure and used Audacity to create a 6-channel file with a sweep in each channel:

Note that the ffmpeg library, which is used to encode AC3, applies a lowpass filter to the LFE (4th) channel. This prevents us from seeing the full performance of the LFE channel on the device.

Using a TOSLINK cable, I hooked the device up to the Mac Mini's optical output, played back the encoded file, and recorded the decoded analog output using the MOTU.

The first thing I discovered was that the surround channels are swapped. That is, they use the reverse of the standard TRS stereo channel mapping, where the left channel is on the "tip" contact and the right channel is on the "ring". Instead, the left surround is on the "ring" and the right surround is on the "tip". Perhaps this was done on purpose, to undo the reversal of "left" and "right" if one sets up the surround speakers facing oneself and then turns around :)

The next discovery was the quite bad shape of the output waves. As one can see, the sine wave is severely clipped on the bottom half-waves. This is how a source -3 dBFS sine wave was rendered:

An input sine wave with a smaller amplitude (-6 dBFS) is clipped a bit less:

This is very unfortunate, and is probably caused by a bad design of the output stage. It looks like using the 4558 opamp wasn't the best choice in the first place, and the designers of this board hindered its performance further by failing to drive it correctly.

After looking at these horrible output sine waves, I wasn't expecting a good frequency response, and indeed it's quite bad. Below are the plots for the left channel from a -3 dBFS input signal (blue) and from a -6 dBFS input (orange), no smoothing:
The measurements for the remaining channels are the same as for the left one -- at least the device is consistent across channels. Below is the left channel (blue) vs. the LFE channel (yellow):
This plot confirms that the LFE channel has the same output level as other channels, lacking the required +10 dB boost.

It's quite funny to look at the "Technical Data" section of the manual for this device, which states "Frequency Response: (20 Hz ~ 20 KHz) +/- 0.5db":

The authors tactfully omit the level of the input signal used in this measurement (if it was actually performed) -- probably the level wasn't too high.


It looks like this family of devices can't be used in any serious setup. It would be interesting, though, to try to reverse engineer the electrical design of this board and fix the obvious flaws.

Sunday, June 25, 2017

Little Toolbox for Stereo Filters Analysis

Since I'm very interested in studying different implementations of crossfeed filters, I've come up with a little toolbox for GNU Octave that helps me compare and decompose them.

Although some of this analysis can be performed using existing software tools, such as FuzzMeasure (FM) or Room EQ Wizard (REW), my little toolbox offers some nice features. For example:
  • convenient offline processing -- analyzing a filter by processing stimulus and response wave files; although this functionality exists in FuzzMeasure (but not in REW), it isn't very convenient for binaural filters like crossfeed, because FM assumes the stimulus and response to be mono files;
  • microsecond precision for group delay; both FM and REW show group delay graphs, but their unit of measurement is milliseconds (which makes sense for acoustic systems), whereas delays induced by filters are usually a thousand times smaller;
  • IIR filter coefficients computation from frequency response.
The toolbox supports different representations for the filter specification:
  • a pair of stimulus and response wave files; the stimulus file is a stereo file with a log sweep in the left channel; when this file is processed by a typical crossfeed filter, the response wave file is also stereo, and receives the processed signal in both channels with different filters (that's the essence of crossfeeding);
  • a csv file with frequency response of a filter (magnitude response and phase response) for both channels, or two csv files one per channel;
  • IIR transfer function coefficients (vectors traditionally named "B" and "A") for each channel, and the attenuation value for the opposite channel.
The functions of the toolbox can convert between those representations, and plot frequency response and group delay for both channels, and for a pair of filters for comparison.
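As an illustration of the last representation, here is how magnitude response and microsecond-scale group delay can be derived from the B and A vectors. This is a Python / SciPy equivalent of what my Octave functions do, with a stand-in lowpass filter:

```python
import numpy as np
from scipy.signal import butter, freqz, group_delay

fs = 44100
# Stand-in for one channel of a crossfeed filter: a gentle 2nd-order
# lowpass with a 700 Hz corner
b, a = butter(2, 700, fs=fs)

# Magnitude response in dB on a 0..fs/2 grid
w, h = freqz(b, a, worN=2048, fs=fs)
mag_db = 20 * np.log10(np.abs(h))

# Group delay, converted from samples to microseconds
_, gd_samples = group_delay((b, a), w=2048, fs=fs)
gd_us = gd_samples / fs * 1e6
```

At 44.1 kHz one sample is about 23 microseconds, which is why milliseconds are far too coarse a unit for plotting filter-induced delays.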

Usage Example

Let's perform an exercise of applying this toolbox to the BS2B implementation of a crossfeed filter. Although the source code and a high-level description of this implementation are available, we will treat the filter as a "black box" and see if we can reverse engineer it.

Preparing Stimulus File

We need a sine sweep from 20 Hz to 20 kHz in order to cover the whole audio range. It turns out that generating a sweep that best suits our task is not as easy as it might seem. The sweep wave must be as clean as possible (free of noise and other artifacts). Audacity can generate sine sweeps, but the produced signal contains aliasing artifacts that can be clearly seen on a spectrogram. REW can also generate sweeps, and they are free from aliasing, but its log sweep is not perfect at the ends.

The best sweep I was able to find is generated using an online tool called "WavTones". Here are the required settings:

The downloaded WAV file is mono. For the purpose of analyzing the crossfeed filter, we need to make a stereo file with the right channel containing silence. We will use Audacity in order to make this edit.

But before doing any editing, let's make sure that Audacity is set up properly. What we need to do is turn off dithering, as otherwise Audacity will inject specially constructed low-level noise (dither) when saving files. Dither normally masks quantization distortion during playback, but for us it is undesired, as it will contaminate the measured frequency response with noise. Dithering is turned off by setting the "Quality" preferences as follows:

Now we can load the mono log sweep file generated by WavTones, add a second track, and generate silence of the same length as the log sweep. Then make the sweep track the "Left Channel" and the silence track the "Right Channel", and join them into a stereo track. The resulting stereo sound wave should look as below. It needs to be exported as a 16-bit WAV file.
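For those who prefer scripting this step, an equivalent stimulus can also be built programmatically. A hedged Python / SciPy sketch (WavTones' exact windowing is unknown to me, so the short fades at the ends are only an approximation):

```python
import numpy as np
from scipy.signal import chirp

fs = 44100
dur = 5.0
t = np.arange(int(fs * dur)) / fs

# Logarithmic sine sweep, 20 Hz to 20 kHz, at -6 dBFS
sweep = 0.5 * chirp(t, f0=20, f1=20000, t1=dur, method='logarithmic')

# Short linear fades at both ends to avoid hard edges
fade = int(0.005 * fs)
sweep[:fade] *= np.linspace(0.0, 1.0, fade)
sweep[-fade:] *= np.linspace(1.0, 0.0, fade)

# Stereo stimulus: sweep in the left channel, silence in the right
stimulus = np.column_stack([sweep, np.zeros_like(sweep)])
```

The `stimulus` array can then be written out as a WAV file; no dither is involved, so the stimulus stays spectrally clean.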

Preparing Response File

I'm using the OS X AudioUnits BS2B implementation assembled by Lars Ggu (?). Audacity can apply AudioUnit filters directly:
After applying BS2B to our stimulus stereo wave, the resulting wave (filter response) looks like this:

As can be seen, in the response wave the left channel has its low frequencies attenuated, whereas the right channel contains a copy of the source wave passed through a low-pass filter and also attenuated, but by a different amount.

Plotting Frequency Response and Group Delay

With my toolbox, this is a straightforward operation. The function 'plot_filter_from_wav_files' takes two stereo wav files for the stimulus and the response, and produces a plot in the desired frequency range:

There is noticeable jitter in the opposite channel's graph starting at about the 2000 Hz mark, which is especially visible on the group delay plot. I'm currently working on implementing better smoothing. This is the code of the script that produces these graphs:

fig = plot_filter_from_wav_files(
  [20, 20000],                                % frequency range
  'sweep_20Hz_20000Hz_-6dBFS_5s-LeftCh.wav',  % stimulus file
  'bs2b-sweep_20Hz_20000Hz_-6dBFS_5s.wav',    % response file
  [-14, -1],                                  % amplitude response plot limits
  [-100, 300],                                % group delay plot limits
  200);                                       % gd plot smoothing factor
print(fig, 'plot-bs2b.png');

The plots do correspond to the filter parameters we specified: the difference in amplitude between the direct and opposite channel feeds is 4.5 dB, and the opposite channel's lowpass filter reaches -3 dB attenuation at 700 Hz. This also corresponds to the original plots on the BS2B page for this filter setting, except that the group delay there is plotted upside down (due to a wrong sign in the group delay calculation in the script provided).

Cross-check with FuzzMeasure

Since FuzzMeasure also allows offline stimulus-response analysis, I've cross-checked the results with it. FM also provides fractional octave smoothing, which gets rid of those nasty jitters I have in the plots produced by my Octave scripts:
As I've noted earlier, FM uses milliseconds instead of microseconds for group delay. Another inconvenience was the need to save the left and right channel responses as separate audio files.

BTW, FM also produces good-quality log sweep waves which can be reliably used for analysis. But its stimulus file generator can only be parametrized by sampling frequency and file bit depth.

To Be Continued

This was a very simple example, I will come up with more interesting cases in upcoming posts.

Sunday, May 14, 2017

Clipping In Sampling Rate Converters

In my last post, I investigated the clipping of intersample peaks that happens in DACs. But as I started exploring the entire sound delivery path, I discovered that digital sound data can arrive at the DAC already "pre-clipped". And thus even a DAC with headroom will render it with audible inharmonic distortion.


The reason behind this is the inevitable sample rate conversion that happens when the sampling rates of the source material and of the DAC do not match. Unfortunately, this happens quite often because multiple sampling rates came into use during the evolution of digital audio. The major "base" sample rates are 44100 Hz, originating from CDs (the Red Book Audio standard), and 48000 Hz, coming from digital video. Plus, there are whole multiples of those rates: 88200, 176400, 96000, 192000 Hz, etc.
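As an aside, the awkwardness of these conversions comes from the ratio between the two base rates: 44100 and 48000 share no convenient common factor. A quick check (a Python sketch of my own, just to illustrate the arithmetic):

```python
from fractions import Fraction

# Converting 44.1 kHz material to 48 kHz means resampling by this ratio:
# conceptually, upsample by 160, then downsample by 147.
ratio = Fraction(48000, 44100)
print(ratio)  # 160/147
```

A ratio of 160/147 means the converter has to compute brand-new sample values for virtually every output point, which is where the overshoot problem described below comes from.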

Given this variety, it's no surprise that sampling rate converters are ubiquitous. Without them, it would be impossible to correctly play, say, 44100 Hz CD audio via a 48000 Hz DAC -- the source audio would be rendered at the wrong rate and would have incorrect pitch.

But doing the conversion isn't trivial. What a sample rate converter essentially has to do is render the sound wave into a mathematical curve, and then resample the values of this curve at the target sample rate. The problem is that in a sound wave normalized to 0 dBFS, the points taken at the target sample rate can overshoot this limit.

For example, below is a graph of a 11025 Hz sine wave at 45° phase shift sampled at 44100 Hz (blue dots), and sampled at 48000 Hz (red dots):
As you can see, at the 48 kHz sampling rate the dots are closer to each other, and some of the red dots have values above (or below) the extremes of the original 44.1 kHz samples.

Had the source 44.1 kHz wave been normalized to 0 dBFS, the blue dots that currently have approximate values of 0.5 and -0.5 would be at 1 and -1, respectively. Thus, the values of the 48 kHz sampling would end up above 1 (or below -1). This means that if the converter uses an integer representation for samples (16-bit or 24-bit) and doesn't provide headroom, it will not be able to render those values, as they exceed the limits of the integer type. They will be clipped, resulting in severe distortion of the source wave.
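This overshoot is easy to reproduce numerically. Below is my own Python sketch of the 11025 Hz / 45° example, using ideal resampling (i.e. simply evaluating the same continuous sine at the new sample instants): after normalizing the 44.1 kHz samples to full scale, the 48 kHz samples reach about 1.41, roughly +3 dB over full scale:

```python
import math

F, FS1, FS2 = 11025.0, 44100.0, 48000.0
PHASE = math.pi / 4  # 45 degrees

# At this frequency/phase combination every 44.1 kHz sample lands at
# +-sin(45 deg) ~= 0.707, so "normalizing to 0 dBFS" scales the wave by ~1.414.
src = [math.sin(2 * math.pi * F * n / FS1 + PHASE) for n in range(4410)]
gain = 1.0 / max(abs(s) for s in src)

# Ideal resampling evaluates the same continuous wave at the 48 kHz instants.
dst = [gain * math.sin(2 * math.pi * F * k / FS2 + PHASE) for k in range(4800)]

print(round(max(abs(s) * gain for s in src), 3))  # 1.0 - source peaks at full scale
print(round(max(abs(s) for s in dst), 3))         # 1.414 - intersample overs
```

In an integer pipeline without headroom, every one of those >1.0 values would be clamped to full scale, which is exactly the clipping measured in the practical examples below.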

The same thing can happen in a conversion from 48 kHz down to 44.1 kHz, or when upsampling from 48 kHz to 96 or 192 kHz. Basically, any conversion that produces new sample values can yield values exceeding the peak value of the source wave. The only potentially "safe" conversion is downsampling by an integer factor, e.g. from 96 to 48 kHz, because this operation can be performed by simply throwing out every other sample (assuming the material has no content above the new Nyquist frequency, so no new values need to be computed).
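The "safe" integer-factor case can be illustrated the same way (again a Python sketch of my own): decimation only keeps a subset of the existing samples, so the peak sample value cannot grow. This holds only for material already band-limited below the new Nyquist frequency; otherwise an anti-alias lowpass has to run first, and its output is new sample values again.

```python
import math

# A 1 kHz sine sampled at 96 kHz, then taken down to 48 kHz by keeping
# every other sample. No new values are computed, so no new peaks can appear.
src_96k = [math.sin(2 * math.pi * 1000 * n / 96000) for n in range(960)]
dst_48k = src_96k[::2]

print(max(map(abs, dst_48k)) <= max(map(abs, src_96k)))  # True
```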

Practical Examples

Google Nexus Player

Here I am examining the sound paths that I have at home. Let's start with the Google Nexus Player. It's a rather old device, and I don't think it pretends to be a "Hi-Fi" player, but nevertheless I use it from time to time, and I would like to see what it does to the sound.

This is my setup: the HDMI output from the Nexus Player goes into an LG TV, which splits off the audio via a TOSLINK connection into an E-MU 0404 music interface, and then to an SPL Phonitor Mini. As in the last post, for measurements I will be using an E-MU Tracker Pre card connected to a laptop on battery power.

I use two sound files for the test: one is the same as last time (an 11025 Hz sine wave at 45° phase in a 44.1 kHz FLAC), and the other is a 12 kHz sine wave at 45° in a 48 kHz FLAC. Both files were uploaded to my Play Music locker. I'm aware that Play Music uses lossy 320 kbps MP3 on their servers, but for these simple sine wave files this generous bitrate is effectively equivalent to lossless. At least, Play Music doesn't perform any resampling.

Since TVs are designed to be used with video content, their preferred sampling rate for audio is 48 kHz. I haven't found any way to change that setting on my TV. So first, in order to test the signal path, I played the 12 kHz sine wave file (48 kHz SR) and captured it from the line output of the E-MU 0404, also using a 48 kHz sampling rate on the Tracker Pre. The result on the frequency analysis is a beautiful clean peak at 12 kHz with no distortions at all:
However, 48 kHz isn't the typical sampling rate for the content on the Play Music store -- since their source is CD content, most of the albums use a 44.1 kHz sampling rate. Even YouTube uses 48 kHz audio, as I have discovered (I've checked with VLC player, which can open YouTube video streams). Not sure about the sampling rate used in Play Movies, though.

So let's now play the 44.1 kHz sine wave file using the same setup. The only change I've made is setting the capture sampling rate to 44.1 kHz on the Tracker Pre. And the result is pretty ugly:
If I wasn't entirely happy about how the frequency analysis looked for the Benchmark DAC1, this one simply made my hair stand on end. The resampler in the Nexus Player clips severely. What's even worse, there is not much I can do about it, since there are no controls for digital attenuation or sampling rate. Too bad. At least now I know why the snare drum on "Gaslighting Abbie" by Steely Dan doesn't sound good when played via this setup.

Dune HD Smart H1

I also have an old Dune HD player connected to the same LG TV. Unlike Nexus Player, Dune offers a lot of control over playback. It also supports FLAC format. Again, I started with playing a 12 kHz sine wave at 48 kHz SR just to make sure that the sound path is clean, and it was all OK.

Then I played an 11025 Hz sine at 44.1 kHz SR, and again got a lot of distortion (although the level of the distortion peaks is lower than on the Nexus Player):
But here at least I can do something to fix it. I can't change the sampling rate, but Dune offers digital volume control, even on a dB scale. I used it to reduce the volume by 4 dB, providing enough headroom for the resampler, and the result is a beautiful clean 11025 Hz peak:
Great, now I have much more confidence in my setup.

PC-based Playback

By PC I mean Macs as well. On desktops and laptops there is a lot more control over the parameters of the digital audio signal path -- it's easy to change the sampling rate of the DAC to match the sampling rate of the source material, and the majority of digital players offer digital attenuation. So there is no problem ensuring that nothing clips the digital signal on its way to the DAC.

The practical advice here is: if you are not sure about the sampling rate of the source material, use the digital volume control in the player to reduce the volume and thus provide some headroom for the sampling rate converter. Setting the volume down to -4 dB (or to about 80-85% if the volume control uses a percent scale) should do the job.
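For reference, the worst case in the examples above is about +3 dB: the true peak of the 11025 Hz / 45° wave sits sqrt(2) times above its highest sample, so -4 dB of digital attenuation leaves a small margin. A quick computation (my own Python sketch; note that a player's percent scale rarely maps linearly to gain, so the linear figure below need not match a particular slider's percentage):

```python
import math

def db_to_gain(db):
    """Convert a level in decibels to a linear amplitude gain."""
    return 10 ** (db / 20.0)

# The 11025 Hz / 45 degree wave overshoots its highest sample by sqrt(2):
overshoot_db = 20 * math.log10(math.sqrt(2))
print(round(overshoot_db, 2))    # 3.01
print(round(db_to_gain(-4), 2))  # 0.63 - linear gain at a -4 dB setting
```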


Sampling rate converters are ubiquitous, and conveniently adapt the source audio stream to ensure that it will play regardless of the sampling rate set on the DAC. However, as we have found out, they are not transparent and can easily clip intersample peaks, thus producing audible inharmonic distortions.

To avoid that, make sure the sampling rates match between the played material and the DAC, or at least reduce the digital volume a bit to offer some headroom for the sampling rate converter.

Sunday, May 7, 2017

DAC Clipping on Intersample Peaks

The article "Intersample Overs in CD Recordings" on Benchmark Media raises interesting topics of intersample peaks, and DAC headroom. In short, this is what the article states:
  • 16-bit 44.1 kHz digital samples can be interpolated to achieve a signal-to-noise ratio equivalent to 20-bit systems, and modern DAC chips are capable of that;
  • but these chips don't provide digital headroom, and intersample peaks, when they occur, get clipped, producing audible non-harmonic distortions;
  • Benchmark DAC1 is susceptible to this problem, whereas in DAC2 and DAC3 the issue was addressed by a design involving an external interpolator and driving the DAC chips at -3.5 dB;
  • maintaining headroom in a DAC is important because intersample peaks easily occur in audio recordings normalized to 0 dBFS.
So I decided to test the DACs I use for headroom, and also to figure out what can be done to address the clipping problem without resorting to buying DAC2 or DAC3 converters.

Let's take some measurements. I don't have an Audio Precision analyzer, so I took my measurements using an old trusty E-MU Tracker Pre connected to a notebook on battery power. In Audacity I created a 16-bit 44.1 kHz sound file containing an 11025 Hz sine wave phase-shifted by 45° and normalized to 0 dBFS.

Creating Test Sample

BTW, generating this sine wave is not as straightforward as it may seem. The "Generate Tone" function in Audacity unfortunately doesn't allow specifying the phase. The workaround is to use the very powerful but not so straightforward "Nyquist Prompt" effect instead.

First, generate 10 seconds of silence (it will become selected automatically). Then in "Effect" menu choose "Nyquist Prompt", enter the following, and press "OK":
(osc (hz-to-step 11025) 10 *table* 45)
This will replace the silence with an 11025 Hz sine wave phase-shifted by 45°. Afterwards, normalize it to 0 dBFS by choosing "Effect > Normalize" and entering "0.0 dB" as the target value. The result should look like the left channel on the screenshot below (with "View > Show clipping" option enabled):

The left channel represents the sine wave normalized to 0 dBFS, the right channel shows the same wave normalized to -6 dBFS. Note that Audacity doesn't render sine wave images, like Adobe Audition does, instead it just connects the dots representing sample values.

The red bars on the left channel warn us that these samples will overshoot 0 dBFS when rendered by DAC--that's because the "hat" of the rendered analog sine wave will connect these dots and thus will end up above the maximum value that can be represented using integer values.

Let's look at this sine wave in the frequency domain ("Analyze > Plot Spectrum" in Audacity):
I have changed the default settings of the analysis panel to use a Blackman-Harris window and 4096 FFT bins. This provides the most accurate result for the sine wave. As you can see, the panel shows that the peak of the sine wave is at +3.0 dBFS.


For each of the DACs I tested I was using the following sequence of steps:

  1. Load the test signal wave into VLC audio player, ensure that its volume is set to 100% (unity). Also check the OS sound level, it needs to be at 100% as well.
  2. Connect the outputs of the DAC to the inputs of E-MU, and play the sample several times in order to set up input sensitivity on E-MU at the maximum level right before it starts to clip--this is to maximize signal-to-noise ratio at the input end.
  3. Now record the signal, check in Audacity that the input isn't shown as clipped, so if there was clipping it could only happen at the output DAC, not at the input ADC.
  4. Check the frequency domain to see if there are any extra frequencies in the recorded signal besides 11025 Hz. The presence of extra frequencies means that the DAC has clipped the output and produced inharmonic distortions.
  5. If the DAC is clipping, check whether reducing volume at the player or at the OS level helps to get rid of distortions.
I started with Benchmark DAC1 since it is known that it doesn't provide headroom and will clip. And indeed it does:
Note that the E-MU's input sensitivity is not as good as that of the Audio Precision frontend used by Benchmark Media for their post, so we don't see the noisy spikes below -90 dBFS, but the presence of extra spikes around the input signal frequency confirms that we can indeed detect whether the DAC clips using this technique.

The next thing I tested was an Objective DAC made by JDS Labs. It turned out to produce even harsher distortions:
It was also interesting to find that, due to the enormous distortions, the resulting 0 dBFS wave on the left channel came out at a lower level than the quieter -6 dBFS wave on the right channel, which had enough headroom. That's clearly a disaster.

Do all DACs clip?

Indeed, the results were a bit disappointing -- the "audiophile grade" DACs are not very good at dealing with normalized CD recordings. Also, the following statement from Benchmark Media's post seems to leave no hope:
Every D/A chip and SRC chip that we have tested here at Benchmark has an intersample clipping problem! To the best of our knowledge, no chip manufacturer has adequately addressed this problem. For this reason, virtually every audio device on the market has an intersample overload problem. This problem is most noticeable when playing 44.1 kHz sample rates.
I started testing the other DACs I had lying around:
And to my surprise, I found that none of them have the audible clipping problem! Look at the frequency analysis for the MB Air (the only one among those listed that showed any IHD at all):
There are very minor (I would say, inaudible) spikes from IHD, but it looks much cleaner than the results of Benchmark DAC1!

The music production oriented sound interfaces (E-MU and MOTU) actually have no intersample clipping at all -- they provide enough headroom. I guess most pro-oriented devices do, since quite loud transients can be produced during recording and mixing, and these devices need to handle them.

A bit surprising was the absence of clipping on another version of the Objective DAC (the Mayflower version). I don't have a good explanation for that, except that the versions of ODAC they use are different:
  • the JDS Labs one uses "UAC1 DAC" (the old revision of ODAC);
  • Mayflower uses "ODAC-revB" (the newer revision, see this post by JDS Labs).
But JDS Labs never mention that "revB" has added headroom, and in fact they acknowledge that the performance of the DAC at the 0 dBFS level is slightly worse than at lower levels. So, it's still a mystery to me.


But what if you have a DAC that is susceptible to clipping, like the Benchmark DAC1 or an old version of the ODAC? What I tried was first to reduce the output volume level in the VLC player -- this reduction happens in the digital domain -- and then, as a separate experiment, on the DAC itself using the OS volume control exposed by the DAC as part of the USB Audio standard.

Not surprisingly, scaling the peaks below 0 dBFS by reducing the volume level at the player gets rid of distortions.

What's more surprising is that for the ODAC, reducing the volume level with the OS volume controls (I've set them to -6 dB) also remedies the clipping. That was something new for me, since my understanding was that USB Audio volume control would apply to the analog wave coming out of the DAC chip. But it turns out that, at least for the ODAC, the chip itself scales down the input digital signal before processing it.

Benchmark DAC1 doesn't provide external volume control via USB Audio protocol, and the volume knob that it has applies the volume control in the analog domain to the signal that has left the DAC chip (already clipped), so it's not helping. The only option to avoid clipping with DAC1 is to use the volume control at the music player.


First of all, big kudos to Benchmark Media for raising awareness about the facts that DACs can clip intersample overs, and that a lot of music recordings actually have them.

But I would like to steer away from their (not explicit, but implied) conclusion that you should buy their DAC2 and DAC3 products if you want to avoid the clipping problem. In fact, using pro sound interfaces may be an answer, as well as simply reducing the output volume level. Just don't hesitate to test the resulting signals yourself.


After reading some docs on the ODAC / O2 interconnection, I have discovered that the line out of my ODAC revB is accessible via the "line in" jack on the O2's front panel (so it's actually a dual-purpose jack -- it can serve either as a line input for the O2 amp or as a line output for the ODAC -- wicked smart!). And I have repeated my measurements of intersample clipping. Nothing has changed, however -- the results look the same as the ones recorded via the O2's headphone output -- no inharmonic distortions.