Wednesday, July 5, 2017

Wiistar 5.1 Audio Decoder Teardown

For some experiments that require hardware decoding of Dolby Digital I've acquired a cheap Chinese 5.1 decoder on Amazon -- it costs just $24 so there was not much hesitation while buying it.


The good news is that it's indeed a proper Dolby Digital (AC3) decoder, which also supports upmixing of stereo channels into 5.1 (probably using Dolby Pro Logic). The bad news is that the quality of the audio output is... consistent with the price of the device.

I've found a post by Alexander Thomas describing previous versions of this device. Compared to what Alexander had observed, the hardware I've bought seems to be somewhat newer:
  1. Instead of the CS4985 decoder chip, it uses an unidentified square DSP chip.
  2. There is no filtering of the output signals or any "bass management" (sinking of low frequencies from the main channels into the subwoofer's channel).
  3. The unit is powered from a 5 V source instead of 9 V.
  4. The unit provides a 5 V USB power outlet.

There are still some similarities though:
  1. The LFE channel lacks the +10 dB boost expected by the DD spec.
  2. The board's ground is not connected to the case.

Hardware Teardown

Now let's take our screwdriver and see what's inside the box. This is what the board looks like:


Most of the components are mounted on the top side. Some of the major components can be identified:
  • [1] 4558D is a stereo opamp, this make is by Japan Radio Company (JRC);
  • [4] ES7144LV is a stereo DAC -- the board employs three DAC / opamp pairs;
  • [7] 25L6405D chip is flash memory;
  • [6] NXP 74HC04D is a hex inverter chip;
  • [2] AMS1117 is a voltage regulator.
There are two mystery chips:
  • [5] the big one labelled VA669 -- I suppose that's the decoder DSP, given that there are traces running from it to the DACs, but the actual make and model of the chip are unknown;
  • [3] the one labelled "78345 / 8S003F3P6 / PHL 636 Y" -- judging by its position on the board, it could be a microcontroller handling input selection and "5.1 / 2.1" switches.
And this is the bottom view:


One interesting thing to note is that the labels and holes suggest that this board can be equipped with RCA output jacks per channel, as an alternative to the three 3.5 mm stereo jacks and the 5 V USB outlet. This suggestion is confirmed in the manual:


Measurements

I was wondering whether this device can be used in any serious setup, so I hooked it up to the inputs of a MOTU UltraLite AVB audio interface.

I needed a test sound file that is AC3-encoded and contains measurement sweeps in all 6 channels. For that purpose, I took the measurement sweep file generated by FuzzMeasure, and used Audacity in order to create a 6-channel file with a sweep in each channel:


Note that the ffmpeg library, which is used to encode AC3, applies a lowpass filter to the LFE (4th) channel. This will prevent us from seeing the full performance of the LFE channel on the device.

Using a TOSLINK cable I hooked up the device to a Mac Mini's optical output, played back the encoded file, and recorded the decoded analog output using the MOTU.

The first thing I discovered was that the surround channels are swapped. That is, they reverse the standard TRS stereo channel mapping, where the left channel is on the "tip" contact and the right channel is on the "ring". Instead, the left surround is on the "ring", and the right surround is on the "tip". Perhaps this was done on purpose to undo the reversal of "left" and "right" if one sets up the surround speakers while facing them, and then turns around :)

The next discovery was the poor shape of the output waveforms. As one can see, the sine wave is severely clipped on the bottom half-waves. This is how the source -3 dBFS sine wave has been rendered:


An input sine wave with a smaller amplitude (-6 dBFS) is clipped a bit less:


This is very unfortunate, and is probably caused by a bad design of the output stage. Looks like using the 4558 opamp wasn't the best choice in the first place, and the designers of this board seriously hindered its performance by failing to drive it correctly.

After looking at these horrible output sinewaves, I wasn't expecting a good frequency response, and indeed it's quite bad. Below are the plots for the left channel from a -3 dBFS input signal (blue), and for -6 dBFS input (orange), no smoothing:
The measurements for the remaining channels are the same as for the left -- at least this device is consistent for all channels. Below is left channel (blue) vs. LFE channel (yellow):
This plot confirms that the LFE channel has the same output level as other channels, lacking the required +10 dB boost.

It's very funny to look into the "Technical Data" section of the manual for this device, stating "Frequency Response: (20 Hz ~ 20 KHz) +/- 0.5db":

The authors tactfully omit the level of the input signals used in this measurement (if it actually was performed) -- probably the level wasn't too high.

Conclusion

Looks like this family of devices can't be used in any serious setup. It will be interesting though to try to reverse engineer the electrical design of this board, and fix obvious flaws.

Sunday, June 25, 2017

Little Toolbox for Stereo Filters Analysis

Since I'm very interested in studying different implementations of crossfeed filters, I've come up with a little toolbox for GNU Octave that helps me to compare and decompose them.

Although some of this analysis can be performed using existing software tools, such as FuzzMeasure (FM) or Room EQ Wizard (REW), my little toolbox offers some nice features. For example:
  • convenient offline processing -- analyze the filter by processing stimulus and response wave files; although this functionality exists in FuzzMeasure (but not in REW), it isn't very convenient for use with binaural filters like crossfeed, because FM assumes stimulus and response to be mono files;
  • microsecond precision for group delay; both FM and REW show group delay graphs, but their unit of measurement is milliseconds (which makes sense for acoustic systems), whereas the delays induced by filters are usually a thousand times smaller;
  • IIR filter coefficients computation from frequency response.
The toolbox supports different representations for the filter specification:
  • a pair of stimulus and response wave files; the stimulus file is a stereo file with a log sweep in the left channel; when this file is processed by a typical crossfeed filter, the response wave file is also stereo, and receives the processed signal in both channels with different filters (that's the essence of crossfeeding);
  • a csv file with frequency response of a filter (magnitude response and phase response) for both channels, or two csv files one per channel;
  • IIR transfer function coefficients (vectors traditionally named "B" and "A") for each channel, and the attenuation value for the opposite channel.
The functions of the toolbox can convert between those representations, and plot frequency response and group delay for both channels, and for a pair of filters for comparison.

Usage Example

Let's perform an exercise of applying this toolbox to the BS2B implementation of a crossfeed filter. Although the source code and a high level description of this implementation are available, we will consider the filter to be a "black box", and see if we can reverse engineer it.

Preparing Stimulus File

We need a sine sweep from 20 Hz to 20 kHz in order to cover the whole audio range. It turns out that generating a sweep that best suits our task is not as easy as it might seem. The sweep wave must be as clean as possible (free of noise and other artifacts). Audacity can generate sine sweeps, but the produced signal contains aliasing artifacts that can be clearly seen on the spectrogram. REW can also generate sweeps, and they are free from aliasing, but its log sweep is not perfect at the ends.

The best sweep I was able to find is generated using an online tool called "WavTones". Here are the required settings:


The downloaded WAV file is mono. For the purpose of analyzing the crossfeed filter, we need to make a stereo file with the right channel containing silence. We will use Audacity in order to make this edit.

But before doing any editing, let's make sure that Audacity is set up properly. What we need to do is to turn off dithering, as otherwise Audacity will inject specially constructed low-level noise when saving files. This usually reduces audible quantization distortion when playing them, but for us it is undesired, as it will contaminate the frequency response with noise. Dithering is turned off by setting the "Quality" preferences as follows:


Now we can load the mono log sweep file generated by WavTones, add a second track, and generate silence of the same length as the log sweep. Then make the sweep track the "Left Channel", and the silence track the "Right Channel", and join them into a stereo track. The resulting stereo sound wave should look like the one below. It needs to be exported as a 16-bit WAV file.
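
For readers who prefer scripting this step, here is a minimal sketch of the same stereo stimulus built in Python instead of WavTones plus Audacity. It is not how the file used in this post was produced: the 44100 Hz sampling rate and the function names `log_sweep_stereo` and `write_wav` are assumptions made for illustration. The sweep is a plain exponential sweep with no fades at the ends, so it may not be as clean as the WavTones output.

```python
import math
import struct
import wave

def log_sweep_stereo(f_start=20.0, f_end=20000.0, seconds=5.0,
                     rate=44100, level_dbfs=-6.0):
    """Return 16-bit (left, right) frames: log sweep on the left, silence on the right."""
    amplitude = 10.0 ** (level_dbfs / 20.0)   # -6 dBFS is about 0.501 of full scale
    k = math.log(f_end / f_start)             # natural log of the frequency ratio
    frames = []
    for n in range(int(seconds * rate)):
        t = n / rate
        # Phase of an exponential sweep: integral of 2*pi*f(t),
        # where f(t) = f_start * exp(k * t / seconds).
        phase = 2.0 * math.pi * f_start * seconds / k * (math.exp(k * t / seconds) - 1.0)
        left = int(round(32767 * amplitude * math.sin(phase)))  # plain rounding, no dither
        frames.append((left, 0))
    return frames

def write_wav(target, frames, rate=44100):
    """Write the frames as a 16-bit stereo WAV; target is a file name or a file object."""
    with wave.open(target, "wb") as w:
        w.setnchannels(2)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(b"".join(struct.pack("<hh", l, r) for l, r in frames))
```

Calling `write_wav('stimulus.wav', log_sweep_stereo())` (the file name is hypothetical) would produce a 5 second, -6 dBFS sweep in the left channel only.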

Preparing Response File

I'm using the OS X AudioUnits BS2B implementation assembled by Lars Ggu (?). Audacity can apply AudioUnit filters directly:
After applying BS2B to our stimulus stereo wave, the resulting wave (filter response) looks like this:


As can be seen, in the response wave the left channel has low frequencies attenuated, whereas the right channel contains a copy of the source wave passed through a low-pass filter, and also attenuated, but by a different value.

Plotting Frequency Response and Group Delay

With my toolbox, this is a straightforward operation. The function 'plot_filter_from_wav_files' takes two stereo wav files for the stimulus and the response, and produces a plot in the desired frequency range:

There is a noticeable jitter in the opposite channel's graph starting at about the 2000 Hz mark, which is especially visible on the group delay plot. I'm currently working on implementing better smoothing. This is the code of the script that produces these graphs:

fig = plot_filter_from_wav_files(
  [20, 20000],                                % frequency range
  'sweep_20Hz_20000Hz_-6dBFS_5s-LeftCh.wav',  % stimulus file
  'bs2b-sweep_20Hz_20000Hz_-6dBFS_5s.wav',    % response file
  [-14, -1],                                  % amplitude response plot limits
  [-100, 300],                                % group delay plot limits
  200);                                       % gd plot smoothing factor
print(fig, 'plot-bs2b.png');

The plots do correspond with the filter parameters we have specified: the difference in amplitude between the direct and opposite channel feeds is 4.5 dB, and the opposite channel lowpass filter achieves -3 dB attenuation at 700 Hz. This also corresponds with the original plots on the BS2B page for this filter setting, except that the group delay there is plotted upside down (due to a wrong sign in the group delay calculations in the script provided).
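
To show why microseconds are the natural unit here, and why the sign matters, below is a rough Python sketch that computes the group delay as the negative derivative of phase with respect to angular frequency. It assumes an idealized first-order analog lowpass with a 700 Hz cutoff as a stand-in for the opposite channel filter; the real BS2B filter is a digital implementation and will differ in detail. The function names are made up for this example.

```python
import cmath
import math

def lowpass_response(f_hz, cutoff_hz):
    """Frequency response of a first-order analog lowpass, H = 1 / (1 + j*f/fc)."""
    return 1.0 / complex(1.0, f_hz / cutoff_hz)

def group_delay_us(f_hz, cutoff_hz, df=0.01):
    """Group delay -d(phase)/d(omega) in microseconds, by central difference."""
    phase_hi = cmath.phase(lowpass_response(f_hz + df, cutoff_hz))
    phase_lo = cmath.phase(lowpass_response(f_hz - df, cutoff_hz))
    d_omega = 2.0 * math.pi * (2.0 * df)
    # The minus sign is what keeps the delay of a lowpass positive;
    # dropping it flips the plot upside down.
    return -(phase_hi - phase_lo) / d_omega * 1e6

for f in (20, 700, 5000):
    h = lowpass_response(f, 700.0)
    print(f, round(20 * math.log10(abs(h)), 2), round(group_delay_us(f, 700.0), 1))
```

For this idealized filter the delay at low frequencies approaches 1 / (2 * pi * 700 Hz), which is roughly 227 microseconds, so it fits comfortably within the [-100, 300] limits used for the plot above.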

Cross-check with FuzzMeasure

Since FuzzMeasure also allows offline stimulus-response analysis, I've cross-checked the results with it. FM also provides fractional octave smoothing which gets rid of those nasty jitters I have in the plots produced by my Octave scripts:
As I've noted earlier, FM uses milliseconds instead of microseconds for group delay. Another inconvenience was the need to save the left and right channel responses as separate audio files.

BTW, FM also produces good quality log sweep waves which can be reliably used for analysis. But the stimulus file generator can only be parametrized on the sampling frequency, and file bit depth.

To Be Continued

This was a very simple example; I will come up with more interesting cases in upcoming posts.

Sunday, May 14, 2017

Clipping In Sampling Rate Converters

In my last post, I investigated clipping of intersample peaks that happens in DACs. But as I started exploring the entire path of sound delivery, I discovered that digital sound data can arrive at the DAC already "pre-clipped". Thus even a DAC with headroom will render it with audible inharmonic distortions.

Theory

The reason behind this is the inevitable sample rate conversion when the sampling rates of the source material and of the DAC do not match. Unfortunately, this happens quite often because during the evolution of digital audio multiple sampling rates have come into use. The major "base" sample rates are 44100 Hz originating from CDs (Red Book Audio standard), and 48000 Hz coming from digital video. Plus, there are whole multiples of those rates: 88200, 176400, 96000, 192000 etc.

Given this variety, it's no surprise that sampling rate converters are ubiquitous. Without them, it would be impossible to correctly play, say, 44100 Hz CD audio via a 48000 Hz DAC -- the source audio would be rendered at the wrong rate and would have incorrect pitch.

But doing the conversion isn't trivial. What sample rate converter has to do is basically render the sound wave into a mathematical curve, and then resample the values of this curve using the target sample rate. The problem that can occur here is that in a sound wave normalized to 0 dBFS the points of the target sample rate can overshoot this limit.

For example, below is a graph of an 11025 Hz sine wave at 45° phase shift sampled at 44100 Hz (blue dots), and sampled at 48000 Hz (red dots):
As you can see, at the 48 kHz sampling rate the dots are closer to each other, and some of the red dots have values of above (or below) the margins of the original 44.1 kHz sampling rate.

Had the source 44.1 kHz wave been normalized to 0 dBFS, the blue dots that currently have approximate values of 0.5 and -0.5 would be at 1 and -1, respectively. Thus, the values of the 48 kHz sampling would end up above 1 (or below -1). This means that if the converter uses an integer representation for samples (16-bit or 24-bit) and doesn't provide headroom, it cannot render those values, as they exceed the limit of the integer. Thus, they will be clipped, and this will result in a severe distortion of the source wave.

The same thing can happen in a conversion from 48 kHz down to 44.1 kHz, or when upsampling from 48 kHz to 96 or 192 kHz. Basically, any conversion that creates new sample values can produce values that exceed the peak value in the source wave. The only potentially "safe" conversion is when the source wave gets downsampled by a whole factor, e.g. from 96 to 48 kHz, because this operation can be performed by simply throwing out every other sample.
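
The overshoot can be reproduced with a few lines of Python. This is only a sketch of an idealized converter: for a pure sine below the Nyquist frequency, perfect resampling is equivalent to evaluating the same sine at the new sample instants, which is what the code does. Real converters use finite filters and will deviate slightly. All names here are chosen for the example.

```python
import math

FREQ = 11025.0
PHASE = math.pi / 4   # 45 degrees

def sample_sine(rate, count, amplitude=1.0):
    """Sample the test sine at the given rate."""
    return [amplitude * math.sin(2 * math.pi * FREQ * n / rate + PHASE)
            for n in range(count)]

# 44.1 kHz samples of the tone all sit at about +/-0.707 of the sine's amplitude.
src = sample_sine(44100, 441)
scale = 1.0 / max(abs(s) for s in src)   # gain that normalizes the samples to 0 dBFS

# Ideal resampling to 48 kHz: the same (now scaled) sine at the new instants.
dst = sample_sine(48000, 640, amplitude=scale)
overs = [s for s in dst if abs(s) > 1.0]
peak = max(abs(s) for s in dst)

# A converter with integer samples and no headroom can only store the clipped values.
clipped = [max(-1.0, min(1.0, s)) for s in dst]

print(len(overs), "of", len(dst), "samples exceed full scale; peak =", round(peak, 3))
```

The 48 kHz peak comes out at about 1.414 times full scale, which is the +3 dB overshoot discussed in the previous post.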

Practical Examples

Google Nexus Player

Here I am examining the sound paths that I have at home. Let's start with Google Nexus Player. It's a rather old thing, and I don't think it pretends to be a "Hi-Fi" player, but nevertheless I use it from time to time, and I would like to see what it does to sound.

This is my setup: the HDMI output from Nexus Player goes into an LG TV, and it separates audio via TOSLINK connection that goes into E-MU 0404 music interface, and then to SPL Phonitor Mini. As in the last post, for measurements I will be using E-MU Tracker Pre card connected to a laptop on battery power.

I use two sound files for test: one is the same as the last time (11025 Hz sine wave at 45° phase in a 44.1 kHz FLAC), and another is 12 kHz sine wave at 45° in a 48 kHz FLAC. Both files were uploaded to my Play Music locker. I'm aware that Play Music uses lossy 320 kbps MP3 on their servers, but for these simple sine wave files this generous bitstream is effectively equivalent to lossless. At least, Play Music doesn't perform any resampling.

Since TVs are designed to be used with video content, their preferred sampling rate for audio is 48 kHz. I haven't found any way to change that setting for my TV. So first in order to test the signal path, I played the 12 kHz sine wave file (48 kHz SR), and captured it from the line output of E-MU 0404 also using 48 kHz sampling rate on Tracker Pre. The result on the frequency analysis is a beautiful clean peak at 12 kHz with no distortions at all:
However, 48 kHz isn't the typical sampling rate for the content on the Play Music store--since their source is CD content, most of the albums use a 44.1 kHz sampling rate. Even YouTube uses 44.1 kHz sampling rate audio, as I have discovered (I've checked with VLC player, which can open YouTube video streams). Not sure about the sampling rate used in Play Movies, though.

So let's now play the 44.1 kHz sine wave file using the same setup. The only change I've made is setting the capturing sampling rate to 44.1 kHz on Tracker Pre. And the result is pretty ugly:
If I wasn't really happy about how the frequency analysis looked for Benchmark DAC1, this one simply made my hair stand on end. The resampler in Nexus Player clips severely. What's even worse, there is not much I can do about that, since there are no controls over digital attenuation or sampling rate. Too bad. At least now I know why the snare drum on "Gaslighting Abbie" by Steely Dan doesn't sound good when played via this setup.

Dune HD Smart H1

I also have an old Dune HD player connected to the same LG TV. Unlike Nexus Player, Dune offers a lot of control over playback. It also supports FLAC format. Again, I started with playing a 12 kHz sine wave at 48 kHz SR just to make sure that the sound path is clean, and it was all OK.

Then I played an 11025 Hz sine at 44.1 kHz SR, and again got a lot of distortion (although the level of the distortion peaks is lower than on Nexus Player):
But here at least I can do something to fix that. I can't change the sampling rate, but Dune offers digital volume control, even in dB scale. I used it to reduce the volume by 4 dB, providing enough headroom for the resampler, and the result is a beautiful clean 11025 Hz peak:
Great, now I have much more confidence in my setup.

PC-based Playback

By PC I mean Macs as well. On desktops and laptops there is a lot more control over the parameters of the digital audio signal path--it's easy to change the sampling rate on the DAC to match the sampling rate of the source material, also the majority of digital players offer digital attenuation. So there is no problem ensuring that nothing clips the digital signal on its way to the DAC.

The practical advice here is--if you are not sure about the sampling rate of the source material, use the digital volume control on the player to reduce the volume and thus provide some headroom for the sampling rate converter. Setting the volume down to -4 dB (or about 80-85% if the volume control uses percentages) should do the job.
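
As a quick sanity check of the -4 dB figure, here is a small Python sketch that converts decibels to linear gain and applies it to the worst case of the test tone used in this post, whose resampled peak is about 1.414 times full scale. The helper names are made up for this example. How a percentage slider maps to decibels depends on the player, so that mapping is not modeled here, and real music can contain overs that are larger or smaller than the one of this tone.

```python
import math

def db_to_gain(db):
    """Convert a level change in dB to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)

def gain_to_db(gain):
    """Convert a linear amplitude factor to dB."""
    return 20.0 * math.log10(gain)

worst_case_peak = math.sqrt(2.0)                    # intersample peak of the test tone
attenuated_peak = worst_case_peak * db_to_gain(-4.0)

print(round(gain_to_db(worst_case_peak), 2))        # 3.01
print(round(attenuated_peak, 3))                    # 0.892
```

With 4 dB of attenuation the peak stays below full scale, with roughly 1 dB to spare.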

Conclusion

Sampling rate converters are ubiquitous, and conveniently adapt the source audio stream to ensure that it will play regardless of the sampling rate set on the DAC. However, as we have found out, they are not transparent and can easily clip intersample peaks, thus producing audible inharmonic distortions.
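
Why the distortion is inharmonic can be illustrated with a small Python sketch. It models a 48 kHz converter without headroom as a hard clipper applied to the ideally resampled test tone, and then looks at a plain DFT of one full repetition period. This is a simplified model made up for illustration, not a description of what Nexus Player or Dune actually do internally. The harmonics created by clipping lie above the 24 kHz Nyquist frequency and fold back; for example, the third harmonic at 33075 Hz lands at 48000 - 33075 = 14925 Hz, which is not a multiple of 11025 Hz.

```python
import cmath
import math

RATE = 48000
N = 640                 # 11025 / 48000 = 147 / 640, so 640 samples hold exactly 147 cycles
BIN_HZ = RATE / N       # 75 Hz per DFT bin
TONE_BIN = 147          # 147 * 75 Hz = 11025 Hz

def tone(amplitude):
    """One full repetition period of the 11025 Hz, 45 degree sine at 48 kHz."""
    return [amplitude * math.sin(2 * math.pi * TONE_BIN * n / N + math.pi / 4)
            for n in range(N)]

def hard_clip(samples):
    """Model an integer converter without headroom: limit values to [-1, 1]."""
    return [max(-1.0, min(1.0, s)) for s in samples]

def extra_frequencies(samples, floor_db=-90.0):
    """Frequencies (Hz) other than the tone whose DFT level exceeds floor_db
    relative to the strongest bin."""
    mags = []
    for k in range(N // 2 + 1):
        acc = sum(s * cmath.exp(-2j * math.pi * k * n / N)
                  for n, s in enumerate(samples))
        mags.append(abs(acc))
    top = max(mags)
    return [round(k * BIN_HZ) for k, m in enumerate(mags)
            if k != TONE_BIN and m > top * 10 ** (floor_db / 20)]

over = math.sqrt(2)     # amplitude of the tone after its 44.1 kHz samples hit 0 dBFS
print(extra_frequencies(tone(over)))                  # unclipped tone
print(extra_frequencies(hard_clip(tone(over)))[:5])   # clipped tone
```

The unclipped tone yields no extra components, while the clipped one yields a long list of them scattered across the audio band.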

To avoid that, make sure the sampling rates match between the played material and the DAC, or at least reduce the digital volume a bit to offer some headroom for the sampling rate converter.

Sunday, May 7, 2017

DAC Clipping on Intersample Peaks

The article "Intersample Overs in CD Recordings" on Benchmark Media raises interesting topics of intersample peaks, and DAC headroom. In short, this is what the article states:
  • 16-bit 44.1 kHz digital samples can be interpolated to achieve signal-to-noise ratio equivalent of 20-bit systems, and modern DAC chips are capable of that;
  • but these chips don't provide digital headroom, and intersample peaks, when they occur, get clipped, producing audible non-harmonic distortions.
  • Benchmark DAC1 is susceptible to this problem, whereas in DAC2 and DAC3 this issue was addressed by introducing a design involving using an external interpolator, and driving DAC chips at -3.5 dB.
  • Maintaining headroom in DAC is important because in audio recordings normalized to 0 dBFS intersample peaks can easily occur.
So I decided to test the DACs I use on the subject of headroom, and also figure out what can be done to address the clipping problem without resorting to buying DAC2 or DAC3 converters.

Let's take some measurements. I don't have an Audio Precision analyzer, so I took my measurements using an old trusty E-MU Tracker Pre connected to a notebook on battery power. In Audacity I created a 16-bit 44.1 kHz sound file containing an 11025 Hz sine wave phase-shifted by 45° and normalized to 0 dBFS.

Creating Test Sample

BTW, generating this sine wave is not as straightforward as it may seem. The "Generate Tone" Audacity function unfortunately doesn't allow specifying the phase. The workaround is to use the very powerful but not so straightforward "Nyquist Prompt" effect instead.

First, generate 10 seconds of silence (it will become selected automatically). Then in "Effect" menu choose "Nyquist Prompt", enter the following, and press "OK":
(osc (hz-to-step 11025) 10 *table* 45)
This will replace the silence with an 11025 Hz sine wave phase-shifted by 45°. Afterwards, normalize it to 0 dBFS by choosing "Effect > Normalize" and entering "0.0 dB" as the target value. The result should look like the left channel on the screenshot below (with the "View > Show clipping" option enabled):


The left channel represents the sine wave normalized to 0 dBFS, the right channel shows the same wave normalized to -6 dBFS. Note that Audacity doesn't render sine wave images, like Adobe Audition does, instead it just connects the dots representing sample values.

The red bars on the left channel warn us that these samples will overshoot 0 dBFS when rendered by DAC--that's because the "hat" of the rendered analog sine wave will connect these dots and thus will end up above the maximum value that can be represented using integer values.

Let's look at this sine wave in the frequency domain ("Analyze > Plot Spectrum" in Audacity):
I have changed the default settings of the analysis panel to use Blackman-Harris window and 4096 FFT buckets. This provides the most accurate result for the sine wave. As you can see, the panel shows that the peak of the sine wave is at +3.0 dBFS.
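
The +3.0 dBFS figure can be checked with a few lines of Python. This is a small sketch with made-up variable names: at 44.1 kHz an 11025 Hz sine has exactly four samples per cycle, and with a 45° phase shift none of them lands on the crest.

```python
import math

RATE = 44100
FREQ = 11025

# Unit-amplitude sine with a 45 degree phase shift, sampled at 44.1 kHz.
samples = [math.sin(2 * math.pi * FREQ * n / RATE + math.radians(45))
           for n in range(8)]

peak_sample = max(abs(s) for s in samples)    # about 0.7071 for every sample
normalized_amplitude = 1.0 / peak_sample      # sine amplitude once the samples sit at 0 dBFS
true_peak_dbfs = 20 * math.log10(normalized_amplitude)

print([round(s, 4) for s in samples])
print(round(true_peak_dbfs, 2))
```

Normalizing the samples to 0 dBFS scales the underlying sine by a factor of about 1.414, which puts its true peak about 3 dB above full scale.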

Tests

For each of the DACs I tested I was using the following sequence of steps:

  1. Load the test signal wave into VLC audio player, ensure that its volume is set to 100% (unity). Also check the OS sound level, it needs to be at 100% as well.
  2. Connect the outputs of the DAC to the inputs of E-MU, and play the sample several times in order to set up input sensitivity on E-MU at the maximum level right before it starts to clip--this is to maximize signal-to-noise ratio at the input end.
  3. Now record the signal, check in Audacity that the input isn't shown as clipped, so if there was clipping it could only happen at the output DAC, not at the input ADC.
  4. Check the frequency domain to see if there are any extra frequencies in the recorded signal besides 11025 Hz. The presence of extra frequencies means that the DAC has clipped the output and produced inharmonic distortions.
  5. If the DAC is clipping, check whether reducing volume at the player or at the OS level helps to get rid of distortions.
I started with Benchmark DAC1 since it is known that it doesn't provide headroom and will clip. And indeed it does:
Note that E-MU's input sensitivity is not as good as that of the Audio Precision frontend used by Benchmark Media for their post, so we don't see the noisy spikes below -90 dBFS, but the presence of extra spikes around the input signal frequency confirms that we can indeed detect whether the DAC clips by using this technique.

The next thing I tested was the Objective DAC made by JDS Labs. It turned out to produce even harsher distortions:
It was also interesting to find out that, due to the enormous distortions, the 0 dBFS wave on the left channel was reproduced at a lower level than the -6 dBFS wave on the right channel, which is quieter but has enough headroom. That's clearly a disaster.

Do all DACs clip?

Indeed, the results were a bit disappointing--the "audiophile grade" DACs are not very good at dealing with normalized CD recordings. Also, the following statement from the Benchmark Media's post seems to be leaving no hope:
Every D/A chip and SRC chip that we have tested here at Benchmark has an intersample clipping problem! To the best of our knowledge, no chip manufacturer has adequately addressed this problem. For this reason, virtually every audio device on the market has an intersample overload problem. This problem is most noticeable when playing 44.1 kHz sample rates.
I started testing the other DACs I had lying around:
And to my surprise, I found that none of them has the audible clipping problem! Look at the frequency analysis for MB Air (the only one among the listed that has shown any IHD at all):
There are very minor (I would say, inaudible) spikes from IHD, but it looks much cleaner than the results of Benchmark DAC1!

The music production oriented sound interfaces (E-MU and MOTU) actually have no intersample clipping at all--they provide enough headroom. I guess most devices aimed at music professionals do, since quite loud transients can be produced during recording and mixing, and these devices need to handle them.

A bit surprising was the absence of clipping on another version of Objective DAC (the Mayflower version). I don't have a good enough explanation for that except that the versions of ODAC they use are different:
  • the JDS Labs one uses "UAC1 DAC" (the old revision of ODAC);
  • Mayflower uses "ODAC-revB" (the newer revision, see this post by JDS Labs).
But JDS Labs never mention that "revB" has added headroom, and in fact acknowledge that performance of the DAC at 0 dBFS level is slightly worse than at lower levels. So, still a mystery to me.

Workarounds

But what if you have a DAC that is susceptible to clipping, like Benchmark DAC1 or an old version of ODAC? What I tried first was to reduce the output volume level in the VLC player--this reduction happens in the digital domain. Then, as a separate experiment, I reduced it on the DAC itself using the OS volume control that the DAC provides as part of the USB Audio standard.

Not surprisingly, scaling the peaks below 0 dBFS by reducing the volume level at the player gets rid of distortions.

What's more surprising is that for ODAC, reducing the volume level with OS volume controls (I've set them to -6 dB) also remedies the clipping. That was something new for me since my understanding was that USB Audio volume control would apply to the analog wave that comes out of the DAC chip. But it turns out that, at least for ODAC, the chip itself scales down the input digital signal before processing it.

Benchmark DAC1 doesn't provide external volume control via USB Audio protocol, and the volume knob that it has applies the volume control in the analog domain to the signal that has left the DAC chip (already clipped), so it's not helping. The only option to avoid clipping with DAC1 is to use the volume control at the music player.

Conclusions

First of all, big kudos to Benchmark Media for raising awareness about the facts that DACs can clip intersample overs, and that a lot of music recordings actually have them.

But then I would like to steer away from their (not explicit but assumed) conclusion that you should only buy their DAC2 and DAC3 products if you want to avoid the clipping problem. In fact, using pro sound interfaces may be an answer, as well as simply reducing the output volume level. Just don't hesitate to test the resulting signals yourself.

UPDATE

After reading some docs on the ODAC / O2 interconnection I have discovered that the line out of my ODAC revB is accessible via the "line in" jack on O2's front panel (so it's actually a dual purpose jack--it can serve either as line input for the O2 amp or as line output for ODAC--wicked smart!). So I repeated my measurements of intersample clipping. Nothing changed, however--the results look the same as those recorded via O2's headphone output--no inharmonic distortions.

Sunday, April 23, 2017

Headphone Amplifier ABX Testing Switch Box

In order to figure out whether it's actually possible to distinguish between reasonably transparent headphone amplifiers I've decided to build a switch box. It's as simple as wiring together three TRS sockets and a 3-pole 2-position switch. But knowing which amplifier one is listening to can affect the outcome of the evaluation. The key to making unbiased judgements is blind testing and randomization.

So I decided to add to the box a "shuffling" switch. The idea is that the person evaluating two amplifiers doesn't know which one is currently active, that is, which amplifier is bound to which position of the switch. This binding is chosen randomly by an assistant, unbeknown to the evaluating person. Schematically the setup looks like this:


The digital signal from the source is converted into an analog signal and duplicated to both headphone amplifiers under evaluation. Then the outputs from the amplifiers are shuffled (so the actual signal from Amp A may end up being labelled either "A" or "B", while the signal from Amp B will be labelled the opposite) and passed to the A/B switch, which is controlled by the evaluating person.

The "shuffler" and A/B Switch are encapsulated into one physical box. It looks like this:


As you can see, the state of the shuffling switch (labelled I / O: "Inverse" and "Original") on the back (left photo) cannot be seen when looking at the front panel which hosts the A/B switch (right photo).

The shuffling is implemented trivially, here is the diagram for a pair of wires from "A" and "B" inputs:
Thus, when the shuffling switch is in the "O" ("Original") position, "A" and "B" wires from the input correspond to "A" and "B" positions of the A/B switch. When the shuffling switch is in the "I" ("Inverse") position, they are swapped.

Since a stereo signal needs 3 wires, this schematic needs to be triplicated. Thus for shuffling a 6-pole 2-position switch has to be used, while the A/B switch is a more common 3-pole 2-position one.
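
The routing logic of the box can be written down as a small Python sketch. It only models which amplifier ends up on the headphones for a given combination of switch positions; the function names `heard_amp` and `new_trial` are made up for this example, and the random choice stands in for the assistant setting the hidden switch.

```python
import random

def heard_amp(shuffle_position, ab_position):
    """Return which amplifier reaches the headphones.

    shuffle_position: 'O' (original) or 'I' (inverse), set by the assistant.
    ab_position: 'A' or 'B', set by the evaluating person.
    """
    if shuffle_position not in ("O", "I") or ab_position not in ("A", "B"):
        raise ValueError("unknown switch position")
    swapped = shuffle_position == "I"
    if ab_position == "A":
        return "Amp B" if swapped else "Amp A"
    return "Amp A" if swapped else "Amp B"

def new_trial(rng=random):
    """Assistant's step: pick the hidden shuffle position at random."""
    return rng.choice(["O", "I"])

hidden = new_trial()
for position in ("A", "B"):
    print(position, "->", heard_amp(hidden, position))
```

Whatever the hidden position is, the two positions of the A/B switch always select different amplifiers, so the listener can compare them without knowing which is which.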

As one can see from the diagram, there are 2 points where 3 wires need to be connected together (6 points in total for the full stereo signal). I've found it handy to use Sparkfun's Square 1" Single Sided proto board, which features through-hole contacts connected in groups of 3. The inside of the switch box looks like this (the board is on the right):


One last important thing to keep in mind is that before doing any comparisons, the volume levels of the amplifiers must be matched exactly. The human ears are super sensitive to difference in loudness, and a louder sound is always perceived as a "better sounding" one.

In order to align the volume levels, I use the T-Cable I crafted previously and a reasonably precise Agilent U1252B multimeter. Be sure to measure the voltage on both left and right channels. Not every headphone amplifier I've tested had precisely matched inter-channel voltage levels. On some amps the left channel is louder, on some the right one. Make sure that the voltage levels of the loudest channels match (it doesn't matter if on Amp A the loudest is the left one, while on Amp B it's the right one).

Thursday, April 6, 2017

T-Cable for Output Level Measurements and Surprise from Benchmark

When performing headphone amplifier comparisons (actually, any audio-related comparisons), matching output levels is of paramount importance. Louder sounding equipment is always perceived as sounding "better" (unless it is clipping because it has exceeded its capabilities). And human ears are amazingly sensitive to volume levels; even a small difference in them may affect our judgements.

That means, before starting any comparisons of headphone amps "by ear" make sure that they have been set up correctly. Two tools that are helpful for this job are a good "true RMS" multimeter, and a special cable that has open contacts for attaching probes (unless one is OK with partially disassembling the amplifier or headphones to reach their contact plates).

That's why I decided to make a simple pass-through 1/4" TRS T-Cable with an outlet that multimeter cables can be connected to. This is how it is supposed to be used:



This is what an assembled cable looks like:


After finishing the cable, I decided to test it with my headphone amps. First I tried with SPL Phonitor Mini and AKG K550 headphones. I've connected the T-cable in between, and started playing a 1 kHz sine tone--a simple wave, so the multimeter doesn't have any problem measuring the output level. As I expected, the output level was increasing or decreasing with my volume adjustments, and levels of the left and right channels matched pretty closely (within 1%).
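For anyone who needs such a test tone, here is a minimal sketch that generates one with the Python standard library. The sample rate, duration, amplitude, and helper names are my assumptions for the example; any 1 kHz sine file from another source works just as well.

```python
import math
import struct
import wave


def sine_frames(freq_hz=1000.0, seconds=5.0, rate=48000, amplitude=0.5):
    """Return 16-bit stereo PCM frames with the same sine in both channels."""
    frames = bytearray()
    for n in range(int(seconds * rate)):
        value = amplitude * math.sin(2.0 * math.pi * freq_hz * n / rate)
        sample = int(round(value * 32767))
        frames += struct.pack("<hh", sample, sample)  # left, right
    return bytes(frames)


def write_sine_wav(path, rate=48000, **kwargs):
    """Write the tone to a WAV file (hypothetical helper for this example)."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(2)
        wav.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav.setframerate(rate)
        wav.writeframes(sine_frames(rate=rate, **kwargs))
```

Calling `write_sine_wav("sine_1khz.wav")` produces a 5 second file. Since both channels carry identical samples, any difference measured on the T-cable comes from the playback chain, not from the file.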

Next was the Benchmark DAC1 HDR, and here I got a big surprise--the levels of the left and right channels were quite far apart--by as much as 16%. That's something I wasn't expecting from this piece of equipment. I listened to the sine wave myself, and indeed noticed that it was shifted to the right, and the amount of shift changed as I adjusted the volume.

I searched the web and found this old thread on the ProSoundWeb forum describing exactly the same problem. The conclusion there was that the left / right balance for headphones on the DAC1 only holds at a certain output level. This seems pretty strange to me, especially combined with the fact that the Benchmark has a remote control. So they put a motorized volume pot in this amp, but couldn't make it preserve balance across the volume control range?

Having figured out this sad fact, I decided to adjust the balance on the Benchmark. Thankfully, it has a trimpot for that. Here is what my setup looked like:


The trimpot on this model is easy to find. What it seems to do is adjust the level of the left channel. After I balanced the channel levels at about 100 mV RMS, I found that the balance only holds in this region. As soon as you move the volume slider by a couple of marks, the sound gets slightly out of balance again. Not great, but at least I'm now aware of this issue.

For me, the conclusion is: never trust the brands, and always check everything with tools before jumping into any comparisons.

Sunday, April 2, 2017

MOTU UltraLite AVB: Hybrid Stereo + 5.1 Setup

I use the MOTU UltraLite AVB as my primary sound interface. It's a versatile and easy to use device, with lots of audio inputs and outputs, and excellent DSP-based routing and mixing capabilities. Once you have created a certain audio setup, the UltraLite AVB offers a way to save it and restore it later. For example, I had a setup for a 2.1 speaker configuration and a setup for a 5.1 surround configuration (why do they have to be different? -- see below), and I was switching between them depending on the material playing.

But switching between setups isn't something that my kids or wife can do easily. So I decided to create a hybrid configuration that can be applied to all my use cases. Here they are:


Use Case 1: This one is active when kids play games. They sit next to the computer, way behind the left and right monitors, so they can't hear them properly. The only speaker that can deliver sound to them is the Cambridge Audio Minx Go located below the computer monitor.

Use Case 2: This one is for playing stereo content. The primary speakers are a pair of JBL LSR305 supported by a KRK 10s sub, making up a 2.1 setup. But since there are also rear KRK RPG2 5 speakers set up for the surround use case, and the center channel, these can be optionally engaged for widening the soundstage and enhancing dialog clarity in movies.

Use Case 3: This is the real 5.1 surround setup where each of the 6 speakers has its own channel to play. However, since the speakers are not full range, the bass parts of their channels also need to be routed to the subwoofer, in addition to the LFE content.

The presence of additional speakers in the 2.1 setup doesn't allow it to be used for the 5.1 case. Take, for example, the left channel in the 2.1 setup--it needs to be routed into the front left speaker, as well as into the center, and into the rear left speaker. This routing is incompatible with the 5.1 setup, where the front left channel only goes into the front left speaker.
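The conflict is easy to see when both routings are written down as tables. The sketch below is only an illustration: the channel and speaker names are made up, and sending the stereo channels to the sub is my assumption about what "2.1" implies here.

```python
# Destination speakers for each source channel; all names are illustrative.
STEREO_ROUTING = {
    "front_left": ["front_left_spk", "center_spk", "rear_left_spk", "sub"],
    "front_right": ["front_right_spk", "center_spk", "rear_right_spk", "sub"],
}

SURROUND_ROUTING = {
    "front_left": ["front_left_spk", "sub"],
    "front_right": ["front_right_spk", "sub"],
    "center": ["center_spk", "sub"],
    "rear_left": ["rear_left_spk", "sub"],
    "rear_right": ["rear_right_spk", "sub"],
    "lfe": ["sub"],
}


def extra_destinations(channel):
    """Speakers fed by `channel` in the stereo setup but not in the 5.1 one."""
    stereo = set(STEREO_ROUTING[channel])
    surround = set(SURROUND_ROUTING[channel])
    return sorted(stereo - surround)


for channel in STEREO_ROUTING:
    print(channel, "->", extra_destinations(channel))
```

Any channel with a non-empty list can't share one routing between the two setups, which is why the stereo and 5.1 sources have to arrive on separate channels.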

On the computer side, achieving a hybrid setup is pretty easy. On a Mac, the Audio MIDI Setup app makes it possible to assign different input channels of a multi-channel audio interface to different configurations, e.g. use input channels 1 & 2 for stereo, and channels 3 to 8 for the multichannel 5.1 setup. Now the question is how to configure the MOTU to route the channels accordingly.


This actually turned out to be non-trivial, mostly because the approach to controlling routing and mixing used in MOTU products is modeled after the classical mixing boards used in studios. In practice that means there are restrictions on what can be connected to what, and at which stages effects can be applied.

The main effect I need is the equalizer--to perform some basic room correction. Another important thing is digital attenuation, which allows aligning speaker output levels precisely, as knobs on inexpensive powered monitors usually lack the required precision--you can do basic alignment with the knobs, but then if you need to make one of the monitors softer, say by 1 dB, the only way to achieve that is by attenuating the corresponding channel on the sound card.
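To get a feel for how small a 1 dB trim is, here is the conversion from decibels to the linear factor that samples get multiplied by. This is a generic sketch with made-up function names, not a description of how the MOTU DSP implements its faders.

```python
def db_to_gain(db):
    """Convert a level change in dB to a linear amplitude multiplier."""
    return 10.0 ** (db / 20.0)


def attenuate(samples, db):
    """Scale a list of samples by the given level change in dB."""
    gain = db_to_gain(db)
    return [s * gain for s in samples]


print(f"-1 dB is a gain of {db_to_gain(-1.0):.3f}")
```

So making a monitor 1 dB softer means scaling its signal to about 0.891 of the original amplitude, an adjustment of roughly 11% that is hard to dial in reliably with an analog knob.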

In order to visualize for myself all the allowed connections between mixing stages of MOTU card, I've created the following diagram:


See, it's actually not that simple. Each processing block can be characterized with the following attributes:
  • how many inputs (and outputs) it allows; typical values are 0, 1, and many. E.g. a sound card input can only output audio data to the DSP, so it has zero mixer inputs, but it can be used as a source for any number of other mixer blocks;
  • whether the block has effects; that's easy to figure out--only blocks that provide effects appear on the Mixing tab of the MOTU control UI;
  • how the output of the block can be routed back to the DSP; e.g. the "Mix Aux" block--a plentiful resource--can't output to other mixing blocks, its output can only be chained via another "Mixer Input" block, and this connection is made on the Routing tab;
  • and finally, some stereo blocks can be split into independent mono channels, and some can't.
Note on the Reverb group: it's a special Mix Group because, first, it is the only one that contains a reverb effect, and second, other Mix Groups can send to it directly from the Mixing tab, but not to each other. This feature is expressed on the diagram as a special input marked "R".

After figuring out the rules, and with the use cases in mind, I came up with the following diagram of how the blocks should be connected:

Here, "L" / "R" letters on inputs and outputs designate the left and right channels. I had to use only one channel on the "Sub L" and "Sub R" groups because equalizer settings are different for the left and right channels, and unfortunately a Mix Group can't be split into a pair of monos.

This is how the mixing configuration looks in the MOTU UI (note that the Mix Aux strips didn't fit):


Having a diagram at hand was really helpful to set everything up.

The Reverb group is used for the rear channels in order to add a delay. Unfortunately, there is no direct way to set up a delay on the MOTU (that's a big deficiency in my view, compared to miniDSP products). The trick was to use the "Pre-delay" setting of the Reverb effect, set all other Reverb parameters to minimum, and compensate with an EQ for the high frequency shelving that the Reverb creates. This restores the frequency response, but not the phase, resulting in a non-uniform group delay. But this is hardly noticeable.
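If the delay is meant for time alignment of speakers placed at different distances from the listener (an assumption; the purpose and the actual pre-delay value are not stated here), a starting value can be estimated from the difference in distance. The function name and the speed of sound constant below are mine, and the result should still be verified by measurement.

```python
SPEED_OF_SOUND_M_S = 343.0  # dry air at about 20 degrees C


def alignment_delay_ms(farthest_m, speaker_m):
    """Delay for a speaker so its sound arrives together with the farthest one."""
    if speaker_m > farthest_m:
        raise ValueError("speaker_m must not exceed farthest_m")
    return (farthest_m - speaker_m) / SPEED_OF_SOUND_M_S * 1000.0


# Hypothetical distances from the listening position, in meters.
print(f"{alignment_delay_ms(farthest_m=2.0, speaker_m=1.2):.2f} ms")
```

The farthest speaker gets no delay, and every closer speaker is delayed by the time sound needs to cover the extra distance.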

As a conclusion, I would say that I greatly appreciate the robustness of MOTU's configuration abilities, but I would really like to have some "DIY" mode for the DSP that would offer the following:

  1. Input bi-quad (or better, multi-pole multi-zero) coefficients directly.
  2. Remove the processing block "specializations".
  3. Input delays directly, not as part of the Reverb effect.
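To illustrate what item 1 means in practice, here is a sketch that computes the coefficients of a single peaking filter using the well-known formulas from Robert Bristow-Johnson's Audio EQ Cookbook. The function name, the 48 kHz sample rate, and the example filter are assumptions; the MOTU UI does not accept such coefficients, which is exactly the wish.

```python
import math


def peaking_eq_biquad(f0_hz, gain_db, q, rate_hz=48000.0):
    """Peaking EQ coefficients, normalized so that a0 == 1.

    Returns (b0, b1, b2, a1, a2) for the difference equation
    y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2].
    """
    a_lin = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0_hz / rate_hz
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha / a_lin
    b0 = (1.0 + alpha * a_lin) / a0
    b1 = (-2.0 * math.cos(w0)) / a0
    b2 = (1.0 - alpha * a_lin) / a0
    a1 = (-2.0 * math.cos(w0)) / a0
    a2 = (1.0 - alpha / a_lin) / a0
    return b0, b1, b2, a1, a2


# Hypothetical room-correction filter: cut 3 dB at 100 Hz with Q = 2.
print(peaking_eq_biquad(f0_hz=100.0, gain_db=-3.0, q=2.0))
```

With direct coefficient entry, the same five numbers could come from any filter design tool, instead of being limited to the filter types the built-in EQ offers.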