Sunday, October 6, 2019

Case Study of LXmini in Our New Living Room

This summer we moved into a new rented house, and I finally got some time to set up the LXminis in this new environment. I learned a lot while doing this and hope that sharing the experience will be useful to other people.

Initially I was planning to recreate my old 4.1 surround setup with two pairs of LXminis as front and surround speakers plus a KRK 10s subwoofer (used only for the LFE channel). However, after watching a couple of movies on a temporary stereo setup of LXminis, I decided that the stereo image they create is immersive enough, and I don't want to complicate the setup with another pair.

The challenges I faced while getting the stereo setup right were different from the ones in our old apartment. First, we bought a tall, wide console for the computer and the Xbox, and I learned that the console creates strong reflections if the speakers are placed too close to it. On the other hand, if I move the speakers farther from the console, they get too close either to the couch or to the side wall. Second, this time I decided to use the subwoofer as a low frequency extension for the LXminis, but I didn't want to compromise their excellent output.

Minimizing Reflections

This is a schematic drawing of the room. Note that the ceiling is quite high and sloped. This reduces vertical room modes significantly. The bad news is that the listening space is asymmetric and narrow. Below are views from the top and from the side, all lengths are in meters:

Blue circles represent the positions of the speakers in my temporary setup. The orange circles are the final setup. I spent some time looking for the best placement and used a number of "spatially challenging" test tracks:
  • tom-tom drum naturally panned around (track 28 "Natural stereo imaging" from "Chesky Records Jazz Sampler & Audiophile Test Compact Disc, Vol. 3");
  • LEDR test—HRTF-processed rattle sound (track 11 "LEDR" from "Chesky Records Jazz Sampler & Audiophile Test Compact Disc, Vol. 1");
  • phantom center test files from Linkwitz Lab page.
When the speakers were placed too close to the console, LEDR sounded smeared, and so did the phantom center tests. ETC curves also showed some strong early (< 6 ms) reflections:

I moved the speakers farther from the console and placed them wider, so they didn't get too close to the couch. However, the right speaker was now too close to the right wall. Fortunately, reflections from the wall can be defeated by rotating the speaker appropriately. The hint I read in S. Linkwitz's notes was to put a mirror on the wall and ensure that from the listening position I see the speaker from the side. Since the LXmini is a dipole speaker, there is a null at its side, so the most harmful reflection from the nearby wall is minimized. We can see that on the ETC graphs from the new position (the graphs from the initial position are blended in for comparison):

For the left speaker, instead of the two reflections above -20 dB within the first 6 ms, there is now only one of slightly lower power. For the right speaker, the overall level of reflections arriving during the first 6 ms is significantly reduced, and its ETC graph now resembles the ETC of the left speaker.
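Finding early reflections on an ETC is easy to script. Below is a minimal sketch with a synthetic impulse response (a real IR would come from a measurement tool such as REW); the ETC is computed as the Hilbert envelope of the IR, in dB relative to the direct sound:

```python
import numpy as np
from scipy.signal import hilbert

def etc_db(ir):
    """Energy-Time Curve: Hilbert envelope of the IR, in dB re. its peak."""
    env = np.abs(hilbert(ir))
    return 20 * np.log10(env / env.max() + 1e-12)

# Synthetic IR: direct sound plus a single reflection 3 ms later at -15 dB.
fs = 48000
ir = np.zeros(fs // 10)
ir[100] = 1.0
ir[100 + int(0.003 * fs)] = 10 ** (-15 / 20)

etc = etc_db(ir)
peak = np.argmax(etc)
skip = int(0.001 * fs)             # skip the direct sound's own envelope skirt
win = etc[peak + skip : peak + int(0.006 * fs)]
i = np.argmax(win)
print("strongest reflection in the first 6 ms: %.1f ms, %.1f dB"
      % ((skip + i) / fs * 1000, win[i]))
```

On a real IR the same windowed search over the first 6 ms flags the reflections discussed above.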

Playing the test tracks also confirmed the improvement—now I can clearly hear the rattle sound in LEDR moving in the vertical and front-back directions. Also, by avoiding strong reflections for the right speaker, I've made it essentially equivalent to the more "spacious" left speaker placement, so the asymmetry of the listening space doesn't matter anymore. However, the resulting aggressive toe-in of the right speaker has narrowed the listening "sweet spot". Apparently, it's not easy to achieve a perfect setup under real life conditions.

Equalizing Speakers

From my previous measurements I knew that the quality of the speaker drivers used in the LXminis makes them well matched. However, my initial measurements showed some discrepancy which I wanted to correct:

I'm not a fan of excessive equalization—I believe that our brain is a much more powerful computer than any audio analyzer. But adding a couple of filters to correct for speaker placement seems reasonable here. In this case, I reduced the amplitude of one of the notch filters in the LXmini equalization and added a couple more filters:

Note that I didn't do anything below 50 Hz because I plan to use the subwoofer with the crossover frequency at 45 Hz.
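For reference, the filters involved here are ordinary parametric (peaking) biquads. Here is a sketch using the well-known RBJ Audio EQ Cookbook formulas; the center frequency, gain, and Q below are made-up illustration values, not my actual LXmini corrections:

```python
import numpy as np
from scipy.signal import freqz

def peaking_biquad(fs, f0, gain_db, q):
    """RBJ Audio EQ Cookbook peaking filter; negative gain gives a dip."""
    a = 10 ** (gain_db / 40)
    w0 = 2 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2 * q)
    b = np.array([1 + alpha * a, -2 * np.cos(w0), 1 - alpha * a])
    den = np.array([1 + alpha / a, -2 * np.cos(w0), 1 - alpha / a])
    return b / den[0], den / den[0]

fs = 48000
# Hypothetical correction: a -4 dB dip at 120 Hz with Q = 2.
b, a = peaking_biquad(fs, f0=120.0, gain_db=-4.0, q=2.0)
w, h = freqz(b, a, worN=[120.0], fs=fs)
print(round(20 * np.log10(abs(h[0])), 1))  # -4.0 dB at the center frequency
```

The same coefficients can be typed directly into a miniDSP or any other biquad-based EQ.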

Then I adjusted the KRK 10s, attenuating its output in the range of 30–60 Hz to relatively "boost" its output at 20 Hz. Here I used the filters suggested by Room EQ Wizard for the listening position:

Subwoofer Alignment in Time Domain

This was the most challenging part. I connected the subwoofer using a cascaded miniDSP 2x4 HD in the following way:

Additional processing delay, phase shifts, and asymmetric positioning together create a system which is challenging to analyze. Instead, I decided to apply the approach suggested by Dr. Ulrich Brüggemann, the author of the Acourate software. The procedure consists of the following steps:
  1. Capture the impulse response of the main speaker using Acourate without the subwoofer.
  2. Capture the impulse response of high-passed main speaker plus the subwoofer. The high frequency part of the response allows Acourate to align these IRs in time.
  3. Convolve both impulse responses with a sine wave from the overlapping region.
  4. By comparing the mutual offsets of the resulting sine waves at the initial transient and during the sustained period, deduce the time delay and a possible phase inversion.
As I've learned from my experience, aligning based on a single frequency in Step 3 may not provide the best results, as at low frequencies the phase and the group delay of speakers may fluctuate severely. So instead of using a single sine wave, I used a log sweep over the bass region. This doesn't provide data for aligning initial transients, but for bass frequencies I think the sustained stage is much more important.
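The sweep-based variant of the procedure can be sketched as follows: convolve both impulse responses with the same log sweep and estimate their mutual delay from the cross-correlation peak. The IRs here are toy single-impulse responses; real ones would come from Acourate or any other measurement tool:

```python
import numpy as np
from scipy.signal import chirp, fftconvolve

fs = 48000
t = np.arange(int(0.5 * fs)) / fs
sweep = chirp(t, f0=40, t1=t[-1], f1=100, method='logarithmic')

# Toy impulse responses: the sub path lags the main speaker by 2.5 ms.
delay = int(0.0025 * fs)                 # 120 samples
ir_main = np.zeros(1024); ir_main[0] = 1.0
ir_sub = np.zeros(1024); ir_sub[delay] = 1.0

a = fftconvolve(sweep, ir_main)
b = fftconvolve(sweep, ir_sub)

# The lag of the cross-correlation peak gives the mutual delay.
xcorr = fftconvolve(a, b[::-1])
lag = np.argmax(xcorr) - (len(b) - 1)
print("estimated delay: %.2f ms" % (-lag / fs * 1000))
```

With real measurements the correlation peak is less sharp, which is exactly why looking at the convolved waveforms themselves, as in the graphs below, is informative.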

Here is how the convolutions with a log sweep from 40 to 100 Hz looked initially for the left and right speakers:

The left graph is mostly aligned, while the right one shows a delay of the main speaker by 2.5 ms. It can be seen that even for the left speaker, the alignment in the low bass region is poorer than at higher frequencies. I don't consider that a problem because the contribution of the LXminis there is negligible. It's much more important to time align the region where both the sub and the LXminis can be heard together. It's also easy to see that if we attempted to use the crossover frequency (45 Hz) as the anchor point for time alignment, the speakers would be out of phase at higher frequencies, which would result in a "sagged" frequency response.

To avoid compromising the alignment of the left speaker, I decided to delay the sub by 1.25 ms, which improves the alignment for the right speaker but doesn't degrade it too much for the left one. Below are the graphs of the LXminis filtered with a Linkwitz-Riley 24 dB/oct crossover at 45 Hz, with the subwoofer added:
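As a sanity check of the crossover math: an LR4 (24 dB/oct) filter is two cascaded 2nd-order Butterworth sections, so each branch is 6 dB down at the crossover frequency, and the two branches sum flat when they are in phase. The 1.25 ms sub delay also happens to be an exact number of samples at 48 kHz:

```python
import numpy as np
from scipy.signal import butter, freqz

fs = 48000
fc = 45.0
# LR4 = two cascaded 2nd-order Butterworth sections, so the response at
# the crossover is the Butterworth -3 dB point squared.
b_lp, a_lp = butter(2, fc, btype='low', fs=fs)
w, h = freqz(b_lp, a_lp, worN=[fc], fs=fs)
lr4_lp = h[0] ** 2                           # cascading multiplies responses
print(round(20 * np.log10(abs(lr4_lp)), 1))  # -6.0 dB at 45 Hz
# The 1.25 ms subwoofer delay as an integer sample count:
print(int(round(0.00125 * fs)), "samples")   # 60 samples
```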

We can definitely see the extended bass range. You can also feel it :) I think setting the crossover point low gets the maximum fidelity from the LXminis + subwoofer combination.

With all this laborious setup done, it's time to enjoy music!

Thursday, September 12, 2019

AES Conference on Headphones, Part 2—ANC, MEMS, Manufacturing

I continue to share my notes from the recent AES Conference on Headphones. This post is about Active Noise Cancellation (ANC), Microelectromechanical (MEMS) technologies for speakers and microphones, and topics on headphones manufacturing, measurement, and modelling.

Active Noise Cancelling

Apparently, Active Noise Cancelling (ANC) is a big thing for the consumer market and is an interesting topic for research because it involves both acoustics and DSP. ANC technologies save our ears because they allow listening to music in noisy environments without blasting it at levels that damage hearing. Typically, ANC is implemented on closed headphones or earphones, as their physical construction attenuates some of the noise passively, especially at middle and high frequencies. Low frequency noise has to be eliminated using active technologies. Since this requires embedding electronics into headphones, even for wired models, it also gives headphone designers a good opportunity to add "sound enhancing" features like active equalization and crossfeed.

The obvious approach to active noise cancellation is to put a microphone on the outer side of the ear cup and generate, via the speaker, an inverse sound that cancels the noise at the eardrum. However, as there is always some leakage of the noise into the ear, "slow" or too aggressive noise inversion will create an unpleasant comb filter effect due to the summing of the noise with its delayed inverted copy.

An interesting idea that helps to win some time for more precise noise cancelling is to capture the noise from the ear cup on the opposite ear, since in that case the acoustic wave of the noise has to travel some extra distance around the head. However, as an engineer from Bose explained to me, the actual results depend on the orientation of the sound wave's plane with respect to the listener.

One consideration that has to be taken into account when generating inverse noise is avoiding the creation of high peaks in the inverse transfer function from notches in the original function. The process that helps avoid this is called "regularization". It is described in this AES paper from 2016.

Use of ANC puts additional requirements on the drivers used in the headphones. As low frequency noise needs the most attenuation, a high displacement speaker driver is required to produce adequate counter pressure. This typically requires increasing the size of the driver, which in turn increases distortion at higher frequencies. This paper contains an interesting analysis of these effects for two commercial drivers.

"Hear Through"

"Hear Through" is a commonly used term for the technology that presents environmental sounds to a listener wearing noise cancelling headphones. This is achieved by playing the sound captured by the outer microphone into the corresponding ear (basically, performing a real-time "dummy head" sound field capture, which I described in the previous post). The Sennheiser AMBEO headset and AKG N700NC headphones implement "hear through", though not perfectly in my experience—the external sound has some coloration and some localization problems. Although that doesn't affect the ability to understand speech, it still feels unnatural, and there is ongoing research into making "hear through" more transparent.

According to the study described in the paper "Study on Differences between Individualized and Non-Individualized Hear-Through Equalization...", there are two factors that affect the "transparency" of the played back sound. First, there is the fact that closed headphones act as a filter when interacting with the ear, and this filter has to be compensated. Second, there is the already mentioned sound leakage. Because "hear through" involves sound capture, processing, and playback, it has non-negligible latency that creates a comb filter with the original leaked noise. The researchers demonstrated that the use of personal "hear through" equalization (HT EQ, specific both to the person and the headphones) can achieve a very convincing reproduction. However, the acquisition of HT EQ parameters has to be performed in an anechoic room (similar to classic HRTF acquisition), and thus is not yet feasible for commercial products.

MEMS Technologies

I must admit, I was completely unaware of this acronym before I attended the conference. But it turns out this technology isn't something new. The portable computer you are using to read this article contains several MEMS microphones. The key point about this technology is that the resulting devices are miniature and can be produced using the same processes as integrated circuits (ICs). The resulting device is packaged into a surface mounted (SMD) component. The use of an IC process means huge production volumes are easily possible, and the variation between components is quite low.

Initially I thought that MEMS means piezoelectric technology, but in fact any existing transducer technology can be used for engineering MEMS speakers and microphones: electret, piezo, and even electrostatic as was demonstrated in the paper "Acoustic Validation of Electrostatic All-Silicon MEMS-Speakers".

MEMS microphones are ubiquitous. The biggest manufacturer is Knowles. For example, their popular model SPH1642HT5H-1, which has high SNR and low THD, costs less than $1 when bought in batches. Due to their miniature size, MEMS microphones are omnidirectional across the whole audio range. Because of this, I was wondering whether MEMS microphones can be used for acoustic measurements. It turns out researchers have been experimenting with them for this purpose since 2003 (see this IEEE paper). However, the only commercially available MEMS measurement microphone I could find—from IK Multimedia—doesn't seem to provide stellar performance.

Engineering a MEMS speaker is more challenging than a microphone due to the miniature size. Apparently, the output sound level decays very quickly, so currently they can't be used for laptop or phone speakers. The only practical application for MEMS speakers at the moment is in-ear headphones, where the pressure chamber effect in an occluded ear canal boosts their level a bit. A prototype of MEMS earphones was presented in the paper "Design and Electroacoustic Analysis of a Piezoelectric MEMS In-Ear Headphone". The earphone is very minimalist and could even be made DIY, since it basically consists of a small PCB with a soldered-on MEMS speaker and a 3D-printed enclosure. The performance isn't satisfying yet, but there is definitely some potential.

Headphones Manufacturing, Measurement, and Modelling

This is a collection of notes that I've gathered from the workshop on "Practical Considerations of Commercially Viable Headphones" (a "workshop" format means that there was no paper submitted to AES), my chats with engineers from headphone companies, and conversations with the representatives of measurement equipment companies.

Speaker driver materials

The triangle of driver diaphragm properties:
Low mass results in better sensitivity, as less force is required to move the diaphragm. Good mechanical damping is needed for reproducing transients truthfully and without post-ringing. And the higher the rigidity, the more the diaphragm resembles a theoretical "piston", and thus the lower its distortion.

In practice, it is hard to satisfy all of these properties at once. For example, the classical paper cone diaphragm has good rigidity but high mass. Rigid metal diaphragms can lack good damping and "ring". I would also add a fourth dimension here—the price. There are some diaphragms on the market that satisfy all three properties, but they are very expensive due to the use of precision materials and a complicated manufacturing process.

Driver diaphragms for headphone speakers are typically produced from various polymers as they can be stamped easily. In terms of the resulting diaphragm quality, the following arrangement has been presented, from worst to best:
However, it looks like even better results are achieved with beryllium foil (used by Focal company), but these diaphragms are quite expensive.

If we step away from dynamic drivers, planar magnetic drivers are very well damped, have a lightweight diaphragm, and move as a plane. The problem with their production is a high defect rate: each speaker has to be checked individually, which is why they are mostly used in expensive hi-end headphones. Big companies like AKG, Beyerdynamic, Sennheiser, and Shure use classic dynamic drivers even in their flagship models.

Measurements and Tuning

Regarding the equipment, here is a measurement rig for over-ear and in-ear headphones from Audio Precision. It consists of APx555 Audio Analyzer, APx1701 Test Interface (basically a high-quality wide bandwidth amplifier), and AECM206 Test Fixture to put the headphones on.

The APx555 is modular. The one in the picture is equipped with a Bluetooth module, and I've been told that it supports HD audio codecs: AAC, aptX, and LDAC.

Besides AP's AECM206 test fixture, a full head and torso simulator (HATS), e.g. KEMAR from GRAS can be used. For earphone measurements it is sufficient to use an ear simulator as earphones do not interact with the pinna.

The companies Brüel & Kjær and Listen Inc also presented their measurement rigs and software. Prices for this equipment are on the order of tens of thousands of dollars, which is expected.

Measuring the headphones correctly is a challenging problem. There is a nice summary in these slides, courtesy of CJS Labs. The resulting frequency response curves can vary due to variations in placement of the headphones on the fixture. Usually, multiple trials are required with re-positioning of the headphones and averaging.

When measuring distortion, the first requirement is to perform the measurement in quiet conditions so that external noise doesn't affect the results. Second, since measurement microphones are typically band-limited to the audio frequency range, THD at high frequencies can't be measured adequately using the single tone method: the harmonics fall outside the microphone's range. Instead, non-linearity is measured using the two tone method (intermodulation distortion).
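A toy numerical illustration of why the two tone method works: the difference-order products of two high-frequency tones fall back inside the measurable band, while the harmonics would not. The cubic "driver" model and the levels below are arbitrary:

```python
import numpy as np

fs = 96000
n = fs                          # 1 second -> 1 Hz bin resolution
t = np.arange(n) / fs
f1, f2 = 18000.0, 19000.0       # two-tone stimulus near the top of the band
x = 0.5 * np.sin(2 * np.pi * f1 * t) + 0.5 * np.sin(2 * np.pi * f2 * t)

# Toy weakly nonlinear "driver": small quadratic and cubic terms.
y = x + 0.01 * x**2 + 0.001 * x**3

spec = np.abs(np.fft.rfft(y * np.hanning(n)))
freqs = np.fft.rfftfreq(n, 1 / fs)

def level_db(f):
    """Level of the strongest bin near f, relative to the fundamental."""
    i = np.argmin(np.abs(freqs - f))
    return 20 * np.log10(spec[i - 2:i + 3].max() / spec.max())

# The difference products land inside the band, while the harmonics
# (36 and 38 kHz) would be beyond a typical measurement microphone.
print("2nd-order IMD at f2-f1:", round(level_db(f2 - f1)), "dB")
print("3rd-order IMD at 2*f1-f2:", round(level_db(2 * f1 - f2)), "dB")
```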

The usual "theoretical" target for headphones is to imitate the sound of good (linear) stereo loudspeakers. The deviation of the headphones' frequency response from the response recorded from loudspeakers using a HATS is called "insertion gain". Ideally, it should be flat. However, listening to speakers can happen under different conditions: the extremes are free field and diffuse field. So the real insertion gain of headphones is never flat, and it is usually tweaked according to the taste of the headphones designer.

There is one interesting effect which occurs when using closed-back headphones or earphones. Due to ear canal occlusion, the sound level from headphones must be approximately 6 dB higher to create the same perceived loudness as a loudspeaker. This is called the “Missing 6 dB Effect”, and a full description can be found in this paper. Interestingly, the use of ANC could help reduce the effects of occlusion; see the paper "The Effect of Active Noise Cancellation on the Acoustic Impedance of Headphones", which was presented at the conference.

Speaking of ANC, measuring its performance is yet another challenge due to the absence of industry-wide standards. This is explained in the paper "Objective Measurements of Headphone Active Noise Cancelation Performance".

Modelling and Research

Thanks to one of the attendees of the conference, I've learned about the works of Carl Poldy (he used to work at AKG Acoustics, then at Philips), for example his AES seminar from 2006 on Headphone Fundamentals. It provides classical modelling approaches using electrical circuits and the two-port ABCD model. The two-port model can be used for simulation in the frequency domain. Time domain simulation can be done using SPICE; see this publication by Mark Kahrs.

However, these modelling approaches are of a more "academic" nature. "Practical" modelling was presented by representatives of the COMSOL company. Their Multiphysics software can simulate the creation of acoustic waves inside the headphones and how they travel through the ear's acoustic canals and bones. This was quite impressive.

Another interesting paper related to research, "A one-size-fits-all earpiece with multiple microphones and drivers for hearing device research" presents a device that can be used in hearables research. It consists of an ear capsule with two dynamic drivers and two microphones. It is called "Transparent earpiece", more details are available here.

Thursday, September 5, 2019

AES Conference on Headphones, Part 1—Binaural Audio

I was lucky to attend the AES Conference on Headphones held in San Francisco on August 27–29. The conference represents an interesting mix of research, technologies, and commercial products.
I learned a lot of new things and was happy to have conversations with both researchers and representatives of commercial companies that produce headphones and other audio equipment.

There were several main trends at the conference:
  • Binaural audio for VR and AR applications
    • HRTF acquisition, HCF
    • Augmented reality
    • Capturing of sound fields
  • Active noise cancellation
    • "Hear through"
  • MEMS technologies for speakers and microphones
    • Earphones and research devices based on MEMS
  • Headphone production: modeling, materials, testing, and measurements.
In this post I'm publishing my notes on binaural audio topics.

"Binaural audio" here means "true to life" rendering of spatial sounds in headphones. The task here is as follows—using headphones (or earphones) produce exactly the same sound pressure on the eardrums as if it was from a real object emitting sound waves from a certain position in a given environment. It is presumed that by doing so we will trick the brain into believing that this virtual sound source is real.
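Computationally, the core of binaural rendering reduces to convolving a dry source signal with a pair of head-related impulse responses (HRIRs), one per ear. Here is a minimal sketch with a fake HRIR pair; a real one would be measured or taken from an HRTF database (e.g. a SOFA file):

```python
import numpy as np
from scipy.signal import fftconvolve

fs = 48000
# Hypothetical HRIR pair for one source direction (values are made up).
hrir_left = np.zeros(256); hrir_left[10] = 0.9    # nearer ear: earlier, louder
hrir_right = np.zeros(256); hrir_right[40] = 0.5  # farther ear: ITD and ILD

mono = np.random.default_rng(0).standard_normal(fs)  # 1 s dry "source"

# Binaural rendering: convolve the dry signal with each ear's HRIR.
out = np.stack([fftconvolve(mono, hrir_left),
                fftconvolve(mono, hrir_right)], axis=1)
print(out.shape)  # a stereo signal ready for headphone playback
```

Real HRIRs of course encode much more than the interaural time and level differences sketched here, notably the pinna filtering that resolves up/down and front/back directions.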

And this task is not easy! When using loudspeakers, commercially available technologies usually require multiple speakers located all around the listener. The produced sound waves interact with the listener's body and ears, which helps the listener determine the positions of virtual sound sources. While implementing convincing surround systems is still far from trivial, anyone who has ever visited a Dolby Atmos theater can confirm that the results sound plausible.

HRTF Acquisition, HCF

When a person is using headphones, there is only one speaker per ear. The speakers are positioned close to the ears (or even inside the ear canals), so the sound waves skip the interaction with the body and pinnae. In order to render a correct binaural representation, a personal Head-related Transfer Function (HRTF) is needed. Traditional approaches to HRTF acquisition require half-spheres with speakers mounted around the person, or moving arcs with speakers. The acquisition is done in an anechoic room, with measurement microphones inserted into the person's ear canals.

Apparently, this is not a viable approach for the consumer market. The HRTF needs to be acquired quickly and under "normal" (home) conditions. There are several approaches that propose alternatives to the traditional methods, namely:
  • 3D scanning of the person's body using consumer equipment, e.g. Xbox Kinect;
  • AI-based approach that uses photos of the person's body and ears;
  • self-movement of a person before a speaker in a domestic setting, wearing some kind of earphones with microphones on them.
At the conference there were presentations and demos from Earfish and Visisonics. These projects are still in the stage of active research and invite individuals to try them in order to gather more data. Speaking of research, while talking with one of the participants I learned about the structural decomposition of HRTF, where the transfer function is split into separate parts for the head, torso, and pinnae, which are then combined linearly. This results in simpler transfer functions and shorter filters.

There was an interesting observation mentioned by several participants: people can adapt to an “alien” HRTF after some time and even switch back and forth. This is why research on HRTF compensation is difficult—researchers often get used to a model even if it represents their own HRTF incorrectly, and always have to ask somebody unrelated to check the model (a similar problem exists in lossy codec research—people train themselves to look for specific artifacts but might miss some obvious audio degradation). There is also a difficulty due to the room divergence effect: when sounds are played via headphones in the same room they were recorded in, they are perfectly localizable, but localization breaks down when the same sounds are played in a different room.

Using a “generic” HRTF is also possible, but in order to minimize front / back confusion, head tracking is required. Without head tracking, the use of long (RT60 > 1.5 s) reverberation can help.

But knowing the person's HRTF constitutes only one half of the binaural reproduction problem. Even assuming that a personal HRTF has been acquired, it's still impossible to create exact acoustic pressure on eardrums without taking into account the headphones used for reproduction. Unlike speakers, headphones are not designed to have a flat frequency response. Commercial headphones are designed to recreate the experience of listening over speakers, and their frequency response curve is designed to be close to one of the following curves:
  • free field (anechoic) listening environment (this is less and less used);
  • diffuse field listening environment;
  • "Harman curve" designed by S. Olive and T. Welti (gaining more popularity).
And the actual curve is often neither of those, but rather tuned to the taste of the headphone designer. Moreover, the actual pressure on the eardrums depends on the person wearing the headphones, due to the interaction of the headphones with the pinnae and the ear canal resonance.

Thus, complementary to the HRTF is the Headphone Compensation Function (HCF), which "neutralizes" the headphone transfer function and makes the headphone frequency response flat. Like the HRTF, the HCF can be either "generic" (measured on a dummy head) or individualized for a particular person.

The research described in the "Personalized Evaluation of Personalized BRIRs..." paper explores whether the use of an individual HRTF and HCF results in better externalization, localization, and absence of coloration for sound reproduced binaurally in headphones, compared to real sound from a speaker. The results confirm this; however, even with a "generic" HRTF it's possible to achieve a convincing result if it's paired with a "generic" HCF from the same dummy head. It turns out it's not a good idea to mix individualized and generic transfer functions.

Speaking of commercial applications of binaural audio, there was a presentation and a demo of how Dolby Atmos can be used for binaural rendering. Recently a recording of Henry Brant's symphony “Ice Field” was released on HD streaming services as a binaural record (for listening with headphones). The symphony was recorded using 100 microphones and then mixed using Dolby Atmos production tools.

It seems that the actual challenge while making this recording was arranging the microphones and mixing all those 100 individual tracks. The rendered "3D image", in my opinion, isn't very convincing. Unfortunately, Dolby does not disclose the details of the Atmos for Headphones implementation, so it's hard to tell what listening conditions (e.g. what type of headphones) they target.

Augmented Reality

Implementing audio for augmented reality (AR) is even more challenging than for virtual reality (VR), as the presented sounds must not only be positioned correctly but also blend with environmental sounds and respect the acoustical conditions (e.g. the reverberation of the room, and the presence of objects that block, reflect, or diffract sound). That means an ideal AR system must somehow "scan" the room to find out its acoustical properties, and keep doing so during the entire AR session.

Another challenge is that AR requires very low latency: less than 30 ms between a human's expectation and the sound being presented. The expectation is tricky to define, as it can come from different sources: a virtual rendering of an object in AR glasses, or a sound captured from a real object. Similarly to how a video AR system can display virtual walls surrounding a person and might need to modify a captured image for proper shading, an audio AR system would need to capture the sound of a voice coming from that person, process it, and render it with the reverberation from those virtual walls.

There was an interesting AR demo presented by Magic Leap using their AR glasses (Magic Leap One) and the Sennheiser AMBEO headset. In the demo, the participant could “manipulate” virtual and recorded sound sources, which also had AR representations as pulsating geometrical figures.

Another example of an AR processing application is “active hearing”, that is, boosting certain sound sources, analogous to the cocktail party effect performed by the human brain, but done artificially. In order to make that possible, the sound field must first be "parsed" by AI into sound sources localized in space. Convolutional Neural Networks can do that from recordings made by arrays of microphones, or even from binaural recordings.

Capturing of Sound Fields

This means recording environmental sounds so they can be reproduced later, recreating the original environment as closely as possible. The capture can serve several purposes:
  • consumer scenarios—accompanying your photos or videos from vacation with realistic sound recordings from the place;
  • AR and VR—use of captured environmental sounds for boosting an impression of "being there" in a simulation;
  • acoustic assessments—capturing noise inside a moving car or acoustics of the room for further analysis;
  • audio devices testing—when making active audio equipment (smart speakers, noise cancelling headphones etc) it's important to be able to test it in a real audio environment: home, street, subway, without actually taking the device out of the lab.

The most straightforward approach is to use a dummy or real head with a headset that has microphones on it. Consumer-oriented equipment is affordable—the Sennheiser AMBEO Headset costs about $200—but it usually has a low dynamic range and higher distortion levels that can affect sound localization. Professional equipment costs much more—the Brüel & Kjær type 4101-B binaural headset costs about $5000, and that doesn't include a microphone preamp, so the entire rig would cost as much as a budget car.

HEAD Acoustics offers an interesting solution called 3PASS where a binaural recording captured using their microphone surround array can later be reproduced on a multi-loudspeaker system in a lab. This is the equipment that can be used for audio devices testing. The microphone array looks like a headcrab. Here is the picture from HEAD's flyer:
When doing a sound field capture with a binaural microphone, the resulting recording can't be further transformed (e.g. rotated), which limits its applicability in AR and VR scenarios. For these, the sound field must be captured using an ambisonics microphone. In this case the recording is decomposed into spherical harmonics and can be further manipulated in space. Sennheiser offers the AMBEO VR Mic for this; it costs $1300, but the plugins for the initial audio processing are free.
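This manipulability is what distinguishes an ambisonics capture from a binaural one. For first-order B-format, for instance, rotating the whole sound field about the vertical axis is just a 2x2 rotation of the X and Y channels; a sketch:

```python
import numpy as np

def rotate_bformat_z(w, x, y, z, theta):
    """Rotate a first-order B-format recording about the vertical axis.
    W (omni) and Z (vertical) are unaffected; X and Y rotate as a pair."""
    xr = x * np.cos(theta) - y * np.sin(theta)
    yr = x * np.sin(theta) + y * np.cos(theta)
    return w, xr, yr, z

# A source encoded straight ahead puts its signal into the X channel.
sig = np.ones(4)
w, x, y, z = 0.707 * sig, sig.copy(), np.zeros(4), np.zeros(4)
w2, x2, y2, z2 = rotate_bformat_z(w, x, y, z, np.pi / 2)
print(np.allclose(x2, 0), np.allclose(y2, sig))  # the source is now at 90°
```

Nothing comparable is possible with a two-channel binaural recording, where the head's filtering is already baked in.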

Friday, May 31, 2019

Measuring Bridged and "Balanced" Amplifier Outputs

For a long time this topic was troubling me—how to measure bridged mode amplifiers properly. The problem is that without taking precautions it's possible to end up with an amp ruined by a short circuit. I think I've now got enough understanding of this matter and obtained some interesting results by measuring one of the amps I use.

Bridged Mode of Power Amplifiers

A lot of the commercial stereo amplifiers I've seen have a "bridged mode" feature which turns the unit into a mono amplifier of higher power. E.g. on my Monoprice Unity amplifier, one needs to set the mode switch accordingly, connect the "+" wire of the speaker to the right "+" output, and the "-" wire of the speaker to the left "-" output. Obviously, only one input (left) is used in this case.

This mode is implemented in the amplifier by dedicating each of the channels to one wire of the load, and inverting the input to one of the amplifiers. Schematically, it looks like this:

This configuration doubles the voltage across the load compared to regular stereo mode. In theory, this would result in a 4x power increase into the same load, but in reality, due to various losses, it's usually only a bit higher than 3x. For example, the Monoprice Unity 100W amp is specified as delivering 50 W/channel into an 8 Ohm load in stereo mode, and 120 W into the same load when bridged, a 2.4x ratio. The exemplarily engineered AHB2 amplifier from Benchmark offers a much higher increase of 3.8x into the same load when bridged.
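To sanity-check these figures, the ideal bridged-mode gain follows directly from P = V² / R: doubling the voltage across the same load quadruples the power. Here is a minimal sketch (the function names are mine, not from any measurement software):

```cpp
// Power dissipated in a resistive load: P = V^2 / R.
double power_watts(double v_rms, double load_ohms) {
  return v_rms * v_rms / load_ohms;
}

// Ideal bridged-to-stereo power ratio for the same load: bridging
// doubles the voltage across the load, so the ratio is 4x. Real
// amplifiers achieve less (2.4x for the Monoprice Unity, 3.8x for
// the Benchmark AHB2) due to losses into the halved per-channel load.
double ideal_bridged_ratio(double v_rms, double load_ohms) {
  return power_watts(2 * v_rms, load_ohms) / power_watts(v_rms, load_ohms);
}
```

For instance, 20 Vrms into 8 Ohm gives 50 W per channel, so the ideal bridged figure would be 200 W; the specified 120 W shows how far a real amp falls short of the theoretical 4x.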

However, the bridged configuration can potentially add more distortion because each channel effectively "sees" half the load impedance (e.g. 4 Ohm if an 8 Ohm speaker is connected). Thus, it would be interesting to measure the difference in distortion between bridged and regular mode. But here is the catch: the "-" wire of the load is now connected to the second amplifier's output. We can't connect it to the signal ground of an audio analyzer anymore, as this would short-circuit the amplifier.

Here is why it happens. Normally, the ground plane of the input audio signal is the same as the ground plane of the output. When using an audio analyzer, this allows directly comparing the input signal from the signal generator to the output:
However, in the bridged configuration the zero voltage point (reference potential) for amp's output is virtual and located "in between" the terminals of the load:
The same situation can be encountered with Class-D amplifiers designed for maximum efficiency. In this case, the so-called H-bridge configuration is used. That means these amplifiers do not offer a "single-ended" mode at all and always run bridged. Not every Class-D amp uses an H-bridge, but measurements for this class of amplifiers must be done with caution.

"Balanced" and "Active Ground" Headphone Amplifiers

And we encounter the same problem when we want to measure a headphone amplifier with "balanced" or "active ground" output. Note that the implementation of "balanced" output may vary—in the simplest case it only means that left and right outputs do not share the ground point. This is done to reduce channel crosstalk that occurs due to common-impedance coupling. In this case there is no additional amplifier on the "-" wire, and thus connecting it to the ground of the analyzer input does not cause any issues.

However, if "balanced" headphone output means "doubled circuitry" (essentially, this is the same as "bridging" for a power amplifier), or if the ground channel has a dedicated amplifier path, as in the AMB M3 amplifier (this is called "active ground"), then we must avoid connecting the ground of the output to the ground of the analyzer input.

Measurement Techniques

Since we must avoid connecting the ground of the output to the ground of the input, the simplest solution would be to leave the second wire of the output "floating" and only connect the "+" wire to the signal input of the analyzer. That's what I used myself in the past. In this case, the analyzer still uses the input ground as a reference. The result might be off due to the difference in levels between the "virtual ground" point in the middle of the load and the input ground.

For example, I created a symmetric load consisting of two 4 Ohm resistors. In this case, theoretically there is a 0 V point right between them. In practice, the measured difference between the potentials of the output and input grounds was 0.35 V. That means, it's better to avoid connecting them because this voltage will induce current into the input ground.

However, it's possible to use a second, floating analyzer unit on the output. We can use a battery-powered voltmeter for measuring the voltage across the load, right? The same way, we can use a full analyzer, as long as it's not connected to the input. This way, the analyzer on the output measures the output voltage relative to the output ground, which gives correct results. But operating two analyzers (one for generating signals and another for measuring the output) can be cumbersome.

Also, what if we can't split the load, e.g. if we are using a real speaker instead of a resistor load? In this case we need to make a differential measurement. For oscilloscopes, there are special probes for this purpose. QuantAsylum QA401 has differential inputs (marked "+" and "-"). We need to connect one side of the load to the "+" input wire and the other to the "-", leaving the input ground floating. That's OK because the ground is not used as a signal reference anymore. Here is how the wiring looks:
Another advantage of a differential input is that any common-mode noise on the probes gets cancelled. I often noticed a 60 Hz spike in single-ended measurements, but it disappeared immediately after I switched to the differential input, with the same amp, same probes, and same connections. That means the 60 Hz hum is induced into the probes' wires by electromagnetic fields from nearby mains wiring.

Measuring Monoprice Unity 100W Amp

As a practical exercise, I've measured THD and IMD of the Monoprice Unity 100W Class-D amplifier. It does not use an H-bridge configuration, which means that in stereo mode the channels are driven single-ended and the "-" wire of the speaker is at the input ground plane's potential.

Bridged mode into 8 Ohm load, differential measurement

First I set the amp to maximum volume and checked with a true RMS voltmeter the potential difference across an 8 Ohm load while driving the input with a 1 kHz sine wave at -10 dBV (the nominal consumer line level). The voltmeter was showing 19.55 Vrms. Note that the resulting power value (from the V² / R formula) is ~48 W, which is less than half of the 120 W specified in the amp's manual (perhaps the manufacturer was using a higher input signal level). However, these levels seem right to me; in fact, I don't usually run the amp even at maximum volume.

But even that output level is close to QA401's input voltage limit (20 Vrms), so I decided to use a split load (2 x 4 Ohm resistors in series) and lowered the input signal to -12 dBV. This got me 14.47 Vrms across the 8 Ohm load, which is a mere 26 W. Over the same load, a differential measurement with QA401 shows a 23 dBV peak (which agrees with the Vrms figure), and with the load specified as 8 Ohm, QA401 also shows 25 W output power. Nice.

I also tried measuring with QA401 over half the load (4 Ohm). The peak was now 17 dBV (7 Vrms, half of the full-load voltage), so I had to specify the load in QA401 as 2 Ohm in order to get the same 25 W figure.

Here is what I saw in terms of THD and IMD:

Definitely not outstanding results, especially if we consider that this is at less than 1/4 of the advertised power. One particularly interesting issue is the amount of ultrasonic noise on the IMD measurement. I suppose, this is caused by the fact that this amp uses a weak anti-aliasing filter, as we can see from its frequency response measurement:

The graph is quite fuzzy due to the amplifier's non-linearity, but we can still clearly see that the downward slope on the right is very gentle. This could be a good property for a Class-A or Class-AB amplifier, but since Class-D effectively applies sampling to the input signal, the output had better be treated with a brick-wall filter.

Single-ended mode into 8 Ohm load

I tried to achieve the same modest 25 W into an 8 Ohm load (remember that the manual states the amp outputs 50 W into 8 Ohm in single-ended configuration), however with the volume at maximum the voltmeter reading was only 10.45 Vrms, which is less than 14 W of output power. I increased the input signal level to the nominal -10 dBV, which got me about 22 W. And even at this lower power, THD doubled compared to bridged mode, and the dual-tone signal for IMD was overloading the amplifier, so I had to cut the IMD input back to -12 dBV (and it still seemed to overload).


Conclusions

1. Bridged amplifiers can be measured properly using the differential mode of the QuantAsylum QA401 analyzer. If the output voltage is too large, the load can be split to reduce it. Necessary corrections have to be applied if we want QA401 to display proper power figures. It's always possible to double-check the results using a true RMS voltmeter.

2. Differential measurement also helps to defeat noise induced into the probe wires by electromagnetic fields, especially the notorious 60 Hz hum.

3. The performance of the Monoprice Unity 100W amp in single-ended mode is quite bad. For driving an 8 Ohm load I would prefer using it in bridged mode.

4. And this result was contrary to my expectations: on this amplifier, bridged mode driven at lower levels has much less distortion than single-ended mode at nominal level. That's why it's always better to measure first.

Tuesday, March 5, 2019

Amp & DAC box for LXmini (miniBox)

I'm rebuilding my audio system around RME Fireface UCX. According to numerous user posts, RME devices are very reliable, so I'm hoping to achieve overall better stability than with MOTU Ultralite AVB which goes crazy approximately every three weeks. Personally I would tolerate this, but since the system is used by my entire family, hearing from them that "sound is not working AGAIN" has become somewhat annoying.

My surround system is a 4.1 configuration. As I've explained in my post about LXmini, their super-stable and super-focused phantom center eliminates the need for a center speaker in the surround configuration. I use LXmini both for the front pair and for the surround pair. The surround pair is about 4 meters away from the audio electronics, and that's the shortest path; the speaker cables run along the wall and would be perhaps twice as long. This is why, together with the second pair of LXminis, I had built a dedicated amplifier and DAC box located close to them.

While I'm rebuilding the main system, I've made a small upgrade to this surround box as well. The main change is that I've connected it using a digital SPDIF link, and started using Neutrik SpeakON ports for connecting LXminis. Overall, the box now looks more like a complete system on its own, so I decided to do a quick post about it.

Here is what I've got in it:
It's a 2U half-rack enclosure by All Metal Parts company in the UK. On the lower level I put the QSC SPA4-100 amplifier—I use it for the front pair of LXminis, too. Then there is miniDSP 2x4 HD (one of the options recommended by S. Linkwitz), and an inexpensive SPDIF coaxial to TOSLink optical converter by Monoprice.

This is what I put on the rear side:
Here we have a pair of 4-pole SpeakON connectors for the speakers, and an SPDIF input (RCA). All the power inputs are connected directly: the standard 3-prong AC receptacle on the back of the amplifier, and power adapters from miniDSP and the SPDIF converter. I've also provided an USB wire to miniDSP for the purposes of tuning and diagnostics.

Here is a diagram of the device:
Now the obvious question: why did I choose a digital connection, and specifically coaxial? A digital connection requires only one wire, and since miniDSP is then connected using optical TOSLink, there is complete electrical isolation between the DAC / amp circuitry of the miniBox and the main system. Then why didn't I just run a TOSLink cable directly? Mainly because optical cables require a bit more care at corners. A high quality, low impedance coaxial cable seems like a more robust option.

The usual concern with using SPDIF is jitter. Would the noise in the cable create more jitter? My particular concern was also due to the fact that the Monoprice coax to optical converter does not isolate the shield of the coaxial input from the ground. This means the shield of the cable connects the ground planes of audio devices plugged into different power outlets: a perfect opportunity for ground loop-induced noise to occur. Although the coaxial cable I use (Belden 8241) has very low shield resistance, some noise voltage can still be added to the digital signal.

To check whether there is a real problem, I ran a jitter test (at 48 kHz, 24-bit resolution) in ARTA. First using miniDSP's USB input, then using TOSLink connected directly to Fireface's optical output, and finally in the actual working condition where Fireface is connected to another power outlet, with a 5.5 meter coaxial cable running from Fireface's coax output to the miniBox. The resulting spectrum, measured using QuantAsylum QA401 on the outputs of miniDSP, was always the same:
Jitter is minimal and does not depend on how the digital signal gets delivered to miniDSP. Note that similar results for USB and TOSLink input are published at Audio Science Review forum. So I think there is no reason to worry about jitter here.

One more discovery I've made is that the Monoprice coaxial to TOSLink converter I use only supports sampling rates up to 48 kHz. Not a problem for me, because this is the sampling rate I intend to run the Fireface at, so it can accept an optical input from the Xbox directly. However, should I decide to increase the sampling rate, I would need to look for a different converter.

Saturday, February 9, 2019

rvalue references and move semantics in C++11, Part 2

Continued from Part 1.

Previously we have seen that the compiler can match functions accepting rvalue reference arguments to temporary values. But as we have discussed before, it also helps for program efficiency to avoid copying when the caller abandons ownership of an object. In this case, we need to treat a local variable as if it were a temporary. And this is exactly what std::move function is for.

On the slide example, there are two overloads for the function "foo": one for lvalue references, and one for rvalue references. The calling code prepares a string, and then transfers ownership to the callee by calling std::move on the variable. The agreement is that the caller must not use this variable after calling std::move on it. The object that has been moved from remains in a valid, but not specified state.
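Since the slides themselves aren't reproduced in this post, here is a hypothetical reconstruction of the "foo" example (the names and return values are mine, added for illustration):

```cpp
#include <string>
#include <utility>

// Two overloads: one binds to lvalues, one to rvalues (temporaries and
// explicitly moved-from values).
inline const char* foo(const std::string&) { return "lvalue: caller keeps ownership"; }
inline const char* foo(std::string&&) { return "rvalue: callee may steal the contents"; }

inline const char* call_foo() {
  std::string s = "payload";
  foo(s);                    // picks the const& overload: 's' is an lvalue
  return foo(std::move(s));  // picks the && overload: ownership transferred
  // 's' must not be used below this point: valid but unspecified state
}
```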

Roughly, calling std::move on a value is equivalent to doing a static cast to rvalue reference type. But there are subtle differences with regard to the lifetime of returned temporaries stemming from the fact that std::move is a function (kudos to Jorg Brown for this example). Also, it's more convenient to call std::move because the compiler infers the return type from the type of the argument.
There are well known caveats with the usage of std::move.

First, remember that std::move does not move any data; it's effectively a type cast. It is used as an indication that the value will not be used after the call to std::move.

Second, the practice of making objects immutable using the const keyword interferes with the usage of std::move. As we remember, an rvalue reference is a writable reference that binds to temporary objects. Writability is an important property of rvalue references that distinguishes them from const lvalue references. Thus, an rvalue reference to an immutable value is rarely useful.
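A small illustration of this pitfall (my own example, not from the slides): std::move applied to a const object yields a const rvalue reference, which can't bind to the move constructor, so the copy constructor is silently chosen instead.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Compiles without errors, but copies: const std::string&& cannot bind
// to string's move constructor (which takes a non-const std::string&&).
inline std::string take(const std::string& src) {
  return std::move(src);  // a copy in disguise
}

static_assert(std::is_same<decltype(std::move(std::declval<const std::string&>())),
                           const std::string&&>::value,
              "std::move preserves constness");
```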
This one I see a lot. Remember, we said that expressions have two attributes: the type and the value category, and that an expression which names a variable has the lvalue category. Thus, if you are assigning an rvalue reference to something, or passing it to another function, the expression has the lvalue category and thus uses copy semantics.

This in fact makes sense, because after getting some object via an rvalue reference, we can make as many copies of it as we need. And then, as the last operation, use std::move to steal its value for another object.
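Sketching this caveat in code (a hypothetical function of my own):

```cpp
#include <string>
#include <utility>

// 'v' is declared as an rvalue reference, but the expression 'v' is an
// lvalue inside the function body, so plain uses of it copy.
inline std::pair<std::string, std::string> consume(std::string&& v) {
  std::string copy = v;               // copy: 'v' is an lvalue expression
  std::string stolen = std::move(v);  // the last use moves the value out
  return {copy, stolen};
}
```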
And finally: sometimes people want to "help" the compiler by telling it that they don't need the value they are returning from a function. There is absolutely no need for that. As we have seen earlier, copy elision and RVO have been in place since C++98.

Moreover, since calling std::move changes the type of the value, a not very sophisticated compiler can call a copy constructor because now the type of the function return value and the type of the actual value we are returning do not match. Newer compilers emit a warning about return value pessimization, or even optimize out the call to std::move, but it's better not to do this in the first place.
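A hedged sketch of both forms (the vector payload is arbitrary):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Correct: returning a local by value allows NRVO, and if elision
// doesn't happen, C++11 moves the vector anyway.
inline std::vector<int> make_good(std::size_t n) {
  std::vector<int> v(n, 42);
  return v;
}

// Anti-pattern: std::move blocks copy elision; modern compilers warn
// about this ("pessimizing move"). Shown here only for illustration.
inline std::vector<int> make_pessimized(std::size_t n) {
  std::vector<int> v(n, 42);
  return std::move(v);
}
```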
We have discussed how the use of move semantics improves the efficiency of the code by avoiding unnecessary data copying. Does it mean that simply recompiling the same code with support for C++11 and newer standards enabled would improve its performance?

The answer depends on the actual codebase we are dealing with. If the code mostly uses simple PODs aggregated from C++ standard library types, and the containers are from the standard library too, then yes, there can be performance improvements. This is because the containers will use move semantics where possible, and the compiler will be able to add move constructors and move assignment operators to user types automatically.

But if the code uses POD types aggregating primitive values, or homebrew containers, then performance will not improve. There is a lot of work to be done on the code in order to employ the benefits of move semantics.
In order to consider the changes between C++98 and C++11 in more detail, I would like to bring up the question of efficient parameter passing practices. Those who programmed in C++ for long enough know these conventions by heart.

On the horizontal axis, we roughly divide all the types we deal with into three groups:
  - cheap to copy values—small enough to fit into CPU's registers;
  - not very expensive to copy, like strings or POD structures;
  - obviously expensive to copy, like arrays; polymorphic types for which passing by value can result in object slicing are also in this category.

On the vertical axis, we consider how the value is used by the function: as an explicitly returned result, as a read only value, or as a modifiable parameter.

There is not much to comment here except for the fact that unlike the C++ standard library, the Google C++ coding style prohibits use of writable references for in/out function arguments in order to avoid confusion.
What changes with C++11? Not much! The first difference is obvious—C++11 gets move semantics, thus functions can be overloaded for taking rvalue references.

The second change is due to introduction of move-only types, like std::unique_ptr. These have to be passed the same way as cheap to copy types—by value.

Then, instead of considering whether a type is expensive to copy, we need to consider whether it is expensive to move. This brings the std::vector type into the second category.

Finally, for returning expensive to move types, consider wrapping them into a smart pointer instead of returning a raw pointer.
As a demonstration of why it's now efficient to pass vectors by value, let's consider the case when copy elision is not applicable due to dynamic choice of returned value.

In C++98, this code results in copying the contents of the local vector, this is obviously inefficient.
However, a C++11 compiler will call a move constructor in this case. This demonstrates why a type with efficient move implementation can always be returned by value from functions.
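A sketch of such a function (my example; the slide's exact code isn't shown here):

```cpp
#include <vector>

// Which local is returned is decided at run time, so NRVO cannot apply.
// A C++98 compiler copies the chosen vector; a C++11 compiler moves it,
// because a returned local variable is treated as an rvalue first.
inline std::vector<int> pick(bool first) {
  std::vector<int> a(1000, 1);
  std::vector<int> b(1000, 2);
  if (first) return a;  // moved, not copied, in C++11
  return b;
}
```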
Let's return to our slides explaining different ways of passing data between functions. In C++98, it makes no difference for the caller how the callee will use a parameter passed by a const reference. For the caller, it only matters that it owns this data and the callee will not change it.

If we consider the callee implementation, if it's possible for it not to make a copy of the data, it's likely a different action or algorithm from the one that requires making a copy. I've highlighted this difference on the slide by giving the callee functions different names.

And we have no performance problems with the function that doesn't need to make a copy—function B. It's already efficient.
C++11 can help us with the second case. Now we can prepare two versions of the callee: one for the case when the caller can't disown the data, and one for the case when it can. Clearly, these callers are different now, that's why they are given different names on the slide.

In the first case we still need to make a copy, but in the second case we can move from the value that the caller abandons. It's interesting that after obtaining a local value, it doesn't really matter how it has been obtained. The rest of both overloads of the function C can proceed in the same way.
Which brings us to the idea that we can unify these two overloads, and require the caller to either make a copy, or to transfer ownership for the value it used to own.

This relieves us from the burden of writing two overloads of the same function. As you can see, the call sites do not change!

In fact, this is the approach that the Google C++ Style Guide strongly recommends.
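The contrast can be sketched with a setter (a hypothetical Widget class of mine, mirroring the style guide's classic setName example):

```cpp
#include <string>
#include <utility>

// Overload-based approach: efficient, but every parameter doubles the
// number of overloads needed.
class WidgetOverloaded {
 public:
  void setName(const std::string& n) { name_ = n; }        // caller keeps its value
  void setName(std::string&& n) { name_ = std::move(n); }  // caller abandons it
  const std::string& name() const { return name_; }
 private:
  std::string name_;
};

// Unified pass-by-value approach: one function; the parameter is
// copy-constructed or move-constructed at the call site, then moved in.
class WidgetByValue {
 public:
  void setName(std::string n) { name_ = std::move(n); }
  const std::string& name() const { return name_; }
 private:
  std::string name_;
};
```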
This idiom doesn't come for free. As you can see, there is always an additional move operation that happens on the callee side.

And since the copy operation now happens at the caller side, there is no way for the callee to employ delayed copying.

However, the pass-by-value idiom works very nicely with return-by-value, because the compiler allocates a temporary value for the returned value, and then the callee moves from it.
Since the pass-by-value idiom apparently has costs, why does the Google C++ Style Guide favor it so much?

The main reason is that it's much simpler to write code this way. There is no need to create const lvalue and rvalue reference overloads—this problem becomes especially hard if you consider functions with multiple arguments. We would need to provide all the combinations of parameter reference types in order to cover all the cases. This could be simplified with templates and perfect forwarding, but just passing by value is much simpler.

The benefit of passing by value also becomes clear if we consider types that can be created by conversion, like strings. Taking a string parameter by value lets the compiler deal with all the required implicit conversions.

Also, if we don't use rvalue references as function arguments, we don't need to remember about the caveat that we need to move from them.

But we were talking about performance previously, right? It seems like we are abandoning it. Not really, because not all of our code contributes equally to program performance. So it's a usual engineering tradeoff—avoiding premature optimization in favor of simpler code.
Does it mean we can apply pass-by-value idiom to any code base? The answer is similar to the one that we had on the "silver bullet" slide. Apparently, that depends on whether types in your codebase implement efficient move operations.

Also, applying this idiom to an existing code base shifts the cost of copying to callers. So any code conversion must be accompanied with performance testing.
Having pass-by-value in mind, let's revise our table with recommended parameter passing. Types that can be moved efficiently, can be passed by value both as in and out parameters. Also, instead of passing an in/out parameter using a pointer, you can pass it as an input parameter, and then get back as an out parameter, both times by value.

Expensive to move and polymorphic types should still use pass-by-reference approach.

As you can see, there is no place for rvalue reference arguments here. As per the Google C++ Style Guide recommendations, they should only be used as move constructor and move assignment arguments. A couple of obvious exceptions are classes that have very wide usage across your codebase, and functions and methods that are proven to be invoked on performance-critical code paths.
So we have figured out that we need to know how to write move constructors and move assignments. Let's practice that.

There is a trivial case when a compiler adds a move constructor and a move assignment automatically to a structure or a class, and they will move the fields. This happens when the class doesn't have a user-declared copy constructor, copy assignment operator, or destructor.

For this example, I've chosen a simple Buffer class—we do use a lot of buffers in audio code.

Let's start with the class fields and constructors. I've made it a struct just to avoid writing extra code. So we have a size field and a data array. To leverage automatic memory management, I've wrapped the data array into a std::unique_ptr. I've also specified the default value for the 'size' field. I don't have to do that for the smart pointer because it's a class, so the compiler will initialize it to null.

I defined a constructor that initializes an empty buffer of a given size. Note that by adding empty parentheses to the array new-expression you initialize its contents with zeroes. I marked the constructor as explicit because it's a single-argument constructor, and I don't want it to be called for implicit conversions from size_t values.

Note that I don't have to define a destructor because unique_ptr will take care of releasing the buffer when the class instance gets destroyed.

And we don't have to write anything else because as I've said, the compiler will provide a move constructor and a move assignment operator automatically. But it will not be possible to copy an instance of this class because unique_ptr is move-only.
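My reconstruction of this first version of the Buffer class (the slide code isn't reproduced in this post, so details may differ; the default constructor is my addition for convenience):

```cpp
#include <cstddef>
#include <memory>
#include <type_traits>

struct Buffer {
  Buffer() = default;
  // Empty parentheses after new float[n] value-initialize the array,
  // i.e. fill it with zeroes. 'explicit' blocks implicit conversions
  // from size_t.
  explicit Buffer(std::size_t n) : size(n), data(new float[n]()) {}

  std::size_t size = 0;           // default member initializer
  std::unique_ptr<float[]> data;  // a class type: default-initialized to null
};

// No user-declared copy operations or destructor, so the move operations
// are generated implicitly; copying is disabled because unique_ptr is
// move-only.
static_assert(std::is_move_constructible<Buffer>::value, "moves are implicit");
static_assert(!std::is_copy_constructible<Buffer>::value, "copying is disabled");
```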
So let's define and implement a copy constructor. It takes a const reference to a source instance. I use a feature of C++11 called delegating constructors, which allows me to reuse the allocation code from the regular constructor. After allocating the data in the copy recipient, I copy the data from the source. Note that a call to std::copy with all parameters being nullptr is valid, thus we don't need special code to handle copying of an empty buffer.

In this case the compiler does not add move constructor and move assignment by itself. And that means, any "move" operations on this class will in fact make a copy of it. That's not what we want.

So, we tell the compiler to use the default move assignment and move constructor. These will move all the fields that the class has. Also simple.
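Continuing the sketch, here is the copy constructor with a delegating constructor, plus the explicitly defaulted move operations (again my reconstruction of the slide):

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>

struct Buffer {
  Buffer() = default;
  explicit Buffer(std::size_t n) : size(n), data(new float[n]()) {}

  // Delegates allocation to the size constructor, then copies the data.
  // std::copy over an empty (null, null) range is valid, so an empty
  // source buffer needs no special handling.
  Buffer(const Buffer& src) : Buffer(src.size) {
    std::copy(src.data.get(), src.data.get() + size, data.get());
  }

  // The user-declared copy constructor suppresses the implicit move
  // operations, so we explicitly ask for the defaults back.
  Buffer(Buffer&&) = default;
  Buffer& operator=(Buffer&&) = default;

  std::size_t size = 0;
  std::unique_ptr<float[]> data;
};
```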
By the way, we haven't provided a copy assignment yet. The compiler generates a default version, but suppose we need to write one by hand.

This is one of the idiomatic approaches; it's called copy-and-swap. I've seen it mentioned in early Scott Meyers books on C++. We make a copy of the parameter into a local variable; this calls the copy constructor we have defined earlier. Then we use the standard swap function to exchange the contents of our current instance with the copy. As we exit the function, the local buffer, now holding our old contents, gets destroyed.

The advantage of this approach is that we don't need to handle self-assignment specially. Obviously, in the case of self-assignment an unneeded copy of the data happens, but on the other hand, there is no branching (which can also hurt performance).
Now suppose we also want to implement a custom move assignment. A naive attempt would be to use the same approach as for copying. In a move assignment we receive a reference to a temporary, so we could swap with it.

Unfortunately, we are creating infinite recursion here, because the standard implementation of the swap function uses move assignment for swapping values! That makes sense, but what should we do instead?
In order to break the infinite loop, we need to implement the swap function for our class. In fact, this function is implemented by virtually any C++ standard library class.

Note that it takes its in / out parameter by a non-const reference. This is allowed by the Google C++ Style Guide specifically for this function, because it's a standard library convention.

It's important to mark this function as noexcept because we will use it in other noexcept methods.

The implementation of this function simply calls the swap function for every field of the class. As I've said, this function is defined for all language types and C++ standard library classes, so it knows how to perform swapping of size_t and std::unique_ptr values.

Note that the idiomatic way to call the swap function is to have a "using std::swap" statement and then just say "swap", instead of calling "std::swap" directly. This pattern allows user-defined swap functions to be called. The full explanation is available in the notes on argument-dependent lookup.
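The member swap might look like this (a reconstructed sketch; the constructors and assignment operators from the previous listings are omitted for brevity):

```cpp
#include <cstddef>
#include <memory>
#include <utility>

struct Buffer {
  // ... constructors and assignment operators as in the previous listings ...

  // noexcept: this function is used inside other noexcept operations.
  void swap(Buffer& other) noexcept {
    using std::swap;         // enables ADL: a user-defined swap would win,
    swap(size, other.size);  // std::swap is the fallback
    swap(data, other.data);
  }

  std::size_t size = 0;
  std::unique_ptr<float[]> data;
};
```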
We also define an out-of-class swap function which takes two Buffers and swaps them.

Note that the friend keyword makes this function an out-of-class (namespace-scope) function, even though we declare and define it within our class definition.
Now we can implement move constructor and the move assignment trivially using the swap function. A move constructor is simply a call to swap. A move assignment must in addition return a reference to our instance.

Both functions are marked noexcept.
In fact, by employing pass-by-value, we could merge the implementations of copy assignment and move assignment into one function as demonstrated here. This is called a "unified" assignment operator.

As we have discussed before, this implementation costs an additional move operation. Also, the copying now happens at the caller side. Maybe not the best approach for a class that will be used widely.
Here is the complete code for the class, and it fits on one slide. As you can see, writing small classes that support move semantics is straightforward. But as usual with C++, you have an infinite number of alternatives to choose from.
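Since the slide itself isn't reproduced in this post, here is my reconstruction of the complete listing (details may differ from the original):

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <utility>

struct Buffer {
  Buffer() = default;
  explicit Buffer(std::size_t n) : size(n), data(new float[n]()) {}  // () zero-fills

  Buffer(const Buffer& src) : Buffer(src.size) {  // delegating constructor
    std::copy(src.data.get(), src.data.get() + size, data.get());
  }
  Buffer(Buffer&& src) noexcept { swap(src); }  // steal contents via swap

  Buffer& operator=(const Buffer& src) {
    Buffer copy(src);  // copy-and-swap: no self-assignment branch needed
    swap(copy);        // old contents go into 'copy' and die with it
    return *this;
  }
  Buffer& operator=(Buffer&& src) noexcept {
    swap(src);
    return *this;
  }

  void swap(Buffer& other) noexcept {
    using std::swap;
    swap(size, other.size);
    swap(data, other.data);
  }
  friend void swap(Buffer& a, Buffer& b) noexcept { a.swap(b); }  // out-of-class via friend

  std::size_t size = 0;
  std::unique_ptr<float[]> data;
};
```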
What I have explained so far were the basics. It's very important to understand them before going any further.

And when you are ready, here are some references. I've used these materials while preparing this talk. The first is the Scott Meyers book on effective modern C++ techniques; it contains an entire chapter on rvalue references and move semantics.
Then it's the Google C++ Style Guide which I was quoting often in this talk. It provides sensible guidelines that help writing understandable code.
Abseil library C++ tips contain a lot of explanations and examples on not so obvious behavior of C++.
Thomas Becker's article is a good place to dive into rvalues and move semantics details.
The "Back to Basics!" talk by Herb Sutter contains interesting discussions regarding the C++11 and C++14 features.

And, for getting more insight into the history of C++, and of rvalue references and move semantics in particular, there is Bjarne Stroustrup's book, and the original proposals to the standard.