Saturday, November 22, 2025

LXmini Full Range Driver Alternatives

There was one question that I could not get out of the back of my mind since I built my desktop version of LXmini: is it possible to fix the distortion issue with the full range driver? This issue was also the major negative point in Erin's review of LXmini.

Because the full range driver used for LXmini, SEAS FU10RB, was released 15 years ago, I thought that perhaps there are better alternatives on the market, developed with modern materials and better technologies, that could provide a more even response (without the prominent bump between 1 and 2 kHz) and hopefully achieve a 10 dB or better improvement in the overall distortion level. I checked distortion measurements for various drivers available on Zaph|Audio and at Erin's Audio Corner, but could not find any small "full range" drivers that would come up as obviously superior to SEAS FU10RB in terms of distortion. One notable exception is the drivers made by Purifi; however, with the current global trade situation, getting them in the USA is very costly.

The Candidates

Scanning through the available stock of MadiSound and Parts Express, I came up with a very short list of possible candidates for high fidelity full range 3"/4" drivers:

  • SEAS MU10RB-SL—this is the "midrange" driver of the LX521 speaker (I guess the "SL" suffix stands for the initials of S. Linkwitz). The fun fact is that it has the same specs as FU10RB, except for the properties of the suspension. The suspension in the "midrange" version is stiffer, which limits the cone excursion, and as a result the driver has poorer bass. But bass is not an issue for its use in LXmini (there is a woofer for that), and I was thinking that perhaps the difference in suspension has a positive effect on reducing distortion (spoiler: not so much!). I haven't found any reputable measurements for this driver, so I decided to measure it myself.

  • MarkAudio MAOP-5. This driver looks a bit exotic: it has no "spider" suspension part (the corrugated fabric supporting the cone), so in contrast to MU10RB-SL, this driver has less suspension force than FU10RB. I was a bit suspicious about the consequences of this design decision, but I found no distortion measurements that would reveal the effect of this approach, so it was interesting to measure this driver myself.

  • Tang Band W3-1878 also has an unusual look thanks to its massive motor and a specially designed "phase plug", which reminds me of the grilles used on some measurement microphones. This driver was measured by Erin on a Klippel jig back in 2011, but the results were not cross-posted to his site. In the post, Erin mentions that his distortion measurements are "relative", which most probably means they are not calibrated to a specific SPL standard. Frankly, I did not understand how to read that particular distortion graph, but judging from Erin's own comments the distortion is at a good level.

And that's basically it! I used two more drivers for comparison:

  • One is obviously the FU10RB itself, to make sure that its measurements are taken under the same conditions as those of the contenders.
  • And the midrange driver that I recovered from a broken Cambridge Audio Minx Go portable speaker. It is built as a "full cone", and the cone is made of paper. I used this one as an "anchor" in my measurements: it would be a miracle if it could actually beat any of the drivers above, and such a "miracle" would mean that my measurements had gone wrong.

Measurement Setup

I used my QuantAsylum stack consisting of the original QA401 analyzer, QA460 transducer driver, QA492 microphone preamp (this model is relatively new), and an Earthworks M30 microphone. I powered both the QA460 and QA492 from a portable Jackery battery because my mains power is rather noisy, and the laptop was also running on battery power. Still, initially I had some issues with mains-induced noise, which I asked about on the QuantAsylum forum. As that thread indicates, I traced the issue down to a poorly shielded USB cable which I used to power the QA492. Also, after a conversation with Matt of QuantAsylum, I obtained shorting BNC plugs and used them to cover up any unused inputs on both the QA401 and QA492. This has reduced electrical noise to the minimum.

Since the Earthworks M30 goes beyond the standard 20 kHz range, I was running the measurements up to 35 kHz and was using a 192 kHz sampling rate on the QA401. I was only using the log sweep method, which is sufficient to get a basic understanding of the non-linearities in the measured system. I did not have enough spare time to run the stepped sine method with sufficient resolution, and since I was interested in making relative comparisons, using the log sweep was fine.
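For readers unfamiliar with the log sweep method, here is a minimal sketch of how such a sweep is generated, following Farina's well-known exponential sine sweep formula. The sample rate and frequency range mirror the ones mentioned above; the function names and the 1-second duration are my own illustration:

```python
import math

def ess_sweep(f1, f2, duration, fs):
    """Generate an exponential sine sweep from f1 to f2 Hz (Farina's formula)."""
    k = math.log(f2 / f1)
    n = int(duration * fs)
    # The phase grows so that the instantaneous frequency rises exponentially.
    return [math.sin(2 * math.pi * f1 * duration / k *
                     (math.exp(t / n * k) - 1)) for t in range(n)]

def inst_freq(f1, f2, duration, t):
    """Instantaneous frequency of the sweep at time t (seconds)."""
    k = math.log(f2 / f1)
    return f1 * math.exp(t / duration * k)

fs = 192000          # the QA401 sampling rate used for the measurements
sweep = ess_sweep(40, 35000, 1.0, fs)
print(inst_freq(40, 35000, 1.0, 0.0))   # starts at 40 Hz
print(inst_freq(40, 35000, 1.0, 1.0))   # ends at 35 kHz
```

The property that makes this sweep useful for distortion measurements is that, after deconvolution with the inverse sweep, each harmonic of the device under test shows up as a separate, time-shifted impulse response.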

Impedance, Sensitivity, and Efficiency

First, I measured each driver's impedance, sensitivity, and acoustic radiation efficiency.

Measuring impedance is straightforward with QA460. Below is the summary graph for all drivers:

The drivers in the impedance plots are abbreviated as follows:

  • CA is the Cambridge Audio driver;
  • FU is the SEAS FU10RB (the original driver of LXmini);
  • MA is the Mark Audio MAOP-5;
  • MU is the SEAS MU10RB-SL (the bass-reduced version of FU10RB);
  • TB is the Tang Band W3-1878.

We can see that both MU10RB-SL and MAOP-5 have a nominal 4 Ohm impedance. The FU10RB is also 4 Ohm nominally, but the actual impedance is more like 6 Ohm. And both W3-1878 and the CA speakers are honest 8 Ohm drivers, with the CA, being a true "midrange" driver, showing steeply increasing impedance above the midrange.

Sensitivity measurement was done by applying a 1 kHz test tone with 1 V RMS amplitude to the driver under test and measuring the resulting SPL at 1 meter distance. Due to the no-baffle mounting, the result is lower than the "official" sensitivity spec. The difference between the drivers is not very critical because I use a 100 Watt amplifier and listen to the speakers from 70–100 cm, so even an 8 Ohm driver can work well, assuming it is efficient acoustically. Below are my results:

Driver                 SPL dBA (1 kHz / 1 V / 1 m)
Cambridge Audio (CA)   60
FU10RB (FU)            62
MAOP-5 (MA)            74
MU10RB-SL (MU)         68
W3-1878 (TB)           68

It's interesting that MAOP-5, despite having the same impedance at 1 kHz as the MU10RB-SL, ends up being louder. Also, the FU10RB is quieter than W3-1878 (TB) even though the former has lower impedance. Is it because the non-linearity in the 1–2 kHz region causes losses in acoustic transfer? The Cambridge Audio driver is unsurprisingly the quietest because at 1 kHz its effective impedance is 16 Ohm.

And this brings us to acoustic radiation efficiency. In this test I checked what the SPL level is for the same 1 kHz test tone at the same distance of 1 m, but this time the level of the electrical input for each driver was adjusted to achieve 105 dB SPL at 5 mm distance from the driver's cone. Note that this is different from sensitivity, and characterizes the ability of the cone to work efficiently as an ideal piston.

Driver                 SPL dBA (1 kHz / 1 m)
Cambridge Audio (CA)   66
FU10RB (FU)            72
MAOP-5 (MA)            73
MU10RB-SL (MU)         72
W3-1878 (TB)           65

Here, both SEAS drivers and MAOP-5 show almost the same result, while both W3-1878 and the CA driver are 6 dB worse. The point of measuring the radiation efficiency is that even if one driver has lower distortion than another, and both have the same sensitivity, the less efficient driver still has to be driven with a higher voltage in order to achieve the same SPL at the listening position.
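To put that 6 dB gap into perspective, here is a small sketch of the underlying dB arithmetic (generic formulas, not tied to any particular driver): a driver that is 6 dB less efficient needs roughly twice the voltage, hence four times the power, to reach the same SPL.

```python
def db_to_voltage_ratio(db):
    """How much more voltage is needed to make up a given SPL deficit."""
    return 10 ** (db / 20)

def db_to_power_ratio(db):
    """How much more power is needed to make up the same SPL deficit."""
    return 10 ** (db / 10)

deficit = 6.0  # dB: roughly the efficiency gap observed above
print(db_to_voltage_ratio(deficit))  # ~2x the voltage
print(db_to_power_ratio(deficit))    # ~4x the power
```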

Impulse and Frequency Response

Now, the results from the log sweep. Since I was using the driver in an open configuration, the same way as it is mounted in LXmini, and my room is small, I measured the log sweep in close proximity to the driver cone, at about 5 mm.

Below are impulse and step responses of CA and W3-1878 drivers:

We can see that the CA driver is heavily damped and the impulse decays quickly. While W3-1878 is also well damped, it exhibits quicker back-and-forth motion, allowing for a wider high frequency extension.

It's interesting to compare the SEAS "siblings":

They look similar, yet the "midrange" MU10RB-SL exhibits more back-and-forth motion about 200 μs after the initial impulse; after that, the oscillations become less severe, and the overall step response ends up being almost the same.

And now the interesting part, this is the IR of MAOP-5 driver computed from a 40–35000 Hz sweep:

The lack of damping is apparent here. Is it—is it good? Certainly not. Initially I thought that perhaps this is one of those "exotic" drivers that promise "euphonic" non-linearities for a pleasant sound. I started experimenting with the test setup: first I swapped the QA460 amp for the amp I actually use in my LXdesktop setup, the QSC SPA4-100, but the IR stayed the same. Then I started playing with the parameters of the sweep and figured out that if I limit the high frequency range to the standard 20 kHz then the ringing is gone:

The IR still has more fluctuations than the IRs of the other drivers, but at least now there are no high frequency modulations. I suppose the stiff material of the driver's cone causes it to go into very high frequency oscillations. Although these should be above the range of human hearing, they can still cause more non-linear behavior when excited. For this driver, using a low-pass filter is strictly required.
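To illustrate what such a filter buys us, below is a sketch that computes the magnitude response of a standard second-order (biquad) low-pass using the well-known "Audio EQ Cookbook" coefficient formulas, and checks the attenuation at 30 kHz, where the cone resonances live. The 20 kHz cutoff, Butterworth Q, and 192 kHz sample rate are my own example values, not a recommendation for the actual crossover:

```python
import cmath, math

def lowpass_biquad(fc, fs, q=0.7071):
    """Audio EQ Cookbook coefficients for a 2nd-order low-pass filter."""
    w0 = 2 * math.pi * fc / fs
    alpha = math.sin(w0) / (2 * q)
    b = [(1 - math.cos(w0)) / 2, 1 - math.cos(w0), (1 - math.cos(w0)) / 2]
    a = [1 + alpha, -2 * math.cos(w0), 1 - alpha]
    return b, a

def magnitude(b, a, f, fs):
    """|H| of the biquad evaluated on the unit circle at frequency f."""
    z = cmath.exp(-2j * math.pi * f / fs)
    num = b[0] + b[1] * z + b[2] * z * z
    den = a[0] + a[1] * z + a[2] * z * z
    return abs(num / den)

b, a = lowpass_biquad(20000, 192000)
print(magnitude(b, a, 1000, 192000))   # close to 1: passband untouched
print(magnitude(b, a, 30000, 192000))  # noticeably attenuated
```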

And below are pairwise comparisons of the frequency responses of all drivers corresponding to these IRs. In fact, the response of the MAOP-5 driver does not change in the 40–20000 Hz range regardless of the sweep's upper bound frequency:

These are "nearfield" responses so they are not super useful for evaluating a dipole. Still, we can see that CA driver is indeed a "midrange" driver with a steep downwards slope after 10 kHz, so it's clearly unsuitable for use in LXmini-based designs.

There is an expected difference between FU10RB and MU10RB-SL in the bass response, otherwise they are indeed very similar. And finally, this is the comparison between MAOP-5 and FU10RB:

We can see that the overall shapes of the frequency responses are close. One interesting point is the high frequency behavior of these full range drivers. Since all the cones operate in "break-up" mode, the material of the cone affects the response a lot. We can see that both MAOP-5 and W3-1878 have a null at about 9 kHz in this arrangement, while both SEAS drivers have two: near 5 and 7.5 kHz.

Distortion

Finally, the graphs we have been waiting for. As I mentioned, the distortion measurement is derived from the same log sweep; I did not use the stepped sine method. However, I ran the sweep at two SPL levels (as measured near the driver cone): 105 dB and 96 dB. Below are the measurements for each driver at the higher level, showing the 2nd to 4th harmonics (the levels of the other harmonics are benign):

And the summary graph comparing them:

Looking at the graphs, almost all the drivers seem to be in the same league. I do not see an obvious winner, but I do see an obvious loser: the CA driver again. As for the others: as we know, FU10RB has higher distortion levels in the midrange, and MU10RB-SL is not much better; it also has strange peaks between 2–3 kHz and 3–4 kHz, although they are very sharp and thus likely not audible. The MAOP-5 driver has issues in the 2–3 kHz region, while W3-1878 looks like the most linear, with the exception that its distortion seriously increases past 10 kHz.
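As an aside, individual harmonic levels like these combine into a single THD figure as follows; this is the generic textbook formula, not the QA401's exact reporting:

```python
import math

def thd_percent(harmonic_levels_db):
    """THD from harmonic levels given in dB relative to the fundamental."""
    ratios = [10 ** (db / 20) for db in harmonic_levels_db]
    return 100 * math.sqrt(sum(r * r for r in ratios))

# Example: 2nd to 4th harmonics each 40 dB below the fundamental.
print(thd_percent([-40, -40, -40]))  # ~1.73%
```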

This is a comparative graph at the 96 dB output level:

The peaks after 5 kHz seem to be a measurement artifact, as I see them for all drivers; it's just that they are drowned in noise for the other drivers but clearly visible for W3-1878, which was measured on a different day. I suppose that for cleaner results I would need to use the stepped sine method.

Conclusions

We can see that the MU10RB-SL variant is only a bit better than the original FU10RB. While the W3-1878 driver can be thought of as the winner from the distortion perspective, recall that it has lower acoustic radiation efficiency, which means I might need to drive it harder in order to achieve the desired loudness at the listening position. So it looks like, in order to make the final decision, I will need to build one sample of LXdesktop with the MAOP-5 driver and one with W3-1878, and compare them with my original LXdesktop speaker, with all samples tuned to the same target, of course. That should be a fun experiment. I'm looking forward to it!

Saturday, September 20, 2025

Visualizing Phase Anomalies

In my unofficial contest of LCR upmixers I encountered multiple cases where the extracted center channel had audible anomalies, most often describable as "musical noise" or "underwater gurgling." The same kind of artifacts can also be heard when listening to audio content that has passed through low bitrate lossy codecs. One of the reasons these artifacts occur is that the audio signal gets processed in the frequency domain, and during the processing the original phase information is lost or degraded.

For my experiments, I used two kinds of signals: pink noise and a simple piece of music with two instruments. From the information theory perspective, they sit at opposite ends of the spectrum: the noise is entirely chaotic and lacks any meaningful information, while music is highly organized on many layers. So let's consider these cases separately.

Pink Noise

One thing that I personally find interesting is realizing how important the relative phase of various frequency components is. For example, if we look just at the frequency spectrum of the phantom center signal extracted by Bertom Phantom Center 2 from uncorrelated stereo pink noise, we will see that the magnitude spectrum is in fact correct and matches the usual spectrum of a pink noise (maybe it is not as "smooth", but these are very minor irregularities):

Yet even an untrained ear can easily hear that this signal doesn't sound like "clean" pink noise and has many artifacts:

So all the issues are actually due to the phase component. But how can we understand what exactly is wrong with it? I'm a "visual" person, so I like looking at graphs. However, the phase of audio signals is challenging to visualize. On its own, it's not nearly as intuitive as the magnitude spectrum. In fact, the visualization of the phase of real world signals is even less intuitive than the time domain (waveform) view.

In the particular case of the noise, the phase must be random, basically like the noise itself:

So if we look at the raw phase view of a "proper" pink noise and try to compare it with the phase of pink noise that has artifacts, we will not be able to see much of a difference. Getting to a visualization that works requires some understanding and creativity.

We can ask ourselves: what is the nature of the artifacts that we are observing? That's actually the product of our hearing system, which automatically tries to find patterns—repeating fragments—in any information it receives. This is normal because all the important sounds that we need to hear—voices, noises from other creatures, and the sounds of nature—also have patterns in them. In "clean" pink noise everything is very much shuffled, and the hearing system is unable to detect any patterns, so it just perceives it as "noise" (note that since we can name it with just one word, the entire "noise" phenomenon is actually just another sound pattern!).

Since it is possible to generate an infinite number of correct-sounding versions of pink noise—we can just run the random number generator over and over again—the presence of artifacts does not mean that we have "deviated" from some perfect condition of the phase of the noise signal. Instead, it simply means that the artifacts are periodic structures created by corrupted phase information. Because of that, one way to visualize these artifacts is to use an algorithm that looks for repeating information. One example of such an algorithm is a pitch detector. Fortunately, Audacity includes one, and it indeed shows something for the pink noise with artifacts. Check below:

On top is the result of the pitch spectrogram applied to clean pink noise; under it are spectrograms of the noise extracted by Bertom Phantom Center 2, UM225 in mode 6, and the Dolby Surround algorithm. We can see that the clean pink noise only shows some patterns at low frequencies, and these actually are just processing artifacts (I've seen patterns at low frequencies even when examining a pure 1 kHz sinusoid). But!—the noise with actual audible artifacts shows patterns in the region of 500–3500 Hz, where the human ear is very sensitive, and that's what our ear is hearing.
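The idea of searching for repeating structures can be sketched with plain autocorrelation, which is at the core of many pitch detectors (Audacity's included, although its exact algorithm is not reproduced here). In this toy example I inject a periodic component into white noise and check that the autocorrelation reveals its period; all signals and parameters are synthetic:

```python
import math, random

def autocorr(x, max_lag):
    """Normalized autocorrelation of x for lags 1..max_lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    out = []
    for lag in range(1, max_lag + 1):
        c = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag))
        out.append(c / var)
    return out

random.seed(1)
n, period = 4000, 50
clean = [random.gauss(0, 1) for _ in range(n)]
# Noise plus a tone: the kind of "pattern" a pitch detector latches onto.
tainted = [v + math.sin(2 * math.pi * i / period)
           for i, v in enumerate(clean)]

ac_tainted = autocorr(tainted, 200)
ac_clean = autocorr(clean, 200)
print(ac_tainted[period - 1])  # strong peak at the injected period
print(max(ac_clean))           # no comparable structure in clean noise
```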

A Bit More on Phase

So I mentioned above that the phase is very non-intuitive, but I also mentioned that the phase is actually very important for proper signal reconstruction. I would like to expand on and illustrate these ideas a bit more before we proceed to the analysis of musical signals.

First of all, let's separate the cases of impulse responses and usual musical signals. I'm bringing up impulse responses here because it is probably in audio analysis tools like Room EQ Wizard (REW) that you have most often seen phase graphs. You probably know that the value of the phase, since it's an angle of a periodic function, normally only goes from -180° to 180° and wraps there. For impulse responses, the continuity of the phase between adjacent FFT bins is very important. That's why phase views always include an "unwrap" function, which lines up an otherwise "jumpy" phase into a continuous line.

However, for usual musical signals phase unwrapping rarely makes any sense because transitions between FFT bins do not have to produce a continuous phase. Take the case of the noise, for example: here the bins change completely independently from one another, and that's why trying to "unwrap" the phase of a noise will not produce any meaningful visualization.
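A minimal sketch of the unwrapping operation itself (the same idea as numpy.unwrap, here in plain Python): whenever the phase jumps by more than π between adjacent bins, a multiple of 2π is added to keep the line continuous. This only makes sense when the underlying phase actually is continuous, as for impulse responses; the test signal below is synthetic:

```python
import math

def unwrap(phases):
    """Remove 2*pi jumps so the phase curve becomes continuous."""
    out = [phases[0]]
    offset = 0.0
    for prev, cur in zip(phases, phases[1:]):
        d = cur - prev
        if d > math.pi:
            offset -= 2 * math.pi
        elif d < -math.pi:
            offset += 2 * math.pi
        out.append(cur + offset)
    return out

# A steadily descending phase (-0.5 rad per bin), wrapped into [-pi, pi):
wrapped = [((-0.5 * i + math.pi) % (2 * math.pi)) - math.pi for i in range(20)]
print(unwrap(wrapped))  # recovers the straight descending line
```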

Yet, in signals that have some structure, such as musical signals, there actually exists a very important relationship between the phases of groups of bins, though not necessarily adjacent ones. If you recall, the FFT decomposes the source signal into a set of orthogonal sinusoids. Now, if we imagine adding these sinusoids in order to get our original signal back, we can realize that the relative phases of the sinusoids are very important for creating various shapes of the signal in the time domain. For example, let's consider a pulse which has an initial strong transient part. In order to create that part from a set of sinusoids, their phases must be aligned so that their peaks mostly coincide. As I explained in an older post, the result of summing sinusoids with similar amplitudes greatly depends on their relative phases. When phases are aligned, two sinusoids can produce a signal with at most a +6 dB boost, but if their phases are in an inverse relation, they can cancel each other completely instead.

Below is an illustration of how a set of sinusoids forms a pulse signal when summed:

In this picture, we see the time domain view of the original signal—it has a base frequency of 128 Hz—and the first 9 sinusoids (these contribute most of the signal's energy) below it. We can also see the waveform we get by summing these 9 sinusoids. It's not quite the original signal yet, but it is close enough already. If we kept adding the sinusoids specified by the remaining FFT bins, we would eventually reconstruct the source signal. It's interesting that the amplitudes of the basis sinusoids are quite small (0.02 absolute value or less), yet they manage to create a peak which reaches almost 0.4 (20 times larger!) on the positive side and goes below -0.4 on the negative side. In order to achieve this magnification it's very important to maintain the alignment between their phases!
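The dependence on phase alignment is easy to demonstrate numerically. Here I sum a handful of small equal-amplitude harmonics twice, once with aligned phases and once with randomized ones, and compare the resulting peaks (the amplitudes echo the 0.02 figure above, but the signal itself is synthetic, not the one from the picture):

```python
import math, random

def summed_peak(n_harmonics, amp, phases, length=1024):
    """Peak amplitude of a sum of harmonics of a one-cycle base frequency."""
    peak = 0.0
    for i in range(length):
        t = i / length
        s = sum(amp * math.cos(2 * math.pi * (k + 1) * t + phases[k])
                for k in range(n_harmonics))
        peak = max(peak, abs(s))
    return peak

random.seed(0)
n, amp = 9, 0.02
aligned = summed_peak(n, amp, [0.0] * n)
scrambled = summed_peak(n, amp, [random.uniform(0, 2 * math.pi)
                                 for _ in range(n)])
print(aligned)    # reaches n * amp = 0.18: all peaks coincide
print(scrambled)  # noticeably lower: the energy is spread out in time
```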

The problem is that the alignment itself is not possible to see with a "naked eye", as easily as, for example, we can see a fundamental and its harmonics on a magnitude graph. Phase alignment is much more "technical", in the sense that the values of phases are relative to the phase of the corresponding basis sinusoid at sample 0, and they change at different speeds depending on the bin frequency. If we look at the usual frequency domain graphs, the magnitude and the phase, the phase part is not very "illustrative":

As another example, in the series of graphs below I'm shifting the pulse forward in time. Since the shape of the signal is obviously preserved, the relationship between the phases remains the same, yet the values of the phases "jump" around with no obvious pattern:

On the other hand, if we try to "play" with the phase values, we can easily disrupt the phase alignment, and the pulse starts to "smear" or even changes its shape completely. In the examples below, I tried several things: adding random shifts to the phases, which makes the signal "jittery"; replacing all phase values with zeroes, which produces a completely different, fully symmetric signal; and finally, creating a "minimum phase" version of the signal by making sure that it has most of its energy at the beginning, like an acoustic pulse:
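The "zero phases" experiment from the list above is simple to reproduce with a naive DFT: discard all phase information, keep only the magnitudes, and transform back. The result is always a symmetric signal with its energy pulled to sample 0, regardless of the input. The pulse here is a synthetic stand-in, not the signal from the figures:

```python
import cmath

def dft(x):
    """Naive DFT, fine for small n."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n)) for k in range(n)]

def idft(bins):
    """Naive inverse DFT, returning the real part of each sample."""
    n = len(bins)
    return [sum(bins[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)).real / n for t in range(n)]

# An asymmetric pulse somewhere in the middle of the frame.
n = 64
pulse = [0.0] * n
pulse[20], pulse[21], pulse[22] = 1.0, 0.5, 0.25

# Zero the phases: keep only the bin magnitudes.
zero_phase = idft([abs(b) for b in dft(pulse)])
print(zero_phase[:4])  # the energy has moved to sample 0, symmetric around it
```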

So, the phase of the signal is really, really important. But if looking at the raw phase graph does not really help us in detecting disruptions of the phase information, what should we use then? The answer is that we should use various derivatives of a spectrogram that take the phase information into account. A "classical" spectrogram only shows the magnitude, which, as we can see, means that we are throwing away half of the information about the signal. But some types of spectrograms incorporate phase information into the picture. For example, below is the "classical" spectrogram of the signals from the last example:

We can see the main problem of this visualization—the spectrogram view loses the information about the exact moment when the pulse happens. But if we use a "reassigned" spectrogram, then the frequency-domain view becomes much sharper in the cases when the phase information is consistent. But "mangled" (randomly shifted) phase also produces a blurry image even on a reassigned spectrogram:

Now that we have some clues, let's look at our music signals.

Music Signals

With the uncorrelated pink noise signal, we were in a strange situation where a "reference" extracted center signal did not exist, because in theory there is no correlation between the channels and thus there is no "correlated phantom center" to extract. We could only compare the extracted center channel with some "theoretical" pink noise and look for the presence of patterns. However, in the case of music signals I do have the "source of truth": the signals that I used to create my mix.

Another consideration we need to take into account is that none of the upmixers I tested, except "AOM Stereo Imager D," was able to separate the center instrument from the side instrument cleanly. In other words, the extracted center, instead of containing only the "centered" instrument (the guitar), also had some of the saxophone sound mixed in (the saxophone was panned hard left). Similarly, the left channel also had the saxophone with some of the guitar sound mixed in. For example, comparing the original clean saxophone (bottom) with the processed version (top), we can see that new harmonics have been mixed into the original signal:

If we look at the extracted center channel (which contains the guitar), we can actually see some blurriness of transients (at the top) compared to the original clean signal (at the bottom):

That indicates that the phase of the extracted signal is not as good as it was in the original. Even more drastic is the phase mangling in the right channel, which in the source stereo did not contain any hard panned instrument tracks and only carried the same part of the centered instrument as the left channel. After extraction, in the ideal case the right channel should become empty, but instead it contained a very poor sounding mix of both instruments, although at a very low volume. For comparison purposes, I normalized its level with the other channels. Looking at the reassigned spectrogram, we can see a lot of blurriness, so it is no surprise that it sounds pretty artificial:

Conclusions

Looking at how hard it actually is to separate a stereo signal into components, I'm amazed by the capabilities of our hearing system, which can do it so well. Of course, extraction techniques based on such low level parameters as cross-correlation can't achieve the same result because they do not "understand" the underlying source signals. Source separation (or stem separation) using neural networks trained on clean samples of various types of musical instruments and speech can produce much better results, especially if the reconstruction is able to create natural phase—annotations to some of the tools often mention that.

As for my initial task of finding a visual representation for phase issues, I don't think I have fully succeeded. So far, I've only found representations that can illustrate a problem after it has been detected by ear. But I wouldn't rely on these visualizations alone, without listening, for judging the quality of any algorithm.

Saturday, August 23, 2025

Finding the Best Stereo-to-LCR Upmixer

In my last post on tuning my headphone auralization setup, I noted that some "future work" was needed to improve the sonic quality of stereo-to-LCR upmixing. Specifically, I needed a way to extract the phantom center channel from a standard stereo source while avoiding audible artifacts. Center channel extraction is needed because, from my experience with creating an auralization chain, making the phantom center sound "externalized" (experiencing it out of the head) requires the most effort. So, I went down the rabbit hole of testing center extraction plugins, and I found that there is always a trade-off between extraction efficiency and resulting audio quality.

Upmixing Approaches

First, let's quickly review approaches that can be used for stereo signal upmixing. For a much more comprehensive overview I would recommend checking the PhD thesis work by Sebastian Kraft. Note that for my purpose I only consider the extraction of the center channel, which is one of the simplest forms of upmixing. The resulting channel configuration is often called "LCR": "Left, Center, Right."

Mid/Side and Matrix Approaches

The simplest approach for extracting the center channel is Mid/Side decomposition, which I discussed previously in one of my posts. As we know, by summing the left and right channels together, we automatically boost the signal that is identical in both channels, creating the effect of the "phantom center." Many simple plugins, like the excellent and free Voxengo MSED and GoodHertz MidSide Matrix, can isolate this Mid signal perfectly.

The problem? This isn't a true center extractor. It's a "center summer." A sound panned hard left is still present in the Mid channel, albeit at half its original amplitude (-6 dB). This means that Mid/Side decomposition doesn't separate what's exclusively in the center from what's panned elsewhere.
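The -6 dB leak is straightforward to verify numerically. In this sketch a hard-left signal appears in the Mid channel at half amplitude, while a truly centered signal passes through at full level (the 0.5 scaling is one common Mid/Side convention; conventions vary between plugins):

```python
def mid_side(left, right):
    """Mid/Side encoding with the common 0.5 scaling convention."""
    mid = [(l + r) * 0.5 for l, r in zip(left, right)]
    side = [(l - r) * 0.5 for l, r in zip(left, right)]
    return mid, side

# A signal panned hard left: present only in the left channel.
hard_left = ([1.0, -1.0, 0.5], [0.0, 0.0, 0.0])
# A centered signal: identical in both channels.
center = ([1.0, -1.0, 0.5], [1.0, -1.0, 0.5])

mid_hl, _ = mid_side(*hard_left)
mid_c, side_c = mid_side(*center)
print(mid_hl)   # half of the original: the -6 dB "leak"
print(mid_c)    # unchanged: the phantom center
print(side_c)   # all zeros: nothing panned to the sides
```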

A generalization of Mid/Side is the "matrix" approach, which allows producing more channels from stereo for playback over surround speaker configurations. Processing is done completely in the time domain, which makes it fast and minimizes the possibility of creating audible artifacts. The downside is that matrix processors are fairly limited in their ability to truly decompose real musical signals into components; what they do is mostly "energy rebalancing." Some tricks that matrix processors can employ include changing the processing coefficients "on the fly" to "steer" the dominant sound to the speaker at the most appropriate location, and adding a delay to the rear channels to create more ambience.

Correlation, STFT, and "Musical Noise"

To achieve better separation, proper upmixers don't just sum or subtract the channels. They analyze the signal using a Short-Time Fourier Transform (STFT), which breaks the audio into tiny time slices and analyzes the frequency content of each slice. Within each slice, the algorithm examines the inter-channel correlation across every frequency bin and the lateral energy. This approach has some similarity to the process our brain uses when analyzing inputs from the left and right ears:

  • If a frequency is highly correlated (i.e., similar in phase and energy) between the left and right channels, it's likely panned to the center.

  • If it's uncorrelated and one channel (left or right) has substantially more energy, then it belongs to that channel.

  • But if it’s uncorrelated and the energy is the same in both channels, then it’s usually considered to be the "ambient" component.

  • The interesting case is when the signals are anticorrelated (highly correlated, but one channel is a phase-inverted copy of the other). This kind of signal usually sounds very confusing when played both via stereo speakers and headphones, and I don't think there is a consensus on whether it should be considered to be in the center channel or rather spread into the side channels.

Note that instead of aggregating inter-channel correlation across all frequency bins together, we can instead consider groups of bands with similar correlation. As explained in S. Kraft’s PhD thesis, this can be used for inferring the location of each instrument, since in a musical composition the instruments are usually arranged in non-overlapping bands.

The weak point in this elegant approach is phase reconstruction. When the algorithm creates a new center channel by manipulating the magnitude of the frequency bins and then performs an inverse STFT to go back to the time domain, it has to guess what the phase of the extracted center should be. What makes this issue worse is that the STFT approach processes the input signal in "chunks" which are glued together. Thus, the phases for the same frequency bin may not even be continuous between chunks, and this creates audible artifacts, infamously known as "musical noise."

If we use a dual mono signal—my favorite is pink noise—then if the algorithm uses the average phase between the two channels, it gets the true phase value because both channels contain the same noise. However, the "acid test" that I used for upmixers is uncorrelated pink noise. The theoretical version of this signal has zero correlation between the left and right channels. Thus, an ideal extractor should produce silence, as there is no "center" information to extract. This is all in theory, however, and requires the signal to have an infinite length. In practice, any real, finite "uncorrelated" pink noise still has some non-zero correlation happening here and there across frequency bands. The shorter the time interval we look at, the more pronounced these spurious correlations are. As an example, below is a graph which shows per-band absolute correlation values over time for such "uncorrelated" pink noise, using STFT, and the resulting phantom center energy that such an algorithm can infer from it:
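The spurious-correlation effect is easy to reproduce even without any STFT machinery: plain Pearson correlation over short windows of two independent noise channels already shows the problem. The window size and the white (rather than pink) noise below are illustrative simplifications:

```python
import random

def correlation(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    dx = sum((a - mx) ** 2 for a in x) ** 0.5
    dy = sum((b - my) ** 2 for b in y) ** 0.5
    return num / (dx * dy)

random.seed(42)
n = 8192
left = [random.gauss(0, 1) for _ in range(n)]
right = [random.gauss(0, 1) for _ in range(n)]

# Over the whole signal, the channels are essentially uncorrelated...
overall = correlation(left, right)

# ...but short windows (like STFT frames) show strong spurious correlation.
win = 32
short = [abs(correlation(left[i:i + win], right[i:i + win]))
         for i in range(0, n, win)]
print(overall)     # close to zero
print(max(short))  # some windows look strongly "correlated"
```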

As the center extractor "locks" onto these spurious correlations (values with absolute value close to 1), it can put them into the phantom center (the actual result also depends on the energy balance, as noted above). Thus, any sufficiently loud phantom center sound extracted from uncorrelated noise is purely an artifact of the processing algorithm itself.

As noted in the paper "Frequency-Domain Two- to Three-Channel Upmix for Center Channel Derivation and Speech Enhancement" by E. Vickers, in the "traditional" application of upmixers these artifacts may not be a big issue because when all channels are presented over speakers, acoustic summation of their parts happens, and the resulting acoustical "downmix" may conceal the artifacts. However, in my headphone virtualization application I perform separate processing of the center channel; as a result, the partial artifacts lose their initial match and can reveal themselves in the binaural downmix as unnatural-sounding artifacts. This is why my goal is to have an upmixer with minimal artifacts.

Testing the Contenders

To find the best tool for the job, I devised a simple test to evaluate separation quality and artifact generation. I pitted a few different classes of plugins against each other:

  • Bertom Audio Phantom Center plugin, which I initially employed for my headphone chain—it uses the STFT approach. While I was working on this post, Tom released a new version called Phantom Center 2 which has a few more settings, but the underlying principle is still the same.

  • A.O.M. Stereo Imager D—another center extractor which also uses the correlation approach.

  • An inexpensive upmixer plugin from Waves Audio—UM225. It has several modes, and I tried two of them because they sounded completely different: mode 5 ("Steady Center") and mode 6 ("Stereo Preserve").

  • "Industry standard" expensive Halo Upmix plugin by Nugen Audio, which I used in LCR mode, with the "Hard Center" preset.

  • Hardware implementations of surround upmixers: Auro 2D, Dolby Surround, and DTS Neural:X from the Marantz AV 7704 AVR. The AV 7704 was configured for LCR output (no surround channels) and a narrow center image.

Since the actual implementations of commercial upmixers are kept secret, we can only infer their behavior from the results they produce. In order to understand the behavior of upmixers better, I used the following test signals:

  • Correlated stereo pink noise (a dual mono signal). This should normally go into the center channel; however, some upmixers also "spill" it into the side channels (this behavior may be configurable). When spilling, an upmixer may also decorrelate the output channels in order to avoid the comb filtering that can occur when identical signals from different speakers reach the listener at unequal times.

  • Uncorrelated stereo pink noise. As we have already discussed, this signal in its ideal form has zero correlation between the left and right channels. An ideal extractor should produce only very quiet, clean-sounding pink noise in the center.

  • Mono pink noise (left channel only). Naturally, this signal also has close to zero correlation between the left and right channels; however, all the energy is on the left side. My expectation is that the extractor should produce perfect silence in the center channel, as the entire signal should remain panned hard left.

  • A simple music track which I produced myself by combining dry recordings of a saxophone and an acoustic guitar. The sax is hard-panned to the left channel, and the guitar is panned to the center (dual mono).
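
The three noise signals above can be generated with a few lines of numpy. The sample rate, duration, and the exact pink noise shaping are my assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 48000
n = 10 * fs  # 10 seconds

def pink_noise(n, fs, rng, f_min=20.0):
    """Pink (1/f) noise via spectral shaping, flat below f_min, peak-normalized."""
    spec = np.fft.rfft(rng.standard_normal(n))
    f = np.fft.rfftfreq(n, 1.0 / fs)
    spec /= np.sqrt(np.maximum(f, f_min))
    x = np.fft.irfft(spec, n)
    return x / np.max(np.abs(x))

def xcorr(a, b):
    """Normalized inter-channel correlation."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

mono = pink_noise(n, fs, rng)
correlated = np.column_stack([mono, mono])                # dual mono
uncorrelated = np.column_stack([pink_noise(n, fs, rng),
                                pink_noise(n, fs, rng)])  # independent runs
left_only = np.column_stack([pink_noise(n, fs, rng),
                             np.zeros(n)])                # hard-panned left

for name, sig in [("correlated", correlated),
                  ("uncorrelated", uncorrelated),
                  ("left only", left_only)]:
    print(f"{name}: inter-channel correlation = {xcorr(sig[:, 0], sig[:, 1]):+.3f}")
```

Note that even for the "uncorrelated" pair the full-length correlation only approaches zero—it never reaches it exactly, which is the finite-length effect discussed earlier.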

Besides these tracks, I also used the Plugindoctor app by DDMF to quickly examine the linearity of the plugins and hardware implementations. To my surprise, one of the plugins had some issues with that—maybe that's fine for "artistic" purposes, but in my case the requirement is that the processing algorithm be as transparent as possible.

The Results

The results turned out to be very interesting. Since my intention was to stick with the "best" upmixer, I introduced a ranking system with 5 dimensions, each on a scale from 5 (best) to 1 (worst). This is what they are:

  • For the correlated pink noise: what is the relative level of the center compared to the sides (ideally, there should be no sides)? And what is its audible quality—are there any artifacts? I grade both factors on the scale from 5 to 1 and average them.

  • For the uncorrelated pink noise: again, how loud is the center compared to side channels (ideally, there should be no center), and are there any audible artifacts?

  • Same for the mono pink noise (which is also uncorrelated)—does it spill into the center channel (or even the right channel), and does the quality degrade?

  • How well does the upmixer extract the phantom center from music—am I hearing just the central instrument, or also the left-panned instrument? Also, what is in the right channel, which should ideally be silent? And are there artifacts in the music, across all 3 output channels? Again, I average these scores into one grade.
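
As an illustration of the bookkeeping, here is how the sub-grades get averaged into per-signal scores and an overall grade. All the grade values below are hypothetical, just to show the arithmetic:

```python
# Hypothetical grades (5 = best, 1 = worst) for one imaginary upmixer:
grades = {
    "correlated noise":   {"separation": 4, "artifacts": 5},
    "uncorrelated noise": {"separation": 3, "artifacts": 2},
    "mono noise":         {"separation": 5, "artifacts": 5},
    "music":              {"separation": 4, "artifacts": 3},
}

# Average the sub-grades within each test signal, then across signals
scores = {signal: sum(g.values()) / len(g) for signal, g in grades.items()}
overall = sum(scores.values()) / len(scores)
print(scores)
print(f"overall: {overall:.2f}")
```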

So, what do we have? "Phantom center" extractors, true to their design goal, are the best at actually extracting the center, but since they use the STFT approach, they suffer from phasing artifacts which can be heard both on uncorrelated noise and on music.

Surround upmixers, on the other hand, may have fewer artifacts; however, they may "spill" even a strongly correlated center into all 3 channels. Again, maybe this is fine for actual multi-speaker playback, but it is not what I would prefer for my application.

Artifacts for the uncorrelated pink noise vary widely. Below are examples of how they sound with:

  • Dolby Surround upmixer
  • A.O.M. Stereo Imager D
  • Waves Audio UM225, Mode 5
  • Bertom Phantom Center 2

Artifacts for music are also quite interesting. Here are some examples from the right channel of the resulting LCR upmix—which, IMO, should be silent! Note that the actual level of this channel in the produced upmix was much lower, but I have normalized it to -14 dB LUFS in order to hear the artifacts more clearly:

  • Dolby Surround upmixer
  • DTS Neural:X upmixer
  • Bertom Phantom Center 2
  • A.O.M. Stereo Imager D
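
As a side note, the level matching mentioned above can be approximated in a few lines of numpy. This is only an RMS-based normalization sketch: real LUFS (ITU-R BS.1770) also involves K-weighting and gating, so an actual loudness meter (e.g. the pyloudnorm package) should be used for proper work:

```python
import numpy as np

def normalize_rms_db(x, target_db=-14.0):
    """Scale x so its RMS level hits target_db (dB re full scale).

    A crude stand-in for LUFS normalization: it ignores the K-weighting
    and gating that a BS.1770-compliant meter applies.
    """
    rms = np.sqrt(np.mean(x ** 2))
    gain = 10.0 ** (target_db / 20.0) / max(rms, 1e-12)
    return x * gain
```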

Here is a beautiful diagram with the summary. None of the upmixers is perfect; there is always a tradeoff between the degree of channel separation and the induced artifacts. It seems that for my needs Halo Upmix works best (it’s at the top position on the chart):

The other plugins / upmixers are listed in clockwise order, so the next best is actually the Bertom Phantom Center which I was using before—it has the known problem with artifacts. Then come the hardware upmixer implementations from the AV 7704, with Auro 2D having the best quality. A.O.M. Stereo Imager D could have ranked higher than them because it offers separation which is close to ideal (better than Bertom Phantom Center!); however, for some reason it exhibits pretty bad aliasing, revealed by a simple dual mono sine signal:

Note that the aliasing products do not appear if I leave the knobs for the Center and Side gains at their default values, but they show up as soon as I start moving them. I contacted A.O.M. about this issue, and their reply was rather edgy, stating that they are not going to fix any "imperfections" in order to avoid potentially changing the sound of the plugin—oh, well.

Finally, the cheaper upmixers by Waves Audio seem to have a low cost for a reason—they are not very good at separating out the center channel, and they exhibit strong artifacts on some of my test signals.

Conclusion

I ended up purchasing Halo Upmix (just the basic version)—I think it’s worth the money. One big disadvantage is its reliance on the iLok copy protection system, which requires the use of a USB dongle (in 2025!). I set it up on my MacBook Air, which I use as a portable setup. For a more stationary setup I can use Auro 2D on the Marantz AV7704 (note that I haven’t tried that for real, so there may be caveats).

I plan to write another post which dives deeper into the analysis of these artifact problems. It’s interesting to see how the artifacts look on visualizations.