Monday, August 26, 2024

LXmini Desktop Version (LXdesktop)—Part III: Binaural Tuning

This is another "technical note" about my experience of building and tuning a pair of desktop speakers with a DSP crossover, based on the original design of the LXmini by Siegfried Linkwitz. This post is about the aspect of tuning which helps to obtain the most natural presentation of the audio scene encoded in a stereo recording. As Linkwitz himself explains in his talk "Accurate sound reproduction from two speakers in a living room", a stereo representation is no more than an illusion which only appears in the brain of the listener. However, this can be a rather realistic illusion. It is realistic when the listener is able to forget that the sound scene he or she is hearing is created by the speakers. Ideally, the auditory brain should not register the speakers themselves as the sound sources.

In the Q&A section of the talk, in particular in this video fragment, somebody asks Siegfried about the recommended speaker setup for a small room. He recommends placing the speakers wider apart and moving the listening spot closer to them. That's in fact what I've done in my setup (see one of the previous posts which illustrates the arrangement). The idea behind this setup is to create a sort of "giant headphones", which is what gives this setup its sense of envelopment. In fact, the sound of speakers located at some distance is more natural for our auditory brain than the sound from headphones, because the sound from the speakers gets filtered by our natural HRTF, which makes it easier for the brain to decode it properly. However, our perception of sound from these "giant headphones" suffers from two kinds of strong interaction: between the speakers themselves, which is the crosstalk affecting the center image, and between the speakers and the room, which produces reflections and additional reverb not present on the recording, that is, the "sound of the room."

The good part is that in my unusual setup the predominantly dipole radiation pattern of the speakers is supposed to reduce the crosstalk without resorting to DSP tricks. As for reflections, they can be filtered out by the auditory brain when they are sufficiently separated from the direct sound and (another interesting point from Siegfried's talk) have spectral content similar to that of the direct sound. The latter topic is actually complex, and different people have different views on it. However, the cross-talk cancellation is something that can be easily measured.

Cross-talk Cancellation

I have made two types of measurements: one is the usual log sweep, which allows recovering the impulse response and windowing it as necessary, and the other is a "steady state" measurement produced by taking an "infinitely" averaged RTA of pink noise. Both measurements are made using a "dummy head" technique, so they are binaural. However, since I don't have a proper head and torso simulator at home, I just use my own head and binaural microphones by The Sound Professionals, built with the XLR outputs option so that they can be connected to the same sound card that drives the speakers. I use REW for these captures, and I have purchased the "multi-mic input" option, which is essential for this job since I want to record both the ipsi- and contralateral ear inputs at the same time.

The typical way to measure the efficiency of cross-talk cancellation (XTC) is to take the measurement at the ipsilateral ear (closer to the speaker) and see by how much it must be attenuated in order to obtain the same result as the measurement at the contralateral ear (farther from the speaker, shadowed by the head). The resulting frequency response curve is the profile of the effective attenuation.

So let's see. If we look at the steady state response, the XTC from my speaker arrangement is quite modest—around -10 dB in the best case. Below is the spectrum of the attenuation for the left and for the right ear:

However, if we look at the direct sound only, by applying a frequency-dependent window (FDW) of 8 cycles to the log sweep measurement, the results look much better, showing consistent attenuation values between -20 and -10 dB. It works better for one ear due to the asymmetry of the setup:

Note that the deep notches, as well as a couple of peaks, are due to comb filtering from reflections and the effects of the dipole pattern itself. I must warn that just eyeballing what seems to be the "average value" on the graph and taking it as the measure of suppression efficiency may be misleading. In order to calculate the actual negative gain of the XTC, I measured the difference in RMS level of pink noise filtered through the impulse responses of these natural attenuation filters. The results are rather modest: -4.7 dB for the sound of the left speaker and -4.9 dB for the sound of the right speaker.
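The RMS-based measure can be sketched in a few lines of NumPy. This is only an illustration under my own assumptions: the attenuation filter is available as an impulse response in a NumPy array (for example, exported from the measurement software as a WAV file and loaded separately), and the pink noise is approximated by shaping white noise with a 1/sqrt(f) magnitude. The function names are hypothetical; this is not how REW computes anything internally.

```python
import numpy as np

def pink_noise(n, rng):
    """Approximate pink noise: white noise shaped by a 1/sqrt(f) magnitude (-3 dB/octave)."""
    spectrum = np.fft.rfft(rng.standard_normal(n))
    freqs = np.fft.rfftfreq(n)
    scale = np.zeros_like(freqs)
    scale[1:] = 1.0 / np.sqrt(freqs[1:])  # the DC bin stays at zero
    return np.fft.irfft(spectrum * scale, n)

def rms_db(x):
    """RMS level in dB relative to an amplitude of 1.0."""
    return 20.0 * np.log10(np.sqrt(np.mean(x ** 2)))

def effective_gain_db(ir, n=1 << 18, seed=0):
    """RMS level change of pink noise after filtering it through the impulse response `ir`."""
    noise = pink_noise(n, np.random.default_rng(seed))
    size = 1 << (n + len(ir) - 2).bit_length()  # FFT length >= full convolution length
    filtered = np.fft.irfft(np.fft.rfft(noise, size) * np.fft.rfft(ir, size), size)[:n]
    return rms_db(filtered) - rms_db(noise)
```

A trivial "impulse response" consisting of a single sample of 0.5 is a handy sanity check: `effective_gain_db(np.array([0.5]))` comes out at about -6.02 dB. The FFT-based convolution keeps this fast even for long measured IRs.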

To compare this performance with DSP XTC solutions, I checked Chapter 5 of the book "Immersive Sound", which talks about BACCH filters. It contains a graph of a measurement similar to mine: the authors made it using a Neumann KU-100 dummy head in a real (non-anechoic) room, with the speakers set up at 60 degrees and 2.5 meters from the head, and with their filters turned on. Figure 5.12 of the book presents the measured spectrum at the ipsi- and contralateral ears, and, similarly, they measure the effectiveness of the XTC by subtracting these. I have digitized their graphs and derived the attenuation curve; it is presented on the graph below as the brown solid curve, and I have changed my curves to dashed and dotted lines for readability:

We can see that the BACCH XTC does a better job, except in the region of 7–10 kHz. Also note that since I have a single subwoofer, there is no attenuation below 100 Hz. The author of the chapter calculates the level of attenuation as the average of the frequency response curve values across the audible spectrum, and their result is -19.54 dB. However, since I had digitized their graph, I could build a filter which implements it and measure the resulting decrease of the RMS level of pink noise, the same method that I used for my measurements. Measured this way, the effective gain of the BACCH XTC is -8.86 dB. This is still better than my result, but only by about 4 dB. So I must admit that DSP can do a better job than the natural attenuation due to the speaker arrangement and the radiation pattern. However, as we can see from the chapter text, building a custom XTC filter tailored to a particular setup is a challenging task, and there are many caveats that need to be taken care of.
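Why do the two ways of summarizing the same curve differ so much? Averaging dB values gives a deep notch as much weight as a shallow region, while the energy that actually reaches the ear is dominated by the least attenuated bands. The toy example below uses a made-up curve (not the BACCH data) on a log-spaced frequency grid; on such a grid every point carries roughly equal pink-noise energy, so a plain power average approximates the pink-noise RMS method.

```python
import numpy as np

def mean_db(att_db):
    """Plain average of the attenuation curve values in dB (every point counts equally)."""
    return float(np.mean(att_db))

def power_average_db(att_db):
    """Average in the power domain, then convert back to dB.
    Assumes a log-spaced grid, where each point carries equal pink-noise energy."""
    return float(10.0 * np.log10(np.mean(10.0 ** (np.asarray(att_db) / 10.0))))

# Hypothetical attenuation curve: -30 dB below 1.5 kHz, only -5 dB above it.
freqs = np.geomspace(20.0, 20000.0, 1000)
att_db = np.where(freqs < 1500.0, -30.0, -5.0)
print(f"dB average:    {mean_db(att_db):6.2f} dB")
print(f"power average: {power_average_db(att_db):6.2f} dB")
```

The dB average lands around -20.6 dB, while the power average is only about -9.2 dB, because the shallow -5 dB region lets most of the energy through.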

Center Image Height

As I have explained in the section "Target Curve Adjustments" of the previous post, in order to render the center image correctly, the spectrum of the sound from the speakers, which are on the sides of the listener, must be corrected so that the phantom center image has the spectrum that a real center source would have. The paper by Linkwitz which I cited in that post contains the necessary details. One good test for the correction is to make sure that a source which is intended to be at ear (or eye) height is actually heard this way. For that, I use the track called "Height Test", track 46 from the "Chesky Records Jazz Sampler & Audiophile Test Compact Disc, Vol. 2".

Merging Perceived Results of ITD and ILD Panning

Changing the spectrum of side images in the way described in the previous section also helps to draw attention away from the speakers, because the sounds coming from them no longer have the spectral characteristics of a side source.

However, while listening to older studio-engineered recordings (from the 70s or earlier) that use "naive" hard panning of instruments entirely to the left or to the right by level adjustment only, I noticed that this spectrum change is not enough to decouple the sound of an instrument from the speaker which is playing it. Real acoustic recordings and modern tracks produced with Dolby Atmos sounded better. This was likely because modern panning techniques use both level and delay panning. They may actually use more; to get a full idea of what is possible, I used a panning plugin called "PanPot" by GoodHertz.
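To make the distinction between the two basic cues concrete, here is a generic sketch of level-only (ILD) and level-plus-delay (ILD+ITD) panning of a mono signal towards the left. I don't know PanPot's internal algorithm; the function name `pan_left` and the parameter values (-12 dB, 0.5 ms) are hypothetical, and the delay is rounded to whole samples for simplicity.

```python
import numpy as np

def pan_left(mono, ild_db, itd_s=0.0, fs=48000):
    """Pan a mono signal towards the left speaker (hypothetical helper).

    ild_db: level of the right channel relative to the left (negative = quieter).
    itd_s:  delay of the right channel in seconds; 0 gives pure level panning.
    The delay is rounded to whole samples; a real panner would use a fractional delay.
    """
    delay = int(round(itd_s * fs))
    right = np.concatenate([np.zeros(delay), mono])[: len(mono)]
    right = right * 10.0 ** (ild_db / 20.0)
    return np.stack([mono.copy(), right])

impulse = np.zeros(256)
impulse[0] = 1.0
ild_only = pan_left(impulse, ild_db=-12.0)               # level panning only
ild_itd = pan_left(impulse, ild_db=-12.0, itd_s=0.0005)  # level plus a 0.5 ms delay
```

In the second case the right channel arrives 24 samples (at 48 kHz) later as well as quieter, so the brain receives a time cue in addition to the level cue.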

While playing with the plugin using dry percussion sounds from the "Sound Check" CD produced by Alan Parsons and Stephen Court, I noticed that hard-panned sounds using delay panning are perceived a bit "away" from the speakers, while level-panned sounds are perceived as coming from the speaker itself. Schematically, it was perceived like this:

I decided to combine them. In order to move hard ILD-panned sounds, I use the "Tone Control" plugin, also by GoodHertz. It can do Mid/Side processing, and I switched it to the "Side only" mode. Recall from my previous post on Mid/Side Equalization that M/S decomposition does not completely separate the phantom center from the sides. However, it is good enough for tuning hard-panned sources.
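A minimal sketch of side-only processing, assuming the common M = (L + R) / 2, S = (L - R) / 2 convention. A broadband gain stands in for the treble shelf used in the plugin, so this is a simplification, not what Tone Control does internally.

```python
import numpy as np

def to_mid_side(left, right):
    return (left + right) / 2.0, (left - right) / 2.0

def from_mid_side(mid, side):
    return mid + side, mid - side

def process_side(left, right, side_gain_db):
    """Apply a broadband gain to the Side component only (a stand-in for a side-only EQ)."""
    mid, side = to_mid_side(left, right)
    side = side * 10.0 ** (side_gain_db / 20.0)
    return from_mid_side(mid, side)

x = np.array([1.0, -0.5, 0.25])
centre_l, centre_r = process_side(x, x, -6.0)             # phantom center: untouched
hard_l, hard_r = process_side(x, np.zeros_like(x), -6.0)  # hard left: partly leaks into right
```

A centered source has no Side component, so it passes unchanged. A hard-left source has equal Mid and Side parts, so only half of it is affected and the rest sits in the Mid: this is the incomplete separation of the phantom center from the sides.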

I have prepared a test track which interleaves pure ILD and ILD+ITD panning of a dry snare drum sound. While listening to it, I experimented with the settings for the corner frequency, slope, and gain of the treble shelf, as well as with the overall gain of the side component. The goal was to move the ILD-panned source closer to the position of the ILD+ITD-panned source, and at the same time not to change its perceived tonality too much. Obviously, sources panned using different techniques will never sound identical; however, I could come close enough. As a result, the sound scene has moved a bit away from me, behind the plane of the speakers:

I have pictured the scene as going beyond the speakers because this happens with some well-made recordings, like track 47 "General Image and Resolution Test" from the "Chesky Records Jazz Sampler & Audiophile Test Compact Disc, Vol. 2", where the sounds of doors being shut are rendered well beyond the speakers, in the distance.

It's interesting that correcting the purely level-panned images really helped to decouple the speakers from the sound they are producing! I used tracks from the full OST to "The Holdovers" movie, which features a number of records produced in the 60s and 70s. Note that, as far as I know, the full version with all tracks is only available on vinyl: the usual licensing issues prevent streaming services from offering all the tracks, and the producer of the OST did not bother with offering a CD.

Banded Correlation and Decorrelation

Since my speaker system is not a dipole across the entire spectrum, and the walls are located nearby, there was still some "unnaturalness" in the image, even though the quasi-anechoic frequency response looks correct. How can we do further tuning without noticeably affecting the frequency response? The trick is to adjust levels depending on the correlation between the channels, that is, to treat the correlated (phantom center) and uncorrelated components differently within specific frequency bands.
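To see where correlation-dependent tuning might matter, one could measure the inter-channel correlation per band. The sketch below is a hypothetical helper, not part of any plugin mentioned here: it computes the zero-lag normalized correlation of L and R inside one band, using a crude brick-wall FFT mask as the band filter.

```python
import numpy as np

def band_correlation(left, right, fs, f_lo, f_hi):
    """Zero-lag normalized correlation of L and R restricted to [f_lo, f_hi) Hz.
    +1 means fully correlated (phantom center), -1 anti-correlated, ~0 uncorrelated."""
    n = len(left)
    freqs = np.fft.rfftfreq(n, 1.0 / fs)
    mask = (freqs >= f_lo) & (freqs < f_hi)  # brick-wall band selection
    lb = np.fft.irfft(np.fft.rfft(left) * mask, n)
    rb = np.fft.irfft(np.fft.rfft(right) * mask, n)
    return float(np.dot(lb, rb) / np.sqrt(np.dot(lb, lb) * np.dot(rb, rb)))
```

For identical channels it returns 1.0, for an inverted right channel -1.0, and for independent noise in the two channels a value close to zero.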

For example, while listening to the bass line of "Spanish Harlem", I noticed that the first note, which is mostly delivered by the subwoofer, does not sound as strong as the following notes, which are higher and are delivered by the main speakers. I did not want to raise the level of the sub, because I know it is at the right level, and listening to OSTs by Hans Zimmer proves that; I don't want the sub to be any louder :) Instead, my solution was to decrease the level of the correlated component (the phantom center) in the frequency range served by the woofers: they are omnidirectional, thus their sound is reinforced by the walls. For that, I used the "Phantom Center" plugin by Bertom Audio.

Another correlation tweak needs to be done in the high-frequency region, above 4.7 kHz. I took a track I often use, the left/right imaging test from "Chesky Records Jazz Sampler Vol. 1", and overlaid the "midway" position announcement with the "off-stage" one. The initial lack of correlation, due to somewhat excessive reverberation at high frequencies, causes the off-stage announcement to sound either in front, at a position similar to the "midway" one, or even "inside the head." By increasing the correlation I was able to move it to the intended location. However, too much correlation causes the phantom center to become too strong and too narrow, which makes the "midway" position collapse towards the center. Thus, by hearing both announcements at the same time, I can set the correlation to just the right amount.
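One simple way to think about "increasing the correlation" is through the Mid/Side balance. With M = (L + R) / 2 and S = (L - R) / 2, and assuming L and R carry equal power, the correlation coefficient is (P_M - P_S) / (P_M + P_S), so scaling the Side by g gives (P_M - g²P_S) / (P_M + g²P_S). Solving for g yields the gain that reaches a target correlation. This is my own broadband sketch; in practice it would be applied after a band split (e.g. above 4.7 kHz), and I don't know how the Phantom Center plugin does it internally.

```python
import numpy as np

def correlation(left, right):
    """Zero-lag normalized correlation coefficient of two channels."""
    return float(np.dot(left, right) / np.sqrt(np.dot(left, left) * np.dot(right, right)))

def side_gain_for_correlation(left, right, target_rho):
    """Gain g for the Side signal so that L' = M + g*S, R' = M - g*S reach target_rho.

    Uses rho = (P_M - g^2 P_S) / (P_M + g^2 P_S), valid when L and R carry equal
    power (then Mid and Side are uncorrelated). Requires -1 < target_rho < 1.
    """
    mid, side = (left + right) / 2.0, (left - right) / 2.0
    p_mid, p_side = np.mean(mid ** 2), np.mean(side ** 2)
    return float(np.sqrt(p_mid / p_side * (1.0 - target_rho) / (1.0 + target_rho)))

def apply_side_gain(left, right, g):
    mid, side = (left + right) / 2.0, (left - right) / 2.0
    return mid + g * side, mid - g * side
```

For example, two channels sharing one noise source plus independent noise each have a correlation of about 0.5; a side gain of roughly 0.58 raises it to 0.8. Pushing the target towards 1 collapses everything into the center, which matches the "too strong and too narrow" phantom center described above.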

Finally, I used a set of 1/3-octave band-filtered dry mono recordings of percussion instruments converted to stereo: first with identical left and right channels, then with the right channel inverted. This is the same set of sounds that I used in this post about headphones. I compared how loud the correlated version sounds relative to the anti-correlated one. The expectation is that it should be equally loud or a bit louder; however, I found that in the region between 400 and 900 Hz, anti-correlated sounds are perceived as louder than correlated ones. Unlike my previous experience with traditionally arranged speakers, this time I was able to reduce the loudness of anti-correlated sounds in this band.
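Test signals of this kind can be prepared along the following lines. This is a hypothetical sketch: a brick-wall FFT mask with band edges at fc·2^(±1/6) stands in for whatever filter was used for the original sound set, and the helper names are my own.

```python
import numpy as np

def third_octave_band(x, fs, fc):
    """Brick-wall 1/3-octave band-pass around fc via an FFT mask (edges at fc * 2**(+-1/6))."""
    lo, hi = fc * 2.0 ** (-1.0 / 6.0), fc * 2.0 ** (1.0 / 6.0)
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
    spectrum = np.fft.rfft(x) * ((freqs >= lo) & (freqs < hi))
    return np.fft.irfft(spectrum, len(x))

def correlated_pair(mono):
    """Identical left and right channels: a phantom center."""
    return np.stack([mono, mono])

def anticorrelated_pair(mono):
    """Right channel inverted: the channels cancel when summed."""
    return np.stack([mono, -mono])
```

Running a dry mono recording through `third_octave_band` for each center frequency of interest (e.g. 400, 500, 630 and 800 Hz) and then building both stereo versions gives pairs of signals with identical spectra that differ only in inter-channel polarity.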

This perceptual correction helps to reduce the attention paid to details in the sound produced by the speakers that get amplified too much by the room. The sound becomes less fatiguing, which is yet another aspect of "naturalness." As Linkwitz put it, it's better to let our brain add missing details than to force it to remove extra ones: the latter costs much more mental effort, which manifests itself as exhaustion after long listening sessions.

Processing Chain

The description of the tuning process has turned out to be a bit lengthy. Let's summarize it with a scheme of the filters that I put on the input path. They are inserted before the digital crossover and correction filters that I described in Part II of this series.

So, first there is the Tone Control applied to the "Side" part of the M/S decomposition, which is intended to move ILD-panned sounds a bit deeper into the virtual scene to match ITD+ILD-panned sounds. Then come three instances of the Phantom Center plugin, tuned to different frequency bands, which correct the effects of the room-speaker interaction. I wish there were an "equalizer"-style plugin that could apply phantom center balancing to multiple bands. Bertom Audio, take note :)

Some Tracks to Listen To

Having achieved good imaging through my speakers, I re-listened to many albums that I have pinned in my collection. Here are some tracks that I can recommend:

  • "The Snake and The Moon" from the "Spiritchaser" album (2007) by Dead Can Dance. It starts with a buzzing sound rotating around the head. The rhythm of the song is set by an African drum pulsating at the center, and there are other instruments appearing at different distances and locations.
  • "The Fall of the House of Usher" multi-track piece from the "Tales of Mystery and Imagination - Edgar Allan Poe" album (1976) by The Alan Parsons Project. Alan Parsons is known for his audio engineering excellence, and this debut album of his own band features a rich and interesting combination of various kinds of instruments: synths, guitars, live choir, etc. This album was at some point re-issued in a 5.1 version, but I still enjoy it in stereo.
  • "Spark the Coil" from "We Are the Alchemists" joint album (2015) by Architect, Sonic Area & Hologram is a rhythmical electronic piece with a very clear sound and precise instruments placement.
  • "Altered State" from the "360" (2010) album by Asura. A nice melodic piece with ethnic instruments and ambient electronics. The track produces a good enveloping effect and delivers well-positioned instruments.
  • "Fly On the Windscreen (Final)" from the "Black Celebration" (1986) album by Depeche Mode. Although the recording technique used here is not very sophisticated (one can hear that some instruments are panned with level only), it's interesting to listen to the different kinds of reverb applied to sound effects, instruments and vocals.
  • "Prep for Combat" from the "AirMech" game soundtrack, released as a 2012 album by the industrial band Front Line Assembly. It features powerful electronic sounds that are panned dynamically and fly around the listener.
  • But of course, when it comes to powerful sound, it is hard to compete with Hans Zimmer, whose "Hijack" track from the "Blade Runner 2049" OST (2017) sends shivers down the spine and can be used as a great test of how well the subwoofer(s) are integrated with the main speakers.
  • "Zen Garden (Ryōan-ji Temple Kyoto)" from the classic "Le Parc" album from 1985 by Tangerine Dream starts with ambient sounds of wind and then adds gentle percussion sounds carefully panned along the virtual sound stage. I'm not sure which instruments are synthesized and which are real, but they all sound quite natural, with their locations well-defined.
  • And finally, one track which is not music but rather a recording of rain, featured at the very end of the movie "Memoria" (2021). I keep listening to it over and over again, especially in the evenings. It feels like you are sitting on a porch, watching the rain and listening to the delicate yet powerful rumbling of thunder in the distance. It's funny that the track ends with someone coming up to the recording rig (you can feel their breathing) and turning it off. I'm not sure why they did not cut this out during post-production, but it definitely enhances the feeling of realism of the recording :)

Wednesday, July 17, 2024

Adding Bass Traps

As I mentioned in my previous post about setting up the LXdesktops (the desktop version of LXmini with custom tuning), I ordered some bass traps in order to improve reverberation times and maybe deal with the room modes. This is a short report on what I managed to achieve, as well as an interesting point about over-optimization when tuning speakers from the listening position only.

In my room I already have absorbers on the walls behind and to the left of the speakers (the right wall is further away and has a door), and one on the ceiling above the desk. These are wall-mounted 2-inch "FreeStand" absorbers by GIK Acoustics, except for one which is 4 inches thick because it is located very close to the left speaker and must absorb much more energy:

Besides these absorbers, I also already had one "Soffit" bass trap placed on the room's closet. In addition to these "engineered" absorbers, there are also two "environmental" ones: a twin bed with a thick 6-inch mattress, and a rug on the floor. Yet the room is not "dead" and still has enough reflective surfaces, as well as a lot of potential for creating room modes. The room is shaped quite irregularly and has a partially slanted ceiling, thus calculating room modes analytically is not easy.

To all these acoustic treatments I decided to add 3 more bass traps: one big soffit bass trap, essentially the same as the one I already have, and two smaller traps called "Monster traps" by GIK. After the traps arrived, I made a "before" and an "after" measurement. Here are some comparisons.

Reverb Time

In order to analyze changes in the reverb time I made measurements both with Acourate and REW. Acourate displays them using smooth curves, and also provides reference "corridors" from two standards: DIN 18041 and EBU 3276 calculated from the room size. Below are the "before" and "after" RT60 graphs done by Acourate with the corridors from the EBU 3276 "studio" profile:

As we can see, the reverb time in the bass range has decreased by 0.1 s, which is not much. However, it's interesting that the reverb time in the rest of the frequency spectrum has also decreased, and now it almost fits within the upper boundary of the "studio" corridor. Thus, these wide-range bass traps also have a good effect on taming high frequencies. By the way, for the big soffit bass trap GIK offers an option of installing a "limiter" (FlexRange) which is intended to reduce this effect, but in my case I didn't need it.
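For readers curious how an RT60 figure is obtained from an impulse response, here is a generic sketch of the classic Schroeder backward-integration method: build the energy decay curve, fit a line between -5 and -25 dB (a "T20" fit) and extrapolate to 60 dB. I don't know the exact settings Acourate or REW use, and real analyses do this per octave or third-octave band; this sketch works on a single broadband IR.

```python
import numpy as np

def rt60_schroeder(ir, fs, start_db=-5.0, end_db=-25.0):
    """Estimate RT60 from an impulse response via Schroeder backward integration (T20-style fit)."""
    energy = np.asarray(ir, dtype=float) ** 2
    edc = np.cumsum(energy[::-1])[::-1]  # energy remaining after each time instant
    edc_db = 10.0 * np.log10(np.maximum(edc, 1e-30) / edc[0])  # guard against trailing zeros
    t = np.arange(len(energy)) / fs
    fit = (edc_db <= start_db) & (edc_db >= end_db)
    slope, _ = np.polyfit(t[fit], edc_db[fit], 1)  # decay rate in dB per second (negative)
    return -60.0 / slope
```

A synthetic test signal, white noise with an exponential envelope that falls by 60 dB in 0.4 s, should give an estimate close to 0.4 s.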

In REW, the spectrograms also have become more uniform. Below are, again, before and after for the left and the right speaker:

So, there is indeed a noticeable effect on the reverb time; however, it is not dramatic. As for the room modes, there was one interesting effect, described in the following section. By the way, here is an old but useful review of active bass traps by B. Katz, which shows that active traps are more effective; however, they are usually pricier than passive ones.

The Over-Correction Effect

So, one interesting thing that I noticed while placing the new bass traps and re-measuring was an anomaly in the frequency response of the right speaker. Here is how this frequency region looked before and after installing the bass traps:

It is noticeable that the region has become less "regular". Also, the notch from a room mode has become shallower. I also recalled that this region has unusually high distortion, which I had attributed to an interaction with a room mode:

So, definitely something is going on with room modes here. Maybe the addition of bass traps has reduced the effect of a "negative" mode at approximately 97 Hz, and this has resulted in a "swelling."

Another interesting point was that when listening to the log sweep from the side, I could clearly hear that the driver was "overworking": it definitely had a boost which was not needed. I decided to make a near-field measurement of the woofer, right by the driver. As I expected, the driver was actually boosting the range around 77–122 Hz. I realized that this likely comes from the fact that at the microphone position there was a cancellation in this region resulting from room modes, which led to extra boosting when doing the woofer correction. Indeed, when I looked at the filter, it had a hump there. Using the near-field measurement, I created an inversion of the hump and applied it to the correction filter:
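The inverted hump here was derived directly from the near-field measurement. As a rough, hypothetical parametric alternative, a hump spanning about 77 to 122 Hz could be approximated with a peaking-EQ cut from the well-known Audio EQ Cookbook biquad formulas. Its geometric center, sqrt(77 × 122) ≈ 96.9 Hz, coincides with the 97 Hz mode mentioned above, and f0 / (122 - 77) ≈ 2.15 serves as a crude Q estimate. The -4 dB depth is made up for illustration.

```python
import cmath
import math

def peaking_eq(fs, f0, q, gain_db):
    """Peaking EQ biquad (Audio EQ Cookbook form), normalized so that a[0] == 1."""
    amp = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    cosw = math.cos(w0)
    b = [1.0 + alpha * amp, -2.0 * cosw, 1.0 - alpha * amp]
    a = [1.0 + alpha / amp, -2.0 * cosw, 1.0 - alpha / amp]
    return [x / a[0] for x in b], [x / a[0] for x in a]

def response_db(b, a, fs, f):
    """Magnitude response of the biquad at frequency f, in dB."""
    z = cmath.exp(-2j * math.pi * f / fs)  # z^-1 evaluated on the unit circle
    h = (b[0] + b[1] * z + b[2] * z * z) / (a[0] + a[1] * z + a[2] * z * z)
    return 20.0 * math.log10(abs(h))

f_lo, f_hi = 77.0, 122.0
f0 = math.sqrt(f_lo * f_hi)            # geometric center, about 96.9 Hz
q = f0 / (f_hi - f_lo)                 # crude bandwidth-based Q, about 2.15
b, a = peaking_eq(48000, f0, q, -4.0)  # -4 dB depth is a made-up example value
```

At the center frequency the cut equals exactly the requested gain, while far below it (e.g. at 10 Hz) the response is essentially flat. A measured inverse follows the real shape of the hump more faithfully; the parametric version is just easier to reason about.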

After that, the measurement at the listening position started to look less ideal; however, two facts indicated that this tuning is more correct. First, when listening to the log sweep from the side, the region around 97 Hz now sounded more even. Second, the distortion which I had initially observed was gone:

That's an important lesson for me. Although frequency-dependent windowing helps to emulate a measurement in an anechoic chamber by cutting down the effect of reflections, it can't eliminate the effect of room modes. This may sound trivial; however, apparently this consideration did not cross my mind when I was doing the tuning. Thus, at least for woofers, it's important to compare measurements from the near field and from the listening position in order to avoid over-correction.
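A quick calculation shows why the window can't help at low frequencies. Assuming, as an example, an 8-cycle FDW whose full length lies after the direct sound (a simplification; actual FDW implementations shape the window differently), the window at frequency f lasts 8 / f seconds, and any reflection whose path is up to c × 8 / f longer than the direct path still falls inside it.

```python
SPEED_OF_SOUND = 343.0  # m/s, roughly room temperature

def fdw_window_ms(freq_hz, cycles=8.0):
    """Duration of a frequency-dependent window spanning `cycles` periods at freq_hz."""
    return 1000.0 * cycles / freq_hz

def extra_path_m(window_ms):
    """Extra path length (vs. the direct sound) a reflection may travel and still fall in the window."""
    return SPEED_OF_SOUND * window_ms / 1000.0

for f in (97.0, 1000.0, 10000.0):
    w = fdw_window_ms(f)
    print(f"{f:7.0f} Hz: window {w:6.2f} ms, reflections with up to {extra_path_m(w):5.2f} m of extra path get through")
```

At 97 Hz the window is about 82 ms long, which admits reflections with roughly 28 m of extra path, many times the size of a room, so the modal build-up is fully part of the "windowed" response. At 10 kHz the same 8 cycles last only 0.8 ms.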

Servos Are The Answer?

What is interesting is that I had these issues with room modes with the speakers, but not with the subwoofer. The subwoofer produced a very smooth response almost right from the start. This is an interesting feature of the Rythmic subwoofer. I also have a simpler subwoofer by KRK, and it is not as easy to deal with in an untreated room.

What is so special about Rythmic is that it uses the "Direct Servo" technology. The idea is that the active electronics receive feedback from the driver coil and can "notice" when room modes are "helping" the driver (with a resonance), or vice versa, and correct the driver gain accordingly. This requires a specially built driver with an extra coil, but I think it's worth it.

One drawback of active correction that I can think of is that signal-dependent corrections essentially produce non-linear behavior and thus add distortion (see my old post on automatic gain control). However, for bass this is likely not a big issue. So one idea that has come to my mind for the next generation of my LXmini mods is to try a servo driver for the woofer. It would be interesting to see whether this helps to deal with room modes.

Saturday, July 6, 2024

LXmini Desktop Version (LXdesktop)—Part II: DSP Tuning

This post continues my story about the desktop version of LXmini speakers that I have built and set up on my computer desk in a somewhat unusual way:

So, why are the speakers "toed out"? The idea is that since the full range driver has a dipole dispersion pattern, if we turn it outwards, the null of the dipole becomes directed towards the opposite (contralateral) ear, thus naturally contributing to the suppression of the acoustic cross-talk between the speakers. These days this effect is usually achieved with DSP by injecting a cancelling signal into the opposite speaker (see a great post by Archimago and STC on this topic). However, it would be nice if the opposite ear were simply shielded from the sound of the speaker in a natural way.

I've estimated the angle between the full range driver and the opposite ear to be approximately 75°, thus the suppression is not maximal. However, it should still add an extra 5 to 10 dB of attenuation on top of head shadowing, depending on the frequency. I plan to measure the exact attenuation profile later. Another benefit of setting up the speakers this way is that the back of the speaker gets farther from the back wall, at about the recommended minimum of 1 meter.
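For reference, an ideal dipole radiates with a |cos θ| (figure-of-eight) pattern, so its level at an angle θ off-axis is 20·log10|cos θ| dB. This is an idealization: a real driver in an open baffle only approximates a dipole over part of its range, which is one reason to expect less than the ideal figure in practice.

```python
import math

def dipole_attenuation_db(angle_deg):
    """Relative level of an ideal dipole (|cos(theta)| pattern) at angle_deg off-axis."""
    return 20.0 * math.log10(abs(math.cos(math.radians(angle_deg))))

print(round(dipole_attenuation_db(75.0), 2))  # -11.74
```

At 75° the ideal dipole is already about 11.7 dB down, and at 60° it would be about 6 dB down, so the 5 to 10 dB estimate above sits below the ideal value, as one would expect from a non-ideal dipole.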

Ideas for Tuning

Since the original LXmini tuning aimed at a flat on-axis response (see the design notes), my unusual speaker arrangement required dedicated tuning. I started looking around for ideas on how to achieve a close to ideal response in the time domain.

Dr. Brüggemann, the author of Acourate, holds a very strong position in favor of linear phase crossovers. Acourate can generate various kinds of crossovers, both in minimum phase and linear phase versions. There are also some tools (including a new one added in the recent Version 3) which are intended to bring each driver as close as possible to the corresponding band-pass filter of the crossover, both in amplitude and in phase. Together with proper time alignment of the sound from each driver at the listening position, this makes it possible to achieve "ideal" summing of the acoustic crossover components, yielding a perfect Dirac impulse response for the speaker as a whole.

However, my initial concerns were about the pre- and post-ringing behavior of linear phase filters. As we know, their impulse responses are symmetric around the center, and the pre-ringing may potentially exceed the masking thresholds. When the components of a linear phase crossover sum up as intended, with their peaks coinciding, the pre- and post-ringing components from each crossover band cancel each other. However, if there are time shifts, even as small as a fraction of a millisecond, this does not happen. The example below is for a two-band linear phase Neville Thiele crossover:

This is what the summed impulse response looks like on a logarithmic scale when the components are properly time aligned, and also for a 0.23 ms and a 0.5 ms time-of-arrival difference:

The red vertical line is the ideal IR which occurs in the case of ideal time alignment, and on the right are the IRs when one of the crossover components is shifted. Recall that these delays correspond to a path length difference of just about 7.88 cm and 17 cm, which is comparable to the size of the human head.
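The effect is easy to reproduce numerically. The sketch below swaps in a simple complementary pair (a windowed-sinc linear phase low-pass at 1 kHz and its complement, "delta minus low-pass"), not a Neville Thiele design; the crossover frequency and tap count are arbitrary. At 48 kHz, 0.23 ms is about 11 samples and 0.5 ms is 24 samples. The function reports the energy that arrives before the main peak, relative to the peak.

```python
import numpy as np

def linear_phase_pair(fc, fs, taps=1023):
    """Complementary linear phase FIR low-pass/high-pass pair (windowed sinc).
    A simple stand-in for illustration, not a Neville Thiele crossover."""
    n = np.arange(taps) - (taps - 1) / 2
    lp = 2 * fc / fs * np.sinc(2 * fc / fs * n) * np.blackman(taps)
    lp /= lp.sum()                # unity gain at DC
    hp = -lp
    hp[(taps - 1) // 2] += 1.0    # delta minus low-pass: the two sum to an exact delta
    return lp, hp

def pre_peak_energy_db(lp, hp, shift):
    """Energy before the main peak of lp + hp (hp delayed by `shift` samples), relative to the peak."""
    total = np.zeros(len(lp) + shift)
    total[: len(lp)] += lp
    total[shift:] += hp
    peak = int(np.argmax(np.abs(total)))
    pre = np.sum(total[:peak] ** 2)
    return 10 * np.log10(pre / total[peak] ** 2 + 1e-24)

lp, hp = linear_phase_pair(1000.0, 48000.0)
for shift in (0, 11, 24):  # 0 ms, ~0.23 ms and 0.5 ms at 48 kHz
    print(shift, round(pre_peak_energy_db(lp, hp, shift), 1))
```

With perfect alignment the sum is a clean delta and nothing precedes the peak (the 1e-24 floor shows up as -240 dB). With an 11- or 24-sample misalignment, a clearly measurable amount of energy arrives ahead of the main peak, growing with the shift.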

When I started discussing this topic on the Acourate forum, one of the members pointed me to the white paper by B. Putzeys and E. Grimm on the ideas behind the DSP-based implementation of the professional Grimm Audio LS1 speaker (which costs quite a lot!). The authors used a minimum phase Linkwitz-Riley filter, but compensated for its phase deviations using an inverse all-pass filter. If we think about this approach, it effectively yields a linear phase filter as well. In fact, when the crossover components get time shifted, the combination of the crossover plus the inverse all-pass filter also exhibits pre-ringing, although its level is a bit lower and, more importantly, its duration is shorter:

(Note that the red IR is not an ideal Dirac pulse because, although the phase response of the all-pass filter I created is close to the phase response of LR4, it is not exactly the same.) However, these improvements over the ringing of the Neville Thiele crossover are simply due to the fact that the LR4 crossover has more relaxed slopes to begin with:

Thus, instead of compensating for the phase deviations of a minimum phase crossover, which can be quite severe for high-order crossovers, we can just as well start with linear phase crossovers, as they are much easier to work with. For example, I wanted to use an asymmetric shape in which the higher-frequency driver has a more relaxed slope than the lower-frequency driver. This is beneficial for the LXmini design because the directional pattern of the full range driver yields more precise spatial cues than the omnidirectional woofer. This approach also helps with the woofer and subwoofer pair: since I only have one subwoofer, I would like to get as much stereo bass as possible. The asymmetric crossover slopes at first yield a non-flat summed frequency response; however, this is easy to compensate (again, with a linear phase filter) because the phase shift, being always equal to zero, does not affect the summing of the amplitudes of the crossover components.
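Because the components share the same (zero, apart from a common delay) phase, their magnitudes simply add, and the flattening filter is just the reciprocal of the summed magnitude. The sketch below uses Butterworth-shaped magnitudes purely as slope shapes (my assumption; these are not the author's actual crossover curves), with a steep 8th-order low-pass and a relaxed 2nd-order high-pass at 366 Hz.

```python
import numpy as np

def butter_mag(f, fc, order, highpass=False):
    """Butterworth-shaped magnitude, used only as a slope shape for a linear phase filter."""
    ratio = (fc / f) if highpass else (f / fc)
    return 1.0 / np.sqrt(1.0 + ratio ** (2 * order))

f = np.geomspace(20.0, 20000.0, 512)
low = butter_mag(f, 366.0, order=8)                  # steep slope for the lower driver
high = butter_mag(f, 366.0, order=2, highpass=True)  # relaxed slope for the upper driver
summed = low + high            # same phase, so the magnitudes add directly
compensation = 1.0 / summed    # magnitude of a linear phase EQ that flattens the sum
```

With these particular shapes both components are at 1/√2 at the crossover point, so the sum bulges by about +3 dB there; multiplying by `compensation` brings it back to flat without touching the phase. With minimum phase components the same trick would not work, because the summation would depend on the phase difference between the bands.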

Another interesting observation: since I'm performing the tuning in a real room, not in an anechoic chamber, I need to use windowing of the measured frequency response. As I realized after brief experiments, frequency-dependent windowing (FDW) partially suppresses the pre- and post-ringing of linear phase filters. However, it also changes the shape of their frequency response by making it less steep. In my opinion, this is a good trade-off. In the next section I will show the shapes and IRs of the linear phase crossovers I ended up with.

Crossover Preparation Details

The aforementioned Grimm Audio LS1 white paper has a suggestion on "ideal" crossover points. Based on psychoacoustic data, the authors state that the directional pattern of the frequency response should be used down to 300 Hz. The original LXmini has its acoustic crossover point closer to 790 Hz; however, it uses a 2nd-order LR crossover, thus the output from the full range driver actually extends quite low in frequency. So the first thing I did was to measure the raw response of the full range driver. Here it is together with an FDW-processed version:

Looking at the natural roll-off of the driver, I chose 366 Hz as the crossover point. At the high-frequency end, the full range driver, due to its relatively large size, starts working in breakup mode, thus losing efficiency. Plus, I'm not listening to it on-axis, which creates a natural roll-off at high frequencies. However, that's not a problem. Since the speakers are located quite close to my ears, there is no need to make the frequency response ruler flat at the high-frequency end, because that makes the sound too harsh. So I generated an LR2 linear phase crossover at 11 kHz and used its low-pass part to taper the response of the driver on the right side. This is what the final crossover component looks like, overlaid with the raw windowed response:

Similarly, for the woofer driver I chose 46 Hz as the crossover point. The slope on the left side is LR4, while on the right side I used a 1st-order Neville Thiele crossover, as it has a sharp, "brick wall" slope. I passed it through the same frequency-dependent window that I use for the in-room measurements, and this made the shape of the slope more "relaxed". Below, for comparison, is the original NT1 slope overlaid with the windowed one:

There is not much difference in the time domain though:

And this is how the designed crossover component looks on top of the raw driver response:

The subwoofer was a bit more interesting. Choosing the crossover did not require any thinking because the crossover point was already set by the woofer driver, and the type on the right side is also Neville Thiele 1st order. However, since it's an active servo subwoofer (Rythmik F12G), it has some settings of its own. I experimented with different damping settings and low-end extension, and found that low damping with the extension down to 14 Hz creates a time domain response that looks close to the IR of the crossover if I invert its polarity. This is how these IRs look overlaid (the polarity of the subwoofer IR is inverted):
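One possible way to put a number on "looks close if I invert its polarity" is the normalized cross-correlation between the two IRs at the best lag: a coefficient near -1 means the shapes match with opposite polarity. This is my own illustrative check with a hypothetical function name, not something Acourate reports.

```python
import numpy as np

def best_correlation(ir_a, ir_b, max_lag):
    """Return (coefficient, lag) with the largest |normalized correlation|.

    A coefficient close to -1 indicates similar shapes with inverted polarity.
    A positive lag means ir_b is delayed relative to ir_a.
    """
    a = np.asarray(ir_a, dtype=float)
    b = np.asarray(ir_b, dtype=float)
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    full = np.correlate(b, a, mode="full")   # index len(a) - 1 is zero lag
    zero = len(a) - 1
    lags = np.arange(-max_lag, max_lag + 1)
    vals = full[zero + lags]
    k = int(np.argmax(np.abs(vals)))
    return float(vals[k]), int(lags[k])

# Synthetic example: a decaying oscillation and an inverted, 5-sample delayed copy
a = np.zeros(200)
t = np.arange(50)
a[10:60] = np.exp(-t / 10.0) * np.sin(2 * np.pi * t / 12.0)
b = -np.roll(a, 5)
print(best_correlation(a, b, 20))
```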

And this is the final view of the crossover components, which sum up to a flat frequency response (with the high frequency range trimmed down) and a zero phase response:
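Why zero phase components can sum flat: with no phase shift between the bands, the acoustic sum is just the sum of magnitudes, and Linkwitz-Riley low-pass and high-pass magnitudes, 1 / (1 + x^(2n)) and x^(2n) / (1 + x^(2n)), add up to exactly one. A small numeric check of a three-way split at the 46 Hz and 366 Hz points, assuming LR4 shapes everywhere (the NT slopes and the 11 kHz taper used in the post are not modeled).

```python
import numpy as np

def lr_lp(f, fc, order=4):
    """Zero-phase Linkwitz-Riley low-pass magnitude."""
    return 1.0 / (1.0 + (f / fc) ** order)

def lr_hp(f, fc, order=4):
    """Zero-phase Linkwitz-Riley high-pass magnitude (complement of lr_lp)."""
    x = (f / fc) ** order
    return x / (1.0 + x)

f = np.logspace(1, np.log10(20000.0), 500)      # 10 Hz .. 20 kHz
sub = lr_lp(f, 46.0)
woofer = lr_hp(f, 46.0) * lr_lp(f, 366.0)
full_range = lr_hp(f, 46.0) * lr_hp(f, 366.0)   # cascaded so the bands are exactly complementary
total = sub + woofer + full_range               # zero phase: magnitudes simply add
print(f"max deviation: {np.max(np.abs(20 * np.log10(total))):.2e} dB")
```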

Visually, this crossover resembles a high-order Bessel low-pass filter (as used by the "RBessel" crossover type in Acourate); however, mine uses even steeper slopes on the right sides.

Driver Tuning Process

My tuning process has two major stages: the first brings each driver as close as possible to the behavior of the corresponding band-pass filter of the crossover (which also includes fixing the phase behavior), and the second combines these drivers into a proper acoustic crossover.

I did all the measurements from a single position: the listening position. Although it is possible to linearize drivers in the near field, I did not use this approach for two reasons. First, the full range driver works as a dipole, and a dipole must be measured from some distance. Second, since I was interested in the performance of the crossover at the listening position, this was the natural position to use for driver linearization as well.

For the driver linearization I used the "Room Macros" of Acourate, setting the "Target Curve" to the desired crossover band-pass behavior. Obviously, I used the same window for the FDW of the measured driver response as the one I used to process the crossover parts during the preparation stage. I did not use "Psychoacoustic" smoothing at the driver linearization stage; instead, I used the more technical "1/12 Octave" smoothing. I also limited the amplitude correction to avoid creating a boost in the frequency bands where the response of the driver was naturally decaying below the intended crossover suppression level. As an example, below is the correction filter for the woofer driver, overlaid with the target:
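The two ingredients mentioned here, fractional-octave smoothing and a limit on boost, can be sketched in a few lines. This is a generic illustration (plain 1/12-octave averaging of the power response and a hypothetical `max_boost_db` cap), not Acourate's actual Room Macro algorithm.

```python
import numpy as np

def octave_smooth(freqs, mag_db, fraction=12):
    """Average the power response over a 1/fraction-octave band around each bin.

    freqs must be positive (Hz); mag_db is the magnitude in dB at those freqs.
    """
    freqs = np.asarray(freqs, dtype=float)
    power = 10.0 ** (np.asarray(mag_db, dtype=float) / 10.0)
    k = 2.0 ** (1.0 / (2 * fraction))         # band edges at f / k .. f * k
    out = np.empty_like(power)
    for i, fc in enumerate(freqs):
        sel = (freqs >= fc / k) & (freqs <= fc * k)
        out[i] = power[sel].mean()
    return 10.0 * np.log10(out)

def correction_db(measured_db, target_db, max_boost_db=6.0):
    """Correction gain = target - measured, with boosts clipped to max_boost_db.

    Cuts are left unlimited; boosts are capped so that a naturally decaying
    driver response is not pumped up beyond the intended crossover slope.
    """
    corr = np.asarray(target_db, dtype=float) - np.asarray(measured_db, dtype=float)
    return np.minimum(corr, max_boost_db)

f = np.logspace(1, 4, 400)                    # 10 Hz .. 10 kHz
mag = np.zeros_like(f)
mag[200] = 12.0                               # a narrow 12 dB spike
smoothed = octave_smooth(f, mag)
print(round(float(smoothed[200]), 2), correction_db([-20.0, 0.0, 3.0], [0.0, 0.0, 0.0]))
```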

After generating the correction filter with "Room Macro 4" and evaluating the result via a test convolution, I re-measured the driver with the filter applied. Then I checked the phase behavior. Since the correction process of Acourate tries to bring the driver to minimum phase behavior, it leaves in place the phase deviations that are present in the minimum phase impulse response of the target curve. Note that when equalizing an entire full-range speaker to a mostly flat target curve, these phase deviations end up outside the hearing range. For a single driver, however, the limited frequency range means that the phase deviations typically end up near the crossover frequencies, which makes proper time alignment more problematic. For example, this is the phase response of the corrected woofer:

We can see that the phase gradually deviates from zero and "flips" over the 180° angle at 47 Hz. I treated these phase deviations using the same approach as in the Grimm Audio white paper, which is in essence the one described by Dr. Brüggemann in his post "Time alignment of drivers in active multiway speaker systems" on the Acourate forum. That is, we need to "guess" an all-pass filter whose phase has a similar shape to the phase deviation of the driver, and then put its time-reversed version into the correction chain (effectively, we convolve the time-reversed all-pass filter with our existing filter). For example, for the woofer the corrected phase behavior looks like this:
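The time-reversal trick works because reversing a real impulse response in time conjugates its frequency response: the phase is negated while the magnitude stays the same. Convolving an all-pass with its own time-reversed copy therefore gives a symmetric result with unity magnitude, i.e. a pure delay. A sketch using a 2nd order digital all-pass from the well-known RBJ "Audio EQ Cookbook" formulas; the 47 Hz center and Q of 0.7 are illustrative assumptions, and in practice only the reversed all-pass is convolved with the driver's correction filter.

```python
import numpy as np

def allpass_biquad(f0, q, fs):
    """RBJ cookbook 2nd order all-pass coefficients (b, a), normalized so a[0] == 1."""
    w0 = 2.0 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2.0 * q)
    b = np.array([1.0 - alpha, -2.0 * np.cos(w0), 1.0 + alpha])
    a = np.array([1.0 + alpha, -2.0 * np.cos(w0), 1.0 - alpha])
    return b / a[0], a / a[0]

def impulse_response(b, a, n):
    """Direct-form I biquad driven by a unit impulse, truncated to n samples."""
    x = np.zeros(n)
    x[0] = 1.0
    y = np.zeros(n)
    for i in range(n):
        y[i] = b[0] * x[i]
        if i >= 1:
            y[i] += b[1] * x[i - 1] - a[1] * y[i - 1]
        if i >= 2:
            y[i] += b[2] * x[i - 2] - a[2] * y[i - 2]
    return y

fs = 48000
b, a = allpass_biquad(47.0, 0.7, fs)   # hypothetical fit to a driver's phase deviation
h = impulse_response(b, a, 8192)       # long enough for the all-pass to decay fully
h_rev = h[::-1]                        # time-reversed all-pass: negated phase
combined = np.convolve(h, h_rev)       # all-pass followed by its reversed copy
print(int(np.argmax(np.abs(combined))), float(combined.max()))
```

The combined response collapses to a single unit impulse in the middle, i.e. the phase rotation of the all-pass is undone at the cost of latency.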

Obviously, since it's an all-pass filter, the amplitude remains the same. No more than one or two all-pass corrections should be needed. Only the area within the driver's working range must be corrected, and we must look at the windowed response to avoid correcting for the effects of reflections, which are very dependent on the mutual distances between the driver, the reflecting surface, and the measurement point.

Now, with each driver brought as close as possible to the desired crossover band-pass behavior, we need to "assemble" them into a speaker by aligning their levels and times of arrival. To do that, I first measured the speaker as is and did a rough correction of the driver levels. Then I used the sine wave convolution approach, first to align the full range driver with the woofer, and then the woofer with the subwoofer. At low frequencies, the convolved sines may initially be considerably shifted from each other. Also, the low frequency filter may build up slowly and produce irregular sine amplitudes at the beginning. To make sure that the resulting time alignment of the drivers was correct, I applied the same sine wave convolution step to the crossover components and used the resulting overlay as a reference. For example, this is what the sine waves of my crossover look like at the 46 Hz point:
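The sine wave convolution check can also be done numerically: convolve each IR with a sine at the crossover frequency, measure the steady-state phase of each result, and convert the phase difference into a time offset. The sketch below is my own simplified version (hypothetical function names, Hann-windowed single-frequency projection); like the visual method, it only resolves the delay modulo one period of the sine, which is why the sine picture of the ideal crossover is needed as a reference.

```python
import numpy as np

def sine_phase(ir, f, fs, dur=1.0):
    """Convolve `ir` with a sine at f Hz and return the steady-state phase (radians)."""
    t = np.arange(int(dur * fs)) / fs
    y = np.convolve(np.sin(2 * np.pi * f * t), ir)[: len(t)]
    n0 = len(t) // 2                       # skip the slow build-up at the start
    seg = y[n0:n0 + int(0.4 * fs)]
    tt = t[n0:n0 + len(seg)]
    w = np.hanning(len(seg))               # reduces leakage from a non-integer number of cycles
    return np.angle(np.sum(seg * w * np.exp(-2j * np.pi * f * tt)))

def relative_delay(ir_a, ir_b, f, fs):
    """Delay of ir_b relative to ir_a at frequency f, in seconds (modulo 1/f)."""
    dphi = sine_phase(ir_a, f, fs) - sine_phase(ir_b, f, fs)
    dphi = np.angle(np.exp(1j * dphi))     # wrap into (-pi, pi]
    return dphi / (2 * np.pi * f)

fs = 48000
ir_a = np.zeros(400); ir_a[100] = 1.0      # stand-in for one driver
ir_b = np.zeros(400); ir_b[130] = 1.0      # stand-in for another, 30 samples later
print(f"{relative_delay(ir_a, ir_b, 46.0, fs) * fs:.2f} samples")
```

A positive result means `ir_b` arrives later at that frequency, so it would need to be shifted earlier (or `ir_a` delayed) by that amount.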

And this is what the results of the sine wave convolution initially looked like for the woofer and the subwoofer:

Compared with the previous image, it is obvious that the subwoofer (the blue curve) needs to be shifted ahead of the woofer in time for proper alignment.

After applying gains and delays to the driver filters, I made another measurement and double-checked that the sine convolution on the measured IRs produced the expected result.

Target Curve Adjustments

Life would be too easy if we could just take the summed crossover response and use it as the target for the overall speaker tuning. I tried that first and was not impressed with how it sounded. The first problem was that the virtual sources were positioned too high, while I prefer having them at eye (or ear) level. The second problem was an overall lack of "weight" in the sound. The target curve clearly needed some adjustments.

The first problem is a consequence of the fact that any virtual source appearing in front of the listener, for example a rendering of the singer's voice, is created by a pair of stereo speakers that are physically located on the sides. In my case, the speakers are placed even wider than in the conventional "stereo triangle." As S. Linkwitz explains in the paper "Hearing Spatial Detail in Stereo Recordings", if we consider the sound pressure on a sphere, a very crude approximation of a human head, we find that physical sources located in front of the sphere and on its sides create very different sound pressure distributions across the frequency range. A more precise description of this distribution is, of course, the HRTF. Since the two audio streams that represent the virtual central source arrive from the sides, they do not have the proper frequency profile of a central source, and as a result, the hearing system places this virtual source higher. A simple solution used by Linkwitz is to apply a shelving filter that compensates for this effect.
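For reference, a shelving filter of this kind can be realized as a single biquad. The sketch below uses the high-shelf formulas from the RBJ "Audio EQ Cookbook"; the shelf type, corner frequency, and gain are placeholders chosen only for illustration, not the values Linkwitz uses.

```python
import numpy as np

def high_shelf(f0, gain_db, fs, slope=1.0):
    """RBJ "Audio EQ Cookbook" high-shelf biquad coefficients (b, a), with a[0] == 1."""
    A = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * np.pi * f0 / fs
    cw, sw = np.cos(w0), np.sin(w0)
    alpha = sw / 2.0 * np.sqrt((A + 1.0 / A) * (1.0 / slope - 1.0) + 2.0)
    sa = 2.0 * np.sqrt(A) * alpha
    b = np.array([A * ((A + 1) + (A - 1) * cw + sa),
                  -2 * A * ((A - 1) + (A + 1) * cw),
                  A * ((A + 1) + (A - 1) * cw - sa)])
    a = np.array([(A + 1) - (A - 1) * cw + sa,
                  2 * ((A - 1) - (A + 1) * cw),
                  (A + 1) - (A - 1) * cw - sa])
    return b / a[0], a / a[0]

def biquad_db(b, a, f, fs):
    """Magnitude response in dB of a biquad at frequencies f (Hz)."""
    z1 = np.exp(-2j * np.pi * np.asarray(f, dtype=float) / fs)   # z^-1 on the unit circle
    h = (b[0] + b[1] * z1 + b[2] * z1**2) / (a[0] + a[1] * z1 + a[2] * z1**2)
    return 20.0 * np.log10(np.abs(h))

fs = 48000
b, a = high_shelf(2000.0, -3.0, fs)            # hypothetical corner frequency and gain
print(np.round(biquad_db(b, a, [20.0, 2000.0, 20000.0], fs), 2))
```

The shelf leaves low frequencies untouched and approaches the full shelf gain toward the top of the band, with roughly half the gain (in dB) at the corner frequency.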

The second problem, the overall lack of weight or bass-shy presentation with a flat target curve, can be explained by the interaction with the room. Jumping ahead a bit, below are comparisons of the speaker's quasi-anechoic response (FDW windowed) vs. the steady state room response, obtained from the same measurement position by taking an RTA measurement of continuously playing pink noise:

We can see that the room "eats" the bass but amplifies high frequencies. That's why adding more bass to the direct sound and tapering the high end both seem to make sense. So after some experiments with well recorded tracks, I chose the following target curve:

On this graph it is compared to the initial "tapered flat" crossover curve.

The Final Correction and Measurements

The final step in the tuning process is to apply the "Room Macros" to the entire speaker using the target I have created. This time I used "Psychoacoustic" smoothing. This step fixes any remaining discrepancies in the driver levels. Below is the FDW response of the speakers after applying the correction, overlaid with the target:

And below is the phase response of the speaker. As we can see, it is indeed close to zero phase (this is also the windowed version, which excludes phase deviations due to reflections):

I checked the group delays using the "ICPA" function of Acourate ("Room Macro 6") and found only one very high-Q group delay deviation, which was not worth correcting.
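Group delay is the negative derivative of phase with respect to angular frequency, tau(f) = -dphi/domega. A minimal sketch computing it from an impulse response with numpy; this is just the textbook definition, not a description of how Acourate's ICPA works internally.

```python
import numpy as np

def group_delay(ir, fs, nfft=None):
    """Return (freqs, tau): group delay in seconds from the unwrapped FFT phase."""
    ir = np.asarray(ir, dtype=float)
    nfft = nfft or len(ir)
    H = np.fft.rfft(ir, n=nfft)
    phase = np.unwrap(np.angle(H))
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    omega = 2.0 * np.pi * freqs
    return freqs, -np.gradient(phase, omega)    # tau = -dphi / domega

# A pure 10-sample delay has a constant group delay of 10 samples
fs = 48000
ir = np.zeros(1024)
ir[10] = 1.0
freqs, tau = group_delay(ir, fs)
print(float(tau.min() * fs), float(tau.max() * fs))
```

On a real measurement, a narrow high-Q resonance shows up as a sharp local peak in this curve, which is the kind of deviation mentioned above.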

The step responses of the speakers look good:

Note that these are responses without any windowing, so they do not look fully identical due to reflections and the asymmetry of the room. This can be clearly seen in the Energy Time Curve (ETC) graphs produced by RoomEQ Wizard (REW):
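An ETC is commonly computed as the squared magnitude of the analytic signal (the Hilbert envelope) of the IR, expressed in dB. A sketch with numpy that builds the analytic signal the same way `scipy.signal.hilbert` does; REW's exact processing may differ.

```python
import numpy as np

def etc_db(ir):
    """Energy Time Curve: squared Hilbert envelope of the IR in dB, 0 dB at the peak."""
    x = np.asarray(ir, dtype=float)
    n = len(x)
    weights = np.zeros(n)                       # keep DC, double positive freqs, drop negative
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    analytic = np.fft.ifft(np.fft.fft(x) * weights)
    energy = np.abs(analytic) ** 2
    return 10.0 * np.log10(energy / energy.max() + 1e-20)

# Direct sound plus a reflection at half the amplitude, 300 samples later
ir = np.zeros(1024)
ir[100] = 1.0
ir[400] = 0.5
e = etc_db(ir)
print(int(np.argmax(e)), round(float(e[400]), 2))
```

A reflection at half the amplitude of the direct sound appears about 6 dB below the peak.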

Since this is a small room, strong reflections start appearing quite early, but it's hard to do anything about that: there are windows behind my listening position, so I can't put any acoustic treatment there.

Also using REW, I measured distortion and observed the known issue of the Seas FU10RB driver: a raised 2nd harmonic distortion level between 1 and 2 kHz, which Erin also noted in his "Erin's Audio Corner" review when he measured the LXmini:
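For a sense of what the harmonic traces represent, the level of each harmonic relative to the fundamental can be estimated from a steady sine with a windowed FFT. REW measures distortion with its own (sweep-based) method; this is only a simplified sketch with a hypothetical function name, using a synthetic signal with 1% 2nd and 0.1% 3rd harmonic.

```python
import numpy as np

def harmonic_levels_db(signal, f0, fs, n_harmonics=3):
    """Levels of harmonics 2..n relative to the fundamental (dB), via a Hann-windowed FFT."""
    x = np.asarray(signal, dtype=float) * np.hanning(len(signal))
    spec = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)

    def peak(f):
        k = int(np.argmin(np.abs(freqs - f)))
        return spec[max(k - 2, 0):k + 3].max()   # tolerate small bin offsets

    fund = peak(f0)
    return [20.0 * np.log10(peak(h * f0) / fund) for h in range(2, n_harmonics + 1)]

fs = 48000
t = np.arange(fs) / fs                           # 1 s, so 1 Hz bins
sig = (np.sin(2 * np.pi * 1500 * t)
       + 0.01 * np.sin(2 * np.pi * 3000 * t)     # 2nd harmonic at 1 %
       + 0.001 * np.sin(2 * np.pi * 4500 * t))   # 3rd harmonic at 0.1 %
print([round(v, 1) for v in harmonic_levels_db(sig, 1500.0, fs)])
```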

There is also a bit more distortion between 300 and 500 Hz, probably because the full range driver is being pushed harder. The distortion in the right speaker around 100 Hz is due to an interaction with a room mode: if I move the speaker to a different position, this peak disappears. I'm not sure why each harmonic trace ends with a funny upward curve; this must be a measurement artifact.

The resonances from room modes can be seen on the spectrogram:

I decided to order some more bass traps; we will see whether they actually help reduce the effects of room modes.

Does Non-Ideal Summing Induce More Pre-ringing?

Now let's get back to one question from the beginning of the post. Recall the simulations of non-ideal summing of an acoustic linear phase crossover and the associated pre- and post-ringing. I decided to check what happens in reality. For that, I moved the measurement mic 17 cm to the right and repeated the measurement. Below are the resulting step responses. This one is for the left speaker, overlaid with the original (where the crossover components are time aligned):
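The effect of a timing error on summation at a crossover point can be estimated analytically for the zero phase case: at fc both LR components are at 0.5, so with a relative delay dt the sum becomes 0.5 + 0.5 * exp(-j*2*pi*f*dt), whose magnitude is |cos(pi * f * dt)|. A sketch with hypothetical path length differences; the actual offset produced by the 17 cm mic shift depends on driver geometry not given here.

```python
import numpy as np

def lr_sum_with_delay(f, fc, dt, order=4):
    """Complex sum of zero-phase LR low/high-pass bands, with the high-pass delayed by dt seconds."""
    f = np.asarray(f, dtype=float)
    x = (f / fc) ** order
    lp = 1.0 / (1.0 + x)
    hp = x / (1.0 + x)
    return lp + hp * np.exp(-2j * np.pi * f * dt)

c = 343.0                                      # speed of sound in m/s (approx. at 20 °C)
for path_diff_cm in (0.0, 5.0, 10.0, 20.0):   # hypothetical path length differences
    dt = path_diff_cm / 100.0 / c
    level = 20 * np.log10(abs(lr_sum_with_delay(366.0, 366.0, dt)))
    print(f"{path_diff_cm:4.0f} cm -> {level:6.2f} dB at 366 Hz")
```

Small offsets only cost a fraction of a dB at the crossover point, while a half-period offset would cancel it completely; this magnitude view does not capture the pre-ringing question itself, which depends on the full time-domain behavior.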

Note that since the causal part of the IR is dominated by room reflections, it is not possible to judge the effect on post-ringing. As for pre-ringing, it actually seems to be lower in the IR recorded at the microphone position shifted away from perfect alignment.

And this is the right speaker:

We can see that for this one there is indeed a bit more pre-ringing. Evidently, the real acoustic behavior of speakers is much more complicated than these ideal models, and a proper evaluation of the off-axis crossover behavior would require an anechoic chamber.

Does this all matter? Maybe not so much, after all. In any case, there is no ideal solution when we are trying to build a full-range speaker from several band-limited drivers. If we strive for a perfect solution, we actually need to avoid crossovers altogether by using a single driver, for example an electrostatic panel or the Manger Transducer. The Manger seems to me like a variation on a coaxial driver; however, because it uses a single, specially engineered diaphragm, it probably does not suffer from the Doppler effect. Anyway, that's a different price level.

To Be Continued

Of course, it would be interesting to discuss how this setup sounds; however, this post has already become quite long. I will write about listening impressions and other things separately.