Thursday, February 8, 2018

112dB Redline Monitor Plugin

While looking around for other crossfeed implementations, I've found Redline Monitor by an audio company called 112dB. It's a plugin intended to be used with DAWs for simulating loudspeaker sound on headphones. On a Mac, it's quite easy to hook it up to the system audio output with Audio Hijack Pro. The plugin has several emulation options, some of them missing on Phonitor headphone amplifiers, but its price ($69) calls for a thorough evaluation before buying. Thanks to the generosity of 112dB, the plugin is available for a 60-day trial period, which I'm using to get some insight into how it works.

Controls


Let's check out the controls of the plugin. They are very similar to the controls found on Phonitor:


I'll start with the rightmost control—the simulated distance to the loudspeakers. I think it's the most interesting control, because the type of processing and the resulting sound change dramatically depending on its value. When the value is "0 m", the plugin's effect is the least intrusive and resembles Phonitor's processing. All other settings of this switch introduce rather serious phase shifts and comb filtering to simulate room reflections.

The "Soundstage" control defines the total angle between the simulated speakers. "Center" is center signal attenuation. The "Dim" switch pre-attenuates the input signal to make sure that the processes signal doesn't clip. The rest of the switches are mostly needed for professional monitoring purposes, they are covered in the manual.

Confession


While comparing the Phonitor line of headphone amplifiers with Redline Monitor, I went through the manuals for Phonitor 2 (which I had never opened before) and Phonitor mini. This was the first time I noticed the tables and graphs in the Phonitor 2 manual (page 17), and also the statement "With the Angle switch you define the frequency-corrected channel crosstalk. In this case, we are dealing with „Interaural Time Difference“ (ITD)" in the Phonitor mini manual.

This was a very surprising discovery for me, because previously I was sure that Phonitor had an almost linear phase response and didn't introduce any group delay. The delay never showed up in my measurements, for reasons yet unknown. But from the manuals, it's obvious that Phonitor also employs ITD.

But it was all for the better. In fact, making discoveries that I would otherwise have missed is one of the reasons I write this blog.

Equalization Differences


First, I checked what the equalization graphs of Redline Monitor look like compared to similar settings on Phonitor mini. I used the following settings on Redline Monitor:

Phantom center: -1.2 dB
Soundstage: 60 degrees
Distance: 0 meters

These are semantically equivalent to the following settings on Phonitor:

Crossfeed level: Low
Angle: 30 degrees (this is between the speaker and the center, thus soundstage is 60 degrees)
Center: -1.2 dB

Surprisingly, the graphs look very different:



The amplitudes, the knee frequency, the inter-channel level difference—just about everything is different; only the general principle stays the same.

However, if we look at group delay graphs, they look very much the same (again, at equivalent settings):


One small difference is that Redline Monitor has 300 μs of ITD at low frequencies, while Phonitor 2 has 200 μs.
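To make these numbers more tangible, here is a minimal, purely illustrative sketch of an ITD-style crossfeed in Octave (this is not how SPL or 112dB actually implement their processing; the delay, cross-feed gain, and low-pass knee are my assumptions):

pkg load signal;                            % for butter / filter

[x, fs] = audioread('input.wav');           % stereo input (hypothetical file)
itd = 250e-6;                               % interaural time difference, between the 200 and 300 us above
d   = round(itd * fs);                      % ~11 samples at 44.1 kHz
g   = 10^(-6/20);                           % level of the cross-fed signal, assumed -6 dB
[b, a] = butter(1, 700 / (fs/2));           % gentle low-pass on the cross-fed path (assumed 700 Hz knee)

xl = x(:,1); xr = x(:,2);
cross_l = g * filter(b, a, [zeros(d,1); xr(1:end-d)]);   % delayed, low-passed right -> left
cross_r = g * filter(b, a, [zeros(d,1); xl(1:end-d)]);   % delayed, low-passed left -> right

y = [xl + cross_l, xr + cross_r];
audiowrite('crossfeed.wav', y / max(abs(y(:))), fs);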


Sonic Differences


In order to perform an ABX test, I processed several music excerpts using Redline Monitor, and also recorded them via Phonitor mini's crossfeed matrix (after discovering that my Phonitor simulation lacks group delay, I decided not to use it for tests). I used the same processing settings as specified in the section above. The goal was to check whether Redline Monitor and Phonitor mini can be distinguished, and which one I would prefer.

The results are not very conclusive. Perhaps the choice of tracks wasn't revealing enough, or I need to train my listening skills better. With the modest amount of processing I was applying, I couldn't even reliably distinguish the source tracks from the processed ones, let alone Redline Monitor from Phonitor. The good news is that there isn't much change to the tonal balance with either crossfeed implementation.

"Distant" Modes


Let's go back to Redline Monitor's settings and check what happens to its transfer function when we start increasing the simulated distance to the speakers. Here the center image is at 0 dB attenuation and the soundstage is 60 degrees. I started with the 0 m distance and proceeded in 0.5 m increments up to the 2 m setting. Below is the graph of the resulting frequency response, where darker colors represent larger distances. The blue plot is for the left channel, the red plot is for the right channel:

I guess the ripples simulate the interaction of reflected sound with direct sound that happens when listening to loudspeakers in a room. The farther the listener is, the more enveloped they are in the reverberant field. As we can see, the amplitude of the ripples increases with distance, making the sound more and more colored.
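My guess at the mechanism (not necessarily 112dB's actual algorithm): mixing the direct sound with a single delayed, attenuated "reflection" produces exactly this kind of ripple, i.e. a comb filter. A quick Octave illustration:

pkg load signal;                            % for freqz

fs    = 48000;
delay = 3e-3;                               % ~1 m of extra path length at the speed of sound
g     = 0.3;                                % reflection level, assumed
d     = round(delay * fs);

h = [1; zeros(d - 1, 1); g];                % direct sound plus one reflection
[H, f] = freqz(h, 1, 4096, fs);
semilogx(f, 20 * log10(abs(H)));            % ripples (notches) repeat every 1/delay ~ 333 Hz
xlabel('Frequency, Hz'); ylabel('dB'); grid on;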

It would be interesting to judge the correctness of this simulation from the psychoacoustic point of view. In real conditions, the ears and the brain can "listen through" the room, discarding these colorations, but the brain has much more information available, e.g. changes in the received sound with subtle head movements, which are absent in this simulation. So the question remains open whether these ripples just color the sound, whether the brain "converts" them into speaker distance information, or whether both processes happen to some degree simultaneously.

Conclusions


I think Redline Monitor can be used as a substitute for Phonitor mini when the latter is unavailable. Although their processing is a bit different, one needs a very trained ear to distinguish between the two implementations.

For Redline Monitor, I would recommend using the 0 m distance setting in order to avoid the comb filtering that occurs with the other settings of the "Distance" control.

Thursday, January 18, 2018

BBE 802 Sonic Maximizer Measurements and Teardown

While watching this YouTube video that analyzes the transfer function of an older model of the BBE Sonic Maximizer—the 802—I noticed one feature that I miss on the current generation of Maximizers: the ability not only to boost high frequencies (HF), but also to attenuate them. Out of curiosity, I bought an 802 unit on eBay and performed the same measurements I had previously done for the 282i.

What's Inside


But first, I looked under the cover of the unit to see if it's based on the same NJM2153 chip as the 882 model, and I found out that in fact it isn't! This is what we can find inside:

The first thing we can see is a pair (one per channel) of giant chips marked "BBE." That's the original "sound enhancement" chip. It's interesting that, compared to the NJM2153 package, which has 20 pins of which 18 are actually used, this "BBE" chip has only 18 pins, one of them not connected, so only 17 are in use:


It would be interesting to figure out what extra input the NJM2153 receives compared to the old BBE chip, but for that I would need to trace the connections on the board. Although that shouldn't be hard, since the board in fact has only one layer, I'll leave it for later.

The other chips we can see here are opamp assemblies. There are 3 of them per channel:

NE5532N—the ubiquitous audio opamp, used for balanced output;
SGS LM324N and SGS TL074CN—used for driving LEDs.

These are pretty much the same components that are used in the 882 model, except that the 882 uses electronic balancing of its inputs, for which it employs two more pairs of NE5532, whereas the 802 uses old-school transformer balancing (you can see a pair of small transformers per channel).

Measurements


I'm presenting the measurements in the same order as for 282i here.

Group Delay

Unlike the 282i, this 802 unit doesn't affect group delay at all when it's in bypass mode. From what I've seen in the frequency response measurements, in bypass mode the 802 excludes all its circuits from the signal path, which perhaps isn't true for the 282i.

So, this is the group delay plot when processing is enabled:

The numbers are pretty close to what the manual says, allowing for this unit's age. We can conclude that this functionality hasn't changed much over the Maximizer's evolution.

THD

Here things become spookier. Look at how harmonic distortion increases when the unit is in processing mode (orange) versus the loopback measurement (black):

The faint line is the level of the 3rd harmonic—it reaches 0.1% at middle frequencies and crawls up to 1% in the bass. Although this doesn't contradict the official specs (they say "less than 0.15% @ 1 kHz"), it is much worse than what the modern 282i shows.

According to John Siau's calculations, 0.1% distortion translates into distortion products only 60 dB below the playback SPL, which can be audible.
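The conversion from a distortion percentage to dB is easy to verify:

octave:1> 20 * log10(0.1 / 100)
ans = -60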

Frequency Response

So, is the ability to attenuate HF really there? Yes, indeed:

As we can see, setting the "Processing" knob to the minimum position ("1") attenuates the HF by 6 dB. Setting it to the middle setting ("5") provides a flat-ish response, and turning "Processing" all the way up produces a bump at about 4 kHz.

However, we can also see that the HF rolls off quickly after 10 kHz at any setting, which is much less exciting. The modern versions of the Maximizer demonstrate a flat FR up to 20 kHz (when the knobs are at their minimum positions).

Finally, what about the level-dependent frequency response that we saw on the 282i, which effectively acts as an expander for HF? Yes, it's there:


(Note that the graphs were produced using white noise as a source signal, thus at low frequencies the plots are wiggly.)

With both knobs at the maximum setting, we can see that the 802 unit doesn't boost the HF if the signal level is low. An even more interesting picture emerges with the "Processing" knob at the minimum level:


As we can see, at low signal levels, the HF are less attenuated, so the unit works as a compressor! If we align the lowest (red) plot with the highest (magenta) one at 1 kHz, the delta at 6 kHz is about 4 dB.

Conclusions


The older BBE Sonic Maximizer model 802 provides some interesting abilities to manipulate high frequencies that are not available in the modern models, but it unfortunately suffers from a high distortion level and a compromised frequency range. Perhaps at the time it was introduced (around the 1980s) these specs were acceptable, but today they clearly don't meet the bar. So unless you intend to process sources that by their nature have high distortion and a reduced frequency range (e.g. analog tape), there is no point in using this ancient unit.

Sunday, January 7, 2018

BBE 282i Sonic Maximizer Measurements

The Need for Tone Controls


I'm reading the new edition of F. Toole's awesome book "Sound Reproduction." In Chapter 4.4 he laments the demise of tone controls on modern hi-fi preamps. Indeed, I recall the domestic vintage radio + turntable combos my father and his friends had—there were always "bass" and "treble" knobs. More expensive systems featured multi-band graphic equalizers. Clearly, in that era everybody understood that neither the program nor the reproduction chain is ideal, and that some tonal correction may be required.

However, the aspiration for a "clean" reproduction path shaved all the extras off (they contaminate the sound!) and left us with only volume controls on most hi-fi units. This would work if all recordings were perfectly balanced, and if our hearing were linear (or we always listened at the same reference volume level). But since this is simply not true, it's often desirable to shave off some of the extra high frequencies that the mixing engineer added in order to "reveal" the vocals but that ended up sounding really harsh, or to add some bass when listening at low volume levels.

BBE Sonic Maximizer Mystery


I started searching for a desktop unit that would implement just tone controls, not a full equalizer—those are bulky and require too much tweaking. But due to the aforementioned "purity" trend in audio equipment, it's next to impossible to find such a unit. Of course, I could just implement the tone controls as a DSP plugin, but at least with PulseAudio and LADSPA, it's not trivial to add real-time controls to it. Also, virtual knobs never feel as good as physical ones.

Somehow, I stumbled upon the family of units jointly called "BBE Sonic Maximizer", featuring just two control knobs—a good sign! However, the labels of the knobs were quite cryptic: "Lo Contour" and "Process", and there was nothing about "tone control" in the unit's description, but rather lots of marketing-speak promises of achieving audio nirvana once this unit is inserted into the recording or reproduction chain. That looked really suspicious.

Even more suspicious were the reviews on different forums (mostly related to sound recording), where people were either raving about how this unit improves the sound of recorded drums and makes them sound "punchier", or advising not to waste money on it because it's snake oil. A lot of YouTube videos demonstrate the processing results, and from watching them it seemed like the unit really does adjust the tone, but there were always people saying that it's not a tone control, though they couldn't provide any specific details.

Digging into Details


I was looking for any objective measurements of the Maximizer, but found none. At last, I found three manuals: one for the older 802 unit, one for the newer 882 unit, and one for the modern version, the 882i.

The manual for the 882i is the most useless one—it only talks about the "envelope distortion" occurring in speakers that this unit is designed to fix, provides unit connection schemes, and gives brief technical specs stating some distortion figures and the fact that tone correction happens at 50 Hz and 5 kHz, with a maximum boost of +12 dB.

The term "envelope distortion" is equivalent to "group delay distortion" which means adding non-uniform delay to different groups of frequencies. According to "Electroacoustics" book by M. Kleiner, horn and transmission line speakers are susceptible to noticeable group delays. Seems like the BBE unit can actually be useful for PA and old studio monitors if you don't have a DSP processor. But I think, modern speakers and especially modern powered studio monitors have required compensation circuits built-in.

The manual for 802 actually explains what the unit does in terms of group delay. The audio signal is split into 3 frequency groups by dividing the spectrum at 150 Hz and at 1200 Hz. The LF group is delayed by 2.5 ms, the Mid-Frequency (MF) group is delayed by 0.5 ms. The HF group is left intact.

As for tone correction, the LF group is simply boosted according to the "Lo Contour" knob. The amount of HF boost actually depends both on the "Process" knob and on the RMS level of the MF group. This is the most intriguing part.
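To make the description above concrete, here is a rough Octave sketch of that signal flow (my own reconstruction from the manual's text, not BBE's circuit; the crossover orders, knob gains, and the level detector are assumptions, and the real unit tracks the MF level dynamically rather than once per file):

pkg load signal;

[x, fs] = audioread('input.wav');                  % hypothetical input, mono for brevity
x = x(:,1);

[bl, al] = butter(4, 150 / (fs/2));                % LF band: below 150 Hz
[bm, am] = butter(4, [150 1200] / (fs/2));         % MF band: 150..1200 Hz
[bh, ah] = butter(4, 1200 / (fs/2), 'high');       % HF band: above 1200 Hz

lf = filter(bl, al, x);
mf = filter(bm, am, x);
hf = filter(bh, ah, x);

dl = round(2.5e-3 * fs);  lf = [zeros(dl, 1); lf(1:end - dl)];   % LF delayed by 2.5 ms
dm = round(0.5e-3 * fs);  mf = [zeros(dm, 1); mf(1:end - dm)];   % MF delayed by 0.5 ms

lo_contour = 10^(6/20);                            % "Lo Contour": fixed LF boost, assumed +6 dB
process    = 10^(6/20);                            % "Process": maximum HF boost, assumed +6 dB
mf_rms     = sqrt(mean(mf .^ 2));                  % crude stand-in for the MF level detector
hf_gain    = 1 + (process - 1) * min(1, mf_rms / 0.1);   % HF boost scaled by the MF level

y = lo_contour * lf + mf + hf_gain * hf;
audiowrite('maximized.wav', y / max(abs(y)), fs);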

Here the manual for the 882 comes to the rescue. Very atypically for a commercial electronic product, this manual contains the actual circuit schematic of the device. It shows that the heart of the 882 is the NJM2153 chip, which has a technical manual of its own. Finally, we can see some graphs!


This plot supposedly shows the amount of frequency correction applied when both the "Lo Contour" and "Processing" knobs are at the maximum position. The LF boost remains a constant +12 dB regardless of the input signal level, whereas the HF boost depends on the input signal level. Here is another graph, a "cross-section" at 10 kHz:


Interestingly, the description of the NJM2153 chip doesn't align well with what the manual for the 802 unit says. The chip description says that the amount of HF boost depends on the overall input signal level, but the 802 manual states that it depends on the RMS of the MF group only. Perhaps this implementation detail was changed from the 802 to the 882.

It's also interesting what happens when the "Processing" knob is at the minimum value—does the HF group get attenuated if the input signal level is low, or does it maintain the input level? On the schematic, the VCA is controlled directly from the signal level meter, so it should not depend on the "Processing" knob. But it's better to check.

Measurements


I bought an unbalanced desktop version of the BBE Sonic Maximizer—model 282iR. From the scarce technical specs revealed in the manuals, it seems to use the same processing pipeline as the 882 and 882i, but it comes in a different form factor and with combined knobs for the left and right channels. Also, as I've said, the 282iR uses unbalanced RCA or 3.5 mm inputs and outputs, so it has a 3 dB lower output level than the 882i, which uses balanced XLR connections.

Since the BBE unit is an analog line-level signal processor, it's quite trivial to measure it using an ordinary sound card. I used a MOTU Microbook IIc.

Group Delay

Let's start with group delay. It remains the same for any input signal level, and the only parameter that affects it is whether the unit is in bypass mode:

Bypass mode is the red plot; the green plot is with processing engaged. As we can see, in processing mode the unit indeed adds ~2.5 ms of group delay to LF and ~0.5 ms to MF (as an average value). Note that the unit adds some GD distortion even when it is in bypass mode.

THD

The manual for the 282i states THD < 0.1% at -10 dBu input across the entire 20–20000 Hz range. That's actually quite a lot (not good). In fact, the unit seems to be an order of magnitude better:
This plot shows the 2nd harmonic. The black plot is the loopback measurement of the Microbook. Red is bypass, green is processing mode with both knobs at the minimum setting. As we can see, the level with processing enabled is < 0.01%.

Channel Balance

Since the 282i unit is designed to process both channels at once, I'm expecting the unit to maintain the original balance of the input signal.

As I've checked, in bypass mode the balance is maintained very precisely. Looking at the 882 schematic, bypass mode just connects the output directly to the input, so that's what I would expect. In processing mode, the difference is about 0.1 dB at 1 kHz—not too bad, but it could be better.

Frequency Response

Finally, the most interesting part. Since the FR of the unit changes with the signal level, I used the Microbook's hardware white noise generator and performed real-time FFT analysis in Room EQ Wizard. The method was to change the level of the noise and observe how it affects the output frequency response. The resulting curves are not as pretty as the ones obtained from sine sweeps, but they still reflect the trends.

As can be seen from the graph, the frequency response plots at the maximum "Lo Contour" and "Processing" knob settings indeed resemble those from the NJM2153 chip manual shown above. The level of bass boost remains unchanged, while the HF boost falls off once the signal level becomes low, thanks to the attenuator controlled by the input level monitor.



With the knobs at the minimum position, the HF range can even be attenuated for low-level signals.



Conclusions


Recall that I encountered the BBE Sonic Maximizer while looking for a tone control device. So, can the Maximizer be used as a tone control? Somewhat. It definitely can boost LF or HF, which is good. As for the opposite direction (cutting), it depends. For bass, cutting isn't needed as often. For treble, I'm curious how useful the level-dependent attenuation actually is. I need to check with actual commercial recordings.

Another thing is the group delay. It's definitely not needed for headphones, because over-ear models use a single driver anyway. Does the group delay introduced by the unit affect the sound negatively? I will need to check with wide-spectrum transients like drums and percussion.

Then there is the additional distortion the unit adds when processing is engaged. Certainly, 0.01% of 2nd harmonic isn't fatal, but the specs of the Grace SDAC and Phonitor Mini feature at least 10x less distortion. On the other hand, we could say that these harmonics add some warmth to the sound. Again, I need to do some listening.

Sunday, December 31, 2017

On Headphone Normalization Part 2

In Part 1, we considered the need for headphone normalization and its implementation in the Morphit plugin by Toneboosters. In this part, we will examine Sonarworks Reference.

Normalization with Sonarworks package


Sonarworks offers a package for recording studios called "Reference", which consists of room correction software, headphone correction filters, and system components that allow applying these corrections system-wide. For non-professionals, Sonarworks also offers the "True-Fi" package, which applies the same headphone correction curves using a simpler UI. For the purpose of writing this post, I've been playing with the implementation targeted at pro users, which offers more tuning capabilities.

I really like that the UI shows a grid for the response curves. It's also great that all 3 curves (source, correction, and target) can be displayed on the graph.

However, Sonarworks doesn't offer the same degree of freedom for setting up the source and the target frequency responses as Morphit does. The list of source responses only includes real headphone models, with no artificial curves like "flat at the eardrum." The list of target responses is even more limited, offering only a handful of speakers and headphones, which are presented like riddles, e.g. "A well respected open hi-fi and mastering reference headphone model nr.650 [...]", illustrated by a picture of a Sennheiser HD650 as a hint. I guess this was done to work around some legal issues.

Here I discovered the first curious thing—simulating the HD650 using the "HD650 Average" measurement didn't result in a flat compensation curve, and a similar thing happened with the AKG K712:





I asked a question about this on Sonarworks' support forum, and their representative confirmed my suspicion that the target curves in the package are not as up to date as the averaged source measurements.

Target Curve of Sonarworks


OK, the first question has been resolved. The next important question is what the "Flat" target response means for headphones. The UI doesn't help much; it indeed shows the target response as flat. But as we know from Part 1, it shouldn't really be flat at the earphone drivers.

My speculation is that since the package offers normalization for both headphones and speakers, the team decided to represent the normalization from the speakers' point of view. Thus, the "flat" target setting for headphones must mean "as heard using speakers calibrated to a flat response", but they do not specify under which conditions and with which tweaks. As we have seen with the Harman Target Curve and Morphit, an honest "flat loudspeaker in a reference room as picked up at the eardrum" may not be a preferred setting due to its "dullness."

In order to provide an educated guess, I performed the following experiment. I chose the same headphone model, Shure SRH1540, as the source both in Morphit and in Sonarworks, and normalized a test sweep signal separately using the two plugins.

Then using "Trace Arithmetic" in Room Eq Wizard, I derived a transfer function for transforming Morphit's filter response into Sonarworks', and applied this transfer function to Morphit's "Studio Reference" curve. Here is the result compared to Morphit's "Studio Reference" (red) and "Studio Speaker" (which as we remember, resembles Harman Target Response) (blue). The Sonarworks' approximated response is a green curve.



Note that this is only an approximation, since the measurement data for the SRH1540 obviously differs between Morphit and Sonarworks (it's hard to perform headphone measurements reliably, especially at high frequencies).

But still, we can see similar shapes here, confirming that Sonarworks may be using something similar to either of these curves (and definitely not the "flat" target response that their UI suggests). Two notable differences can be seen, though:

  • The response at high frequencies is rolling off. Indeed, the sound of normalized SRH1540 is duller with Sonarworks, unless additional treble adjustment is applied.
  • The bass is cranked up. Again, this can be heard very well, though Sonarworks provides a bass control that allows ±6 dB of correction, which can fix this.

Note on Implementation


Another interesting thing concerning the Sonarworks Reference package is that it can use different filter types for normalization. On the "Advanced" tab, there is a choice between "Zero Latency", "Optimum", and "Linear Phase" settings:




"Zero Latency" means applying a recursive (IIR) filter (as in Morphit), which has negligible latency but introduces some phase shifts.

"Optimum" is a shorter non-recursive minimum phase FIR filter of 500 taps, that at 44.1 kHz introduces a delay of about 11 ms—still OK for real-time operation.

"Linear Phase" is a longer FIR filter that achieves linear phase (no phase changes), but has longer processing time, and also adds some "pre-ringing."

Which Product to Choose


Personally, I've stuck with Morphit because it's cheaper and allows me to see the target frequency response. On the other hand, Sonarworks offers a system-wide component that applies normalization to all system sounds. However, this can also be achieved by using Morphit in conjunction with Audio Hijack Pro by Rogue Amoeba, which allows applying plugins to the system output, as well as capturing it.

Sonarworks also offers a service for measuring your personal headphones. However, I would prefer the headphones to be measured on my own head, not on a dummy head simulator, since factors such as the shape of the pinna and the shape of the ear canals greatly affect the resonances that occur in the outer ear.

Tuesday, December 19, 2017

On Headphone Normalization

What is Headphone Normalization


The world of headphones is a jungle. There are thousands of headphone models on the market, each with its own sound signature. Every aspect of a headphone's build affects the sound. A lot of headphone makers also come up with their own "sound style", e.g. having lots of bass, a "bright" sound, or a neutral "studio reference" signature that is preserved across the model range.

Several factors make up the sound signature: frequency response, added distortions, and the degree of matching between the left and right drivers. Unlike the world of loudspeaker makers, which is moving towards accepting that the speaker frequency response should be a smooth slope downwards from low to high frequencies, the world of headphone manufacturers is still struggling to figure out a standard for the frequency response curve. The main reason for that is the fact that, unlike the sound from loudspeakers, the sound emitted by headphones bypasses several "body filters" on its way to the eardrum.

The sound we hear from outside sources partially reflects off the shoulders and receives significant coloration from the outer ear. Thus, the sound from a loudspeaker with an ideally flat frequency response, heard in an acoustically treated room, is actually far from "flat" when picked up by the eardrum.

Now, if we take in-ear monitors, which are inserted directly into the ear canal and radiate sound directly to the eardrum, and make their frequency response flat, what the listener perceives is a sound with strongly attenuated vocals (because our outer ears do a great job of amplifying sound in the frequency range of vocals), which is not appealing at all. It has long been understood that in-ear monitors thus need a frequency response curve that replicates the filtering applied by the shoulders and by the outer ear.

With over-ear headphones the situation is even more complicated because although they bypass the shoulders "filter", they still interact with the outer ear of the listener.

So it's universally understood by headphone manufacturers that headphones must have a non-flat frequency response in order to sound pleasing, but there is still no universal agreement on the exact shape of that response. Also, since most headphones are passive devices with no electronics inside, the target frequency response is determined by their physical construction. Sometimes it can be challenging to achieve the desired frequency response just by tweaking materials and their shapes, and in cheaper headphone models the resulting frequency response is usually a compromise.

Here is where DSP normalization comes in. Since we listen to headphones via digital devices, we can put a processing stage before the sound is fed to the headphones, in order to overcome the deficiencies of the headphone's build or to override the manufacturer's preferred "signature."

I'm aware of two software packages that do this kind of processing: the Morphit plugin by Toneboosters and the "Reference" package from Sonarworks. In this article I'm using Morphit, because its functionality allows for more educational explanations.

Normalization with Morphit by Toneboosters


Morphit is built as an audio processing plugin, so by itself it can't be applied system-wide, and it is not available on Linux. I experimented with it on a Mac by adding it to Audacity and applying it as a filter to sine sweeps.

The task of Morphit is to apply a frequency response correction curve that changes the sound signature of a certain headphone model into something else. For this, two things must be known: the frequency response of the headphones being corrected, and the target frequency response.

Morphit has three modes of operation: "Correct", "Simulate", and "Custom." The last mode is the most adjustable—it allows specifying the source frequency response, the target response, and an additional correction of up to 4 parametric EQ filters. "Simulate" mode is the same but lacks the EQ filters. "Correct" mode is the simplest one—it sets the target frequency response to "Generic studio reference." I will be using "Custom" mode as the most flexible one.

In the UI, Morphit shows the correction curve, not the target. But it's easy to see the target curve as well—all we need is to set the source curve to flat, and the corresponding setting is called "Generic flat eardrum." So here is how we can see what the "Generic studio speaker" target setting looks like:
The only problem with Morphit's UI is that it lacks a grid and the ability to overlay graphs. Fortunately, we can do that in FuzzMeasure by importing the processed test signals. Here is what we get for "Generic studio reference", "Generic HiFi", and "Generic studio speaker":
The curves certainly have some similarities: the low frequencies stand out above the middle range, there is a prominent peak at about 3 kHz, and after it the high frequencies start to roll off. These characteristics resemble what is known as the "Harman Target Response for Headphones", which is thoroughly dissected by the headphone expert Tyll Hertsens here. I would like to compare the levels between the Harman TR curve and the ones on the graph. Note that in Tyll's article the level at 200 Hz has been chosen as the 0 dB reference point, and for comparison I offset Morphit's curves to the same level.

   Freq Eardrum  Harman  Stud Spk     HiFi   Stud Ref
  60 Hz    0 dB   +4 dB   +4.2 dB  +3.6 dB    +0.9 dB
 200 Hz    0 dB    0 dB      0 dB     0 dB       0 dB
1.2 kHz   +3 dB   +3 dB   +2.9 dB  +1.7 dB    +0.7 dB
  3 kHz  +15 dB  +12 dB  +11.9 dB  +9.9 dB    +8.5 dB
 10 kHz   +5 dB    0 dB   +2.1 dB  +1.3 dB    +2.8 dB
 18 kHz   -7 dB  -13 dB     -9 dB  -0.3 dB    -8.5 dB

As we can see, the "Studio Speaker" setting of Morphit is pretty close to the Harman Headphone Target curve, but the bass on the "Studio Speaker" starts to roll off after about 38 Hz.

In his article, Tyll suggests some refinements to the Harman curve:
  • flattening the rise from 200 Hz to 1.2 kHz;
  • lowering the peak at 3 kHz;
  • adding a peak near 10 kHz that naturally occurs due to ear canal resonance.
As can be seen, those are very similar to the differences between the "Studio Speaker" and "Studio Reference" curves. In addition, the "Studio Reference" curve offers a flat LF line from about 110 Hz downwards, and shifts the main peak slightly, from 3.1 kHz to 3.3 kHz. The "HiFi" setting sits somewhere in between, and doesn't have the sharp rolloff at HF.

Subjective Evaluation


I performed blind testing with Shure SRH1540 headphones, comparing the "Studio Speaker" and "Studio Reference" settings, and the latter sounded better on most of the test tracks. The only drawback is that on some tracks the amplification of the 6–12 kHz region can sound too bright, adding harshness to "s" and "t" sounds. This can be heard very well on "Little Wing" by Valerie Joyce and on "Hung Up" by Madonna. This is the same drawback I experienced when listening to MB Quart 400 and Beyerdynamic T90 headphones. On other tracks, though, I usually perceive this brightness positively.

Note on the Implementation


I haven't found explicit confirmation, but it seems that Morphit uses a recursive (IIR) filter. First, the plugin has only about 3 ms of latency, and second, the phase profile of the processed waves is the same as that of the recursive filters I've built myself to replicate Morphit's curves.

Do All Normalized Headphones Sound the Same?


I would not expect that, even though we are equalizing the frequency response of several headphones to the same target response. As I mentioned at the very beginning, there are additional parameters that define the "sound" of particular headphones. One is the level of distortion that the headphone's drivers introduce—they can change the timbres of instruments by adding extra harmonics. Another is how well the drivers are matched—this affects imaging.

As a simple experiment, I took 3 different headphone models: AKG K240 Studio, Sennheiser HD6xx (Massdrop version of HD650), and Shure SRH1540, then normalized some samples of commercial recordings to the same target curve for each of the headphones, and listened through.

The tonal balance has indeed been aligned. For example, the K240, initially very neutral, after normalization also started displaying the over-brightness of Madonna's "Hung Up." With all headphone models, the vocals became much clearer.

But despite this sameness, I could still hear the individual characteristics of these headphones. The K240's comparatively narrow soundstage didn't change. The SRH1540 still showed somewhat stronger bass than the two other models due to its closed earcups, and so on.

So there is no magic in normalization: it can't make bad headphones sound like the best ones, but it can be useful in situations where you need to remove the colorations added by the manufacturer to express a certain "sound signature."

Sunday, December 3, 2017

Why I Don't Save Filtered Samples as 16-bit PCM Anymore

When I need to evaluate a filter on a set of samples of commercial music in CD format, I used to render the filtered results into 16-bit PCM. The reasoning I had behind that seemed rational:
  • first, as the source material is in 16-bit resolution, and I'm not enhancing dynamic range, storing the processed result in anything beyond 16-bit seems pointless;
  • comparing floating point numbers is never as precise as comparing integers—the integer 5 is always 5, whereas in the floating point world it can be represented either as 5.00000, or as something like 5.000001 or 4.999999;
  • although the immediate output from filters is in floating point format, there is a pretty deterministic procedure for converting floats into ints, unless dithering is applied.
But as it turns out, the last statement is actually wrong. In the audio world, there is no single "standard" way of converting ints into floats and back. Here is a good writeup I've found on this topic: "Int->Float->Int: It's a jungle out there!"
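For example, here are two commonly encountered conventions, sketched in Octave (the file name is hypothetical; int16() rounds and saturates):

x = audioread('input.wav');                  % float samples in [-1.0, 1.0]

int_sym  = int16(round(x * 32767));          % symmetric: +/-1.0 map to +/-32767, -32768 is never produced
int_full = int16(round(x * 32768));          % full range: -1.0 maps to -32768, +1.0 saturates to 32767

back_sym  = double(int_sym)  / 32767;        % and two equally "obvious" ways back
back_full = double(int_full) / 32768;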

The first suspicions started creeping into my mind when I was doing bitwise comparisons of filtered results obtained from Audacity, Matlab, and Octave for the same input sample and the same filter. To my surprise, the results were not quite the same.

Performing Bitwise Comparisons with Audacity


By the way, the bitwise comparison is performed trivially in Audacity using the following simple steps (for mono files):
  1. Open the first wave file in Audacity: File > Open...
  2. Convert the track to 32-bit float format (via the track's pop-up menu.)
  3. Import the second wave file: File > Import > Audio...
  4. Also make sure it is in 32-bit float.
  5. Invert one of the waves: select wave, Effect > Invert.
  6. Mix two waves together: Tracks > Mix > Mix and Render to New Track.
This creates a new track containing the difference between the two waves in the time domain. If the wave files are quite similar, the resulting track in the default "Waveform" mode may look like a straight line at 0.0. In order to see the difference in the lowest bits, switch the resulting track to "Waveform (dB)" mode.

Another option is to check the spectrum of the resulting wave using the Analyze > Plot Spectrum... dialog. If there is no difference, the spectrum window will be empty; otherwise some residual noise will be shown.

Note that it is very important to convert to 32-bit, because if the wave stays in 16-bit mode and there are samples with the minimum 16-bit value (-32768), upon inversion they will turn into the maximum positive 16-bit value, which is +32767. Summing them with their counterparts from the original non-inverted track will then produce samples of value -1.
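This is easy to check with the saturating int16 arithmetic of Octave or Matlab:

octave:1> -int16(-32768)
ans = 32767
octave:2> int16(-32768) + int16(32767)
ans = -1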

So, when I compared filtered wave files processed in different systems with the same filter and saved in 32-bit float format, there was usually no difference (except for Octave—as it turns out, even recent distributions of Octave, e.g. v4.2.1, are affected by this bug, which saves into 32-bit integers instead of floats, and also stores the 1.0 float value as the minimum negative 32-bit value, -2147483648, instead of the maximum positive one). But once I started saving them in 16-bit format, the differences became quite noticeable. Why is that?

Determining Int16 Encoding Range


First, let's determine how Audacity, Matlab, and Octave deal with converting between minimum and maximum float and int16 values.

In Audacity, we can generate a square wave of amplitude 1.0, which in 32-bit mode will be a sequence of interleaved 1.0 and -1.0 samples, like here:


After exporting it into a 32-bit float PCM wav file, it can be examined with the "octal dump" (od) utility:

$ od -f aud-square32f.wav
...
0000120     1.000000e+00    1.000000e+00   -1.000000e+00    1.000000e+00
0000140    -1.000000e+00    1.000000e+00   -1.000000e+00    1.000000e+00
...

After exporting the same wave into a 16-bit int PCM wav file, it is possible to see the same values represented as int16:

$ od -s aud-square16.wav 
...
0000040                                               ...   32767   32767
0000060    -32768   32767  -32768   32767  -32768   32767  -32768   32767
...

Now Matlab R2017b. After loading the square wave from a 32-bit file, it's easy to display it:

>> [float_wave, fs] = audioread('aud-square32f.wav', 'native');
>> format shortEng;
>> disp(float_wave(1:8))
     1.0000e+000
     1.0000e+000
    -1.0000e+000
     1.0000e+000
    -1.0000e+000
     1.0000e+000
    -1.0000e+000
     1.0000e+000

Then it's easy to export it into 16-bit again and check how it will be represented:

>> audiowrite('mat-square16.wav', float_wave, fs, 'BitsPerSample', 16);

$ od -s mat-square16.wav
...
0000040                                               ...   32767   32767
0000060    -32768   32767  -32768   32767  -32768   32767  -32768   32767
...

OK, that means Audacity and Matlab use the same range for the int16 representation: from -32768 to 32767. What about Octave (v4.2.1)? The result of loading the floating point wave is the same, but what about the export into int16?

$ od -s oct-square16.wav
...
0000040                                               ...   32767   32767
0000060    -32767   32767  -32767   32767  -32767   32767  -32767   32767
...

Interesting—it turns out that Octave only uses the range from -32767 to 32767, for symmetry I suppose. It's even more interesting that if we load a 16-bit wave file produced by Audacity or Matlab into Octave in 'native' mode, that is, without converting into float, Octave will "scale" it in order to avoid using the value -32768:

octave:5> [mat_int16_wave, ~] = audioread('mat-square16.wav', 'native');
octave:6> disp(mat_int16_wave(1:8))
   32766
   32766
  -32767
   32766
  -32767
   32766
  -32767
   32766

Personally, I find this quite odd, as I considered the "native" loading mode to be transparent, but it actually isn't.

So, obviously, this discrepancy in the int16 range used can be one source of differences when performing bitwise comparisons. Can there be another reason? Yes, and it's the way fractional values are rounded.

Rounding Rules


For float numbers used in calculations, there is a variety of rounding rules. I made an experiment—I created floating point wave files with series of steps, and converted them into int16 using Audacity, Matlab, and Octave. For the step, I used different values depending on what range the framework uses. Thus, 1 unit ("1u" in the table) can be different for the positive and the negative range. The results are quite interesting:

   Float        Audacity   Matlab   Octave
  -1.0            -32768   -32768   -32767
  -1.0 + 0.25u    -32768   -32768   -32767
  -1.0 + 0.75u    -32767   -32768   -32766
  -1.0 + 1u       -32767   -32767   -32766
  -1.0 + 2u       -32766   -32766   -32765
   0.0 - 2u           -2       -2       -2
   0.0 - 1.75u        -2       -2       -2
   0.0 - 1.25u        -1       -2       -1
   0.0 - 1u           -1       -1       -1
   0.0 - 0.75u        -1       -1       -1
   0.0 - 0.25u         0       -1        0
   0.0                 0        0        0
   0.0 + 0.25u         0        0        0
   0.0 + 0.75u         1        0        1
   0.0 + 1u            1        1        1
   0.0 + 1.25u         1        1        1
   0.0 + 1.75u         2        1        2
   0.0 + 2u            2        2        2
   1.0 - 2u        32766    32765    32765
   1.0 - 1u        32767    32766    32766
   1.0 - 0.75u     32767    32767    32766
   1.0 - 0.25u     32767    32767    32767
   1.0             32767    32767    32767

It seems that all three frameworks use slightly different rounding rules. That's another reason why waves that look the same in floating point format will look different when rendered into int16.

Conclusion


Never use 16-bit PCM for anything besides the final result intended for listening, and then also use dithering. For any comparisons, and especially for bitwise comparisons, always use floats—they turn out to be less ambiguous and retain a consistent interpretation across different processing software packages.

Saturday, October 28, 2017

Re-creating Phonitor Mini with Software DSP

If you have seen my previous posts, you might remember that my plan was to recreate the Phonitor Mini crossfeed within the miniDSP HA-DSP. However, while trying to do that, I encountered several technical difficulties, which I would like to explain first.

First, the hardware of the HA-DSP looks good on paper, but flaws in the implementation can easily be detected even using an inexpensive MOTU Microbook IIc. For starters—look, there is some noise:
Yes, it's at a microscopic level, but I don't see anything like that on MOTU cards, which also employ DSP. And the resampler (recall that the DSP in the HA-DSP operates at 96 kHz) adds some wiggles when working with 48 kHz signals:

Finally, I've experienced stability issues when connecting HA-DSP to Android and Linux hosts. I raised the last two issues with miniDSP support, but got no resolution.

Another technical problem came from the FIR filter design software. I use FIRdesigner, and it's quite a powerful and versatile tool. However, it has one serious drawback in the context of my scenario—since the Phonitor crossfeed filters are quite delicate and have only 3 dB of amplitude at most, every fraction of a decibel counts when modeling them. But since FIRdesigner is first and foremost designed for speaker builders, it only offers 0.1 dB precision when manipulating the source signals, and that was causing non-negligible deviations of the designed FIR filters' frequency response curves compared to the original curves of the analog Phonitor filters.

I've been wrestling with these issues for a while, then thought the situation over, and decided to sell my HA-DSP. Having parted with it, I turned back to my initial approach of performing filtering in software.

I had already done some work on generating IIR filter coefficients to fit a given frequency response using the cepstral method (based on this post by R. G. Lyons). So I dusted off my Matlab / Octave code and prepared it for action.
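The cepstral method itself is described in Lyons' post; just to illustrate the general idea of "fit an IIR filter to a measured magnitude response", here is what it looks like with Octave's yulewalk (this is not the method I used, and the frequency grid and magnitudes below are made up):

pkg load signal;

fs = 44100;
f = [0 200 700 2000 8000 fs/2] / (fs/2);     % normalized frequency grid (illustrative)
m = 10 .^ ([-3 -3.5 -5 -8 -11 -13] / 20);    % target magnitudes, dB converted to linear (illustrative)

[b, a] = yulewalk(24, f, m);                 % a 24th order IIR approximating the target
freqz(b, a, 4096, fs);                       % inspect how close the fit is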

However, one important thing had to be done to the analog filter measurements—since I performed them using the MOTU Microbook IIc, which doesn't have an ideally flat frequency response, I needed to remove the frequency response roll-offs it introduces from these measurements. In DSP language, this process is called deconvolution—"reversing" the application of a filter.

I've learned that it's quite easy to perform deconvolution with REW, although the command is buried deep in the UI. Big thanks to REW's author John Mulcahy for pointing this out to me!

In order to perform deconvolution in REW, two measurements first need to be imported. In our case, the first measurement is the Phonitor's filter curve recorded via the Microbook, and the second one is a loopback measurement of the Microbook on the same output and input ports. Then, on the All SPL tab, one needs to open the drop-down menu (the cog button), select these measurements, and choose the division (/) operation. Division in the frequency domain is equivalent to deconvolution in the time domain.
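The same operation can be sketched in a few lines of Octave (file names are hypothetical; a real implementation also needs some regularization where the loopback spectrum gets close to zero):

[filt, fs] = audioread('phonitor_via_microbook.wav');    % filter measured through the sound card
[loop, ~]  = audioread('microbook_loopback.wav');        % the card's own loopback response

n = max(length(filt), length(loop));
H = fft(filt, n) ./ fft(loop, n);            % division in the frequency domain...
h = real(ifft(H));                           % ...is deconvolution in the time domain
audiowrite('phonitor_deconvolved.wav', h / max(abs(h)), fs, 'BitsPerSample', 32);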


One important thing to remember is to always save all impulse responses with 32-bit resolution because quantization errors can cause quite visible deviations of the calculated frequency responses.

Now, having nice deconvolved impulse responses, I was ready for action. Since this time I was creating a software implementation of the filter, I had more freedom in choosing its parameters—I was no longer constrained by the sampling rate of the DSP or the number of taps available on it. So I chose to create a 24th-order IIR filter operating at a 44.1 kHz sampling rate.

The resulting model turned out to be very close to the target filter. Note that on the plot the offset between the curves for the direct and the opposite channels isn't correct—this is because during deconvolution the amplitudes of the filters were normalized by REW. Never mind, it's easy to fix.


In order to figure out the offset between the direct and the opposite channel filters, I used a nice feature of FuzzMeasure's interactive plots—after clicking a point with the "Option" key pressed, FM shows both the variable value (frequency in this case) and the corresponding values of the displayed graphs, with a precision of 3 digits after the decimal point. So it was quite easy to find out the difference between the filter curves for the direct and the opposite channels.
Using this information, I was able to fix the offset of my curves and was finally ready to process some real audio. I chose a short excerpt from Roger Waters' "Amused to Death" album, where a lot of different kinds of sounds are present: male and female vocals, bass, percussion, guitars, etc. The recording quality is outstanding; in addition, the final mix was rendered using QSound technology, which uses binaural cues to extend the stereo base when listening on stereo speakers. It's interesting that when listening on headphones without crossfeed, all these cues do not work due to the "super stereo" effect, but they start working again with crossfeed applied.

For the blind tests, I passed a fragment of the song "Perfect Sense, Pt. II" through a rig of the Phonitor Mini connected via the Microbook: first in its original form but with Phonitor's crossfeed applied, and then after processing with the IIR filter I created, with the crossfeed on the Phonitor turned off. This way, the only difference between these recordings was the implementation of the crossfeed effect.
Below are links to the original recording, the one processed with Phonitor's own crossfeed, and the one processed with my digital re-implementation of it (make sure to listen in headphones):

Original | Phonitor crossfeed | IIR implementation

In my blind test, I couldn't distinguish between the original crossfeed and my model of it (the last two samples).

Then I wanted to process a couple of full-length albums. Here I ran into a little problem: the way sound processing is organized by default in Matlab and Octave doesn't really scale. There is a function called audioread for reading a portion of an audio file into memory in uncompressed form (and there must be a contiguous region of memory available for allocating the matrix that contains the wave samples), and there is a complementary function called audiowrite which writes the result back to disk. However, it would require writing custom code to read the input file in fragments, process them with the filter, and write them back.

I decided to do something different. Since I was planning to apply a headphone normalization filter in Audacity anyway, I thought it would be convenient to perform the crossfeed processing in Audacity as well.

There is an entire scripting language in Audacity hidden behind the Effects > Nyquist Prompt... menu item. The script "sees" the whole input wave as a single object, but behind the scenes Audacity feeds the script bite-sized chunks of the input file. That's the abstraction I wanted. So I wrote a Matlab script that transforms my high-order IIR filter into a sequence of biquads and generates a Nyquist Prompt script that performs the equivalent processing.
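The core of that Matlab script is a standard conversion of the transfer function into second-order sections; roughly (the variable names are mine):

[sos, g] = tf2sos(b, a);          % b, a -- the high-order IIR designed earlier
disp(size(sos));                  % 12 rows of [b0 b1 b2 a0 a1 a2] for a 24th order filter
% each row is then printed, with plenty of decimal digits, as one biquad
% stage of the generated Nyquist script (the overall gain g goes into the first stage)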

Since biquad filters are implemented in Nyquist in native code, even a sequence of 12 biquads is applied quite quickly, and an entire CD is processed on my modest Mac Mini 2014 in slightly more than a minute. The generated Nyquist Prompt script is here. Note that "Use legacy (version 3) syntax" needs to be enabled in order to work with Lisp code.

One caveat was avoiding precision loss while applying the sequence of filters. My initial mistake was to export the biquad coefficients with just 6 digits after the decimal point—the processed file sounded awful. Then I increased the precision and diffed the sound wave processed in Audacity against the same wave processed in Octave. The diff wave contained only some noise below -100 dBFS, and the two processed audio samples were now indistinguishable in a blind test.

I have mentioned headphone linearization before. With ToneBoosters Morphit, performing linearization is straightforward, assuming that measurements of your headphones are in Morphit's database. My first impression after listening to the processed audio samples was that Morphit thins out the bass considerably. I compared Morphit's equalization curves with Harman's Listener Target Curve for headphones and found that the former lacks the bass bump featured in the latter.

So I switched to custom mode in Morphit and compensated for the removed bass with a shelf filter:


The resulting audio sample sounded much better than the original version processed only with the crossfeed filter (due to more prominent vocals and percussion), and definitely better than the initial linearization with the factory Morphit filter for the Shure SRH1540.
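For reference, this kind of low-shelf boost can be realized as a single biquad using the well-known formulas from R. Bristow-Johnson's "Audio EQ Cookbook"; the corner frequency and gain below are illustrative, not the exact values I dialed in:

fs   = 44100;
f0   = 100;                        % shelf corner frequency, Hz (assumed)
gain = 4;                          % shelf gain, dB (assumed)
S    = 1;                          % shelf slope

A     = 10^(gain/40);
w0    = 2*pi*f0/fs;
alpha = sin(w0)/2 * sqrt((A + 1/A)*(1/S - 1) + 2);

b0 =   A*((A+1) - (A-1)*cos(w0) + 2*sqrt(A)*alpha);
b1 = 2*A*((A-1) - (A+1)*cos(w0));
b2 =   A*((A+1) - (A-1)*cos(w0) - 2*sqrt(A)*alpha);
a0 =     (A+1) + (A-1)*cos(w0) + 2*sqrt(A)*alpha;
a1 =  -2*((A-1) + (A+1)*cos(w0));
a2 =     (A+1) + (A-1)*cos(w0) - 2*sqrt(A)*alpha;

b = [b0 b1 b2] / a0;  a = [1 a1/a0 a2/a0];   % normalized biquad coefficients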

At the end of the day, I created processed wave files of some well-recorded albums I have on CDs and uploaded them to my Play Music locker for further evaluation. I didn't notice any harm from the compression to 320 kbps. Now, if I find that I enjoy this crossfeed + Morphit processing on the albums I know well, I will find a way to apply the processing in real time to all sound output on my Linux workstation.