Sunday, December 31, 2017

On Headphone Normalization Part 2

In Part 1, we considered the need for headphone normalization and its implementation in the Morphit plugin by Toneboosters. In this part, we will examine Sonarworks Reference.

Normalization with Sonarworks package

Sonarworks offers a package for recording studios called "Reference", which consists of room correction software, headphone correction filters, and system components that allow applying these corrections on a system-wide level. For non-professionals, Sonarworks also offers the "True-Fi" package, which applies the same headphone correction curves using a simpler UI. For the purpose of writing this post, I've been playing with the implementation targeted at pro users, which offers more tuning capabilities.

I really like that the UI shows the grid for the response curves. It's also great that all 3 curves: source, correction, and target can be displayed on the graph.

However, Sonarworks doesn't offer the same degree of freedom for setting up the source and the destination frequency responses as Morphit does. The list of source responses only includes real headphone models, but no artificial curves like "flat at the eardrum." The list of target responses is even more limited, offering only a handful of speakers and headphones, which are presented like riddles, e.g. "A well respected open hi-fi and mastering reference headphone model nr.650 [...]", illustrated by a picture of Sennheiser HD650 as a hint. I guess this was done to work around some legal issues.

Here I discovered the first curious thing: simulating HD650 using the "HD650 Average" measurement didn't result in a flat compensation curve, and a similar thing happened with AKG K712:

I've asked a question about this on Sonarworks' support forum, and their representative confirmed my suspicion that the target curves in the package are not as up to date as the source averaged measurements.

Target Curve of Sonarworks

OK, the first question has been resolved. The next important question is what a "Flat" target response means for headphones. The UI doesn't help much: it indeed shows the target response as flat. But as we know from Part 1, it shouldn't really be flat at the earphone speakers.

My speculation is that since the package offers normalization for both headphones and speakers, the team decided to represent the normalization from the speakers' point of view. Thus, the "flat" target setting for headphones must mean "as heard using speakers calibrated to flat response", but they did not specify under which conditions, and with which tweaks. As we have seen with the Harman Target Curve and Morphit, an honest "flat loudspeaker in a reference room as picked up at eardrum" may not be a preferred setting due to its "dullness."

In order to provide an educated guess, I performed the following experiment. I chose the same headphone model, Shure SRH1540, as the source in both Morphit and Sonarworks, and normalized a test sweep signal separately using the two plugins.

Then using "Trace Arithmetic" in Room Eq Wizard, I derived a transfer function for transforming Morphit's filter response into Sonarworks', and applied this transfer function to Morphit's "Studio Reference" curve. Here is the result compared to Morphit's "Studio Reference" (red) and "Studio Speaker" (which as we remember, resembles Harman Target Response) (blue). The Sonarworks' approximated response is a green curve.
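The trace arithmetic described above can be sketched in a few lines. This is only an illustration, assuming that all curves are magnitude responses in dB sampled on the same frequency grid; the function name and the sample numbers are made up and are not measured data.

```python
def estimate_target_db(corr_a_db, corr_b_db, target_a_db):
    """Estimate plugin B's target curve from plugin A's known target.

    All arguments are magnitude responses in dB sampled at the same
    frequencies. Dividing transfer functions is subtraction in dB.
    """
    if not (len(corr_a_db) == len(corr_b_db) == len(target_a_db)):
        raise ValueError("curves must share the same frequency grid")
    # Transfer function that turns A's correction filter into B's.
    a_to_b_db = [b - a for a, b in zip(corr_a_db, corr_b_db)]
    # Apply that transfer function to A's target curve.
    return [t + d for t, d in zip(target_a_db, a_to_b_db)]


# Hypothetical values at 60 Hz, 1 kHz and 10 kHz (not real measurements).
morphit_correction = [-2.0, 0.5, 3.0]
sonarworks_correction = [1.0, 0.5, 1.0]
morphit_target = [1.0, 0.5, 3.0]
print(estimate_target_db(morphit_correction, sonarworks_correction, morphit_target))
```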

Note that this is only an approximation, since the measurement data for SRH1540 is obviously different between Morphit and Sonarworks (it's hard to perform headphone measurements reliably, especially at high frequencies).

But still, we can see similar shapes here, confirming that Sonarworks may be using something similar to either of the curves (and it's definitely not a "flat" target response as their UI suggests). Two remarkable differences can be seen though:

  • The response at high frequencies is rolling off. Indeed, the sound of normalized SRH1540 is duller with Sonarworks, unless additional treble adjustment is applied.
  • The bass is cranked up. Again, this can be heard very well, although Sonarworks provides a bass control that allows ±6 dB correction, which can fix this.

Note on Implementation

Another interesting thing about the Sonarworks Reference package is that it can use different filter types for normalization. On the "Advanced" tab, there is a choice between "Zero Latency", "Optimum", and "Linear Phase" settings:

"Zero Latency" means applying a recursive (IIR) filter (as in Morphit), which has negligible latency but introduces some phase shifts.

"Optimum" is a shorter non-recursive minimum phase FIR filter of 500 taps, which at 44.1 kHz introduces a delay of about 11 ms (500 / 44100 ≈ 11.3 ms), still OK for real-time operation.

"Linear Phase" is a longer FIR filter that achieves linear phase (the same delay at all frequencies, so no phase distortion), but has a longer processing time, and also adds some "pre-ringing."

Which Product to Choose

Personally, I've stuck with Morphit because it's cheaper and allows me to see the target frequency response. On the other hand, Sonarworks offers a system-wide component that applies normalization to all system sounds. That said, the same can be achieved by using Morphit in conjunction with Audio Hijack Pro by Rogue Amoeba, which allows applying plugins to the system output, as well as capturing it.

Sonarworks also offers a service for measuring your personal headphones. However, I would prefer the headphones to be measured for my own head, not for a dummy head simulator, since factors such as the shape of the pinna and the shape of the ear canal greatly affect the resonances that occur in the outer ear.

Tuesday, December 19, 2017

On Headphone Normalization

What is Headphone Normalization

The world of headphones is a jungle. There are thousands of headphone models on the market, each with its own sound signature. Every aspect of headphone build affects the sound. A lot of headphone makers also come up with their own "sound style", e.g. having lots of bass, or a "bright" sound, or a neutral "studio reference" signature, which is preserved along the model range.

Several factors comprise the sound signature: frequency response, added distortions, degree of matching between left and right drivers. Unlike the world of loudspeaker makers, which is moving towards the acceptance that the speaker frequency response should be a smooth slope downwards from low to high frequencies, the world of headphone manufacturers is still struggling to figure out the standard for the frequency response curve. The main reason for that is the fact that unlike the sound from loudspeakers, the sound emitted by headphones bypasses several "body filters" on its way to the eardrum.

The sound that we hear from outside sources partially reflects from the shoulders, and receives significant coloration from the outer ear. Thus, a sound from a loudspeaker with ideally flat frequency response being heard in an acoustically treated room is actually far from "flat" when picked up by the eardrum.

Now if we take in-ear monitor headphones that are inserted directly into the ear canal and radiate sound directly to the eardrum, and make their frequency response flat, what would be perceived by the listener is a sound with strongly attenuated vocals (because our outer ears do a great job of amplifying sound in the frequency range of vocals), not appealing at all. It has thus been understood that in-ear monitors need to have a frequency response curve that incorporates the filtering applied by the shoulders and by the outer ear.

With over-ear headphones the situation is even more complicated because although they bypass the shoulders "filter", they still interact with the outer ear of the listener.

So it's universally understood by headphone manufacturers that headphones must have a non-flat frequency response in order to sound pleasing, but there is still no universal agreement on the exact shape of the frequency response. Also, since most headphones are passive devices with no electronics inside, their frequency response is determined by their physical make. Sometimes it can be challenging to achieve the desired frequency response by just tweaking materials and their shapes, and in cheaper headphone models the resulting frequency response is usually a compromise.

Here is where DSP normalization comes in. Since we listen to headphones via digital devices, we can put a processing stage before feeding the sound to the headphones in order to overcome the deficiencies of the headphones' build, or to override the manufacturer's preferred "signature."

I'm aware of two software packages that do this kind of processing: the Morphit plugin by Toneboosters, and the "Reference" package from Sonarworks. In this article I'm using Morphit because its functionality allows for more educational explanations.

Normalization with Morphit by Toneboosters

Morphit is built as an audio processing plugin, so by itself it can't be applied systemwide. And it is not available on Linux. I was experimenting with it on a Mac by adding it to Audacity and applying it as a filter to sine sweeps.

The task of Morphit is to apply a frequency response correction curve that changes the sound signature of a certain headphone model into something else. For this, two things must be known: the frequency response of the headphones being corrected, and the target frequency response.
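In dB terms, such a correction is simply the difference between the target and the source responses. Below is a minimal sketch of that idea, assuming both curves are given in dB on the same frequency grid; the function name and all numbers are hypothetical, and this is not a description of Morphit's internals.

```python
def correction_db(source_db, target_db):
    """Correction curve (dB) that turns the source response into the target."""
    if len(source_db) != len(target_db):
        raise ValueError("curves must share the same frequency grid")
    return [t - s for s, t in zip(source_db, target_db)]


# Hypothetical responses at 60 Hz, 1 kHz and 3 kHz, in dB.
headphone = [3.0, 0.0, 6.0]
target = [4.0, 0.0, 12.0]
print(correction_db(headphone, target))

# With a flat source, the correction curve is the target curve itself.
flat = [0.0, 0.0, 0.0]
print(correction_db(flat, target) == target)
```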

Morphit has three modes of operation: "Correct", "Simulate", and "Custom." The last mode is the most adjustable: it allows specifying the source frequency response, the target response, and an additional correction of up to 4 parametric EQ filters. "Simulate" mode is the same, but lacks the EQ filters. "Correct" mode is the simplest one: it sets the target frequency response to "Generic studio reference." I will be using "Custom" mode as the most flexible one.

On the UI, Morphit shows the correction curve, not the target. But it's easy to see the target curve as well: all we need is to set the source EQ curve to flat, and the corresponding setting is called "Generic flat eardrum." So, here is how we can see what the "Generic studio speaker" target setting looks like:

The only problem with Morphit's UI is that it lacks any grid and the ability to overlay the graphs. Fortunately, we can do that in FuzzMeasure by importing the processed test signals. Here is what we get for "Generic studio reference", "Generic HiFi", and "Generic studio speaker":

The curves certainly have some similarities: low frequencies stand out above the middle range, there is a prominent peak at about 3 kHz, and after it the high frequencies start to roll off. These characteristics resemble what is known as the "Harman Target Response for Headphones", which is thoroughly dissected by the headphone expert Tyll Hertsens in his article. I would like to compare the attenuation values between the Harman TR curve and the ones on the graph. Note that in Tyll's article the level at 200 Hz has been chosen as the 0 dB reference point, and for comparison I have offset Morphit's curves to the same level.

   Freq Eardrum  Harman  Stud Spk     HiFi   Stud Ref
  60 Hz    0 dB   +4 dB   +4.2 dB  +3.6 dB    +0.9 dB
 200 Hz    0 dB    0 dB      0 dB     0 dB       0 dB
1.2 kHz   +3 dB   +3 dB   +2.9 dB  +1.7 dB    +0.7 dB
  3 kHz  +15 dB  +12 dB  +11.9 dB  +9.9 dB    +8.5 dB
 10 kHz   +5 dB    0 dB   +2.1 dB  +1.3 dB    +2.8 dB
 18 kHz   -7 dB  -13 dB     -9 dB  -0.3 dB    -8.5 dB

As we can see, the "Studio Speaker" setting of Morphit is pretty close to the Harman Headphone Target curve, but the bass on the "Studio Speaker" starts to roll off below about 38 Hz.
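The comparison can also be done numerically. The sketch below takes the "Harman" and "Stud Spk" columns from the table above, re-references a curve to 0 dB at 200 Hz, and prints the per-frequency difference; the function names are made up for illustration.

```python
FREQS_HZ = [60, 200, 1200, 3000, 10000, 18000]
HARMAN_DB = [4.0, 0.0, 3.0, 12.0, 0.0, -13.0]
STUDIO_SPEAKER_DB = [4.2, 0.0, 2.9, 11.9, 2.1, -9.0]


def rereference(curve_db, freqs_hz, ref_hz=200):
    """Shift a curve so that its level at ref_hz becomes 0 dB."""
    offset = curve_db[freqs_hz.index(ref_hz)]
    return [v - offset for v in curve_db]


def deviation_db(curve_db, reference_db):
    """Per-frequency difference between two curves, rounded to 0.1 dB."""
    return [round(c - r, 1) for c, r in zip(curve_db, reference_db)]


speaker = rereference(STUDIO_SPEAKER_DB, FREQS_HZ)
for f, d in zip(FREQS_HZ, deviation_db(speaker, HARMAN_DB)):
    print(f"{f:>6} Hz  {d:+.1f} dB")
```

With these table values, the two curves stay within about 0.2 dB of each other up to 3 kHz and diverge at 10 kHz and above.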

In his article, Tyll suggests some refinements to the Harman curve:

  • flattening the rise from 200 Hz to 1.2 kHz;
  • lowering the peak at 3 kHz;
  • adding a peak near 10 kHz that naturally occurs due to ear canal resonance.

As can be seen, those are very similar to the differences between the "Studio Speaker" and "Studio Reference" curves. Plus, the "Studio Reference" curve offers a flat LF line from about 110 Hz and below, and shifts the main peak a bit to the left: from 3.1 kHz to 3.3 kHz. The "HiFi" setting sits somewhere in between, and doesn't have the sharp rolloff at HF.

Subjective Evaluation

I performed blind testing using Shure SRH1540 headphones, comparing the "Studio Speaker" and "Studio Reference" settings, and the latter sounded better on most of the test tracks. The only drawback is that on some tracks the amplification of the 6 to 12 kHz region can sound too bright, adding harshness to "s" and "t" sounds. This can be heard very well on "Little Wing" by Valerie Joyce and on "Hung Up" by Madonna. This is the same drawback that I experienced when listening to MBQuart 400 and Beyerdynamic T90 headphones. But on other tracks this brightness usually sounds good to me.

Note on the Implementation

I haven't found an explicit confirmation, but it seems that Morphit uses a recursive (IIR) filter. First, the plugin has only about 3 ms latency and second, the phase profile of processed waves is the same as in recursive filters that I've built myself in order to replicate Morphit's curves.
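For readers unfamiliar with recursive filters, here is a minimal sketch of one common building block: a peaking EQ biquad using the widely published Audio EQ Cookbook formulas. It is not Morphit's implementation, which is not documented here; the 3 kHz / +8.5 dB values are borrowed from the "Stud Ref" row of the table above, while Q = 1.0 and all function names are arbitrary assumptions.

```python
import cmath
import math


def peaking_biquad(f0_hz, gain_db, q, fs_hz):
    """Coefficients (b, a) of a peaking EQ biquad (Audio EQ Cookbook form)."""
    amp = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0_hz / fs_hz
    alpha = math.sin(w0) / (2.0 * q)
    b = (1.0 + alpha * amp, -2.0 * math.cos(w0), 1.0 - alpha * amp)
    a = (1.0 + alpha / amp, -2.0 * math.cos(w0), 1.0 - alpha / amp)
    return b, a


def magnitude_db(b, a, f_hz, fs_hz):
    """Magnitude response of the biquad at f_hz, in dB."""
    z1 = cmath.exp(-1j * 2.0 * math.pi * f_hz / fs_hz)  # z^-1 on the unit circle
    num = b[0] + b[1] * z1 + b[2] * z1 * z1
    den = a[0] + a[1] * z1 + a[2] * z1 * z1
    return 20.0 * math.log10(abs(num / den))


def process(b, a, samples):
    """Run samples through the filter (direct form I, recursive)."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        # The output depends on past outputs (y1, y2), hence "recursive".
        y = (b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2) / a[0]
        x2, x1, y2, y1 = x1, x, y1, y
        out.append(y)
    return out


b, a = peaking_biquad(3000.0, 8.5, 1.0, 44100.0)
print(round(magnitude_db(b, a, 3000.0, 44100.0), 2))  # gain at the center frequency
```

Because each output sample is available as soon as its input sample arrives, such a filter adds no block delay of its own, at the cost of a frequency-dependent phase shift.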

Do All Normalized Headphones Sound the Same?

I would not expect that, even though we are equalizing the frequency response of several headphones to the same target response. As I mentioned at the very beginning, there are additional parameters that define the sound of particular headphones. One is the level of distortion that the headphone drivers introduce: distortion can change the timbre of instruments by adding extra harmonics. Another is how well the drivers are matched, which affects imaging.

As a simple experiment, I took 3 different headphone models: AKG K240 Studio, Sennheiser HD6xx (Massdrop version of HD650), and Shure SRH1540, then normalized some samples of commercial recordings to the same target curve for each of the headphones, and listened through.

The tonal balance has indeed been aligned. For example, K240 initially being very neutral, after normalization also started displaying the over-brightness of Madonna's "Hung Up." For all headphone models, the vocals have become much clearer.

But despite this sameness, I could still hear the individual characteristics of these headphones. K240's comparatively narrow soundstage didn't change. SRH1540 were still showing somewhat stronger bass than the two other models due to closed earcups, and so on.

So there is no magic in normalization: it can't make bad headphones sound like the best ones, but it can be useful in situations where one needs to remove the sound colorations added by the manufacturer to express a certain "sound signature."

Sunday, December 3, 2017

Why I Don't Save Filtered Samples as 16-bit PCM Anymore

When I needed to evaluate a filter on a set of samples of commercial music in CD format, I used to render the filtered results into 16-bit PCM. The reasoning behind that seemed rational:

  • first, as the source material is in 16-bit resolution, and I'm not enhancing dynamic range, storing the processed result in anything beyond 16-bit seems pointless;
  • comparing floating point numbers is never as precise as comparing integers: integer 5 is always 5, whereas in the floating point world it can be represented as either 5.00000, or something like 5.000001 or 4.999999;
  • although the immediate output from filters is in floating point format, there is a pretty deterministic procedure for converting floats into ints, unless dithering has been applied.

But as it turns out, the last statement is actually wrong. In the audio world, there is no single "standard" way of converting ints into floats and back. This is a good writeup I've found on this topic: "Int->Float->Int: It's a jungle out there!"

The first suspicions started crawling into my mind when I was doing a bitwise comparison of filtered results obtained from Audacity, Matlab, and Octave, for the same input sample, and using the same filter. To my surprise, the results were not quite the same.

Performing Bitwise Comparisons with Audacity

By the way, the bitwise comparison is performed trivially in Audacity using the following simple steps (for mono files):

  1. Open the first wave file in Audacity: File > Open...
  2. Convert the track to 32-bit float format (via the track's pop-up menu).
  3. Import the second wave file: File > Import > Audio...
  4. Also make sure it is in 32-bit float.
  5. Invert one of the waves: select wave, Effect > Invert.
  6. Mix two waves together: Tracks > Mix > Mix and Render to New Track.

This creates a new track containing the difference between the two waves in the time domain. If the wave files were quite similar, the resulting track may look like a straight line at 0.0 in the default "Waveform" mode. In order to view the difference in the lowest bits, switch the resulting track into "Waveform (dB)" mode.
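The invert-and-mix procedure amounts to a sample-by-sample subtraction. Here is a minimal sketch of it, assuming both waves are already loaded as equal-length lists of floats in the -1.0 to 1.0 range; the function names and sample values are made up for illustration.

```python
import math


def null_test(wave_a, wave_b):
    """Invert wave_b, mix it with wave_a, and return the residual."""
    if len(wave_a) != len(wave_b):
        raise ValueError("waves must have the same length")
    return [a + (-b) for a, b in zip(wave_a, wave_b)]


def peak_dbfs(residual):
    """Peak level of the residual in dB relative to full scale (1.0)."""
    peak = max(abs(s) for s in residual)
    return -math.inf if peak == 0.0 else 20.0 * math.log10(peak)


a = [0.0, 0.5, -0.25, 1.0]
b = [0.0, 0.5, -0.25, 1.0 - 2.0 ** -15]  # last sample differs by one 16-bit step
print(peak_dbfs(null_test(a, a)))  # identical waves leave no residual
print(peak_dbfs(null_test(a, b)))  # about -90.3 dBFS
```

A residual this small is invisible on a linear waveform display, which is why the dB view is needed.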

Another option is to check the spectrum of the resulting wave using Analyze > Plot Spectrum... dialog. If there is no difference, the spectrum window would be empty, otherwise some residual noise would be shown.

Note that it is very important to convert into 32-bit, because if the wave stays in 16-bit mode and there are samples with the minimum 16-bit value: -32768, upon inversion they will turn into max positive 16-bit value which is +32767. And summing them up with their counterparts from the original non-inverted track will produce samples of value -1.
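A small sketch of that corner case, assuming the inversion saturates at the int16 limits as described (the helper names are illustrative):

```python
INT16_MIN, INT16_MAX = -32768, 32767


def clamp16(value):
    """Saturate an integer to the int16 range."""
    return max(INT16_MIN, min(INT16_MAX, value))


def invert16(sample):
    """Invert a sample the way a saturating 16-bit pipeline would."""
    return clamp16(-sample)


# A sample mixed with its own inversion should cancel to 0.
for s in (12345, -32767, -32768):
    print(s, clamp16(s + invert16(s)))
```

Only the last value fails to cancel: +32768 does not exist in int16, so the inverted sample ends up one step short.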

So, when I was comparing filtered wave files processed in different systems with the same filter, and saved in 32-bit float format, usually there was no difference. The exception was Octave: as it turns out, even recent distributions of Octave, e.g. v4.2.1, are affected by a bug that saves 32-bit integers instead of floats, and also stores the float value 1.0 as the minimum negative 32-bit value, -2147483648, instead of the maximum positive one. But once I started saving them in 16-bit format, the difference started to become quite noticeable. Why is that?

Determining Int16 Encoding Range

First, let's determine how Audacity, Matlab, and Octave deal with converting between minimum and maximum float and int16 values.

In Audacity, we can generate a square wave of amplitude 1.0, which in 32-bit mode will be a sequence of 1.0 and -1.0 interleaved, like here:

After exporting it into a 32-bit float PCM wav file, it can be examined with the "octal dump" (od) utility:

$ od -f aud-square32f.wav
0000120     1.000000e+00    1.000000e+00   -1.000000e+00    1.000000e+00
0000140    -1.000000e+00    1.000000e+00   -1.000000e+00    1.000000e+00

After exporting the same wave into a 16-bit int PCM wav file, it is possible to see the same values represented as int16:

$ od -s aud-square16.wav 
0000040                                               ...   32767   32767
0000060    -32768   32767  -32768   32767  -32768   32767  -32768   32767
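The same check can be done without od. Below is a minimal sketch using Python's standard wave and struct modules, assuming a mono 16-bit PCM file; the helper names and the temporary file are made up for illustration, and the sketch does not cover 32-bit float files.

```python
import os
import struct
import tempfile
import wave


def write_int16_wav(path, samples, sample_rate_hz=44100):
    """Write a mono 16-bit PCM wav file from a list of int16 values."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16 bit
        wav.setframerate(sample_rate_hz)
        wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))


def read_int16_wav(path):
    """Read all samples of a mono 16-bit PCM wav file as Python ints."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("expected mono 16-bit PCM")
        frames = wav.readframes(wav.getnframes())
    return list(struct.unpack(f"<{len(frames) // 2}h", frames))


# Round-trip a full-scale square wave through a temporary file.
square = [32767, -32768] * 4
path = os.path.join(tempfile.mkdtemp(), "square16.wav")
write_int16_wav(path, square)
print(read_int16_wav(path))
```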

Now Matlab R2017b. After loading the square wave from a 32-bit file, it's easy to display it:

>> [float_wave, fs] = audioread('aud-square32f.wav', 'native');
>> format shortEng;
>> disp(float_wave(1:8))

Then it's easy to export it into 16-bit again, and check how it will be represented:

>> audiowrite('mat-square16.wav', float_wave, fs, 'BitsPerSample', 16);

$ od -s mat-square16.wav
0000040                                               ...   32767   32767
0000060    -32768   32767  -32768   32767  -32768   32767  -32768   32767

OK, that means Audacity and Matlab use the same range for int16 representation: from -32768 to 32767. What about Octave (v4.2.1)? The result of loading the floating point wave is the same, but what about the export into int16?

$ od -s oct-square16.wav
0000040                                               ...   32767   32767
0000060    -32767   32767  -32767   32767  -32767   32767  -32767   32767

Interesting: it turns out that Octave only uses the range from -32767 to 32767, for symmetry I suppose. Even more interesting is that if we load a 16-bit wave file produced by Audacity or Matlab into Octave in 'native' mode, that is, without converting into float, Octave will "scale" it in order to avoid using the value of -32768:

octave:5> [mat_int16_wave, ~] = audioread('mat-square16.wav', 'native');
octave:6> disp(mat_int16_wave(1:8))

Personally, I find this quite odd, as I was considering the "native" loading mode to be transparent, but it's actually not.
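The two int16 ranges observed so far can be reproduced with two simple scaling conventions. This is only a sketch under the assumption that the tools scale and clamp roughly like this; their actual code may differ, and the function names are made up.

```python
def to_int16_asymmetric(x):
    """Scale by 32768 and clamp, so the range is -32768..32767."""
    return max(-32768, min(32767, round(x * 32768)))


def to_int16_symmetric(x):
    """Scale by 32767, so the range is -32767..32767."""
    return max(-32767, min(32767, round(x * 32767)))


# Only the full-scale endpoints are compared here; rounding of
# fractional values is a separate question.
for x in (-1.0, 1.0):
    print(x, to_int16_asymmetric(x), to_int16_symmetric(x))
```

The asymmetric variant gives the extremes seen in the Audacity and Matlab dumps, and the symmetric one gives the extremes seen in the Octave dump.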

So, obviously, this discrepancy in the range of int16 used can be the source of difference when performing bitwise comparisons. Can there be another reason? Yes, and it's the way fractional values are rounded.

Rounding Rules

For float numbers used in calculations, there is a variety of rounding rules. I ran an experiment: I created floating point wave files with a series of steps, and converted them into int16 using Audacity, Matlab, and Octave. For the step, I used different values depending on what range the framework uses. Thus, 1 unit ("1u" in the table) can be different for the positive and the negative range. The results are quite interesting:

   Float        Audacity   Matlab   Octave
  -1.0            -32768   -32768   -32767
  -1.0 + 0.25u    -32768   -32768   -32767
  -1.0 + 0.75u    -32767   -32768   -32766
  -1.0 + 1u       -32767   -32767   -32766
  -1.0 + 2u       -32766   -32766   -32765
   0.0 - 2u           -2       -2       -2
   0.0 - 1.75u        -2       -2       -2
   0.0 - 1.25u        -1       -2       -1
   0.0 - 1u           -1       -1       -1
   0.0 - 0.75u        -1       -1       -1
   0.0 - 0.25u         0       -1        0
   0.0                 0        0        0
   0.0 + 0.25u         0        0        0
   0.0 + 0.75u         1        0        1
   0.0 + 1u            1        1        1
   0.0 + 1.25u         1        1        1
   0.0 + 1.75u         2        1        2
   0.0 + 2u            2        2        2
   1.0 - 2u        32766    32765    32765
   1.0 - 1u        32767    32766    32766
   1.0 - 0.75u     32767    32767    32766
   1.0 - 0.25u     32767    32767    32767
   1.0             32767    32767    32767

It seems that all the frameworks use slightly different rounding rules. That's another reason why a wave that looks the same in the floating point format will look different when rendered into int16.
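To make the difference concrete, here is a sketch of two common rounding rules applied to the fractional steps around 0.0 from the table, expressed in units of 1u. Which rule each tool actually implements is an inference from the table, not something confirmed by their documentation; the function names are made up.

```python
import math


def round_nearest(x):
    """Round to the nearest integer, ties going up."""
    return math.floor(x + 0.5)


def round_down(x):
    """Always round toward negative infinity."""
    return math.floor(x)


steps = [-1.75, -1.25, -0.75, -0.25, 0.25, 0.75, 1.25, 1.75]  # in units of 1u
print([round_nearest(s) for s in steps])
print([round_down(s) for s in steps])
```

The first list matches the Audacity and Octave columns for the rows around 0.0, and the second matches the Matlab column, which suggests rounding to nearest in the former two and rounding down in the latter.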


Never use 16-bit PCM for anything besides the final result for listening, and even then use dithering. For any comparisons, including bitwise ones, always use floats: they turn out to be less ambiguous, and retain a consistent interpretation across different software packages.
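As an illustration of the dithering step, here is a minimal sketch of one common choice, TPDF (triangular) dither of ±1 LSB, added before rounding to int16. The scale factor of 32767, the function name, and the seed parameter are assumptions made for this example, not something prescribed by any of the tools discussed.

```python
import random


def to_int16_tpdf(samples, seed=None):
    """Convert floats in [-1.0, 1.0] to int16 with TPDF dither."""
    rng = random.Random(seed)
    out = []
    for x in samples:
        # The sum of two uniform values gives triangular noise of +/-1 LSB.
        noise = rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)
        value = round(x * 32767 + noise)  # assumed scale factor
        out.append(max(-32768, min(32767, value)))
    return out


print(to_int16_tpdf([0.0, 0.25, -0.25, 1.0], seed=1))
```

Since the noise is random, two dithered renders of the same float data will generally not match bit for bit, which is one more reason to keep comparisons in floats.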