Friday, May 31, 2019

Measuring Bridged and "Balanced" Amplifier Outputs

For a long time this topic was troubling me: how to measure bridged mode amplifiers properly. The problem is that, without taking precautions, it's possible to ruin an amp with a short circuit. I think I now understand this matter well enough, and I got some interesting results by measuring one of the amps I use.

Bridged Mode of Power Amplifiers


A lot of commercial stereo amplifiers I've seen have a "bridged mode" feature, which turns the unit into a mono amplifier of higher power. For example, on my Monoprice Unity amplifier, one needs to set the mode switch accordingly, connect the "+" wire of the speaker to the right "+" output, and the "-" wire of the speaker to the left "-" output. Obviously, only one input (left) is used in this case.

This mode is implemented in the amplifier by dedicating each of the channels to one wire of the load, and inverting the input to one of the amplifiers. Schematically, it looks like this:

This configuration doubles the voltage across the load compared to the regular stereo mode. In theory, this would result in a 4x power increase into the same load, but in reality, due to various losses, the increase is usually smaller. For example, the Monoprice Unity 100W amp is specified as delivering 50 W/channel into an 8 Ohm load in stereo mode, and 120 W into the same load when bridged, which is a 2.4x ratio. The exemplarily engineered Benchmark AHB2 amplifier offers a much higher increase of 3.8x into the same load in bridged mode.
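The theoretical 4x figure follows from the power formula for a resistive load, P = V^2 / R: doubling the voltage across the same load quadruples the power. A short worked version, with the Unity's specified figures for comparison:

```latex
P_{\text{stereo}} = \frac{V_{\text{rms}}^2}{R}, \qquad
P_{\text{bridged}} = \frac{(2V_{\text{rms}})^2}{R} = 4\,\frac{V_{\text{rms}}^2}{R} = 4\,P_{\text{stereo}}

\text{Monoprice Unity (specified): } \frac{120\ \text{W}}{50\ \text{W}} = 2.4
```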

However, the bridged configuration can potentially add more distortion because each channel effectively "sees" half the load impedance (e.g. 4 Ohm if an 8 Ohm speaker is connected). Thus, it would be interesting to measure the difference in distortion between bridged and regular mode. But here is the catch: the "-" wire of the load is now connected to the second amplifier's output. We can't connect it to the signal ground of an audio analyzer anymore, as this would short-circuit that amplifier.

Here is why it happens. Normally, the ground plane of the input audio signal is the same as the ground plane of the output. When using an audio analyzer, this allows directly comparing the input signal from the signal generator to the output:
However, in the bridged configuration the zero voltage point (reference potential) for the amp's output is virtual and located "in between" the terminals of the load:
The same situation can be encountered with Class-D amplifiers that are designed for maximum efficiency. These use the so-called H-bridge configuration, which means they do not offer a "single-ended" mode at all and always run in bridged mode. Not every Class-D amp uses an H-bridge, but measurements of this class of amplifiers must be done with caution.

"Balanced" and "Active Ground" Headphone Amplifiers

And we encounter the same problem when we want to measure a headphone amplifier with "balanced" or "active ground" output. Note that the implementation of "balanced" output may vary—in the simplest case it only means that left and right outputs do not share the ground point. This is done to reduce channel crosstalk that occurs due to common-impedance coupling. In this case there is no additional amplifier on the "-" wire, and thus connecting it to the ground of the analyzer input does not cause any issues.

However, if "balanced" headphone output means "doubled circuitry" (essentially, this is the same as "bridging" for a power amplifier), or if the ground channel has a dedicated amplifier path, as in the AMB M3 amplifier (this is called "active ground"), then we must avoid connecting the ground of the output to the ground of the analyzer input.

Measurement Techniques


Since we must avoid connecting the ground of the output to the ground of the input, the simplest solution would be to leave the second wire of the output "floating" and only connect the "+" wire to the signal input of the analyzer. That's what I did myself in the past. In this case, the analyzer still uses the input ground as a reference, so the result might be off due to the difference in potential between the "virtual ground" point in the middle of the load and the input ground.


For example, I created a symmetric load consisting of two 4 Ohm resistors. Theoretically, there is a 0 V point right between them. In practice, the measured difference between the potentials of the output and input grounds was 0.35 V. This means it's better to avoid connecting them, because this voltage would drive a current through the input ground.

However, it's possible to use a second, floating analyzer unit for the output. After all, we can use a battery-powered voltmeter for measuring the voltage across the load, right? In the same way, it's possible to use a full analyzer, but only if it's not connected to the input. This way, the analyzer on the output measures the output voltage relative to the output ground, which gives correct results. But operating two analyzers, one for generating signals and another to measure the output, can be cumbersome.

Also, what if we can't split the load, e.g. if we are using a real speaker instead of a resistor load? In this case we need to make a differential measurement. For oscilloscopes, there are special differential probes for this purpose. The QuantAsylum QA401 has differential inputs (marked "+" and "-"). We need to connect one side of the load to the "+" input wire and the other to the "-", leaving the input ground floating. That's OK because the ground is not used as a signal reference anymore. Here is what the wiring looks like:
Another advantage of a differential input is that any common-mode noise on the probes gets cancelled. I often saw a 60 Hz spike in single-ended measurements, but it disappeared immediately after I switched to the differential input, with the same amp, same probes, and same connections. This suggests that the 60 Hz hum is induced into the probe wires by electromagnetic fields from nearby mains wiring.
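To illustrate why the hum disappears, here is a small numeric sketch. It assumes, as a simplification, that the induced hum is identical on both probe wires while the bridged amp drives the two load terminals with opposite polarity. The 0.01 V hum amplitude and the function names are arbitrary choices for illustration:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

const double kPi = 3.14159265358979323846;

// Hypothetical model of one probe wire: the amplifier drives this terminal with
// `polarity` times a 1 kHz tone, and the wire also picks up 60 Hz hum that is
// identical on both wires (a common-mode signal).
std::vector<double> ProbeWire(double polarity, std::size_t n, double sampleRate) {
  std::vector<double> v(n);
  for (std::size_t k = 0; k < n; ++k) {
    const double t = k / sampleRate;
    const double tone = polarity * std::sin(2.0 * kPi * 1000.0 * t);
    const double hum = 0.01 * std::sin(2.0 * kPi * 60.0 * t);
    v[k] = tone + hum;
  }
  return v;
}

// A differential input measures plus - minus: the common hum cancels,
// while the opposite-polarity tone adds up to twice its amplitude.
std::vector<double> Differential(const std::vector<double>& plus,
                                 const std::vector<double>& minus) {
  std::vector<double> d(plus.size());
  for (std::size_t k = 0; k < plus.size(); ++k) d[k] = plus[k] - minus[k];
  return d;
}
```

A single-ended input, by contrast, would see ProbeWire(1.0, ...) relative to the input ground, hum included.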

Measuring Monoprice Unity 100W Amp


As a practical exercise, I've measured THD and IMD of the Monoprice Unity 100W Class-D amplifier. It does not use an H-bridge configuration, which means that in stereo mode the channels are single-ended and the "-" wire of the speaker is at the input ground plane's potential.

Bridged mode into 8 Ohm load, differential measurement

First, I set the amp to maximum volume and used a true RMS voltmeter to check the potential difference across an 8 Ohm load while driving the input with a 1 kHz sine wave at -10 dBV (the nominal consumer line level). The voltmeter showed 19.55 Vrms. Note that the resulting power (from the V^2 / R formula) is ~48 W, which is less than half of the 120 W specified in the amp's manual (perhaps the manufacturer used a higher input signal level). However, these levels seem adequate to me; in fact, I usually don't even run the amp at maximum volume.

But even that output level is close to the QA401's input voltage limit (20 Vrms), so I decided to use a split load (2 x 4 Ohm resistors in series) and lowered the input signal to -12 dBV. This got me 14.47 Vrms across the 8 Ohm load, which is a mere 26 W. Over the same load, a differential measurement with the QA401 shows a 23 dBV peak (which agrees with the Vrms reading), and if the load is specified as 8 Ohm, the QA401 also shows 25 W of output power. Nice.

I also tried measuring with the QA401 across half of the load (one 4 Ohm resistor). The peak was now 17 dBV (about 7 Vrms, half of the voltage across the full load), so I had to specify the load in the QA401 as 2 Ohm in order to get the same 25 W figure.
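The arithmetic behind these readings can be checked with a few lines of C++. This is only a sketch; the function names are made up for illustration and are not part of the QA401 software. The key point is the split-load correction: probing one half of a symmetric load gives half the voltage, and (V/2)^2 / (R/4) equals V^2 / R, hence the 2 Ohm setting for an 8 Ohm load.

```cpp
#include <cassert>
#include <cmath>

// Power in a resistive load from the true-RMS voltage across it: P = V^2 / R.
double PowerWatts(double vrms, double ohms) { return vrms * vrms / ohms; }

// RMS voltage expressed in dBV (decibels relative to 1 Vrms).
double ToDbv(double vrms) { return 20.0 * std::log10(vrms); }

// When probing only one half of a symmetric split load, the analyzer sees half
// the voltage. Entering a quarter of the total load makes the displayed power
// match the power in the whole load, because (V/2)^2 / (R/4) == V^2 / R.
double EquivalentHalfLoadOhms(double totalOhms) { return totalOhms / 4.0; }
```

For example, PowerWatts(19.55, 8.0) is about 47.8 W, ToDbv(14.47) is about 23.2 dBV, and ToDbv(14.47 / 2) is about 17.2 dBV.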

Here is what I saw in terms of THD and IMD:



Definitely not outstanding results, especially considering that this is at less than 1/4 of the advertised power. One particularly interesting issue is the amount of ultrasonic noise in the IMD measurement. I suppose this is caused by a weak anti-aliasing filter in this amp, as we can see from its frequency response measurement:


The graph is quite fuzzy due to the amplifier's non-linearity, but we can still clearly see that the downward slope on the right is very gentle. This could be a good property for a Class-A or Class-AB amplifier, but since a Class-D amplifier effectively samples the input signal, its output would be better treated with a brick-wall filter.

Single-ended mode into 8 Ohm load

I tried to achieve the same modest 25 W into an 8 Ohm load (remember that the manual states that the amp outputs 50 W into 8 Ohm in the single-ended configuration); however, with the volume at maximum, the voltmeter reading was only 10.45 Vrms, which is less than 14 W of output power. I increased the input signal level to the nominal -10 dBV, and that got me about 22 W. Even at this lower power, THD doubled compared to bridged mode, and the dual-tone IMD signal was overloading the amplifier, so I had to reduce the IMD input back to -12 dBV (and it still seemed to overload).
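Assuming the 10.45 Vrms reading was taken at the same -12 dBV input and the amp stays roughly linear, the ~22 W figure is what a 2 dB increase predicts, since power scales by 10^(ΔdB/10):

```latex
P_{-12\,\text{dBV}} = \frac{(10.45\ \text{V})^2}{8\ \Omega} \approx 13.65\ \text{W}, \qquad
P_{-10\,\text{dBV}} \approx 13.65\ \text{W} \times 10^{2/10} \approx 13.65\ \text{W} \times 1.585 \approx 21.6\ \text{W}
```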



Conclusions


1. Bridged amplifiers can be measured properly using differential mode of the QuantAsylum QA401 analyzer. If the output voltage is too large, the load can be split to reduce the voltage. Necessary corrections have to be applied if we want QA401 to display proper power figures. It's always possible to double check the results using a true RMS voltmeter.

2. Differential measurement also helps to defeat noise induced into probe wires by electromagnetic fields, especially the notorious 60 Hz hum.

3. The performance of the Monoprice Unity 100W amp in single-ended mode is quite bad. For driving an 8 Ohm load, I would prefer using it in bridged mode.

4. This result was contrary to my expectations: on this amplifier, bridged mode driven at a lower level has much less distortion than single-ended mode at the nominal level. That's why it's always better to measure first.

Tuesday, March 5, 2019

Amp & DAC box for LXmini (miniBox)

I'm rebuilding my audio system around the RME Fireface UCX. According to numerous user posts, RME devices are very reliable, so I'm hoping to achieve better overall stability than with the MOTU Ultralite AVB, which goes haywire approximately every three weeks. Personally, I would tolerate this, but since the system is used by my entire family, hearing from them that "sound is not working AGAIN" has become somewhat annoying.

My surround system has a 4.1 configuration. As I've explained in my post about LXmini, their super-stable and super-focused phantom center eliminates the need for a center speaker in the surround configuration. I use LXminis both for the front pair and for the surround pair. The surround pair is about 4 meters away from the audio electronics along the shortest path. However, the speaker cables run along the wall and would perhaps be twice as long. This is why, together with the second pair of LXminis, I built a dedicated amplifier and DAC box located close to them.

While I'm rebuilding the main system, I've made a small upgrade to this surround box as well. The main change is that I've connected it using a digital SPDIF link, and started using Neutrik SpeakON ports for connecting LXminis. Overall, the box now looks more like a complete system on its own, so I decided to do a quick post about it.

Here is what I've got in it:
It's a 2U half-rack enclosure by the All Metal Parts company in the UK. On the lower level I put the QSC SPA4-100 amplifier (I use it for the front pair of LXminis, too). Then there is a miniDSP 2x4 HD (one of the options recommended by S. Linkwitz) and an inexpensive SPDIF coaxial to TOSLink optical converter by Monoprice.

This is what I put on the rear side:
Here we have a pair of 4-pole SpeakON connectors for the speakers and an SPDIF input (RCA). All the power inputs are connected directly: the standard 3-prong AC receptacle on the back of the amplifier, and the power adapters of the miniDSP and the SPDIF converter. I've also provided a USB cable to the miniDSP for tuning and diagnostics.

Here is a diagram of the device:
Now the obvious question: why did I choose a digital connection, and specifically a coaxial one? A digital connection requires only one cable; also, since the miniDSP is then connected via optical TOSLink, there is complete electrical isolation between the DAC / amp circuitry of the miniBox and the main system. Then why didn't I just run a TOSLink cable directly? Mainly because optical cables require a bit more care at corners. A high-quality, low-impedance coaxial cable seems like a more robust option.

The usual concern with SPDIF is jitter. Would the noise in the cable create more jitter? My particular concern was that the Monoprice coax to optical converter does not isolate the shield of the coaxial input from the ground. This means that the shield of the cable connects the ground planes of audio devices plugged into different power outlets, which is a perfect opportunity for ground loop-induced noise. Although the coaxial cable I use (Belden 8241) has very low shield resistance, some noise voltage can still be added to the digital signal.

To check whether there is a real problem, I ran a jitter test (at 48 kHz, 24-bit resolution) in ARTA: first using the miniDSP's USB input, then using TOSLink connected directly to the Fireface's optical output, and finally in the actual working condition, with the Fireface connected to another power outlet and a 5.5 meter coaxial cable running from the Fireface's coax output to the miniBox. The resulting spectrum, measured with the QuantAsylum QA401 on the miniDSP outputs, was always the same:
Jitter is minimal and does not depend on how the digital signal gets delivered to the miniDSP. Note that similar results for the USB and TOSLink inputs are published on the Audio Science Review forum. So I think there is no reason to worry about jitter here.

One more discovery I've made is that the Monoprice coaxial to TOSLink converter I use only supports sampling rates up to 48 kHz. That's not a problem for me, because this is the sampling rate I intend to run the Fireface at, so that it can accept an optical input from the Xbox directly. However, should I decide to increase the sampling rate, I would need to look for a different converter.

Saturday, February 9, 2019

rvalue references and move semantics in C++11, Part 2

Continued from Part 1.

Previously we have seen that the compiler can match functions accepting rvalue reference arguments to temporary values. But as we discussed before, avoiding copies also helps program efficiency when the caller abandons ownership of an object. In this case, we need to treat a local variable as if it were a temporary. And this is exactly what the std::move function is for.

On the slide example, there are two overloads for the function "foo": one for lvalue references, and one for rvalue references. The calling code prepares a string, and then transfers ownership to the callee by calling std::move on the variable. The agreement is that the caller must not use this variable after calling std::move on it. The object that has been moved from remains in a valid, but not specified state.
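A minimal sketch of the slide's idea, assuming `foo` takes a `std::string`; the returned labels are made up so that the chosen overload can be observed:

```cpp
#include <cassert>
#include <string>
#include <utility>

// Overload chosen for lvalues: the caller keeps ownership, so we may only read.
std::string foo(const std::string& s) { return "copied " + s; }

// Overload chosen for rvalues: the caller gave up ownership, so we may steal the contents.
std::string foo(std::string&& s) {
  std::string stolen = std::move(s);  // `s` has a name, so it needs std::move again
  return "moved " + stolen;
}
```

Calling `foo(s)` keeps `s` intact; after `foo(std::move(s))` the caller must not rely on the contents of `s`.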

Roughly, calling std::move on a value is equivalent to doing a static cast to rvalue reference type. But there are subtle differences with regard to the lifetime of returned temporaries stemming from the fact that std::move is a function (kudos to Jorg Brown for this example). Also, it's more convenient to call std::move because the compiler infers the return type from the type of the argument.
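The "it's just a cast" point can be checked at compile time. In this sketch (the function name is hypothetical), both expressions have type `std::string&&`, and nothing is moved until some constructor or assignment actually consumes the result:

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

void CheckMoveIsACast(std::string& s) {
  // std::move(s) and static_cast<std::string&&>(s) yield the same type.
  static_assert(std::is_same<decltype(std::move(s)), std::string&&>::value,
                "std::move yields an rvalue reference");
  static_assert(std::is_same<decltype(static_cast<std::string&&>(s)), std::string&&>::value,
                "static_cast to T&& yields the same type");
  std::move(s);  // a discarded cast: the contents of s are untouched
}
```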
There are well-known caveats with the usage of std::move.

First, remember that std::move does not move any data; it's effectively a type cast. It is used as an indication that the value will not be used after the call to std::move.

Second, the practice of making objects immutable using the const keyword interferes with the usage of std::move. As we remember, an rvalue reference is a writable reference that binds to temporary objects. Writability is an important property of rvalue references that distinguishes them from const lvalue references. Thus, an rvalue reference to an immutable value is rarely useful: std::move on a const object yields a const rvalue reference, which can't bind to the parameter of a move constructor, so the copy constructor is silently chosen instead.
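A minimal sketch, using a hypothetical `Tracked` type that records which constructor built it, of std::move on a const object silently falling back to a copy:

```cpp
#include <cassert>
#include <utility>

// Hypothetical type that records how it was constructed.
struct Tracked {
  bool copied = false;
  bool moved = false;
  Tracked() = default;
  Tracked(const Tracked&) : copied(true) {}
  Tracked(Tracked&&) noexcept : moved(true) {}
};

bool MoveFromConstCopies() {
  const Tracked source{};
  // std::move yields const Tracked&&, which can't bind to Tracked&&,
  // so overload resolution picks Tracked(const Tracked&).
  Tracked target(std::move(source));
  return target.copied && !target.moved;
}

bool MoveFromNonConstMoves() {
  Tracked source;
  Tracked target(std::move(source));  // binds to Tracked(Tracked&&)
  return target.moved && !target.copied;
}
```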
This one I see a lot. Remember that expressions have two attributes: the type and the value category, and that an expression which names a variable has the lvalue category. This holds even when the variable's type is an rvalue reference. Thus, if you are assigning a named rvalue reference to something, or passing it to another function, the expression is an lvalue and copy semantics are used.

This in fact makes sense, because after getting some object via an rvalue reference, we can make as many copies of it as we need. And then, as the last operation, use std::move to steal its value for another object.
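A sketch of this pitfall, again with a hypothetical `Tracked` type: inside the function, the parameter `t` has type `Tracked&&` but the expression `t` is an lvalue, so only an explicit std::move selects the move constructor.

```cpp
#include <cassert>
#include <utility>

// Hypothetical type that records how it was constructed.
struct Tracked {
  bool copied = false;
  bool moved = false;
  Tracked() = default;
  Tracked(const Tracked&) : copied(true) {}
  Tracked(Tracked&&) noexcept : moved(true) {}
};

bool PassOnWithoutMove(Tracked&& t) {
  Tracked local(t);  // `t` names a variable: lvalue, so the copy constructor runs
  return local.copied;
}

bool PassOnWithMove(Tracked&& t) {
  Tracked local(std::move(t));  // cast back to an rvalue: the move constructor runs
  return local.moved;
}
```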
And finally, sometimes people want to "help" the compiler by telling it that they don't need the value they are returning from a function. There is absolutely no need for that. As we have seen earlier, copy elision and RVO have been in place since C++98.

Moreover, since calling std::move changes the returned expression into an rvalue reference that no longer simply names the local variable, the compiler can't apply copy elision (NRVO) anymore. Instead, it has to call the move constructor, or even the copy constructor if the type has no move constructor. Newer compilers emit a warning about this "pessimizing move", but it's better not to write it in the first place.
We have discussed how use of move semantics improves efficiency of the code by avoiding unnecessary data copying. Does it mean that recompiling the same code with enabled support for C++11 and newer standards would improve its performance?

The answer depends on the actual codebase we are dealing with. If the code mostly uses simple structs aggregating C++ standard library types, and the containers are from the standard library, too, then yes, there can be performance improvements. This is because the containers will use move semantics where possible, and the compiler will be able to add move constructors and move assignment operators to user types automatically.

But if the code uses POD types aggregating only primitive values (for which moving is the same as copying), or homebrew containers, then performance will not improve. A lot of work has to be done on such code in order to benefit from move semantics.
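A sketch of both cases with hypothetical types. `Recording` gets an implicit move constructor that moves its string and vector (the vector's heap buffer is handed over, not copied), while moving `RawFrame` still copies every byte:

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Aggregates standard library types: the implicit move is cheap and noexcept.
struct Recording {
  std::string name;
  std::vector<float> samples;
};

// Only primitive values: "moving" is a plain memberwise copy, no speedup in C++11.
struct RawFrame {
  int channels;
  float samples[1024];
};
```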
In order to consider the changes between C++98 and C++11 in more detail, I would like to bring up the question of efficient parameter passing practices. Those who have programmed in C++ long enough know these conventions by heart.

On the horizontal axis, we roughly divide all the types we deal with into three groups:
  - cheap to copy values—small enough to fit into CPU's registers;
  - not very expensive to copy, like strings or POD structures;
  - obviously expensive to copy, like arrays; polymorphic types for which passing by value can result in object slicing are also in this category.

On the vertical axis, we consider how the value is used by the function: as an explicitly returned result, as a read only value, or as a modifiable parameter.

There is not much to comment here except for the fact that unlike the C++ standard library, the Google C++ coding style prohibits use of writable references for in/out function arguments in order to avoid confusion.
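A sketch of the C++98 conventions from the table, with hypothetical function names; the in/out case follows the Google style of passing a pointer so that the call site shows the argument may change:

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Cheap to copy: pass and return by value.
int Twice(int x) { return 2 * x; }

// Read-only and not cheap to copy: pass by const reference.
std::size_t CountWords(const std::string& text) {
  std::istringstream in(text);
  std::string word;
  std::size_t count = 0;
  while (in >> word) ++count;
  return count;
}

// In/out parameter: passed by pointer, as the Google C++ style requires.
void Scale(std::vector<float>* samples, float gain) {
  for (std::size_t i = 0; i < samples->size(); ++i) (*samples)[i] *= gain;
}
```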
What changes with C++11? Not much! The first difference is obvious—C++11 gets move semantics, thus functions can be overloaded for taking rvalue references.

The second change is due to introduction of move-only types, like std::unique_ptr. These have to be passed the same way as cheap to copy types—by value.

Then, instead of considering whether a type is expensive to copy, we need to consider whether it is expensive to move. This brings the std::vector type into the second category.

Finally, for returning expensive to move types, consider wrapping them into a smart pointer instead of returning a raw pointer.
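A sketch of both points with a hypothetical `Frame` type, whose plain array member makes it expensive to move: it is handed out through a `std::unique_ptr`, and the move-only pointer itself is passed by value:

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical expensive-to-move type: its array is copied element by element
// even when the object is "moved".
struct Frame {
  float samples[48000];
};

// Returned through an owning smart pointer instead of a raw pointer
// (std::make_unique only arrived in C++14, hence the explicit new).
std::unique_ptr<Frame> MakeFrame() { return std::unique_ptr<Frame>(new Frame()); }

// Move-only types are taken by value; the caller must std::move ownership in.
float FirstSample(std::unique_ptr<Frame> frame) { return frame ? frame->samples[0] : 0.0f; }
```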
As a demonstration of why it's now efficient to pass vectors by value, let's consider the case when copy elision is not applicable due to dynamic choice of returned value.

In C++98, this code results in copying the contents of the local vector, which is obviously inefficient.
However, a C++11 compiler will call a move constructor in this case. This demonstrates why a type with efficient move implementation can always be returned by value from functions.
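A sketch of such a function (a reconstruction, not the slide's exact code). The `origin` output parameter is a hypothetical addition that lets us verify the heap buffer of the chosen vector is handed over rather than copied:

```cpp
#include <cassert>
#include <vector>

// Which local gets returned is decided at run time, so NRVO can't always apply.
// Since C++11, `return a;` treats the local `a` as an rvalue, so the vector is moved.
std::vector<float> PickBuffer(bool first, const float** origin) {
  std::vector<float> a(1024, 1.0f);
  std::vector<float> b(1024, 2.0f);
  *origin = first ? a.data() : b.data();  // remember where the chosen samples live
  if (first) return a;
  return b;
}
```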
Let's return to our slides explaining different ways of passing data between functions. In C++98, it makes no difference for the caller how the callee will use a parameter passed by a const reference. For the caller, it only matters that it owns this data and the callee will not change it.

Looking at the callee implementation: if it can do its job without making a copy of the data, it's likely performing a different action or algorithm than the one that requires a copy. I've highlighted this difference on the slide by giving the callee functions different names.

And we have no performance problems with the function that doesn't need to make a copy—function B. It's already efficient.
C++11 can help us with the second case. Now we can prepare two versions of the callee: one for the case when the caller can't disown the data, and one for the case when it can. Clearly, these callers are different now, which is why they are given different names on the slide.

In the first case we still need to make a copy, but in the second case we can move from the value that the caller abandons. It's interesting that after obtaining a local value, it doesn't really matter how it has been obtained. The rest of both overloads of the function C can proceed in the same way.
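A sketch of the two overloads, with a hypothetical `Shout` function standing in for the slide's function C. Only the way the local string is obtained differs; the rest is shared:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

// Shared processing: once the callee owns a local value, the rest is identical.
void ToUpperInPlace(std::string& s) {
  for (std::size_t i = 0; i < s.size(); ++i)
    s[i] = static_cast<char>(std::toupper(static_cast<unsigned char>(s[i])));
}

// The caller keeps its data: we have to copy.
std::string Shout(const std::string& text) {
  std::string local = text;
  ToUpperInPlace(local);
  return local;
}

// The caller abandons its data: we can move from it instead of copying.
std::string Shout(std::string&& text) {
  std::string local = std::move(text);
  ToUpperInPlace(local);
  return local;
}
```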
Which brings us to the idea that we can unify these two overloads, and require the caller to either make a copy, or to transfer ownership for the value it used to own.

This relieves us from the burden of writing two overloads of the same function. As you can see, the call sites do not change!
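A sketch of the unified version, using the same hypothetical `Shout` example: one by-value overload, into which the caller either copies or moves:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

// Single overload: `text` is copy-constructed from an lvalue argument,
// or move-constructed when the caller passes an rvalue.
std::string Shout(std::string text) {
  for (std::size_t i = 0; i < text.size(); ++i)
    text[i] = static_cast<char>(std::toupper(static_cast<unsigned char>(text[i])));
  return text;  // a by-value parameter is implicitly moved on return
}
```

Call sites look the same as with the two overloads: `Shout(s)` copies, `Shout(std::move(s))` moves.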

In fact, this is the approach that the Google C++ Style Guide strongly recommends.
This idiom doesn't come for free. As you can see, there is always an additional move operation that happens on the callee side.

And since the copy operation now happens at the caller side, there is no way for the callee to employ delayed copying.

However, the pass-by-value idiom works very nicely with return-by-value, because the compiler allocates a temporary value for the returned value, and then the callee moves from it.
Since using the pass-by-value idiom apparently has costs, why does the Google C++ Style Guide favor it so much?

The main reason is that it's much simpler to write code this way. There is no need to create const lvalue and rvalue reference overloads—this problem becomes especially hard if you consider functions with multiple arguments. We would need to provide all the combinations of parameter reference types in order to cover all the cases. This could be simplified with templates and perfect forwarding, but just passing by value is much simpler.

The benefit of passing by value also becomes clear if we consider types that can be created by conversion, like strings. Taking a string parameter by value lets the compiler deal with all the required implicit conversions.

Also, if we don't use rvalue references as function arguments, we don't need to remember the caveat that a named rvalue reference must be explicitly moved from.

But we were talking about performance previously, right? It seems like we are abandoning it. Not really, because not all of our code contributes equally to program performance. So it's a usual engineering tradeoff—avoiding premature optimization in favor of simpler code.
Does it mean we can apply the pass-by-value idiom to any codebase? The answer is similar to the one we had on the "silver bullet" slide: it depends on whether the types in your codebase implement efficient move operations.

Also, applying this idiom to an existing codebase shifts the cost of copying to callers. So any code conversion must be accompanied by performance testing.
Having pass-by-value in mind, let's revise our table of recommended parameter passing. Types that can be moved efficiently can be passed by value both as in and out parameters. Also, instead of passing an in/out parameter using a pointer, you can pass it in as an input parameter and then get it back as the return value, both times by value.

Expensive to move and polymorphic types should still use pass-by-reference approach.
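A sketch of the by-value in/out pattern for a cheap-to-move type, with a hypothetical `Normalize` function: the caller moves the vector in and assigns the result back, so no samples are copied:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

// Takes ownership of the samples, scales them to a peak of 1.0, and hands them back.
std::vector<float> Normalize(std::vector<float> samples) {
  float peak = 0.0f;
  for (float s : samples) peak = std::max(peak, std::fabs(s));
  if (peak > 0.0f)
    for (float& s : samples) s /= peak;
  return samples;
}
```

Usage: `v = Normalize(std::move(v));` replaces the C++98-style `Normalize(&v);`.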

As you can see, there is no place for rvalue reference arguments here. As per the Google C++ Style Guide recommendations, they should only be used as move constructor and move assignment arguments. A couple of obvious exceptions are classes that are used very widely across your codebase, and functions and methods that are proven to be invoked on performance-critical code paths.
So we have figured out that we need to know how to write move constructors and move assignments. Let's practice that.

There is a trivial case when the compiler will automatically add a move constructor and a move assignment operator to a structure or a class, and they will move the fields. This happens when the class has no user-declared copy constructor, copy assignment operator, move operation, or destructor.

For this example, I've chosen a simple Buffer class—we do use a lot of buffers in audio code.

Let's start with the class fields and constructors. I've made it a struct just to avoid writing extra code. So we have a size field and a data array. To leverage automatic memory management, I've wrapped the data array into a std::unique_ptr. I've also specified the default value for the 'size' field. I don't have to do that for the smart pointer, because its default constructor initializes it to null.

I defined a constructor that allocates a zero-filled buffer of a given size. Note that by adding empty parentheses to the array new-expression, you initialize its contents with zeroes. I marked the constructor as explicit because it's a single-argument constructor, and I don't want it to be called for implicit conversions from size_t values.

Note that I don't have to define a destructor because unique_ptr will take care of releasing the buffer when the class instance is destroyed.

And we don't have to write anything else because as I've said, the compiler will provide a move constructor and a move assignment operator automatically. But it will not be possible to copy an instance of this class because unique_ptr is move-only.
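A sketch of the class at this stage. The `float` element type and the defaulted default constructor are assumptions; the slide's exact code may differ:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <type_traits>
#include <utility>

struct Buffer {
  std::size_t size = 0;           // default member initializer
  std::unique_ptr<float[]> data;  // default-constructs to null

  Buffer() = default;             // assumption: an empty buffer is allowed
  // Allocates n samples; the trailing () value-initializes them to zero.
  explicit Buffer(std::size_t n) : size(n), data(new float[n]()) {}
  // No destructor, copy, or move operations declared: the compiler generates
  // the move constructor and move assignment, and copying is deleted because
  // unique_ptr is move-only.
};
```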
So let's define and implement a copy constructor. It takes a const reference to a source instance. I use a C++11 feature called delegating constructors, which allows me to reuse the allocation code from the regular constructor. After allocating the data in the copy recipient, I copy the data from the source. Note that calling std::copy with null pointers for all arguments is valid (the range is simply empty), thus we don't need special code to handle copying of an empty buffer.

However, once a copy constructor is declared, the compiler no longer adds a move constructor and move assignment by itself. That means any "move" operations on this class will in fact make a copy of it. That's not what we want.

So, we tell the compiler to use the default move assignment and move constructor. These will move all the fields that the class has. Also simple.
By the way, we haven't provided a copy assignment yet. Because we have declared the move operations, the compiler defines the implicit copy assignment operator as deleted, so we need to write one by hand.

This is one of the idiomatic approaches, called copy-and-swap. I've seen it mentioned in early S. Meyers books on C++. We make a copy of the parameter into a local variable; this calls the copy constructor we have defined earlier. Then we use the standard swap function to exchange the contents of our current instance with the copy. As we exit the function, the local variable, which now holds our old contents, is destroyed.

The advantage of this approach is that we don't need to handle self-assignment specially. Obviously, in case of self-assignment the data gets copied unnecessarily, but on the other hand there is no branching, which can also hurt performance.
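A sketch of the class at this stage, assuming `float` samples and a defaulted default constructor: the delegating copy constructor, the defaulted move operations, and a copy-and-swap copy assignment built on `std::swap`:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

struct Buffer {
  std::size_t size = 0;
  std::unique_ptr<float[]> data;

  Buffer() = default;
  explicit Buffer(std::size_t n) : size(n), data(new float[n]()) {}

  // Delegates the allocation, then copies the samples. For a default-constructed
  // source, data.get() is null and the range is empty, so nothing is copied.
  Buffer(const Buffer& other) : Buffer(other.size) {
    std::copy(other.data.get(), other.data.get() + other.size, data.get());
  }

  // Declaring the copy constructor suppressed the implicit moves: bring them back.
  Buffer(Buffer&&) = default;
  Buffer& operator=(Buffer&&) = default;

  // Copy-and-swap: std::swap uses the defaulted move operations above.
  Buffer& operator=(const Buffer& other) {
    Buffer copy(other);
    std::swap(*this, copy);
    return *this;  // `copy` now holds our old contents and is destroyed here
  }
};
```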
Now suppose we also want to implement a custom move assignment. A naive attempt would be to use the same approach as for copying: in a move assignment we receive a reference to a temporary, so we could swap with it.

Unfortunately, this creates infinite recursion, because the standard implementation of the swap function uses move assignment for swapping values! That makes sense, but what should we do instead?
In order to break the infinite recursion, we need to implement the swap function for our class. In fact, such a function is provided by virtually every C++ standard library class.

Note that it takes its in / out parameter by a non-const reference. This is allowed by the Google C++ Style Guide specifically for this function, because it's a standard library convention.

It's important to mark this function as noexcept because we will use it in other noexcept methods.

The implementation of this function simply calls the swap function for every field of the class. As I've said, this function is defined for all built-in types and C++ standard library classes, so it knows how to swap values of the size_t and std::unique_ptr types.

Note that the idiomatic way to call the swap function is to have the "using std::swap" statement and then just call "swap", instead of calling "std::swap" directly. This pattern allows user-defined swap functions to be found and called. The full explanation is available in the notes on Argument-dependent lookup.
We also define an out-of-class swap function which takes two Buffers and swaps them.

Note that the friend keyword makes this function a non-member, even though we declare and define it within our class definition.
Now we can implement move constructor and the move assignment trivially using the swap function. A move constructor is simply a call to swap. A move assignment must in addition return a reference to our instance.

Both functions are marked noexcept.
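Putting the member swap, the friend non-member swap, and the swap-based move operations together gives something like the following sketch (again assuming `float` samples and a defaulted default constructor):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

struct Buffer {
  std::size_t size = 0;
  std::unique_ptr<float[]> data;

  Buffer() = default;
  explicit Buffer(std::size_t n) : size(n), data(new float[n]()) {}
  Buffer(const Buffer& other) : Buffer(other.size) {
    std::copy(other.data.get(), other.data.get() + other.size, data.get());
  }

  // Member swap: exchanges every field. noexcept, because the moves rely on it.
  void swap(Buffer& other) noexcept {
    using std::swap;  // fall back to std::swap, but let ADL find better overloads
    swap(size, other.size);
    swap(data, other.data);
  }

  // Non-member swap found by argument-dependent lookup; `friend` makes it a non-member.
  friend void swap(Buffer& a, Buffer& b) noexcept { a.swap(b); }

  // Moves call the member swap, never std::swap on Buffer, so there is no recursion.
  Buffer(Buffer&& other) noexcept { swap(other); }
  Buffer& operator=(Buffer&& other) noexcept {
    swap(other);
    return *this;
  }

  // Copy-and-swap copy assignment.
  Buffer& operator=(const Buffer& other) {
    Buffer copy(other);
    swap(copy);
    return *this;
  }
};
```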
In fact, by employing pass-by-value, we could merge the implementations of copy assignment and move assignment into one function as demonstrated here. This is called a "unified" assignment operator.

As we have discussed before, this implementation costs an additional move operation. Also, the copying now happens at the caller side. Maybe not the best approach for a class that will be used widely.
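A sketch of the unified assignment operator under the same assumptions (`float` samples, defaulted default constructor). It replaces both the copy and the move assignment operators; a non-member swap is omitted for brevity:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

struct Buffer {
  std::size_t size = 0;
  std::unique_ptr<float[]> data;

  Buffer() = default;
  explicit Buffer(std::size_t n) : size(n), data(new float[n]()) {}
  Buffer(const Buffer& other) : Buffer(other.size) {
    std::copy(other.data.get(), other.data.get() + other.size, data.get());
  }
  Buffer(Buffer&& other) noexcept { swap(other); }

  void swap(Buffer& other) noexcept {
    using std::swap;
    swap(size, other.size);
    swap(data, other.data);
  }

  // Unified assignment: the caller copy- or move-constructs `other`,
  // and we swap it in. The extra move is the price of having one function.
  Buffer& operator=(Buffer other) noexcept {
    swap(other);
    return *this;
  }
};
```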
Here is the complete code for the class, and it fits on one slide. As you can see, writing simple classes that support move semantics is not hard. But as usual with C++, you have an infinite number of alternatives to choose from.
What I have explained so far are the basics. It's very important to understand them before going any further.

And when you are ready, here are some references. I've used these materials while preparing this talk. The first is Scott Meyers's book on effective modern C++ techniques; it contains an entire chapter on rvalue references and move semantics.
Then it's the Google C++ Style Guide which I was quoting often in this talk. It provides sensible guidelines that help writing understandable code.
Abseil library C++ tips contain a lot of explanations and examples on not so obvious behavior of C++.
Thomas Becker's article is a good place to dive into rvalues and move semantics details.
The "Back to Basics!" talk by Herb Sutter contains interesting discussions regarding the C++11 and C++14 features.

And for more insight into C++ history, and into the history of rvalue references and move semantics, there is Bjarne Stroustrup's book and the original proposals to the standard.

Friday, February 8, 2019

rvalue references and move semantics in C++11, Part 1

This is the presentation I recently delivered to my colleagues. When doing audio programming, we often need to think about how to write efficient code, and C++ move semantics help to avoid unneeded data copying.
This presentation is both for people who are new to C++ and used to program in other languages, and for veteran C and C++ programmers who still didn't quite catch up with the changes that had happened with the introduction of the C++11 standard. The aim of this talk is to explain how rvalue references and move semantics help to write more efficient and readable code.
We strive to write modular code, where data processing is distributed among thousands of functions or methods (in object-oriented languages). Typically, when passing data between functions we need to decide whether we need to copy the data (which can be expensive) or rather use indirection and simply pass the address of memory where the data is stored. The usual concern when doing the latter is that it can lead to unexpected program behavior, because the contract for data ownership is often neither expressed formally, nor enforced. I'm not talking about data races that happen in multi-threaded code, but rather about getting unexpected results even in the case of a single thread of execution.
As an illustration of "unexpected behavior" of a single-threaded program due to misuse of indirection, here is an arcane problem that was often biting JavaScript programmers. For those who never programmed in JavaScript, you need to know that functions in JS are first-class objects and can be manipulated as easily as any other values, which includes dynamic creation of functions at runtime. When a function is created, it captures a subset of program state (it's called "closure").

In this example, we create 10 functions, each intended to return the value that the for loop variable i had when the function was created, and store them in an array for later use. You may ask why we would even need to do something like that. Well, in JavaScript the program is executed on the user interface (UI) thread, and in order to achieve a smooth user experience it is often necessary to split a task into smaller pieces and execute them one by one when the UI thread is otherwise idle.

So, when we try to execute the stored functions later, what results do we actually get?
Turns out, it's not what we expected: all the functions evaluate to 10! This is because the closure captures the variable i itself (in effect, a reference to it) rather than its value. A "var" variable is shared by all loop iterations, so when a stored function gets executed, it returns the current value of i, accessed via the reference, and after the loop has finished that value is 10.

Why is this code so confusing? One of the reasons is that "return i" looks syntactically similar to the "a[i]" construct, but they result in different actions. Since accessing an array element requires knowing its index, in the expression "a[i]" the current value of i is evaluated immediately. But since the evaluation of the created function is postponed, so is the evaluation of i in the "return i" expression.
A clean solution came with the ECMAScript 6 standard, which adds the keyword "let" that needs to be used instead of "var" for introducing the loop variable. Unlike "var", "let" scopes the variable to the block of code, and in a for loop each iteration gets a fresh binding of i. Thus each created function captures its own i, which keeps the value from its iteration.
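C++ lambdas make the same choice explicit in the capture list. Here is a rough C++ analogue (my own, not part of the original slides; the function name is hypothetical): capturing i by reference behaves like the "var" version, while capturing by value behaves like the "let" version. The loop variable is declared outside the loop so that it outlives the reference-capturing lambdas.

```cpp
#include <cassert>
#include <functional>
#include <vector>

void CaptureExample() {
  std::vector<std::function<int()>> by_reference;
  std::vector<std::function<int()>> by_value;
  int i = 0;
  for (i = 0; i < 10; ++i) {
    by_reference.push_back([&i] { return i; });  // like JS "var": shares i
    by_value.push_back([i] { return i; });       // like JS "let": own copy
  }
  for (int k = 0; k < 10; ++k) {
    assert(by_reference[k]() == 10);  // every lambda sees the final value
    assert(by_value[k]() == k);       // each lambda kept its own value
  }
}
```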
If we intend to minimize copying by means of indirection, that is, via passing variables to functions by reference, and at the same time would like to guarantee program correctness, three strategies are possible:
  • Prohibit any modifications to shared data. This approach is taken to the extreme in purely functional programming languages, where all data is immutable. However, the experience of writing programs in those languages is very different from what we find in traditional imperative programming languages (such as C++ and Java). For example, a "for" loop that we have seen in the example before simply couldn't exist in a purely functional language, because it requires modifying the loop variable.
  • The second approach is to start with sharing a reference, and then make a copy of the data as needed, typically when the recipient of the reference is about to modify the data. This approach, known as "copy-on-write", is widely used by operating systems for memory pages. However, determining the actual moment when the data needs to be copied without help from the hardware is tricky, and can lead to even more overhead than eager data copying.
  • The third approach is to indicate clearly when the data owner changes. This way we can guarantee that the recipient of a reference to the data can then modify the data freely.
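The copy-on-write strategy can be sketched in a few lines. This is a minimal single-threaded sketch: the hypothetical CowBlob class shares its buffer through std::shared_ptr and copies it only when it's about to write while the buffer is shared. Checking use_count() like this is only safe when no other thread copies the handle concurrently, which matches the single-thread scope of this discussion.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical copy-on-write wrapper: copies of a CowBlob share one buffer
// until one of them is about to modify it.
class CowBlob {
 public:
  explicit CowBlob(std::size_t size)
      : data_(std::make_shared<std::vector<int>>(size, 0)) {}

  int get(std::size_t i) const { return (*data_)[i]; }

  void set(std::size_t i, int value) {
    // The buffer is shared with other CowBlob copies: detach first.
    if (data_.use_count() > 1) {
      data_ = std::make_shared<std::vector<int>>(*data_);
    }
    (*data_)[i] = value;
  }

  bool shares_with(const CowBlob& other) const { return data_ == other.data_; }

 private:
  std::shared_ptr<std::vector<int>> data_;
};

void CowExample() {
  CowBlob a(1000);
  CowBlob b = a;            // cheap: only the shared_ptr is copied
  assert(b.shares_with(a));
  b.set(0, 42);             // the first write triggers the actual copy
  assert(!b.shares_with(a));
  assert(a.get(0) == 0 && b.get(0) == 42);
}
```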
These approaches are graphically illustrated here. Again, we consider only the case of a single program thread. In all examples, function A calls function B, and needs to pass in a big blob of data.

Case 1) If it is possible to guarantee that B does not modify data, then it's safe to pass it by reference.

Case 2) If B needs at some point to modify the data which it has received via a read-only reference, it makes a copy.

Case 3) It would also be useful to know that A in fact doesn't need the data anymore; for example, it has prepared the data and is now passing it to B for further processing. In this case, the data can still be passed by reference, and B doesn't even need to make a copy.
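Jumping ahead a little, here is a minimal sketch of how the three cases can map onto C++11, using std::vector<int> as a stand-in for the "big blob of data" and hypothetical functions in the role of B. Case 3 relies on an rvalue reference and std::move, which the rest of the talk builds up to.

```cpp
#include <cassert>
#include <utility>
#include <vector>

using Blob = std::vector<int>;  // stands in for "a big blob of data"

// Case 1: B only reads the data, so A passes a reference to const.
long Sum(const Blob& blob) {
  long sum = 0;
  for (int x : blob) sum += x;
  return sum;
}

// Case 2: B needs a modified version, so it works on its own copy.
Blob Doubled(const Blob& blob) {
  Blob copy = blob;  // explicit copy; the caller's data stays untouched
  for (int& x : copy) x *= 2;
  return copy;
}

// Case 3: A hands the data over and won't use it anymore, so B takes it by
// rvalue reference and may modify it in place without copying.
Blob Negated(Blob&& blob) {
  for (int& x : blob) x = -x;
  return std::move(blob);  // a named rvalue reference is an lvalue: move it out
}

void ThreeCasesExample() {
  Blob data{1, 2, 3};
  assert(Sum(data) == 6);
  assert(Doubled(data) == (Blob{2, 4, 6}) && data == (Blob{1, 2, 3}));
  Blob result = Negated(std::move(data));  // A gives up `data` here
  assert(result == (Blob{-1, -2, -3}));
}
```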

Before considering how these behaviors can be expressed in C++11, we start with Java, C, and C++98 to illustrate some parallels and to help people proficient in these languages transition to C++11 smoothly.
Java gurus say that everything in Java is passed by value. This requires a clarification: the value of an object or an array variable is a reference to the object or to the array. This makes sense because declaring a variable of an object type also requires initializing it, for example by calling the 'new' operator. Thus, it's the reference to the object that gets passed by value. I remember this being confusing for C programmers, because in C declaring a structure variable allocates the structure automatically.

Another slightly confusing thing in Java for a C/C++ programmer is how the final keyword (the closest analog of the const keyword in C/C++) works. As we will see from the examples, making an object variable final doesn't prevent modifications to the referenced object. The only way to guarantee object immutability in Java is to make sure its class doesn't have any mutator methods. The canonical example is the String class.

However, the drawback of making the String class immutable is that a naive way of concatenating two strings requires copying the contents of both strings into a new one. For efficient concatenation, a mutable class such as StringBuilder (or the older, synchronized StringBuffer) has to be used; this is also what the Java compiler has traditionally generated under the hood for the + operator.

When passing a mutable object to another method, there is no way to specify in Java that the object is not going to be modified there. If the caller needs to be sure, it's better to use an object of an immutable class.
This is an illustration. For a primitive type, the final keyword works as a C/C++ programmer would expect: it prevents the variable's value from being modified. However, a final variable of an object type allows calling methods that change the object's contents; only the variable itself can't be reassigned.

In the same way, if a method takes an object and the parameter is marked final, this doesn't prevent the method from mutating the object.
Now, let's consider the language that was used as a base for C++—the C language. Here, even variables of a structure type are passed by value. However, the Java-like logic twist still applies to arrays—a variable of an array type is a reference to the data location, thus when passing an array to a function, its data does not get copied. In order to pass an array by value, it needs to be wrapped within a structure.

C offers an explicit type for a variable holding the address of a memory location: it's called a pointer. In order to initialize such a variable, an explicit "address of" (&) operator is used. Again, since array variables are in fact references to a data location, they can be used the same way as pointers.
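A small sketch of these rules in C-style code, compiled as C++ (all type and function names are hypothetical): the struct parameter is a copy, the array parameter isn't, and an array wrapped in a struct is copied along with it.

```cpp
#include <cassert>

struct Point { int x, y; };
struct Samples4 { int values[4]; };  // an array wrapped in a struct

// The struct parameter is a copy: the change doesn't reach the caller.
void MovePoint(Point p) { p.x = 100; }

// An array parameter is really a pointer: the change is visible to the caller.
void ClearFirst(int values[4]) { values[0] = 0; }

// The wrapped array is copied together with its enclosing struct.
void ClearFirstCopy(Samples4 s) { s.values[0] = 0; }

void PassingExample() {
  Point p = {1, 2};
  MovePoint(p);
  assert(p.x == 1);

  int raw[4] = {1, 2, 3, 4};
  ClearFirst(raw);          // same as ClearFirst(&raw[0])
  assert(raw[0] == 0);

  Samples4 wrapped = {{1, 2, 3, 4}};
  ClearFirstCopy(wrapped);
  assert(wrapped.values[0] == 1);

  int* ptr = &p.x;          // explicit "address of"
  *ptr = 5;
  assert(p.x == 5);
  int* first = raw;         // an array converts to a pointer to its first element
  assert(first == &raw[0]);
}
```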

C also offers more explicit immutability specification. A pointer variable can either be immutable itself (no reassignments are possible), point to data that can't be altered, or both. However, the syntax for specifying the immutability rules can look a bit confusing.
Personally, I find C rules for specifying immutability more intuitive than those in Java. If there is a const variable of a structure type, any modifications to it are prohibited. Thus, a composite type behaves very much like a primitive type.

As I've said before, a function parameter of a structure type is passed by value.
When looking at pointer types containing the const keyword, you need to make sure you understand what it actually applies to. The first example shows a const pointer to a structure. The second shows a writable pointer to unmodifiable data, known as pointer to const.
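The three pointer variants side by side, in C-style code compiled as C++. This is a sketch with hypothetical names; the lines that wouldn't compile are left as comments.

```cpp
#include <cassert>

struct Point { int x, y; };

// Pointer to const: the function can't modify the caller's structure
// through `p` (short of casting the constness away).
int SumCoords(const Point* p) {
  // p->x = 0;  // would not compile: the data is read-only via p
  return p->x + p->y;
}

void ConstPointerExample() {
  Point a = {1, 2};
  Point b = {3, 4};

  Point* const const_ptr = &a;     // const pointer: can't be re-pointed
  const_ptr->x = 10;               // OK: the data itself is writable
  // const_ptr = &b;               // would not compile

  const Point* ptr_to_const = &a;  // pointer to const: the data is read-only
  ptr_to_const = &b;               // OK: the pointer itself can change
  // ptr_to_const->x = 0;          // would not compile

  const Point* const both = &b;    // neither can change
  assert(a.x == 10 && ptr_to_const == both);
  assert(SumCoords(&a) == 12);
}
```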

The nice thing about C rules is that passing a structure by pointer to const guarantees that the function will not modify it (well, it's always possible to cast constness out, but you would hope the function does not misuse this capability).
Returning a structure from a C function turns out to be challenging. Historically, returning by value required making a copy of the structure. The second (ANSI C) edition of K&R's book on C advises returning a structure by reference (using a pointer) to avoid copying.

Another reason why returning structs by reference is so popular with C programmers is that it's an abstraction mechanism. When a structure is returned by reference, the recipient doesn't need to know its size, so it's possible to hide the definition of the structure. This approach is used in the standard C library for the FILE type, for example.

But returning a struct by reference carries the risk that the recipient ignores the result and thus creates a memory leak. Also, some functions don't actually allocate data dynamically and instead return a pointer to some static data. You always need to read the function's documentation to figure this out.

Returning via a pointer also complicates the case when custom memory pools are used in the program, so sometimes the allocation is done by the caller, which then passes a pointer to an allocated but not yet initialized structure to the callee. This is in fact a manual implementation of copy elision.
Modern compilers perform copy elision automatically, and many popular ABIs support it (see the summary here). Its mechanism is explained on this slide.
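The two C-style conventions for handing back a structure can be sketched side by side. The Frame type and both functions are hypothetical; the code sticks to C-style constructs but is compiled as C++, hence the static_cast on the result of malloc.

```cpp
#include <cassert>
#include <cstdlib>

struct Frame { float samples[256]; int count; };

// Style 1: the callee allocates and returns a pointer. The caller must
// remember to free() it; ignoring the result leaks memory.
Frame* CreateFrame() {
  Frame* f = static_cast<Frame*>(std::malloc(sizeof(Frame)));
  if (f) f->count = 0;
  return f;
}

// Style 2: the caller provides the (not yet initialized) storage, e.g. on
// its stack or from its own memory pool, and the callee fills it in.
void InitFrame(Frame* out) {
  out->count = 0;
  for (int i = 0; i < 256; ++i) out->samples[i] = 0.0f;
}

void OutputParamExample() {
  Frame* heap_frame = CreateFrame();
  assert(heap_frame && heap_frame->count == 0);
  std::free(heap_frame);

  Frame stack_frame;  // allocated by the caller, not yet initialized
  InitFrame(&stack_frame);
  assert(stack_frame.count == 0 && stack_frame.samples[255] == 0.0f);
}
```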

This optimization applies only when the structure is returned by value. This implies that on the recipient side it is allocated automatically. The compiler treats the destination structure and the local structure in the callee as the same variable. As a result, returning a structure by value has a negligible cost.

On the right I've placed a high-level pseudocode to show what exactly the compiler does to the code on the left. Effectively, the function doesn't return anything, but instead takes a pointer to the destination memory. The constructor of the object created in the function is called for that location.
However, copy elision is not always applicable. On this slide we see the case when inside the function there are several structures, and one of them is chosen as a return value based on some condition. Obviously, the compiler can't initialize two variables in the area designated for one structure. Thus, on return a copy still has to be made.
Compared to C, its successor C++, in its initial widespread version (referred to as C++98 or C++03), introduced a powerful concept of constructor and destructor methods. Since they are called when an instance of a class is being created or destroyed, these events can now be observed by the program code. In C, the program can't observe allocation of automatic (stack-allocated) structures. In C++ this is possible.

This fact is exploited by the RAII pattern—objects with automatic storage class can be used for automatic resource management. However, the copy elision optimization becomes observable by the program code, too—it's possible to note whether a copy constructor and a destructor were executed. This is why the return value optimization (RVO) had to be defined at the level of the C++ language standard, and not merely as an ABI convention.
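Copy elision becomes observable once the copy constructor does something visible, such as counting its calls. Below is a minimal sketch with a hypothetical Tracer type. It assumes C++17, where initializing an object from a returned temporary is guaranteed not to copy. The second function mirrors the case of choosing between two local structures (written with ?:, which is my choice), where a copy can't be avoided.

```cpp
#include <cassert>

// Counts copies so that copy elision becomes observable.
struct Tracer {
  static int copies;
  Tracer() = default;
  Tracer(const Tracer&) { ++copies; }
  Tracer& operator=(const Tracer&) = default;
};
int Tracer::copies = 0;

// The returned temporary initializes the caller's object directly.
// C++17 guarantees that no copy is made here.
Tracer MakeOne() { return Tracer(); }

// Two candidates: the result of ?: is not the name of a local variable, so
// neither candidate can be constructed in the caller's storage, and a copy
// has to be made on return.
Tracer MakeOneOf(bool first) {
  Tracer a;
  Tracer b;
  return first ? a : b;
}

void CopyElisionExample() {
  Tracer::copies = 0;
  Tracer t1 = MakeOne();
  assert(Tracer::copies == 0);
  Tracer t2 = MakeOneOf(true);
  assert(Tracer::copies == 1);
  (void)t1;
  (void)t2;
}
```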
The fact that C++ also permits the programmer to "hide" constructors and copy assignment operators allows the creation of non-copyable types. For these types, an attempt to copy an instance causes a compilation error. There are some good uses for this feature. Some classes manage physical or OS resources, and a copy operation for them does not make sense. The stream classes in the C++ standard library are non-copyable.

In the C++98 era, the mechanism for making a class non-copyable was typically wrapped in a macro. In Google's C++ code this macro was initially called DISALLOW_EVIL_CONSTRUCTORS, probably coming from the famous and now also obsolete "Don't Be Evil" motto. Since the name wasn't quite accurate (the macro was disabling the copy constructor and the copy assignment operator), it was later renamed to DISALLOW_COPY_AND_ASSIGN. The modern style recommends using the more explicit "= delete" syntax introduced in C++11.
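Both styles can be sketched as follows. The C++98 macro here is my approximation of the private-declaration idiom that macros like DISALLOW_COPY_AND_ASSIGN wrapped, deliberately given a different (hypothetical) name to make clear it isn't Google's actual definition; the class names are hypothetical, too.

```cpp
#include <type_traits>

// C++98 style: declare the copy operations private and never define them.
#define DISALLOW_COPY_AND_ASSIGN_98(TypeName) \
  TypeName(const TypeName&);                  \
  void operator=(const TypeName&)

class FileHandle98 {
 public:
  FileHandle98() {}

 private:
  DISALLOW_COPY_AND_ASSIGN_98(FileHandle98);
};

// C++11 style: state explicitly that the copy operations don't exist.
class FileHandle {
 public:
  FileHandle() = default;
  FileHandle(const FileHandle&) = delete;
  FileHandle& operator=(const FileHandle&) = delete;
};

static_assert(!std::is_copy_constructible<FileHandle98>::value, "non-copyable");
static_assert(!std::is_copy_constructible<FileHandle>::value, "non-copyable");
static_assert(!std::is_copy_assignable<FileHandle>::value, "non-copyable");
```

With "= delete", an attempt to copy fails with an error that names the deleted function, instead of an access error (or, inside the class itself, a link error) as with the C++98 idiom.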

Using non-copyable types had drawbacks in C++98. Since copy elision was considered an optional optimization, the copy constructor had to be accessible even when the copy was elided, so compilers rejected returning an instance of a non-copyable class from a function by value. This finally got sorted out in C++17, which makes copy elision mandatory when a temporary is returned. Non-copyable types also couldn't be placed into containers, as containers might need to make a copy of an item, e.g. while resizing their internal storage.
Another addition in C++ was the reference type. It might seem excessive; after all, C already has a very powerful pointer type. As Bjarne Stroustrup explains in his book "The Design and Evolution of C++", it was a consequence of allowing user-defined operators in C++.

If we consider a subtraction operator for a hypothetical Matrix class, it could be expensive to pass a Matrix by value. The C-style solution—pass by a pointer—would require the caller to apply the "address of (&)" operator to the operator arguments. However, the resulting expression would look indistinguishable from pointer arithmetic.

The solution was to leave the syntax at the call site as if the operator arguments are passed by value, but to specify in the operator's declaration that it receives the arguments by reference.
As we can see, the reference type is defined at a higher level of abstraction than the pointer type, because it hides the fact that the address of a value is taken. References are aliases for the original variable. As a result, unlike pointers, they are not objects with storage of their own. Thus, it's impossible to get the address of a reference: applying the "address of" (&) operator yields the address of the referenced value.
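A sketch of the Matrix case, assuming a hypothetical minimal Matrix type stored as a flat std::vector<double>. The operands are taken by reference to const, so nothing is copied, yet the call site reads like a subtraction of two values.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal matrix, stored row by row.
struct Matrix {
  std::size_t rows, cols;
  std::vector<double> values;
  Matrix(std::size_t r, std::size_t c, double fill = 0.0)
      : rows(r), cols(c), values(r * c, fill) {}
};

// No copies of the operands are made, and the call site is just `a - b`.
Matrix operator-(const Matrix& a, const Matrix& b) {
  assert(a.rows == b.rows && a.cols == b.cols);
  Matrix result(a.rows, a.cols);
  for (std::size_t i = 0; i < a.values.size(); ++i) {
    result.values[i] = a.values[i] - b.values[i];
  }
  return result;
}

// With pointer parameters the call would read `&a - &b`, which looks like
// pointer arithmetic. C++ doesn't even allow overloading an operator whose
// operands are all built-in pointer types.

void MatrixExample() {
  Matrix a(2, 2, 5.0);
  Matrix b(2, 2, 3.0);
  Matrix c = a - b;
  assert(c.values[0] == 2.0 && c.values[3] == 2.0);
}
```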

In order to simplify the semantics of assignment to references, the reference itself is immutable: it can (and must) be initialized in its declaration, and can't be re-bound afterwards. Thus, the placement of the const keyword in a reference type declaration doesn't change anything, and always means that the referenced value is immutable.
Here is a simple example showing usage of a reference. The call site looks the same regardless of whether the callee takes the argument by value or by reference. But for the callee, things look different. The reference is bound to the argument, and any inadvertent modifications will be reflected in the original value.

That means passing a value by reference can lead to confusion at the call site, similar to what we have seen with the "Last Value" problem in JavaScript.
That's why Google's C++ Coding Style doesn't permit use of non-const references as function and method parameters. For an in/out parameter, a pointer type must be used. This way, at the call site there is an indication that the argument can be modified.
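To see the difference at the call site, here is a sketch with hypothetical functions that follow the rule as the talk describes it: a pointer for an in/out parameter, and a const reference for read-only input.

```cpp
#include <cassert>

// Takes a non-const reference: the call AddGainRef(level) gives no hint
// that `level` may change.
void AddGainRef(int& level) { level += 6; }

// Takes a pointer: the & at the call site signals a possible modification.
void AddGain(int* level) { *level += 6; }

// Read-only input: a const reference looks just like pass-by-value at the
// call site and can't modify the original.
int Doubled(const int& level) { return level * 2; }

void ReferenceExample() {
  int level = 0;
  AddGainRef(level);  // looks like pass-by-value, yet modifies level
  assert(level == 6);
  AddGain(&level);    // the & makes the modification visible
  assert(level == 12);
  assert(Doubled(level) == 24 && level == 12);
}
```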
A classic example where passing by a non-const reference was causing confusion is the notorious auto_ptr class from C++98 standard library. The intention behind it was good—to provide a smart pointer, where the RAII pattern is used for automatic memory management.

However, in order to achieve that, the designers had to drop the const keyword from the copy constructor and the assignment operator. That's because the destination auto_ptr had to "steal" the managed resource, invalidating the source auto_ptr.

This behavior wasn't only confusing to human users. Standard containers weren't ready for this kind of behavior either, because they sometimes need to make a temporary copy of an item, e.g. to use as a pivot element during sorting, and then discard it. Containers assume that all copies of an item are equivalent. However, auto_ptr clearly violates this assumption, and that's why it was banned from being stored in standard containers.

Needless to say, don't use auto_ptr in modern code.
I suppose you feel that it's time to jump into the great changes that happened in C++11, which finally allowed expressing this kind of object behavior more clearly. But before we actually do that, let's consider one more interesting aspect of references.

So, here is a situation: we have an integer variable, which we bind to a const reference to float, and then we change the variable's value. I can assure you, this code is accepted by all C++ compilers.

What value will the float reference have afterwards? Obviously, 3 outcomes are possible: f changes its value to match i, f's value remains the same (42.0), or this code causes undefined behavior and formats your hard drive.
The answer is—f's value remains the same. Why? It's because the compiler creates a temporary float value, and this is what f actually references. The compiler uses the fact that the int type can be implicitly converted into a float. This situation couldn't happen with pointers because the compiler would simply reject the attempt to initialize a pointer to float with a pointer to int.
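The int to const float& puzzle in code (a minimal sketch; the function name is hypothetical). The assertions encode the answer given above: the reference is bound to a separate temporary object, which also has an address of its own.

```cpp
#include <cassert>

void TemporaryBindingExample() {
  int i = 42;
  const float& f = i;  // binds to a temporary float initialized from i
  i = 100;             // changes i, but not the temporary
  assert(f == 42.0f);
  // Taking the address of f yields the address of the temporary.
  assert(static_cast<const void*>(&f) != static_cast<const void*>(&i));

  // float* p = &i;    // would not compile: no implicit int* to float*
}
```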

That explains why Google C++ Style recommends tagging all single argument constructors with the explicit keyword (unless implicit conversions need to be allowed for this class). This prevents the compiler from silently using the constructor for implicit type conversion.
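A sketch of the effect of explicit, using two hypothetical single-argument types that differ only in that keyword.

```cpp
#include <cassert>
#include <type_traits>

struct GainImplicit {
  GainImplicit(float value) : db(value) {}  // usable for implicit conversions
  float db;
};

struct Gain {
  explicit Gain(float value) : db(value) {}  // explicit construction only
  float db;
};

float ApplyImplicit(const GainImplicit& g) { return g.db; }
float ApplyExplicit(const Gain& g) { return g.db; }

// An int silently turns into a temporary GainImplicit...
static_assert(std::is_convertible<int, GainImplicit>::value, "implicit");
// ...while a Gain has to be constructed by name.
static_assert(!std::is_convertible<int, Gain>::value, "explicit only");

void ExplicitExample() {
  assert(ApplyImplicit(3) == 3.0f);        // creates a GainImplicit behind the scenes
  assert(ApplyExplicit(Gain(3)) == 3.0f);
  // ApplyExplicit(3);                     // would not compile
}
```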

Other curious things to consider here:
  • the reference must be const. As Bjarne Stroustrup explains, he decided that allowing a non-const reference to a temporary value would be even more confusing;
  • when we create a reference to a temporary, it's called binding. This extends the lifetime of the temporary until the reference goes out of scope. This is very tricky behavior, and it's easy to get into trouble, e.g. when a temporary is returned from a function.
  • since taking an address of a reference actually takes the address of the referenced value, we can obtain an address of a temporary!
I've just said that having a writable reference to a temporary would be confusing. In fact, an rvalue reference type introduced in C++11 does exactly that—it binds to temporaries (and to temporaries only), and allows modifying them. Why on Earth would anyone want to do that?
But before trying to answer that question, let's clear up the naming. What is an "rvalue", exactly? It turns out to be something people were dealing with even in C++98, and even in C.

From the language's point of view, every expression, in addition to the type of its value, also has an attribute called "value category", which basically tells whether the expression can be assigned to or not.

Let's consider an example of prefix and postfix increment. Since a prefix increment returns a reference to the new value of the object, that is, to the object itself, we can assign to it. This operation is only possible in a language that has reference types. In order to assign, we need to put the expression on the left side of the assignment operator, that's why such expressions have lvalue category.

Postfix increment returns the previous value of the object, that is, a temporary value. We can't assign to it, thus a postfix increment can only be put on the right side of an assignment operator. Such expressions have the rvalue category.

One caveat with the "left side of an assignment" definition is that there are const lvalue expressions. Obviously, they can't be assigned to, but they could be if we stripped the const qualifier. So const lvalue expressions still count as lvalues.

lvalue and rvalue expressions are defined on a case-by-case basis. For example, an expression accessing a variable has the lvalue category, and so does a prefix increment or decrement, whereas a postfix increment or decrement has the rvalue category. The value category of some expressions, e.g. of the conditional operator ?: and the comma operator (,), depends on the categories of their operands. For the conditional operator, the rules are really complex.
So this is how rvalue references can be used for good. C++ allows their use as parameters of functions and methods. In fact, it defines a new kind of constructor, the move constructor, and a new kind of assignment operator, the move assignment. They take an rvalue reference as a parameter, which means they will be called for temporary values, and for them only.
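The categories can be checked in code. A small sketch with hypothetical function names: besides trying assignments, it uses decltype, which, applied to an expression other than a plain variable name, yields T& for an lvalue and plain T for a prvalue (a variable name in extra parentheses counts as such an expression).

```cpp
#include <cassert>
#include <type_traits>

// Prefix increment yields an lvalue (x itself), so it can be assigned to.
int PrefixAssign() {
  int x = 1;
  ++x = 10;
  // x++ = 10;  // would not compile: postfix increment yields an rvalue
  return x;
}

// ?: with two lvalue operands of the same type is itself an lvalue.
int ConditionalAssign() {
  int x = 1;
  int y = 2;
  (true ? x : y) = 20;     // assigns to x
  // (true ? x : 3) = 20;  // would not compile: the literal 3 is an rvalue
  return x;
}

int counter = 0;
static_assert(std::is_lvalue_reference<decltype(++counter)>::value, "lvalue");
static_assert(!std::is_reference<decltype(counter++)>::value, "prvalue");
static_assert(std::is_lvalue_reference<decltype((counter))>::value, "lvalue");

// A const variable is still an lvalue, even though it can't be assigned to.
const int kLimit = 5;
static_assert(std::is_lvalue_reference<decltype((kLimit))>::value, "lvalue");
```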

Since the argument is a temporary value, no other code could see it. Thus, it's possible to "steal" its value, leaving the temporary in some valid but otherwise unusable state, and let it get discarded. "Stealing" of the value (moving from) for certain types can be implemented more efficiently than copying from.

Recall the "Proxy" pattern. Classes implementing this pattern are typically lightweight by themselves, but reference a big chunk of data (indirection!) Canonical examples from the C++ standard library are std::string and std::vector. Another name for this pattern in C++ is "resource handle."
This is why moving is more efficient for a "proxy" type. I use std::vector as an example here. Copying a vector involves copying both the instance of the vector class, and all the vector's elements. The elements are not stored in the vector instance directly, but instead are allocated outside of the class.

Thus, when moving from a vector, the instance of the vector class still has to be copied, but the vector elements don't. Since the moved-from instance is discarded anyway, the destination vector can simply make its internal data pointer point to the elements of the old vector. The actual implementation typically swaps the values of the important data members while performing a move.
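A quick way to see that moving a vector doesn't touch its elements is to compare data pointers before and after (a sketch; the function name is hypothetical). That the moved-to vector reuses the same buffer holds for std::vector with the default allocator, since its move constructor must run in constant time.

```cpp
#include <cassert>
#include <utility>
#include <vector>

void VectorMoveExample() {
  std::vector<float> source(1 << 20, 1.0f);
  const float* elements = source.data();

  std::vector<float> copy = source;              // allocates, copies every element
  assert(copy.data() != elements);

  std::vector<float> moved = std::move(source);  // takes over the buffer
  assert(moved.data() == elements);              // no element was copied
  assert(moved.size() == (1u << 20));
  // `source` is still a valid vector, but its contents are unspecified.
}
```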
From the previous explanation it's easy to see what types can be moved from more efficiently than copied. As an example, for primitive types and their aggregations "moving from" is equivalent to copying. But for "proxy" types that use indirection moving is more efficient.

Keep in mind that some standard library types like std::string or std::function may or may not use indirection. When they don't it's typically because the amount of data they carry is small, so it doesn't hurt performance if they get copied. For strings, this approach is called small string optimization.

Any type that is inefficient to move can be trivially turned into an efficient one by introducing indirection, that is, by wrapping the type into a smart pointer. We will talk about them in a minute.
Unlike in the previous somewhat artificial example for references to temporaries, you typically don't need to use rvalue references as variables in real code. The most common usage for them is in arguments of move constructors and move assignments. They can also appear in overloads of functions or methods that are optimized for taking temporary values.
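As an illustration of an overload optimized for temporaries, here is a sketch of a hypothetical Playlist class with one Add for lvalues and one for rvalues. std::move, used in the last call, is simply a cast that turns an lvalue into an rvalue.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical class with an overload optimized for temporaries.
class Playlist {
 public:
  // For lvalues: the caller keeps its string, so a copy has to be made.
  void Add(const std::string& title) {
    titles_.push_back(title);
    ++copied_;
  }
  // For rvalues: nobody else can see the argument, so it's moved from.
  void Add(std::string&& title) {
    titles_.push_back(std::move(title));
    ++moved_;
  }

  int copied() const { return copied_; }
  int moved() const { return moved_; }

 private:
  std::vector<std::string> titles_;
  int copied_ = 0;
  int moved_ = 0;
};

void OverloadExample() {
  Playlist playlist;
  std::string title = "Track 1";
  playlist.Add(title);                   // lvalue: const& overload
  playlist.Add(std::string("Track 2"));  // temporary: && overload
  playlist.Add(std::move(title));        // cast to an rvalue: && overload
  assert(playlist.copied() == 1 && playlist.moved() == 2);
}
```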
The introduction of move semantics helped C++ get a smart pointer that actually works. It's called std::unique_ptr, and is designed to automatically manage the lifetime of a dynamically allocated object. There can only be one owner for a unique_ptr at any given time. This is why copying is disabled for this class. For the purposes of ownership transfer, move constructor and move assignment are used. This is different from auto_ptr which was trying to use copy constructor and assignment in an unusual way, as you remember.

Types that allow moving, and that's pretty much all of the types, are called moveable types. It's possible for a type in C++11 to be non-copyable but moveable. Such types are called move-only types. There is another smart pointer type added in C++11 which can be copied: it's called shared_ptr.

One technical detail I would like to draw your attention to is that both the move constructor and the move assignment of unique_ptr are tagged with the noexcept keyword. This states that these methods can't throw exceptions, and, unlike the "throw()" exception specification used in prior versions of the C++ standard, it can be queried at compile time.
Another big change in the C++ standard library is that the containers now support move-only types (recall that auto_ptr was not allowed in containers), so it's possible to store unique_ptr in vectors, maps, etc.
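The ownership rules of unique_ptr can be checked with type traits and a short transfer. This is a sketch with a hypothetical function name; C++11 is assumed, hence new instead of std::make_unique, which arrived in C++14.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

// unique_ptr is move-only, and its move operations are noexcept.
static_assert(!std::is_copy_constructible<std::unique_ptr<int>>::value, "");
static_assert(std::is_nothrow_move_constructible<std::unique_ptr<int>>::value, "");
static_assert(std::is_nothrow_move_assignable<std::unique_ptr<int>>::value, "");

void UniquePtrExample() {
  std::unique_ptr<int> owner(new int(42));
  // std::unique_ptr<int> copy = owner;             // would not compile
  std::unique_ptr<int> new_owner = std::move(owner);  // ownership transfer
  assert(!owner && *new_owner == 42);                 // the source is now empty

  std::shared_ptr<int> shared = std::make_shared<int>(7);
  std::shared_ptr<int> another = shared;  // copying is allowed for shared_ptr
  assert(shared.use_count() == 2);
}
```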

When you define a container of move-only types, the container naturally becomes move-only, too. That also applies to structures and classes containing move-only types as fields.
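A sketch of move-only types inside containers and structs; the Track struct and the function name are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// A struct with a move-only field: its implicit copy constructor is
// deleted, so the struct becomes move-only, too.
struct Track {
  std::string name;
  std::unique_ptr<float[]> samples;
};
static_assert(!std::is_copy_constructible<Track>::value, "move-only");
static_assert(std::is_move_constructible<Track>::value, "movable");

void MoveOnlyContainerExample() {
  std::vector<std::unique_ptr<int>> owners;
  owners.push_back(std::unique_ptr<int>(new int(1)));  // temporary: moved in
  std::unique_ptr<int> p(new int(2));
  owners.push_back(std::move(p));  // ownership goes to the vector
  // owners.push_back(p);          // would not compile: needs a copy
  assert(owners.size() == 2 && *owners[1] == 2 && !p);

  // The container can be moved but not copied. Note that type traits still
  // report std::vector<std::unique_ptr<int>> as copy-constructible; the
  // error only appears when a copy is actually attempted.
  std::vector<std::unique_ptr<int>> moved = std::move(owners);
  // std::vector<std::unique_ptr<int>> copy = moved;  // would not compile
  assert(moved.size() == 2);

  Track kick{"kick", std::unique_ptr<float[]>(new float[64]())};
  Track moved_kick = std::move(kick);
  assert(moved_kick.samples && !kick.samples);
}
```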

And an important change in container behavior is that containers can use move semantics instead of copying during certain operations, for example when resizing their internal storage. This is good for performance: for example, when a vector of vectors gets resized, there is no need to copy the contents of the stored vectors. One caveat, however, is that containers use this optimization only if the element's move constructor is marked noexcept (or if the element can't be copied at all).

This container behavior doesn't change if exception support is disabled using compiler switches. That's why Google C++ Style recommends not forgetting to put the noexcept specification on move constructors and move assignment operators for your classes.
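Whether a reallocation moves or copies can be observed with two hypothetical element types that count their copies and moves and differ only in the noexcept specification of the move constructor. std::vector makes this choice via std::move_if_noexcept; the standard's strong exception guarantee for insertion at the end effectively requires it, so the sketch should behave the same with any conforming standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct NoexceptMove {
  static int copies, moves;
  NoexceptMove() = default;
  NoexceptMove(const NoexceptMove&) { ++copies; }
  NoexceptMove(NoexceptMove&&) noexcept { ++moves; }
};
int NoexceptMove::copies = 0;
int NoexceptMove::moves = 0;

struct ThrowingMove {
  static int copies, moves;
  ThrowingMove() = default;
  ThrowingMove(const ThrowingMove&) { ++copies; }
  ThrowingMove(ThrowingMove&&) { ++moves; }  // not marked noexcept
};
int ThrowingMove::copies = 0;
int ThrowingMove::moves = 0;

// Fills a vector up to its capacity, then adds one more element, which
// forces a reallocation. Returns how many elements had to be transferred.
template <typename T>
std::size_t GrowPastCapacity() {
  std::vector<T> v;
  v.emplace_back();
  while (v.size() < v.capacity()) v.emplace_back();  // no reallocation here
  const std::size_t transferred = v.size();
  v.emplace_back();  // reallocates: existing elements are moved or copied
  return transferred;
}

void ReallocationExample() {
  const std::size_t n1 = GrowPastCapacity<NoexceptMove>();
  assert(NoexceptMove::moves == static_cast<int>(n1));   // moved
  assert(NoexceptMove::copies == 0);

  const std::size_t n2 = GrowPastCapacity<ThrowingMove>();
  assert(ThrowingMove::copies == static_cast<int>(n2));  // copied instead
  assert(ThrowingMove::moves == 0);
}
```

Dropping noexcept from a move constructor doesn't break anything visibly; the program just silently falls back to copying during reallocation.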