One-Shot Learning of C++ r-value, &&, and Move


C++ is hard, and the newer versions are even harder. This article deals with some of the hard parts of C++: r-values, r-value references (&&) and move semantics. I am going to reverse engineer (not a metaphor) these complex and correlated topics, so you can understand them completely in one shot.

Firstly, let’s examine

What is an r-value?

An r-value is, roughly, a value that can only appear on the right side of an equals sign.

Example:

int var; // too much JavaScript recently:)
var = 8; // OK! l-value (yes, there is a l-value) on the left

8 = var; // ERROR! r-value on the left
(var + 1) = 8; // ERROR! r-value on the left

Simple enough. Then let’s look at some more subtle r-values, ones that are returned by functions:

#include <string>
#include <stdio.h>

int g_var = 8;
int& returnALvalue() {
   return g_var; //here we return a left value
}

int returnARvalue() {
   return g_var; //here we return a r-value
}

int main() {
   printf("%d\n", returnALvalue()++); // prints 8, then g_var becomes 9
   printf("%d\n", returnARvalue());   // prints 9
}

Result:

8
9

It is worth noting that returning an l-value the way the example does (a non-const reference to a global variable) is considered bad practice. So do not do that in real-world programming.

Beyond theoretical level

Whether a value is an r-value made a difference in real programming even before && was invented.

For example, this line

const std::string& name = "rvalue";

compiles fine, while this:

std::string& name = "rvalue"; // a non-const l-value reference to an r-value

generates the following error:

error: non-const lvalue reference to type 'std::string' (aka 'basic_string<char, char_traits<char>, allocator<char> >') cannot bind to a value of unrelated type 'const char [7]'

The literal has to be converted to a temporary std::string, which is an r-value. The error message means that the compiler only lets a const reference bind to such an r-value.

A more interesting example:

#include <stdio.h>
#include <string>

void print(const std::string& name) {
    printf("rvalue detected:%s\n", name.c_str());
}

void print(std::string& name) {
    printf("lvalue detected:%s\n", name.c_str());
}

int main() {
    std::string name = "lvalue";
    print(name); // the compiler picks the right function for an l-value

    print("rvalue"); // likewise for an r-value
}

Result:

lvalue detected:lvalue
rvalue detected:rvalue

The difference is significant enough that the compiler can use it to choose between overloaded functions.

So is an r-value a constant value?

Not exactly. And this is where && (the r-value reference) comes in.

Example:

#include <stdio.h>
#include <string>

void print(const std::string& name) {
    printf("const value detected:%s\n", name.c_str());
}

void print(std::string& name) {
    printf("lvalue detected:%s\n", name.c_str());
}

void print(std::string&& name) {
    printf("rvalue detected:%s\n", name.c_str());
}

int main() {
    std::string name = "lvalue";
    const std::string cname = "cvalue";

    print(name);
    print(cname);
    print("rvalue");
}

Result:

lvalue detected:lvalue
const value detected:cvalue
rvalue detected:rvalue

If the functions are overloaded for r-values, an r-value argument chooses the more specific version over the version that takes a const reference parameter, which is compatible with both. Thus, && can further distinguish r-values from const values.

To summarize the compatibility of the overloaded versions with the different kinds of argument: the const std::string& version accepts all three (l-value, const value and r-value), the std::string& version accepts only a non-const l-value, and the std::string&& version accepts only an r-value. You can iterate through the different permutations by commenting out certain lines in the example above to verify the result.

It sounds cool to further differentiate r-value and constant value, which are not exactly the same. But what is the practical value?

What problem does && solve exactly?

The problem is the unnecessary deep copy when the argument is an r-value.

To be more specific, the && notation is provided to specify an r-value, which can be used to avoid the deep copy when the r-value 1) is passed as an argument of either a constructor or an assignment operator, and 2) belongs to a class that contains a pointer (or pointers) referring to a dynamically allocated resource (memory).

It can be made more specific with an example:

#include <stdio.h>
#include <string>
#include <algorithm>

using namespace std;

class ResourceOwner {
public:
  ResourceOwner(const char res[]) {
    theResource = new string(res);
  }
  ResourceOwner(const ResourceOwner& other) {
    printf("copy %s\n", other.theResource->c_str());
    theResource = new string(other.theResource->c_str());
  }
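  ResourceOwner& operator=(const ResourceOwner& other) {
    ResourceOwner tmp(other);            // deep copy (prints "copy ...")
    swap(theResource, tmp.theResource);  // tmp now holds our old resource
    printf("assign %s\n", other.theResource->c_str());
    return *this;                        // the original forgot to return
  }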
  ~ResourceOwner() {
    if (theResource) {
      printf("destructor %s\n", theResource->c_str());
      delete theResource;
    }
  }

private:
  string* theResource;
};

void testCopy() { // case 1
  printf("=====start testCopy()=====\n");
  ResourceOwner res1("res1");
  ResourceOwner res2 = res1;  // copy res1
  printf("=====destructors for stack vars, ignore=====\n");
}

void testAssign() { // case 2
  printf("=====start testAssign()=====\n");
  ResourceOwner res1("res1");
  ResourceOwner res2("res2");
  res2 = res1; // copy res1, assign res1, destructor res2
  printf("=====destructors for stack vars, ignore=====\n");
}

void testRValue() { // case 3
  printf("=====start testRValue()=====\n");

  ResourceOwner res2("res2");
  res2 = ResourceOwner("res1"); // copy res1, assign res1, destructor res2, destructor res1
  printf("=====destructors for stack vars, ignore=====\n");
}

int main() {
  testCopy();
  testAssign();
  testRValue();
}

Result:

=====start testCopy()=====
copy res1
=====destructors for stack vars, ignore=====
destructor res1
destructor res1
=====start testAssign()=====
copy res1
assign res1
destructor res2
=====destructors for stack vars, ignore=====
destructor res1
destructor res1
=====start testRValue()=====
copy res1
assign res1
destructor res2
destructor res1
=====destructors for stack vars, ignore=====
destructor res1

The results are all good for the first two test cases, i.e., testCopy() and testAssign(), in which the resource in res1 is copied for res2. It is reasonable to copy the resource because the two entities each need their own unshared resource (a string).

However, in the third case, the (deep) copy of the resource in res1 is superfluous, because the anonymous r-value (returned by ResourceOwner("res1")) is destructed right after the assignment and thus does not need the resource anymore:

res2 = ResourceOwner("res1"); // Please note that the destructor for res1 is called right after this line, before the point where stack variables are destructed.

I think it is a good chance to repeat the problem statement:

The && notation is provided to specify an r-value, which can be used to avoid the deep copy when the r-value 1) is passed as an argument of either a constructor or an assignment operator, and 2) belongs to a class that contains a pointer (or pointers) referring to a dynamically allocated resource (memory).

If copying of a resource that is about to disappear is not optimal, what is the right operation then? The answer is

Move

The idea is pretty straightforward: if the argument is an r-value, we do not need to copy. Rather, we can simply “move” the resource (that is, the memory the r-value points to). Now let’s overload the assignment operator using the new technique:

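ResourceOwner& operator=(ResourceOwner&& other) {
  if (this != &other) {
    delete theResource;               // release what we own, or it leaks
    theResource = other.theResource;  // take over the r-value's resource
    other.theResource = nullptr;      // so its destructor does not free it
  }
  return *this;
}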

This new assignment operator is called a move assignment operator. And a move constructor can be programmed in a similar way.

A good way of understanding this is: when you sell your old property and move to a new house, you do not have to toss all the furniture as we did in case 3 right? Rather, you can simply move the furniture to the new home.

All good.

What is std::move?

Besides the move assignment operator and move constructor discussed above, there is one last missing piece in this puzzle: std::move.

Again, we look at the problem first:

When 1) we know a variable is in fact an r-value, while 2) the compiler does not, the right version of the overloaded functions cannot be called.

A common case is when we add another layer of resource owner, ResourceHolder. The relation of the three entities is given below:

holder
 |
 |----->owner
         |
         |----->resource

(N.b., in the following example, I complete the implementation of ResourceOwner’s move constructor as well)

Example:

#include <stdio.h>
#include <string>
#include <algorithm>

using namespace std;

class ResourceOwner {
public:
  ResourceOwner(const char res[]) {
    theResource = new string(res);
  }

  ResourceOwner(const ResourceOwner& other) {
    printf("copy %s\n", other.theResource->c_str());
    theResource = new string(other.theResource->c_str());
  }

  // new: move constructor
  ResourceOwner(ResourceOwner&& other) {
    printf("move cons %s\n", other.theResource->c_str());
    theResource = other.theResource;
    other.theResource = nullptr;
  }

  ResourceOwner& operator=(const ResourceOwner& other) {
    ResourceOwner tmp(other);
    swap(theResource, tmp.theResource);
    printf("assign %s\n", other.theResource->c_str());
    return *this;
  }

  // new: move assignment operator
  ResourceOwner& operator=(ResourceOwner&& other) {
    printf("move assign %s\n", other.theResource->c_str());
    if (this != &other) {
      delete theResource;               // release what we own, or it leaks
      theResource = other.theResource;
      other.theResource = nullptr;
    }
    return *this;
  }

  ~ResourceOwner() {
    if (theResource) {
      printf("destructor %s\n", theResource->c_str());
      delete theResource;
    }
  }

private:
  string* theResource;
};

class ResourceHolder {
...
  ResourceHolder& operator=(ResourceHolder&& other) {
    // theResource is private to ResourceOwner, so the holder cannot print it
    printf("holder move assign\n");
    resOwner = other.resOwner;
    return *this;
  }
...
private:
  ResourceOwner resOwner;
};

In ResourceHolder’s move assignment operator, we want to call ResourceOwner’s move assignment operator, since a non-pointer member of an r-value should be an r-value too. However, when we simply write resOwner = other.resOwner, what gets invoked is actually ResourceOwner’s normal (copy) assignment operator, which, again, incurs the extra copy. The reason is that other has a name inside the function, so the compiler treats other and its members as l-values there.

It’s a good chance to repeat the problem statement again:

When 1) we know a variable is in fact an r-value, while 2) the compiler does not, the right version of the overloaded functions cannot be called.

As a solution, we use std::move to cast the variable to an r-value, so the right version of ResourceOwner’s assignment operator can be called.

ResourceHolder& operator=(ResourceHolder&& other) {
  printf("holder move assign\n");
  resOwner = std::move(other.resOwner); // cast back to an r-value
  return *this;
}

What is std::move exactly?

We know that a type cast is not always a mere compiler placebo that says “I know what I am doing”; some casts generate real instructions that mov a value into a bigger or smaller register (e.g., %eax -> %cl) to conduct the “cast”.

So what exactly does std::move do behind the scenes? I did not know myself when I was writing this paragraph, so let’s find out together.

First we modify main a bit (I tried to make the style consistent).

Example:

int main() {
  ResourceOwner res("res1");
  asm("nop"); // remember me
  ResourceOwner&& rvalue = std::move(res);
  asm("nop"); // remember me
}

Compile it, and disassemble the object file using

clang++ -g -c -std=c++11 -stdlib=libc++ -Weverything move.cc
gobjdump -d -D move.o

Result:

0000000000000000 <_main>:
 0: 55 push %rbp
 1: 48 89 e5 mov %rsp,%rbp
 4: 48 83 ec 20 sub $0x20,%rsp
 8: 48 8d 7d f0 lea -0x10(%rbp),%rdi
 c: 48 8d 35 41 03 00 00 lea 0x341(%rip),%rsi # 354 <GCC_except_table5+0x18>
 13: e8 00 00 00 00 callq 18 <_main+0x18>
 18: 90 nop // remember me
 19: 48 8d 75 f0 lea -0x10(%rbp),%rsi
 1d: 48 89 75 f8 mov %rsi,-0x8(%rbp)
 21: 48 8b 75 f8 mov -0x8(%rbp),%rsi
 25: 48 89 75 e8 mov %rsi,-0x18(%rbp)
 29: 90 nop // remember me
 2a: 48 8d 7d f0 lea -0x10(%rbp),%rdi
 2e: e8 00 00 00 00 callq 33 <_main+0x33>
 33: 31 c0 xor %eax,%eax
 35: 48 83 c4 20 add $0x20,%rsp
 39: 5d pop %rbp
 3a: c3 retq
 3b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)

Between the two nops, we can notice some dummy instructions generated for the move (if you look closely, you can see that they do basically nothing). However, if we turn on optimization (-O1) for the compiler, all of these instructions are gone.

clang++ -g -c -O1 -std=c++11 -stdlib=libc++ -Weverything move.cc
gobjdump -d -D move.o

Moreover, if we change the critical line to:

ResourceOwner & rvalue = res;

the assembly generated is identical.

That means std::move is pure syntactic sugar: it is a cast that only matters to the compiler when it picks an overload, and the machine does not care at all.

To conclude,