Reprinted from light-city.club/sc/codingSt…

Introduction:

In C, there are three types of memory allocation: static, automatic, and dynamic. Static allocation covers global and static variables. Their storage is reserved once for the whole run of the program and their sizes are fixed at compile time, so they are not that interesting. Automatic allocation can be thought of as stack allocation: space is allocated when a lexical block is entered and freed when that block exits. Its most important limitation follows directly from this. Prior to C99 (which added variable-length arrays), the sizes of automatically allocated variables had to be known at compile time. This means that any strings, lists, and maps, and any structures derived from them, had to live in dynamic memory, on the heap.
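The three kinds of allocation can be seen side by side in a small sketch. It is written in C++, which keeps C's malloc and free in <cstdlib>, so that every added example in this article uses the same language. The names `call_count` and `duplicate_string` are hypothetical and exist only for illustration.

```cpp
#include <cstdlib>  // std::malloc, std::free
#include <cstring>  // std::strlen, std::memcpy

// Static storage: one fixed-size int that exists for the whole run.
static int call_count = 0;

// Returns a heap copy of `text`, or nullptr if allocation fails.
// The caller owns the result and must release it with std::free.
char *duplicate_string(const char *text) {
    ++call_count;                            // static: persists across calls
    std::size_t len = std::strlen(text);     // automatic: gone when we return
    // Dynamic: size chosen at run time, lives until std::free is called.
    char *copy = static_cast<char *>(std::malloc(len + 1));
    if (copy != nullptr) {
        std::memcpy(copy, text, len + 1);    // includes the terminating '\0'
    }
    return copy;
}
```

Each call returns a fresh heap block that outlives the function. Forgetting to free it is exactly the kind of leak discussed later.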

Programmers explicitly allocate and free dynamic memory using four basic operations: malloc, realloc, calloc, and free. The first two do not initialize the memory they return (for realloc, the newly added part), so it may contain leftover garbage. By contrast, calloc zero-fills its allocation. Except for free, they can all fail. In that case they return a null pointer, and dereferencing it is undefined behavior. In the best case, your program crashes. In the worst case, it appears to work for a while, processing garbage data before crashing.

For example, the following program builds a string with malloc and grows it with realloc:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
   char *str = (char *) malloc(7);
   strcpy(str, "toptal");
   /* %p with a (void *) argument is the portable way to print an address */
   printf("char array = \"%s\" @ %p\n", str, (void *) str);

   str = (char *) realloc(str, 11);
   strcat(str, ".com");
   printf("char array = \"%s\" @ %p\n", str, (void *) str);

   free(str);

   return(0);
}

Output:

char array = "toptal" @ 0xa4ae6e70
char array = "toptal.com" @ 0xa4ae6e70

Although the code is simple, it already contains one anti-pattern and one questionable decision. In real code you should not hard-code byte counts. Use the sizeof operator instead. Similarly, we allocate the char * buffer twice, each time exactly as large as the string we need (one byte more than its length, to account for the null terminator), which is a fairly expensive way to grow a string. A more complex program might allocate a larger buffer up front, leaving the string room to grow.
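A sketch of that alternative follows, assuming a hypothetical `StrBuf` type and `strbuf_append` function. Sizes are computed with sizeof rather than written as literals. Capacity doubles whenever it runs out, so a long series of appends triggers only a handful of reallocations. Doubling is one common growth policy, not the only one.

realloc's result is stored in a temporary first. On failure, realloc returns a null pointer but leaves the old block allocated, so overwriting the only pointer to it would leak that block.

```cpp
#include <cstdlib>  // std::realloc, std::free
#include <cstring>  // std::strlen, std::memcpy

// A growable, null-terminated character buffer (hypothetical type).
// Start with StrBuf sb = {nullptr, 0, 0}; release with std::free(sb.data).
struct StrBuf {
    char *data;
    std::size_t len;  // characters in use, not counting the '\0'
    std::size_t cap;  // bytes currently allocated
};

// Appends `text`, doubling the capacity as needed. Returns false if memory
// could not be obtained, in which case the buffer is left unchanged.
bool strbuf_append(StrBuf *sb, const char *text) {
    std::size_t add = std::strlen(text);
    std::size_t need = sb->len + add + 1;            // +1 for the terminator
    if (need > sb->cap) {
        std::size_t new_cap = sb->cap ? sb->cap : 16;
        while (new_cap < need) {
            new_cap *= 2;                            // geometric growth
        }
        // realloc(nullptr, n) behaves like malloc(n) on the first call
        void *tmp = std::realloc(sb->data, new_cap * sizeof *sb->data);
        if (tmp == nullptr) {
            return false;                            // old block still valid
        }
        sb->data = static_cast<char *>(tmp);
        sb->cap = new_cap;
    }
    std::memcpy(sb->data + sb->len, text, add + 1);  // also copies the '\0'
    sb->len += add;
    return true;
}
```

Appending "toptal" and then ".com" to an empty StrBuf performs a single 16-byte allocation, where the original program needed two.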

1. The invention of RAII: A New Hope

All manual management is unpleasant, to say the least. In the mid-1980s, Bjarne Stroustrup invented a new paradigm for his new language, C++. He called it “Resource Acquisition Is Initialization” (RAII). Its basic insight is that objects can be given constructors and destructors that the compiler calls automatically at the appropriate times. This provides a much more convenient way to manage the memory a given object needs. The technique is just as useful for resources other than memory.
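As a minimal sketch of RAII applied to a non-memory resource, the hypothetical class `File` below owns a C stdio `FILE*`. The constructor acquires it, and the destructor releases it on every exit path, including exceptions. Copying is disabled so two objects never close the same handle. A fuller version would probably also support moves.

```cpp
#include <cstdio>     // std::FILE, std::fopen, std::fclose
#include <stdexcept>  // std::runtime_error

// Owns a FILE*: opened in the constructor, closed in the destructor.
class File {
public:
    File(const char *path, const char *mode) : fp_(std::fopen(path, mode)) {
        if (fp_ == nullptr) {
            throw std::runtime_error("could not open file");
        }
    }

    // Runs automatically when the object goes out of scope,
    // whether the scope is left normally or by an exception.
    ~File() { std::fclose(fp_); }

    File(const File &) = delete;             // a copy would close fp_ twice
    File &operator=(const File &) = delete;

    std::FILE *get() const { return fp_; }

private:
    std::FILE *fp_;
};
```

A `File` declared inside a block is closed, and its buffered output flushed, as soon as the block ends. No call to fclose is needed.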

This means that the C string example from the introduction can be written more concisely in C++:

#include <iostream>
#include <string>

int main() {
   std::string str = std::string("toptal");
   // &str is the address of the string object itself, not of its heap buffer
   std::cout << "string object: " << str << " @ " << &str << "\n";

   str += ".com";
   std::cout << "string object: " << str << " @ " << &str << "\n";

   return(0);
}

Output:

string object: toptal @ 0x7fffa67b9400
string object: toptal.com @ 0x7fffa67b9400

In the example above, there is no manual memory management at all! The string is constructed, an overloaded operator (+=) is called on it, and it is destroyed automatically when the function exits. Unfortunately, the same simplicity can lead to other problems. Let’s look at an example in detail:

#include <fstream>
#include <iostream>
#include <string>
#include <vector>

using namespace std;

vector<string> read_lines_from_file(string &file_name) {
    vector<string> lines;
    string line;

    ifstream file_handle (file_name.c_str());
    // getline returns the stream, which converts to false once a read fails,
    // so no spurious empty line is appended after the last real one
    while (getline(file_handle, line)) {
        lines.push_back(line);
    }

    file_handle.close();
    return lines;
}

int main(int argc, char* argv[]) {
    if (argc < 2) {
        cerr << "usage: " << argv[0] << " FILE\n";
        return 1;
    }
    // get file name from the first argument
    string file_name (argv[1]);
    int count = read_lines_from_file(file_name).size();
    cout << "File " << file_name << " contains " << count << " lines.\n";

    return 0;
}

Output:

File makefile contains 38 lines.

It seems simple enough: the vector is filled, returned, and its size is queried by the caller. However, efficiency-minded programmers will notice a problem. Because of value semantics, the return statement can copy the whole vector into a new one just before the original is destroyed.

In modern C++, this copy is no longer a strict requirement. C++11 introduced move semantics, in which the moved-from object is left in a valid but unspecified state (so that it can still be destroyed correctly). Returning a local variable is the easiest case for the compiler to turn into a move, because it knows the source will be destroyed before it can be accessed again. However, the purpose of this example is to show why people invented a whole bunch of garbage-collected languages in the late 1980s and early 1990s, when C++ move semantics were not available.
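A small sketch of what move semantics buys follows. Returning a local std::vector by value no longer implies a deep copy, and explicitly moving a vector hands its heap buffer to the new owner. The names `make_lines` and `move_transfers_buffer` are hypothetical. The second function compares element-buffer addresses to make the transfer visible.

```cpp
#include <cstddef>  // std::size_t
#include <string>
#include <utility>  // std::move
#include <vector>

// Builds a vector locally and returns it by value. Since C++11 this is
// at worst a move (and is often elided entirely), not a deep copy.
std::vector<std::string> make_lines(std::size_t n) {
    std::vector<std::string> lines;
    for (std::size_t i = 0; i < n; ++i) {
        lines.push_back("line " + std::to_string(i));
    }
    return lines;
}

// Moves `source` into a new vector and reports whether the element
// buffer was handed over (same address) rather than copied.
bool move_transfers_buffer(std::vector<std::string> &source) {
    const std::string *before = source.data();
    std::vector<std::string> target = std::move(source);
    // `source` is now valid but unspecified: it is safe to destroy,
    // clear or assign to, but its contents should not be relied on.
    return target.data() == before;
}
```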

Copying the whole vector can become expensive for large files. Let’s try to optimize it by returning only a pointer. A few lines change, but the rest of the code is the same:

vector<string> * read_lines_from_file(string &file_name) {
    vector<string> * lines;   // declared, but never pointed at a real vector
    string line;

    ifstream file_handle (file_name.c_str());
    while (getline(file_handle, line)) {
        lines->push_back(line);   // dereferences an uninitialized pointer
    }

    file_handle.close();
    return lines;
}

int main(int argc, char* argv[]) {
    if (argc < 2) {
        cerr << "usage: " << argv[0] << " FILE\n";
        return 1;
    }
    // get file name from the first argument
    string file_name (argv[1]);
    int count = read_lines_from_file(file_name)->size();
    cout << "File " << file_name << " contains " << count << " lines.\n";

    return 0;
}

Output:

Segmentation fault (core dumped)

The program crashes because lines is an uninitialized pointer. We just need to allocate the vector it should point to:

vector<string> * lines = new vector<string>;

Now it’s ready to run!

Unfortunately, while this looks perfect, it still has one drawback: it leaks memory. In C++, objects allocated on the heap must be deleted manually when they are no longer needed. Otherwise, once the last pointer to them goes out of scope, the memory becomes unreachable and is not reclaimed until the operating system takes it back when the process exits. Idiomatic modern C++ would use a unique_ptr here, which implements exactly the desired behavior: it deletes the object it points to when the unique_ptr goes out of scope. However, unique_ptr did not become part of the standard library until C++11.

Here, we stick with pre-C++11 syntax, and the only other change is in main, which now deletes the vector when it is done:

#include <fstream>
#include <iostream>
#include <string>
#include <vector>

using namespace std;

vector<string> * read_lines_from_file(string &file_name) {
    vector<string> * lines = new vector<string>;   // caller must delete
    string line;

    ifstream file_handle (file_name.c_str());
    while (getline(file_handle, line)) {
        lines->push_back(line);
    }

    file_handle.close();
    return lines;
}

int main(int argc, char* argv[]) {
    if (argc < 2) {
        cerr << "usage: " << argv[0] << " FILE\n";
        return 1;
    }
    // get file name from the first argument
    string file_name (argv[1]);
    vector<string> * file_lines = read_lines_from_file(file_name);
    int count = file_lines->size();
    delete file_lines;   // manual cleanup: forgetting this leaks the vector
    cout << "File " << file_name << " contains " << count << " lines.\n";

    return 0;
}

Memory is now allocated and freed manually.

Unfortunately, as programs grow beyond the scope of the example above, it quickly becomes harder to reason about when and where pointers should be deleted. When a function returns a pointer, do you now own it? Should you delete it yourself when you’re done, or does it belong to some data structure that will later be freed all at once? Get it wrong in one direction and you leak memory. Get it wrong in the other and you corrupt the data structure in question, and possibly others, by dereferencing pointers that are no longer valid.
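Modern C++ lets the type system answer the ownership question. The sketch below uses a hypothetical `make_name` function and `Registry` class to illustrate three conventions:

- Returning std::unique_ptr means "you now own this."
- Taking a std::unique_ptr by value means "I take ownership."
- Returning a raw pointer means "you may look, but do not delete."

The raw-pointer convention is a common guideline rather than something the compiler enforces.

```cpp
#include <memory>   // std::unique_ptr
#include <string>
#include <utility>  // std::move
#include <vector>

// Ownership is handed to the caller, who decides when the string dies.
std::unique_ptr<std::string> make_name(const std::string &s) {
    return std::unique_ptr<std::string>(new std::string(s));
}

class Registry {
public:
    // Ownership moves into the registry; the caller's pointer becomes empty.
    void adopt(std::unique_ptr<std::string> p) {
        entries_.push_back(std::move(p));
    }

    // Ownership stays here: the result is a non-owning pointer that is
    // valid only while the Registry is alive. Callers must not delete it.
    const std::string *find(const std::string &s) const {
        for (const auto &p : entries_) {
            if (*p == s) {
                return p.get();
            }
        }
        return nullptr;
    }

private:
    std::vector<std::unique_ptr<std::string>> entries_;
};
```

Every string is deleted exactly once, when the Registry that owns it is destroyed.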

2. Garbage collection

Garbage collection is not a new technology: John McCarthy invented it for Lisp in 1959. It became mainstream in 1980 with the advent of Smalltalk-80. But the 1990s were when the technology truly took off. Between 1990 and 2000, many languages were released that all used garbage collection of one kind or another. The best known are Haskell, Python, Lua, Java, JavaScript, Ruby, OCaml, and C#.

What is garbage collection? In short, it is a set of techniques for automating memory management. It is available as a library for languages with manual memory management (such as C and C++), but it is far more commonly used in languages that require it. The biggest advantage is that the programmer doesn’t need to think about freeing memory at all, because it is all abstracted away. For example, the Python equivalent of our file-reading code above looks like this:

def read_lines_from_file(file_name) :
    lines = []
    with open(file_name) as fp: 
        for line in fp:
            lines.append(line)
    return lines

if __name__ == '__main__':
    import sys
    file_name = sys.argv[1]
    count = len(read_lines_from_file(file_name))
    print("File {} contains {} lines.".format(file_name, count))

The lines list comes into existence when it is first assigned and is returned to the calling scope without being copied. Once it is no longer referenced, the garbage collector cleans it up at a time the programmer does not control. Interestingly, relying on the garbage collector to release non-memory resources is not idiomatic in Python. It is allowed: we could simply write fp = open(file_name) instead of using the with block and let the GC clean up. But the recommended pattern is to use a context manager where possible, so that resources are released at deterministic times.

Although garbage collection simplifies memory management, it comes at a significant cost. In reference-counting garbage collection, every variable assignment and scope exit pays a small cost to update reference counts. In a mark-and-sweep system, program execution is paused at unpredictable intervals while the GC reclaims memory. This is often called a stop-the-world event. An implementation like Python, which uses both approaches, pays both penalties. These costs make garbage-collected languages less suitable where performance is critical or real-time guarantees are required. Even on the toy programs above, you can see the slowdown:

$ make cpp && time ./c++ makefile
g++ -o c++ c++.cpp
File makefile contains 38 lines.
real    0m0.016s
user    0m0.000s
sys     0m0.015s

$ time python3 python3.py makefile
File makefile contains 38 lines.
real    0m0.041s
user    0m0.015s
sys     0m0.015s

The Python version takes about 2.5 times as much real (wall-clock) time as the C++ version (0.041 s versus 0.016 s). Not all of this difference can be attributed to garbage collection, but it is still significant.
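The per-assignment bookkeeping that reference counting adds can be observed in C++'s own reference-counted smart pointer, std::shared_ptr. This is an analogy, not Python's implementation. Every copy increments a shared counter, every destruction decrements it, and the object is freed when the count reaches zero. `observe_counts` is a hypothetical helper that records use_count() at each step.

```cpp
#include <memory>  // std::shared_ptr, std::make_shared
#include <vector>

// Records shared_ptr::use_count() after each step so the bookkeeping
// performed by copies and scope exits becomes visible.
std::vector<long> observe_counts() {
    std::vector<long> counts;
    std::shared_ptr<int> a = std::make_shared<int>(42);
    counts.push_back(a.use_count());      // 1: a single owner
    {
        std::shared_ptr<int> b = a;       // copy: count incremented
        counts.push_back(a.use_count());  // 2
        std::shared_ptr<int> c = b;       // another increment
        counts.push_back(a.use_count());  // 3
    }                                     // b and c destroyed: two decrements
    counts.push_back(a.use_count());      // 1
    return counts;
}                                         // a destroyed: count hits 0, int freed
```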

3. Ownership: RAII Awakening

We know that the lifetime of a local object is determined by its scope. However, sometimes it is useful, or even necessary, to create an object whose lifetime is independent of the scope in which it was created. In C++, the new operator creates such an object, and the delete operator destroys it. Objects created by new are dynamically allocated, that is, allocated in dynamic memory (also known as the heap or free store). Therefore, they continue to exist until they are explicitly destroyed with delete.

Some errors that can occur when using new and delete are:

Object (or memory) leak: Allocating an object with new and forgetting to delete it.

Premature deletion (or dangling reference): Deleting an object while other pointers still refer to it.

Double delete: Attempts to delete an object twice.

In general, scoped variables are preferred. However, when an object must outlive the scope that created it, RAII can be used as an alternative to bare new and delete. The technique consists of allocating the object on the heap and placing the pointer to it inside a handle (or manager) object. The handle's destructor is responsible for destroying the object. This ensures that the object is available to any function that needs to access it, and that it is destroyed when the handle's lifetime ends, without explicit cleanup.
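The handle idea can be sketched in a few lines. `Handle<T>` below is a hypothetical, simplified stand-in for what std::unique_ptr provides. It takes a pointer obtained from new, deletes it in its destructor, and forbids copying so that two handles never delete the same object. The hypothetical `Tracked` type counts live instances only to make destruction observable.

```cpp
// A minimal owning handle: one owner, destroyed when the handle is.
template <typename T>
class Handle {
public:
    explicit Handle(T *p) : ptr_(p) {}
    ~Handle() { delete ptr_; }            // the one place the object dies

    Handle(const Handle &) = delete;      // no copies: exactly one owner
    Handle &operator=(const Handle &) = delete;

    T *operator->() const { return ptr_; }
    T &operator*() const { return *ptr_; }

private:
    T *ptr_;
};

// Demo type that counts how many instances are currently alive.
struct Tracked {
    static int live;
    Tracked() { ++live; }
    ~Tracked() { --live; }
};
int Tracked::live = 0;
```

When a `Handle<Tracked>` goes out of scope, `Tracked::live` drops back to zero without any explicit delete.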

Examples from the C++ standard library that use RAII are std::string and std::vector.

Consider this code:

void fn(const std::string& str)
{
    std::vector<char> vec;
    for (auto c : str)
        vec.push_back(c);
    // do something
}

When creating a vector and pushing elements into it, you don’t have to worry about allocating and freeing memory for those elements. std::vector allocates heap storage for its elements (through its allocator, which by default uses operator new) and releases that storage in its destructor. As a vector user, you don’t need to care about these implementation details and can be confident that the vector won’t leak. In this case, the vector is a handle object for its elements.

Other examples of RAII in the standard library are std::shared_ptr, std::unique_ptr, and std::lock_guard.
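std::lock_guard shows that RAII is about more than memory. It locks a mutex in its constructor and unlocks it in its destructor, so the mutex is released even when the protected code throws. The names `counter_mutex`, `shared_counter`, and `update_or_throw` are hypothetical and are used only to demonstrate this.

```cpp
#include <mutex>      // std::mutex, std::lock_guard
#include <stdexcept>  // std::runtime_error

std::mutex counter_mutex;
int shared_counter = 0;

// Increments the counter under the lock, then optionally throws.
// The lock_guard's destructor unlocks the mutex on both exit paths.
void update_or_throw(bool fail) {
    std::lock_guard<std::mutex> guard(counter_mutex);
    ++shared_counter;
    if (fail) {
        throw std::runtime_error("simulated failure");
    }
}
```

Without the guard, the throwing path would leave the mutex locked forever unless every caller remembered to unlock it in a catch block.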

Another name for RAII is SBRM, which stands for Scope-Bound Resource Management.

Now, let’s modify the above example of reading a file:

#include <fstream>
#include <iostream>
#include <memory>
#include <string>
#include <vector>

using namespace std;

unique_ptr<vector<string>> read_lines_from_file(string &file_name) {
    unique_ptr<vector<string>> lines(new vector<string>);
    string line;

    ifstream file_handle (file_name.c_str());
    while (getline(file_handle, line)) {
        lines->push_back(line);
    }

    file_handle.close();
    return lines;   // ownership moves to the caller
}

int main(int argc, char* argv[]) {
    if (argc < 2) {
        cerr << "usage: " << argv[0] << " FILE\n";
        return 1;
    }
    // get file name from the first argument
    string file_name (argv[1]);
    // the temporary unique_ptr deletes the vector at the end of this statement
    int count = read_lines_from_file(file_name)->size();
    cout << "File " << file_name << " contains " << count << " lines.\n";

    return 0;
}

4. Only in the end do you realize the true power of RAII.

Manual memory management is a nightmare that programmers have been trying to avoid since compilers were invented. RAII is a promising pattern, but before C++11 it did not work cleanly with heap-allocated objects without awkward workarounds, so C++ suffered. Hence the explosion of garbage-collected languages in the 1990s, designed to make programmers’ lives more pleasant even at the cost of performance.

To summarize, RAII comes down to the following:

Resources are acquired in the constructor (hence the name: Resource Acquisition Is Initialization)

Resources are released in the destructor

Instances of the class are stack-allocated

Common examples are:

  • File operations
  • Smart pointers
  • Mutex locks