C++ tutorial: STD ::vector

This article is the translation of the original address

Some terms from the text:

  • Short for ctor constructor, referring to the constructor of a class

This tutorial is designed to help beginners and intermediate C++ programmers better understand the standard template classes in the C++ language.

Using Vectors

The last vote for the C++ standard was on November 14, 1997, a long time ago. However, some important parts of the Standard, especially the Standard Library, are still not popular among C++ users. Regular readers of CodeGuru’s C++ forums may have noticed that many of the questions and answers are still hand-coded, but they can be solved elegantly using the standard library.

A common problem is the use of C-style arrays, which are riddled with problems and drawbacks. People seem to be afraid of the standard Vector and its cousin, deque. One reason may be that the standard library documentation is mostly omitted and difficult to understand.

In this article, I’ll discuss vectors in C++ and try to explain them in a way that’s easier to understand and grasp. This article is not perfect; its purpose is to give you a head start when using Vector and to help you avoid the most common pitfalls. We will start small and not try to approach the topic very academically.

Introduction of the vector

Vector is a template class that is a perfect substitute for old C arrays. It allows you to use the same native syntax as normal arrays, in addition to providing a set of services that free C++ programmers from the task of dealing with allocated memory and help to operate consistently on contained elements.

The first step in using vector is to import the corresponding header file

#include <vector>
Copy the code

Note that the header file name does not have any extension, as all standard library header files do. In addition, all standard libraries are contained in the STD namespace. You must prefix the name with * STD ::* to use them.

std::vector<int> v; // Declare a vector of type INTEGERCopy the code

For small projects, you can scope the entire namespace STD by inserting a using directive at the top of the.cpp file.

#include <vector> using namespace std; / /... vector<int> v; // The STD :: prefix is no longer requiredCopy the code

This is fine for small projects, as long as you write the using directive in your.cpp file. But never write using directives in header files! This directive is applied to every CPP file that contains the header. For larger projects, it is best to explicitly qualify (prefix) each name accordingly. I don’t like this form of ellipsis. In this article, I’ll qualify each name accordingly. Where appropriate, I’ll introduce some typedefs in the example to improve readability.

What is the STD: : vector

What is STD ::vector V; ? It is a template class that wraps an array of Ts. In this widely used symbol, the ‘T’ stands for any data type, as well as built-in or user-defined classes. This vector stores Ts in a contiguous memory area and lets you access each Ts via v[0], *v[1]*, and so on, just like a C-style array.

Note that writing explicit types of vectors repeatedly can be tedious for larger projects. You can use a typedef if you like.

typedef std::vector<int> int_vec_t; // Type name custom //... int_vec_t v;Copy the code

Note: Do not use macros!

#define int_vec_t std::vector<int> ; / / ❌Copy the code

First, let’s see what Vector can do for us. Let’s start small, taking an array of integers as an example. If you are using a normal array, you can prepare a static or dynamic array.

size_t size = 10;
int sarray[10];
int *darray = new int[size];
// do something with them:
for(int i=0; i<10; ++i){
    sarray[i] = i;
    darray[i] = i;
}
// Delete darray when you are finished
delete [] darray;
Copy the code

Use vector to do the same thing

#include <vector>
/ /...
size_t size = 10;
std::vector<int> array(size);    // Apply for a memory region containing 10 interger types and initialize it to 0
                                
// do something with them:
for(int i=0; i<size; ++i){
    array[i] = i;
}
// No longer need to delete the array
Copy the code

As you can see, vector combines the advantages of static and dynamic arrays. Like a dynamic array, vector can provide a dynamic size argument, and like a static array, it can automatically delete used memory.

The standard vector defines the operator *[] to preserve the syntax of a native array. For performance reasons, []* does not check whether the index is valid. Similar to C-style arrays, using an invalid index can cause access errors in most cases.

In addition to *[], vector also defines a member function at(). This function does the same thing as [], but checks the index. If the index is invalid, it throws an instance of the STD ::out_of_range* error class.

std::vector<int&gt array;
try{
    array.at(1000) = 0;
}
catch(std::out_of_range o){
    std::cout<<o.what()<<std::endl;
}
Copy the code

Depending on your implementation of the C++ standard library, the above snippet will print more or less explicit error messages. STLPort prints the word “cector “and the Dinkumware implementation that comes with Visual C++ prints” invalid vector subscript “. Other implementations might print other things.

Note that vector is a standard container. List data can also be accessed using iterators. More on iterators later in this article. Now, keep it simple!

What if you don’t know how many elements you will have (as shown in the code below)? If you use C-style arrays to store elements, you either need to implement logic that allows developers to grow your array from time to time; Or you’re going to allocate an array that’s “big enough”. The latter is a more poking method, while the former can give you a headache. Vector does not.

#include <vector> #include <iostream> //... std::vector<char> array; char c = 0; while(c ! = 'x'){ std::cin>>c; array.push_back(c); }Copy the code

In the previous example, push_back() adds one element at a time to the array. That’s what we want, but it has a little catch. To understand this, you must know that a vector has a so-called controlled sequence and a chunk of storage allocated to that sequence. A controlled sequence is just another name for an array in a vector. To hold this array, vector allocates some memory, usually more than it needs. You can push_back() elements until the allocated memory is used up. Vector then triggers a reallocation and increases the allocated chunk. This may mean that it will have to move (i.e., copy) controlled sequences into a larger block. Copying too many elements can slow down your application dramatically. Note that the reallocation is absolutely transparent to you (unless the move fails — memory is out of bounds). Normally, you don’t need to do anything; Vector does everything. Of course, there are a few things you can do to prevent vector from redistributing storage frequently. Read on.

In the previous example, we declared a vector using its default constructor. This creates an empty vector. Depending on the standard library implementation used, an empty vector may or may not allocate some memory “for a rainy day”. If we want to avoid reallocating vector’s storage too often, we can use its *reserve()* member function.

#include <vector> #include <iostream> //... std::vector<char> array; array.reserve(10); Char c = 0; while(c ! = 'x'){ std::cin>>c; array.push_back(c); }Copy the code

Of course, the arguments we pass to reserve() depend on context. In this case, the function reserve() will ensure that we have space for at least 10 elements. If the vector already has room for the required number of elements, *reserve()* does nothing. In other words, reserve() increases the amount of storage allocated to vector if necessary, but never shrinks it.

By the way, the following two code snippets are not the same thing.

// snip 1:
std::vector<int> v(10);
// snip 2:
std::vector<int> v;
v.reserve(10);
Copy the code

The first fragment defines a vector of 10 integers and initializes them with their default value (0). If we don’t have integers, but some user-defined classes, the vector will call the default CTOR (constructor) 10 times and contain 10 objects that are easy to build. The second fragment defines an empty vector and tells it to make room for 10 integers. The vector will allocate enough memory to hold at least 10 integers, but it will not initialize the memory. If we have no integers, but some user-defined classes, the second fragment will not construct any instances of that class.

To find out how many elements a vector can currently hold in its allocated storage, use the capacity() member function. To find out how many elements the vector currently contains, use the *size()* member function.

#include <vector>
#include <iostream>
//...
std::vector<int> array;
int i = 999;          // some integer value
array.reserve(10);    // make room for 10 elements
array.push_back(i);
std::cout<<array.capacity()<<std::endl;
std::cout<<array.size()<<std::endl;
Copy the code

Print the following:

10
1
Copy the code

This means that the number of elements that can be added to the vector without triggering redistribution is always capacity()-size().

Note that for the previous example, only 0 is a valid index to array. This is because, although we have reserved space for at least 10 elements with reserve(), memory is not initialized. Since int is a built-in type, it is actually possible to write all 10 elements with *[], but we would have a vector* in an inconsistent state because size() would still return 1. Also, if we try to access anything other than the first element with array.at(), we throw an error ** STD ::out_of_range**. At first glance, this may seem inconvenient, but if you think about it, you’ll see why it is. If the vector contains objects of a user-defined class, *reserve()* does not call any CTOR. Accessing an unconstructed object yields undefined results, which should not be done under any circumstances. Finally, keep in mind that * Reserve () serves to minimize the amount of potential memory reallocation and does not affect the number of elements in the controlled sequence. It is beneficial to call reserve()* with arguments less than the current capacity(); it simply does nothing.

The correct way to increase the number of contained elements is to call resize(), a member of the vector. The member function *resize()* has the following properties.

  • If the new size is greater thanvectorThe original size, which will retain existing elements; The remaining elements are initialized according to the second argument. If the new size is smaller than the old size, the elements are kept for the new size and the rest are discarded.
  • If the new size is greater thancapacity()It reallocates the storage space to accommodate all the elements. Note:resize()Will not be closedcapacity().

Example:

std::vector<int> array; // Create an empty vector array.reserve(3); // In this case, capacity() is 3 // size() is 0 array.push_back(999); // Insert the element array.resize(5); // The vector contains the following elements: 999, 0, 0, 0, 0 array.push_back(333); // The elements in the vector include: 999, 0, 0, 0, 0, 333 array.reserve(1); // The elements in the vector include: 999, 0, 0, 0, 333 array.reserve(1); // do nothing because capacity() > 1 array.resize(3); // Capacity () is still 6 // size() is 3 array.resize(6, 1); // Resize again to fill the remaining space with 1 // At this point, the elements in cector are: 999, 0, 0, 1, 1, 1Copy the code

Another way to expand the number of control elements is to use push_back(). In some cases, this may be more efficient than calling resize() and then writing the element. Let’s take a closer look at the internal structure of a vector and look at the following example.

class X { public: X():val_(0){} X(int val):val_(val){} int get(){return val_; } void set(int val){val_=val; } private: int val_; }; / /... std::vector<X> ax; Version 1: ax.resize(10); // Create an empty vector to hold objects of type X. // resize for(int i=0; i<10; ++i){ ax[i].set(i); // assign value} //... // version 2: ax.reserve(10); For (int I =0; i<10; ++i){ ax.push_back(X(i)); // Insert elements using the constructor of X}Copy the code

These two versions are equivalent, and they produce the same result. In both cases, we start with an empty vector. In the first version, we used resize() to increase the size of the controlled sequence to 10 elements. When resize() is complete, our vector will have 10 valid objects of type X, all of which will have val_ == 0, because that’s what the default CTOR for X does. We then select each X in the sequence and change its val_ with X::set().

In the second version, we call *reserve() to make room for 10 elements. Vector reallocates its storage and does nothing else, nor does it build any elements. In the second step, we use the x-th ctor to create 10 objects of type X, assign them directly, and push_back()* into the vector.

Which method is more efficient? This may also depend on the library implementation, but the second version may perform better because it does not call X::set() for each element.

Now that we’ve seen how to declare a vector and how to populate it, let’s look at how to manipulate it. We’ll start with an analogy with C-style arrays to look for best practices.

There are two ways to access C-style arrays: either use the subscript operator or use Pointers. In addition, passing a C-style array to a function means passing a pointer to the first element. Can we do the same thing with vector? The answer is yes. Let me give you an example.

#include <iostream> double mean(double *array, size_t n) { double m=0; for(size_t i=0; i<n; ++i){ m += array[i]; } return m/n; } int main() { double a[] = {1, 2, 3, 4, 5}; std::cout<<mean(a, 5)<<std::endl; Return 0; }Copy the code

When mean(a, 5) is called, the first argument is actually the address of the first element in the array *&a[0]. We know that a vector needs to place its elements sequentially in a contiguous block of memory. This means that we simply pass the address of the first element in the vector to mean()* and it will work.

int main() { std::vector<double> a; a.push_back(1); a.push_back(2); a.push_back(3); a.push_back(4); a.push_back(5); std::cout<<mean(&a[0], 5)<<std::endl; Return 0; }Copy the code

That’s nice, but it’s different. While we can initialize c-style arrays directly, we must push_back() elements into a vector. Can we do better? Well, yes. Instead of using an initializer list directly for vector, we use an intermediate array.

double p[] = {1, 2, 3, 4, 5};
std::vector<double> a(p, p+5);
Copy the code

Here we use another constructor provided by vector. It takes two arguments: a pointer to the first element of the C-array and a pointer to the element after the last element of the array. It initializes the vector with a copy of each element in the array. There are two things to note: first, the array is copied, which has nothing to do with the vector we created, and second, we provide ranges from the first element to the last element in the array.

It is crucial to understand this second point when working with vector or any other standard container. Controlled sequences are always represented by **[first, one-past-last]** — this is true not only of Ctors, but also of other functions that operate on ranges of elements.

When getting the addresses of the elements contained in a vector, there are a few things to be aware of: internal reallocation of a vector invalidates Pointers you hold to its elements.

std::vector<int> v(5); int *pi = &v[3]; v.push_back(999); // <-- may trigger memory reallocation * PI = 333; // <-- may cause an error, the pointer PI is no longer validCopy the code

In the previous example, we took the address of the fourth element of the vector and stored it in PI. We then push_back() another element into the vector. Then we try PI. If the store is insufficient to hold additional elements, push_back() may trigger a reallocation of memory. PI will then point to a memory address that has just been removed, and the result of using it is undefined. The bad news is that vector may or may not reallocate memory — it is generally impossible to tell. The solution is not to use Pointers that might already be invalid, or to ensure that the vector is not redistributed. The latter means judicious use of reserve() so that vector handles memory allocation at a given time.

Of the member functions we’ve seen so far, only push_back() and resize() can invalidate Pointers to vectors. In addition, there are other member functions that can invalidate Pointers, which we will discuss later in this tutorial.

Note that neither the subscript operator nor the member function at() invalidates Pointers to vectors.

Speaking of Pointers to vectors, we can introduce a standard concept here: iterators. Iterators are the library’s way of establishing a common interface for all containers — vectors, lists, sets, deques, and so on. Iterators occur because operations that are “native” to one container (such as subscripts on a vector) don’t make sense to other containers. The library needs a common way to apply iteration, find, sort, and so on to all containers. This general approach is called an iterator.

An iterator is a handle to the contained element. If you want, you can find an exact definition in your favorite textbook. What matters is that if you have an iterator, you can dereference the element to which it “points”. (For vectors, the most natural implementation of an iterator is indeed a plain pointer, but don’t count on that.) Let’s take a look at iterators with a small example.

#include <vector> #include <iostream> int main() { std::vector<double> a; std::vector<double>::const_iterator i; a.push_back(1); a.push_back(2); a.push_back(3); a.push_back(4); a.push_back(5); for(i=a.begin(); i! =a.end(); ++i){ std::cout<<(*i)<<std::endl; } return 0; }Copy the code

Let’s look at the program step by step:

std::vector<double>::const_iterator i;
Copy the code

This declares a const iterator i for a vector. We are using a const iterator because we do not intend to modify the contents of the vector.

Here we declare a double const_iterator named I. Constants are used because they are not changed.

. i=a.begin(); .Copy the code

The member function begin() returns an iterator to the first element in the sequence.

. i! =a.end(); .Copy the code

The member function end() returns an iterator that refers to the last element in the sequence. Note that it is illegal to dereference an iterator returned by *end()*, resulting in undefined results.

. ++iCopy the code

You can access elements one by one through incremental iterators.

Note that the same program, using Pointers instead of iterators, has a similar code structure.

#include <vector> #include <iostream> int main() { std::vector<double> a; const double *p; a.push_back(1); a.push_back(2); a.push_back(3); a.push_back(4); a.push_back(5); for(p=&a[0]; p! =&a[0]+5; ++p){ std::cout<<(*p)<<std::endl; } return 0; }Copy the code

So why use iterators when we can do the same thing in the same way with Pointers? The answer is that if we want to apply some standard algorithm on a vector, such as sorting, we must use iterators. The library does not implement the algorithm as a member function of various containers, but as a free template function that can operate on many containers.

The combination of standard containers (and vectors in particular) and standard algorithms is a very powerful tool in general; Unfortunately, it is often overlooked by programmers. By using it, you can avoid a lot of hand-crafted, error-prone code, and it enables you to write compact, portable, and maintainable programs.

Let’s look at the member functions provided by vector.

The constructor

The C++ standard provides a complete set of C++ constructors, C++ destructors, and copy operators. Let’s look at them using a vector of standard strings as an example.

typedef std::vector<std::string> str_vec_t; str_vec_t v1; // Create an empty vector str_vec_t v2(10); // 10 copies of empty strings str_vec_t v3(10, "hello"); // 10 copies of the string "hello" str_vec_t v4(v3); V3 STD ::list< STD ::string> sl; // Create a list of strings and populate sl.push_back("cat"); sl.push_back("dog"); sl.push_back("mouse"); str_vec_t v5(sl.begin(), sl.end()); // Copy elements from s1 to v5 where v1 = v5; // Copy all elements from v5 to v1Copy the code

assign()

The *assign()* function reinitializes the vector. We can use the [first, last] iterator to pass a valid range of elements, or we can specify the number and value of elements to be created.

v1.assign(sl.begin(), sl.end()); // Copy values from list to vector v1.assign(3, "hello"); // Initialize with three "hello" elementsCopy the code

Assign () completely changes the elements of vector. The old elements (if any) are discarded, and the size of the vector is set to the number of allocated elements. Of course, *assign()* may trigger memory reallocation.

Stack,

We’ve already seen the *push_back() function. It adds an element to the end of the controlled sequence. There is a corresponding function, pop_back(), that removes the last element in a controlled sequence. The deleted element becomes invalid, and size()* is decremented. Note that *pop_back()* does not return the value of the element that was removed. You have to check it before popping it out. The reason for this is to ensure safety. Popping elements on an empty vector produces undefined results.

std::vector<int> v;
v.push_back(999);
v.pop_back();
Copy the code

Note that *pop_back() does not affect the output of capacity()*.

Predefined iterators

We have already seen the iterators begin() and end(). They refer, respectively, to the first and last element in a controlled sequence. There are also rbegin() and rend(), which point to the first and last element of the reversed sequence, respectively. Note that rbegin() and rend() both return the type reverse_iterator (or a const version of a const_reverse_iterator) — unlike iterator, which is const_iterator, respectively. To get a “normal” iterator from a reverse iterator, use the *base()* member function of the reverse_iterator.

std::vector<int> v; v.push_back(999); std::vector<int>::reverse_iterator r = v.rbegin(); std::vector<int>::iterator i = r.base(); // points to the last element in the listCopy the code

Access to the elements

We’ve seen the subscript operator [], which provides unchecked access, and the member function at(), which throws an object of type STD ::out_of_range if the passed index is invalid. There are also two member functions, front() and back(), which return a reference to the first and last element in the controlled sequence, respectively. Note that they do not return iterators.

std::vector<int> v; v.push_back(999); // fill up the vector //... // the following expression is equivalent: int I = v.front(); int i = v[0]; int i = v.at(0); int i = *(v.begin()); Int j = v.ack (); int j = v[v.size()-1]; int j = v.at(v.size()-1); int j = *(v.end()-1);Copy the code

Note that we cannot write **(-v.nd ()), because v.nd ()* is not an lvalue.

The list of operations

Vector provides list native operations that most containers basically implement to insert or remove elements from a controlled sequence. Let’s look at some examples.

#include <vector> #include <iostream> int main() { std::vector<int> q; q.push_back(10); q.push_back(11); q.push_back(12); std::vector<int> v; for(int i=0; i<5; ++i){ v.push_back(i); STD ::vector<int>::iterator it = v.begin() + 1; // Insert 33 it = v.insert(it, 33) before the second element; // v contains elements: 0 33 1 2 3 4 // it refers to the inserted element // Insert all elements in q before the second element v.insert(it, q.bijn (), q.edd ()); Iterator 'it' iterator = v.begin() + 3; // Insert three (it, 3, -1) before the fourth element. // iterator 'it' = v.begin() + 4; // iterator 'it' = v.begin() + 4; v.erase(it); // v contains 0 10 11-1-1 12 33 12 3 4 // iterator 'it' is invalid // remove the second to fifth element in v it = v.begin() + 1; v.erase(it, it + 4); // v contains 0 12 33 12 3 4 // iterator 'it' is invalid // empty v.clear(); return 0; }Copy the code

Note that insert() and erase() can both invalidate iterators you hold. The first version of insert() returns an iterator to the inserted element. The other two versions return void. Inserting elements may trigger a reallocation. In this case, all iterators in the container are invalidated. If no reallocation occurs (for example, calling reserve() before insertion), only the iterators between the insertion point and the end of the sequence are invalidated.

Deleting elements does not trigger reallocation and does not affect Capacity (). However, all iterators between the first element to be deleted and the end of the sequence are invalidated.

Call *clear()* to remove all elements from the controlled sequence. However, the allocated memory is not freed. Of course, all iterators are invalidated.

Note that neither insert() nor erase() is efficient for vectors. Their time is order n plus. If your application uses insert and delete a lot, vector may not be your best container choice.

Comparison operation

You can use the operator *==,! = and < compare the contents of two vectors (an internal element-by-element comparison). If two vectors have the same size() and the elements in the corresponding positions are equal, they are equal. Note that the capacity() of two equal vectors need not be the same. Additionally, when comparing vectors with the <* operator, the sorting is alphabetical.

std::vector<int> v1, v2; / /... if(v1 == v2) ...Copy the code

Swap the contents of Vectors

Sometimes, it is useful to be able to swap the contents of two vectors. A common use is to free up memory held by a vector. As we know, deleting an element or cleaning up a vector does not affect its allocated memory. To free up memory, we need a little trick.

std::vector<int> v; / /... v.clear(); v.swap(std::vector<int>(v));Copy the code

Normally, a vector simply swaps its memory. In the previous example, we created a temporary vector by using copy ctor and swapping its contents with v (). The temporary object will receive all memory held by V, and V will receive all memory held by the temporary object — maybe none. The end result is that the temporarily created vector is destroyed at the end of the statement and the memory held by V is freed.

The vector class template has a second default template parameter.

template<class T, class A = allocator<T> >
        class vector ...
Copy the code

Allocator is a class that provides functions that the container uses to allocate and remove memory for its elements. In this tutorial, we have made a consistent assumption: that we have a default Allocator. As a final note, if two allocators are the same, swap() will execute in constant time (that is, simply swap the internal structure of the two vectors). Most of the time.

In this tutorial, we touched the surface of the standard library and learned about STD :: Vector. In the next tutorial, we’ll cover more advanced topics related to vectors, apply standard algorithms to each of them, and discuss design decisions such as when to store objects and when to store Pointers to objects in vectors. In addition, we will introduce a brother of Vector, STD :: Deque.