Data Structure (3)

This section covers the basic use of hash tables and C++ STL containers

Hash table

What hash tables do: Map a relatively large space to a relatively small space.

In general, taking a prime number as module in hash operation will reduce the probability of conflict

Storage of hash tables

Conflict resolution

  • Open addressing
  • Zipper method

Practice: ACwing-840 simulation hash table

Zipper method

#include<iostream>
#include<cstring>
using namespace std;

const int N = 1e5 + 3;

int h[N]; // Store all nodes' subscripts
int e[N], ne[N], idx; // Storage node value and next pointer

int mod(int x) {
	return (x % N + N) % N;
}

void insert(int x) {
	int k = mod(x);
	e[idx] = x; // Assign a new node
	ne[idx] = h[k]; // Next of this node points to the list header
	h[k] = idx; // New link header
	idx++;
}

bool query(int x) {
	int k = mod(x);
	for(inti = h[k]; i ! =- 1; i = ne[i]) {
		if(e[i] == x) return true;
	}
	return false;
}


int main(a) {
    int n;
	scanf("%d", &n);
	memset(h, - 1.sizeof h); // -1 indicates an empty node
	char op;
	int x;
	while(n--) {
		cin >> op >> x;
		if(op == 'I') {
			insert(x);
		} else if(op == 'Q') {
			if(query(x)) printf("Yes\n");
			else printf("No\n"); }}return 0;
}
Copy the code

Open addressing

#include <cstring>
#include <iostream>

using namespace std;

const int N = 200003, null = 0x3f3f3f3f;

int h[N];

int find(int x)
{
    int t = (x % N + N) % N;
    while(h[t] ! = null && h[t] ! = x) { t ++ ;if (t == N) t = 0;
    }
    return t;
}

int main(a)
{
    memset(h, 0x3f.sizeof h);

    int n;
    scanf("%d", &n);

    while (n -- )
    {
        char op[2];
        int x;
        scanf("%s%d", op, &x);
        if (*op == 'I') h[find(x)] = x;
        else
        {
            if (h[find(x)] == null) puts("No");
            else puts("Yes"); }}return 0;
}
Copy the code

Note that the memset function is set by byte and null is defined as 0x3f3f3f3f.

In particular, if you want to initialize 0 or -1, you can simply set 0 or -1 because the binary representation of 0 is all zeros (00000000) and -1 is all ones (11111111). Setting 1 byte is the same as setting 4 bytes.

String hash

String prefix hash: Evaluate the hash value of each position in the string as a prefix

For example, there is the string s = ABCDE

Is the hash array of the solution

H [1] = the hash of A

H [2] = the hash of AB

H [3] = the hash of ABC

.

How do I hash a string?

Thinking of A string as A p-base number, such as the string ABCD, we map A to 1, B to 2, C to 3, and D to 4. Then ABCD can be regarded as a p-base number 1234. The hash value of the string ABCD is

(1 * P ^ ^ 3 + 2 * P ^ ^ 2 + 3 * P ^ ^ 1 + 4 * P ^ ^ 0) mod Q

It is generally not advisable to map a letter to 0, as this will lead to duplication. If I map A to 0, A is 0, AA is 0, AAA is 0.

When we do string hashing, we don’t care about collisions.

We can take P = 131 or 13331, Q = 2^64^, and there will be no conflicts in 99.99% of cases (according to YXC)

The h array can be typed as unsigned long long (64 bits), so that no modulo 2^64^ is required, and overflow is the same as modulo

What’s the use of solving a string prefix hash?

Can solve the hash of any substring! This is hard to do even with KMP algorithm.

Can be used to quickly determine whether two strings are equal. (Pattern matching requires at least O(n), while string hashing requires only O(1).)

Let’s say we want to solve the hash of [L,R] in string S

We can get the value of h[L-1] first, and the value of h[R]

First shift h[L-1] to the left r-L +1 bit (p-base) to align it with h[R], and then subtract the two to obtain the p-base number represented by the substring of the interval [L,R]

H [R] -h [L-1] × P^R-L+1^

In addition, when calculating the prefix hash value of string S, it is easy to obtain the following recursion

H [I] = h[I - 1] × P + S[I]

Practice: ACwing-841: String hash

#include<iostream>
using namespace std;

const int P = 131, N = 1e5 + 10;

typedef unsigned long long ULL;

// h[N] is used to store string prefix hashes, p[N] is used to store powers of p
ULL h[N], p[N];
int n, m;
char str[N];

ULL get(int l, int r) {
	return h[r] - h[l - 1] * p[r - l + 1];
}

int main(a) {
	scanf("%d%d%s", &n, &m, str + 1);

	p[0] = 1; // p[
	for(int i = 1; i <= n; i++) {
		// Initialize the h array
		p[i] = p[i - 1] * P;
		h[i] = h[i - 1] * P + str[i];
	}
	while(m--) {
		int l1, r1, l2, r2;
		scanf("%d%d%d%d", &l1, &r1, &l2, &r2);
		if(get(l1, r1) == get(l2, r2)) printf("Yes\n");
		else printf("No\n");
	}
	return 0;
}
Copy the code

Other practice topics: Search the keyword “hash” on acwing

STL

The C++ STL library provides many data structures, including some very complex ones.

This section explains some of the data structures commonly used in the C++ STL library, including

  • vector

    Variable-length arrays, the basic idea being multiplication (similar to an ArrayList in Java)

  • pair

    Stores a tuple of any variable type

  • string

    String, common functions substr(), c_str()

  • queue

    Queue, push(), front(), back(), pop()

  • priority_queue

    A priority queue is essentially a heap. Push (), top(), pop()

  • stack

    The stack. Push (), top(), pop()

  • deque

    Double-endian queue. Insertion and deletion can be performed at the head of the queue and at the end of the queue, and random access is supported

  • Set, map, multiset, multimap

    Based on balanced binary tree (red black tree), dynamic maintenance of ordered sequence. These sets/maps support sort-related operations, such as the lower_bound/upper_bound methods. They also support ++ and — iterators, but they add, delete, and check in order (logn) time.

  • Unordered_set, unordered_map, unordered_multiset, unordered_multimap

    Based on a hash table. These sets and maps are similar to the set/map above. However, these unordered sets/maps are O(1) faster and do not support lower_bound() and upper_bound(), nor do they support ++ and — iterators

    If you use unordered_map, you need the header file #include

  • bitset

    Pressure position

vector

#include<iostream>
#include<vector>
using namespace std;

int main(a) {
    vector<int> a; // The simplest way to initialize
    vector<int> a(10); // Define a vector of length 10
    vector<int> a(10.3); // Define a vector of length 10 and initialize each element to 3
    vector<int> a[10]; // Define a vector of size 10
    
    // Vector supports functions
    a.size(); // The number of elements in the vector
    a.empty(); // Whether vector is empty
    // The time complexity of the above two methods is O(1), and all other containers have both methods
    a.clear(); / / to empty
    a.front(); // return the first one
    a.back();
    a.push_back();
    a.pop_back();
    a.begin(); // is the position of the first element
    a.end(); // is the position next to the last element
    // Vector supports random addressing with [], just like arrays
    a[0]; // Take the first element in the vector
    // Vector supports comparisons
    vector<int> a(4.3).b(3.4);
    // a = [3,3,3] b = [4,4,4]
    if(a < b) printf("a < b\n"); // The size is in lexicographical order
    
    // Vector traversal
    vector<int> a;
    for(int i = 0; i < 10; i++) a.push_back(i);
    for(int i = 0; i <a.size(); i++) cout << a[i] << "";
    cout << endl;
    
    for(vector<int>::iterator it = a.begin(); i ! = a.end(); i++)cout << *i << "";
    cout << endl;
    
    // new in C++ 11, for each traversal
    for(auto x : a) cout << x << "";
    cout << endl;
    return 0;
}
Copy the code

Note: The amount of time it takes for an operating system to allocate memory for a program is independent of the amount of space to be allocated. It just depends on the number of assignments. For example, a request to allocate a space of size 1 takes the same amount of time as a request to allocate a space of size 100.

For example, the time it takes to apply for an array of size 1000 is 1000 times the time it takes to apply for an array of size 1 1000 times.

So, if you have a variable length array, you want to minimize the number of requests for space.

So doubling a vector is, roughly, doubling the size of the array every time the array is short, and copying the elements from the old array

If a vector ends with n elements, it allocates space logn times. The number of copies of elements is approximately n. The average time to each element is order one.

pair

The pair is defined in the Utility library and can be used by importing ioStream directly

#include<iostream>
using namespace std;

int main(a) {
    pair<int.string> p;
    p.first; // The first element
    p.second; // The second element
    // Pair also supports comparison, with first as the first keyword and second as the second keyword
    // Create a pair
    p = make_pair(10."hby");
    p = {10."hby"}; // C++ 11 can be initialized like this directly
    // When something has two attributes and needs to be sorted by one attribute,
    // You can place the sorted attribute in FISRT and the other attribute in second
    
    // You can also use a pair to store three attributes, as follows
    pair<int.pair<int.int>> p;
}
Copy the code

string

#include<cstring>
#include<iostream>
using namespace std;

int main(a) {
    string a = "hby";
    a += "haha"; // String concatenation
    a += 'c';
    
    a.size();
    a.length(); // The length can be either way
    
    a.empty();
    a.append("3");
    a.append(10.'3'); // Add 10 3's
    
    a.find('b'); // Returns the index of the character, the first character found from left to right
    
    a.front(); // The first character of the string
    a.back(); // The last character of the string
    a.substr(1.3); // The first argument is the starting position of the subscript, and the second argument is the length
    // Select a substring of length 3 from 1, resulting in byh
    // When the length of the second argument exceeds the length of the string, output to the end of the string
    a.substr(1); // If the second argument is omitted, the substring after subscript 1 is returned
    
    a.c_str(); // Returns the starting address of the string a stored in
    printf("%s\n", a.c_str());
}
Copy the code

queue

#include<iostream>
#include<queue>
using namespace std;

int main(a) {
    queue<int> q;
    q.push(1); // To the end of the line
    q.pop(); // Pop up the header element, notice that void is returned
    q.front(); // return to queue head
    q.back(); // return to the end of the queue
    q.size();
    q.empty();
    Queue has no clear function
    // What if I want to clear a queue?
    q = queue<int> ();// Directly rebuild a queue
}
Copy the code

priority_queue

Priority queue, bottom is a heap

#include<iostream>
#include<queue>
#include<vector>
using namespace std;

int main(a) {
    // The default is large root heap
    priority_queue<int> q;
    // What if you want to define a small root heap?
    // 1. If you want to insert x, insert -x
    // 2. Define as a small root heap, as follows (vector is needed)
    priority_queue<int.vector<int>, greater<int>> heap;
    
    q.push();
    q.top(); // Return the top of the heap element
    q.pop(); // Pop up the top of the heap element
}
Copy the code

stack

#include<iostream>
#include<stack>
using namespace std;

int main(a) {
    stack<int> s;
    s.push(); / / pressure stack
    s.top(); // return to the top of the stack
    s.pop(); // Pop the top of the stack
}
Copy the code

deque

A two-endian queue, or enhanced vector. Support a variety of methods, but the speed is relatively slow, so generally not how to use

#include<iostream>
#include<deque>
using namespace std;

int main(a) {
    deque<int> q;
    q.clear(); / / there are clear
    
    q.front();
    q.back();
    
    q.push_back();
    q.pop_back();
    
    q.push_front();
    q.pop_front();
    // Random addressing is supported
    q[0];
    // Support begin() and end() iterators
}

Copy the code

set

#include<iostream>
#include<set>
using namespace std;

int main(a) {
    set<int> s; // There can be no duplicate elements. If a duplicate element is inserted, the operation is ignored
    multiset<int> ms; // Can have repeating elements
    // Operations supported by set and multiset
    
    insert(1); // Time complexity O(logn)
    find(1); // Find a number. If none exists, return the end iterator
    count(1); // Return the number of a number. Set returns 0 or 1. Multiset may return more than 1
    erase(1); O(k + logn), where k is the number of elementserase(??) ;// If you enter an iterator, only iterators will be deleted
    // set compares the core operation
    lower_bound(x); // Returns the smallest iterator greater than or equal to x.
    upper_bound(x); // Return the smallest iterator greater than x.
    // begin(), end() iterators
}
Copy the code

map

#include<iostream>
#include<map>
using namespace std;

int main(a) {
    insert(); // Insert a pair
    erase(); // The input argument is a pair or iterator
    find();
    lower_bound();
    upper_bound();
    // Map can be used as an array
    // The time complexity of almost all map operations is O(logn), except for size(), empty().
    
    map<string.int> m;
    m["hby"] = 1; // Insert can be done directly
    
    cout << m["hby"] < <endl; / / to find
}
Copy the code

bitset

For example, if you want to open a 1024 length bool array, since the bool type in C++ is 1 byte.

1024 bytes, or 1KB, are required. But when we can actually represent bool in bits, we only need 1024 bits, 128 bytes

Bitsets support all bit operations, as well as shifts

#include<iostream>
using namespace std;

int main(a) {
    bitset<1000> s;
    / / support ~, &, | ^
    // Support >>, <<
    // Support ==,! =
    / / support []
    // Count () returns the number of 1s
    // any() whether there is at least one 1
    // None () specifies whether all zeros are 0
    // set() sets all positions to 1
    // set(k, v) to v
    // reset() changes all positions to 0
    // flip() inverts all positions, equivalent to ~
    // flip(k) invert the KTH bit
}
Copy the code

If you forget the usage of an STL container, look it up at the official address below

Data address: www.cplusplus.com/reference/