Discretization and its application

This is the first day of my participation in the Gwen Challenge in November. Check out the details: the last Gwen Challenge in 2021

preface

The concept of discretization is simple, but in the specific implementation process, there are some small problems in the implementation details at the beginning of learning, and most of the online understanding and summary are very abstract, without specific code examples to explain. So, here or record some of their own experience.

An overview of the

Discretization is a common technique in programming, which can effectively reduce time complexity. The basic idea is to consider only the values you need in a wide range of possible situations. Discretization can improve an inefficient algorithm or even implement an algorithm that is impossible to implement. To grasp this idea, it is necessary to understand the characteristics of this method from a large number of topics. For example, discretization can be considered when there is not enough space to build a line segment tree.

Above is a good explanation of discretization found on the Internet.

But if you just look at this text, it’s a little abstract to understand the concept of discretization. Now, let’s take a look at a concrete example:

What is discretization?
How should it be discretized?
How should the discretized data be used?

sample

The original link

Suppose you have an infinite number line, and the number on each coordinate is 000.

Now, we first perform NNN operations, each adding CCC to the number at a certain position XXX.

Next, make MMM queries, each containing two integers LLL and RRR, and find the sum of all numbers in the interval [l,rl,rl,r].

Input format

The first line contains two integers, NNN and MMM.

NNN lines follow, each containing two integers XXX and CCC.

Then the MMM line, each line contains two integers LLL and RRR.

The output format

A total of MMM lines, each line outputs a sum of numbers in the interval sought in the query.

Data range

– 109 x 109-10 ^ 9 x or less or less or less 10 ^ 9 or less – 109 x 109 1 n or less, or less or less m 1051 n or less, or less 10 ^ m or less 51 acuities were n, m, 105-109 or less l r 109-10 ^ 9 or less or less or less l r 10 ^ 9 or less or less or less – 109 l r or less or less 109 or less 10000 c or less – 10000-10000 c or less or less 10000-10000 c or less or less 10000 or less

Example Input:

3, 3, 4, 6, 7, 8Copy the code

Example output:

8
0
5
Copy the code

Problem analysis

See n times to add c to some position x, and also ask the sum of a given interval m times. The first reaction is that this is a problem that looks at prefixes and sums.

−109≤x≤109−10^9≤x≤10^9−109≤x≤109…… Boy, not to mention whether we could open an array that big, but even if we could open an array that big, the O(n) time complexity of preprocessing would be unbearable.

And if you look at the actual amount of data that we need to process, n, MN, MN,m, they’re all on the order of 10510^5105, which is much smaller than the order of 10910^9109, so we just need to focus on the order of 10510^5105. In this case, we can use discretization.

The following example data is used to illustrate the specific process:

So the first thing to make clear is that what we’re discretizing in this case is the coordinates of the number line. Therefore, for the input position X and the corresponding value C, it is only necessary to put the number line subscript into the array f[n]f[n]f[n], where F [n]f[n]f[n] is the discretized remapped array.

Note that the array A in the figure above is not a real array, but is written as an array for demonstration purposes. When implemented, you can create a tuple like a struct to store the coordinate X and the corresponding number C.

Similarly, insert all the number line coordinates to f[n] F [n]f[n] in input order to form the f[n] F [n]f[n] array in the figure above. Lx and Rx appear in the figure, representing the left and right boundaries of the x-th interval.

So this is the first step of discretization.

You can see that this new array, discretized, has multiple identical, unordered values, which is definitely not going to work. So the second step is to sort f[n]f[n]f[n] and remove the weight to form the sequence as shown below.

Ok, now all the discrete points are “close” to each other. But the “contiguous” array is just a subscript, not an actual value. So we need to create an array S[n]S[n]S[n] as large as the discretized array F [n]f[n]f[n] f[n] to store the values of the subscript before discretization.

Finally, in the third step, we can find the new coordinates of the original subscript after discretization through a find function, and assign a value. (Find is implemented using binary lookup.)

The process is shown as follows:

At this point, the discretization is complete. The discretized array is prefixed and treated as an ordinary array.

code

#include<iostream>
#include<algorithm>
#include<vector>
using namespace std;
typedef pair<int.int> PII;
const int N = 300010;
vector<PII> a,query;
vector<int> f;
int s[N];


int find(int x){
    return lower_bound(f.begin(),f.end(),x) - f.begin(a); }int main (a){
    int n,m;
    cin>>n>>m;
    
    // Step 1: Put the index to be discretized into the new array
    for(int i=0; i<n; i++){int x,c;
        cin>>x>>c;
        // Discretize the position x
        f.push_back(x);
        a.push_back({x,c});
    }
    for(int i=0; i<m; i++){int l,r;
        // Discretize positions L,r
        cin>>l>>r;
        query.push_back({l,r});
        f.push_back(l);
        f.push_back(r);
    }
    
    // Step 2: Sort, de-weight
    f.push_back(-2e9);
    sort(f.begin(),f.end());
    f.erase(unique(f.begin(),f.end()),f.end());/ / to heavy
    
    // Step 3: Array assignment
    for(int i=0; i<n; i++)s[find(a[i].first)]+=a[i].second;
    n = f.size(a);// Common prefix and processing
    for(int i = 1; i<n; i++)s[i]+=s[i- 1];
    for(int i=0; i<m; i++){ cout<< s[find(query[i].second)] - s[find(query[i].first)- 1]<<endl;
    }
    return 0;
}
Copy the code

summary

In this case:

In the first step, the time complexity of the initial discretization subscript value is O(n).
In the second step, the time complexity of sorting and deduplication is O(nlogn).
The third step is to assign the new value S[n]S[n]S[n] with the time complexity O(nlogn).

So the complexity of discretization of data is O(nlogn).

Only see such a question may not be familiar enough, can take the following simple practice hand:

Movie Acwing105.

The resources

Why is data discretized? (Benefits of decentralization)

AcWing 802. Draw a picture to help you understand