
Suppose you have an array and you want to output one of its elements at random. Simply generate a random number between 0 and the array length minus 1, and use it as an index into the array to get a random entry. Every item in the collection has the same chance of being selected. This is called a uniform distribution.

But what if we don't want every entry to be equally likely? Suppose we're building a trivia game, and we want the questions the user previously got wrong to come up more often than the ones he or she got right. This is called weighted random distribution, sometimes called weighted random selection, and it can be implemented in a variety of ways.

Many real-world systems have similar requirements. In nginx, for example, if we want to control how many requests go to each server, we only need the following configuration in nginx.conf:

http {
    upstream cluster {
        server a weight=1;
        server b weight=2;
        server c weight=4;
    }
    ...
}

For another example, in the advertising industry where the author works, there is a similar product requirement: each advertisement is configured with a weight, and after the candidate ads are recalled, one must be selected according to its weight.

So, if we need to implement a weight allocator, how do we do it?

Goal

Suppose there are several numbers and their weights as follows:

weights = {
        'A': 2,
        'B': 4,
        'C': 3,
        'D': 1
}

That means we want B to appear twice as often as A, and A twice as often as D. In other words, out of ten random selections we would expect two to be A, four to be B, and so on (though with only ten selections the actual counts will almost certainly differ).

Implementation

Expansion

One of the simplest solutions is to expand our collection so that each item in it appears as many times as its weight. We start with the basic selection set and its weights:

weights = {
        'A': 2,
        'B': 4,
        'C': 3,
        'D': 1
}

We create a container that places each element in the same number of times as its associated weight. After this operation, the elements in the container are as follows:

['A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'D']

We can now get our weighted random selection from the list by generating a random number between 0 and the length of the list and using it as an index in the list.

For some requirements this implementation is sufficient, but when the weights are very large the scheme runs into trouble. If a weight is, say, 10 million or more, the expanded list consumes far too much memory, and another scheme is needed.

As a sanity check, running 100,000 selections against the weights above produces counts close to the expected 2:4:3:1 ratio:

{'A': 20140, 'C': 29880, 'B': 39986, 'D': 9994}

Advantages

  • Selection time is O(1), which is the fastest possible
  • It allows weights to be updated fairly easily and quickly. To reduce the weight of a choice, we simply scan the list and remove as many copies as necessary. Increasing a weight or adding a new option is even easier, because we can append any number of copies to the end of the list; the order in which they appear doesn't matter
  • The implementation is simple and easy to understand

Disadvantages

  • For large collections, or for large weight values, this obviously takes up too much memory. One possible optimization is to divide all weights by their greatest common divisor, but computing it takes extra processing time and makes weight updates slower
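The GCD reduction mentioned above might be sketched as follows (a hypothetical helper, not part of the article's implementations; it requires C++17 for std::gcd):

```cpp
#include <numeric>  // std::gcd (C++17)
#include <vector>

struct Item {
  char val;
  int weight;
};

// Divide every weight by the collection's greatest common divisor,
// shrinking the expanded list without changing the selection ratios.
void reduce_weights(std::vector<Item> &items) {
  int g = 0;
  for (const auto &item : items) {
    g = std::gcd(g, item.weight);  // gcd(0, x) == x, so the fold starts correctly
  }
  if (g > 1) {
    for (auto &item : items) {
      item.weight /= g;
    }
  }
}
```

With weights {2, 4, 6}, for example, the GCD is 2 and the weights shrink to {1, 2, 3}, halving the expanded list while preserving the 1:2:3 ratio.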

The complete code implementation is as follows:

struct Item {
  char val;
  int weight;
};

char select(const std::vector<Item> &items) {
  // Expand: insert each value as many times as its weight.
  // (In practice the expanded list would be built once and reused,
  // which is what makes selection O(1).)
  std::vector<char> v;
  for (const auto &elem : items) {
    v.insert(v.end(), elem.weight, elem.val);
  }

  // Uniformly pick an index into the expanded list.
  int idx = rand() % v.size();
  return v[idx];
}

Selection in place (unordered)

Instead of expanding the collection as we did above, we can keep it in its current form and simply simulate the expansion in a loop. To do this, we first need the total weight of the set.

int sum = 0;
for (auto elem : items) {
  sum += elem.weight;
}

Then we choose a random value between 0 and the total weight minus 1. We simulate the expanded set of the previous approach by looping through the elements and keeping a running sum of the weights seen so far. As soon as the running sum exceeds the random value, we have found our choice.

int rd = rand() % sum;
int s = 0;
for (auto elem : items) {
  s += elem.weight;
  if (s > rd) {  // the running sum has passed rd, so elem is the choice
    break;
  }
}

Advantages

Updating our weight set is very easy and fast. Adding and deleting items, and increasing and decreasing weights, are all equally quick: all we have to do is keep the total weight up to date as values or weights change. This method also uses as little memory as possible, since no elements are copied.

Disadvantages

Because of the added computation in the loop, selecting a random value is a little slower, and the larger the initial set, the slower it becomes. Selection complexity is O(n), where n is the number of elements in the set.

The complete code is as follows:

struct Item {
  char val;
  int weight;
};

char select(const std::vector<Item> &items) {
  int sum = 0;
  for (const auto &elem : items) {
    sum += elem.weight;
  }

  // rd is uniform in [0, sum - 1]; walk the running sum until it passes rd.
  int rd = rand() % sum;
  int s = 0;
  char res = items.back().val;
  for (const auto &elem : items) {
    s += elem.weight;
    if (s > rd) {
      res = elem.val;
      break;
    }
  }
  return res;
}

Selection in place (ordered)

In theory, we can speed up the previous in-place algorithm by sorting the set before selecting. Since the collection is sorted, we can scan it from the end in reverse order: the highest weights appear at the end of the set, and those are the values most likely to be selected, so on average we find our choice sooner. Whether we get a speed boost in practice depends on our initial weight set.

First, we sort the set by weight.

std::sort(items.begin(), items.end(), [](const Item &a, const Item &b) {
  return a.weight < b.weight;
});

After sorting, the contents are as follows:

[('D', 1), ('A', 2), ('C', 3), ('B', 4)]

Then make the weight selection, the code is as follows:

int rd = rand() % sum;
int s = sum;
char res;
for (int i = items.size() - 1; i >= 0; --i) {
  res = items[i].val;
  s -= items[i].weight;
  if (s <= rd) {
    break;
  }
}

return res;

Advantages

As long as at least one weight in our set is significantly greater than the others, this improves on the speed of the unsorted version.

Disadvantages

Performance does not necessarily improve: the cost of the sort may cancel out the gain from the shorter average scan.

The complete code is as follows:

struct Item {
  char val;
  int weight;
};

// Assumes items is already sorted by weight in ascending order.
char select(const std::vector<Item> &items) {
  int sum = 0;
  for (const auto &elem : items) {
    sum += elem.weight;
  }

  int rd = rand() % sum;
  int s = sum;
  char res;
  // Scan from the heaviest (last) element; stop once the remaining
  // weight drops to rd or below.
  for (int i = items.size() - 1; i >= 0; --i) {
    res = items[i].val;
    s -= items[i].weight;
    if (s <= rd) {
      break;
    }
  }
  return res;
}

Line segments (prefix sums)

Map each weight w[i] onto an interval of a line segment using prefix sums, then find which interval a random position on the segment falls into.

This solution is similar to the in-place (unsorted) approach; we give the complete code directly:

struct Item {
  char val;
  int weight;
};

char select(const std::vector<Item> &items) {
  int sum = 0;
  std::vector<int> v;  // v[i] holds the prefix sum of weights up to item i
  for (const auto &elem : items) {
    sum += elem.weight;
    v.push_back(sum);
  }

  int rd = rand() % sum;
  char res = items.back().val;
  for (std::size_t i = 0; i < v.size(); ++i) {
    if (rd < v[i]) {  // rd falls in item i's interval [v[i-1], v[i])
      res = items[i].val;
      break;
    }
  }
  return res;
}
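Because the prefix sums are sorted, the linear scan can be replaced by a binary search, bringing selection down to O(log n). This refinement is not in the article's code, but a sketch might look like:

```cpp
#include <algorithm>  // std::upper_bound
#include <cstdlib>
#include <vector>

struct Item {
  char val;
  int weight;
};

char select_binary(const std::vector<Item> &items) {
  int sum = 0;
  std::vector<int> v;  // sorted prefix sums
  for (const auto &elem : items) {
    sum += elem.weight;
    v.push_back(sum);
  }

  int rd = rand() % sum;
  // The first prefix sum strictly greater than rd marks the chosen interval.
  auto it = std::upper_bound(v.begin(), v.end(), rd);
  return items[it - v.begin()].val;
}
```

If selection is performed many times over a fixed set, the prefix-sum vector would of course be built once and reused, so each selection costs only the binary search.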

Conclusion

As you can see, there are many ways to perform weighted random selection, and each scheme has its own advantages and disadvantages. If the goal is fast selection and you have a small number of elements with modest weights, use the expansion scheme. If you need to keep memory usage down, avoid expansion. For simplicity, expansion, in-place (unsorted), and line segments are all reasonable choices.