Article source: W www.iamshuaidi.com, a programming website specializing in recruiting, interviews, and interviews

Question: I have 4 billion integers. Given a new integer, I need to determine whether the new integer is among the 4 billion integers. What do you do?

【 Consult the great God 】

Small history returned to the school, the situation of the interview and the computer institute of teacher Lu said once.

Small history busy pull lu teacher asked, why I said 8 times to load the data, the interviewer will say too slow?

Teacher Lu: Haha, loading data from disk is disk IO operation, it is very slow, you have to load such a large number of data, 8 times, I guess you can find a number of minutes or even hours.

Xiao Shi: What would you do if it was you?

Mr Lu: Actually, the interviewer has given you an obvious hint. He said that giving you a batch of machines implies that you can use distributed algorithms. You spread the data across eight machines, then you get a new one, all eight machines look for it, and then you aggregate the results.

Xiao Shi: How much faster would that be?

Miss Lu: It should be able to reach the second level. Stevie, you can analyze it for yourself.

Shi: Let me see… Oh, and because each machine can read data into memory at once, there is no need to load data back and forth between comparisons, so you can save on the overhead of loading data! That’s a good idea.

[Better solution]

Teacher Lv: actually this is not the best method, I still have a millisecond level method, do you want to know?

Xiao Shi: Of course, teach me.

Xiao Shi: Oh, right, so I’ll just apply for 4 billion bits, convert the new number into a bit, and then determine whether the bit is a 0 or a 1.

Teacher Lv: Xiao Shi, you should think clearly about the problem. If it is 4 billion bits, which ones are 0 and which ones are 1? If I have a new number, how do I know if it’s in the 4 billion digits?

Four billion bits, four billion numbers, so every bit is one, that’s…

Miss Lu: Actually, if you think about it, the range of 32 bits of int is 2 to the 32 power, about 4.2 billion points. So you can apply 2 to the 32nd ones.

It means I covered the whole integer range. Oh, yeah. So we can do this, 1 is the first bit, 2 is the second bit, 2 to the 32nd is the last bit. Of the 4 billion numbers, the ones that exist are in the corresponding position 1, and the other bits are 0.

Miss Lu: That’s right. What about a new number?

Shi: The new number goes to the corresponding bit. For example, if there is a 1234, look for the 1234 bit. If it is 1, it exists; if it is 0, it does not exist.

Mr Lv: Yes, in that case, how much memory do YOU need?

Shi: let me see, 2 to the 32nd bits equals 2 to the 29th bytes. Wow, only 500MB. That’s a lot of memory savings.

Xiao Shi: How did you come up with such a powerful algorithm?

Lu: Actually, this is a very famous big data algorithm, called bitmap, or Bitmap in English. As the name implies, it is used to represent states with bits, thus saving space. I have a lecture tomorrow on bitmap, so you can listen to it.

[Miss Lu’s Class]

The next day, Lu teacher began to teach, he began to throw out the interview question that Small history met.

Miss Lu: Class, this question is an interview question of BAT company, do you have any ideas?

Voice just fell, egg brother stood up to answer. Egg brother is teacher Lu’s most satisfied pupil, known for active thinking.

Egghead: I think we can do that. First of all, the range of 32-bit ints is 4.2 billion, and some of the 4 billion integers must be continuous. We can first perform an external sort on the data, and then form a data structure with an initial number and a length to represent a contiguous number, for example.

If the numbers are 1, 2, 3, 4, 6, 7… This one, then, can be represented by (1,4) and (6,2), so that successive numbers are represented by two numbers. When a new number comes in, we use dichotomy to find it.

So, the worst-case scenario is 200 million + breakpoints, which is 200 million + structures, 8 bytes each, about 1.6 billion bytes, 1.6GB, can fit in memory.

Miss Lu: Well, very good, not only gave the plan, but also actively analyzed the space and feasibility.

After hearing this, Shi was deeply impressed. There is definitely more than one way to solve the problem. As long as you are willing to think, even if you have not learned the Bitmap algorithm, you can also have other ways to solve the problem.

【 】 after class

After class, xiao Shi found Lu teacher again.

Miss Lu: But your comprehension ability is still very strong. You can understand a lot of things easily, which is not something everyone can do.

Hello, everyone, I am handsome, is also more interview, face classics, algorithms and other core articles, click on my avatar, you will find that met too late, if you feel the article however, also don’t mean you like oh, hee hee

More interview articles:

1. How can I tell if a number is among the 4 billion integers?

2. How to achieve a stack that can get the minimum value?

3. Remember a Shopee interview: optimal solution of minimum stack

4. Why stable sort and unstable sort?

5. How to solve the problem of Huarong Dao by programming?

6. How to find the longest substring in a string?

7. How to count the number of words with specific prefixes in 500W words?

8. How do you find the largest 1,000 numbers in a billion?

9. How to program to get the most year-end bonus?

10. How to program the number of friends?

11. How to design a self-learning backgammon AI?

12. Why does MySQL database use B+ tree to store index?

13. Remember a Bytedance interview: Anamorphic list inversion

14. Remember a hand-tearing algorithm interview: Bytedance’s interviewer quitted me

15. Memorize an Ali written test: one line of code solves the Joseph ring problem

16. Remember an Ali interview: The interview hangs on the design of the LRU cache algorithm

17. Write down a netease written test: the application of prefix and

18. How is sensitive word filtering implemented in the game?

19. How can I find the most frequent integer among 20/40/8 billion integers using only 2GB of memory