Author: Lin Guanhong/The Ghost at my Fingertips

The Denver nuggets: https://juejin.cn/user/1785262612681997

Blog: http://www.cnblogs.com/linguanh/

Making: https://github.com/af913337456/

Tencent cloud column: https://cloud.tencent.com/developer/user/1148436/activities


Just to name a few solutions, there are many practical solutions.

These problems are faced with the following considerations:

  • There’s not enough memory to hold all the numbers.
  • The number of CPU cores in the machine is insufficient.
  • .

The meaning of asking these questions:

If you can answer these questions well, you must integrate all aspects of computer knowledge, from memory to data structures and even to hardware and methods. So far, I give it positioning is, comprehensive consideration of a programmer computer basic ability of the interview question.


One, find what doesn’t repeat

Find the non-repeating integers among the 250 million positive integers.

Idea 1:

Divide and conquer + HashMap(HashMap is not limited to the Java language)

Take 250 million whole numbers, do it in batches, say, 2.5 million batches, 100 batches. Each batch is iterated once, and stored in HashMap

, int1 corresponds to this number,int2 corresponds to the number of occurrences, if not, default is 1. Int2 > 1 = int2; int2 > 1 = int2; The next batch, and so on, comes to 100, and the rest are naturally non-repeating.
,int2>

All right, so let’s calculate the double complexity of this solution,time & space

Time complexity: 250W * 100 rounds + other lots. For multicore machines, thread operations can be initiated.

–> Key + Value: (250W * 4 bytes, 4Byte)/(1024*1024) ~ (Key + 9.5MB) memory

Idea 2:

Bitmap Bitmap method(A bit can only be 0 or 1)

For this problem, we can design every two bits to indicate the occurrence of a number. 00 indicates no occurrence, 01 indicates one occurrence, and 10 indicates multiple occurrence. 250 million positive integers, the first thing we need to know is positive integers, so we don’t have to worry about negative numbers, which is unsigned, and unsigned integers are four bytes.

So let’s take this as an example and start to calculateThe bitmapMemory.

1B = 8b, 4B = 32b, the largest integer it can represent is 2^32-1(no overflow), that is, we need 2^32-1 to 2^32 bits to represent the 250 million numbers. We said that each state has two bits, so it’s going to be 2 to the 32 times 2 bits.

2^32*2 bit, (2^32*2)/(1024*1024*8) = 1GB. Of course, we can also add the idea of divide and conquer, batch processing, do not directly use 1G, haha.

So how do we find this number when we do this? For example, if we read 64, the corresponding bit of 64 is 64*2=128, that is, the 127th and 128th bits indicate the state of its occurrence. And so on. Whenever we read a number, we go to the corresponding bit, first read the value of the bit, and then record, already 01, again, should be changed to 10. Finally we get the result by scanning the entire bitmap and subscript /2 to get the number if it is 10.

Two, find the one that shows up the most

Problem 1: Find the word that appears most frequently in a passage.

The second problem: 1 billion positive integers to find the most repeated 100 integers.

Idea 1:

Divide and conquer + HashMap

Yes, divide-and-conquer + HashMap can be used to solve many Top K problems.

As for question 1, it is actually relatively simple. This question is also the one I asked to write codes on the spot in Tencent’s third round of technical aspects in 2016. We can judge first that the article may be very long or very short, so we should specify a word mark as a word limit for a batch, such as 100 words. Every 100 characters is a batch processing limit, we read 100 first, less than 100 directly read all. When it is read, it is broken up into a string, such as an English text which is separated by Spaces and symbols. Use the split method to break it up. At this point we have an array of strings, String[] array, which we can refer to to find the solution to the non-repeating problem. Each batch is iterated once and stored in HashMap

, where String corresponds to the String of the number,Integer corresponds to the number of occurrences, and the largest is the most. So let’s just give you a Demo function.
,integer>

// LinGuanHong
public static void search(String limitText){
    String maxWord = "";
    int    maxTime = 0;
    String[] words = limitText.split(. | \ \ | ",");
    int length = words.length;
    HashMap<String,Integer> one = new HashMap<>();
    for(int j=0; j<length; j++){ Integer number = one.get(words[j]);if(number ! =null){
            number = number + 1;
            /** find times plus 1 */
            one.put(words[j],number);
            if(maxTime < number){ maxTime = number; maxWord = words[j]; }}else{
            /** if not found, assign 1 */
            one.put(words[j],1);
        }
    }
    System.out.println("maxTime is :"+maxTime+"; maxWord is :"+maxWord);
}
Copy the code

Divide-and-conquer + HashMap

In the previous example, the first thing we’re going to do is divide up the billion numbers again. Let’s say 1,000 copies, 100,000 copies. Then use HashMap

to count. In each count, we can find the largest 100 numbers, so why only find 100 out of 100,000? Because we have 1,000 copies, and the second largest of the other copies is probably the smallest one. So it all adds up to 100 times 1,000 numbers. OK, after we find the 100*1000 candidates, we can continue to divide and conquer, or we can sort them directly, if we sort them directly, we have 10W numbers. The sorting algorithm can choose quicksort and so on, the first 100 are the results.
,int>

Idea 2:

Bitmap Bitmap method

Problem one, skip. Bitmap is not recommended because it is not purely digital.

The second question:

There’s a foundation for finding examples that don’t repeat themselves. The maximum number of positive integers is 2^23-1. For this problem, we don’t need to multiply by 2, so we need 512MB memory. So we can use this bitmap to store everything. If it occurs once, the bit is 1; if it does not, the bit is 0. If it happens multiple times, we can’t add up to the bit, because it’s at most 1. In this case, we will find that if there are more than one occurrence, it will not be able to accumulate the record through the bit. Therefore, this problem is not suitable for bitmap method.

The rest of the

For example, ask: XXXXX find the largest one, the smallest one, the largest several, the smallest several. This can be done using divide-and-conquer + min heap/Max heap seconds.

After yi