HashMap is arguably the most popular hash table implementation in the world.

What exactly does a hash function do? How does it relate to a hash table? What are its main functions and application scenarios?

In simple terms, a hash function takes a binary string and, through some algorithm, produces a new binary string. The algorithm that does this computation is the hash function (also called a hash algorithm), and the value it produces is the hash value.
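
Java's built-in String.hashCode() is a handy concrete example: whatever the length of the input string, it deterministically produces a fixed-width 32-bit value.

    // String.hashCode() maps input of any length to one 32-bit int.
    int hash = "hello".hashCode();
    System.out.println(hash);  // prints 99162322, the same value every time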

A hash function needs several properties by design: 1. It must be one-way: you cannot recover the original value from the hash. For example, our passwords are stored on the server as MD5 digests; otherwise, if the database were stolen, everyone's password would be exposed. MD5 is in fact a hash algorithm.

2. It must be sensitive to the input: since any object in a computer is ultimately a string of bits, inputs that differ by even a single bit should produce different hash values.

3. With the above two conditions met, the probability of a hash collision should be as small as possible, and the algorithm should be fast.

The relationship between a hash table and a hash function is then obvious: a hash table exploits the O(1) random access of arrays. We run the data's key through the hash algorithm to get a hash value, and that value maps to a position in the array. When we later look the data up, we compute the same hash, jump straight to that array position, and the lookup costs O(1).
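
A minimal sketch of that idea (illustrative names, collisions ignored for now; not HashMap's real implementation):

    // Minimal sketch: hash the key, map the hash to an array slot, and
    // read or write that slot directly. Collisions are ignored here.
    class TinyTable {
        private final Object[] slots = new Object[16];

        private int indexFor(Object key) {
            // Mask off the sign bit so the index is non-negative, then wrap into the array.
            return (key.hashCode() & 0x7fffffff) % slots.length;
        }

        void put(Object key, Object value) {
            slots[indexFor(key)] = value;
        }

        Object get(Object key) {
            return slots[indexFor(key)];  // one O(1) jump, no scanning
        }
    }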

So the main purpose of a hash table is fast lookup, and its application scenarios all revolve around that. Here's a quick example:

1. Load balancing

The simplest load balancing we can think of is a table mapping each client IP address to a server IP address. Every time a client request arrives, we look up the server IP it should be assigned to and forward the request there. This is obviously bad: first, the table grows without bound and eats storage; second, lookups in such a huge table become slow; third, handling server expansion afterwards is very troublesome.

With a hash function this becomes much easier. We just hash the client IP address to get a value, take it modulo the number of servers, and the key-to-server relationship is established instantly.
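
A sketch of that routing rule (illustrative names, not any real load balancer's API):

    // Hash the client IP, wrap it into the server list, and route there.
    String pickServer(String clientIp, String[] servers) {
        int idx = (clientIp.hashCode() & 0x7fffffff) % servers.length;
        return servers[idx];  // the same client IP always maps to the same server
    }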

More examples include CRC checks in network protocols, P2P download algorithms, and even Git commit IDs, all of which use hash functions.

What about hash collisions? Are they bound to happen?

In short, no matter how well designed a hash function is, collisions are inevitable, because capacity is finite. Look up the pigeonhole principle (the "drawer principle"): if we have 5 oranges but only 4 drawers, some drawer must end up with 2 oranges.

The same is true for hash algorithms: the value a hash algorithm produces has a fixed length, so the number of possible values is finite. An MD5 digest, for example, is always 128 bits, so there are only 2^128 possible digests. Hash more than 2^128 distinct inputs through MD5 and at least two of them are guaranteed to collide!
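
A quick check in Java (using the standard MessageDigest API) that the digest length really is fixed, regardless of input size:

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;

    public class Md5Length {
        public static void main(String[] args) throws Exception {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            for (String input : new String[]{"a", "hello", "a much, much longer input string"}) {
                byte[] digest = md5.digest(input.getBytes(StandardCharsets.UTF_8));
                System.out.println(input + " -> " + digest.length + " bytes");  // always 16 bytes = 128 bits
            }
        }
    }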

How do we resolve hash collisions?

There are two main approaches: open addressing (used by ThreadLocalMap in Java) and chaining with linked lists (used by HashMap). The former is not used much nowadays; interested readers can look into it on their own. We'll focus on chaining: when collisions occur, all entries with the same hash value are stored in a linked list at that slot.

Linked lists, as everyone knows, have O(n) lookup. So you can imagine that with too many bad collisions, the hash table's O(1) lookup degrades to O(n). Newer JDK versions optimize HashMap for exactly this problem: when a bucket's collision list grows past 8 entries, it is automatically converted into a red-black tree, whose lookup is O(log n). Thinking of binary search gives a feel for how efficient that is: since 2^32 is about 4.3 billion, searching 4.2 billion items takes only about 32 steps. (We'll cover red-black trees separately later.)

What is the relationship between the load factor and dynamic resizing? How should we understand it?

Generally speaking, the load factor measures how full the table is (number of stored elements divided by array capacity). A table with a high load factor has few free slots left, so the probability of hash collisions rises. For hash tables that resolve collisions by chaining, frequent collisions mean ever-longer linked lists, and lookup efficiency degrades badly.

So when the load factor gets too large, we grow the hash table. Java's HashMap doubles its capacity: an array that starts at 16 grows to 32, then 64, and so on. (HashMap's default load factor is 0.75, so a capacity-16 table resizes once it holds more than 12 entries.)

Growing a plain array is nothing special, but growing a hash table also involves recomputing hash positions: because the index depends on the capacity, an entry's position in the enlarged table may differ from its old position. This step is called rehashing.

Dynamic resizing is therefore a time-consuming operation: allocate the new array, rehash every entry (that is, recompute its position in the array), and finally copy the data from the old array into the new one (the linked lists that resolve collisions have to be moved over as well).
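
A sketch of the rehash step (illustrative, ignoring collision chains for brevity; not HashMap's actual transfer code):

    // Allocate a doubled array, then recompute every key's slot against the new length.
    Object[] resize(Object[] oldSlots) {
        Object[] newSlots = new Object[oldSlots.length * 2];  // 16 -> 32, 32 -> 64, ...
        for (Object key : oldSlots) {
            if (key == null) continue;
            int idx = (key.hashCode() & 0x7fffffff) % newSlots.length;  // the "rehash" step
            newSlots[idx] = key;  // the new position may differ from the old one
        }
        return newSlots;
    }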

Does LinkedHashMap, as its name suggests, use a linked list as its underlying storage?

Nonsense, of course not. The underlying storage of a hash table must be an array; otherwise O(1) lookup is impossible. The biggest difference between LinkedHashMap and a regular HashMap is that LinkedHashMap maintains an additional doubly linked list on top of the array. Anyone familiar with Android knows that many open-source image caching frameworks use LinkedHashMap as their data structure. For example, an image caching framework needs to evict the least recently used images once the cache reaches its maximum size. That is a simple LRU policy, and LinkedHashMap makes it easy to implement.

In a nutshell, the structure of a HashMap looks like this:

The underlying storage is an array, with a singly linked list per slot for entries that share the same hash. So the basic storage node of a HashMap has four fields:

hash -> key -> value -> next

The next pointer is what builds the singly linked list when hash collisions occur.
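
A simplified node modeled on that layout, plus a bucket lookup that walks the collision chain (a sketch, not the actual JDK code):

    class Node {
        final int hash;    // cached hash of the key
        final Object key;
        Object value;
        Node next;         // next node in this bucket's collision chain

        Node(int hash, Object key, Object value, Node next) {
            this.hash = hash; this.key = key; this.value = value; this.next = next;
        }
    }

    // Walk one bucket's chain: O(length of the chain), which is why long chains hurt.
    Object getFromBucket(Node head, int hash, Object key) {
        for (Node n = head; n != null; n = n.next) {
            if (n.hash == hash && key.equals(n.key)) {
                return n.value;  // found in the chain
            }
        }
        return null;  // not in this bucket
    }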

LinkedHashMap, in order to support LRU, additionally maintains a doubly linked list over its entries. In other words:

The underlying storage of LinkedHashMap is also an array, but alongside the array it maintains a separate doubly linked list. This doubly linked list strings together the whole structure (array + singly linked list, the basic hash table implementation in Java), and it is the doubly linked list that implements the LRU ordering.

So you can guess that the basic storage node of LinkedHashMap is:

before pointer (doubly linked list) -> hash -> key -> value -> next -> after pointer (doubly linked list)
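
Here is a minimal LRU cache built on the real JDK hooks: the LinkedHashMap(capacity, loadFactor, accessOrder) constructor and the removeEldestEntry override:

    import java.util.LinkedHashMap;
    import java.util.Map;

    // accessOrder = true makes each get/put move the entry to the tail of the
    // doubly linked list, so the head is always the least recently used entry.
    class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int maxEntries;

        LruCache(int maxEntries) {
            super(16, 0.75f, true);  // true = order by access, not insertion
            this.maxEntries = maxEntries;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > maxEntries;  // evict the LRU entry once we exceed the cap
        }
    }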

What else does a hash table do?

Well, HashMap is used all over production code, so you can search for examples yourself. Here is a simple LeetCode problem as an illustration.

The Two Sum problem:

Given an array of integers and a target value, find the two numbers in the array that add up to the target.

You may assume that each input has exactly one answer, and that the same element cannot be reused.

Example:

Given nums = [2, 7, 11, 15], target = 9

nums[0] + nums[1] = 2 + 7 = 9

The approach that normally comes to mind is a double loop that brute-forces every pair, which is clearly O(n^2). In fact, a HashMap solves the two-number sum very conveniently, and the same idea extends to three-number and even four-number sums. For the two-number sum, using a map brings the complexity down to O(n).


import java.util.HashMap;
import java.util.Map;

    /**
     * Build a map whose keys are array values and whose values are array indices.
     * Walk the array once; for each element, compute wanted = target - nums[i].
     * If wanted is already in the map, we have found the pair: return the current
     * index together with the stored index of wanted. Otherwise, put the current
     * element and its index into the map and keep going.
     */
    public int[] twoSum(int[] nums, int target) {
        Map<Integer, Integer> map = new HashMap<>();
        for (int i = 0; i < nums.length; i++) {
            int wanted = target - nums[i];
            if (map.containsKey(wanted)) {
                return new int[]{i, map.get(wanted)};
            }
            map.put(nums[i], i);
        }
        return null;  // per the problem statement, this is never reached
    }
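
Tracing the example input: with nums = [2, 7, 11, 15] and target = 9, the loop stores 2 -> 0 on the first pass, then at i = 1 finds wanted = 9 - 7 = 2 already in the map and returns {1, 0}, the indices of 7 and 2.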