### The introduction

This section starts with a headline interview question: how to design hash functions and how to resolve conflicts. It is explained step by step from the following aspects:

- What is a hash table?
- What is a hash function?
- What are the common hash functions?
- How can conflicts be resolved?
- Dynamic expansion of hash tables
- answer
- + interview questions

### Hash table Hash table Hash table

Different from the linear table we introduced before, all the data is stored in order. When we need to search for a certain data in the linear table, when the linear table is too long, the data to be searched is sorted later, it takes a lot of time, resulting in poor search performance.

For example, if you want to find a certain student through the student id, suppose there are N students, do you want to find one by one, this time complexity is O(n), low efficiency. Of course, you can also use binary search algorithm, which is O(logn) time, is there a more efficient solution?

We know that the subscript search time of the array is O(1), if we store the student number in the array, it will be much easier, we can directly find the corresponding student by subscript (key).

But in everyday life, key is usually given a specific meaning, using 0,1,2… It’s too easy. Student id number usually needs to add information such as grade and class. Student ID number 010121 stands for Grade 1, Class 1, and Number 21. We know the student number of a certain student can directly find the corresponding student.

That’s the hash! Through the given student number, access a conversion algorithm (the method of converting student number 010121 to Grade 1, Class 1, No. 21), get the address of the corresponding student (Grade 1, Class 1, No. 21).

The conversion algorithm is called the Hash function (Hash function). The given key is called the keyword, and the value of the keyword calculated by the Hash function is called the Hash value (Hash value). The retrieval value is obtained by hashing the value into the ** Hash table **.

The diagram below:

In other words, the hash function is given a key value and returns the address of the value in the table.

```
/ / a hash table
function HashTable() {
let table = []
this.put = function(key, value) {}
this.get = function(key) {}
this.remove = function(key) {}}Copy the code
```

Continue to look at the above student number example, each student corresponds to a student number, if there are more students, if there are 10W, then we need to store there

- 10W Student numbers. Each student number contains 6 decimal digits. Each decimal number is represented by 4 bits (1 byte =8 bits).
- The hash function
- 10W student information

This requires more than 100000 * 6/2/1024 = 292.97 KB storage capacity for the storage of each student’s student number, so, hash table is a space for time storage structure, is a more common way to improve the efficiency of the algorithm, but the required space is too large will also make people headache, So it’s often a trade-off between the two.

### Second, hash function

Here, it is important to understand that the principles for constructing hash functions are as follows:

- The hash value is non-negative: common student numbers and memory addressing require that the hash value cannot be negative
- If the key value is the same, the hash value calculated by the hash function must be the same
- The hash value calculated by the hash function may not be different depending on the key value

Recently, the school plans to build a library for students’ self-study. If each student is provided with a separate small desk, it will need 10W, which is obviously impossible. So, is there any way to obtain the search efficiency of O(1) without paying too much space cost?

The hash function provides this solution, 10W is many, but if we divide by 100 nan, we get 0 to 999, which is easy to find, and we don’t need that many tables.

The corresponding hash function is:

```
function hashTables(key) {
return Math.floor(key / 100)}Copy the code
```

But in the actual development application, the scene can not be so simple, there may be many ways to implement the hash function, for example, the hash function can also be:

```
function hashTables(key) {
return key >= 1000 ? 999 : key
}
Copy the code
```

This implemented hash function has a much higher probability of collisions than the last one at table 999, so choosing a hash function that performs well is critical

####1. Create better hash functions

A well-behaved hash function can dramatically improve the performance of our code, with faster lookup, insert, delete, etc., fewer collisions, and less storage space. So building a high-performance hash function is critical.

A good hash function has the following basic requirements:

- Easy to compute: It should be easy to compute and not be the algorithm itself.
- Uniform distribution: It should provide uniform distribution in the hash table and should not lead to clustering.
- Less conflict: Conflicts occur when pairs of elements are mapped to the same hash value. These should be avoided.

#### 2. Common hash functions

- Direct addressing: Take the value of a keyword or a linear function of the keyword as a hash address.
- Numerical analysis: Through the analysis of the data, find the less conflicting parts of the data, and construct hash addresses. For example, the student id of students, usually of the same class, the first part of which does not differ much, so the last part is used to construct the hash address.
- Square middle method: When it is impossible to determine which bits of the keyword are relatively evenly distributed, you can first calculate the square value of the keyword, and then take the middle bits of the square value as the hash address. This is because the squared middle bits are related to each of the keywords, so different keywords have a high probability of producing different hash addresses.
- Random number method: a random function that takes the random value of the keyword as the hash address. This method is usually used when the keyword length is different.
- The remainder p is the hash address after the keyword is divided by a number m not larger than the table length N of the hash table. This method can also be used after using other methods. The choice of m is very important for this function, so you can either take a prime number or you can just use n.

Note: No matter how robust a hash function is, conflicts are bound to occur. Therefore, in order to maintain hash table performance, it is important to manage conflicts through various conflict resolution techniques.

For example, there was a problem in the last meeting. Students with student numbers 011111 and 021111 were both 111 after dividing by 100, which caused a conflict.

### Conflict resolution

In hashes, collisions are inevitable. So how do you resolve conflict?

There are several common methods of conflict resolution:

- Open address (also known as open addressing) : When we need to store a value, we hash the Key and find that the address already has a value. Do not place it at this address, otherwise the previous mapping will be overwritten. At this point, a probe is performed on the calculated address and then hashed, such as moving an address back, if no one is occupied, the address is used. If the maximum length is exceeded, the total length can be mod. The address moved here is the increment of the order when the collision occurred.
- Chain address method: chain address method is actually the Key hash after the value of the same address, do a linked list. In fact, many high-level language implementations use this approach to handle conflicts, which we’ll focus on later.
- Rehash: After a conflict, use the other parts of the keyword to continue computing the address. If there is still a conflict, use the other parts to continue computing the address. The disadvantage of this approach is that time increases.
- Create a public overflow zone: Create a public overflow zone and place new addresses in the public overflow zone when there are conflicting addresses.

Here we introduce two of the simplest: linear probing in open addressing, and chained addressing.

#### 1. Linear detection

Linear detection is the simplest method of open addressing. When adding a new element to the hash table, if the index is already occupied, try the index + 1 position, if the index + 1 position is also occupied, try the index + 2 position, Similarly, if no free space is found after trying to reach the end of the table, start at the top of the table and continue trying until it is placed into the hash table.

The diagram below:

If the value is null, set it to undefined. If the value is not equal, continue to compare index + 1, and so on, until the entire hash is equal or traversed.

If the hash value calculated by the hash function is equal to the hash value to be searched, if the hash value is equal, the search is complete, and if the hash value is not equal, the search continues until the idle node is encountered (undefined node is ignored), and the search fails.

It’s simple, but it has its limitations. As the hash table grows larger and larger, the chance of collisions increases.

**The worst-case time is O(n).**

#### 2. Chain address method

The chain address is also very simple. It establishes a linked list for each node in the hash table. The key word is converted into a hash value through the hash function.

The diagram below:

**Insert: Insert a piece of data into the corresponding linked list, time complexity O(1)**

**Search or delete: From the header, search and delete time complexity is O(k), k is the length of the list**

### 4. Dynamic capacity expansion

In JavaScript, when an array is pushed, if the array is not large enough, JavaScript dynamically expands the array. The new capacity is 1.5 times the old capacity plus 16.

As the hash values are added to the hash table, the data in the hash table slows down and the probability of conflicts increases. The time complexity of operations such as search, insert, and delete increases, and the hash table needs to be dynamically expanded.

### Answer the opening question

How to design hash functions and how to resolve conflicts are important problems in the study of hash tables.

How do I design a hash function?

A good hash function has the following basic requirements:

- Easy to compute: It should be easy to compute and not be the algorithm itself.
- Uniform distribution: It should provide uniform distribution in the hash table and should not lead to clustering.
- Less conflict: Conflicts occur when pairs of elements are mapped to the same hash value. These should be avoided.

How to resolve conflicts?

There are several common methods of conflict resolution:

- Open address (also known as open addressing) : When we need to store a value, we hash the Key and find that the address already has a value. Do not place it at this address, otherwise the previous mapping will be overwritten. At this point, a probe is performed on the calculated address and then hashed, such as moving an address back, if no one is occupied, the address is used. If the maximum length is exceeded, the total length can be mod. The address moved here is the increment of the order when the collision occurred.
- Chain address method: chain address method is actually the Key hash after the value of the same address, do a linked list. In fact, many high-level language implementations use this approach to handle conflicts, which we’ll focus on later.
- Rehash: After a conflict, use the other parts of the keyword to continue computing the address. If there is still a conflict, use the other parts to continue computing the address. The disadvantage of this approach is that time increases.
- Create a public overflow zone: Create a public overflow zone and place new addresses in the public overflow zone when there are conflicting addresses.

### Common hash table problems

What we’ve already brushed:

- Tencent & LeetCode349: Given two arrays, write a function to calculate their intersection
- Byte &leetcode1: sum of two numbers
- Tencent &leetcode15: sum of three numbers

Brush today with leetcode380: constant time to insert, delete and get random elements

#### Leetcode380: Constant time to insert, delete and get random elements

Design a data structure that can perform the following operations at average time complexity O(1).

`insert(val)`

: Inserts the item into the collection when the element val does not exist.`remove(val)`

Removes the item from the collection when the: element val exists.`getRandom`

: returns a random item from an existing collection. Each element should have**Equal probability**Is returned.

**Example:**

```
// Initialize an empty collection.
RandomizedSet randomSet = new RandomizedSet();
// Insert 1 into the set. Returning true indicates that 1 was successfully inserted.
randomSet.insert(1);
// Returns false to indicate that 2 does not exist in the collection.
randomSet.remove(2);
// Insert 2 into the set. Returns true. The collection now contains [1,2].
randomSet.insert(2);
// getRandom should randomly return 1 or 2.
randomSet.getRandom();
// Remove 1 from the collection, return true. The set now contains [2].
randomSet.remove(1);
// 2 is already in the collection, so return false.
randomSet.insert(2);
Since 2 is the only number in the set, getRandom always returns 2.
randomSet.getRandom();
Copy the code
```

Please refer answers to github.com/sisterAn/Ja… , here we submitted the front-end algorithm used in the series of articles and the title (has been solved), welcome star, if you feel good, click on the support to see ðŸ˜Š

### Seven, past wonderful

- Front-end advanced algorithm 7: small white can understand the tree and binary tree
- Front-end advanced algorithm 6: queue, double-ended queue, sliding window and matching algorithm
- Front-end advanced algorithm: common algorithm problems and perfect problem solutions
- Video interview uHF online programming questions, enough to handle most companies
- 10 questions and 10 answers, with a quick introduction to front-end algorithms
- Front-end advanced Algorithm 5: Fully read the stack structure used by the front-end (+ LeetCode
- Front-end advanced algorithm 4: Linked lists are so simple
- Front-end advanced algorithm 3: Learning the LRU algorithm from the browser cache elimination strategy and Vue’s keep-alive
- Bottle jun front-end advanced algorithm camp first week summary
- Front-end advanced algorithm 2: JavaScript array from Chrome V8 source code
- Front-end advanced algorithm 1: How to analyze and count the execution efficiency and resource consumption of algorithms?

### The first phase of the front-end algorithm training camp is free to join

Scan code to join the front-end algorithm training camp, from 0 to 1 to build a complete data structure and algorithm system!

Here, I not only introduce algorithms, but also combine algorithms with various front-end areas, including browsers, HTTP, V8, React, Vue source code, and so on.

Here, you can learn a big factory algorithm (Ali, Tencent, Baidu, byte, etc.) or leetcode every day, Aquarius will solve it the next day!

**Scan code into the camp to learn! Scan the bottom of the TWO-DIMENSIONAL code, in the public account “front-end bottle jun” reply “algorithm” automatically pull you into the group learning**