This is the 7th day of my participation in the August Text Challenge.More challenges in August

This series of articles is a summary of personal learning, if you find mistakes or questions, welcome to comment!

This article is the seventh in the series of relearning data structures, which are as follows: 1. Time complexity and space complexity of algorithms 2. Relearn data structure — Linked list 3. Relearn data structure — Queue 4. Relearn data structure — Stack 5. Relearn data structure — Tree 6. Relearn data structure — Graph

Hash table

A hash table is also called a hash table (hash map, map, dictionary, associative array). A hash table is a data structure that is accessed directly based on a key-value. It is based on array, by one of the keywords is mapped to array subscript to accelerate the search speed, but also, and array data structure, linked lists, trees, and so on in the data structure of a keyword search, usually to traverse the entire data structure, which is O (N) time, but for a hash table, only O (1) time.

There is an important problem is how to convert the keyword to the index of the array, the conversion function is called the hash function (also known as the hash function), the conversion process is called hashing.

1. Hash functions

A hash function is a function that gives you back whatever number you give it (mapping input to numbers). The hash function needs to satisfy some requirements:

  • The hash function always maps the same input to the same index (the same input returns the same)
  • Hash functions map different inputs to different indexes.

We don’t need to implement hash tables ourselves; any good language does.

2. Hash table application

  • For finding: After storing data, you can find it quickly by providing a key
  • Prevent duplication: The corresponding data has a unique key (for example, voting is stored with the key obtained from the hash of the mobile phone number).
  • Cache: For example, a child always asks you questions about the stars, how far is the moon from the Earth, and every time you do a Baidu search, it might take a few minutes. Now suppose he always asks you how far is the moon from the Earth, and soon you remember 238,900 miles. So instead of going to Baidu search, you can tell him the answer directly. This is how caching works (websites remember data and don’t recalculate it)

3. The conflict

As mentioned earlier, hash tables need to be able to map different inputs to different locations, but it’s almost impossible to write such hash functions.

Let’s take a look at a simple example. Suppose you have an array of 26 locations. The hash function you use is very simple, it allocates the array position in alphabetical order

If you were to store the price of Apple in a hash table, you’d get the first slot, then you’d store the price of Banana in the second slot, then you’d store the price of Avocado in the first slot. That’s when you realize there’s a problem, and this location already stores the price of apples! This situation is called conflict: assigning the same position to two keys. If you store avocado’s price in this location, Apple’s price will be overwritten. If you query Apple’s price in the future, you will get Avocado’s price. This is a big problem and must be avoided. The easiest way to handle collisions is to store a linked list at the same location if two keys map to the same location.

In this example, Apple and Avocado are mapped to the same location, so a linked list is stored at that location. It’s still fast to look up the price of banana, but slower to look up the price of apples: you have to find apples in the corresponding linked list. If the list is short, it doesn’t matter, you just need to find a few elements. But suppose you store all goods that start with A.

It looks bad because the entire hash table is empty except for the first position, which contains a long list. All the elements in the hash table are in the linked list, which is just as bad as storing all the elements in a linked list to begin with: the hash table is slow.

There are two lessons here:

  • The hash function is important. The previous hash function maps all the keys to one location, and ideally, the hash function maps the keys evenly to different locations in the hash table.
  • If the hash table stores a long linked list, the speed of the hash table decreases dramatically. However, if the hash function used is good, these lists will not be very long!

Performance of 4.

Hash tables are compared to arrays and linked lists

On average, a hash table looks up (gets the value at a given index) as fast as an array, and inserts and deletes as fast as a linked list, so it has the best of both worlds! But in the worst case, the hash table’s various operations are slow. Therefore, avoiding the worst is critical when using hash tables. To do this, conflict needs to be avoided. To avoid conflict, you need to:

  • Lower fill factor: Adjust the length of the hash table once the fill factor is greater than 0.7.

    Fill factor = number of elements contained in the hash table/total number of positions

  • Good hash functions