This is the seventh day of my participation in the First Challenge 2022

The Hash table

Such as Java HashMap, a (K,V) data storage structure. The underlying HashMap uses arrays + linked lists, mostly arrays, not all hash tables.

When adding an element to a Hash table, the element’s key is used by a specific Hash function to get a Hash value and calculate the storage location in the current array.

The hash function

A function that computes the hash value of an element’s key through a specific function. Also called the hash function.

Hash conflict

The keys of the same element will be hashed to the same value.

Different keys should have different hash values based on hash theory, but no such “perfect” hash function has ever been used in real life, so different keys may compute the same hash value. In addition, if the array is too small, multiple elements may be computed to the same location in the array. This is called a hash collision, also known as a hash collision.

The solution

The following are just a few of them, there are many solutions in reality, please refer to the relevant information.

Open addressing

The underlying storage is an array:

(P.S. had never understood the solution, but it was only a matter of a few words.)

  1. Hash to find the position in the array
  2. If the location already has elements, continue to find a free location

A hash conflict occurs when an element is already present at a location. To continue detecting the next free location, use the following methods:

  • Linear detection
  • Second detection
  • Double hash
  • .

Linear probe is used to illustrate the process of adding, querying and deleting, as follows:

The locations of black numbers indicate existing elements, and blue indicates free locations.

increase

When adding an element, there is a conflict, so search backwards until you find a location in space, as shown in the figure above. If the location of the new element is 1, it already exists, continue to search, 2 also exists, and after that, 3 is found.

10->11->1->2->3

The query

If 2 is found, it is found. If 2 is not found, look further. If 3 is free, it indicates that the element does not exist.

delete

When deleting an element, do not delete it directly because the query will be affected. The criteria for the end of the query may be empty. So when you delete, you just mark the element as deleted.

The difference between the other detection methods is that the step length is different. For example, the previous linear detection is one step at a time, the second detection is the square of the current step number, and the double Hash is the use of multiple Hash functions, and the next Hash function is changed if the conflict occurs.

The sample

ThreadLocalMap in Java uses linear probing, and if a hash conflict occurs, search down until a suitable location is found.

Here is part of the code for set:

            for(Entry e = tab[i]; e ! =null;
                 // nextIndex is I +1 or 0: ((I +1 < len)? i + 1 : 0e = tab[i = nextIndex(i, len)]) { ThreadLocal<? > k = e.get();if (k == key) {
                    // If it exists, replace it
                    e.value = value;
                    return;
                }
                // Write if it does not exist
                if (k == null) {
                    replaceStaleEntry(key, value, i);
                    return; }}Copy the code

The linked list method

The underlying storage is still an array, but each location actually stores a linked list, and when an element is added, it’s a node added to the list at that location.

new

Hash the key and calculate the position in the array. After comparison, if the element is not present in the current link, it will be added as a node of the linked list. If it exists, it will be replaced.

The query

The key hash computes the position in the array and compares each node on the list until it is found or empty.

delete

If you find that node, it’s just a regular linked list that deletes a node, and if you don’t find one, you don’t need to deal with it.

The sample

A HashMap in Java is basically an implementation of this approach, except that it is not necessarily a pure linked list. In the IMPLEMENTATION of JDK8 (not noted in other versions), the chain becomes a red-black tree when it reaches eight nodes.

In order to intuitively see the implementation, the part of the source copy over:

        Node<K,V>[] tab; Node<K,V> p; int n, i;
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        else {
            Node<K,V> e; K k;
            if(p.hash == hash && ((k = p.key) == key || (key ! =null && key.equals(k))))
                e = p;
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    if(e.hash == hash && ((k = e.key) == key || (key ! =null && key.equals(k))))
                        break; p = e; }}if(e ! =null) { // existing mapping for key
                V oldValue = e.value;
                if(! onlyIfAbsent || oldValue ==null)
                    e.value = value;
                afterNodeAccess(e);
                returnoldValue; }}Copy the code