GitHub source code sharing

Wechat search: code nong StayUp

Home address: goZhuyinglong.github. IO

Source: github.com/gozhuyinglo…

1. What is a hash table

A Hash Table, also known as a Hash Table, is a data structure that calculates the address stored in the Table based on a given Key. That is, hash tables establish a direct mapping between keywords and stored addresses, mapping keywords to addresses recorded in the table, which speeds up lookups.

Hash (key)=v, where key is the keyword, hash() is the hash function, and v is the hash address.

1.1 Related Terms

  • Key: The keyword is the lookup object of the hash table and the parameter of the hash function. This can be the record itself or a unique index of the record, depending on the design of the hash table.
  • Hash functions: Functions that map keywords to the appropriate storage address in the Hash table are called Hash functions, also known as Hash functions.
  • Hash Address: The Address calculated by the Hash function of a keyword is called a Hash Address, also called a Hash value or Hash value.
  • Hash tables: Tables that use Hash techniques to store data records are called Hash tables.
  • Hash Collision: When different keywords have the same Hash address, this phenomenon is called Collision, also called Hash Collision. These keywords are synonyms for the hash function.
  • Table length: The size of the hash table, that is, the number of hash addresses that can be stored. Let’s call it m.

1.2 Take an example

A popular example is the address book in our mobile phones. The address book stores the names and phone numbers of contacts in alphabetical order. For example, if we look for “Ying zheng”, we can quickly locate “ying zheng” through the letter “Y”.

  • In this case, the name “Ying zheng” is the key word.
  • The algorithm for picking the first letter of a name is the hash function. Hash (ying Zheng) = Y.
  • through
  • When we locate the address of “Y”, there are not only “ying”, but also “winning inhumans”, that is, the address of “Y” can be found through the keyword “winning inhumans”. This phenomenon is called conflict. That is, hash(ying Zheng) = hash(win alien) = Y.
  • “Chin” and “win inhumans” are synonyms for this hash function.
  • The address book used to store contacts is called a hash table.

1.3 summary

This chapter introduces the concept of hash tables and related terms, and uses an address book example to deepen understanding of hash tables. But we still have some problems to solve, and that is “hashing”. While we can minimize the number of collisions by carefully designing hash functions, we still need to provide ways to resolve conflicts. Later sections will cover the design of hash functions and conflict resolution in more detail.

2. Design of hash function

The design principle of hash function should follow the principle of simple calculation and uniform hash address distribution. Below we introduce several commonly used hash function construction methods.

2.1 Direct addressing method

Take the keyword or a linear function value of the keyword as a hash address. Hash (key) = key or hash(key) = A * key + B, where A and B are constants. Such hash functions are also called self functions.

Hash (key) = key-1 hash(key) = key-1 hash(key) = key-1 hash(key) = key-1 hash(key) = key-1 hash(key) = key-1

The advantage of direct addressing is that it is simple, uniform and does not cause conflicts. But since this is a way of swapping space for time, it works well for small, continuous tables. Such as age, job level and so on.

2.2 Numerical analysis

By analyzing a set of data, the possible keywords in the data are known in advance, and several digits of the keywords can be taken to form a hash address. The piece of data needs to occur in its digits less frequently, so there is less chance of conflict.

For example, the first three digits of a mobile phone number are network identification numbers, the middle four digits are area codes, and the last four digits are real user numbers. In a certain company, the first seven digits of an employee’s mobile phone number have a high probability of conflict, while the last four digits have a low probability of conflict. Therefore, the last four digits can be used as the hash address.

2.3 Square method

Take the middle bits of the keyword squared as the hash address. This method is a method to generate pseudorandom numbers, proposed by Von Neumann in 1946.

2.4 fold method

Divide the keyword into parts with the same number of digits (the last part can have different numbers of digits), and then take the sum of these parts as the hash address. This method is suitable for a large number of keywords.

Key words for 1234567890, for example, the hash table long for three: (1) key words can be divided into 4 groups: 123 | 456 | 789 | 0 (2) superposition sum them: 123 + 456 + 789 + 0 = 1368 (3) give carry and: 368, is the final hash address

2.5 Divisor remainder method

Assuming that the table length of the hash table is M, take a prime number P that is not larger than m but closest to or equal to m, and convert it into a hash address by modulo operation of the key word on P. The hash function is hash(key) = key % p, where p ≤\leq≤ m.

It can not only take the modulus of the key word directly, but also take the modulus after the operation of folding method, square method and so on. The choice of P is very important, if the choice is not good, conflicts will easily occur.

3. Hash conflict handling method

There are several ways to resolve conflicts, and here we discuss two of the simplest: open addressing and chain addressing (zipper).

3.1 Open addressing method

Taking the hash address of the conflict as an independent variable, a new idle hash address can be obtained by some conflict resolution function. That is, when a conflict occurs, the next free hash address is searched. According to different ways of searching, there are several methods, which are introduced below.

3.1.1 Linear detection method

When a conflict occurs, the next address in the table is sequentially detected until a free (free) address is detected and the record is inserted. If the end address m-1 is detected, the next end address is 0. When the table is full, the probe fails.

For example, the hash function in the hash table below uses a divisor remainder method (m=10, p=10). So when we insert 12, 13 and 24 in sequence, it will be successfully inserted into the corresponding address. When we insert 34 and 44 again, there will be conflicts. We use the line detection method to continue to detect the next address and insert it into the idle address.

Linear probing causes a large number of elements to “cluster” at adjacent hash addresses, making us constantly dealing with conflicts, which greatly reduces lookup and storage efficiency.

3.1.2 square detection method

Square detection method is a conflict resolution method to eliminate the aggregation problem in linear detection method. Let the conflicting address be d, then the new address sequence obtained using the square probe is d ±\ PM ± 1^2, d ±\ PM ± 222^222, d ±\ PM ± 323^232… D ±\ PM ± key2key^2key (key ≤\leq≤ m/2)

4 + 121^212 = 5, 4-121^212 = 3, 4 + 222^222 = 8, 4-222^222 = 0, 4-222^222 = 0 5, 2, 8, 0, so 34 should be put at 5. The 44 should be at the 8.

The disadvantage of the square detection method is that it cannot detect all addresses on the hash table, but it can detect at least half of them.

3.1.3 Pseudo-random detection method

Pseudorandom detection method is that when address conflict occurs, address increment is a sequence of pseudorandom numbers.

Pseudorandom means that if we set the random seed to be the same, then we can call the random function repeatedly to generate the sequence of numbers that will not repeat. When we look for the same random seed, it will get the same sequence of numbers every time. The same address can of course get the same hash address.

3.1.4 double hash

A double hash, as its name implies, uses two hash functions. When the first hash function results in a conflict of addresses, the second hash function is executed to calculate the address increment for that keyword.

A common algorithm is :(hash1(key) + I * hash2(key)) mod m, where I is the number of collisions, hash1() is the first hash function, hash2() is the second hash function, and m is the hash table size. When a conflict occurs, we address it by repeatedly increasing the step size I.

Again, take the example above. The first hash function is set to hash1(key) = key mod 10 and the second hash function is set to hash2(key) = key mod 3.

The hash address of keyword 34 can be calculated as follows: (34 mod 10 + 1 * (34 mod 3)) mod 10 = (4 + 1 * 1) mod 10 = 5. The hash address of keyword 44 is :(44 mod 10 + 2 * (44 mod 3)) mod 10 = (4 + 2 * 2) mod 10 = 8

3.2 Chain address method

Linked address method, also known as zipper method, is a method to solve conflicts by using linked lists, that is, the keyword records with the same hash address are stored in the same single linked list, which is called synonym linked list. Each hash address has a corresponding linked list.

Using the above example, the storage model using the chain address method is shown below.

4. Code implementation

We use the chain address method to implement the hash table in the above example, and the hash function uses the division and remainder method.

4.1 node class

Because linked lists are not the focus of this article, a simple node class is designed here. For more information about linked lists, see: Linked lists (single, double, ring)

class Node {
    private final int key; / / key
    private Node next; // Next node

    public Node(int key) {
        this.key = key;
    }

    public int getKey(a) {
        return key;
    }

    public Node getNext(a) {
        return next;
    }

    public void setNext(Node next) {
        this.next = next; }}Copy the code

4.2 Hash table classes

The hash class uses arrays to store table data, points to the head node of the list, and its hash function uses a divisor remainder method.

This class mainly realizes the add keyword, delete keyword and match keyword of hash table.

class HashTable {

    private final int size; // Hash table size
    private final Node[] table; / / a hash table

    private HashTable(int size) {
        this.size = size;
        this.table = new Node[size];
    }

    /** * hash function - remainder method **@param key
         * @return* /
    private int hash(int key) {
        return key % size;
    }

    /** * add the keyword **@param key
         */
    public void add(int key) {
        int hashAddress = hash(key);
        if (table[hashAddress] == null) {
            table[hashAddress] = new Node(key);
            return;
        }
        add(table[hashAddress], key);
    }

    /** * add keyword - recursive **@param node
         * @param key
         */
    private void add(Node node, int key) {
        if (node.getNext() == null) {
            node.setNext(new Node(key));
            return;
        }
        add(node.getNext(), key);
    }

    /** * matches the keyword **@param key
         * @return* /
    public boolean contains(int key) {
        int hashAddress = hash(key);
        Node headNode = table[hashAddress];
        if (headNode == null) {
            return false;
        }

        return contains(headNode, key);
    }

    /** * match keyword - recursive **@param node
         * @param key
         * @return* /
    private boolean contains(Node node, int key) {
        if (node == null) {
            return false;
        }
        if (node.getKey() == key) {
            return true;
        }
        return contains(node.getNext(), key);
    }

    /** * remove the keyword **@param key
         */
    public void remove(int key) {
        int hashAddress = hash(key);
        Node headNode = table[hashAddress];
        if (headNode == null) {
            return;
        }
        if (headNode.getKey() == key) {
            table[hashAddress] = headNode.getNext();
            return;
        }
        remove(headNode, key);
    }

    /** * remove keyword - recursive **@param node
         * @param key
         */
    private void remove(Node node, int key) {
        if (node.getNext() == null) {
            return;
        }
        if (node.getNext().getKey() == key) {
            node.setNext(node.getNext().getNext());
            return; } remove(node.getNext(), key); }}Copy the code

5. Complete code

For the complete code, please visit my Github. If it is helpful to you, you are welcome to give a Star. Thank you!

Github.com/gozhuyinglo…

Recommended reading

  • An array of
  • The sparse array
  • Linked list (single linked list, double linked list, ring linked list)
  • The stack
  • The queue
  • The tree
  • Binary tree
  • Binary Search tree (BST)
  • AVL tree (balanced binary tree)
  • B tree