Hash table is a relatively advanced data structure. The underlying implementation of HashMap in Java standard library is hash table, which is array associative linked list and red-black tree. The hash table implemented in this paper is a simplified version of HashMap, which is realized by array and red-black tree, but the basic idea is the same;

Because the hash table array plus linked list and red black tree of the underlying implementation is too complex, we in this paper hash table implementation is a simplified version of the underlying implementation, but the underlying idea of hash table is basically the same;

The hash table in this paper is realized with the help of TreeMap in the Java standard library. Because the bottom implementation of TreeMap is red-black tree, TreeMap[] array is used in this paper, so it can be considered that the bottom implementation of this paper is array plus red-black tree.

Hash conflict refers to the same array index value calculated by different elements. In this case, multiple values need to be stored in an array index. The solution is to make each position of the array a linked list or binary tree, which is also called the linked address method. In this paper, the way to deal with hash conflict is the chain address method.

The reason why you index an array with a prime that’s close to the size of the data, is because math shows that there’s less chance of hash collisions, and we’ll talk about the underlying implementation later;

1, hash table basic properties and methods

Capacity array of type int is a set of primes. When the size of the hash table data increases, we only need to take primes of the corresponding size in Capacity.

UpperTol and lowerTol are coefficients for scaling and scaling, which we’ll talk about later;

CapacityIndex is the index of the Capacity array. When the hash table data size increases, you only need to increase the capacityIndex value and remove the value from the capacity array.

A hashTable is an array of type TreeMap that holds data.

Size indicates the number of data in the hash table.

M is the size of the HashTable array;

Initialize the HashTable array by assigning M to capacity and capacityIndex.

The hash function is to find the index of the element in the array. The hash value is calculated using hashcode first, and then M is the prime number corresponding to the data size of the hash table.

GetSize returns the number of data in the hash table.

public class HashTable<K extends Comparable<K>, V> { private final int[] capacity = {53, 97, 193, 389, 769, 1543, 3079, 6151, 12289, 24593, 49157, 98317, 196613, 393241, 786433, 1572869, 3145739, 6291469, 12582917, 25165843, 50331653, 100663319, 201326611, 402653189, 805306457, 1610612741}; private static final int upperTol = 10; private static final int lowerTol = 2; private int capacityIndex = 0; private TreeMap<K, V>[] hashtable; private int size; private int M; public HashTable(){ this.M = capacity[capacityIndex]; size = 0; hashtable = new TreeMap[M]; for(int i = 0 ; i < M ; i ++) hashtable[i] = new TreeMap<>(); } private int hash(K key){ return (key.hashCode() & 0x7fffffff) % M; } public int getSize(){ return size; },,}Copy the code

2. Add elements to hash table

The hash function is used to calculate the array index of elements, and the index is used to find TreeMap in the HashTable array. If the key already exists in the map, it is overwritten directly; if it does not, it is directly added to the map. The underlying implementation of this map is a red-black tree, so the underlying implementation of our hash table can be considered as an array plus red-black tree implementation;

After adding elements, check whether the capacity needs to be expanded. The idea of capacity expansion is to increase the capacity index and then find the corresponding prime in the capacity array. This ensures that each capacity expansion is a prime corresponding to the data size of the hash table.

The resize function is also very simple. Create a TreeMap array and copy all the values from the original Map array to the new array. There are several points to note in the copying process. The first for loop iterates over the size of the original array, while the second foreach loop iterates over the size of the new array to hash the element. Finally, the HashTable reference points to the new array;

public void add(K key, V value){
    TreeMap<K, V> map = hashtable[hash(key)];
    if(map.containsKey(key))
        map.put(key, value);
    else{
        map.put(key, value);
        size ++;

        if(size >= upperTol * M && capacityIndex + 1 < capacity.length){
            capacityIndex ++;
            resize(capacity[capacityIndex]);
        }
    }
}

private void resize(int newM){
    TreeMap<K, V>[] newHashTable = new TreeMap[newM];
    for(int i = 0 ; i < newM ; i ++)
        newHashTable[i] = new TreeMap<>();

    int oldM = M;
    this.M = newM;
    for(int i = 0 ; i < oldM ; i ++){
        TreeMap<K, V> map = hashtable[i];
        for(K key: map.keySet())
            newHashTable[hash(key)].put(key, map.get(key));
    }

    this.hashtable = newHashTable;
}
Copy the code

3. Remove elements from the hash table

First, the hash function is used to calculate the index of the element in the array, and then the index is used to find the corresponding map in the HashTable array. If the map contains the element, the element can be directly deleted from the map. Finally check whether you need to reduce capacity, the principle is exactly the same as expansion;

public V remove(K key){
    V ret = null;
    TreeMap<K, V> map = hashtable[hash(key)];
    if(map.containsKey(key)){
        ret = map.remove(key);
        size --;

        if(size < lowerTol * M && capacityIndex - 1 >= 0){
            capacityIndex --;
            resize(capacity[capacityIndex]);
        }
    }
    return ret;
}
Copy the code

Find and modify elements from the hash table

The search and modification logic is basically the same. First, the hash function is used to calculate the index of the element in the array, and then the index is used to find the corresponding map in the HashTable array. Finally, the put function of the map is used to modify the element. Find elements using the Map’s containsKey or get functions.

public void set(K key, V value){ TreeMap<K, V> map = hashtable[hash(key)]; if(! map.containsKey(key)) throw new IllegalArgumentException(key + " doesn't exist!" ); map.put(key, value); } public boolean contains(K key){ return hashtable[hash(key)].containsKey(key); } public V get(K key){ return hashtable[hash(key)].get(key); }Copy the code

5, hash table complete code

import java.util.TreeMap; public class HashTable<K extends Comparable<K>, V> { private final int[] capacity = {53, 97, 193, 389, 769, 1543, 3079, 6151, 12289, 24593, 49157, 98317, 196613, 393241, 786433, 1572869, 3145739, 6291469, 12582917, 25165843, 50331653, 100663319, 201326611, 402653189, 805306457, 1610612741}; private static final int upperTol = 10; private static final int lowerTol = 2; private int capacityIndex = 0; private TreeMap<K, V>[] hashtable; private int size; private int M; public HashTable(){ this.M = capacity[capacityIndex]; size = 0; hashtable = new TreeMap[M]; for(int i = 0 ; i < M ; i ++) hashtable[i] = new TreeMap<>(); } private int hash(K key){ return (key.hashCode() & 0x7fffffff) % M; } public int getSize(){ return size; } public void add(K key, V value){ TreeMap<K, V> map = hashtable[hash(key)]; if(map.containsKey(key)) map.put(key, value); else{ map.put(key, value); size ++; if(size >= upperTol * M && capacityIndex + 1 < capacity.length){ capacityIndex ++; resize(capacity[capacityIndex]); } } } public V remove(K key){ V ret = null; TreeMap<K, V> map = hashtable[hash(key)]; if(map.containsKey(key)){ ret = map.remove(key); size --; if(size < lowerTol * M && capacityIndex - 1 >= 0){ capacityIndex --; resize(capacity[capacityIndex]); } } return ret; } public void set(K key, V value){ TreeMap<K, V> map = hashtable[hash(key)]; if(! map.containsKey(key)) throw new IllegalArgumentException(key + " doesn't exist!" ); map.put(key, value); } public boolean contains(K key){ return hashtable[hash(key)].containsKey(key); } public V get(K key){ return hashtable[hash(key)].get(key); } private void resize(int newM){ TreeMap<K, V>[] newHashTable = new TreeMap[newM]; for(int i = 0 ; i < newM ; i ++) newHashTable[i] = new TreeMap<>(); int oldM = M; this.M = newM; for(int i = 0 ; i < oldM ; i ++){ TreeMap<K, V> map = hashtable[i]; for(K key: map.keySet()) newHashTable[hash(key)].put(key, map.get(key)); } this.hashtable = newHashTable; }}Copy the code