Hash table Hash

Goal:

  • basis
  • Design of hash functions
  • Hash conflict handling
    • Chain address method Separate Chaining
  • Implement your own hash table
  • Dynamic spatial processing and complexity of hash tables

1.1 Understanding hash tables

  • Each character corresponds to an index (hash function), O(1) lookup operation
  • It is difficult to ensure that each “key” is converted by a hash function to a different “index”.
  • Hash conflict
    • The designer of hash functions is very important
    • The more evenly distributed the “index” obtained by the “key” hash function, the better
  • Hash table fully embodies the classical idea of algorithm design: space for time. A hash table is a balance between time and space

1.2 Design of hash function

Every Object class in Java has an implementation of hashCode() by default.

  • The integer

    • Small positive integers are used directly

    • Small range of negative integers offset -100 100- > 0 200

    • Large integer

      • Modulus: die a prime number (relating to the mathematical theory, but more discussion) reference: planetmath.org/goodhashtab…

      The test code is as follows:

      int a = 42;
      System.out.println(((Integer)a).hashCode());
      
      int b = -42;
      System.out.println(((Integer)b).hashCode());
      Copy the code
  • floating-point

    Both are 32-bit and 64-bit binary representations, but the computer interprets them as floating-point numbers

    To an integer.

    double c = 3.1415926;
    System.out.println(((Double)c).hashCode());
    Copy the code
  • String, compound type

    • To an integer
    String d = "java";
    System.out.println(d.hashCode());
    Copy the code
  • The principle of

    • Consistency: Hash (a) == hash(b) if a == b
    • High efficiency: the calculation is efficient and simple
    • Uniformity: Hash values are evenly distributed

1.3 Handling of hash conflict

  • Chain address method

    • For the entire hash table, I open m Spaces, and for each space, because of the hash conflicts, I’m essentially storing a linked list

      For each position, we say save a linked list, which is actually the lookup table. After previous learning, we can also use the balanced tree structure (TreeMap, TreeSet) to achieve the lookup table.

      Before Java8, there was a linked list for each location. Starting with Java8, when hash collisions reached a certain level, each location changed from a linked list to a red-black tree.

  • TreeMap implementation

    The code is as follows:

    / / add
    public void add(K key, V value) {
        / / reuse TreeMap
        TreeMap<K, V> map = hashtable[hash(key)];
        if (map.containsKey(key)) {
           map.put(key, value);
        } else{ map.put(key, value); size++; }}/ / delete
    public V remove(K key){
        TreeMap<K, V> map = hashtable[hash(key)];
        V ret = null;
        if (map.containsKey(key)){
            ret = map.remove(key);
            size--;
        }
        return ret;
    }
    
    / / change
    public void set(K key, V value){
        TreeMap<K,V> map = hashtable[hash(key)];
        if(! map.containsKey(key)){throw new IllegalArgumentException(key + "doesn't exist");
        }
        map.put(key, value);
    }
    
    public boolean contains(K key){
        return hashtable[hash(key)].containsKey(key);
    }
    
    public V get(K key){
        return hashtable[hash(key)].get(key);
    }
    Copy the code
  • Complexity analysis

    • Suppose there are a total of M addresses. If you put them in a hash table, you get N elements, and when you implement them in a linked list, the time complexity of each address is O(N/M), and when you implement them in a balanced tree, the time complexity of each address is O(logN/M).

    • How do I get to order one? Dynamic space allocation.

      • On average, each address carries more elements than a certain level, that is, capacity expansion. N/M >= upperTol
      • The average capacity of each address is reduced to a certain extent. N/M < lowerTol

      The specific implementation code is as follows:

      private void resize(int newM){
          TreeMap<K, V>[] newHashTable = new TreeMap[newM];
          for (int i = 0; i<newM; i++){ newHashTable[i] =new TreeMap<>();
          }
      
          int oldM = M;
          this.M = newM;
          for (int i = 0; i<oldM; i++){ TreeMap<K, V> map = hashtable[i];for(K key:map.keySet()){ newHashTable[hash(key)].put(key, map.get(key)); }}this.hashtable = newHashTable;
      }
      
      // Modify the added element
      public void add(K key, V value) {
          / / reuse TreeMap
          TreeMap<K, V> map = hashtable[hash(key)];
          if (map.containsKey(key)) {
              map.put(key, value);
          } else {
              map.put(key, value);
              size++;
      
              if (size >= upperTol * M) {
                  resize(2* M); }}}/ / delete
      public V remove(K key) {
          TreeMap<K, V> map = hashtable[hash(key)];
          V ret = null;
          if (map.containsKey(key)) {
              ret = map.remove(key);
              size--;
      
              if (size < lowerTol * M && M/2>=intitCapacity) {
                  resize(M / 2); }}return ret;
      }
      Copy the code
    • For hash tables, the number of elements increases from N to upperTol*N, the address space doubles, and the average complexity is O(1). Each operation ranges from O(lowerTol) to O(upperTol)–>O(1).

    • Expand by 2 by M, not prime. Solution: according to planetmath.org/goodhashtab…

      / / prime tables
      private final int[] capacity
          = {53.97.193.389.769.1543.3079.6151.12289.24593.49157.98317.196613.393241.786433.1572869.3145739.6291469.12582917.25165843.50331653.100663319.201326611.402653189.805306457.1610612741};
      
      / / add
      public void add(K key, V value) {
          / / reuse TreeMap
          TreeMap<K, V> map = hashtable[hash(key)];
          if (map.containsKey(key)) {
              map.put(key, value);
          } else {
              map.put(key, value);
              size++;
      
              if (size >= upperTol * M && capacityIndex + 1< capacity.length) { capacityIndex++; resize(capacity[capacityIndex]); }}}/ / delete
      public V remove(K key) {
          TreeMap<K, V> map = hashtable[hash(key)];
          V ret = null;
          if (map.containsKey(key)) {
              ret = map.remove(key);
              size--;
      
              if (size < lowerTol * M && capacityIndex - 1> =0) { capacityIndex--; resize(capacity[capacityIndex]); }}return ret;
      }
      Copy the code
    • Conclusion:

      • The amortized complexity of hash tables is O(1), sacrificing orderality. The implementation of sets and maps can be linked lists, trees, or hash tables.
      • Ordered sets, ordered maps: balanced trees
      • Unordered set without mapping: hash table