This article is a translation of Swift Algorithm Club.

Swift Algorithm Club is an open source project produced by Raywenderlich.com to realize algorithms and data structures by Swift. Currently, there are 18000+⭐️ on GitHub. I have made a brief statistics, and there are about 100 algorithms and data structures. It’s basically a good resource for iOSer learning algorithms and data structures.

🐙andyRon/ swift-algorithm-club-CN is my project of learning and translating for Swift Algorithm Club. Due to the limited capacity, if you find any errors or improper translation, please correct them. Welcome to Pull Request. Also welcome interested, have time partners to participate in translation and learning 🤓. Of course, we welcome ⭐️, 🤩🤩 ️ ⭐ ⭐.

The translation of this article and the code can be found at 🐙swift-algorithm-club-cn/Hash Table


Hash Table

Hash tables allow you to store and retrieve objects by “key”.

Hash tables are used to implement structures such as dictionaries, maps, and associative arrays. These structures can be implemented through trees or plain arrays, but using hash tables is more efficient.

This also explains why Swift’s built-in Dictionary type requires keys to conform to the Hashable protocol: internal dictionaries are implemented using hash tables, as you’ll learn here.

How it works

A hash table is just an array. Initially, this array is empty. When a value is put into a hash table under a key, it uses that key to compute the index in the array. Here’s an example:

hashTable["firstName"] = "Steve"

	The hashTable array:
	+--------------+
	| 0: | + -- -- -- -- -- -- -- -- -- -- -- -- -- -- + |1: | + -- -- -- -- -- -- -- -- -- -- -- -- -- -- + |2: | + -- -- -- -- -- -- -- -- -- -- -- -- -- -- + |3: firstName |---> Steve
	+--------------+
	| 4: | + -- -- -- -- -- -- -- -- -- -- -- -- -- -- +Copy the code

In this example, the key “firstName” maps to array index 3. In this example, the key “firstName” maps to array index 3.

14. Adding a value under a different key puts it at another array index: Adding a value under a different key puts it at another array index:

hashTable["hobbies"] = "Programming Swift"

	The hashTable array:
	+--------------+
	| 0: | + -- -- -- -- -- -- -- -- -- -- -- -- -- -- + |1: hobbies   |---> Programming Swift
	+--------------+
	| 2: | + -- -- -- -- -- -- -- -- -- -- -- -- -- -- + |3: firstName |---> Steve
	+--------------+
	| 4: | + -- -- -- -- -- -- -- -- -- -- -- -- -- -- +Copy the code

The trick is how the hash table calculates those array indices. That is where the hashing comes in. When you write the The trick here is how the hash table evaluates these array indexes. That’s where hashing comes in. When you write down the statement below,

hashTable["firstName"] = "Steve"
Copy the code

The hash table uses the key “firstName” and asks for its hashValue attribute. Therefore, the key must conform to the Hashable protocol.

When you write “firstName”.hashValue, it returns a large integer: -4799450059917011053. Also, “hobbies”.hashvalue has a hashValue of 4799450060928805186. (the values you see may vary.)

These numbers are large enough to be used as an index to our array, and one of them is even negative! A common way to make these large numbers usable is to first make the hash value positive and then modulo the array size (take the remainder), which is the index of the array.

Our array size is 5, so the index of the “firstName” key becomes ABS (-4799450059917011053) % 5 = 3. You can calculate that the array index of “hobbies” is 1.

Using hashing in this way is what makes dictionaries effective: to look up elements in a hash table, you must use the hash value of the key to get the array index, and then look up elements in the underlying array. All of these operations take constant time, so insert, retrieve, and delete are O(1).

Note: It is difficult to predict the final position of an object in an array. Therefore, dictionaries do not guarantee any particular order of elements in a hash table.

To avoid conflict

One problem: because we use the modulus of the hash value and the size of the array, it can happen that two or more keys are assigned the same array index. This is called conflict.

One way to avoid collisions is to use large arrays, which makes it less likely that two keys will map to the same index. Another trick is to use prime numbers as array sizes. However, conflicts are bound to occur, so you need to find a way to handle them.

Because our tables are so small, they are prone to conflicts. For example, the key “lastName” also has an array index of 3, but we don’t want to overwrite values already at this array index.

A common way to handle conflicts is to use chaining. The array looks like this:

	buckets:
	+-----+
	|  0  |
	+-----+     +----------------------------+
	|  1  |---> | hobbies: Programming Swift |
	+-----+     +----------------------------+
	|  2  |
	+-----+     +------------------+     +----------------+
	|  3  |---> | firstName: Steve |---> | lastName: Jobs |
	+-----+     +------------------+     +----------------+
	|  4  |
	+-----+
Copy the code

With links, keys and their values are not stored directly in the array. Instead, each array element is a list of zero or more key/value pairs. Array elements are often referred to as buckets and lists are called chains. So here we have five buckets, and two of them have chains. The other three barrels were empty.

If we write the following statement to retrieve items from a hash table,

let x = hashTable["lastName"]
Copy the code

It first evaluates the array index with the hash key “lastName”, which is 3. Since bucket 3 has a chain, we step through the list to find the value with the “lastName” key. This is done by comparing keys using string comparisons. The hash table checks if the key is the last item in the chain and returns the corresponding value “Jobs”.

A common way to implement this chain mechanism is to use linked lists or other arrays. Because the order of the items in the chain doesn’t matter, you can think of it as a collection rather than a list. (Now you can also imagine where the term “bucket” came from; We just dump all the objects together into a bucket.

The chain should not get longer, because finding items in the hash table will be slow. Ideally, we have no chain at all, but it is virtually impossible to avoid conflict. You can improve the odds by providing enough buckets for the hash table using high-quality hash functions.

Note: The alternative to chain is “open addressing”. The idea is that if an array index is already used, we put the element in the next unused bucket. This approach has its own advantages and disadvantages.

code

Let’s look at the basic implementation of a hash table in Swift. We will build it up gradually.

public struct HashTable<Key: Hashable.Value> {
  private typealias Element = (key: Key, value: Value)
  private typealias Bucket = [Element]
  private var buckets: [Bucket]

  private(set) public var count = 0
  
  public var isEmpty: Bool { return count= =0 }

  public init(capacity: Int) {
    assert(capacity > 0)
    buckets = Array<Bucket>(repeatElement([], count: capacity))
  }
Copy the code

HashTable is a generic container with two generic types named Key (which must be Hashable) and Value. We also define two other types: Element is a key/value pair used in a chain, and Bucket is an array of Elements like this.

The main array is named Buckets. Init (Capacity) uses a fixed capacity to determine the size of an array. We can also use the count variable to track how many items have been added to the hash table.

Example of how to create a new hash table object:

var hashTable = HashTable<String.String>(capacity: 5)
Copy the code

Hash tables don’t do anything yet, so let’s add the following functionality. First, add a helper method to calculate the array index for the given key:

  private func index(forKey key: Key) -> Int {
    return abs(key.hashValue) % buckets.count
  }
Copy the code

This performs the calculation you saw earlier: it modulos the absolute value of the key’s hashValue against the size of the bucket array. We have placed it in its own function because it is used in a few different places.

There are four common things you can do with a hash table or dictionary:

  • insert a new element

  • Insert a new element

  • Look for the element

  • Update existing elements

  • Remove elements

The syntax for these is:

hashTable["firstName"] = "Steve"   // insert
let x = hashTable["firstName"]     // lookup
hashTable["firstName"] = "Tim"     // update
hashTable["firstName"] = nil       // delete
Copy the code

We could also use the subscript function to do all of this:

  public subscript(key: Key) - >Value? {
    get {
      return value(forKey: key)
    }
    set {
      if let value = newValue {
        updateValue(value, forKey: key)
      } else {
        removeValue(forKey: key)
      }
    }
  }
Copy the code

This requires three helper functions to do the actual work. Let’s look at value(forKey:), which retrieves an object from a hash table.

  public func value(forKey key: Key) -> Value? {
    let index = self.index(forKey: key)
    for element in buckets[index] {
      if element.key == key {
        return element.value
      }
    }
    return nil  // key not in hash table
  }
Copy the code

First, it calls index(forKey:) to convert the key to an array index. This gives us the bucket number, but if there’s a collision, the bucket could be used by multiple keys. Value (forKey:) loops through the bucket and compares keys one by one. If found, return the corresponding value, otherwise return nil.

The code to insert a new element or update an existing one is located in updateValue(_:forKey:). This is a little more complicated:

  public mutating func updateValue(_ value: Value, forKey key: Key) -> Value? {
    let index = self.index(forKey: key)
    
    // Do we already have this key in the bucket?
    for (i, element) in buckets[index].enumerated() {
      if element.key == key {
        let oldValue = element.value
        buckets[index][i].value = value
        return oldValue
      }
    }
    
    // This key isn't in the bucket yet; add it to the chain.
    buckets[index].append((key: key, value: value))
    count+ =1
    return nil
  }
Copy the code

Again, the first step is to convert the key to an array index to find buckets. Then we loop through the chain of the bucket. If we find the key in the chain, we update it with the new value. If the key is not in the chain, we insert the new key/value pair at the end of the chain.

As you can see, it is important to keep the chain short (by making the hash table large enough). Otherwise, you are in these for… If the in loop takes too long, the hash table will no longer behave like O(1), but more like O(n).

Delete similarly, traverse the chain again:

  public mutating func removeValue(forKey key: Key) -> Value? {
    let index = self.index(forKey: key)

    // Find the element in the bucket's chain and remove it.
    for (i, element) in buckets[index].enumerated() {
      if element.key == key {
        buckets[index].remove(at: i)
        count- =1
        return element.value
      }
    }
    return nil  // key not in hash table
  }
Copy the code

These are the basic functions of a hash table. They all work the same way: convert the key to an array index using its hash value, find the bucket, and then iterate through the chain of the bucket and perform the desired action.

Try these things on playground. It should work like a standard Swift Dictionary.

Resize the hash table

This version of HashTable always uses arrays of fixed size or capacity. If you want to store many items in a hash table, select a prime number greater than the maximum number of items for capacity.

The load factor for a hash table is the percentage of capacity currently in use. If there are 3 entries in the hash table with 5 buckets, then the loading factor is 3/5 = 60%.

If the hash table is small and the chain is long, the loading factor may be greater than 1, which is not a good idea.

If the load factor becomes large, greater than 75%, you can resize the hash table. The code to add this condition is left as an exercise for the reader. Remember, making the bucket array larger will change the array index that the key maps to! This requires you to insert all the elements again after resizing the array.

And then where? (Where to go from here?)

HashTable is very basic. SequenceType for efficient integration as the Swift standard library.

Translated by Andy Ron proofread by Andy Ron