Conflict resolution

  • Open addressing
  1. To initialize the hash, create an array
  2. Use hashfunc to find the position of the corresponding array, such as key=a, which corresponds to 0
  3. Assuming that key=p is also 0, but 0 is already occupied by A, we can only continue searching backwards until we find an empty pit.

Advantages: Relatively simple implementation Disadvantages: Poor efficiency when the load factor (number of elements/array length) is too large (over 70%). For example, when it is filled, the element Z should be at the top of the array, but it is at the bottom of the array because of the collision, so when you search for element Z, you have to go back all the way, and the time is O(N). When you delete an element, you’re not really deleting it. Because of the continuity of array space, even if the element is gone, the pit is still there.

  • The list

Conflict resolution through linked lists is the way maps are implemented in most languages today

  1. Initialize the number of buckets
  2. Use hash to determine which bucket the key should be in
  3. If multiple keys are assigned to the same bucket, it is a conflict. The conflicting keys are linked together in a linked list

advantages: In the event of a collision, elements can be chained together in a linked list, without the need for contiguous memory space required by open addressing.

disadvantages: The number of buckets should not be too large (a large contiguous space needs to be applied for; when deleting elements, there are too many fragments and waste), and the number of buckets should not be too small (most elements are in the same bucket. When there are many elements, the search time complexity is close to O(N).

hmap

Golang’s Map data structure is such a HMap

  • Count: indicates the number of key-value pairs
  • Flags: Indicates the status, such as whether the map is being written or migrated. Flags are required for operation because the map is not thread safe
  • Noverflow: The number of overflow buckets used
  • Hash0: Hash seed, used when doing key hashes
  • B: Number of buckets (2^B)
  • -Vanessa: You keep buckets
  • Oldbuckets: location of the oldbuckets (used for capacity expansion)
  • Nevacuate: indicates the migration progress. Buckets smaller than this address have been migrated
  • Extra: Overflow bucket related

Why two to the B buckets, six buckets, seven buckets?

There are only two main ways to hash

  1. Take mold method 100%30=10 this kind
  2. And the operation hash& (2^B-1), assuming B is 2 (3 buckets 0,1,2) then hash&0011, theoretically each bucket has a probability of being selected. If the number of buckets is 5 (0101) then the second lowest bucket is always 0 and there is always an empty bucket (e.g. 011 bucket 3)

bmap

  • Slice with a key higher than 8 bits can be used to quickly find the target data by comparing the higher 8 bits.
  • A Bmap can store 8 K/V, to make memory more compact, 8 keys together, 8 values together,

Assume that both key and value are two bytes, as shown in the following figure

capacity

The hmap mapextra

  • Overflow: The set of addresses used for overflow buckets
  • Oldoverflow: address of the oldoverflow bucket during migration
  • NextOverflow: indicates the address of the next idle overflow bucket

Overflow bucket creation

If B>=4 during map creation, it is highly likely that overflow buckets will be used, and 2^(B-4) overflow buckets will be created, which is contiguous with the regular HMAP in memory

Expansion rule

  • Double capacity: Load factor count/2^B > 6.5
  • Equal capacity expansion: When a large number of buckets overflow:
  1. B<=15 noverflow>=2^B
  2. B>15 noverflow>=2^15

The significance of equal expansion is that when the distribution of elements in the Bmap is not compact (if there are deleted elements), the following elements need to be compact again.

Why is the range map out of order

Each time the location of a bucket is random, and then the BMAP is iterated, and then the overflow bucket is iterated, looking for the next bucket…