
What is the Map

Wikipedia definition

In computer science, an associative array, map, symbol table, or dictionary is an abstract data type composed of a collection of (key, value) pairs, such that each possible key appears at most once in the collection.


Operations

A map mainly supports four kinds of operations, namely add, delete, modify, and look up (a minimal Go sketch follows this list):

  • Add a key-value pair to the collection
  • Remove a key-value pair from the collection
  • Modify an existing key-value pair
  • Look up the value associated with a particular key
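These operations map directly onto Go's built-in map syntax. A minimal sketch (the key and value types are arbitrary choices for illustration):

package main

import "fmt"

func main() {
	m := make(map[string]int)

	m["a"] = 1          // add a key-value pair
	m["a"] = 2          // modify an existing key-value pair
	v, ok := m["a"]     // look up the value for a key; ok reports whether the key exists
	fmt.Println(v, ok)  // 2 true
	delete(m, "a")      // remove a key-value pair
	fmt.Println(len(m)) // 0
}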

Implementation

There are two main ways to implement a map: hash tables and search trees. For example, Java’s HashMap is based on a hash table, while C++’s std::map is based on a self-balancing binary search tree (a red-black tree). In terms of time complexity, the two implementations compare as follows.

As you can see, a binary search tree offers O(log n) element lookup in both the average and the worst case, while a hash table offers O(1) lookup on average but can degrade to O(n) in the worst case. With a well-designed hash table, however, the worst case rarely happens (aren’t you curious how Go designed its map?). In addition, a binary search tree returns keys in sorted order, while a hash table returns them unordered.

Hash table

Since Go’s map is implemented on a hash table (also called a hash map), this article does not explore search-tree-based map implementations. Here is what the official Go blog says about maps:

One of the most useful data structures in computer science is the hash table. Many hash table implementations exist with varying properties, but in general they offer fast lookups, adds, and deletes. Go provides a built-in map type that implements a hash table.

To understand hash tables, you need to understand two concepts: hash functions and hash collisions.

The hash function

A hash function is a function that maps data of arbitrary size to fixed-size values. Common examples include MD5 and the SHA family.

A well-designed hash function should have the following properties (a small demo follows the list):

  • Uniformity: a good hash function should spread its outputs as uniformly as possible over its output range, i.e., every value in the output range should be generated with roughly equal probability.
  • Efficiency: hashing should be fast; even long inputs should produce a hash value quickly.
  • Determinism: the hashing process must be deterministic, always producing the same hash value for a given input.
  • Avalanche effect: a small change in the input should cause a large change in the output.
  • Irreversibility: the original data cannot be recovered from the hash function’s output.
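As a quick illustration of determinism and the avalanche effect, here is a small demo using the standard library’s FNV-1a hash (an everyday hash chosen for demonstration, not the one the Go runtime uses for maps):

package main

import (
	"fmt"
	"hash/fnv"
)

func fnv64a(s string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(s)) // Write on a hash.Hash never returns an error
	return h.Sum64()
}

func main() {
	fmt.Printf("%#x\n", fnv64a("hello")) // deterministic: same input, same output
	fmt.Printf("%#x\n", fnv64a("hello"))
	fmt.Printf("%#x\n", fnv64a("hellp")) // a one-character change flips many output bits
}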

Hash collisions

To repeat: a hash function maps data of arbitrary size to fixed-size values. So even with a well-designed hash function, the best we can hope for is that different inputs map to different hash values as often as possible. Once the number of possible inputs exceeds the number of values a fixed-size output can represent, collisions are inevitable (see the pigeonhole principle).

Drawer principle: if ten apples are placed into nine drawers, then no matter how they are placed, at least one drawer will contain at least two apples. The drawer principle is better known as the pigeonhole principle.

How hash collisions are resolved

The two most commonly used collision-resolution methods are chaining and open addressing.

Before discussing chaining, two concepts need explaining.

  1. Hash bucket. A hash bucket (also called a slot, analogous to a drawer in the pigeonhole principle) can be thought of simply as one hash value; all the hash values together make up the hash space.
  2. Load factor. The load factor measures how full the hash table is. It is calculated as: load factor = number of elements in the hash table / length of the hash table. The larger the load factor, the more elements are stored and the better the space utilization, but the greater the chance of hash collisions. Conversely, the smaller the load factor, the fewer elements are stored and the fewer collisions occur, but more space is wasted and more resizing operations are needed. For example, a table with 1000 slots holding 650 elements has a load factor of 0.65. The load factor is also the key indicator for deciding whether to grow the hash table. Java’s HashMap uses a default load factor of 0.75; Python’s dict uses a default of 2/3.

Chaining

The idea of chaining is to link all the elements that map to the same bucket into a linked list.

The following uses a simple hash function H(key) = key MOD 7 to map the elements [50, 700, 76, 85, 92, 73, 101], to show how chaining handles hash collisions.

The way chaining resolves collisions is similar in style to the adjacency-list representation of a graph: when collisions occur, the colliding elements are organized into a singly linked list.
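A minimal sketch of chaining with H(key) = key MOD 7 over that element group (the chainedTable and node names are illustrative, not from any library):

package main

import "fmt"

type node struct {
	key  int
	next *node
}

type chainedTable struct {
	buckets [7]*node // one linked list per hash bucket
}

func (t *chainedTable) insert(key int) {
	i := key % 7 // H(key) = key MOD 7
	// on collision, prepend to the bucket's linked list
	t.buckets[i] = &node{key: key, next: t.buckets[i]}
}

func main() {
	t := &chainedTable{}
	for _, k := range []int{50, 700, 76, 85, 92, 73, 101} {
		t.insert(k)
	}
	for i, n := range t.buckets {
		fmt.Printf("bucket %d:", i)
		for ; n != nil; n = n.next {
			fmt.Printf(" %d", n.key)
		}
		fmt.Println()
	}
}

Running it shows 50, 85, and 92 chained together in bucket 1, and 73 and 101 chained in bucket 3.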

Open addressing

With chaining, the number of slots m has no direct relationship to the number of keys n. With open addressing, however, all elements are stored in the hash table itself, so the number of slots m must always be greater than or equal to the number of keys n (growing the hash table dynamically when necessary).

There are many open-addressing schemes: linear probing, quadratic probing, random probing, and double hashing. Linear probing is used here to illustrate the idea.

  • Linear probing

Let Hash(key) denote the hash value of the key, and M the number of slots in the hash table (the table size).

The linear probing rule can be expressed as:

If Hash(x) % M already has data, try (Hash(x) + 1) % M;

If (Hash(x) + 1) % M also has data, try (Hash(x) + 2) % M;

If (Hash(x) + 2) % M also has data, try (Hash(x) + 3) % M;

…

We again map [50, 700, 76, 85, 92, 73, 101] with the hash function H(key) = key MOD 7 (MOD meaning modulo) to see how linear probing handles the collisions.

Empty means the slot is empty; occupied means the slot is taken (a later key mapping to this slot must probe linearly downward); lazy delete means the slot’s data has been cleared without freeing its storage. A minimal probing sketch follows.
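A minimal sketch of linear probing over the same elements, using -1 as the empty marker (names are illustrative, and lazy deletion is omitted; the caller must keep the table from overfilling, or insert would loop forever):

package main

import "fmt"

const m = 7 // the number of slots M

type probeTable struct {
	slots [m]int
}

func (t *probeTable) insert(key int) {
	i := key % m // Hash(key) % M
	for t.slots[i] != -1 {
		i = (i + 1) % m // slot occupied: probe the next slot
	}
	t.slots[i] = key
}

func main() {
	t := &probeTable{}
	for i := range t.slots {
		t.slots[i] = -1 // mark every slot empty
	}
	for _, k := range []int{50, 700, 76, 85, 92, 73, 101} {
		t.insert(k)
	}
	fmt.Println(t.slots) // [700 50 85 92 73 101 76]
}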

Comparing the two approaches

Open addressing needs only an array to store everything, so it inherits the advantages of arrays: it is CPU-cache friendly and easy to serialize. However, its memory utilization is worse than chaining, and the cost of collisions is higher. Open addressing suits cases where the amount of data is known in advance and the load factor is kept small.

With chaining, linked-list nodes can be created on demand, with no need to reserve enough memory up front as open addressing does, so chaining uses memory more economically than open addressing. Chaining also tolerates higher load factors and is suitable for storing large objects and large amounts of data in a hash table. It is more flexible than open addressing and supports more optimization strategies, such as replacing the linked list with a red-black tree. The cost is the extra space needed to store pointers.

It’s worth noting that Python’s dict resolves hash collisions with open addressing, whereas Java’s HashMap uses chaining.

How Go implements map

Like Python and Java, Go implements its maps on hash tables, and it resolves hash collisions with chaining: the map is represented as an array of buckets plus linked overflow buckets.

Note: from here on, “map” refers specifically to the map type implemented in Go.

Map data structure

The data in a map is stored in an array of buckets, and each bucket holds up to eight key-value pairs. The low-order bits of a key’s hash select a bucket; the high-order bits distinguish keys within a bucket. The split into high and low hash bits is shown below.

This article is based on Go 1.15.2 darwin/amd64; the source code is in src/runtime/map.go.

  • The top-level structure of a map is hmap
// A header for a Go map.
type hmap struct {
	count     int    // the number of elements in the map, returned by len(map)
	flags     uint8  // status flags; the meanings of the four flag bits are explained in the constants below
	B         uint8  // log_2 of the number of buckets; the map can hold up to loadFactor * 2^B elements
	noverflow uint16 // the approximate number of overflow buckets
	hash0     uint32 // the hash seed

	buckets    unsafe.Pointer // pointer to an array of 2^B buckets; nil if count is 0
	oldbuckets unsafe.Pointer // during growth, points to the old buckets array, which is half the size of the new one; nil when not growing
	nevacuate  uintptr        // buckets below this number have been evacuated

	extra *mapextra // optional fields, used to optimize GC scanning; set when neither key nor value contains pointers and both fit inline
}
  • The structure of mapextra
// mapextra holds fields that are not present on all maps.
type mapextra struct {
	// If neither key nor value contains a pointer and both fit inline (<= 128 bytes),
	// the overflow buckets are stored in hmap's extra field, which keeps GC from scanning the whole map.
	// However, bmap.overflow is itself a pointer, so in that case the overflow pointers
	// can only be kept alive here, in hmap.extra.overflow and hmap.extra.oldoverflow.
	// overflow holds the overflow buckets for hmap.buckets;
	// oldoverflow holds the overflow buckets for hmap.oldbuckets during growth.
	overflow    *[]*bmap
	oldoverflow *[]*bmap

	// nextOverflow points to a free overflow bucket
	nextOverflow *bmap
}
  • The structure of bmap
// A bucket for a Go map.
type bmap struct {
	// tophash holds the top byte (high 8 bits) of the hash of each key in this bucket
	// (i.e., the high-order bits described earlier).
	// If tophash[0] < minTopHash, tophash[0] is instead the bucket's evacuation state.
	tophash [bucketCnt]uint8
}

bmap is the bucket. Its in-memory layout is shown in the diagram below (the compile-time logic that builds the full bucket type is in the bmap() function in src/cmd/compile/internal/gc/reflect.go).

In the diagram above, the seventh and eighth cells of the bucket hold no key-value pairs. Note that keys and values are stored separately: the layout is key/key/…/value/value/… rather than key/value/key/value/…. This makes the code slightly more complex to organize, but it eliminates the padding that, for example, map[int64]int8 would otherwise need. In addition, an overflow pointer follows the eight key-value slots, because a bucket can hold at most eight pairs; if more keys fall into this bucket, another bucket (an overflow bucket) is allocated and linked via the overflow pointer.
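The compiler generates the full bucket type at compile time; written out by hand for, say, map[int64]int64, it would look roughly like the sketch below (bmapExpanded is an illustrative name; the runtime reaches these fields through pointer arithmetic rather than through such a struct):

// Illustrative expansion of a bucket for map[int64]int64 (bucketCnt = 8)
type bmapExpanded struct {
	tophash [8]uint8
	// all 8 keys back to back, then all 8 values back to back;
	// this avoids the padding a key/value/key/value... layout would need
	keys     [8]int64
	values   [8]int64
	overflow *bmapExpanded // link to the next overflow bucket, if any
}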

  • Important constants
const (
	// The maximum number of key-value pairs a bucket can hold is 8
	bucketCntBits = 3
	bucketCnt     = 1 << bucketCntBits

	// The load factor that triggers expansion is 13/2 = 6.5
	loadFactorNum = 13
	loadFactorDen = 2

	// Keys and values larger than 128 bytes are stored as pointers
	maxKeySize  = 128
	maxElemSize = 128

	// The data offset should be the size of the bmap struct, properly aligned.
	// For amd64p32 this means 64-bit alignment even though pointers are 32 bits.
	dataOffset = unsafe.Offsetof(struct {
		b bmap
		v int64
	}{}.v)

	// In the evacuated* states, each bucket (including its linked overflow buckets, if any)
	// holds either all of its key-value pairs or none of them (except while the evacuate() method runs,
	// which only happens during a map write, so no other goroutine can observe the map then).
	// Simply put, a bucket's data is either moved entirely or not moved at all.
	// Besides normal high 8-bit hash values, tophash also stores some special state values
	// (marking a cell's evacuation state). A normal tophash value is at least 5;
	// the special state values are listed below.
	emptyRest      = 0 // The cell is empty, and all higher-indexed cells and overflow cells are also empty. (This is a bucket's initial state.)
	emptyOne       = 1 // The cell is empty
	evacuatedX     = 2 // The key-value pair has been moved to the first half of the new buckets array
	evacuatedY     = 3 // The key-value pair has been moved to the second half of the new buckets array
	evacuatedEmpty = 4 // The cell is empty and the whole bucket has been evacuated
	minTopHash     = 5 // The minimum tophash value for a normally filled cell

	// flags
	iterator     = 1 // There may be an iterator using buckets
	oldIterator  = 2 // There may be an iterator using oldbuckets
	hashWriting  = 4 // A goroutine is writing a key to the map
	sameSizeGrow = 8 // The current grow is to a map of the same size

	// The sentinel bucket ID used for iterator checks
	noCheck = 1<<(8*sys.PtrSize) - 1
)

To sum up, here is a complete map structure, taking B = 4 as an example.

Create a map

There are two ways to initialize a map:

make(map[k]v)
// Initialize map size to hint
make(map[k]v, hint)

When no initial size is specified, or hint <= 8 (bucketCnt), Go calls the makemap_small function (in src/runtime/map.go) and allocates the hmap directly from the heap.

func makemap_small() *hmap {
	h := new(hmap)
	h.hash0 = fastrand()
	return h
}

When hint > 8, the makemap function is called.

// If the compiler decides the map and the first bucket can be created on the stack, h and/or bucket may be non-nil.
// If h != nil, the map can be created directly in h.
// If h.buckets != nil, the bucket it points to can be used as the first bucket.
func makemap(t *maptype, hint int, h *hmap) *hmap {
	// math.MulUintptr returns hint * t.bucket.size and whether the product overflows
	mem, overflow := math.MulUintptr(uintptr(hint), t.bucket.size)
	// The value of maxAlloc varies by platform; see src/runtime/malloc.go
	if overflow || mem > maxAlloc {
		hint = 0
	}

	// initialize hmap
	if h == nil {
		h = new(hmap)
	}
	// Obtain a hash seed from fastrand
	h.hash0 = fastrand()

	// Find a B that can hold the requested number of elements (hint)
	B := uint8(0)
	for overLoadFactor(hint, B) {
		B++
	}
	h.B = B

	// Allocate the initial hash table.
	// If B == 0, the buckets field is lazily allocated later, in mapassign
	if h.B != 0 {
		var nextOverflow *bmap
		// makeBucketArray builds the map's underlying bucket array, allocating at least 2^h.B buckets
		h.buckets, nextOverflow = makeBucketArray(t, h.B, nil)
		if nextOverflow != nil {
			h.extra = new(mapextra)
			h.extra.nextOverflow = nextOverflow
		}
	}

	return h
}

The buckets array is allocated by the makeBucketArray function.

// makeBucketArray creates the underlying array of buckets for the map.
func makeBucketArray(t *maptype, b uint8, dirtyalloc unsafe.Pointer) (buckets unsafe.Pointer, nextOverflow *bmap) {
	base := bucketShift(b)
	nbuckets := base
	// For small b (less than 4), i.e., fewer than 16 buckets, overflow buckets are unlikely,
	// so the calculation below is skipped to avoid the overhead.
	if b >= 4 {
		// With 16 or more buckets, 2^(b-4) additional overflow buckets are allocated up front
		nbuckets += bucketShift(b - 4)
		sz := t.bucket.size * nbuckets
		up := roundupsize(sz)
		if up != sz {
			nbuckets = up / t.bucket.size
		}
	}

	// There are two cases for dirtyalloc. If it is nil, a new underlying array is allocated.
	// Otherwise it points to an array previously allocated by makeBucketArray with the same t and b,
	// and that array is cleared and reused.
	if dirtyalloc == nil {
		buckets = newarray(t.bucket, int(nbuckets))
	} else {
		buckets = dirtyalloc
		size := t.bucket.size * nbuckets
		if t.bucket.ptrdata != 0 {
			memclrHasPointers(buckets, size)
		} else {
			memclrNoHeapPointers(buckets, size)
		}
	}

	// If b >= 4, some overflow buckets were pre-allocated above.
	// To minimize the overhead of tracking them, the following convention is used:
	// if a pre-allocated overflow bucket's overflow pointer is nil, more free buckets are available by bumping the pointer.
	// (On pointer bumping: imagine memory kept perfectly tidy, used memory on one side and free memory on the other,
	// with a pointer marking the boundary; allocating just moves the pointer forward by the size of the object.)
	// The last overflow bucket needs a safe non-nil overflow pointer, so it is pointed back at buckets.
	if base != nbuckets {
		nextOverflow = (*bmap)(add(buckets, base*uintptr(t.bucketsize)))
		last := (*bmap)(add(buckets, (nbuckets-1)*uintptr(t.bucketsize)))
		last.setoverflow(t, (*bmap)(buckets))
	}
	return buckets, nextOverflow
}

Based on the above code, we can see that under normal conditions the regular buckets and the overflow buckets occupy contiguous storage in memory; they are simply referenced by different fields of hmap.

The hash function

When the Go runtime initializes (schedinit in src/runtime/proc.go), the hash algorithm is initialized via the alginit method.

func schedinit() {
	lockInit(&sched.lock, lockRankSched)

	...

	tracebackinit()
	moduledataverify()
	stackinit()
	mallocinit()
	fastrandinit() // must run before mcommoninit
	mcommoninit(_g_.m, -1)
	cpuinit()       // must run before alginit
	alginit()       // maps must not be used before this call
	modulesinit()   // provides activeModules
	typelinksinit() // uses maps, activeModules
	itabsinit()     // uses activeModules
	goargs()
	goenvs()
	parsedebugvars()
	gcinit()
	...
}

For the choice of hash algorithm, the runtime checks whether the current architecture supports AES instructions. If so, an AES-based hash is used; its implementation is in src/runtime/asm_{386,amd64,arm64}.s. If not, it falls back to a hash algorithm based on xxhash (code.google.com/p/xxhash/) and…

func alginit() {
	// Install AES hash algorithms if the instructions needed are present.
	if (GOARCH == "386" || GOARCH == "amd64") &&
		cpu.X86.HasAES && // AESENC
		cpu.X86.HasSSSE3 && // PSHUFB
		cpu.X86.HasSSE41 { // PINSR{D,Q}
		initAlgAES()
		return
	}
	if GOARCH == "arm64" && cpu.ARM64.HasAES {
		initAlgAES()
		return
	}
	getRandomData((*[len(hashkey) * sys.PtrSize]byte)(unsafe.Pointer(&hashkey))[:])
	hashkey[0] |= 1 // make sure these numbers are odd
	hashkey[1] |= 1
	hashkey[2] |= 1
	hashkey[3] |= 1
}

From map creation above, we know the map’s hash seed is obtained via h.hash0 = fastrand(). It is used by the hasher field of the maptype below, which you will see in use throughout the code that follows.

type maptype struct {
	typ    _type
	key    *_type
	elem   *_type
	bucket *_type
	// The first argument to hasher is a pointer to the key; the second is the seed hash0,
	// produced by h.hash0 = fastrand(). hasher returns the hash value.
	hasher     func(unsafe.Pointer, uintptr) uintptr
	keysize    uint8  // size of key slot
	elemsize   uint8  // size of elem slot
	bucketsize uint16 // size of bucket
	flags      uint32
}

Map operations

Suppose a key has been hashed to a 64-bit hash value. If B is 5, the length of the buckets array, i.e., the number of buckets, is 32 (2^5).

For example, if you want to place a key in a map, the hash value of the key is as follows:

As we already know, the low-order hash bits select a bucket and the high-order hash bits identify keys within that bucket. With B = 5, we take the low 5 bits of the hash, 01010, whose decimal value is 10: bucket 10. Then the top 8 bits of the hash locate the key’s position inside the bucket. The bucket is empty at first, so the new key and value are placed in the first key slot and the first value slot.

Note: selecting a bucket by the low-order bits is essentially taking the hash modulo the number of buckets, but modulo is expensive, so the actual code uses bit operations instead. Here is the tophash implementation.

func tophash(hash uintptr) uint8 {
	top := uint8(hash >> (sys.PtrSize*8 - 8))
	if top < minTopHash {
		top += minTopHash
	}
	return top
}
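A hedged sketch of both selections for B = 5, with a made-up 64-bit hash value (the mask and shift mirror the runtime logic above):

package main

import "fmt"

const minTopHash = 5

func tophash(hash uint64) uint8 {
	top := uint8(hash >> (64 - 8)) // take the high 8 bits
	if top < minTopHash {
		top += minTopHash // small values are reserved as evacuation marks
	}
	return top
}

func main() {
	const B = 5
	var hash uint64 = 0x9702_0000_0000_000a // made up: top byte 10010111, low 5 bits 01010

	bucket := hash & (1<<B - 1) // low B bits select the bucket; same result as hash % 2^B
	fmt.Println("bucket:", bucket)         // 10
	fmt.Println("tophash:", tophash(hash)) // 151
}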

Hash collisions occur when two different keys fall into the same bucket. Go resolves them with chaining (to keep things clear, this describes only the non-growing case where the key is added for the first time): scan the bucket’s cells in order and remember the first empty one; if the key is found in neither the bucket nor its overflow buckets, place it in that first empty cell. If there is no room and no overflow bucket, allocate a new overflow bucket and place the key in its first cell. (Since this is the key’s first insertion, the key-already-exists case is not described.)

In the figure above, B is 5, so there are 32 buckets. The hash of the key to be inserted is computed: its low 5 bits, 00110, select bucket 6, and its high 8 bits, 10010111, are 151 in decimal. Since the first six cells of bucket 6 are already filled with normal tophash values (found by scanning in order), the tophash 151 is written into the seventh cell, and the key and value are written into the corresponding seventh slots.

For a lookup, each cell in the bucket is compared against the key’s high 8 hash bits. If no match is found in the bucket and its overflow pointer is not nil, the search continues in the overflow bucket, and so on until the key is found. If every cell has been examined without a match, the zero value of the map’s value type is returned (for example, 0 if the value type is int).

Looking up a key

For map element lookup, the source code implementation is as follows

func mapaccess1(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
	// If race detection is enabled (-race)
	if raceenabled && h != nil {
		callerpc := getcallerpc()
		pc := funcPC(mapaccess1)
		racereadpc(unsafe.Pointer(h), callerpc, pc)
		raceReadObjectPC(t.key, key, callerpc, pc)
	}
	// If the memory sanitizer is enabled (-msan)
	if msanenabled && h != nil {
		msanread(key, t.key.size)
	}
	// If the map is nil or has no elements, return the zero value
	if h == nil || h.count == 0 {
		if t.hashMightPanic() {
			t.hasher(key, 0) // see issue 23734
		}
		return unsafe.Pointer(&zeroVal[0])
	}
	// Note the bitwise AND here: if another goroutine is writing a key into the map,
	// h.flags has the hashWriting bit set, the AND below is non-zero, and we throw.
	// This also shows that Go's map is not safe for concurrent use.
	if h.flags&hashWriting != 0 {
		throw("concurrent map read and map write")
	}
	// Different key types use different hash algorithms; see the typehash logic in src/runtime/alg.go
	hash := t.hasher(key, uintptr(h.hash0))
	m := bucketMask(h.B)
	// Locate the bucket with a bitwise AND
	b := (*bmap)(add(h.buckets, (hash&m)*uintptr(t.bucketsize)))
	// If oldbuckets is not nil, the map is growing.
	// Data in the old buckets may not have been moved to the new buckets yet,
	// so the old bucket must be checked first.
	if c := h.oldbuckets; c != nil {
		if !h.sameSizeGrow() {
			// The old buckets array is half the size
			m >>= 1
		}
		oldb := (*bmap)(add(c, (hash&m)*uintptr(t.bucketsize)))
		// If tophash[0] of the old bucket is evacuatedX, evacuatedY, or evacuatedEmpty,
		// evacuated() returns true, meaning the move has completed.
		// The old bucket is used only if its migration is not yet done.
		if !evacuated(oldb) {
			b = oldb
		}
	}
	// Compute the current key's tophash value
	top := tophash(hash)
	// Below is the core lookup logic.
	// A double loop: the outer loop walks from the bucket through its overflow buckets;
	// the inner loop walks the cells within each bucket.
	// The loop exits in three ways: the key is found; the current bucket has no overflow bucket;
	// or a cell's tophash is emptyRest, which (as explained in the constants above) means all
	// cells after it are unused, so there is no need to keep scanning.
bucketloop:
	for ; b != nil; b = b.overflow(t) {
		for i := uintptr(0); i < bucketCnt; i++ {
			// Compare the tophash values first
			if b.tophash[i] != top {
				if b.tophash[i] == emptyRest {
					break bucketloop
				}
				continue
			}
			// Keys are stored contiguously, so a key's address is
			// bucket address + data offset (the size of the bmap struct) + i*keysize.
			// A value's address is computed the same way, but offset past all 8 keys.
			k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
			if t.indirectkey() {
				k = *((*unsafe.Pointer)(k))
			}
			// Compare the keys themselves
			if t.key.equal(key, k) {
				e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
				if t.indirectelem() {
					e = *((*unsafe.Pointer)(e))
				}
				return e
			}
		}
	}
	// Not found in any bucket: return the zero value
	return unsafe.Pointer(&zeroVal[0])
}

The following diagram shows the mapaccess1 lookup process.

In Go code, a map lookup takes two forms:

	// form 1
	v := m[k]
	// form 2
	v, ok := m[k]

Form 1 is implemented by the mapaccess1 function described above. In addition, the source contains a mapaccess2 function with the following signature.

func mapaccess2(t *maptype, h *hmap, key unsafe.Pointer) (unsafe.Pointer, bool) {}

Compared with mapaccess1, mapaccess2 also returns a bool indicating whether the key was found in the map. Since it is otherwise nearly identical to mapaccess1, its full code is not shown.
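The practical difference is that form 1 cannot distinguish a missing key from a stored zero value, while form 2 can:

	m := map[string]int{"a": 0}

	v1 := m["a"] // 0: the key exists and its value is 0
	v2 := m["b"] // 0: the key is missing; the zero value is returned
	fmt.Println(v1, v2)

	if v, ok := m["b"]; !ok {
		fmt.Println("key b is absent; v is just the zero value:", v)
	}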

The source also contains a mapaccessK method, whose signature is as follows.

func mapaccessK(t *maptype, h *hmap, key unsafe.Pointer) (unsafe.Pointer, unsafe.Pointer) {}

Compared with mapaccess1, mapaccessK returns both the key and the value; its code logic is otherwise the same.

Assigning a key

The logic for writing a key is implemented as follows.

func mapassign(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
	// Assigning into a nil map panics, for example:
	// var m map[string]int
	// m["k"] = 1
	if h == nil {
		panic(plainError("assignment to entry in nil map"))
	}
	// If race detection is enabled (-race)
	if raceenabled {
		callerpc := getcallerpc()
		pc := funcPC(mapassign)
		racewritepc(unsafe.Pointer(h), callerpc, pc)
		raceReadObjectPC(t.key, key, callerpc, pc)
	}
	// If the memory sanitizer is enabled (-msan)
	if msanenabled {
		msanread(key, t.key.size)
	}
	// Another goroutine is already writing a key to the map
	if h.flags&hashWriting != 0 {
		throw("concurrent map writes")
	}
	// Compute the hash from the key and the hash seed
	hash := t.hasher(key, uintptr(h.hash0))

	// Set hashWriting after calling t.hasher, since t.hasher may panic,
	// in which case no write has actually happened.
	h.flags ^= hashWriting

	if h.buckets == nil {
		h.buckets = newobject(t.bucket) // newarray(t.bucket, 1)
	}

again:
	// bucketMask returns 2^B - 1;
	// ANDing the hash with it yields the bucket number
	bucket := hash & bucketMask(h.B)
	// If the map is growing (i.e., h.oldbuckets != nil), do some evacuation work first
	if h.growing() {
		growWork(t, h, bucket)
	}
	// Compute the bucket's memory location:
	// pos = start + bucketNumber * bucketsize
	b := (*bmap)(unsafe.Pointer(uintptr(h.buckets) + bucket*uintptr(t.bucketsize)))
	top := tophash(hash)

	var inserti *uint8
	var insertk unsafe.Pointer
	var elem unsafe.Pointer
bucketloop:
	for {
		// Iterate over the 8 cells in the bucket
		for i := uintptr(0); i < bucketCnt; i++ {
			// Case 1: the cell's tophash differs from the current tophash,
			// i.e., b.tophash[i] != top.
			// In theory the cell could be an empty slot.
			// Initially a bucket's slots might look like this (e = empty):
			// [h0][h1][h2][h3][h4][e][e][e]
			// But after some deletes it might look like this:
			// [h0][h1][e][e][h5][e][e][e]
			// A new insert should then reuse the earliest empty slot:
			// [h0][h1][e][e][h5][e][e][e]
			//          ^
			//          insert here
			// So the first empty position is recorded before continuing the loop,
			// because an equal key might (or might not) still be found later.
			if b.tophash[i] != top {
				// If the cell is empty and no insert slot has been recorded yet, record it
				if isEmpty(b.tophash[i]) && inserti == nil {
					inserti = &b.tophash[i]
					insertk = add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
					elem = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
				}
				if b.tophash[i] == emptyRest {
					break bucketloop
				}
				continue
			}
			// Case 2: the cell's tophash equals the current tophash
			k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
			if t.indirectkey() {
				k = *((*unsafe.Pointer)(k))
			}
			// Note: equal tophash values do not guarantee equal keys
			if !t.key.equal(key, k) {
				continue
			}
			// The key already exists; update it if needed
			if t.needkeyupdate() {
				typedmemmove(t.key, k, key)
			}
			// Compute the memory address where the value should be written:
			// pos = start + dataOffset + 8*keysize + i*elemsize
			elem = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
			// Jump to the done logic at the end
			goto done
		}
		// After scanning the bucket's 8 cells with no empty or matching cell found, move on to the overflow bucket
		ovf := b.overflow(t)
		// If there is no overflow bucket either, break out of the loop
		if ovf == nil {
			break
		}
		b = ovf
	}

	// If no suitable cell was found in the bucket or its overflow buckets, two situations may apply.
	// Situation 1:
	// The map's load factor has reached the 6.5 threshold, or the map has too many overflow buckets.
	// Either condition triggers a grow.
	// hashGrow() does not actually move data; the migration (copying) is done by growWork().
	// After hashGrow(), jump back to the again label and rescan against the new buckets.
	if !h.growing() && (overLoadFactor(h.count+1, h.B) || tooManyOverflowBuckets(h.noverflow, h.B)) {
		hashGrow(t, h)
		goto again // Growing the table invalidates everything, so try again
	}

	// Situation 2:
	// If situation 1 does not apply, allocate a new overflow bucket for the current bucket
	// and insert tophash and key at position 0 of the new overflow bucket
	if inserti == nil {
		// all current buckets are full, allocate a new one.
		newb := h.newoverflow(t, b)
		inserti = &newb.tophash[0]
		insertk = add(unsafe.Pointer(newb), dataOffset)
		elem = add(insertk, bucketCnt*uintptr(t.keysize))
	}

	// Store the new key and value at the insert position
	if t.indirectkey() {
		kmem := newobject(t.key)
		*(*unsafe.Pointer)(insertk) = kmem
		insertk = kmem
	}
	if t.indirectelem() {
		vmem := newobject(t.elem)
		*(*unsafe.Pointer)(elem) = vmem
	}
	typedmemmove(t.key, insertk, key)
	*inserti = top
	// The number of keys in the map increases by 1
	h.count++

done:
	if h.flags&hashWriting == 0 {
		throw("concurrent map writes")
	}
	h.flags &^= hashWriting
	if t.indirectelem() {
		elem = *((*unsafe.Pointer)(elem))
	}
	return elem
}

After analyzing the mapassign code, we find that mapassign does not itself write the value into memory: it returns the memory address where the value should be written. To find out when the value is actually written, examine the following map.go program.

package main

func main() {
	m := make(map[int]int)
	for i := 0; i < 100; i++ {
		m[i] = 666
	}
}

Compiling it and inspecting the assembly generated for m[i] = 666:

$ go tool compile -S map.go
...
        0x0098 00152 (map.go:6) LEAQ    type.map[int]int(SB), CX
        0x009f 00159 (map.go:6) MOVQ    CX, (SP)
        0x00a3 00163 (map.go:6) LEAQ    "".autotmp_2+184(SP), DX
        0x00ab 00171 (map.go:6) MOVQ    DX, 8(SP)
        0x00b0 00176 (map.go:6) MOVQ    AX, 16(SP)
        0x00b5 00181 (map.go:6) CALL    runtime.mapassign_fast64(SB)
        0x00ba 00186 (map.go:6) MOVQ    24(SP), AX  // AX now holds the address where the value should be stored
        0x00bf 00191 (map.go:6) MOVQ    $666, (AX)  // store 666 at that address
...

So the final step of the assignment is completed by extra assembly instructions the compiler emits; some of the work is left undone by the runtime. In Go, the compiler and the runtime cooperate to complete complex operations like this, so when studying the Go source it is sometimes necessary to read a little assembly as well.

Deleting a key

Once the logic of assigning keys is understood, deleting keys is comparatively simple, so this article does not cover it in detail; see the mapdelete logic in src/runtime/map.go. A short usage example follows.
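From the caller’s side, deletion is simply the built-in delete, which compiles down to a mapdelete call; deleting an absent key is a no-op:

	m := map[string]int{"a": 1}
	delete(m, "a") // removes the pair
	delete(m, "b") // the key does not exist; nothing happens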

Traversing a map

Conclusion first: the result of iterating over a map is unordered.

	m := make(map[int]int)
	for i := 0; i < 10; i++ {
		m[i] = i
	}
	for k, v := range m {
		fmt.Println(k, v)
	}

Running the above code, we can see that the order of output is different each time.

A map traversal walks the buckets in order, and within each bucket walks its cells and those of its overflow buckets as needed. However, after a map grows, keys get relocated, and a key that was in one bucket may land in a different one after the move. From this angle alone, traversal results cannot keep their original order (see map expansion below for details).

In fact, to guarantee that traversal results are unordered, Go goes further: traversal does not begin at a fixed bucket 0. Each traversal starts at a randomly chosen bucket, and within each bucket at a randomly chosen cell, then walks the buckets in sequence until it wraps back around to the starting bucket.

The example above traverses a map that is not growing. If the map is being grown, the traversal checks whether each bucket has been evacuated; if its data is still in the old bucket, the data is fetched from the old bucket.

Note: incremental and equal expansion are covered in the sections below. Under incremental expansion, an old bucket’s data is split across two different new buckets. When the traversal needs data from an old bucket, say old bucket 1, it cannot take all of it: it may only take the keys of old bucket 1 that will land in new bucket 1 after the split (refer back to this after reading the map expansion section below).

For space reasons, the detailed source for map traversal is not annotated and reproduced in this article; see the mapiterinit() and mapiternext() logic in src/runtime/map.go.

Here is an annotated look at the key randomization code in mapiterinit():

// Generate random numbers
r := uintptr(fastrand())
if h.B > 31-bucketCntBits {
   r += uintptr(fastrand()) << 31
}
// Determines which random bucket to start with
it.startBucket = r & bucketMask(h.B)
// Determines the location of a random cell in each bucket
it.offset = uint8(r >> h.B & (bucketCnt - 1))

Map expansion

When discussing the load factor earlier, we noted that it is the key indicator for deciding whether a hash table should grow. In Go’s map, besides the load factor, the number of overflow buckets is a second key indicator for expansion.

To keep access efficient, before a key is added to, modified in, or deleted from a map, the runtime checks whether the map needs to grow; growing is essentially trading space for time. The trigger conditions are checked in the mapassign function; expansion occurs when either of the following two conditions is met:

  1. The load factor has reached its critical point: the number of elements >= total number of buckets * 6.5. This means most buckets may be nearly full (on average each bucket stores 6.5 key-value pairs), and newly inserted elements would very likely have to hang off overflow buckets.
func overLoadFactor(count int, B uint8) bool {
	return count > bucketCnt && uintptr(count) > loadFactorNum*(bucketShift(B)/loadFactorDen)
}
  2. There are too many overflow buckets. When the total number of buckets is less than 2^15, “too many” means the number of overflow buckets is greater than or equal to the total number of buckets. When the total number of buckets is at least 2^15, the overflow-bucket count is compared directly against 2^15: reaching it counts as too many.
func tooManyOverflowBuckets(noverflow uint16, B uint8) bool {
	if B > 15 {
		B = 15
	}
	return noverflow >= uint16(1)<<(B&15)
}

The second condition supplements the first. With a small load factor, map lookups and inserts can still be slow, and condition 1 does not catch this case. The visible symptom is that the numerator of the load factor is small, i.e., the map holds few elements, while the number of allocated buckets (including many overflow buckets) is large.

In some scenarios, such as repeated insertion and deletion, the number of overflow buckets keeps growing while the load factor stays below the condition-1 threshold, so no expansion is triggered to relieve the situation. Bucket usage stays low, values are stored sparsely, and lookup and insertion become very slow; hence condition 2. It is like an empty city: many houses, very few residents, all scattered about.

As shown in the figure above, due to the continuous addition and deletion of the map, bucket 0 is taken as an example, resulting in a large number of sparse buckets in the bucket chain.

The official solutions for the two conditions differ:

  • For condition 1, B is incremented by 1, a new buckets array twice the original size is created, and the data in the old buckets is relocated to the new ones. This is called incremental expansion.
  • For condition 2, the capacity is not increased: the number of buckets stays the same, and the data is moved again, repacking loose key-value pairs so that bucket usage improves and access gets faster. This is called equal expansion (sameSizeGrow).

For the condition-2 solution there is an extreme case: if every key inserted into the map hashes identically, they all land in the same bucket, and beyond 8 keys the overflow buckets pile up. Moving elements around cannot fix this, because the whole hash table has degenerated into a linked list and access becomes O(n). However, every Go map receives a random hash seed in makemap at initialization, so constructing such collisions is not that easy.

In the source, the main functions related to expansion are hashGrow() and growWork(). hashGrow() does not actually “relocate” anything: it only allocates the new buckets and hangs the old ones on the oldbuckets field. The real bucket evacuation happens in growWork(), which is called from mapassign() and mapdelete(): whenever a key is inserted, modified, or deleted, an attempt is made to evacuate buckets, first checking whether the old buckets are fully moved (i.e., whether oldbuckets is nil) before deciding whether to continue.

The hashGrow() function

func hashGrow(t *maptype, h *hmap) {
	// If condition 1 holds, B+1 doubles the capacity;
	// otherwise this is an equal expansion under condition 2, and B stays the same
	bigger := uint8(1)
	if !overLoadFactor(h.count+1, h.B) {
		bigger = 0
		h.flags |= sameSizeGrow
	}
	// Remember the old buckets
	oldbuckets := h.buckets
	// Allocate a new buckets array
	newbuckets, nextOverflow := makeBucketArray(t, h.B+bigger, nil)
	// Note the &^ operator: this block transfers the iterator flag bits
	flags := h.flags &^ (iterator | oldIterator)
	if h.flags&iterator != 0 {
		flags |= oldIterator
	}
	// Commit grow (atomic WRT GC)
	h.B += bigger
	h.flags = flags
	h.oldbuckets = oldbuckets
	h.buckets = newbuckets
	// Evacuation progress starts at 0
	h.nevacuate = 0
	// The overflow bucket count resets to 0
	h.noverflow = 0

	// If the hmap stores overflow buckets via the extra field
	if h.extra != nil && h.extra.overflow != nil {
		if h.extra.oldoverflow != nil {
			throw("oldoverflow is not nil")
		}
		h.extra.oldoverflow = h.extra.overflow
		h.extra.overflow = nil
	}
	if nextOverflow != nil {
		if h.extra == nil {
			h.extra = new(mapextra)
		}
		h.extra.nextOverflow = nextOverflow
	}
}

The growWork() function

func growWork(t *maptype, h *hmap, bucket uintptr) {
	// Make sure the bucket being evacuated is the old bucket that the key in use maps to.
	// For example, if the current key maps to old bucket 1, then bucket 1 is evacuated.
	evacuate(t, h, bucket&h.oldbucketmask())

	// If growing is still in progress, evacuate one more bucket
	if h.growing() {
		evacuate(t, h, h.nevacuate)
	}
}

As growWork() shows, the core evacuation logic lives in the evacuate() function. Here is a question for the reader to ponder: why evacuate at most two buckets at a time? This is a performance trade-off: if the map stored billions of key-value pairs, moving everything at once would cause a long pause, so a gradual evacuation strategy is used.

Before explaining this logic, the reader needs to understand two points.

  • Point 1: how the bucket number changes

As mentioned earlier, both incremental expansion (condition 1) and equal expansion (condition 2) require evacuating buckets. With equal expansion the number of buckets is unchanged, so evacuation can go by sequence number: for example, old bucket 0 still moves to new bucket 0.

Incremental expansion is different. If B was 5, it becomes 6 after the grow, so the low-order hash bits that decide which bucket a key falls into change (from 5 bits to 6 bits); recomputing a key’s new low hash bits is called rehashing.

Therefore, under incremental expansion, a key’s bucket number after the move is either the same as before or the old number plus 2^B (the old B value), depending on whether the (B+1)-th lowest bit of its hash is 0 or 1.

As the figure above shows, with the old B = 3 the old buckets array has length 8. The keys in cells 2 and 5 of old bucket 2 hash to the same low 3 bits, but their low 4 bits are 0010 and 1010 respectively. During the incremental expansion, the first key is moved to bucket 2 of the new buckets array and the second to bucket 10, the two differing by 2^3.

  • Point 2: determining the evacuation destination range

The source uses the notions of bucket x and bucket y: under incremental expansion to twice the size, the first half of the new buckets array is called bucket x and the second half bucket y. The keys of one old bucket are split across two new buckets, some into bucket x and some into bucket y. So before evacuating a cell, we must know which half its key falls into (for the same old bucket, its x and y destinations differ by the size of the old buckets array, i.e., 2^old_B).

Here’s the question: Why is it important to determine where the key falls?

Once the destination bucket is determined, the move itself is easy: copy the source key and value to the corresponding destination slots, set the old cell’s tophash to evacuatedX or evacuatedY (meaning it has been evacuated to the x or y half of the new map), and set the new cell’s tophash to the normal high 8 bits of the key’s hash.

The core migration code, the evacuate() function, is interpreted below.

The evacuate() function

func evacuate(t *maptype, h *hmap, oldbucket uintptr) {
	// First locate the old bucket's address
	b := (*bmap)(add(h.oldbuckets, oldbucket*uintptr(t.bucketsize)))
	// newbit is the number of old buckets before the grow
	newbit := h.noldbuckets()
	// Check whether the bucket has already been evacuated
	if !evacuated(b) {
		// Official TODO, perhaps for a later version
		// TODO: reuse overflow buckets instead of using new ones, if there
		// is no iterator using the old buckets.  (If !oldIterator.)

		// xy holds the destination information for the low (x) and high (y) halves
		// x.b is the destination bucket
		// x.k is the address where the current key would be stored in that bucket
		// x.e is the address where the current value would be stored in that bucket
		var xy [2]evacDst
		x := &xy[0]
		x.b = (*bmap)(add(h.buckets, oldbucket*uintptr(t.bucketsize)))
		x.k = add(unsafe.Pointer(x.b), dataOffset)
		x.e = add(x.k, bucketCnt*uintptr(t.keysize))

		// Bucket y is computed only for incremental growth (see useY below)
		if !h.sameSizeGrow() {
			y := &xy[1]
			y.b = (*bmap)(add(h.buckets, (oldbucket+newbit)*uintptr(t.bucketsize)))
			y.k = add(unsafe.Pointer(y.b), dataOffset)
			y.e = add(y.k, bucketCnt*uintptr(t.keysize))
		}

		// evacuate moves only one bucket at a time, walking all of its cells and copying
		// those holding values to the new location.
		// The bucket may link to overflow buckets, which also need evacuating,
		// so again there are two loops: the outer one over the bucket and its overflow buckets,
		// the inner one over the cells within each bucket.

		// Walk the current bucket and its chain of overflow buckets.
		// Note: b starts as the old bucket being evacuated.
		for ; b != nil; b = b.overflow(t) {
			k := add(unsafe.Pointer(b), dataOffset)
			e := add(k, bucketCnt*uintptr(t.keysize))
			// Walk the bucket's cells; i, k, and e are a cell's tophash index, key, and value
			for i := 0; i < bucketCnt; i, k, e = i+1, add(k, uintptr(t.keysize)), add(e, uintptr(t.elemsize)) {
				top := b.tophash[i]
				// If the cell's tophash is emptyOne or emptyRest, the cell holds no key.
				// Mark it evacuatedEmpty, meaning "already evacuated".
				if isEmpty(top) {
					b.tophash[i] = evacuatedEmpty
					continue
				}
				// This should not normally happen:
				// an unevacuated cell can only be emptyOne, emptyRest,
				// or a normal tophash (greater than or equal to minTopHash)
				if top < minTopHash {
					throw("bad map state")
				}
				k2 := k
				// If the key is stored indirectly, dereference it
				if t.indirectkey() {
					k2 = *((*unsafe.Pointer)(k2))
				}
				var useY uint8
				// For incremental growth
				if !h.sameSizeGrow() {
					// Compute the hash to decide whether this key and value move to bucket x or bucket y
					hash := t.hasher(k2, uintptr(h.hash0))
					if h.flags&iterator != 0 && !t.reflexivekey() && !t.key.equal(k2, k2) {
						// A special case: a key whose hash differs on every computation.
						// Such a key comes from math.NaN(); it means "not a number" and has type float64.
						// Used as a map key it is problematic: its hash recomputed now differs from
						// the hash computed when it was inserted into the map!
						// Such a key can never be retrieved by a Get: m[math.NaN()] finds nothing.
						// It can only be encountered by iterating over the whole map.
						// Also, multiple math.NaN() keys can be inserted; they do not overwrite each other.
						// When evacuation reaches a math.NaN() key, only the lowest bit of its tophash
						// decides the half (when doubling the bucket count): 0 sends it to the x part, 1 to the y part.
						useY = top & 1
						top = tophash(hash)
					} else {
						// Normal keys take this else branch
						if hash&newbit != 0 {
							useY = 1
						}
					}
				}

				if evacuatedX+1 != evacuatedY || evacuatedX^1 != evacuatedY {
					throw("bad evacuatedN")
				}

				// evacuatedX + 1 == evacuatedY
				b.tophash[i] = evacuatedX + useY
				// useY is either 0 or 1, selecting either bucket x's slots or bucket y's slots
				// (y is only possible under incremental growth)
				dst := &xy[useY]

				// If the destination bucket is already full (8 cells), allocate a new overflow bucket and move into it
				if dst.i == bucketCnt {
					dst.b = h.newoverflow(t, dst.b)
					dst.i = 0
					dst.k = add(unsafe.Pointer(dst.b), dataOffset)
					dst.e = add(dst.k, bucketCnt*uintptr(t.keysize))
				}
				dst.b.tophash[dst.i&(bucketCnt-1)] = top
				// If the key is stored indirectly, copy the pointer to it
				if t.indirectkey() {
					*(*unsafe.Pointer)(dst.k) = k2 // copy pointer
				// Otherwise copy the key's value
				} else {
					typedmemmove(t.key, dst.k, k) // copy elem
				}
				// Values are handled the same way as keys
				if t.indirectelem() {
					*(*unsafe.Pointer)(dst.e) = *(*unsafe.Pointer)(e)
				} else {
					typedmemmove(t.elem, dst.e, e)
				}
				// Advance the destination bucket's cell index by one
				dst.i++
				// The bucket layout ends with the overflow pointer, so advancing
				// the key and value pointers past the last cell is safe
				dst.k = add(dst.k, uintptr(t.keysize))
				dst.e = add(dst.e, uintptr(t.elemsize))
			}
		}
		// If no goroutine is iterating over the old buckets, clear the old bucket's key/value area to help GC
		if h.flags&oldIterator == 0 && t.bucket.ptrdata != 0 {
			b := add(h.oldbuckets, oldbucket*uintptr(t.bucketsize))
			// Clear only the key and value parts; keep tophash, which records the evacuation state
			ptr := add(b, dataOffset)
			n := uintptr(t.bucketsize) - dataOffset
			memclrHasPointers(ptr, n)
		}
	}

	// Update the evacuation progress counter
	if oldbucket == h.nevacuate {
		advanceEvacuationMark(h, t, newbit)
	}
}

func advanceEvacuationMark(h *hmap, t *maptype, newbit uintptr) {
	// Advance the evacuation progress by 1
	h.nevacuate++
	// Experiments suggest 1024 is overkill by at least an order of magnitude
	// (newbit is the number of old buckets before the grow);
	// it is used as a safeguard to ensure O(1) behavior.
	stop := h.nevacuate + 1024
	if stop > newbit {
		stop = newbit
	}
	// Count how many buckets have already been evacuated
	for h.nevacuate != stop && bucketEvacuated(t, h, h.nevacuate) {
		h.nevacuate++
	}
	// h.nevacuate == newbit means all buckets have been evacuated
	if h.nevacuate == newbit {
		// Drop the pointer to the old buckets
		h.oldbuckets = nil
		// As explained for the hmap struct: if neither key nor value contains pointers and both fit inline,
		// the overflow buckets are tracked in hmap.extra. In that case the buckets being evacuated
		// effectively live in extra, so hmap.extra.oldoverflow must be cleared after evacuation.
		if h.extra != nil {
			h.extra.oldoverflow = nil
		}
		// Finally clear the same-size-grow flag: the expansion is complete
		h.flags &^= sameSizeGrow
	}
}

The code is long, but the comments are clear. If the expansion process is still hazy, the diagrams below may help.

For the map above, B is 3, so the original buckets array has length 8. As elements are added, the load factor exceeds 6.5, triggering an incremental expansion.

Take bucket 3 as an example. Since B increases by 1, the low 4 bits of the hash now select the new bucket, so its cells are moved to different buckets of the new array (bucket 3 or bucket 11). Note that within a bucket, cells are evacuated in order: they fill the cells of the destination bucket sequentially.

Of course, in practice bucket 3 would likely have overflow buckets; to keep things simple, assume it has none. If it did, their contents would be moved into new buckets 3 and 11 the same way, and if new bucket 3 or 11 filled up, an overflow bucket would be attached to it.

For the map above, B is also 3. Suppose the map has far too many overflow buckets, triggering an equal expansion. Note that after an equal expansion, the new buckets array is the same size as the old one.

Take bucket 6 as an example: it chains one base bucket and three overflow buckets, yet the data in them is very sparse. The purpose of equal expansion is to repack loose key-value pairs, raising bucket usage and making access faster. After evacuation, the new bucket 6 holds everything in its base bucket and needs no overflow buckets for now. Compared with the old bucket 6, the data is denser, and subsequent accesses are faster.

Finally, the answer to the question left above: why must we determine which half a key falls into? For incremental expansion, one old bucket’s keys are split between two new buckets, one set in half x and one in half y, with bucket x + 2^B = bucket y (where B is the old B value). If the key falls into new bucket x, it should be written at bucket x’s starting address plus the appropriate cell offset; if it falls into new bucket y, at bucket y’s starting address plus that offset. So determining the half makes it straightforward to compute the destination memory address and quickly find where the key should be written.

Conclusion

Go’s map is implemented on top of a hash table that resolves hash collisions with chaining; its core data structure is an array of buckets plus linked overflow buckets.

A map defines 2^B buckets, and each bucket can hold 8 keys. Keys are scattered into buckets according to their hash values: the low-order bits (the last B bits of the hash) pick the bucket number, and the high-order bits (the first 8 bits of the hash) distinguish keys within the same bucket.

If so many keys are added that the element count exceeds the load-factor threshold, or if repeated operations create too many overflow buckets, an expansion is triggered.

Expansion comes in two forms: incremental and equal. Incremental expansion doubles the number of buckets and redistributes each old bucket’s keys across two new buckets. Equal expansion keeps the bucket count but compacts the data in the buckets. Both forms allocate a new buckets array; neither works in place.

Expansion proceeds gradually, mainly to avoid the performance hit of moving too many keys at once. It is triggered when new elements are added, and the actual bucket evacuation happens during assignment and deletion, at most two buckets at a time. The core of lookup, assignment, and deletion is locating a key’s position; focus on understanding that, and the rest of the map source follows.

Usage advice

As the map design shows, it is not a concurrency-safe data structure: a program that reads and writes a map at the same time is prone to errors. To use a map concurrently, protect it with a lock (sync.Mutex or sync.RWMutex), as sketched below. The Go standard library also ships a concurrency-safe map, sync.Map. I have written about its implementation before; see Golang Technology Sharing: Understanding sync.Map.
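A minimal sketch of a lock-protected map, assuming read-heavy access (the safeMap name is illustrative):

package main

import (
	"fmt"
	"sync"
)

type safeMap struct {
	mu sync.RWMutex
	m  map[string]int
}

func (s *safeMap) Get(k string) (int, bool) {
	s.mu.RLock() // multiple readers may hold the lock at once
	defer s.mu.RUnlock()
	v, ok := s.m[k]
	return v, ok
}

func (s *safeMap) Set(k string, v int) {
	s.mu.Lock() // writers take the lock exclusively
	defer s.mu.Unlock()
	s.m[k] = v
}

func main() {
	s := &safeMap{m: make(map[string]int)}
	s.Set("a", 1)
	fmt.Println(s.Get("a")) // 1 true
}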

The result of traversing a map is unordered, which should be noted in use.

As the hmap structure shows, a map value points to the underlying buckets array through a pointer. So, as with slices, even though Go passes arguments by value, when a map is passed into a function, operations on the map inside the function also affect the map outside (see the example below).
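A quick demonstration that a mutation made inside a function is visible to the caller:

package main

import "fmt"

func addKey(m map[string]int) {
	m["inner"] = 1 // writes through to the shared bucket array
}

func main() {
	m := make(map[string]int)
	addKey(m)
	fmt.Println(m) // map[inner:1]
}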

In addition, there is a special key, math.NaN(), which produces a different hash every time it is hashed. As a result, m[math.NaN()] never finds a value, and assigning to it repeatedly creates multiple math.NaN() keys in the map. You won’t actually need this; just know that the corner case exists.
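The behavior is easy to observe:

package main

import (
	"fmt"
	"math"
)

func main() {
	m := make(map[float64]int)
	nan := math.NaN()
	m[nan] = 1
	m[nan] = 2 // inserts a second NaN key instead of overwriting the first
	fmt.Println(len(m)) // 2
	fmt.Println(m[nan]) // 0: a NaN key can never be looked up directly
}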

References

en.wikipedia.org/wiki/Associ…

blog.golang.org/maps

mp.weixin.qq.com/s/OHROn0ya_…

www.cse.cuhk.edu.hk/irwin.king/…

github.com/cch123/gola…

zhuanlan.zhihu.com/p/66676224

draveness.me/golang/docs…

github.com/talkgo/nigh…

my.oschina.net/renhc/blog/…