Golang Map interview questions

The interview questions

  1. What are the underlying implementation principles of map?
  2. Why is map traversal unordered?
  3. How do you traverse a map in order?
  4. Why is Go's map not thread-safe?
  5. How is a thread-safe map implemented?
  6. Which performs better, sync.Map or a native map with a lock, and why?
  7. Why is the load factor of Go's map 6.5?
  8. What is the map expansion policy?

Implementation principle

A map in Go is an 8-byte pointer to an hmap structure. You can see the map structure in the source file src/runtime/map.go.

The underlying structure of each map is an hmap, which contains an array of buckets (bmap); overflow buckets hanging off a bucket form a linked list. Next, let's take a closer look at these structures.

hmap structure

// A header for a Go map.
type hmap struct {
	count     int    // size of the map, i.e. len(map)
	flags     uint8  // state flags; the meanings of the four state bits are defined by constants in the runtime source
	B         uint8  // log_2 of the number of buckets; e.g. B=5 means 2^5=32 buckets
	noverflow uint16 // approximate number of overflow buckets
	hash0     uint32 // hash seed

	buckets    unsafe.Pointer // pointer to the buckets array; the array size is 2^B; may be nil if count==0
	oldbuckets unsafe.Pointer // if growing, pointer to the old buckets array, which is half the size of the new one; nil when not growing
	nevacuate  uintptr        // expansion progress: buckets with an index lower than this have already been relocated

	extra *mapextra // optional fields, used to optimize GC scans when neither key nor value contains a pointer and both can be inlined; points to a mapextra
}

bmap structure

A bmap is what we usually call a "bucket". A bucket can hold at most eight keys. These keys fall into the same bucket because their hashes map to the same bucket (their low-order hash bits are equal). Within the bucket, the high 8 bits of each key's hash (the tophash) are stored to quickly locate the key among the bucket's (at most eight) slots.

// A bucket for a Go map.
type bmap struct {
	// tophash stores the high byte of the hash of each key in this bucket;
	// a bucket has at most eight slots.
	tophash [bucketCnt]uint8
}

const (
	bucketCntBits = 3
	bucketCnt     = 1 << bucketCntBits // 8 key/value pairs per bucket
)

But this is only the surface structure declared in src/runtime/map.go. During compilation the compiler dynamically creates a new structure for the concrete key/value types:

type bmap struct {
	tophash  [8]uint8
	keys     [8]keytype
	values   [8]valuetype
	pad      uintptr // present only in some Go versions
	overflow uintptr // pointer to the next overflow bucket
}

Visualize the bucket memory data structure as follows:

The keys and values are stored separately: first all the keys, then all the values (key/key/.../value/value/... rather than key/value/key/value/...). The advantage of this layout is that the padding between a key and its value can be omitted in some cases, saving memory.

mapextra structure

If the map's keys and values contain no pointers and each is smaller than 128 bytes, the bucket is marked as containing no pointers, which avoids scanning the whole map during GC. However, bmap itself ends with an overflow pointer; to keep the overflow buckets alive without marking the bucket type as containing pointers, these overflow pointers are moved into the hmap's extra field.

// mapextra holds fields that are not present on all maps.
type mapextra struct {
	// If key and elem contain no pointers and can be inlined (<= 128 bytes),
	// the bucket type is marked as containing no pointers, so GC does not
	// scan the whole map. However, bmap.overflow is a pointer; to keep the
	// overflow buckets alive, pointers to them are stored here.
	// overflow and oldoverflow are only used when key and elem contain no pointers.
	// overflow contains the overflow buckets for hmap.buckets.
	// oldoverflow contains the overflow buckets for hmap.oldbuckets.
	overflow    *[]*bmap
	oldoverflow *[]*bmap

	// nextOverflow holds a pointer to a free overflow bucket.
	nextOverflow *bmap
}

The main features

Reference types

A map variable is a pointer to an hmap, so a map is a reference type.

Go has three commonly used composite types, slice, map, and channel, that behave as reference types: when they are used as function parameters, the function may modify the underlying data.

There is no reference passing in Golang, only value and pointer passing. Therefore, when map is passed as a function argument, it is essentially value passing. However, because the underlying data structure of Map points to the actual element storage space through a pointer, map modification in the called function is also visible to callers, so map passing as a function argument shows the effect of passing by reference.

Therefore, when passing a map, if you only want to modify the contents of the map rather than replace the map itself, the function parameter does not need to be a pointer.

func TestSliceFn(t *testing.T) {
	m := map[string]int{"a": 1}
	t.Log(m, len(m))
	// map[a:1] 1
	mapAppend(m, "b", 2)
	t.Log(m, len(m))
	// map[a:1 b:2] 2
}

func mapAppend(m map[string]int, key string, val int) {
	m[key] = val
}

Shared Storage Space

The underlying map data structure points to the actual element storage space through a pointer. When one map value is assigned to another variable, both share the same storage, so modifications made through one are visible through the other.

func TestMapShareMemory(t *testing.T) {
	m1 := map[string]int{}
	m2 := m1
	m1["a"] = 1
	t.Log(m1, len(m1))
	// map[a:1] 1
	t.Log(m2, len(m2))
	// map[a:1] 1
}

Traversal order is random

Even if the map is not modified, the keys and values may come out in a different order each time the map is traversed with range. This is intentional on the part of the Go designers: the starting point of each range is randomized to remind developers that the Go implementation does not guarantee a stable map traversal order, so do not rely on the order of results from a range traversal.

The map itself is unordered, and the traversal order is randomized. To traverse a map in sequence, sort the map keys first and then traverse the map in the order of the keys.

func TestMapRange(t *testing.T) {
	m := map[int]string{1: "a", 2: "b", 3: "c"}

	t.Log("first range:") // the default traversal order is random
	for i, v := range m {
		t.Logf("m[%v]=%v ", i, v)
	}

	t.Log("second range:")
	for i, v := range m {
		t.Logf("m[%v]=%v ", i, v)
	}

	// To traverse in a fixed order, collect the keys, sort them,
	// and then iterate over the sorted keys.
	var sl []int
	for k := range m {
		sl = append(sl, k)
	}
	sort.Ints(sl)
	for _, k := range sl {
		t.Log(k, m[k])
	}
}

Non-thread-safe

Map is concurrency insecure by default for the following reasons:

After a long discussion, the Go team decided that map should fit its typical usage (where safe access from multiple goroutines is not needed), rather than impose the cost of locking on the majority of programs for the sake of the minority that access maps concurrently.

Reading and writing a map from different goroutines at the same time triggers a fatal error such as "fatal error: concurrent map read and map write" (or "fatal error: concurrent map writes").

func main() {
	m := make(map[int]int)
	go func() { // one goroutine writes the map
		for i := 0; i < 10000; i++ {
			m[i] = i
		}
	}()
	go func() { // another goroutine reads the map
		for i := 0; i < 10000; i++ {
			fmt.Println(m[i])
		}
	}()
	//time.Sleep(time.Second * 20)
	for {
	}
}

If you want to make map thread safe, there are two ways:

Method 1: protect the map with a lock (map + sync.RWMutex or sync.Mutex)

func BenchmarkMapConcurrencySafeByMutex(b *testing.B) {
	var lock sync.Mutex // mutex
	m := make(map[int]int, 0)
	var wg sync.WaitGroup
	for i := 0; i < b.N; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			lock.Lock()
			defer lock.Unlock()
			m[i] = i
		}(i)
	}
	wg.Wait()
	b.Log(len(m), b.N)
}

Method 2: use sync.Map from the standard library

sync.Map is implemented with read/write separation, trading space for time. Compared with the map+RWMutex implementation it makes some optimizations: the read map can be accessed without locking and is consulted first; only when the read map cannot satisfy the operation (add, delete, update, lookup, iterate) does it fall back to the dirty (write) map, which requires locking. In suitable scenarios, lock contention is therefore much lower than with the map+RWMutex implementation.

func BenchmarkMapConcurrencySafeBySyncMap(b *testing.B) {
	var m sync.Map
	var wg sync.WaitGroup
	for i := 0; i < b.N; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			m.Store(i, i)
		}(i)
	}
	wg.Wait()
	b.Log(b.N)
}
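Beyond Store, here is a quick sketch of the other common sync.Map operations (Load, LoadOrStore, Range, Delete). The test name is illustrative, and the snippet assumes "sync" and "testing" are imported:

func TestSyncMapUsage(t *testing.T) {
	var m sync.Map

	m.Store("a", 1)               // insert or update
	if v, ok := m.Load("a"); ok { // lookup with the comma-ok form
		t.Log(v) // 1
	}

	actual, loaded := m.LoadOrStore("b", 2) // store only if the key is absent
	t.Log(actual, loaded)                   // 2 false

	m.Range(func(k, v interface{}) bool { // iterate; return false to stop early
		t.Log(k, v)
		return true
	})

	m.Delete("a") // remove a key
}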

Hash conflict

Go's map is a collection of key/value pairs. The underlying implementation is a hash table that resolves collisions with chaining; however, on a collision Go does not allocate a separate linked-list node per key. Instead, the minimum granularity of the chain is a bmap (bucket), and one bmap can store 8 key/value pairs. As for the hash function, the program checks at startup whether the CPU supports AES instructions; if so it uses the AES-based hash, otherwise memhash.
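For reference, this is roughly how the runtime selects the hash function at startup; the snippet is trimmed from src/runtime/alg.go and the exact details vary across Go versions:

func alginit() {
	// Install AES hash algorithms if the instructions needed are present.
	if (GOARCH == "386" || GOARCH == "amd64") &&
		cpu.X86.HasAES && // AESENC
		cpu.X86.HasSSSE3 && // PSHUFB
		cpu.X86.HasSSE41 { // PINSR{D,Q}
		initAlgAES()
		return
	}
	if GOARCH == "arm64" && cpu.ARM64.HasAES {
		initAlgAES()
		return
	}
	// otherwise memhash is used, seeded with random data
	...
}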

Common operations

create

A map can be initialized in three ways, and is usually created using make.

func TestMapInit(t *testing.T) {
	// Method 1: var declaration. The map is nil; writing to an
	// uninitialized (nil) map panics ("assignment to entry in nil map"),
	// so it must be initialized (e.g. with make) before writing.
	var m1 map[string]int
	// m1["a"] = 1 // panic: assignment to entry in nil map
	t.Log(m1, unsafe.Sizeof(m1)) // map[] 8

	// Method 2: map literal
	m2 := map[string]int{"a": 2}
	t.Log(m2, unsafe.Sizeof(m2)) // map[a:2] 8

	// Method 3: make
	m3 := make(map[string]int)
	m3["a"] = 3
	t.Log(m3, unsafe.Sizeof(m3)) // map[a:3] 8
}

When a map is created with make, the compiler calls runtime.makemap. If the initial capacity hint is no larger than 8 (a single bucket suffices), the compiler uses runtime.makemap_small instead, which only allocates the hmap header and seeds it with fastrand, because there is no need to allocate multiple buckets up front.
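For reference, makemap_small is tiny; this is essentially its body in src/runtime/map.go (recent Go versions replace fastrand with the newer runtime random source):

// makemap_small implements Go map creation for make(map[k]v) and
// make(map[k]v, hint) when hint is known at compile time to be at most
// bucketCnt and the map needs to be allocated on the heap.
func makemap_small() *hmap {
	h := new(hmap)
	h.hash0 = fastrand()
	return h
}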

Creation process

The makemap function creates a random hash seed with fastrand, calculates from the incoming hint the minimum number of buckets needed, and then creates the bucket array with makeBucketArray. This allocates a single contiguous block of memory for the buckets. During bucket creation, if B >= 4, additional overflow buckets (2^(B-4) of them) are allocated at the end of the same block. After initialization, the hmap pointer is returned.

Compute the initial value of B

Find a B so that the map’s load factor is within the normal range

B := uint8(0)
for overLoadFactor(hint, B) {
	B++
}
h.B = B

// overLoadFactor reports whether count items placed in 1<<B buckets is over loadFactor.
func overLoadFactor(count int, B uint8) bool {
	return count > bucketCnt && uintptr(count) > loadFactorNum*(bucketShift(B)/loadFactorDen)
}

Lookup

In Go a map can be read with or without the comma-ok form. When the queried key is not in the map, the comma-ok form additionally returns a bool reporting whether the key is present, while the form without the comma returns the zero value of the value type: 0 if the value is an int, an empty string if it is a string, and so on.

Value := m["name"] ftt. Printf("value:%s", value) ok := m["name"]if ok { fmt.Printf("value:%s", value)}
Copy the code

Depending on the type of key, the compiler will replace the lookup function with a more specific function to optimize efficiency:

Key type    Lookup function
uint32 mapaccess1_fast32(t maptype, h hmap, key uint32) unsafe.Pointer
uint32 mapaccess2_fast32(t maptype, h hmap, key uint32) (unsafe.Pointer, bool)
uint64 mapaccess1_fast64(t maptype, h hmap, key uint64) unsafe.Pointer
uint64 mapaccess2_fast64(t maptype, h hmap, key uint64) (unsafe.Pointer, bool)
string mapaccess1_faststr(t maptype, h hmap, ky string) unsafe.Pointer
string mapaccess2_faststr(t maptype, h hmap, ky string) (unsafe.Pointer, bool)

Lookup process

Write protection monitoring

The function first checks the map's flag bits. If the write flag is set to 1, another goroutine is currently writing the map, and the runtime throws a fatal error ("concurrent map read and map write"). This also shows that map is not goroutine-safe.

if h.flags&hashWriting ! = 0 { throw("concurrent map read and map write")}Copy the code

Calculate the hash value

hash := t.hasher(noescape(unsafe.Pointer(&ky)), uintptr(h.hash0))

After the key is run through the hash function, the resulting hash value looks like this (64 bits on mainstream 64-bit machines):

 10010111 | 000011110110110010001111001010100010010110010101010 │ 01010

Find the bucket corresponding to the hash

m: the bucket mask, i.e. the number of buckets minus 1 (bucketMask(h.B) = 2^B - 1)

The bucket is located in buckets using hash & m. If the map is growing, the growth is not yet complete, and the corresponding old bucket has not been evacuated, the lookup uses the bucket from oldbuckets instead.

m := bucketMask(h.B)
b := (*bmap)(add(h.buckets, (hash&m)*uintptr(t.bucketsize)))
if c := h.oldbuckets; c != nil {
	if !h.sameSizeGrow() {
		// There used to be half as many buckets; mask down one more power of two.
		m >>= 1
	}
	oldb := (*bmap)(add(c, (hash&m)*uintptr(t.bucketsize)))
	if !evacuated(oldb) {
		b = oldb
	}
}

Number of the bucket where the hash resides:

The last 5 bits of the hash above, 01010, equal 10 in decimal, so the key lives in bucket 10 (with B = 5 the buckets are numbered 0 to 31).

Traverse the bucket

Compute the hash slot:

top := tophash(hash)

func tophash(hash uintptr) uint8 {
	top := uint8(hash >> (goarch.PtrSize*8 - 8))
	if top < minTopHash {
		top += minTopHash
	}
	return top
}

Take the high 8 bits of the hash value computed above, 10010111 (151 in decimal), and compare them against each tophash value stored in the bucket. If a slot's tophash matches, compare the key stored in that slot with the lookup key; if they are equal, the corresponding value is returned and the search ends.

If the key is not found in the bucket and the bucket's overflow pointer is not nil, the search continues through the overflow buckets, until the key is found or every slot, including those of all overflow buckets, has been examined.

Return pointers to the key and value

The corresponding slot has been found above. Here is how the key and value are obtained from it:

// the address of the i-th key
k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
// the address of the i-th value
v := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.valuesize))

// dataOffset is the offset of the key area from the start of the bucket
dataOffset = unsafe.Offsetof(struct {
	b bmap
	v int64
}{}.v)

The keys start at address unsafe.Pointer(b)+dataOffset inside the bucket, so the address of the i-th key is that start address plus i key sizes; and since the values are stored after all eight keys, the address of the i-th value is offset by the sizes of all eight keys plus i value sizes. For example, in a map[int64]int64 (keysize = valuesize = 8), the key in slot i=3 sits at b+dataOffset+3*8 and its value at b+dataOffset+8*8+3*8.

Assignment

As you can see from the generated assembly, mapassign is called whenever a key is inserted into or modified in a map.

In fact, the syntax for inserting or modifying keys is the same, except that the key for the former operation does not exist in the map, while the key for the latter operation does exist in the map.
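A minimal, runnable sketch of both cases (the test name is illustrative); it also shows that assigning to a nil map panics:

func TestMapInsertOrUpdate(t *testing.T) {
	var nilMap map[string]int
	_ = nilMap
	// nilMap["a"] = 1 // would panic: assignment to entry in nil map

	m := make(map[string]int)
	m["a"] = 1 // key "a" does not exist yet: insert
	m["a"] = 2 // key "a" already exists: update
	t.Log(m["a"]) // 2
}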

mapassign also has a series of "fast" variants that the compiler substitutes depending on the key type, to optimize efficiency:

Key type    Insert function
uint32 mapassign_fast32(t maptype, h hmap, key uint32) unsafe.Pointer
uint64 mapassign_fast64(t maptype, h hmap, key uint64) unsafe.Pointer
string mapassign_faststr(t maptype, h hmap, ky string) unsafe.Pointer

We’ll just look at the most general assignment function, mapassign.

Assignment process

Map assignment may be accompanied by map expansion and incremental migration. Expansion only doubles the size of the underlying array; it does not move the data. The data transfer happens gradually after the expansion.

Checks and initialization

  1. Check whether the map is nil; writing to a nil map panics ("assignment to entry in nil map")
  2. Check whether the map is being written concurrently; if so, throw a fatal error
  3. Check whether buckets is nil; if so, call newobject to allocate the first bucket

Migration

On every assignment/deletion, if oldbuckets != nil the map is considered to be growing and one step of migration is performed; this is described in more detail below.

Find & Update

Following the lookup process described above, locate the key's position. If the key is found, update its value; if not, find an empty slot to insert it into.

Expansion

If, after the search above, no position is found to insert into, the map needs to grow before inserting. The expansion process is described in detail below.

delete

As you can see from the generated assembly, the mapdelete function is called when a key is deleted from a map:

func mapdelete(t *maptype, h *hmap, key unsafe.Pointer)

The delete logic is relatively simple; most of the helper functions were already used in the assignment operation. The core is to locate the key, following the same search process and examining each cell in the bucket. Once the position is found, the key and value are cleared, count is decremented by 1, and the tophash of that position is set to emptyOne.

// clear the value
e := add(unsafe.Pointer(b), dataOffset+bucketCnt*2*goarch.PtrSize+i*uintptr(t.elemsize))
if t.elem.ptrdata != 0 {
	memclrHasPointers(e, t.elem.size)
} else {
	memclrNoHeapPointers(e, t.elem.size)
}
b.tophash[i] = emptyOne

Expansion

Expansion timing

When a new key is being inserted into the map, the conditions are checked; if either of the following two conditions holds, expansion is triggered:

if ! h.growing() && (overLoadFactor(h.count+1, h.B) || tooManyOverflowBuckets(h.noverflow, h.B)) { hashGrow(t, h) goto again // Growing the table invalidates everything, so try again }Copy the code

1. The load factor exceeds the threshold

The threshold defined in the source is 6.5 (loadFactorNum/loadFactorDen), a value chosen as a reasonable trade-off after testing.

Each bucket has eight slots; if there were no overflow buckets and every bucket were full, the load factor would be 8. So when the load factor exceeds 6.5, many buckets are close to full and both lookup and insertion become less efficient, which makes expansion necessary. For example, with B = 5 there are 32 buckets, so condition 1 triggers once the map holds more than 6.5 * 32 = 208 elements.

For condition 1, where there are too many elements and too few buckets, the fix is simple: increase B by 1, so the maximum number of buckets doubles (to the original 2^B * 2). There are now old and new bucket arrays. Note that the elements are still in the old buckets and have not been migrated yet; the new array merely has twice the capacity.

2. Too many overflow buckets

When the load factor is small, map lookup and insertion can still be inefficient, and condition 1 does not catch this situation. The symptom is that the numerator of the load factor is small: the map has few elements but many buckets (the number of allocated buckets is large, including a large number of overflow buckets).

The cause is not hard to imagine: repeatedly inserting and deleting elements. Many elements are inserted first, creating many buckets, but the load factor never reaches the threshold of condition 1, so no expansion is triggered. Then elements are deleted, reducing the total count, and inserting again creates yet more overflow buckets, still without triggering condition 1. Overflow buckets pile up, keys become scattered, and lookups and insertions become terribly inefficient, hence condition 2. It is like a nearly empty town: many houses but few residents, all spread out, making anyone hard to find.

For condition 2, there are not that many elements, but the number of overflow buckets is too high, meaning many buckets are only partly full. The solution is to create a new bucket array of the same size and move elements from the old buckets into it, packing the keys of each bucket closer together so that keys sitting in overflow buckets can move back into the main bucket. The result is saved space, higher bucket utilization, and more efficient map lookups and insertions.
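For reference, this is roughly how the runtime decides that there are "too many" overflow buckets (from src/runtime/map.go; comments trimmed, details may vary across Go versions):

// tooManyOverflowBuckets reports whether noverflow buckets is too many
// for a map with 1<<B buckets.
func tooManyOverflowBuckets(noverflow uint16, B uint8) bool {
	// If the threshold is too low, we do extraneous work.
	// If the threshold is too high, maps that grow and shrink
	// can hold on to lots of unused memory.
	if B > 15 {
		B = 15
	}
	// Roughly: "too many" means about as many overflow buckets as regular buckets.
	return noverflow >= uint16(1)<<(B&15)
}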

Expansion function

func hashGrow(t *maptype, h *hmap) {
	// If we've hit the load factor, get bigger; otherwise there are
	// too many overflow buckets, so grow at the same size.
	bigger := uint8(1)
	if !overLoadFactor(h.count+1, h.B) {
		bigger = 0
		h.flags |= sameSizeGrow
	}
	oldbuckets := h.buckets
	newbuckets, nextOverflow := makeBucketArray(t, h.B+bigger, nil)

	flags := h.flags &^ (iterator | oldIterator)
	if h.flags&iterator != 0 {
		flags |= oldIterator
	}
	// commit the grow (atomic wrt gc)
	h.B += bigger
	h.flags = flags
	h.oldbuckets = oldbuckets
	h.buckets = newbuckets
	h.nevacuate = 0
	h.noverflow = 0

	if h.extra != nil && h.extra.overflow != nil {
		// Promote current overflow buckets to the old generation.
		if h.extra.oldoverflow != nil {
			throw("oldoverflow is not nil")
		}
		h.extra.oldoverflow = h.extra.overflow
		h.extra.overflow = nil
	}
	if nextOverflow != nil {
		if h.extra == nil {
			h.extra = new(mapextra)
		}
		h.extra.nextOverflow = nextOverflow
	}

	// the actual copying of the hash table data is done incrementally
	// by growWork() and evacuate().
}

Expanding the map requires relocating the existing keys and values to new memory. If a large number of keys and values were moved all at once, performance would suffer badly, so Go map expansion is incremental: the original keys are not moved in one go; at most two buckets are evacuated per operation.

The hashGrow() function above does not actually "move" anything; it only allocates the new buckets and hangs the old ones on the oldbuckets field. The real evacuation happens in growWork(), which is called from mapassign and mapdelete: whenever a key is inserted, modified, or deleted, the runtime checks whether oldbuckets still needs evacuating (i.e. whether it is non-nil) and, if so, evacuates a bit more.

Migration

Migration timing

After expansion (which only pre-allocates memory), the remaining data is not moved immediately. Instead, old buckets are incrementally migrated into the new buckets as the corresponding buckets are touched by assignments and deletions.

if h.growing() {
	growWork(t, h, bucket)
}

Transfer function

func growWork(t *maptype, h *hmap, bucket uintptr) {
	// make sure we evacuate the old bucket corresponding
	// to the bucket we're about to use
	evacuate(t, h, bucket&h.oldbucketmask())

	// evacuate one more old bucket to make progress on growing
	if h.growing() {
		evacuate(t, h, h.nevacuate)
	}
}

nevacuate marks the current evacuation progress; once all old buckets have been moved, it equals the number of old buckets.

In the evacuate method, the data in the old bucket at the given position, together with its overflow chain, is transferred to the new buckets.

  1. First determine whether the current bucket has been migrated. (Oldbucket identifies the location of the bucket to be moved)
b := (*bmap)(add(h.oldbuckets, oldbucket*uintptr(t.bucketsize)))
// judge whether the bucket has already been evacuated
if !evacuated(b) {
	// perform the transfer
}

Whether a bucket has already been transferred can be determined from its first tophash value:

func evacuated(b *bmap) bool {
	h := b.tophash[0]
	return h > emptyOne && h < minTopHash // i.e. one of the evacuation markers (values 2, 3 or 4)
}
  2. If it has not been evacuated, the data needs to be migrated. The data may move to a bucket array of the same size or to one twice as large. Here xy marks the possible destination positions: x means the same bucket index, y means the index offset by the old bucket count (used only when the array doubles). First, determining the destinations:
var xy [2]evacDst
x := &xy[0]
x.b = (*bmap)(add(h.buckets, oldbucket*uintptr(t.bucketsize)))
x.k = add(unsafe.Pointer(x.b), dataOffset)
x.v = add(x.k, bucketCnt*uintptr(t.keysize))

if !h.sameSizeGrow() {
	// Only compute y when the bucket array is doubling.
	y := &xy[1]
	y.b = (*bmap)(add(h.buckets, (oldbucket+newbit)*uintptr(t.bucketsize)))
	y.k = add(unsafe.Pointer(y.b), dataOffset)
	y.v = add(y.k, bucketCnt*uintptr(t.keysize))
}
  3. Once the destination bucket is determined, the key/value pairs are migrated one by one (see the sketch after this list for how each key is assigned to x or y).

  4. If the bucket just relocated is the same as the bucket tracked by the overall relocation progress, the progress marker nevacuate needs to be advanced.

// newbit is the number of old buckets
func advanceEvacuationMark(h *hmap, t *maptype, newbit uintptr) {
	// advance the progress marker
	h.nevacuate++
	// look at most 1024 buckets ahead, as a safeguard to keep this O(1)
	stop := h.nevacuate + 1024
	if stop > newbit {
		stop = newbit
	}
	// skip over buckets that have already been evacuated
	for h.nevacuate != stop && bucketEvacuated(t, h, h.nevacuate) {
		h.nevacuate++
	}
	// if all old buckets have been evacuated, the growth is complete
	if h.nevacuate == newbit {
		h.oldbuckets = nil
		if h.extra != nil {
			h.extra.oldoverflow = nil
		}
		h.flags &^= sameSizeGrow
	}
}
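To illustrate step 3 above, here is a simplified view of how evacuate decides, for each key, whether it goes to destination x (same bucket index) or y (index + number of old buckets). It is trimmed from the body of evacuate in src/runtime/map.go and omits several special cases (NaN keys, indirect keys, and so on):

// inside evacuate, for each non-empty cell i of the old bucket b:
useY := uint8(0)
if !h.sameSizeGrow() {
	// Recompute the hash and look at the new bit that now
	// participates in bucket selection.
	hash := t.hasher(k2, uintptr(h.hash0))
	if hash&newbit != 0 {
		useY = 1
	}
}
b.tophash[i] = evacuatedX + useY // mark the old cell as evacuated to X or Y
dst := &xy[useY]                 // destination (x or y) chosen above
// ... then copy the key and value into dst and advance dst.k / dst.v ...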

traverse

The traversal process is that buckets are traversed sequentially and keys in buckets are traversed sequentially.

Map traversal is unordered. If you want to achieve ordered traversal, you can sort the keys first

Why is map traversal unordered?

After the map has grown and keys have been migrated, key positions change significantly: some keys fly off to new buckets (the "high" half), while others stay put. As a result, traversal cannot possibly return keys in their original order.

If you wrote a fixed map and never inserted into or deleted from it, traversing it could in principle return the same sequence of keys and values every time. But Go deliberately prevents this, because it would lead novice programmers to assume the order is guaranteed, and in some situations that assumption would be a serious mistake.

Go goes even further: when traversing a map, it does not always start from bucket 0, but from a randomly chosen bucket, and within that bucket from a randomly chosen cell. So even for a fixed map that is only ever traversed, you are unlikely to get the same key/value sequence twice.

// mapiterinit initializes the hiter struct used by range over a map
func mapiterinit(t *maptype, h *hmap, it *hiter) {
	...
	it.t = t
	it.h = h
	it.B = h.B
	it.buckets = h.buckets
	if t.bucket.kind&kindNoPointers != 0 {
		h.createOverflow()
		it.overflow = h.extra.overflow
		it.oldoverflow = h.extra.oldoverflow
	}

	// decide where to start
	r := uintptr(fastrand())
	if h.B > 31-bucketCntBits {
		r += uintptr(fastrand()) << 31
	}
	it.startBucket = r & bucketMask(h.B)
	it.offset = uint8(r >> h.B & (bucketCnt - 1))

	// iterator state
	it.bucket = it.startBucket
	...
	mapiternext(it)
}

conclusion

  1. Map is a reference type
  2. The map traversal is unordered
  3. Map is not thread-safe
  4. Map resolves hash collisions by chaining (overflow buckets)
  5. Expanding a map does not necessarily allocate more space; a same-size grow may only compact the existing memory
  6. Map migration is incremental; each assignment performs at least one step of the migration
  7. Deleting keys can leave many empty kv slots, which may later trigger migration; avoid this where possible