“Mutex” is essential to concurrent programming, and while Go advocates using channels to coordinate access to shared resources, it also provides sync.Mutex for programmers to use. Someone has gathered statistics: among well-known open source projects such as Docker, Kubernetes, etcd, and gRPC, Mutex is the most frequently used synchronization primitive. As the Go language has iterated, the implementation of sync.Mutex has become more and more refined. The core implementation of the current sync.Mutex runs to more than 100 lines, using plenty of intricate bit manipulation and flow control to solve various problems, to the point where the source code has become nearly unreadable. This article analyzes sync.Mutex from the perspective of its historical development, hoping to help readers understand the sync.Mutex source code and master its design ideas.

1. Initial mutex — first come, first served

Github address: Mutex for the initial version

The amount of code is very small, let’s analyze it:

package sync

import (
    "runtime"
    "sync/atomic"
)

type Mutex struct {
    key  int32
    sema uint32
}

func (m *Mutex) Lock() {
    if atomic.AddInt32(&m.key, 1) == 1 {
        // changed from 0 to 1: grabbed the lock directly
        return
    }
    runtime.Semacquire(&m.sema)
}

func (m *Mutex) Unlock() {
    switch v := atomic.AddInt32(&m.key, -1); {
    case v == 0:
        // changed from 1 to 0: no one else is waiting
        return
    case v == -1:
        // changed from 0 to -1: the mutex was not locked,
        // or the number of waiting goroutines overflowed
        panic("sync: unlock of unlocked mutex")
    }
    runtime.Semrelease(&m.sema)
}

The initial version of Mutex contains two fields:

  • key: an int32 that records whether the lock is held and how many goroutines are waiting for it; 0 means unlocked
  • sema: the semaphore used to put waiting goroutines to sleep and wake them up

When a goroutine calls the Lock method to request the lock, it atomically adds 1 to key via the atomic.AddInt32 method. If it is lucky and no one else is waiting, key becomes 1 and the lock is acquired immediately. If the lock is already held by another goroutine, then after incrementing key the current goroutine calls runtime.Semacquire and goes to sleep on the semaphore; when the lock is released, the semaphore wakes it up. Releasing the lock is also relatively simple: atomically decrement key by 1. If the value becomes -1 the program panics, for example when the lock is released without ever having been requested. In addition, key is an int32, which can express a little over 2 billion positive values; if the number of goroutines scrambling for the lock reached that threshold, the counter would overflow and panic as well, but with finite server resources this situation does not occur in practice.

Let’s start with two questions:

  • Why do atomic operations need to be used when requesting locks?
  • Why do we need semaphores? What is its implementation mechanism?

1.1 The necessity of atomic operations

Let’s start with a simple counter program, counter.go.

package main

import (
   "fmt"
   "sync"
)

func main() {
   //runtime.GOMAXPROCS(1)
   var counter int64
   var wg sync.WaitGroup
   wg.Add(2)
   for i := 0; i < 2; i++ {
      go func() {
         defer wg.Done()
         for j := 0; j < 10000; j++ {
            counter++  //atomic.AddInt64(&counter, 1)
         }
      }()
   }
   wg.Wait()
   fmt.Println(counter)
}

We start two goroutines that concurrently increment counter, each adding 1 to it 10,000 times. When the program runs, the printed counter value will most likely not be 20,000 but some smaller number, and the result differs from run to run. Readers familiar with concurrent programming will recognize the race condition: counter++ is not an atomic operation. It consists of three steps: 1. read the current value of the counter variable; 2. add 1 to it in a temporary held in a register; 3. store the result of the temporary back to counter. Each of these steps has a corresponding assembly implementation.
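To make the race concrete, here is an illustrative expansion of what counter++ does under the hood (a sketch of the three steps, not the actual compiled output):

// increment is roughly what counter++ expands to; another goroutine
// can be scheduled between any two of these steps, losing an update.
func increment(counter *int64) {
    tmp := *counter // 1. load the current value
    tmp = tmp + 1   // 2. add 1 in a register
    *counter = tmp  // 3. store the result back
}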

When we run the program many times pinned to one CPU core (set with runtime.GOMAXPROCS(1)), we seem to always get the accurate count of 20,000, but this is not a strong guarantee. Running go run -race counter.go with race detection enabled will very likely report that the program performs unsynchronized concurrent reads and writes on the memory of the counter variable:

==================
WARNING: DATA RACE
Read at 0x00c000136008 by goroutine 8:
  main.main.func1()
      /Users/guozhaoran/goCode/basic/goConcurrent/mutex/example.go:17 +0x78

Previous write at 0x00c000136008 by goroutine 7:
  main.main.func1()
      /Users/guozhaoran/goCode/basic/goConcurrent/mutex/example.go:17 +0x91

Goroutine 8 (running) created at:
  main.main()
      /Users/guozhaoran/goCode/basic/goConcurrent/mutex/example.go:14 +0xe4

Goroutine 7 (finished) created at:
  main.main()
      /Users/guozhaoran/goCode/basic/goConcurrent/mutex/example.go:14 +0xe4
==================

The solution is to use the atomic package, which is well suited to atomic addition and subtraction on a single shared variable. The atomicity is provided by hardware support on different CPU architectures, such as the LOCK instruction prefix on x86 that locks the data bus. Atomic operations are of course essential to the Mutex scenario: they ensure that updates to the shared key variable are consistent across goroutines.

1.2 Semaphores

The concept of semaphores was developed by the Dutch computer scientist Edsger Dijkstra around 1963 and is widely used in operating systems. A semaphore represents the availability of a shared resource: a process that cannot obtain the resource is forced to stop at a specific point and wait for a signal telling it that it can continue.

Dijkstra defined two operations on semaphores in his paper, P and V. The P operation (decrease, wait, acquire) decreases the semaphore’s count, while the V operation (increase, signal, release) increases it. In pseudocode they look like this (the brackets denote atomic operations):

function V(semaphore S, integer I):
    [S ← S + I]

function P(semaphore S, integer I):
    repeat:
        [if S ≥ I:
            S ← S − I
            break]

As you can see, a semaphore S starts out with a specified number of resources n, like a pool holding n resources. The P operation requests a resource and returns immediately if one is available; if there are no resources, or not enough, it can keep trying or block waiting. The V operation frees a resource, returning it to the semaphore. Apart from initialization, the value of a semaphore can only be changed through P/V operations.

Now, let’s summarize the semaphore implementation:

  • Initialization: set the initial number of resources.
  • P operation: subtract 1 from the semaphore’s count. If the new value is negative, the caller blocks and joins the wait queue; otherwise the caller continues, having obtained a resource.
  • V operation: add 1 to the semaphore’s count. If the previous count was negative, some caller is blocked in a P operation; take one waiter from the wait queue, wake it, and let it continue. (A minimal Go sketch of this contract follows.)
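To make the P/V contract concrete, here is a minimal counting-semaphore sketch built on a buffered channel. This is only an illustration of the interface described above; the Go runtime’s own semaphore (runtime/sema.go) is implemented quite differently, with atomics, gopark, and a queue of sleeping goroutines.

package main

import (
    "fmt"
    "sync"
)

// semaphore is a counting semaphore with capacity n: at most n
// goroutines can hold a resource at the same time.
type semaphore chan struct{}

func newSemaphore(n int) semaphore { return make(semaphore, n) }

// P (acquire): take one resource, blocking while none is available.
func (s semaphore) P() { s <- struct{}{} }

// V (release): return one resource, unblocking one waiting P if any.
func (s semaphore) V() { <-s }

func main() {
    sem := newSemaphore(2) // a pool of 2 resources
    var wg sync.WaitGroup
    for i := 0; i < 5; i++ {
        wg.Add(1)
        go func(id int) {
            defer wg.Done()
            sem.P()
            defer sem.V()
            fmt.Println("goroutine", id, "holds a resource")
        }(i)
    }
    wg.Wait()
}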

At run time, Go uses semaphores internally to control the blocking and waking of goroutines. For example, Mutex’s second field, sema, is operated on through the following functions, which implement the semaphore’s P/V operations (the Go runtime’s internal semaphore is itself built from atomic operations and gopark; see runtime/sema.go for details):

func runtime_Semacquire(s *uint32)
func runtime_SemacquireMutex(s *uint32, lifo bool, skipframes int)
func runtime_Semrelease(s *uint32, handoff bool, skipframes int)

It is worth mentioning that the Go runtime’s semaphore maintains a wait queue, which is the basis of Mutex’s current starvation-mode implementation. The semaphore’s P/V functions can add a goroutine to the head or the tail of the queue when it goes to sleep, and can likewise take a goroutine from the queue to wake it up.

2. Preliminary optimization of mutex — give goroutine a chance to steal the lock

Having read the initial implementation of Mutex, readers may notice a problem: when the lock is held, every newly arriving goroutine that requests it is immediately put to sleep through the runtime semaphore and appended to the wait queue. This strictly guarantees first-come-first-served ordering of lock acquisition, but putting goroutines to sleep and waking them up hurts performance badly. In response, Go developers made a major change to Mutex.

Github address: Mutex further optimized

The Mutex structure is as follows:

type Mutex struct {
    state int32
    sema  uint32
}

const (
    mutexLocked = 1 << iota // mutex is locked
    mutexWoken
    mutexWaiterShift = iota
)

The key field of the Mutex structure is renamed to state, and its meaning is split into three parts:

  • mutexLocked: the lowest bit of state indicates whether the lock is held
  • mutexWoken: the second bit of state indicates whether a goroutine has been woken
  • mutexWaiters: the remaining bits of state hold the number of goroutines waiting for this lock
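The bit layout is easy to see in a small decoding sketch. This is illustrative code using the constants defined above, not part of the sync package:

package main

import "fmt"

const (
    mutexLocked      = 1 << iota // bit 0: the lock is held
    mutexWoken                   // bit 1: a goroutine has been woken
    mutexWaiterShift = iota      // waiter count lives in bits 2 and up
)

// describe unpacks the three fields stored in a state word.
func describe(state int32) {
    fmt.Printf("locked=%v woken=%v waiters=%d\n",
        state&mutexLocked != 0,
        state&mutexWoken != 0,
        state>>mutexWaiterShift)
}

func main() {
    describe(0)                                 // locked=false woken=false waiters=0
    describe(mutexLocked)                       // locked=true  woken=false waiters=0
    describe(mutexLocked | 3<<mutexWaiterShift) // locked=true  woken=false waiters=3
}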

Before we look at this version of the code, let’s introduce one more important building block: CAS.

2.1 Spin lock (CAS) and its implementation principle

A CAS instruction compares a given expected value with the value in memory: if they are equal, it replaces the memory value with a new value and reports success; if not, it reports failure, and the caller typically goes back to the first step and retries, hence the name “spin lock”. In Go, CAS is exposed as the atomic.CompareAndSwap family of functions (atomic.CompareAndSwapInt32 and friends).

It is important to note that the compare and the swap in CAS happen as one atomic step: if any other goroutine changes the value in memory in between, the CAS returns false.

CAS is also a foundation of the Mutex implementation, and the reader may wonder: why semaphores when we have CAS? In fact, neither atomic operations nor spin locks are suitable for long waits. Some resources take a while to become available, and the CPU cannot hand them to you immediately; if you spin-wait on such a resource, you waste an enormous amount of CPU time. Mutex therefore combines both: CAS for short, optimistic attempts, and semaphores for long waits.
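To see what spinning on CAS looks like (and why it burns CPU while it waits), here is a bare-bones, illustrative spin lock. It is a sketch for contrast with sync.Mutex, not something to use in production:

package main

import (
    "runtime"
    "sync/atomic"
)

// spinLock is the simplest possible CAS-based lock: Lock retries the
// CAS until it wins, so a long wait keeps a CPU core busy the whole time.
type spinLock int32

func (l *spinLock) Lock() {
    for !atomic.CompareAndSwapInt32((*int32)(l), 0, 1) {
        runtime.Gosched() // yield the processor instead of hard-spinning
    }
}

func (l *spinLock) Unlock() {
    atomic.StoreInt32((*int32)(l), 0)
}

func main() {
    var l spinLock
    l.Lock()
    // ... critical section ...
    l.Unlock()
}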

2.2 Lock implementation after preliminary optimization of mutex

Having introduced CAS, let’s take a look at the current version of the Lock implementation and go straight to the code:

func (m *Mutex) Lock() {
    // Fast path: the lucky case, grab the unlocked mutex directly.
    if atomic.CompareAndSwapInt32(&m.state, 0, mutexLocked) {
        return
    }

    awoke := false
    for {
        old := m.state
        new := old | mutexLocked // set the locked bit
        if old&mutexLocked != 0 {
            new = old + 1<<mutexWaiterShift // lock is held: waiter count + 1
        }
        if awoke {
            // This goroutine was woken from sleep,
            // so clear the woken flag in the new state.
            new &^= mutexWoken
        }
        if atomic.CompareAndSwapInt32(&m.state, old, new) {
            if old&mutexLocked == 0 {
                break // the lock was free and we grabbed it
            }
            runtime.Semacquire(&m.sema) // sleep until woken
            awoke = true
        }
    }
}

Let’s focus on the operations on state. First, a CAS checks whether the Mutex is neither held nor has any waiters (state == 0). If so, the goroutine is lucky and gets the lock directly; this is the fast path indicated in the code.

If the goroutine is unlucky at this moment, it falls into the checking loop below, where the for loop keeps trying to acquire the lock. We know that state carries three meanings, and the loop manipulates all of them:

  • through new := old | mutexLocked, set the mutexLocked bit of state to acquire the lock;
  • through new = old + 1<<mutexWaiterShift, increment the waiter count by 1;
  • through new &^= mutexWoken, clear the Mutex’s woken flag.

Then atomic.CompareAndSwapInt32(&m.state, old, new) attempts to install the new value. If it succeeds, there are two cases to distinguish: if the old state did not contain the locked bit, the lock has been acquired, so the goroutine breaks out of the loop and carries on; otherwise the update merely added a waiter (or cleared the mutexWoken flag), and the goroutine goes to sleep on the semaphore, setting awoke to true once it is woken to try again.

Two kinds of goroutine execute this state-checking loop concurrently:

  • New Goroutine (possibly multiple)
  • Goroutine awakened from semaphore priority queue (maximum possible one)

The description above may feel abstract; the following flow chart should help:

2.3 Unlock after preliminary optimization

The Unlock method has also become more complex, but unlike Lock, if you look closely, you can still make sense of it. Here is the code:

func (m *Mutex) Unlock() {
    // Fast path: drop the lock bit.
    new := atomic.AddInt32(&m.state, -mutexLocked)
    // Unlocking an unlocked mutex panics.
    if (new+mutexLocked)&mutexLocked == 0 {
        panic("sync: unlock of unlocked mutex")
    }

    old := new
    for {
        // If there are no waiters, or a goroutine has already been
        // woken or has grabbed the lock, there is no need to wake anyone.
        if old>>mutexWaiterShift == 0 || old&(mutexLocked|mutexWoken) != 0 {
            return
        }
        // Decrement the waiter count by 1 and set the woken flag.
        new = (old - 1<<mutexWaiterShift) | mutexWoken
        if atomic.CompareAndSwapInt32(&m.state, old, new) {
            // CAS succeeded: wake one goroutine from the semaphore queue.
            runtime.Semrelease(&m.sema)
            return
        }
        old = m.state
    }
}

The Unlock method first computes a new value with the lock bit dropped; unlocking an unlocked Mutex causes a panic. Then, rather than returning immediately, the program does some extra checks. A goroutine calling Unlock returns directly in any of the following cases:

  • There is no waiter on Mutex
  • Mutex was locked again by someone else
  • A Goroutine has been awakened

Otherwise, decrement the waiter count in the Mutex by 1 and set the mutexWoken flag, installing the result with CAS. If the CAS succeeds, one goroutine is woken from the semaphore’s wait queue; otherwise, reload the current Mutex state and keep looping until one of the cases above returns.

Compared with the original design, this sync.Mutex implementation mainly gives newly arriving goroutines a chance to acquire the lock, breaking the original first-come-first-served logic at the cost of extra code complexity.

3. Mutexes are further optimized — more opportunities for Goroutine to steal the lock

Our previous optimization of sync.Mutex was based on an assumption: a newly arrived goroutine competing for the lock has a good chance of getting it! Since a goroutine usually holds the lock only briefly, why not give the competing goroutines even more chances? Let them wait a little while; if they still have not gotten the lock after a short spell, then add them to the semaphore’s wait queue. That is exactly what the Go team did.

Github address: More chances for goroutines to snatch the lock

Let’s look at the code:

func (m *Mutex) Lock() {
    // Fast path: the lucky case, grab the unlocked mutex directly.
    if atomic.CompareAndSwapInt32(&m.state, 0, mutexLocked) {
        return
    }

    // Newly arriving goroutines and the woken goroutine keep
    // trying to acquire the lock in this loop.
    awoke := false
    iter := 0
    for {
        old := m.state
        new := old | mutexLocked // set the locked bit
        if old&mutexLocked != 0 {
            if runtime_canSpin(iter) {
                // Active spinning makes sense. Try to set the mutexWoken
                // flag to tell Unlock not to wake other blocked goroutines.
                if !awoke && old&mutexWoken == 0 && old>>mutexWaiterShift != 0 &&
                    atomic.CompareAndSwapInt32(&m.state, old, old|mutexWoken) {
                    awoke = true
                }
                runtime_doSpin()
                iter++
                continue // spin and try to acquire the lock again
            }
            new = old + 1<<mutexWaiterShift // waiter count + 1
        }
        if awoke {
            // Woken goroutine: clear the woken flag.
            new &^= mutexWoken
        }
        if atomic.CompareAndSwapInt32(&m.state, old, new) { // set the new state
            if old&mutexLocked == 0 {
                break // the lock was free and we grabbed it
            }
            runtime_Semacquire(&m.sema) // sleep on the semaphore
            awoke = true
            iter = 0 // reset the spin counter
        }
    }
}

This optimization is small, adding only the runtime_canSpin check and a spin counter iter. As long as the goroutine is still allowed to spin, it busy-waits for the goroutine holding the lock to release it, increasing the probability of snatching the lock. (A sketch of what the spin check looks at follows.)
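runtime_canSpin lives in the runtime (sync_runtime_canSpin in runtime/proc.go). The sketch below is a simplified, user-space approximation of its logic; the real check also consults scheduler internals (idle and spinning Ps, the local run queue) that are invisible here, and the limit of 4 matches the runtime’s active_spin constant:

package main

import (
    "fmt"
    "runtime"
)

const activeSpin = 4 // the runtime spins at most a handful of times

// canSpin approximates the runtime's decision: spin only a few
// iterations, and never on a machine where the lock holder cannot
// run in parallel with us.
func canSpin(iter int) bool {
    return iter < activeSpin && runtime.NumCPU() > 1 && runtime.GOMAXPROCS(0) > 1
}

func main() {
    for i := 0; i < 6; i++ {
        fmt.Println("iteration", i, "can spin:", canSpin(i))
    }
}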

One part of the code tends to confuse readers:

if !awoke && old&mutexWoken == 0 && old>>mutexWaiterShift != 0 &&
    atomic.CompareAndSwapInt32(&m.state, old, old|mutexWoken) {
    awoke = true
}

The four conditions joined by && are confusing at first, but with the previous section’s walkthrough of sync.Mutex we can see what this code is for. && is the short-circuit operator: as soon as one condition is false, evaluation stops. !awoke means the current goroutine is a newly arrived one, not one woken from the sema wait queue; old&mutexWoken == 0 means the Mutex has not already marked some goroutine as woken; old>>mutexWaiterShift != 0 means there are waiters. If all of these hold, the goroutine tries to set the woken flag on the Mutex via CAS (atomic.CompareAndSwapInt32(&m.state, old, old|mutexWoken)) and, on success, sets awoke to true. This lets a goroutine calling Unlock return quickly without waking a goroutine from the wait queue to compete for the lock.

With this release we can see that sync.Mutex is quite friendly to newly arriving goroutines, which on the surface seems fine: it lets more goroutines acquire the lock as quickly as possible. But consider the goroutines that have been waiting in the semaphore’s wait queue all along. The lock could be snatched by new goroutines again and again, creating a “starvation problem”.

4. The ultimate mutex — no more starvation

sync.Mutex’s “starvation problem” was solved in Go 1.9, and after further optimization work, sync.Mutex reached its current, final form. We will look at the code of the Go 1.17 version; you will have to join me in brain-burn mode.

Github address: sync.Mutex, the ultimate version

4.1 Implementation of Mutex structure

To solve the starvation problem, the Mutex structure carves another bit out of the state field to indicate whether the Mutex is currently in starvation mode, and defines a constant that sets a 1-millisecond threshold on how long a goroutine waits to acquire the lock.

type Mutex struct {
    state int32
    sema  uint32
}

const (
    mutexLocked = 1 << iota // mutex is locked
    mutexWoken
    mutexStarving
    mutexWaiterShift = iota

    starvationThresholdNs = 1e6 // 1,000,000 ns = 1 ms
)

4.2 The final sync.Mutex’s Lock implementation

The latest version of sync.Mutex splits each of the Lock and Unlock methods into a fast path and a slow path in separate functions, so that the fast path can be inlined. In this section we first look at the Lock implementation:

func (m *Mutex) Lock() {
    // Fast path: grab the unlocked mutex directly.
    if atomic.CompareAndSwapInt32(&m.state, 0, mutexLocked) {
        if race.Enabled {
            race.Acquire(unsafe.Pointer(m))
        }
        return
    }
    // Slow path (outlined so that the fast path can be inlined).
    m.lockSlow()
}

func (m *Mutex) lockSlow() {
    var waitStartTime int64
    starving := false // whether the current goroutine is starving
    awoke := false    // whether the current goroutine holds the woken flag
    iter := 0         // spin counter
    old := m.state    // current lock state
    for {
        // The lock is held but not starving: try to spin.
        if old&(mutexLocked|mutexStarving) == mutexLocked && runtime_canSpin(iter) {
            // Active spinning: try to set the mutexWoken flag to tell
            // Unlock not to wake other blocked goroutines.
            if !awoke && old&mutexWoken == 0 && old>>mutexWaiterShift != 0 &&
                atomic.CompareAndSwapInt32(&m.state, old, old|mutexWoken) {
                awoke = true
            }
            runtime_doSpin() // spin
            iter++
            old = m.state
            continue
        }
        new := old
        // Don't try to acquire a starving mutex:
        // newly arriving goroutines must queue.
        if old&mutexStarving == 0 {
            new |= mutexLocked // not starving: try to lock
        }
        if old&(mutexLocked|mutexStarving) != 0 {
            new += 1 << mutexWaiterShift // waiter count + 1
        }
        // The current goroutine switches the mutex to starvation mode,
        // but only if the mutex is still locked.
        if starving && old&mutexLocked != 0 {
            new |= mutexStarving
        }
        if awoke {
            new &^= mutexWoken // clear the woken flag
        }
        if atomic.CompareAndSwapInt32(&m.state, old, new) {
            if old&(mutexLocked|mutexStarving) == 0 {
                break // locked the mutex with CAS
            }
            // If we were already waiting before, queue at the head
            // of the semaphore queue.
            queueLifo := waitStartTime != 0
            if waitStartTime == 0 {
                waitStartTime = runtime_nanotime()
            }
            runtime_SemacquireMutex(&m.sema, queueLifo, 1)
            // Set the starving flag if we waited past the threshold.
            starving = starving || runtime_nanotime()-waitStartTime > starvationThresholdNs
            old = m.state
            if old&mutexStarving != 0 {
                // Ownership was handed off to us, but the state is
                // inconsistent: mutexLocked is not set and we are still
                // counted as a waiter. Fix that.
                delta := int32(mutexLocked - 1<<mutexWaiterShift)
                if !starving || old>>mutexWaiterShift == 1 {
                    // Not starving, or we are the last waiter:
                    // exit starvation mode.
                    delta -= mutexStarving
                }
                atomic.AddInt32(&m.state, delta)
                break
            }
            awoke = true
            iter = 0 // reset the spin counter
        } else {
            old = m.state
        }
    }

    if race.Enabled {
        race.Acquire(unsafe.Pointer(m))
    }
}

Even with the comments, the code is hard to understand; read it together with the flow chart.

With luck, the fast path acquires the lock immediately. Let’s walk through the lockSlow implementation step by step. It first defines some state used by the current goroutine, then uses old to hold the current lock state.

var waitStartTime int64
starving := false // whether the current goroutine is starving
awoke := false    // whether the current goroutine holds the woken flag
iter := 0         // spin counter, used to decide whether spinning may continue
old := m.state    // current lock state

The code then enters the for loop. lockSlow is designed to loop as few times as possible while making sure the goroutine eventually acquires the lock. In the loop, a goroutine that finds the lock held but not in starvation mode calls runtime_canSpin and waits a while, hoping the lock will be released. During this time it also does something else: it tries to set the Mutex’s mutexWoken flag to stop Unlock from waking goroutines in the sema wait queue, because the more goroutines competing for a lock, the smaller the chance of getting it!

for {
    // The lock is held but not starving: try to spin.
    if old&(mutexLocked|mutexStarving) == mutexLocked && runtime_canSpin(iter) {
        // Active spinning: try to set the mutexWoken flag to tell
        // Unlock not to wake other blocked goroutines.
        if !awoke && old&mutexWoken == 0 && old>>mutexWaiterShift != 0 &&
            atomic.CompareAndSwapInt32(&m.state, old, old|mutexWoken) {
            awoke = true
        }
        runtime_doSpin() // spin
        iter++
        old = m.state
        continue
    }

After the spin wait, the goroutine trying to acquire the lock faces one of two fates:

  • the lock has been released, and it can now compete for it
  • the lock is still held, or it loses the competition, and it will enter the sema wait queue

But whatever the outcome, it next runs the general logic below, which computes the new state to set on the Mutex. There are several steps:

  • Check whether the lock is in normal mode; if so, set mutexLocked to try to lock it
  • If the lock has not been released, or it is in starvation mode, the goroutine is bound for the sema wait queue, so increment the waiter count
  • If the current goroutine is starving and the lock has not been released, mark the lock as starving. Why, some readers may ask, is starvation marked only while the lock is held? Because Unlock decides whether to hand the lock straight to a waiter based on the starvation flag, so starvation mode must imply there are waiting goroutines in the queue.
  • Clear the mutexWoken flag from the new state if awoke is set, because whether the current goroutine ends up in the sema wait queue or acquires the lock, the new state should no longer carry the woken flag.

new := old
// Don't try to acquire a starving mutex:
// newly arriving goroutines must queue.
if old&mutexStarving == 0 {
    new |= mutexLocked // not starving: try to lock
}
if old&(mutexLocked|mutexStarving) != 0 {
    new += 1 << mutexWaiterShift // waiter count + 1
}
// The current goroutine switches the mutex to starvation mode,
// but only if the mutex is still locked.
if starving && old&mutexLocked != 0 {
    new |= mutexStarving
}
if awoke {
    new &^= mutexWoken // clear the woken flag
}

Once the new state is computed, the next step is the CAS, with two possible outcomes: success or failure. Failure is the mother of success: on a failed CAS the goroutine records the current lock state (old = m.state) and goes through the for loop again; note that the spin counter is not reset, because a failed CAS does not put the goroutine into the sema wait queue. If the CAS succeeds, there are again two cases: either the lock was free and the goroutine now holds it, or the goroutine must be inserted into the sema wait queue. Whether it is inserted at the head or the tail depends on the situation: waitStartTime records whether this goroutine is waiting for the first time, and also serves to determine whether its wait has crossed the starvation threshold.

If the goroutine is being inserted into the sema wait queue for the first time, it goes to the tail and can only wait for its turn to be woken; if it has already waited before, the program inserts it at the head, so that it is the next to be woken, increasing the probability that it acquires the lock!

The starvation bookkeeping happens after the goroutine is woken from the sema wait queue. Based on waitStartTime and the starving flag, if the goroutine has waited for the lock for more than 1 ms, it marks itself as starving (note that this does not yet mark the Mutex as starving; if the goroutine still fails to get the lock, that happens on the next loop iteration). If the lock is already in starvation mode, it is handed directly to the current goroutine, which fixes up the state; otherwise the current goroutine has to compete for the lock all over again (with the spin counter reset).

if atomic.CompareAndSwapInt32(&m.state, old, new) {
    if old&(mutexLocked|mutexStarving) == 0 {
        break // locked the mutex with CAS
    }
    // If we were already waiting before, queue at the head
    // of the semaphore queue.
    queueLifo := waitStartTime != 0
    if waitStartTime == 0 {
        waitStartTime = runtime_nanotime()
    }
    runtime_SemacquireMutex(&m.sema, queueLifo, 1)
    // Set the starving flag if we waited past the threshold.
    starving = starving || runtime_nanotime()-waitStartTime > starvationThresholdNs
    old = m.state
    if old&mutexStarving != 0 {
        // Ownership was handed off to us; fix up the state.
        delta := int32(mutexLocked - 1<<mutexWaiterShift)
        if !starving || old>>mutexWaiterShift == 1 {
            // Not starving, or we are the last waiter:
            // exit starvation mode.
            delta -= mutexStarving
        }
        atomic.AddInt32(&m.state, delta)
        break
    }
    awoke = true
    iter = 0 // reset the spin counter
} else {
    old = m.state
}

That is the whole Lock implementation of the final Mutex. Let’s answer some common questions:

    1. If the Mutex has been marked as being in starvation mode, when does it switch back to normal mode?

A waiter that takes ownership of the Mutex switches it back to normal mode if either of two conditions holds. The first is that this waiter is already the last waiter in the queue (checked by old>>mutexWaiterShift == 1), so no other goroutine is waiting for the lock. The second is that its own wait time was less than 1 millisecond (tracked via starving = starving || runtime_nanotime()-waitStartTime > starvationThresholdNs).

    2. Why starvation mode?

Starvation mode is a balance between fairness and performance, preventing some goroutines from waiting on the lock for a very long time. In starvation mode, the longest-waiting waiter is served first. Normal mode has better performance, because a goroutine can acquire the lock several times in a row even while waiters are queued.

    3. How does the Mutex hand over the lock in starvation mode?

In starvation mode, the lock is handed directly to the first goroutine in the wait queue. This is reflected in the Unlock code below.

It is recommended that you read this section several times; better yet, start from the beginning of the article and think along with the historical development. If you have any questions, feel free to discuss them with me in the comments section.

Let’s move on to the Unlock method.

4.3 The final sync.Mutex’s Unlock implementation

The code logic of Unlock is easier to understand than that of Lock. Let’s look at the flow chart.

The code is as follows:

func (m *Mutex) Unlock() {
    // Fast path: drop the lock bit.
    new := atomic.AddInt32(&m.state, -mutexLocked)
    if new != 0 {
        // Outlined slow path to allow inlining the fast path.
        m.unlockSlow(new)
    }
}

func (m *Mutex) unlockSlow(new int32) {
    // Unlocking an unlocked mutex is fatal.
    if (new+mutexLocked)&mutexLocked == 0 {
        throw("sync: unlock of unlocked mutex")
    }
    if new&mutexStarving == 0 {
        old := new
        for {
            // If there are no waiters, or a goroutine has already been
            // woken, grabbed the lock, or switched on starvation mode,
            // there is no need to wake anyone.
            if old>>mutexWaiterShift == 0 || old&(mutexLocked|mutexWoken|mutexStarving) != 0 {
                return
            }
            // Decrement the waiter count and set the woken flag.
            new = (old - 1<<mutexWaiterShift) | mutexWoken
            if atomic.CompareAndSwapInt32(&m.state, old, new) {
                // CAS succeeded: wake a goroutine from the sema queue.
                runtime_Semrelease(&m.sema, false, 1)
                return
            }
            old = m.state
        }
    } else {
        // Starvation mode: hand the mutex directly to the waiter
        // at the head of the queue and wake it.
        runtime_Semrelease(&m.sema, true, 1)
    }
}

Unlock first removes the lock flag; if the rest of the state field is non-zero, extra work is needed, which is the unlockSlow logic. In starvation mode, a goroutine is woken from the head of the wait queue and the lock is handed to it directly. In normal mode, the code enters the for loop: if other parts of the lock’s state have already changed, unlockSlow simply returns without doing anything; otherwise it decrements the waiter count by 1 and sets the woken flag, and after the CAS succeeds it wakes a goroutine from the sema wait queue to compete for the lock.

That is the entire implementation of the final sync.Mutex. Next, let’s take a look at some common pitfalls of sync.Mutex.

5. Common pitfalls of sync.Mutex

Careless use of Mutex can cause deadlocks or panics, and improper use carries a large performance cost. Let’s analyze these points against the source code above.

  • Mutex keeps no per-goroutine state

We can see from the source code that Mutex, as a shared variable operated on by different goroutines, keeps no record of who holds it. Two kinds of problems follow: 1. one goroutine can release a lock held by another goroutine; 2. if a goroutine re-enters Lock a second time, it deadlocks. When using Mutex, it is recommended that Lock/Unlock come in pairs, preferably encapsulated in a single function; defer is a good solution. It is also possible to wrap Mutex to compensate for its statelessness, for example with re-entrant locks or lock-holder detection. Both misuses and the recommended pattern are sketched below.
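A minimal sketch of the re-entrancy pitfall and the defer pattern. The function names are hypothetical; uncommenting the call in main makes the Go runtime abort with “fatal error: all goroutines are asleep - deadlock!”:

package main

import "sync"

var mu sync.Mutex

// reentrant deadlocks: helper blocks on a lock that this same
// goroutine already holds, and Mutex has no idea who the holder is.
func reentrant() {
    mu.Lock()
    defer mu.Unlock()
    helper()
}

func helper() {
    mu.Lock() // blocks forever
    defer mu.Unlock()
}

// safe shows the recommended shape: Lock/Unlock paired in one
// function, with defer guaranteeing the release on every path.
func safe() {
    mu.Lock()
    defer mu.Unlock()
    // ... protected logic, kept short ...
}

func main() {
    safe()
    // reentrant() // uncomment to observe the deadlock
}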

  • Mutex must not be copied

We can see from the source code analysis that a Mutex may be shared by a huge number of goroutines. Copying a Mutex copies only a momentary snapshot of its state: after the copy, the new Mutex may inexplicably already be locked, woken, or starving, and may even carry a waiter count far greater than zero, yet calling Unlock on the original does not affect the copy.
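A sketch of the copy pitfall. Passing the struct by value copies the embedded Mutex; go vet’s copylocks check reports this kind of mistake as “passes lock by value”:

package main

import "sync"

type container struct {
    mu       sync.Mutex
    counters map[string]int
}

// byValue receives a copy of c, so it locks the copy's Mutex,
// not the caller's. go vet flags this parameter.
func byValue(c container) {
    c.mu.Lock() // protects nothing the caller can see
    defer c.mu.Unlock()
    c.counters["a"]++
}

func main() {
    c := container{counters: map[string]int{}}
    byValue(c)
}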

(See also: When a Go struct meets a Mutex.)

  • What happens under heavy lock contention

Under heavy contention you may find a large number of goroutines blocked on Mutex’s Lock function. In that case the program needs optimizing: use Mutex as sparingly as possible, and keep the logic protected by Mutex simple, because a crowd of goroutines blocked in Lock pushes the Mutex toward starvation. The problem with starvation is worse performance, and failing to return to normal mode quickly would be a disaster. One common relief is sketched below.
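A sketch of shrinking the critical section, under the assumption that the protected work can be split: the expensive computation (here a hypothetical expensiveEncode helper) runs before the lock is taken, so only the shared-map write is serialized:

package main

import "sync"

// expensiveEncode stands in for heavy work that does not touch
// shared state and therefore needs no lock (illustrative helper).
func expensiveEncode(raw []byte) []byte {
    out := make([]byte, len(raw))
    copy(out, raw)
    return out
}

// update keeps the locked region minimal: compute first, then
// take the lock only for the shared-map write.
func update(mu *sync.Mutex, cache map[string][]byte, key string, raw []byte) {
    encoded := expensiveEncode(raw) // no lock held here

    mu.Lock()
    cache[key] = encoded
    mu.Unlock()
}

func main() {
    var mu sync.Mutex
    cache := map[string][]byte{}
    update(&mu, cache, "k", []byte("v"))
}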

For optimization of lock contention, read: Online problems caused by an incorrect use of Go-cache

6. Summary

The analysis of strategic patterns in DDD points out that every complex system has an evolutionary process, and we should not expect to get it right all at once: “Rome was not built in a day”. We have to admire the ingenuity of sync.Mutex’s developers, who polished and perfected its design over and over. It may still not be perfect, and it can still be refined, but the mentality and spirit of finding and solving problems is worth learning from. This article also covered much of the groundwork beneath the sync.Mutex implementation: atomicity, semaphores, and CAS; it also pointed out the problems to watch for when using Mutex. I hope interested readers will keep the discussion going in the comments section!

References:

  • Mutex source code

colobu.com/2018/12/18/…

  • How are atomic operations implemented

github.com/luohaha/MyB…

  • Semaphores

baike.baidu.com/item/%E4%BF…

  • Thoroughly understanding the CAS implementation principle

zhuanlan.zhihu.com/p/94762520

  • Go sync.Mutex in depth

juejin.cn/post/698498…

  • An online problem caused by an incorrect use of go-cache

mp.weixin.qq.com/s/tcsSgCRj-…

  • When a Go struct meets a Mutex

mp.weixin.qq.com/s/OYGVR0d-f…