Interviewer: What is a deadlock? How to check for deadlocks? How do I avoid deadlocks?

Suddenly realized that my diagram system is missing the “deadlock” content, here to fix it.

Deadlocks are also a common topic in the interview process, because if you have a lot of deadlocks in your online environment, it can be a big deal.

This time, let’s talk systematically about deadlocks.

The concept of deadlock;
Simulate the occurrence of deadlock problem;
Troubleshoot deadlocks using tools;
Avoid deadlock problems;

The concept of deadlock

In multithreaded programming, in order to prevent multiple threads from competing for shared resources and causing data disorder, mutex is added before the operation of shared resources. Only the thread that successfully obtains the lock can operate the shared resources, while the thread that cannot obtain the lock can only wait until the lock is released.

When two threads, then, in order to protect the two different sharing resources and USES two mutex, then the two mutexes used improperly, can cause two threads are waiting for the other party releases the lock, in under the action of external force, the thread will have been waiting for each other, just can’t continue to run, this kind of situation is happened deadlock.

For example, Kobayashi takes the key to Kobayashi’s room, and Kobayashi is in her room, and Kobayashi takes the key to Kobayashi’s room, and Kobayashi is in her room. If Kobayashi wants to get out of his room, he must get the key in Kobayashi’s hand, but kobayashi must get the key in Kobayashi’s hand in order to get out, which forms a deadlock.

A deadlock can only occur if the following four conditions are met simultaneously:

Mutual exclusion condition;
Hold and wait for conditions;
Inalienable conditions;
Loop waiting condition;

Mutual exclusion conditions

A mutual exclusion condition means that multiple threads cannot use the same resource at the same time.

For example, if A resource that thread A already holds cannot be held by thread B at the same time, if thread B requests A resource that thread A already holds, thread B must wait until thread A releases the resource.

Hold and wait for conditions

The hold-and-wait condition means that thread A has already held resource 1 and wants to apply for resource 2, which has already been held by thread C. Therefore, thread A will be in the wait state. However, thread A will not release its own resource 1 while waiting for resource 2.

Inalienable condition

An inalienable condition means that A thread that has A resource cannot be acquired by another thread until it has used it up. If thread B also wants to use the resource, it can obtain it only after thread A has used it up and released it.

Loop waiting condition

Loop wait conditions refer to the order in which two threads acquire resources in a loop chain when a deadlock occurs.

For example, if thread A already holds resource 2 and wants to request resource 1, thread B already has resource 1 and wants to request resource 2, this forms A ring graph of resource request waiting.

Simulate the occurrence of deadlock problems

Talk is cheap. Show me the code.

Below, we use code to simulate deadlock problems.

First, create two threads, thread A and thread B, and two mutex_A and mutex_B mutex_A and mutex_B.

pthread_mutex_t mutex_A = PTHREAD_MUTEX_INITIALIZER; pthread_mutex_t mutex_B = PTHREAD_MUTEX_INITIALIZER; int main() { pthread_t tidA, tidB; // Create two threads pthread_create(&tidA, NULL, threadA_proc, NULL); pthread_create(&tidB, NULL, threadB_proc, NULL); pthread_join(tidA, NULL); pthread_join(tidB, NULL); printf("exit\n"); return 0; }Copy the code

Next, let’s look at what the thread A function does.

// thread function A void *threadA_proc(void *data) {printf("thread A waiting get ResourceA \n"); pthread_mutex_lock(&mutex_A); printf("thread A got ResourceA \n"); sleep(1); printf("thread A waiting get ResourceB \n"); pthread_mutex_lock(&mutex_B); printf("thread A got ResourceB \n"); pthread_mutex_unlock(&mutex_B); pthread_mutex_unlock(&mutex_A); return (void *)0; }Copy the code

As can be seen, the process of thread A function:

Get mutex A, then sleep for 1 second;
Acquire mutex B, and release mutex B;
Finally, release the mutex A;

// thread function B void *threadB_proc(void *data) {printf("thread B waiting get ResourceB \n"); pthread_mutex_lock(&mutex_B); printf("thread B got ResourceB \n"); sleep(1); printf("thread B waiting get ResourceA \n"); pthread_mutex_lock(&mutex_A); printf("thread B got ResourceA \n"); pthread_mutex_unlock(&mutex_A); pthread_mutex_unlock(&mutex_B); return (void *)0; }Copy the code

As can be seen, thread B functions process:

Get mutex B, then sleep for 1 second;
Acquire mutex A, and release mutex A;
Finally, release the mutex B;

We then run the program with the following results:

thread B waiting get ResourceB thread B got ResourceB thread A waiting get ResourceA thread A got ResourceA thread B Waiting get ResourceA thread A waiting get ResourceBCopy the code

You can see that thread B is waiting for mutex A to be released, thread A is waiting for mutex B to be released, and both sides are waiting for each other’s resources to be released. Obviously, you have A deadlock problem.

Troubleshoot deadlocks using tools

If you want to check whether your Java program is deadlocked, you can use the JStack tool, which is the JDK’s own thread stack analysis tool.

Since Kobayashi’s deadlock code example is written in C, under Linux, we can use the pstack + GDB tool to locate the deadlock problem.

The pstack command displays stack trace information for each thread (function call) and is easy to use by pstack < PID >.

Therefore, when locating the deadlock problem, we can run the pstack command for several times to check the function call process of the thread, and compare the results for several times to confirm which threads remain unchanged and are waiting for the lock, so it is highly likely to be caused by the deadlock problem.

I used pStack to output all the threads of the process that I used to simulate the deadlock problem. After I executed the command several times, the result was the same, as follows:

$ pstack 87746
Thread 3 (Thread 0x7f60a610a700 (LWP 87747)):
#0  0x0000003720e0da1d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x0000003720e093ca in _L_lock_829 () from /lib64/libpthread.so.0
#2  0x0000003720e09298 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x0000000000400725 in threadA_proc ()
#4  0x0000003720e07893 in start_thread () from /lib64/libpthread.so.0
#5  0x00000037206f4bfd in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x7f60a5709700 (LWP 87748)):
#0  0x0000003720e0da1d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x0000003720e093ca in _L_lock_829 () from /lib64/libpthread.so.0
#2  0x0000003720e09298 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x0000000000400792 in threadB_proc ()
#4  0x0000003720e07893 in start_thread () from /lib64/libpthread.so.0
#5  0x00000037206f4bfd in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7f60a610c700 (LWP 87746)):
#0  0x0000003720e080e5 in pthread_join () from /lib64/libpthread.so.0
#1  0x0000000000400806 in main ()

....

$ pstack 87746
Thread 3 (Thread 0x7f60a610a700 (LWP 87747)):
#0  0x0000003720e0da1d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x0000003720e093ca in _L_lock_829 () from /lib64/libpthread.so.0
#2  0x0000003720e09298 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x0000000000400725 in threadA_proc ()
#4  0x0000003720e07893 in start_thread () from /lib64/libpthread.so.0
#5  0x00000037206f4bfd in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x7f60a5709700 (LWP 87748)):
#0  0x0000003720e0da1d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x0000003720e093ca in _L_lock_829 () from /lib64/libpthread.so.0
#2  0x0000003720e09298 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x0000000000400792 in threadB_proc ()
#4  0x0000003720e07893 in start_thread () from /lib64/libpthread.so.0
#5  0x00000037206f4bfd in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7f60a610c700 (LWP 87746)):
#0  0x0000003720e080e5 in pthread_join () from /lib64/libpthread.so.0
#1  0x0000000000400806 in main ()
Copy the code

Thread 2 and Thread 3 are blocking the pthread_mutex_lock process, and the pstack output does not change many times, so there is a high probability of deadlock.

However, it is not possible to confirm that the two threads are waiting for each other’s locks to be released because we cannot see which lock object they are waiting on, so we can use the GDB tool to further confirm.

The whole GDB debugging process is as follows:

$GDB -p 87746 info Thread 3 Thread 0x7F60A610a700 (LWP 87747) 0x0000003720e0da1D in __lll_lock_wait () from /lib64/libpthread.so.0 2 Thread 0x7f60a5709700 (LWP 87748) 0x0000003720e0da1d in __lll_lock_wait  () from /lib64/libpthread.so.0 * 1 Thread 0x7f60a610c700 (LWP 87746) 0x0000003720e080e5 in pthread_join () from // lib64/libpthread.so.0 // the leftmost * represents the thread that GDB locks, Switch to second thread to view // Switch to second thread (GDB) thread 2 [Switching to Thread 2 (thread 0x7F60A5709700 (LWP 87748))]#0 0x0000003720e0DA1D In __lll_lock_wait () from /lib64/libpthread.so. Same as pstack command (GDB) bt #0 0x0000003720e0da1d in __lll_lock_wait () from /lib64/libpthread.so.0 #1 0x0000003720e093ca in _L_lock_829 () from /lib64/libpthread.so.0 #2 0x0000003720e09298 in pthread_mutex_lock () from /lib64/libpthread.so.0 #3  0x0000000000400792 in threadB_proc (data=0x0) at dead_lock.c:25 #4 0x0000003720e07893 in start_thread () from /lib64/libpthread.so.0 #5 0x00000037206f4BFD in clone () from /lib64/ libbc.so. Frame 3 #3 0x0000000000400792 in threadB_proc (data=0x0) at dead_lock.c:25 27 printf("thread B" waiting get ResourceA \n"); 28 pthread_mutex_lock(&mutex_A); // Prints the value of mutex_A, __owner represents the value of the thread in GDB, LWP (GDB) p mutex_A $1 = {__data = {__lock = 2, __count = 0, __owner = 87747, __nusers = 1, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = "\002\000\000\000\000\000\000\000\303V\001\000\001", '\000' <repeats 26 times>, __align = 2} // prints the value of mutex_B, __owner indicates the value of thread in GDB, LWP (GDB) p mutex_B $2 = {__data = {__lock = 2, __count = 0, __owner = 87748, __nusers = 1, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = "\002\000\000\000\000\000\000\000\304V\001\000\001", '\000' <repeats 26 times>, __align = 2}Copy the code

Let me explain the debugging process above:

throughinfo threadAfter printing all thread information, we can see that there are three threads, one is the main thread (LWP 87746), and the other two threads are created by ourselves (LWP 87747 and 87748).
throughthread 2, will switch to the second thread (LWP 87748);
throughbtThread B (LWP 87748) is threadB (LWP 87748).
throughframe 3Print the third frame in the call stack. You can see that thread B blocked on acquiring mutex A.
throughp mutex_A, print the mutex A object information, you can see that it is held by LWP thread 87747 (thread A);
throughp mutex_B, print the mutex A object information, it can be seen that it is held by LWP thread 87748 (thread B);

Because thread B is waiting on mutex_A owned by thread A, and thread A is waiting on mutex_B owned by thread B, you can conclude that the program is deadlocked.

Avoid deadlock problems

As mentioned earlier, the four necessary conditions for deadlock are mutual exclusion, hold-and-wait, inalienable, and loop wait.

Therefore, to avoid the deadlock problem, only one of the conditions can be broken. The most common and feasible method is to use the orderly allocation of resources to break the loop waiting condition.

So what is the orderly allocation of resources?

Thread A and thread B must obtain resources in the same order. If thread A tries to obtain resources A and then B, thread B also tries to obtain resources A and then B. That is, thread A and thread B always request the resources they want in the same order.

We use an orderly allocation of resources to modify the code where the deadlock occurred. We can leave thread A’s code unchanged.

Let’s first understand the order in which thread A acquires resources. It acquires mutex A, then mutex B.

So we can break the deadlock by simply changing thread B to acquire resources in the same order.

The improved code for thread B function is as follows:

Void *threadB_proc(void *data) {printf("thread B waiting get ResourceA \n"); pthread_mutex_lock(&mutex_A); printf("thread B got ResourceA \n"); sleep(1); printf("thread B waiting get ResourceB \n"); pthread_mutex_lock(&mutex_B); printf("thread B got ResourceB \n"); pthread_mutex_unlock(&mutex_B); pthread_mutex_unlock(&mutex_A); return (void *)0; }Copy the code

As you can see, no deadlocks have occurred.

thread B waiting get ResourceA 
thread B got ResourceA 
thread A waiting get ResourceA 
thread B waiting  get ResourceB 
thread B got ResourceB 
thread A got ResourceA 
thread A waiting get ResourceB 
thread A got ResourceB
exit
Copy the code

conclusion

Simply put, the deadlock problem occurs when two or more threads are running in parallel, competing for resources and waiting for each other.

A deadlock can occur only if the conditions of mutual exclusion, hold and wait, inalienable, and loop wait are met simultaneously.

Therefore, to avoid the deadlock problem, it is necessary to destroy one of the conditions, the most common method is to use the orderly allocation of resources to destroy the loop waiting condition.