Interprocess communication

Processes frequently need to communicate with other processes. For example, in a shell pipeline, the output of the first process must be passed to the second process, and so on down the pipe. Processes therefore need well-structured ways to exchange information, ways that are not prone to interruption and conflict. Let us discuss Inter-Process Communication (IPC) together.

There are three problems with interprocess communication:

  • The first problem, mentioned above, is how one process passes information to other processes.
  • The second problem is how to make sure two or more processes do not get in each other's way. For example, two processes in an airline reservation system each try to grab the last seat on a plane for a different customer.
  • The third problem concerns proper sequencing: if process A produces data and process B prints it, B must wait until A has produced some data before it starts printing.

It is important to note that the last two of these three problems apply equally well to threads.

The first problem is easier to solve between threads, because they share an address space and the same runtime environment. If you have ever written multithreaded code in a high-level language, you will know that passing information between threads is the easy part.

The other two problems apply to threads just as much, and the same solutions work for them. We will discuss all three problems below; for now, simply keep them in mind.

Race conditions

In some operating systems, collaborating processes may share common resources that each of them can read and write. The shared resource may be in memory or in a shared file. To illustrate how processes get in each other's way, let us take an example: a print spooler. When a process wants to print a file, it enters the file name in a special spooler directory. Another process, the printer daemon, periodically checks whether there are any files to be printed; if there are, it prints them and removes their names from the directory.

Suppose the spooler directory has a very large number of slots, numbered 0, 1, 2, ..., each capable of holding a file name. Also assume there are two shared variables: out, which points to the next file to be printed, and in, which points to the next free slot in the directory. These two variables might be kept in a two-word file accessible to all processes. At a certain instant, slots 0 through 3 are empty (the files there have already been printed) and slots 4 through 6 are occupied. More or less simultaneously, processes A and B decide to queue a file for printing.

Murphy’s Law says that anything that can go wrong will go wrong, and this is what can happen when it goes into effect.

Process A reads in and gets the value 7, so it stores 7 in a local variable next_free_slot. Just then a clock interrupt occurs; the CPU decides that process A has run long enough and switches to process B. Process B also reads in, also gets 7, and likewise stores 7 in its local variable next_free_slot. At this instant both processes think the next available slot is 7.

Process B now continues to run. It writes its file name into slot 7 and updates in to 8, then goes off to do other things.

Eventually process A runs again, starting from the place it left off. It checks next_free_slot, finds 7 there, writes its file name into slot 7, and sets in to 8. Because slot 7 already contained the name written by process B, process A's file name overwrites it. Since the printer daemon has no way of discovering what happened, process B's file will never be printed. A situation like this, in which two or more processes read and write shared data at the same time and the result depends on who runs precisely when, is called a race condition. Debugging race conditions is very painful: the program works fine the vast majority of the time, and on rare occasions something strange and unexplainable happens. Unfortunately, the growth of multicore parallelism has made race conditions increasingly common.
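To see a race condition in miniature, here is a sketch (a hypothetical demo, not the spooler code itself) in which two threads increment a shared counter with no synchronization; updates are lost in exactly the way the in pointer is lost above.

#include <pthread.h>
#include <stdio.h>

static int counter = 0;                  /* shared, unprotected */

static void *worker(void *arg) {
    for (int i = 0; i < 1000000; i++)
        counter = counter + 1;           /* read-modify-write: not atomic */
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %d\n", counter);   /* expected 2000000; usually less */
    return 0;
}

Run it a few times (compile with -pthread) and the total will typically differ from run to run, which is precisely what makes race conditions so hard to debug.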

Critical regions

Race conditions are not created only by the kind of shared resource above: sharing files and sharing memory can create them too. So how do we avoid them? The answer is to forbid more than one process from reading and writing the shared resource (shared memory and shared files included) at the same time. Put another way, we need mutual exclusion: if one process is using a shared variable or file in a certain way, the other processes are excluded from doing the same thing (accessing the same resource). The crux of the problem above is that process B started using the shared variable before process A was finished with it. Choosing appropriate primitives for achieving mutual exclusion is a major design issue in any operating system, and it is what we will focus on next.

The requirement for avoiding race conditions can be described abstractly. Most of the time a process is busy with internal computations and other work that does not lead to race conditions. Sometimes, however, a process has to access shared memory or files, or do something else that can lead to races. We call the part of the program that accesses shared memory the critical region or critical section. If we can arrange matters so that no two processes are ever in their critical regions at the same time, race conditions are avoided; this is the operating-system design view of the problem.

Although this requirement avoids race conditions, it is not sufficient for concurrent processes to cooperate correctly and efficiently when they share data. A good solution must satisfy four conditions:

  1. No two processes may be simultaneously inside their critical regions
  2. No assumptions may be made about the speed or the number of CPUs
  3. No process running outside its critical region may block other processes
  4. No process should have to wait forever to enter its critical region

At time T1, process A enters its critical region. At time T2, process B attempts to enter its critical region. Since process A is inside, process B blocks until process A leaves at time T3. Process B is then allowed to enter. Finally, at T4, process B leaves, and the system is back in the original state with no process in its critical region.

Mutual exclusion with busy waiting

We will now examine various designs for achieving mutual exclusion, so that while one process is busy updating shared memory in its critical region, no other process can enter its own critical region and cause trouble.

Disabling interrupts

On a uniprocessor system, the simplest solution is to have each process disable all interrupts just after entering its critical region and re-enable them just before leaving it. With interrupts disabled, the clock interrupt is disabled too, and since the CPU only switches from process to process as the result of a clock or other interrupt, the CPU will not switch to another process while interrupts are off. Thus, once a process has disabled interrupts, it can examine and modify the shared memory without fear that any other process will intervene.

Is this scheme attractive? Who decides when to enter a critical region? A user process. If a user process disables interrupts on entering its critical region and never leaves, interrupts will never be re-enabled. What happens then? The whole system may come to a halt. Furthermore, on a multiprocessor, disabling interrupts affects only the CPU that executed the disable instruction; the other CPUs continue running and can still access the shared memory.

On the other hand, it is frequently convenient for the kernel itself to disable interrupts for a few instructions while it is updating a variable or a list; a race condition could arise if, for example, an interrupt occurred while the ready list was in an inconsistent state. The conclusion: disabling interrupts is a useful technique within the operating system itself, but it is not appropriate as a general mutual exclusion mechanism for user processes.
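As a sketch of how the kernel itself might use this technique (disable_interrupts and enable_interrupts are hypothetical wrappers, not a real API; on x86 they would map to the privileged cli and sti instructions, usable only in kernel mode):

/* Kernel-style pseudocode: a uniprocessor critical region via interrupts */
void update_ready_list(void) {
    disable_interrupts();    /* hypothetical wrapper, e.g. cli on x86 */
    /* ... examine and modify the ready list; no clock interrupt can
       preempt us here, so no other process can run ... */
    enable_interrupts();     /* hypothetical wrapper, e.g. sti on x86 */
}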

Lock variables

As a second attempt, let us look for a software solution. Consider a single shared (lock) variable, initially 0. When a process wants to enter a critical region, it first tests the lock. If the lock is 0, the process sets it to 1 and enters the critical region. If the lock is already 1, the process waits until it becomes 0. Thus a 0 means that no process is in its critical region, and a 1 means that some process is.

Is this the right design? Does it have a flaw? Suppose one process reads the lock, sees that it is 0, and, just before it can set the lock to 1, another process is scheduled, reads the lock as 0, and sets it to 1. When the first process runs again, it will also set the lock to 1, and two processes will be in their critical regions at the same time.

Perhaps some readers will suggest first reading out the lock value, then checking it again just before storing into it. In practice that does not help: another process can still change the lock between the second check and the store. In other words, test-then-set is not an atomic operation, so the race condition remains.
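To make the window concrete, here is a sketch of the broken lock-variable approach (the helper names are hypothetical); the gap between the test and the set is exactly where another process can be scheduled in:

static int lock = 0;                /* shared lock variable, 0 = free */

void broken_acquire(void) {
    while (lock != 0)               /* test: spin until the lock looks free */
        ;
    /* <-- if the scheduler switches processes right here, both will
           have observed lock == 0 and both will proceed to set it */
    lock = 1;                       /* set: claim the lock -- too late */
}

void broken_release(void) {
    lock = 0;
}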

Strict alternation

From here on, the programs are written in C, because operating systems are generally written in C (and occasionally C++), not in languages like Java, Modula-3, or Pascal; even the native methods behind Java keywords are written in C or C++. C is powerful, efficient, and predictable, properties that matter when writing an operating system, whereas Java is unpredictable: it can run out of memory at a critical moment and call the garbage collector to reclaim memory at a most inopportune time. That does not happen in C, where no garbage collector runs behind your back. A comparison of C, C++, Java, and four other languages can be found in the links.

Code for process 0

while(TRUE){
  while(turn != 0) { }                  /* wait for our turn */
  critical_region();                    /* inside the critical region */
  turn = 1;                             /* hand the turn to process 1 */
  noncritical_region();                 /* outside the critical region */
}

The code for process 1

while(TRUE){
  while(turn != 1) { }                  /* wait for our turn */
  critical_region();
  turn = 0;
  noncritical_region();
}

In the code above, the integer variable turn, initially 0, keeps track of whose turn it is to enter the critical region and examine or update the shared memory. Initially, process 0 inspects turn, finds it to be 0, and enters its critical region. Process 1 also finds it to be 0 and therefore sits in a tight loop, continually testing turn to see when it becomes 1. Continuously testing a variable until some value appears is called busy waiting. It should usually be avoided, since it wastes CPU time; busy waiting is used only when there is a reasonable expectation that the wait will be short. A lock that uses busy waiting is called a spin lock.

When process 0 leaves the critical region, it sets turn to 1, allowing process 1 to enter. Suppose process 1 also finishes its critical region quickly, so that both processes are outside their critical regions with turn set to 0. Now process 0 executes its whole loop quickly, exiting its critical region and setting turn to 1. At this point turn is 1 and both processes are executing outside their critical regions.

Suddenly, process 0 finishes its noncritical work and goes back to the top of its loop. But it is not permitted to enter its critical region now, because turn is 1 and process 1 is still busy with its noncritical work. Process 0 can only hang in its while loop until process 1 sets turn back to 0. This shows that taking turns is not a good idea when one of the processes is much slower than the other.

This situation violates condition 3 above: no process running outside its critical region may block another process, yet here process 0 is being blocked by a process that is not in its critical region. Because it violates condition 3, this is not a good solution.

Peterson's solution

The Dutch mathematician T. Dekker was the first to devise a software mutual exclusion algorithm that does not require strict alternation, by combining lock variables with warning variables. For Dekker's algorithm, see the link.

Later, G. L. Peterson discovered a much simpler way to achieve mutual exclusion, shown below.

#define FALSE 0
#define TRUE  1
#define N     2                                 /* number of processes */

int turn;                                       /* whose turn is it? */
int interested[N];                              /* all values initially 0 (FALSE) */

void enter_region(int process){                 /* process is 0 or 1 */

  int other;                                    /* number of the other process */

  other = 1 - process;                          /* the other process */
  interested[process] = TRUE;                   /* show interest in entering */
  turn = process;                               /* set the flag */
  while(turn == process
        && interested[other] == TRUE) { }       /* busy wait */

}

void leave_region(int process){

  interested[process] = FALSE;                  /* indicate departure from the critical region */
}

Before using the shared variables (that is, before entering its critical region), each process calls enter_region with its own process number, 0 or 1, as the argument. This call causes the process to wait, if need be, until it is safe to enter. After it has finished with the shared variables, the process calls leave_region to indicate that it is done, allowing the other process to enter.

Now let us see how this solution works. Initially, neither process is in its critical region. Process 0 calls enter_region: it indicates its interest by setting its array element and sets turn to 0. Since process 1 is not interested, enter_region returns immediately. If process 1 now calls enter_region, it will hang there until interested[0] goes to FALSE, an event that happens only when process 0 calls leave_region to exit the critical region.

Now consider the case where the two processes call enter_region almost simultaneously. Both store their process number in turn, but only the store done last counts; the earlier one is overwritten and lost. Suppose process 1 stores last, so turn is 1. When both processes reach the while statement, process 0 finds its condition false, so it does not loop and enters its critical region, while process 1 loops and does not enter until process 0 exits its critical region.

TSL instruction

Now consider a solution that requires hardware help. Some computers, especially those designed for multiprocessors, have the following instruction

TSL RX,LOCK	

called test and set lock. It reads the memory word LOCK into register RX and then stores a nonzero value at that memory address. The read and the write are guaranteed to be indivisible and executed as a unit: no other processor may access the word until the instruction has finished. The CPU executing the TSL instruction locks the memory bus, prohibiting other CPUs from accessing memory until the instruction is done.

It is important to note that locking the memory bus is not the same as disabling interrupts. Disabling interrupts on processor 1 and then doing a read of a memory word followed by a write does nothing to stop processor 2 from touching that word between the read and the write; that is, disabling interrupts on processor 1 has no effect at all on processor 2. The only way to keep processor 2 out of the memory word until processor 1 is finished is to lock the bus, which requires a special piece of hardware (essentially, a bus line that asserts the bus is locked for use by the processor that locked it and by no others).

To use the TSL instruction, we employ a shared variable, lock, to coordinate access to shared memory. When lock is 0, any process may set it to 1 using the TSL instruction and then read and write the shared memory. When it is done, the process sets lock back to 0 with an ordinary move instruction.

How does this instruction prevent two processes from entering a critical section at the same time? Here’s the solution

enter_region:
        TSL REGISTER,LOCK       | copy LOCK to REGISTER and set LOCK to 1
        CMP REGISTER,#0         | was LOCK zero?
        JNE enter_region        | if it was nonzero, LOCK was set, so loop
        RET                     | return to caller; critical region entered

leave_region:
        MOVE LOCK,#0            | store a 0 in LOCK
        RET                     | return to caller

The idea here is very similar to Peterson's. Look at the four-instruction routine above. The first instruction copies the old value of lock to the register and then sets lock to 1. Then the old value is compared with 0. If it is nonzero, the lock was already set, so the program just goes back to the beginning and tests again. Sooner or later the value will become 0 (when the process currently in its critical region leaves it), and the subroutine returns, with the lock set. Clearing the lock is simple: the program just stores a 0 in lock; no special synchronization instruction is needed.

The solution to the critical region problem is now straightforward. Before entering its critical region, a process calls enter_region, which busy-waits as long as lock is 1, and acquires the lock and returns when lock is 0. On leaving the critical region, the process calls leave_region, which stores a 0 in lock. As with all solutions based on critical regions, the processes must call enter_region and leave_region at the correct times for the method to work.

An alternative instruction to TSL is XCHG, which exchanges the contents of two locations atomically, for example, a register and a memory word, as shown below.

enter_region:
        MOVE REGISTER,#1        | put a 1 in the register
        XCHG REGISTER,LOCK      | swap the contents of the register and LOCK
        CMP REGISTER,#0         | was LOCK zero?
        JNE enter_region        | if it was nonzero, LOCK was set, so loop
        RET                     | return to caller; critical region entered

leave_region:
        MOVE LOCK,#0            | store a 0 in LOCK
        RET                     | return to caller

XCHG is essentially the same solution as TSL. All Intel x86 CPUs use the XCHG instruction for low-level synchronization.
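On a modern compiler the same technique is available from C without writing assembly. The sketch below assumes GCC or Clang and uses their __atomic builtins to build the spin lock that enter_region and leave_region implement:

static volatile char lock = 0;

void spin_lock(void) {
    /* __atomic_test_and_set atomically stores 1 and returns the old
       value -- in effect, the TSL instruction */
    while (__atomic_test_and_set(&lock, __ATOMIC_ACQUIRE))
        ;                                      /* busy wait until it was 0 */
}

void spin_unlock(void) {
    __atomic_clear(&lock, __ATOMIC_RELEASE);   /* store a 0: leave_region */
}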

Sleep and wakeup

Peterson's solution and the TSL and XCHG solutions are all correct, but they share the defect of requiring busy waiting. In essence, each of them checks whether entry to the critical region is allowed, and if it is not, the process just waits in a tight loop until it is.

Not only does this approach waste CPU time, it can also have unexpected effects. Consider a computer with two processes of different priorities: H, with high priority, and L, with low priority. The scheduling rule is that H runs whenever it is in the ready state. At a certain moment, with L in its critical region, H becomes ready to run (for example, an I/O operation completes). H now begins busy waiting, but since L is never scheduled while H is running, L never gets a chance to leave its critical region, and H loops forever. This situation is sometimes referred to as the priority inversion problem.

Now let us look at some interprocess communication primitives that block instead of wasting CPU time when they are not allowed to enter their critical regions. One of the simplest pairs is sleep and wakeup. Sleep is a system call that causes the caller to block, that is, to be suspended until another process wakes it up. The wakeup call has one parameter, the process to be awakened. Alternatively, both sleep and wakeup can take one parameter each: a memory address used to match up sleeps with wakeups.

The producer-consumer problem

As an example of how these primitives are used, let us consider the producer-consumer problem, also known as the bounded buffer problem. Two processes share a common, fixed-size buffer. One of them, the producer, puts information into the buffer; the other, the consumer, takes it out. The problem can be generalized to m producers and n consumers, but we will consider only the case of one producer and one consumer, to keep the solution simple.

Trouble arises when the producer wants to put a new item into a buffer that is already full. The solution is to put the producer to sleep, that is, to block it, and wake it up when the consumer has removed one or more items. Similarly, if the consumer tries to take an item from the buffer and finds the buffer empty, the consumer sleeps and blocks until the producer puts something in.

This sounds simple enough, and it requires a variable to keep track of the number of items in the buffer; let us call it count. If the buffer can hold at most N items, the producer first checks whether count is N. If it is, the producer sleeps; otherwise it puts an item into the buffer and increments count. The consumer's logic is similar: first test whether count is 0; if so, sleep and block; otherwise take an item out of the buffer and decrement count. Each process also tests whether the other should be awakened, and if so, wakes it. The code for both producer and consumer is shown below.

#define N 100                           /* number of slots in the buffer */
int count = 0;                          /* number of items in the buffer */

/* producer */
void producer(void){
  int item;

  while(TRUE){                          /* repeat forever */
    item = produce_item();              /* generate the next item */
    if(count == N){
      sleep();                          /* if the buffer is full, block */
    }
    insert_item(item);                  /* put the item in the buffer */
    count = count + 1;                  /* increment the count */
    if(count == 1){
      wakeup(consumer);                 /* was the buffer empty? */
    }
  }
}

/* consumer */
void consumer(void){
  int item;

  while(TRUE){                          /* repeat forever */
    if(count == 0){                     /* if the buffer is empty, block */
      sleep();
    }
    item = remove_item();               /* take an item out of the buffer */
    count = count - 1;                  /* decrement the count */
    if(count == N - 1){                 /* was the buffer full? */
      wakeup(producer);
    }
    consume_item(item);                 /* use the item */
  }
}

Sleep and wakeup are shown here as ordinary library calls. They are not part of the standard C library, but they would be available on any system that actually had these system calls. The procedures insert_item and remove_item, which are not shown, handle the bookkeeping of putting items into the buffer and taking items out of it.

Now let us get back to the race condition. It can occur here because access to count is unconstrained. The following situation is possible. The buffer is empty, and the consumer has just read count and seen that it is 0. At that instant the scheduler decides to suspend the consumer and start running the producer. The producer produces an item, puts it into the buffer, and increments count, noticing that its value is now 1. Reasoning that count was just 0, so the consumer must be sleeping, the producer calls wakeup to wake the consumer up. Unfortunately, the consumer is not yet logically asleep, so the wakeup signal is lost. When the consumer next runs, it tests the value of count it previously read, finds it to be 0, and goes to sleep. Sooner or later the producer will fill up the whole buffer and go to sleep as well, and both processes will sleep forever.

The essence of the problem here is that a wakeup sent to a process that is not (yet) sleeping is lost. If it were not lost, everything would work. A quick fix is to add a wakeup waiting bit. When a wakeup is sent to a process that is still awake, this bit is set. Later, when the process tries to go to sleep, if the wakeup waiting bit is on, it is cleared, but the process stays awake.

However, with a larger number of processes you might propose adding more wakeup waiting bits (2, 4, 6, 8 of them), but that does not fundamentally solve the problem.

Semaphores

A semaphore is a mechanism proposed by E. W. Dijkstra in 1965 that uses an integer variable to accumulate the number of pending wakeups for later use. In his proposal, a new variable type, called a semaphore, was introduced. A semaphore can have the value 0, indicating that no wakeups are pending, or some positive value, meaning that that many wakeups are pending.

Dijkstra proposed two operations on semaphores, nowadays usually called down and up (generalizations of sleep and wakeup, respectively). The down operation checks whether the value is greater than 0. If so, it decrements the value by 1; if the value is 0, the process is put to sleep without completing the down for the moment. Checking the value, changing it, and possibly going to sleep are all done as a single, indivisible atomic action. This guarantees that once a semaphore operation has started, no other process can access the semaphore until the operation has completed or blocked. Such atomicity is absolutely essential for solving synchronization problems and avoiding race conditions.

An atomic operation, a concept that comes up in many other areas of computer science as well, is a group of related operations that are either all performed without interruption, or not performed at all.

The up operation increments the value of the semaphore by 1. If one or more processes are sleeping on that semaphore, unable to complete an earlier down operation, one of them is chosen by the system and allowed to complete its down. Thus, after an up on a semaphore with processes sleeping on it, the semaphore's value will still be 0, but there will be one fewer process sleeping on it. Incrementing the semaphore and waking up one process are likewise indivisible. No process ever blocks doing an up, just as no process ever blocks doing a wakeup in the earlier model.
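POSIX semaphores map directly onto these primitives: sem_wait is down and sem_post is up. A minimal sketch (on Linux, link with -pthread):

#include <semaphore.h>

int main(void) {
    sem_t s;
    sem_init(&s, 0, 1);   /* initial value 1; the 0 means shared by threads */

    sem_wait(&s);         /* down: decrement, or block while the value is 0 */
    /* ... critical region ... */
    sem_post(&s);         /* up: increment and wake one sleeper, if any */

    sem_destroy(&s);
    return 0;
}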

Solving the producer-consumer problem with semaphores

Semaphores solve the lost-wakeup problem as follows.

#define N 100                           /* number of slots in the buffer */
typedef int semaphore;                  /* semaphores are a special kind of int */
semaphore mutex = 1;                    /* controls access to the critical region */
semaphore empty = N;                    /* counts empty buffer slots */
semaphore full = 0;                     /* counts full buffer slots */

void producer(void){

  int item;

  while(TRUE){                          /* TRUE is the constant 1 */
    item = produce_item();              /* generate something to put in the buffer */
    down(&empty);                       /* decrement the empty count */
    down(&mutex);                       /* enter the critical region */
    insert_item(item);                  /* put the new item in the buffer */
    up(&mutex);                         /* leave the critical region */
    up(&full);                          /* increment the count of full slots */
  }
}

void consumer(void){

  int item;

  while(TRUE){                          /* repeat forever */
    down(&full);                        /* decrement the full count */
    down(&mutex);                       /* enter the critical region */
    item = remove_item();               /* take an item from the buffer */
    up(&mutex);                         /* leave the critical region */
    up(&empty);                         /* increment the count of empty slots */
    consume_item(item);                 /* do something with the item */
  }
}

For semaphores to work correctly, it is essential that they be implemented in an indivisible way. The normal approach is to implement up and down as system calls, with the operating system briefly disabling all interrupts while it tests the semaphore, updates it, and puts the process to sleep if necessary. Since these actions take only a few instructions, no harm is done by disabling interrupts. If multiple CPUs are being used, each semaphore should be protected by a lock variable, with the TSL or XCHG instruction used to ensure that only one CPU at a time manipulates the semaphore.

Be sure you understand that using TSL or XCHG to keep several CPUs from accessing the semaphore at the same time is quite different from the producer or consumer busy-waiting for the other to empty or fill the buffer. The semaphore operation takes only a few microseconds, whereas the producer or consumer might wait arbitrarily long.

The solution above uses three semaphores: one called full, counting the buffer slots that are full; one called empty, counting the slots that are empty; and one called mutex, making sure the producer and the consumer do not access the buffer at the same time. Full is initialized to 0, empty to the number of slots in the buffer, and mutex to 1. A semaphore that is initialized to 1 and used by two or more processes to ensure that only one of them can be in its critical region at a time is called a binary semaphore. If each process does a down just before entering its critical region and an up just after leaving it, mutual exclusion is guaranteed.

Now that we have a good interprocess primitive at our disposal, let us look at the sequence of steps by which an interrupt is handled:

  1. Hardware stacks the program counter, etc.

  2. Hardware loads the new program counter from the interrupt vector

  3. Assembly-language procedure saves the registers

  4. Assembly-language procedure sets up a new stack

  5. C interrupt service routine runs (typically reads and buffers input)

  6. Scheduler decides which process is to run next

  7. C procedure returns to the assembly code

  8. Assembly-language procedure starts up the new current process

In a system that uses semaphores, the natural way to hide the details of interrupts is to associate a semaphore, initially set to 0, with each I/O device. Just after starting an I/O device, the managing process does a down on the associated semaphore, blocking immediately. When the interrupt comes in, the interrupt handler does an up on that semaphore, which makes the blocked process ready to run again. In the interrupt-handling steps above, step 5, in which the C interrupt service routine runs, is where the up is done, so that in step 6 the scheduler will be able to run the device driver. Of course, if several processes are ready at that point, the scheduler may choose to run an even more important process next; we will look at scheduling algorithms later.

The code above actually uses semaphores in two different ways, and the difference is important enough to make explicit. The mutex semaphore is used for mutual exclusion: it ensures that only one process at a time reads or writes the buffer and the associated variables. Mutual exclusion is what keeps the processes from getting in each other's way.

The other use of semaphores is for synchronization. The full and empty semaphores guarantee that certain event sequences do or do not occur. In this case, they ensure that the producer stops running when the buffer is full, and that the consumer stops running when it is empty. This use is different from that of mutex.

Mutexes

When the semaphore's ability to count is not needed, a simplified version of it, called a mutex, can be used. Mutexes are good only for managing mutual exclusion to some shared resource or piece of code. Because they are simple and efficient to implement, mutexes are especially useful in thread packages implemented entirely in user space.

A mutex is a shared variable that can be in one of two states: unlocked or locked. Consequently, only one bit is needed to represent it, although in practice an integer is generally used: 0 means unlocked and all other values mean locked.

Two procedures are used with mutexes. When a thread (or process) needs access to a critical region, it calls mutex_lock. If the mutex is currently unlocked (meaning the critical region is available), the call succeeds and the calling thread is free to enter the critical region.

On the other hand, if the mutex is already locked, the calling thread is blocked until the thread in the critical region finishes and calls mutex_unlock. If multiple threads are blocked on the mutex, one of them is chosen at random and allowed to acquire the lock.

Because mutexes are so simple, they can easily be implemented in user space given a TSL or XCHG instruction. The code for mutex_lock and mutex_unlock for a user-level thread package is shown below; with XCHG the essence is the same.

mutex_lock:
        TSL REGISTER,MUTEX      | copy MUTEX to REGISTER and set MUTEX to 1
        CMP REGISTER,#0         | was MUTEX zero?
        JZE ok                  | if it was zero, MUTEX was unlocked, so return
        CALL thread_yield       | MUTEX is busy; schedule another thread
        JMP mutex_lock          | try again
ok:     RET                     | return to caller; critical region entered

mutex_unlock:
        MOVE MUTEX,#0           | store a 0 in MUTEX
        RET                     | return to caller

The mutex_lock code is similar to the enter_region code above.

Do you see the biggest difference in the code above?

  • From our analysis of TSL above, enter_region responds to failure by busy waiting: it loops until it acquires the lock. In mutex_lock, by contrast, if the MUTEX is in use, the thread yields so that another thread can be scheduled. So the biggest difference is what happens after the test of MUTEX/LOCK fails.

  • With (user-level) threads the situation differs because there is no clock interrupt to stop a thread that runs too long. A thread that tried to acquire a lock by busy waiting would loop forever and never acquire it, because the spinning thread prevents any other thread from running and releasing the lock. That is why mutex_lock, when it fails to acquire the lock, calls thread_yield to give up the CPU to another thread. Consequently there is no busy waiting; when this thread next runs, it tests the lock again.

That is the difference between enter_region and mutex_lock. Since thread_yield is just a piece of user-space thread scheduling, it runs very quickly. Consequently, neither mutex_lock nor mutex_unlock requires any kernel calls. Using these procedures, user-level threads can synchronize entirely in user space with only a handful of instructions.

The mutex scheme described above is a bare-minimum set of calls. With software there is always demand for more features, and synchronization primitives are no exception. For example, sometimes a thread package offers a call mutex_trylock that either acquires the lock or returns a code indicating failure, but does not block. This call gives the thread the flexibility to decide what to do next, whether to use an alternative method or just wait.

Futexes

As parallelism increases, efficient synchronization and locking are very important for performance. Spin locks are fast if the wait is short, but they waste CPU cycles if the wait is long. If there is much contention, it is more efficient to block the process and let the kernel unblock it only when the lock is released. Unfortunately, this has the inverse problem: it works well under heavy contention, but continuously switching into the kernel is expensive if there is little contention, and the amount of lock contention is very hard to predict.

An interesting solution that tries to combine the best of both worlds is known as the futex, or fast user space mutex. Sounds interesting, doesn't it?

A futex is a feature of Linux that implements basic locking (much like a mutex) while avoiding dropping into the kernel, whose switching costs are very high; staying out of the kernel improves performance considerably. A futex consists of two parts: a kernel service and a user library. The kernel service provides a wait queue that allows multiple processes to wait on a lock; they will not run unless the kernel explicitly unblocks them.

Putting a process on the wait queue requires an (expensive) system call and should be avoided. In the absence of contention, therefore, the futex works entirely in user space. The processes share a common lock variable, a 32-bit integer that serves as the lock. Suppose the lock is initialized to 1, which we take to mean that the lock is free. A thread grabs the lock by performing an atomic decrement and test (atomic functions in Linux consist of inline assembly wrapped in C functions and are defined in header files). Next, the thread inspects the result to see whether the lock was free. If it was not in the locked state, all is well and our thread has successfully grabbed the lock. However, if the lock is held by another thread, our thread has to wait. In that case the futex library does not spin, but uses a system call to put the thread on the wait queue in the kernel. Now the cost of switching into the kernel is justified, since the thread would be blocked anyway.

When a thread is done with the lock, it releases it with an atomic increment and test, and checks the result to see whether any processes are still blocked on the kernel wait queue. If so, it tells the kernel to unblock one or more of them. If there is no contention, the kernel is not involved at all.
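As a rough illustration (a sketch under simplifying assumptions, not glibc's actual implementation), the Linux futex system call can be combined with C11 atomics to build such a lock. Here 0 means unlocked and 1 means locked:

#include <linux/futex.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <unistd.h>

static atomic_int lock_word = 0;        /* 0 = unlocked, 1 = locked */

void futex_lock(void) {
    int expected = 0;
    /* fast path: the uncontended case never enters the kernel */
    while (!atomic_compare_exchange_strong(&lock_word, &expected, 1)) {
        /* slow path: sleep in the kernel, but only if the word is still 1 */
        syscall(SYS_futex, &lock_word, FUTEX_WAIT, 1, NULL, NULL, 0);
        expected = 0;                   /* retry the compare-and-swap */
    }
}

void futex_unlock(void) {
    atomic_store(&lock_word, 0);
    /* wake one waiter; a production lock tracks contention and skips
       this syscall when nobody is waiting */
    syscall(SYS_futex, &lock_word, FUTEX_WAKE, 1, NULL, NULL, 0);
}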

Mutexes in Pthreads

Pthreads provides a number of functions for synchronizing threads. The basic mechanism is the mutex variable, which can be locked and unlocked to guard each critical region. A thread wishing to enter a critical region first tries to lock the associated mutex. If the mutex is unlocked, the thread can enter immediately, and the lock is atomically set, preventing other threads from entering. If the mutex is already locked, the calling thread is blocked until it is unlocked. If multiple threads are waiting on the same mutex, when it is unlocked only one of them is allowed to continue and relock it. These locks are not mandatory; it is up to the programmer to make sure threads use them correctly.

Here are the principal mutex-related function calls:

  • pthread_mutex_init: create a mutex
  • pthread_mutex_destroy: destroy an existing mutex
  • pthread_mutex_lock: acquire a lock, blocking if it is already held
  • pthread_mutex_trylock: try to acquire a lock, failing instead of blocking
  • pthread_mutex_unlock: release a lock

As expected, mutexes can be created and destroyed with pthread_mutex_init and pthread_mutex_destroy. A mutex can be locked with pthread_mutex_lock, which blocks the caller if the mutex is already locked. There is also a call, pthread_mutex_trylock, that tries to lock a mutex and returns an error code instead of blocking if it is already locked; this call allows a thread to effectively do busy waiting if it needs to. Finally, pthread_mutex_unlock unlocks a mutex and releases exactly one waiting thread, if any are waiting.
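A small usage sketch of the trylock pattern: instead of blocking, the thread does other useful work and retries later.

#include <pthread.h>

pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

void try_update(void) {
    if (pthread_mutex_trylock(&m) == 0) {   /* got the lock immediately */
        /* ... touch the shared data ... */
        pthread_mutex_unlock(&m);
    } else {
        /* lock busy: do something else and try again later */
    }
}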

In addition to mutexes, Pthreads offers a second synchronization mechanism: condition variables. Mutexes are good for allowing or blocking access to a critical region; condition variables allow a thread to block because some condition is not yet met. The two mechanisms are almost always used together. Let us now look more closely at the interaction of threads, mutexes, and condition variables.

Let us revisit the producer-consumer problem: one thread puts things into a buffer and another one takes them out. If the producer discovers that there are no free slots left in the buffer, it has to block until one becomes available. The producer uses a mutex to perform its check atomically, without interference from other threads. But having found the buffer full, the producer needs a way to block itself and be awakened later. That is what condition variables are for.

Here are the most important Pthreads calls related to condition variables:

  • pthread_cond_init: create a condition variable
  • pthread_cond_destroy: destroy a condition variable
  • pthread_cond_wait: block, waiting for a signal
  • pthread_cond_signal: signal another thread and wake it up
  • pthread_cond_broadcast: signal multiple threads and wake them all up

The calls above create and destroy condition variables. The principal operations on them are pthread_cond_wait and pthread_cond_signal. The former blocks the calling thread until some other thread signals it (using the latter call). A blocked thread is usually waiting for the signaling thread to release a resource or complete some activity; only then can the blocked thread continue. Condition variables make the waiting and the blocking atomic. pthread_cond_broadcast is used when multiple blocked threads are all potentially waiting for the same wakeup signal.

Note that condition variables (unlike semaphores) have no memory: they do not accumulate signals for later use. If a signal is sent to a condition variable on which no thread is waiting, the signal is lost.

Here is an example using mutex and condition variables

#include <stdio.h>
#include <pthread.h>

#define MAX 1000000000                          /* how many numbers to produce */
pthread_mutex_t the_mutex;
pthread_cond_t condc, condp;                    /* condition variables */
int buffer = 0;

void *producer(void *ptr){                      /* produce data */

  int i;

  for(i = 0; i <= MAX; i++){
    pthread_mutex_lock(&the_mutex);             /* get exclusive access to buffer */
    while(buffer != 0){
      pthread_cond_wait(&condp, &the_mutex);    /* wait until the buffer is empty */
    }
    buffer = i;                                 /* put item in buffer */
    pthread_cond_signal(&condc);                /* wake up the consumer */
    pthread_mutex_unlock(&the_mutex);           /* release access to buffer */
  }
  pthread_exit(0);

}

void *consumer(void *ptr){                      /* consume data */

  int i;

  for(i = 0; i <= MAX; i++){
    pthread_mutex_lock(&the_mutex);             /* get exclusive access to buffer */
    while(buffer == 0){
      pthread_cond_wait(&condc, &the_mutex);    /* wait until the buffer is full */
    }
    buffer = 0;                                 /* take item out of buffer */
    pthread_cond_signal(&condp);                /* wake up the producer */
    pthread_mutex_unlock(&the_mutex);           /* release access to buffer */
  }
  pthread_exit(0);

}

int main(void){
  pthread_t pro, con;
  pthread_mutex_init(&the_mutex, 0);
  pthread_cond_init(&condc, 0);
  pthread_cond_init(&condp, 0);
  pthread_create(&con, 0, consumer, 0);
  pthread_create(&pro, 0, producer, 0);
  pthread_join(pro, 0);
  pthread_join(con, 0);
  pthread_cond_destroy(&condc);
  pthread_cond_destroy(&condp);
  pthread_mutex_destroy(&the_mutex);
  return 0;
}

Monitors

To make it easier to write correct programs, Brinch Hansen and Hoare proposed a higher-level synchronization primitive called a monitor. Their two proposals differ slightly, as described below. A monitor is a collection of procedures, variables, and data structures that are grouped together in a special kind of module or package. Processes may call the procedures in a monitor whenever they want to, but they cannot directly access the monitor's internal data structures from procedures declared outside the monitor. Below is an abstract sketch of a monitor in a Pascal-like notation; it cannot be expressed in C, because a monitor is a language concept and C does not have monitors.

monitor example
	integer i;
	condition c;

	procedure producer();
	.
	.
	end;

	procedure consumer();
	.
	.
	end;
end monitor;

Monitors have an important property that makes them useful for achieving mutual exclusion: only one process can be active in a monitor at any instant. Monitors are a programming-language construct, so the compiler knows they are special and can handle calls to monitor procedures differently from other procedure calls. Typically, when a process calls a monitor procedure, the first few instructions of the procedure check whether any other process is currently active within the monitor. If so, the calling process is suspended until the other process leaves the monitor; only if no process is using the monitor may the calling process enter.

Implementing the mutual exclusion on monitor entry is the compiler's responsibility, and a common way is to use a mutex or a binary semaphore. Because the compiler, not the programmer, arranges for the mutual exclusion, it is much less likely that something will go wrong. In any event, the person writing the monitor does not have to be aware of how the compiler arranges it; it is enough to know that by turning all the critical regions into monitor procedures, no two processes will ever execute their critical regions at the same time.

Even though monitors provide an easy way to achieve mutual exclusion, in our view that is not enough: we also need a way for processes to block when they cannot proceed. In the producer-consumer problem, it is easy enough to put the tests for buffer-full and buffer-empty in monitor procedures, but how should the producer block when it finds the buffer full?

The solution lies in the introduction of condition variables, along with two operations on them, wait and signal. When a monitor procedure discovers that it cannot continue (for example, the producer finds the buffer full), it does a wait on some condition variable, say, full. This action causes the calling process to block, and it also allows another process that had previously been prohibited from entering the monitor to enter now. (We discussed the implementation details of condition variables in the Pthreads section above.) That other process, for example the consumer, can wake up its sleeping partner by doing a signal on the condition variable.

Brinch Hansen and Hoare differ in what happens after a signal. Hoare proposed letting the newly awakened process run and suspending the other one. Brinch Hansen proposed requiring that a process doing a signal exit the monitor immediately. We will use Brinch Hansen's proposal here because it is conceptually simpler and easier to implement.

If several processes are waiting on a condition variable, then after that condition is signaled the system scheduler selects only one of them to resume running.

By the way, there is also a third option, not proposed by either of the professors above: let the signaler keep running, and allow a waiting process to start only after the signaler has exited the monitor.

A condition variable is not a counter. It does not accumulate signals for later use the way a semaphore does. So if a condition variable is signaled with no one waiting on it, the signal is lost. In other words, the wait must come before the signal.

Below is a solution to the producer-consumer problem implemented with a monitor, in Pascal-like notation.

monitor ProducerConsumer
		condition full, empty;
		integer count;

		procedure insert(item: integer);
		begin
				if count = N then wait(full);
				insert_item(item);
				count := count + 1;
				if count = 1 then signal(empty);
		end;

		function remove: integer;
		begin
				if count = 0 then wait(empty);
				remove := remove_item;
				count := count - 1;
				if count = N - 1 then signal(full);
		end;

		count := 0;
end monitor;

procedure producer;
begin
		while true do
		begin
				item := produce_item;
				ProducerConsumer.insert(item);
		end
end;

procedure consumer;
begin
		while true do
		begin
				item := ProducerConsumer.remove;
				consume_item(item);
		end
end;

The reader may be thinking that wait and signal look like the sleep and wakeup operations mentioned earlier, which had fatal race conditions. They are very similar, but with one crucial difference: sleep and wakeup failed because while one process was trying to go to sleep, the other one was trying to wake it up. With monitors that cannot happen. The automatic mutual exclusion on monitor procedures guarantees that if, say, the producer inside a monitor procedure discovers that the buffer is full, it will be able to complete the wait operation without having to worry that the scheduler might switch to the consumer just before the wait completes. What is more, the consumer will not even be let into the monitor until the wait is finished and the producer is marked as no longer runnable.

Although our notation is an imaginary Pascal-like language, some real programming languages support monitors, notably Java. Java is an object-oriented language that supports user-level threads and allows methods to be grouped into classes. By adding the keyword synchronized to a method declaration, Java guarantees that once any thread has started executing that method, no other thread will be allowed to execute any other synchronized method of that object. Without synchronized, no such interleaving guarantee exists.

Below is the producer-consumer problem solved with monitors in Java.

public class ProducerConsumer {
  static final int N = 100;                     // size of the buffer
  static Producer p = new Producer();           // instantiate a new producer thread
  static Consumer c = new Consumer();           // instantiate a new consumer thread
  static Our_monitor mon = new Our_monitor();   // instantiate a new monitor

  public static void main(String[] args) {
    p.start();                                  // start the producer thread
    c.start();                                  // start the consumer thread
  }

  static class Producer extends Thread {
    public void run() {                         // run contains the thread code
      int item;
      while (true) {                            // producer loop
        item = produce_item();
        mon.insert(item);
      }
    }
    private int produce_item() { ... }          // actually produce
  }

  static class Consumer extends Thread {
    public void run() {                         // run contains the thread code
      int item;
      while (true) {                            // consumer loop
        item = mon.remove();
        consume_item(item);
      }
    }
    private void consume_item(int item) { ... } // actually consume
  }

  static class Our_monitor {                    // this is the monitor
    private int buffer[] = new int[N];
    private int count = 0, lo = 0, hi = 0;      // counter and indices

    private synchronized void insert(int val) {
      if (count == N) {
        go_to_sleep();                          // if the buffer is full, go to sleep
      }
      buffer[hi] = val;                         // insert an item into the buffer
      hi = (hi + 1) % N;                        // slot in which to place the next item
      count = count + 1;                        // one more item in the buffer now
      if (count == 1) {
        notify();                               // if the consumer was sleeping, wake it up
      }
    }

    private synchronized int remove() {
      int val;
      if (count == 0) {
        go_to_sleep();                          // if the buffer is empty, go to sleep
      }
      val = buffer[lo];                         // fetch an item from the buffer
      lo = (lo + 1) % N;                        // slot from which to fetch the next item
      count = count - 1;                        // one fewer item in the buffer
      if (count == N - 1) {
        notify();                               // if the producer was sleeping, wake it up
      }
      return val;
    }

    private void go_to_sleep() {
      try {
        wait();
      } catch (InterruptedException exc) { }
    }
  }
}

Four classes are defined in this program. The outer class, ProducerConsumer, creates and starts two threads, p and c. The second and third classes, Producer and Consumer, contain the producer and consumer code, respectively. Finally, Our_monitor is the monitor: it contains two synchronized methods that are used for inserting items into and removing items from the shared buffer.

As in all our previous examples, the producer and consumer threads are functionally identical: the producer has an infinite loop generating data and putting it into the common buffer, and the consumer has an equally infinite loop taking data out of the buffer and doing some work with it.

The interesting part of this program is the class Our_monitor, which holds the buffer, the administration variables, and the two synchronized methods. When the producer is active inside insert, it knows for sure that the consumer cannot be active inside remove, making it safe to update the variables and the buffer without fear of race conditions. The variable count keeps track of how many items are in the buffer. The variable lo is the index of the buffer slot from which the next item is to be fetched; similarly, hi is the index of the slot where the next item is to be placed. It is permitted for lo and hi to be equal, which means that either 0 or N items are in the buffer.

The synchronization in Java differs from classical monitors in one essential way: Java has no built-in condition variables. Instead, it offers wait and notify, the equivalents of sleep and wakeup, with the difference that when they are used inside synchronized methods they are not subject to race conditions.

By making the mutual exclusion of critical regions automatic, monitors make parallel programming much less error-prone than semaphores do. Still, monitors have drawbacks. As we noted earlier, a monitor is a programming-language concept: the compiler must recognize monitors and arrange for the mutual exclusion somehow. C, Pascal, and most other languages do not have monitors, so it is unreasonable to expect their compilers to enforce any mutual exclusion rules.

Another problem with monitors, shared by semaphores, is that these mechanisms were designed for solving the mutual exclusion problem on one or more CPUs that all have access to a common memory. We can avoid races by placing the semaphores in the shared memory and protecting them with TSL or XCHG instructions. In a distributed system consisting of multiple CPUs, each with its own private memory and connected by a network, these primitives become inapplicable: semaphores are too low level, and monitors are unusable except in a few programming languages. Something else is needed.

Message passing

That something else is message passing. This method of interprocess communication uses two primitives, send and receive, which, like semaphores and unlike monitors, are system calls rather than language-level constructs. For example:

send(destination, &message);

receive(source, &message);

The former sends a message to a given destination, and the latter receives a message from a given source. If no message is available, the receiver can block until one arrives, or it can return immediately with an error code.

Design issues for message-passing systems

Message-passing systems face many problems and design issues that do not arise with semaphores or monitors, especially if the communicating processes are on different machines connected by a network. For example, messages can be lost by the network. To guard against loss, the sender and receiver can agree that as soon as a message is received, the receiver sends back a special acknowledgement message. If the sender has not received the acknowledgement within a certain time interval, it retransmits the message.

Now consider what happens if the message itself is received correctly but the acknowledgement returned to the sender is lost. The sender will retransmit the message, so the receiver will get the same message twice.

It is essential that the receiver be able to distinguish a new message from the retransmission of an old one. Usually this problem is solved by putting consecutive sequence numbers in each original message. If the receiver gets a message bearing the same sequence number as a previous message, it knows the message is a duplicate and can be ignored.

Message systems also have to deal with how processes are named, so that the process specified in a send or receive call is unambiguous. Authentication is an issue as well: how can the client tell that it is communicating with the real file server, and how can the information passing from sender to receiver be protected from tampering by a man in the middle?

Solving the producer-consumer problem with message passing

Now let us see how the producer-consumer problem can be solved with message passing and no shared memory. One solution is shown below.

#define N 100                           /* number of slots in the buffer */

void producer(void){

  int item;
  message m;                            /* message buffer */

  while(TRUE){
    item = produce_item();              /* generate something to put in the buffer */
    receive(consumer, &m);              /* wait for an empty message to arrive */
    build_message(&m, item);            /* construct a message to send */
    send(consumer, &m);                 /* send the item to the consumer */
  }
}

void consumer(void){

  int item, i;
  message m;

  for(i = 0; i < N; i++){               /* loop N times */
    send(producer, &m);                 /* send N empty messages */
  }
  while(TRUE){
    receive(producer, &m);              /* get a message containing an item */
    item = extract_item(&m);            /* extract the item from the message */
    send(producer, &m);                 /* send the empty message back to the producer */
    consume_item(item);                 /* do something with the item */
  }
}

We assume that all messages are the same size and that messages sent but not yet received are buffered automatically by the operating system. This solution uses a total of N messages, analogous to the N slots in a shared-memory buffer. The consumer starts out by sending N empty messages to the producer. Whenever the producer has an item to give to the consumer, it takes an empty message and sends back a full one. In this way, the total number of messages in the system remains constant, so they can be stored in a known amount of memory.

If the producer works faster than the consumer, all the messages will end up full, waiting for the consumer; the producer will then be blocked, waiting for an empty message to come back. If the consumer works faster, the reverse happens: all the messages will be empty, waiting for the producer to fill them, and the consumer will be blocked waiting for a full message.

Many variants are possible with message passing. For starters, let us look at how messages are addressed.

  • One way is to assign each process a unique address and have messages be addressed to processes.
  • A different way is to introduce a new data structure, called a mailbox, which is a place to buffer a certain number of messages, typically specified when the mailbox is created. When mailboxes are used, the address parameters in the send and receive calls are mailboxes, not processes. When a process tries to send to a mailbox that is full, it is suspended until a message is removed from that mailbox, making room for a new one; POSIX message queues, sketched after this list, work in much the same way.
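A minimal sketch using POSIX message queues, which behave much like the mailboxes just described (the queue name and sizes are arbitrary choices; on Linux, link with -lrt):

#include <fcntl.h>
#include <mqueue.h>
#include <stdio.h>

int main(void) {
    struct mq_attr attr = { .mq_maxmsg = 10, .mq_msgsize = 64 };
    mqd_t mq = mq_open("/demo_box", O_CREAT | O_RDWR, 0600, &attr);

    mq_send(mq, "hello", 6, 0);             /* suspends if the mailbox is full */

    char buf[64];
    mq_receive(mq, buf, sizeof(buf), NULL); /* suspends if the mailbox is empty */
    printf("got: %s\n", buf);

    mq_close(mq);
    mq_unlink("/demo_box");
    return 0;
}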

Barriers

Our last synchronization mechanism is intended for groups of processes rather than the two-process producer-consumer type of situation. Some applications are divided into phases and have the rule that no process may proceed into the next phase until all processes are ready to do so. This behavior may be achieved by placing a barrier at the end of each phase. When a process reaches the barrier, it is blocked until all processes have reached it. A barrier can thus be used to synchronize a group of processes, as illustrated below.

Picture four processes approaching the barrier: each is computing and none has yet reached the end of its phase. After a while, processes A, B, and D reach the barrier and are suspended there. They cannot proceed to the next phase because process C has not finished. When C finally reaches the barrier, the whole group is released into the next phase.
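Pthreads provides barriers directly. A minimal sketch with four worker threads (the phase work is a placeholder):

#include <pthread.h>
#include <stdio.h>

#define WORKERS 4
pthread_barrier_t barrier;

void *worker(void *arg) {
    long id = (long)arg;
    /* ... phase 1 computation for this worker ... */
    printf("worker %ld reached the barrier\n", id);
    pthread_barrier_wait(&barrier);  /* block until all WORKERS have arrived */
    /* ... phase 2 starts only after everyone finished phase 1 ... */
    return NULL;
}

int main(void) {
    pthread_t t[WORKERS];
    pthread_barrier_init(&barrier, NULL, WORKERS);
    for (long i = 0; i < WORKERS; i++)
        pthread_create(&t[i], NULL, worker, (void *)i);
    for (int i = 0; i < WORKERS; i++)
        pthread_join(t[i], NULL);
    pthread_barrier_destroy(&barrier);
    return 0;
}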

Avoiding locks: read-copy-update

The fastest lock is no lock at all. The question is whether we can allow concurrent read and write access to a shared data structure without locking. In general, the answer is no. Imagine process A sorting an array of numbers while process B is computing their average: because A moves the numbers around, B may read some values several times and other values not at all.

In certain cases, however, we can allow a writer to update a data structure even though other processes are still using it. The trick is to ensure that each reader reads either the old version of the data or the new one, but never some weird combination of the two. Consider the following tree as an example.

Readers traverse the tree from the root down to the leaves. To add a new node X, we make it "exactly right" before making it visible in the tree: we initialize all the fields of node X, including its child pointers, and only then, with one atomic write, make X a child of A. No read operation will ever see an inconsistent version.

Now suppose we remove nodes B and D. First, we point A's left-child pointer at C. All reads that were in A will follow it to C and never see B or D; that is, they will read only the new version of the data. Likewise, all reads currently in B or D will continue following the original data-structure pointers and will read the old version. Either way everything is consistent, and we never needed to lock anything. The main reason removing B and D works without locking is that read-copy-update (RCU) decouples the removal phase of an update from the reclamation phase: the removed nodes are not freed until it is certain that no reader can still be reading them.
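The publication step can be sketched in portable C11 atomics (an illustration of the idea, not the Linux kernel's actual rcu_* API):

#include <stdatomic.h>
#include <stdlib.h>

struct node {
    int value;
    struct node *_Atomic left;
    struct node *_Atomic right;
};

/* Publish a fully initialized node X as the left child of parent.
   A reader loading parent->left sees either the old child or X,
   never a half-built node. */
void publish_left(struct node *parent, int value, struct node *old_child) {
    struct node *x = malloc(sizeof *x);
    x->value = value;                       /* make X "exactly right" first */
    atomic_store_explicit(&x->left, old_child, memory_order_relaxed);
    atomic_store_explicit(&x->right, NULL, memory_order_relaxed);
    /* one atomic write makes X visible; release orders the init before it */
    atomic_store_explicit(&parent->left, x, memory_order_release);
}

/* A reader's traversal step; the acquire pairs with the release above. */
struct node *read_left(struct node *parent) {
    return atomic_load_explicit(&parent->left, memory_order_acquire);
}

What RCU adds on top of this publication trick is reclamation: the unlinked nodes are freed only after a grace period in which every reader that might still have seen them has finished.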

References:

Modern Operating Systems

Modern Operating Systems, Fourth Edition

www.encyclopedia.com/computing/n…

j00ru.vexillium.org/syscalls/nt…

www.bottomupcs.com/process_hie…

en.wikipedia.org/wiki/Runtim…

en.wikipedia.org/wiki/Execut…