Anything that can go wrong will go wrong. — Murphy’s law


Preface

From the previous article, we learned that a process is the operating system's abstraction of a running program. In modern operating systems, processes often need to communicate with other processes. This is the interprocess communication problem, commonly abbreviated IPC (Inter-Process Communication). IPC mainly addresses the following three problems:

  • How one process passes information to another
  • How to ensure that multiple processes do not get in each other's way, i.e., how to manage competition among processes for the same resource
  • How to sequence the execution of interrelated processes, the classic example being the producer-consumer problem

The last two of these three problems apply equally to threads, so the solutions to them presented in this article work for threads as well. In other words, although this article is titled "interprocess communication," the schemes and ideas it introduces also apply to interthread communication. Here are the key terms for process/thread communication covered in this article:

  • Race conditions
  • Shared memory
  • Mutual exclusion
  • Critical regions
  • Busy waiting
  • Spin locks
  • Mutexes
  • Condition variables
  • Priority inversion

Race conditions

Shared data in an operating system includes shared memory, shared files, and shared hardware or software resources. In practical software development, the shared data we encounter is usually shared memory.

When two or more processes read and write shared data and the final result depends on the precise order in which the processes run, we have a race condition. The crux of a race condition is that process B starts using the shared data before process A has finished with it. Programs with race conditions are a hassle to debug: they work fine most of the time, with only occasional unexplained failures. Consider race conditions whenever you hit symptoms like these (the same goes for multithreaded code). Moreover, the true parallelism brought by the growth of multicore hardware makes race conditions ever more common.

Mutual exclusion

Now that we know how race conditions occur, how do we avoid them? The key is to find some way to prohibit multiple processes from reading and writing the shared data at the same time. In other words, what we need is mutual exclusion. The essence of mutual exclusion is exclusivity: while one process is using a shared variable or other shared data, no other process may operate on it. Choosing appropriate primitives to implement mutual exclusion is a major design issue in any operating system.

Critical regions

In operating systems, we call the part of a program that accesses shared memory its critical region (also called a critical section). A critical region is a piece of code that accesses shared data, not a region of memory, so critical-region code must be written with great care.

Avoiding race conditions

We now know about shared memory, race conditions, and critical regions, and that race conditions can be avoided through mutual exclusion between processes. We can therefore conclude that if no two processes are ever in their critical regions at the same time, race conditions are avoided. But that is only a necessary condition; a good solution also satisfies all four of the following conditions:

  • No two processes may be simultaneously inside their critical regions
  • No assumptions may be made about the speed or number of CPUs
  • No process running outside its critical region may block other processes
  • No process should have to wait forever to enter its critical region

From an abstract point of view, the ideal behavior is shown in the figure below: process B cannot enter its own critical region until process A has left it and released the shared memory.

(Figure: mutual exclusion using critical regions)

Mutual exclusion with busy waiting

Busy waiting means continuously testing a variable until some value appears. Essentially, when a process wants to enter its critical region, it checks whether entry is allowed; if not, it waits in place, testing over and over, until it is allowed in. This approach wastes CPU time and can have the unintended consequence of priority inversion, discussed below.

Masking interrupts

Interrupt masking is a busy-waiting mutual exclusion scheme. The idea is that each process disables all interrupts just after entering its critical region and re-enables them just before leaving it. With interrupts masked, clock interrupts are masked too, so no process switch can occur: the CPU switches processes only when a clock interrupt or some other interrupt arrives. Thus, once a process has entered the critical region, it can safely examine and modify the shared memory without fear of another process getting in before it leaves.

The drawbacks of this scheme are also obvious:

  • It is unwise to give user processes the power to mask interrupts.
    • If a process masked interrupts and never turned them on again, no other process could get the CPU; faced with a buggy or malicious program, the whole system could grind to a halt.
  • Masking interrupts does not prevent simultaneous access to shared memory by multiple processors or cores.
    • On a multiprocessor system, disabling interrupts affects only the CPU that executed the disable instruction; the other CPUs continue to run and can still access the shared memory.
    • Likewise, on a multicore system, masking interrupts on one core does not prevent the other cores from interfering with its operations, so interrupt masking is impractical on multicore CPUs.

In summary, masking interrupts is a useful technique within the operating system itself, but it is not a suitable, general mutual exclusion mechanism for user processes: on the one hand, the power to mask interrupts should not be handed to user processes; on the other, interrupt masking cannot solve the problem of multiple CPUs or cores accessing shared memory.

Lock variables

Since masking interrupts cannot solve the race condition problem, a second approach is to look for a software solution. This method uses a single shared (lock) variable, initially 0, indicating that no process is in its critical region. When a process wants to enter its critical region, it first tests the lock. If the lock is 0, the process sets it to 1 and enters the critical region; if the lock is already 1, the process waits until it becomes 0.

This method doesn't help, because it just introduces a new piece of shared memory (the lock variable), and two processes can still set the lock at the same time. Suppose process A reads the lock variable and sees the value 0, and then a clock interrupt occurs just before A can set the lock to 1. The interrupt causes another process, B, to be scheduled while A is suspended. B also reads the lock variable, finds it is 0, and sets it to 1. When A runs again, it picks up where it left off, also setting the lock to 1 and entering the critical region. At this point both A and B are inside the critical region, that is, both are accessing the shared memory at the same time.

Strict alternation

As described above, continuously testing a variable until some value appears is called busy waiting. Busy waiting (usually a while loop testing the variable) wastes CPU time and should generally be avoided; it is appropriate only when there is reason to believe the wait will be very short. A lock that uses busy waiting is called a spin lock, and its performance characteristics follow accordingly.

The idea behind strict alternation is this: suppose two processes, process 0 and process 1, share an integer lock variable turn, initially 0. Process 0 may enter the critical region when turn is 0, and sets turn to 1 before leaving it; process 1 may enter when turn is 1, and sets turn to 0 before leaving. Strict alternation does solve the race condition, but a process can be blocked by a process that is outside its critical region: its progress depends on a process working in its noncritical region. This violates condition 3 above, "No process running outside its critical region may block other processes," so it is not a good solution. Concretely:

  • First, turn is initially 0 and process 0 tries to enter the critical region. It tests turn, finds it is 0, satisfies the entry condition, and enters.
  • Then process 0 leaves the critical region and sets turn to 1; now only process 1 may enter.
  • Finally, process 0 finishes its work in the noncritical region and loops back to the top of its while loop. Since turn is still 1 (process 0 set it), process 0 cannot enter the critical region; it has to wait while process 1, still busy in its own noncritical region, eventually enters and leaves the critical region and sets turn back to 0. In this way process 0 is blocked by process 1, and when process 0 can run again depends entirely on when process 1 visits and leaves the critical region.

If process 0 and process 1 alternated strictly, the problem above would not arise. Strict alternation means the two processes take turns entering the critical region: 0, 1, 0, 1, and so on. But this is clearly unrealistic. The strict alternation sequence is:

  • Turn is 0: process 0 enters the critical region while process 1 busy-waits
  • Process 0 leaves the critical region and sets turn to 1
  • Turn is 1: process 1 enters the critical region while process 0 busy-waits

If the two processes do not alternate strictly, then after process 0 leaves the critical region and sets turn to 1, it cannot enter again until process 1 has entered, left, and set turn back to 0. Whether and when process 1 will next visit its critical region is unpredictable. The result is exactly that "a process running outside the critical region (process 1 here) blocks another process (process 0, which wants to enter)."

Peterson's solution

Peterson's solution is also a software solution. The algorithm consists of two procedures (functions) written in ANSI C (ANSI is short for American National Standards Institute; ANSI C requires function prototypes, i.e., declarations, for defined functions). How it works:

  • turn records which process's turn it is to access the critical region
  • If the other process's entry in the interested array is TRUE and it is our turn to wait, busy-wait; otherwise the critical region may be entered
  • If two processes call enter_region almost simultaneously, both store their own process number into turn, but only the later store survives. Say process 1 stores last, so turn is 1. When the while tests run, process 1's condition turn == process && interested[other] == TRUE holds, so process 1 busy-waits; for process 0, turn (1) does not equal its process number (0), so process 0 falls through the loop immediately (without waiting) and enters the critical region.

The clever part

The trick of this algorithm is to use two variables together to achieve mutual exclusion; neither is dispensable. turn is an integer holding the number of the process whose turn it is to access the critical region. interested is an array indexed by process number, so each process writes only its own element and no two processes ever race on the same element. That is the clever point: it does not matter if multiple processes write turn at the same time, because the corresponding interested entries are consulted afterward to break the tie.

The TSL instruction

The TSL (Test and Set Lock) instruction is a scheme that requires hardware support. It reads the memory word LOCK into register RX and then stores a nonzero value at that memory address. The read and the write are indivisible: no other processor may touch the memory word until the instruction completes. The CPU executing TSL locks the memory bus to prevent other CPUs from accessing memory until the instruction ends. This is how TSL solves the problem that interrupt masking cannot: locking the memory bus, unlike masking interrupts, keeps every other processor away from the memory word. Many multiprocessor computers provide such an instruction. It looks like this:

TSL RX, LOCK

Entry to and exit from the critical region can be implemented with either the TSL instruction or the equivalent XCHG instruction, which atomically exchanges the contents of a register and a memory word; the two routines differ only in the instruction used.

Sleep and wakeup

The basic principle of busy-wait mutual exclusion is this: before entering the critical region, a process checks whether entry is allowed, that is, whether another process is inside. If entry is not allowed, it waits in place, testing again and again, until it may enter. The disadvantage is obvious: busy waiting wastes CPU time, because the process is constantly checking whether it can enter the critical region.

Busy waiting also introduces the priority inversion problem. Priority inversion is the phenomenon in which a high-priority process (or thread, or task) is blocked by a low-priority one.

For example, consider two processes H and L, where H has high priority and L has low priority. The scheduling rule says that H runs whenever it is ready. Now suppose that at some moment L is in its critical region when H changes from blocked to ready, so the scheduler wants to run H. But while H is ready, L is never scheduled, so L can never leave the critical region; and since L has not left, H must keep waiting. The result: H busy-waits forever and L waits forever to be scheduled, a priority inversion that resembles a deadlock.

Because of the CPU cost of busy waiting, we need a non-busy-waiting way to avoid race conditions: block a process that cannot enter its critical region instead of letting it spin. Sleep and wakeup work exactly this way.

Producer-consumer problem

The producer-consumer problem is also known as the bounded-buffer problem. It captures the problem of multiple processes/threads cooperating on a task. Take two processes, one producer and one consumer, sharing a common fixed-size buffer. The producer produces data and puts it into the buffer; the consumer reads data from the buffer and consumes it. Of course, there may be more than one of each; the problem generalizes to m producers and n consumers.

The trouble arises when the buffer is already full of data and the producer wants to put in more. The solution is to let the producer sleep, and have the consumer wake it after removing an item (when the buffer has free space again) so it can continue producing. Symmetrically, if the consumer finds the buffer empty (the number of items is 0), it goes to sleep, and when the producer has raised the item count from 0 to 1, it wakes the consumer.

But race conditions lurk here. To keep track of the buffer, we need an integer variable count recording the number of items in it, with the buffer holding at most N items. When count reaches N, the producer sleeps; when count reaches 0, the consumer sleeps. When count drops from N to N-1, the consumer wakes the producer; when count rises from 0 to 1, the producer wakes the consumer.

Why is there still a race condition? The essence is a wakeup signal sent to a process/thread that is not yet asleep, because access to count is unconstrained. The following can happen: the buffer is empty (count == 0); the consumer reads count as 0 and prepares to sleep, but before it actually sleeps the scheduler decides to suspend it, so the consumer is suspended but not yet logically asleep. The producer now produces an item and puts it into the shared buffer, raising count from 0 to 1. Believing that the consumer, having just read count as 0, must be asleep, the producer calls wakeup. But because the consumer is suspended rather than asleep, the signal is lost. The next time the consumer runs, it acts on the 0 it read earlier, concludes the buffer is empty (when in fact it is not), and goes to sleep. Now the consumer will never be woken; the producer keeps producing until the buffer fills, and then it too goes to sleep. Both processes sleep forever.

#define N 100                       /* number of slots in the buffer */
int count = 0;                      /* number of items in the buffer */

void producer(void)
{
    int item;
    while (TRUE) {
        item = produce_item();
        if (count == N)
            sleep();                /* buffer full: go to sleep */
        insert_item(item);
        count = count + 1;
        if (count == 1)
            wakeup(consumer);       /* buffer was empty: wake the consumer */
    }
}

void consumer(void)
{
    int item;
    while (TRUE) {
        if (count == 0)
            sleep();                /* buffer empty: go to sleep */
        item = remove_item();
        count = count - 1;
        if (count == N - 1)
            wakeup(producer);       /* buffer was full: wake the producer */
        consume_item(item);
    }
}

Semaphores

As we saw above, the root cause of the race condition in the producer-consumer problem is that the wakeup the producer sends to the consumer is lost. The lost-wakeup problem can be solved with semaphores. A semaphore is an integer variable that counts the number of wakeups saved for future use: a value greater than 0 means one or more wakeups are pending; 0 means no wakeup is pending. (I have written before about iOS semaphores; underneath them lies the operating system's semaphore mechanism.)

A semaphore is an integer variable on which two operations are defined: down and up.

  • Down: decrements the semaphore by one. A down on a semaphore checks whether its value is greater than 0; if so, the value is decremented and the process/thread continues with its task. If the value is 0, it is not decremented; instead the process/thread is put to sleep, waiting for the value to become greater than 0.
  • Up: increments the semaphore by one. If processes/threads are sleeping on the semaphore (waiting to complete an earlier down), the system picks one of them after the increment. That process/thread then completes its down, decrementing the semaphore again, so the net value does not change, but there is one fewer sleeper on the semaphore. In other words, for a semaphore with sleepers, an up is immediately followed by a down: the value momentarily becomes 1, a waiter is chosen, it decrements the value back to 0, stops sleeping, and continues executing.

Solving the producer-consumer problem with semaphores

To solve the producer-consumer problem with semaphores, down and up must be atomic: the operating system briefly masks all interrupts while it tests the semaphore, updates it, and, if necessary (when the semaphore is 0), puts the process to sleep.

The solution uses three semaphores:

  • full: counts the filled buffer slots, i.e., the number of buffered items. Its initial value is 0, since nothing has been produced yet.
  • empty: counts the empty buffer slots, i.e., the number of free slots in the buffer. Its initial value is the total number of slots, since the buffer starts empty.
  • mutex: a mutual exclusion semaphore ensuring that the producer and the consumer never access the buffer at the same time. Its initial value is 1. A semaphore restricted to two states like this is also called a binary semaphore.
#define N 100                       /* number of slots in the buffer */
typedef int semaphore;
semaphore mutex = 1;                /* controls access to the critical region */
semaphore empty = N;                /* counts empty buffer slots */
semaphore full = 0;                 /* counts full buffer slots */

void producer(void)
{
    int item;
    while (TRUE) {
        item = produce_item();
        down(&empty);               /* decrement empty count, maybe sleep */
        down(&mutex);               /* enter critical region */
        insert_item(item);
        up(&mutex);                 /* leave critical region */
        up(&full);                  /* one more full slot */
    }
}

void consumer(void)
{
    int item;
    while (TRUE) {
        down(&full);                /* decrement full count, maybe sleep */
        down(&mutex);               /* enter critical region */
        item = remove_item();
        up(&mutex);                 /* leave critical region */
        up(&empty);                 /* one more empty slot */
        consume_item(item);
    }
}

The semaphores empty and full guarantee that certain sequences of events do or do not occur: here they ensure the producer stops running when the buffer is full and the consumer stops running when the buffer is empty.

Another use of semaphores is synchronizing the execution of processes/threads, for example forcing asynchronous network requests to complete in sequence. See my earlier article on using iOS semaphores to execute asynchronous tasks synchronously.

Mutexes

A mutex can be thought of as a simplified semaphore: if you do not need the semaphore's counting ability, a mutex will do. A mutex is an integer variable with only two states, which is why it is also called a binary semaphore. Mutexes are good at managing a shared resource or a small piece of code, that is, at allowing or blocking access to a critical region. They typically solve the multithreaded race condition by admitting only one thread into the critical region at a time, so threads access it serially. Because they are simple and efficient to implement, mutexes are especially useful in user-space thread packages.

Like a semaphore, a mutex is an integer variable, with two states: unlocked and locked. A single bit would suffice to represent it: 0 means unlocked, any nonzero value means locked. A mutex comes with two procedures (functions): mutex_lock, which locks it, and mutex_unlock, which unlocks it.

How a mutex works: when a thread wants to access a critical region, it first calls mutex_lock. If the mutex is currently unlocked (0), no thread is inside and the critical region is available, so the calling thread locks the mutex and enters. Conversely, if the mutex is locked (nonzero), some thread is currently inside and the critical region is unavailable; the calling thread blocks until the thread inside finishes and calls mutex_unlock. If multiple threads are blocked on the mutex (that is, several threads want the same critical region), one of them is chosen at random and allowed to acquire the lock.

Because mutexes are so simple, they are easy to implement in user space when a TSL or XCHG instruction is available. As follows:

mutex_lock:
    TSL REGISTER, MUTEX     | copy mutex to register and set mutex to 1
    CMP REGISTER, #0        | was mutex zero (unlocked)?
    JZE ok                  | if it was zero, the lock is ours
    CALL thread_yield       | mutex is busy; schedule another thread
    JMP mutex_lock          | try again later
ok: RET                     | return to caller; critical region entered

mutex_unlock:
    MOVE MUTEX, #0          | store a 0 in mutex (unlock)
    RET                     | return to caller

Mutexes in pthreads

pthreads provides a number of functions for synchronizing threads. As follows:

Thread call               Description
pthread_mutex_init        Create a mutex
pthread_mutex_destroy     Destroy an existing mutex
pthread_mutex_lock        Acquire a lock or block
pthread_mutex_unlock      Release a lock
pthread_mutex_trylock     Acquire a lock or fail

Mutexes are good at allowing or blocking access to a critical region. A condition variable, by contrast, allows a thread to block because some condition has not yet been met. A lock built on a mutex is a mutex lock; one built on a condition variable is a condition lock. Condition variables and mutexes are almost always used together: a thread locks a mutex so it can operate on the critical region (say, a shared buffer) without interference from other threads, then waits on a condition variable when it cannot make progress, until another thread signals it to continue.

Monitors

With semaphores and mutexes introduced, the race condition problem of interprocess communication is solved. But that does not make interprocess communication easy: developers must be very careful about when to lock and unlock and in what order, the complexity of programming rises sharply, and a mistake can have serious consequences, such as deadlock.

To make it easier to write correct programs, Brinch Hansen (1973) proposed a higher-level synchronization primitive called a monitor. A monitor is a collection of procedures, variables, and data structures; it sounds rather like a class or object in object-oriented programming.

A process may call the procedures in a monitor whenever it needs to, but procedures declared outside the monitor cannot directly access the monitor's internal data structures. In other words, a process can only call a monitor's procedures, not touch its variables.

A monitor is not a system call but a programming-language concept: it is part of the language itself. Only some programming languages support monitors; C does not.

Message passing

Semaphores and monitors solve mutual exclusion for processes/threads accessing shared memory (critical regions), even across multiple CPUs that share that memory. But in a distributed system with multiple CPUs, each with its own private memory and connected by a local area network, there is no shared memory, and primitives such as semaphores and monitors no longer help: they cannot exchange information between machines. A different interprocess communication primitive is needed: message passing.

Message passing uses two communication primitives: send and receive. Like up and down on semaphores, they are system calls rather than language constructs, so they can easily be put into library procedures, for example:

  1. send(destination, &message)
  2. receive(source, &message)

These two calls send a message to a destination process and receive a message from a source process, respectively.

Barriers

Semaphores, mutexes, and monitors synchronize processes and threads that share memory, and message passing, introduced above, extends synchronization and mutual exclusion across machines. The barrier, described next, is also a process/thread synchronization mechanism, but one intended for a whole group of processes/threads. Simply put: suppose four processes form a related group; when one of them reaches the barrier, it is suspended until every other process in the group has also reached the barrier.

Barriers are widely used in multithreading too. For example, if three asynchronous network requests run on background threads and the next task may start only after all three have returned, a barrier is a natural fit.

Q&A

Why are operating systems written in C? The lowest layers of an operating system are written in C or a C-like language (such as C++); languages like Java, Modula 3, and Pascal are rarely used. C is powerful, efficient, predictable, and expressive. With Java, behavior is less predictable: the program may run out of memory at a critical moment, or the garbage collector may kick in to reclaim memory at the worst possible time. In my summary, the reasons C serves as the de facto language of operating systems mainly come down to portability, runtime efficiency, memory management, and the like:

  • A language with high maturity.

    • An operating system must be developed in a mature language. Maturity implies: 1) the language itself is relatively complete and conforms to an international standard; 2) the existing tool and library ecosystem is rich; 3) there is a large, active community.
  • Preferably an open-source language.

    • An open-source language spares vendors copyright problems. It also tends to attract more community contributors, which in turn raises the maturity of both the language and its ecosystem.
  • Must be a cross-platform language.

    • Some operating system kernels need to be portable, to Linux, Unix, macOS, Windows, and so on, so a cross-platform language is preferred.
  • Must be a compiled language: code is compiled before it runs.

    • On the one hand, syntax errors and many other problems can be caught by checks at compile time rather than surfacing at run time, which would be disastrous for an operating system.
    • On the other hand, runtime efficiency is assured. A compiled language is translated into lower-level assembly or machine code that runs directly, with no interpreter at run time, so it runs faster than interpreted languages.
  • Memory management must be explicit.

    • An operating system's memory management must be very strict, which usually requires the developers writing it to allocate and free memory by hand. A garbage-collected language such as Java clearly cannot do this: if memory reclamation is left to the language, memory may fill up before it is reclaimed, or be reclaimed while still in use. So it makes sense to leave memory management to the developers.
  • Strike a balance between development time and software efficiency.

    • Some say that developing a system in C is more troublesome than in other languages, but C's efficiency in execution and memory management cannot be ignored. On the one hand, an operating system is the lowest layer of code, so its efficiency and stability matter enormously. On the other hand, an OS kernel iterates far more slowly than the application software above it; kernels change little over years or even decades, so the supposed "trouble" of using C is, for an operating system, trouble taken only once.