0 Foreword

In the old single-CPU era, a computer could only execute a single program at a point in time. Then came the multitasking stage, in which the computer could seemingly perform multiple tasks or processes at the same time. This was not parallelism in the true sense of the "same point in time": the tasks or processes shared one CPU, and the operating system switched the CPU among them so that each task got a time slice in which to run.

Later came multithreading technology, which enables multiple threads to execute within a single program. A thread's execution can be thought of as a CPU executing the program; when a program runs with multiple threads, it is as if several CPUs were executing the program at the same time.

Multithreading is more challenging than multitasking. Threads execute in parallel within the same program, so they read and write the same memory space concurrently. This is never a problem in a single-threaded program, and some of these errors are also unlikely to occur on a single-CPU machine, because two threads never truly execute in parallel there. However, modern computers come with multi-core CPUs, which means different threads can be executed in true parallel by different CPU cores.

Therefore, in multithreaded, multitasking situations, thread context switching is unavoidable. To understand how it works, you should first be familiar with a few concepts from CPU architecture design.

1 Multi-core, multi-CPU, hyper-threading, multi-threading

1.1 Why Multicore

Multi-core, multi-CPU, and hyper-threading are all concepts from CPU architecture design. Besides the processor core, a modern CPU also contains registers, L1/L2 caches, floating-point and integer arithmetic units and other auxiliary units, and an internal bus. What is the advantage of a multi-core CPU, that is, multiple processor cores on one CPU? Consider running a multithreaded program: since the threads belong to one process, they need to share some variables in memory. If the computer instead had multiple single-core CPUs, the program's threads would have to communicate frequently over the external bus between the CPUs, and the data-inconsistency problems caused by the separate caches of different CPUs would also have to be dealt with. In this scenario the multi-core single-CPU architecture shows a great advantage: communication happens entirely on the internal bus, and the cores share the same cache.
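As a quick aside, a program can observe how many logical processors the machine exposes. A minimal Java sketch, purely illustrative, using the standard Runtime API:

```java
public class CoreCount {
    public static void main(String[] args) {
        // Logical processors visible to the JVM; on a hyper-threaded CPU
        // this counts each hardware thread, not just physical cores.
        int n = Runtime.getRuntime().availableProcessors();
        System.out.println("Logical processors: " + n);
    }
}
```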

1.2 Why Multiple CPUs

Given the benefits of multiple cores, why have multiple CPUs? This is easy to see: if you want to run more programs (processes) and there is only one CPU, regular process context switching is inevitable. Even a multi-core CPU only multiplies the processor cores, while the other components are shared, so multiple processes on a single CPU are bound to context-switch regularly, and the price is high.

1.3 Why Hyperthreading

Hyper-threading is a concept put forward by Intel. Put simply, it lets two threads truly run concurrently on one CPU. That sounds impossible, since a CPU is time-shared, but there is a subtlety: as mentioned above, a CPU contains other components besides the processor core, and executing a piece of code does not keep only the processor core busy. If thread A is using the processor core while thread B is using the cache or some other component, the two threads A and B can execute concurrently; but if both threads need the same component, one has to wait for the other. This concurrency is achieved by adding a coordination assistant unit to the CPU. According to Intel, such a unit increases the die area by about 5% but improves performance by 15% to 30%.

1.4 Why Multithreading

This may be the most classic interview question. Threads within one process can share variables, so communication between threads is cheap, and multithreading makes better use of multi-core CPUs: multithreaded programs tend to run faster on a multi-core CPU than single-threaded ones. Sometimes multithreaded programs perform better even on a single-core CPU, because although multithreading pays for context switches and thread creation and destruction, a single-threaded program blocked on IO cannot make full use of CPU resources. Given the relatively low context overhead of threads and the wide use of thread pools, multithreading is more efficient in many scenarios.
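To make the IO-blocking point concrete, here is a minimal, illustrative Java sketch; the 100 ms sleep stands in for a blocking IO call, and the class and method names are invented for the example. Ten such tasks run sequentially in about one second, while a small thread pool overlaps the waiting:

```java
import java.util.concurrent.*;

public class IoBoundDemo {
    // Simulated blocking IO: the thread just waits, using no CPU.
    static void fakeIo() throws InterruptedException {
        Thread.sleep(100);
    }

    public static void main(String[] args) throws Exception {
        // Sequential: ~10 x 100 ms; the CPU idles the whole time.
        long t0 = System.nanoTime();
        for (int i = 0; i < 10; i++) fakeIo();
        System.out.printf("sequential: %d ms%n", (System.nanoTime() - t0) / 1_000_000);

        // Thread pool: the ten waits overlap, so wall time is ~100 ms.
        ExecutorService pool = Executors.newFixedThreadPool(10);
        long t1 = System.nanoTime();
        for (int i = 0; i < 10; i++) {
            pool.submit(() -> {
                try { fakeIo(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            });
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.printf("pooled:     %d ms%n", (System.nanoTime() - t1) / 1_000_000);
    }
}
```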

1.5 Threads and Processes

A process is the management unit of the operating system, and a thread is the management unit of a process; a process contains at least one thread of execution. Whether a process is single-threaded or multithreaded, each thread has a program counter (recording the next instruction to execute), a set of registers (holding the thread's current working variables), and a stack (recording the execution history, in which each frame holds a procedure that was called but has not yet returned). Although a thread lives inside a process, the two are different concepts and are treated separately: the process is the basic unit of resource allocation by the system, and the thread is the basic unit of CPU scheduling.

A thread is a single sequential flow of control within a process. Multiple threads can run in parallel in one process, each performing a different task. The threads share the process's heap space, while each thread has its own separate stack space (a Java sketch of this follows the list below).

  1. Threads are a finer-grained unit of division than processes; a thread belongs to a process;
  2. A process is the basic unit for owning CPU, memory, and other resources; a thread cannot own these resources independently;
  3. Processes are independent of one another and communicate with difficulty, while threads share a memory area and communicate easily;
  4. Each process has a fixed entry point, execution sequence, and exit; a thread cannot run on its own and is driven by the application from within a process.
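The heap/stack distinction above can be seen directly in Java. In this minimal sketch (names invented for illustration), a static field lives on the shared heap and is visible to both threads, while a local variable lives on each thread's private stack:

```java
public class SharedHeapDemo {
    // Lives on the heap: every thread in the process sees the same field.
    static int sharedCounter = 0;

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            // 'local' lives on each thread's own stack: no sharing, no races.
            int local = 0;
            for (int i = 0; i < 100_000; i++) {
                local++;
                sharedCounter++;   // unsynchronized heap write: a data race
            }
            System.out.println(Thread.currentThread().getName() + " local=" + local);
        };
        Thread a = new Thread(task, "A"), b = new Thread(task, "B");
        a.start(); b.start();
        a.join(); b.join();
        // Each thread's 'local' reliably reaches 100000, but the shared
        // counter is typically below 200000 because increments were lost.
        System.out.println("sharedCounter=" + sharedCounter);
    }
}
```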

2 Context Switch

Supporting multitasking is one of the biggest leaps in the history of CPU design. In computers, multitasking means running two or more programs simultaneously. From a user's point of view this doesn't seem complicated or hard to implement, but it was a great leap forward in computer design. In a multitasking system, the CPU needs to handle all the programs' operations, and as the user switches back and forth between them, it must keep track of where each program left off. Context switching is the process that lets the CPU record and restore the state of the various running programs, enabling it to complete the switch.

Multitasking systems often need to perform several jobs at once. The number of jobs is usually larger than the number of CPUs on the machine, yet a CPU can execute only one task at a time. How do you make the user feel these tasks are running simultaneously? The designers of the operating system cleverly use time-slice round-robin scheduling: the CPU serves each task for a certain amount of time, then saves the current task's state, loads the next task's state, and serves the next task. The saving and reloading of a task's state is the process called context switching. Time-slice round-robin makes it possible to execute multiple tasks on the same CPU.

2.1 Basic Concepts

Context switching (sometimes called process switching or task switching) is when the CPU switches from one process or thread to another.

  1. A process (sometimes called a task) is an instance of a program running.
  2. In Linux, threads are lightweight processes that can run in parallel and share the same address space (an area of memory) and other resources with their parent process (the process that created them).
  3. Context refers to the contents of CPU registers and program counters at a point in time.
  4. Registers are small but fast memory inside the CPU (as opposed to the relatively slow RAM main memory outside it). Registers speed up programs by providing fast access to commonly used values, typically the intermediate results of computations.
  5. A program counter is a special register that indicates where in an instruction sequence the CPU is executing, either at the current instruction or at the next instruction to be executed, depending on the particular system.

Context switching can be thought of as the kernel (the core of the operating system) performing the following activities on the CPU for processes (including threads):

  1. Suspend a process and store its CPU state (context) somewhere in memory;
  2. Retrieve the context of the next process from memory and restore it in the CPU's registers;
  3. Jump to the location indicated by the program counter (that is, to the line of code where the process was interrupted) to resume the process.

2.2 Switching Types

Context switching has different meanings in different situations, as listed in the following table:

| Type of context switch | Description |
| --- | --- |
| Thread switch | A switch between two threads in the same process |
| Process switch | A switch between two processes |
| Mode switch | A switch between user mode and kernel mode within a given thread |
| Address space switch | A switch of the virtual-to-physical memory mapping |

2.3 Switching Procedure

During a context switch, the CPU stops processing the currently running program and saves the exact position of the current program for later execution. In this sense, context switching is a bit like reading several books at the same time: you have to remember the current page number of each book as you switch back and forth. In a program, the "page number" information of the context switch is stored in the process control block (PCB), which is also often referred to as a switchframe. This information stays in memory until the process runs again.

A PCB is usually a contiguous storage area in system memory that holds all the information the operating system needs to describe the state of a process and control its execution. It is what turns a program, which cannot run on its own in a multiprogramming environment, into a basic unit that can run independently, a process able to execute concurrently with other processes.

  1. Save the state of process A (registers and operating system data);
  2. Update the information in process A's PCB, changing its "running" state accordingly;
  3. Put process A's PCB into the queue for its new state (for example, the ready or blocked queue);
  4. Set process B's PCB to the running state and execute process B;
  5. After B's execution (or when B is switched out), take process A's PCB from the queue, restore the context A had when it was switched out, and continue executing A.
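The real thing happens inside the kernel, but a toy Java model can make the bookkeeping above concrete. This is purely illustrative: the PCB class, its fields, and the round-robin loop are invented for the sketch and stand in for the kernel's real structures:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of a process control block: just enough state to "resume".
class PCB {
    final String name;
    int pc = 0;    // saved program counter (here: a loop index)
    long reg = 0;  // a stand-in for saved register contents
    PCB(String name) { this.name = name; }
}

public class ToyScheduler {
    public static void main(String[] args) {
        Deque<PCB> readyQueue = new ArrayDeque<>();
        readyQueue.add(new PCB("A"));
        readyQueue.add(new PCB("B"));

        while (!readyQueue.isEmpty()) {
            PCB current = readyQueue.poll();        // "load" the next context
            int slice = 3;                          // time slice: 3 steps
            while (slice-- > 0 && current.pc < 8) {
                current.reg += current.pc;          // do some "work"
                current.pc++;                       // advance the saved PC
            }
            if (current.pc < 8) {
                readyQueue.add(current);            // save context, requeue
                System.out.println(current.name + " switched out at pc=" + current.pc);
            } else {
                System.out.println(current.name + " finished, reg=" + current.reg);
            }
        }
    }
}
```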

The steps of a thread switch and a process switch also differ. A process context switch involves two steps:

  1. Switch the page directory so as to use the new address space;
  2. Switch the kernel stack and the hardware context;

For Linux, the biggest difference between threads and processes is the address space. A thread switch does not need step 1, while step 2 is required for both process and thread switches. So a process switch obviously costs more. In other words, one of the main differences between a thread context switch and a process context switch is that the virtual memory space stays the same across a thread switch but changes across a process switch. Both kinds of context switch are handled by the operating system kernel, and the most significant performance cost of the switching process is saving and restoring the contents of the registers.

For an executing process, the program counter, registers, and current variable values are held in the CPU's registers, and those registers can be used only by the process currently occupying the CPU. Before a switch, the data of the outgoing process must first be saved, so that when it next gets the CPU it can continue executing sequentially from the last break point instead of going back to the beginning (otherwise it would redo the same work every time it regained the CPU and might never reach its end, since a process can almost never finish all of its work before releasing the CPU). Only then can the data of the process acquiring the CPU be loaded into the CPU registers, so that it continues its remaining work from its own last break point.
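As a rough, unscientific way to feel this cost from Java (an illustrative micro-benchmark; real numbers vary widely by OS, hardware, and JIT warm-up, and the queue's own overhead is included): two threads ping-pong a token through a SynchronousQueue, so each hand-off typically involves at least one thread context switch:

```java
import java.util.concurrent.SynchronousQueue;

public class PingPong {
    public static void main(String[] args) throws InterruptedException {
        final int rounds = 100_000;
        SynchronousQueue<Integer> ping = new SynchronousQueue<>();
        SynchronousQueue<Integer> pong = new SynchronousQueue<>();

        Thread partner = new Thread(() -> {
            try {
                for (int i = 0; i < rounds; i++) {
                    pong.put(ping.take());   // receive the token, send it back
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        partner.start();

        long t0 = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            ping.put(i);
            pong.take();
        }
        long ns = System.nanoTime() - t0;
        partner.join();
        // Each round is two hand-offs; this is only an upper bound on the
        // switch cost, since queue overhead is included in the time.
        System.out.printf("~%d ns per hand-off%n", ns / (rounds * 2L));
    }
}
```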

To make it easier to manage the processes inside the system, the operating system creates a process table entry for each process.

2.4 Viewing Context Switches

In Linux, you can run the vmstat command to check the number of context switches. For example:

vmstat 1 samples statistics once per second; the cs column is the number of context switches. Typically, context switches on an idle system stay under 1500 per second.
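Besides vmstat, on reasonably modern Linux kernels each process exposes its own counters in /proc/[pid]/status, in the voluntary_ctxt_switches and nonvoluntary_ctxt_switches lines (these map onto the voluntary/preemptive distinction discussed later). A small Linux-only Java sketch that reads its own counters:

```java
import java.nio.file.*;
import java.util.List;

public class CtxSwitches {
    public static void main(String[] args) throws Exception {
        // Linux-only: /proc/self/status describes the current process.
        List<String> lines = Files.readAllLines(Paths.get("/proc/self/status"));
        for (String line : lines) {
            // voluntary: the task gave up the CPU (e.g., blocked on IO);
            // nonvoluntary: the scheduler preempted it (time slice used up).
            if (line.startsWith("voluntary_ctxt_switches")
                    || line.startsWith("nonvoluntary_ctxt_switches")) {
                System.out.println(line);
            }
        }
    }
}
```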

3 Causes of Switching

There are three main causes of thread context switches:

  1. Interrupt handling: in interrupt handling, another program "interrupts" the currently running program. When the CPU receives an interrupt request, it performs a context switch between the running program and the program that raised the interrupt. Interrupts are classified into hardware interrupts and software interrupts; software interrupts include threads being suspended due to IO blocking, failure to acquire a resource, or user code.
  2. Multitasking: in a multitasking system, the CPU switches back and forth between different programs, each of which gets a slice of processing time; the CPU performs a context switch between two slices.
  3. User mode switch: on some operating systems, a context switch is also performed when switching between user mode and kernel mode, although this is not strictly required.

For the preemptive operating systems we commonly use, a thread context switch can be triggered by a number of things:

  1. The current task's time slice is used up, and the scheduler moves the CPU on to the next task;
  2. The current task hits IO blocking, so the scheduler suspends it and continues with the next task;
  3. Multiple tasks compete for a lock; the current task fails to grab the lock and is suspended by the scheduler while the next task continues;
  4. User code suspends the current task, freeing up CPU time;
  5. A hardware interrupt occurs.

4 Switching Cost

Context switching has both direct and indirect costs that affect program performance:

  1. Direct cost: the CPU registers must be saved and loaded, the system scheduler's code must execute, the TLB entries must be reloaded, and the CPU pipeline is flushed;
  2. Indirect cost: data shared between the caches of multiple cores must be refreshed. How much this indirect cost affects the application depends on the size of the data the thread's working set operates on.
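One concrete form of this cross-cache cost is false sharing (a term not used above, introduced here only for illustration): two threads writing to data on the same cache line force the line to bounce between cores. A hedged Java 9+ micro-benchmark sketch; exact timings depend entirely on the machine, and the 64-byte line size is an assumption:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class FalseSharingDemo {
    static final VarHandle LONGS = MethodHandles.arrayElementVarHandle(long[].class);
    static final long[] data = new long[32];

    // Two threads each hammer one slot with volatile writes.
    static long run(int i0, int i1) throws InterruptedException {
        Thread t0 = new Thread(() -> {
            for (long n = 0; n < 50_000_000L; n++) LONGS.setVolatile(data, i0, n);
        });
        Thread t1 = new Thread(() -> {
            for (long n = 0; n < 50_000_000L; n++) LONGS.setVolatile(data, i1, n);
        });
        long t = System.nanoTime();
        t0.start(); t1.start();
        t0.join(); t1.join();
        return (System.nanoTime() - t) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        // Adjacent longs: almost certainly share one (assumed 64-byte) line.
        System.out.println("adjacent : " + run(0, 1) + " ms");
        // 16 longs (128 bytes) apart: on separate cache lines.
        System.out.println("far apart: " + run(0, 16) + " ms");
    }
}
```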

5 Reducing Switches

Since context switches cause extra overhead, reducing the number of context switches can improve the performance of multithreaded programs. Context switches fall into two types:

  1. Voluntary (concessional) context switch: the executing thread releases the CPU of its own accord. Its frequency is proportional to the severity of lock contention and can be reduced by easing lock contention.
  2. Preemptive context switch: a thread is forced to give up the CPU because its allocated time slice is used up or a higher-priority thread preempts it. This usually happens when the number of runnable threads exceeds the number of available CPU cores, and it can be mitigated by reducing the number of threads.

So the ways to reduce context switching are lock-free concurrent programming, CAS algorithms, using as few threads as possible, and using coroutines.

  1. Lock-free concurrency: contention between threads causes context switches, so when multiple threads process shared data you can use techniques that avoid locks, such as partitioning the data by the hash of its ID so that different threads handle different segments;
  2. CAS algorithms: Java's atomic classes use CAS to update data without locking (see the sketch after this list);
  3. Minimal threads: avoid creating unnecessary threads; creating a large number of threads for a small number of tasks just leaves most of them waiting;
  4. Coroutines: schedule multiple tasks within a single thread, maintaining the switches between tasks inside that one thread.
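A minimal sketch of the CAS idea using the JDK's AtomicLong (the counter class itself is invented for illustration): compareAndSet retries in user space instead of blocking on a lock, so a failed attempt does not suspend the thread or force a context switch:

```java
import java.util.concurrent.atomic.AtomicLong;

public class CasCounter {
    private final AtomicLong value = new AtomicLong(0);

    // Classic CAS retry loop: read, compute, attempt to swap; retry on conflict.
    public long increment() {
        while (true) {
            long current = value.get();
            long next = current + 1;
            // compareAndSet succeeds only if nobody changed 'value' meanwhile.
            if (value.compareAndSet(current, next)) {
                return next;
            }
            // On failure we simply loop again: no lock, no thread suspension.
        }
    }

    public static void main(String[] args) throws InterruptedException {
        CasCounter c = new CasCounter();
        Runnable r = () -> { for (int i = 0; i < 100_000; i++) c.increment(); };
        Thread a = new Thread(r), b = new Thread(r);
        a.start(); b.start();
        a.join(); b.join();
        System.out.println(c.value.get()); // always 200000, without a lock
    }
}
```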

6 Number of Threads

Setting a reasonable number of threads comes down to two goals: 1. minimize the overhead of thread switching and management; 2. maximize CPU utilization.

For goal 1, the number of threads should be as small as possible, to reduce the cost of switching between and managing them.

For goal 2, there must be enough threads to keep CPU resources fully utilized.

Therefore, when tasks are short, fewer threads are better: with too many threads, the time spent switching and managing them can exceed the time spent actually executing tasks, and efficiency drops.

For time-consuming tasks, first classify them as CPU-bound or IO-bound. For CPU-bound tasks, the number of threads should not be too large; for IO-bound tasks, more threads make better use of the CPU while other threads wait on IO.
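A common rule of thumb, which this article does not itself state and which should be treated as an assumption to measure against your own workload, sizes a pool as threads ≈ cores × (1 + wait time / compute time): CPU-bound work keeps the ratio near zero, so the pool stays near the core count, while IO-heavy work inflates it. A Java sketch:

```java
public class PoolSizing {
    /**
     * Heuristic pool size: cores * (1 + wait/compute).
     * waitMs and computeMs describe a typical task; both are assumptions
     * that must be measured for the actual workload.
     */
    static int poolSize(double waitMs, double computeMs) {
        int cores = Runtime.getRuntime().availableProcessors();
        return (int) Math.max(1, cores * (1 + waitMs / computeMs));
    }

    public static void main(String[] args) {
        // CPU-bound: almost no waiting -> about one thread per core.
        System.out.println("CPU-bound: " + poolSize(0, 10));
        // IO-bound: 90 ms waiting per 10 ms of computing -> ~10x cores.
        System.out.println("IO-bound : " + poolSize(90, 10));
    }
}
```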

High concurrency and low time consumption: fewer threads are recommended, just enough to satisfy the concurrency. Context switches are already frequent, and high concurrency means the CPU is busy, so adding more threads will not win them execution time slices but will increase switching overhead. For example, if the concurrency is 100, the thread pool might be set to 10.

Low concurrency and high time consumption: more threads are recommended, to ensure there are idle threads to accept new tasks. For example, if the concurrency is 10, the thread pool might be set to 20.

High concurrency and high time consumption: 1. analyze the task type; 2. add queuing; 3. increase the number of threads.