0. Background

Our team has been interviewing lately, and the interviewer loves to ask candidates "What is the difference between a process and a thread?" When he turned the question on me, I rattled off the stock answer: a process is the smallest unit of resource allocation, a thread is the smallest unit of CPU scheduling, blah blah blah. He smiled and shook his head, said my understanding was too shallow, and left me thoroughly confused.

So, to settle the argument, we need to get processes and threads straight, so that at least next time we can hold our own. This post answers the process-versus-thread question from a resource point of view; more complex topics such as scheduling, parallelism, and usage scenarios will be covered later.

1. Von Neumann structure

The Von Neumann architecture is a computer design concept in which program instructions and data share the same memory.

The main characteristics of a von Neumann architecture computer are:

  1. The arithmetic unit is at the center
  2. The stored-program principle is adopted
  3. Memory is a linearly addressed space accessed by address
  4. Control flow is generated by the instruction stream
  5. Instructions consist of opcodes and address codes
  6. Data is encoded in binary

Separating the CPU from storage is not perfect and leads to the so-called von Neumann bottleneck: the bandwidth between the CPU and memory is quite small compared with the capacity of the memory, and very small compared with the rate at which a modern CPU can work. In some cases (when the CPU needs to run simple instructions over huge amounts of data), this data traffic severely limits overall efficiency.

The CPU sits idle while data moves into or out of memory. Because CPU speeds have grown much faster than memory read/write speeds, the bottleneck keeps getting worse.

2. CPU, cache, and memory speeds

(1) How fast is the CPU

CPU frequency concept:

In simple terms

It is the number of clock pulses the CPU generates per second. The clock cycle is the smallest unit of time in which the CPU executes instructions; the more pulses per second, the more instructions the CPU can execute.

In detail

Inside the complex digital system that is a CPU, to ensure that all internal hardware units can work together quickly, CPU architects design a clock signal to synchronize the whole system. The clock signal is a continuous square wave of pulses with a fixed voltage amplitude and time interval, oscillating periodically between 0 and 1.

The time interval between one pulse and the next is called the period, expressed in seconds (s). The number of pulses per second is called the frequency, whose basic unit is the hertz (Hz).

Clock frequency (f) and period (T) are reciprocal: f=1/T

The formula shows that the frequency is the number of clock cycles per second. Today's CPUs generally run in the GHz range; a 1 GHz clock generates one billion pulses per second.

For example

Take the Intel Core i3-8350K as an example. Its base frequency is 4 GHz, which means its internal clock generates 4 billion pulses per second; in other words, each clock cycle lasts only 0.25 ns. That is astonishing: the CPU's internal circuitry is precise enough to handle such short signals while keeping the whole system running in coordinated order. No wonder people call the CPU a crystallization of human ingenuity that has greatly advanced our technology.

The clock cycle is the minimum unit of time for CPU operations; all internal operations are based on it. Generally, the CPU uses the rising edge of the clock pulse as the reference point for executing instructions: the higher the frequency, the more instructions the CPU executes and the faster it works.

Take a closer look at how fast CPUs are today (source: Wikipedia).

Computing speed is an important index of computer performance. It is commonly expressed as the average execution rate of single-word-length fixed-point instructions, measured in MIPS (Million Instructions Per Second): the number of machine-language instructions processed per second. For example, an Intel 80386 can process 3 to 5 million machine-language instructions per second, so we say the 80386 is a 3-5 MIPS CPU. MIPS is only one measure of CPU performance. Microcomputers are more often described by their clock frequency: the higher the frequency, the faster the machine.

Millions of instructions per second (MIPS)

Processor / System              Dhrystone MIPS                Year
Intel Core i7 4770K             133,740 MIPS at 3.9 GHz       2013
Intel Core i7 5960X             298,190 MIPS at 3.5 GHz       2014
Raspberry Pi 2                  4,744 MIPS at 1.0 GHz         2014
Intel Core i7 6950X             320,440 MIPS at 3.5 GHz       2016
ARM Cortex A73 (4-core)         71,120 MIPS at 2.8 GHz        2016
AMD Ryzen 7 1800X               304,510 MIPS at 3.7 GHz       2017
Intel Core i7-8086K             221,720 MIPS at 5.0 GHz       2018
Intel Core i9-9900K             412,090 MIPS at 4.7 GHz       2018
AMD Ryzen 9 3950X               749,070 MIPS at 4.6 GHz       2019
AMD Ryzen Threadripper 3990X    2,356,230 MIPS at 4.35 GHz    2020

AMD's latest Threadripper, the 3990X, reaches 2,356,230 MIPS. Thanks to instruction-level parallelism and instruction pipelining, instruction execution speed has reached a terrifying level.

A simple calculation shows that the 3990X can execute roughly 2,356 instructions per nanosecond (aggregated across all of its cores):

2,356,230 × 10^6 / 10^9 ≈ 2356 instructions/ns

(2) How fast is the cache

What is the cache

Wiki: the CPU cache is a component used to reduce the average time the processor takes to access memory. It is the second layer from the top of the pyramid-shaped storage hierarchy, just below the CPU registers. Its capacity is far smaller than memory, but its speed is close to the processor frequency. When the processor issues a memory access request, it first checks whether the requested data is in the cache. If it is (a hit), the data is returned without accessing memory; if not, the data is first loaded from memory into the cache and then returned to the processor.

How fast is the cache

Source: blinkfox.github.io/2018/11/18/…

From CPU to              Approx. CPU cycles     Approx. time
Register                 1 cycle                —
L1 cache                 ~3-4 cycles            ~0.5-1 ns
L2 cache                 ~10-20 cycles          ~3-7 ns
L3 cache                 ~40-45 cycles          ~15 ns
Cross-socket transfer    —                      ~20 ns
Memory                   ~120-240 cycles        ~60-120 ns

Why is the cache so fast

  1. Better materials and tiny size: fast SRAM cells sit physically close to the CPU, so electrical signals travel a very short distance
  2. Three levels of cache with tiered storage keep the hit ratio high
  3. Smart caching strategies

(3) How fast is the memory

Ordinary PC memory (DDR3/DDR4) runs at clock rates in the hundreds-of-MHz to GHz range, but a full memory access takes on the order of 100 ns, which is two to three orders of magnitude longer than a single CPU cycle.

(4) Summary

The point I am driving at should be obvious by now.

The facts:

  1. In today's hardware environment, the CPU processes data orders of magnitude faster than memory can supply it;
  2. Large amounts of data and instructions live in memory;

Corollary:

If the CPU synchronously waits to fetch data from memory (or from an even slower external device, such as a disk), it sits idle for a long time, wasting CPU capacity; the overall computational bottleneck becomes memory speed.

Direction:

  1. Speed up memory — already in progress, but cost matters: memory is by far the most cost-effective storage medium
  2. Add caches — already done, but memory access time still cannot be ignored
  3. Execute instructions concurrently; run other work while waiting for data — this is where processes and threads come in

3. Get down to business: processes and threads

(1) Processes

I. What is a process

WIKI

Wiki:zh.wikipedia.org/wiki/%E8%A1…

A process is a program that is already running on a computer. Processes were the basic operating unit of early time-sharing systems. In process-oriented systems (such as early UNIX, and Linux 2.4 and earlier), the process is the basic execution entity of a program; in thread-oriented systems (most modern operating systems, Linux 2.6 and later), the process itself is not the basic unit of execution but the container for threads.

Operating system virtualization/abstraction

The abstraction that the operating system provides for a running program is called a process. We can understand it this way: a process is not merely an abstract logical concept, but the operating system's generalization of a running program, including the operating-system resources it needs, and it changes dynamically as the program runs.

II. Resources owned by the process

To put it simply, a process contains the following four resources, or four types of information:

  1. A copy of the code (binary executable machine code)
  2. A block of memory (containing the executable code, heap, stack, and call stack)
  3. System access rights (file access, memory access)
  4. Processor state

In detail:

  • A memory image of the program's executable machine code.
  • Allocated storage (usually a virtual memory area), containing the executable code, process-specific data (inputs and outputs), the call stack, and the heap (which holds data generated at run time).
  • Operating-system descriptors for resources assigned to the process, such as file descriptors (Unix) or file handles (Windows), data sources, and data sinks.
  • Security attributes, such as the process owner and the process's permission set (allowed operations).
  • Processor state (context), such as register contents and physical memory addressing. The state is held in registers while the process is running, and in memory otherwise.

(To add my personal understanding: the most critical things for a process are CPU computing resources and the executable machine code; everything else is secondary. After all, the heart of a computer is executing machine code. Concepts such as permissions, memory, and state were introduced later to improve machine efficiency and reuse. On the earliest machines, people worked out the logic together with the machine; at its core, a computer is just computing units plus computing logic.)

III. Process status

The state of a process depends on two key factors: whether it is currently scheduled on the CPU, and whether it is blocked.

  1. A process that is using the CPU is in the running state: its instructions are being executed by the CPU.
  2. When it initiates I/O (such as reading a disk), it moves to the blocked state, waiting for that event to complete.
  3. When the I/O finishes, it queues up for the CPU again and enters the ready state: waiting in line for CPU time.

But why do we need these states? To improve CPU efficiency, of course (think back to the first part of this article). If the machine only ever ran one process, scheduling would not be worth doing, let alone tracking states.

Two key APIs

  1. fork()

    Creates a child process that is a near-identical copy of the calling (parent) process, although the child has its own address space, registers, and program counter.

  2. exec()

    If the operating system only provided fork(), the whole system could run only one program and countless copies of it, which is obviously unreasonable.

    Calling exec() loads code and static data from an executable file and overwrites the caller's own code segment (and static data) with them; the heap, stack, and other memory areas are reinitialized, and the operating system then runs the new program.

IV. Kernel mode and user mode

The concept of a process was invented for multi-process systems. All the code we write runs as user processes, but fork/exec and file reads/writes are performed by the operating system (which, after all, is itself a program), because user code lacks the permissions to do them. This raises two interesting questions:

  1. What exactly is the operating system?

    The operating system is not a process (I used to imagine it as a process with super-privileges, painstakingly playing butler), but a pile of code sitting in memory waiting to be invoked. It is invoked by interrupts or exceptions.

    While a user process runs on the CPU, the operating system is not involved at all. When an interrupt occurs — say the clock tick fires (an internal interrupt) — the operating system's scheduling function is invoked: it checks whether tasks are waiting in the queue, runs the scheduling logic, and when its work is done hands the CPU back to the scheduled program. Or an I/O event (such as a keyboard keystroke) triggers an external interrupt, and the keyboard-input interrupt handler — also operating-system code — runs.

    The operating system is a collection of functions tied to (triggered by) hardware and system-level events, isolated from user processes. It is not a process, but when triggered it behaves like one — a special one, with trigger events and permissions that ordinary user processes lack.

  2. Why can an operating-system call be invoked just like a function I wrote myself?

    Because a system call is, at bottom, a procedure call — one that traps into the kernel. Interrupts and system calls are dispatched to the correct kernel routine through a trap table set up at boot.

With all of this in hand, going back to user mode and kernel mode, the logic is already in place.

Kernel/user mode (Wiki)

In processor storage protection, kernel mode — also called privileged mode (as opposed to user mode) — is the mode in which the operating system kernel runs. Code running in this mode has unrestricted access to system memory and external devices.

Microkernel operating systems try to minimize the amount of code running in a privileged state for security and elegance.

The x86 architecture typically provides four privilege levels. The highest, Ring 0, is known as kernel mode; the lowest, Ring 3, is usually treated as user mode. Rings 1 and 2 are rarely used.

V. Context switch

The essential operation in a process switch is the context switch: the outgoing context is saved so that, when the process is scheduled again, its execution scene can be restored exactly as if no switch had happened; the incoming context is restored to give the next process a live execution environment.

Question 1: What is the context

The general-purpose registers, program counter, and kernel stack pointer of the current process.

Question 2: How does the switch happen

Process A executes → clock interrupt fires → the hardware implicitly saves A's registers onto A's kernel stack → the OS interrupt handler calls switch(): the OS explicitly saves A's registers into A's process structure and restores B's registers from B's process structure (the explicit context switch) → the hardware implicitly restores registers from B's kernel stack → the program counter is set to B's next instruction → process B executes.

So registers are saved in two ways: an implicit save performed by the hardware (onto the kernel stack) and an explicit save performed by the operating system (during switch()). Every process also has two stacks: the user stack lives in the user address space and the kernel stack in kernel space. While a process executes in user space it uses the user stack, and the CPU's stack-pointer register holds the user-stack address; while it executes in kernel space, the stack pointer holds the kernel-stack address.

VI. Summary

  1. The operating system wants to make full use of the CPU, but I/O and network operations take far too long; with only one program running, CPU utilization simply cannot be raised, so the system runs many processes.

  2. Because there is more than one process in the system, resources must be isolated — otherwise everything descends into chaos — so each process gets its own set of the resources it needs to run;

  3. There is more than one process in the system, but a CPU can run only one process at a time, so what are the others doing? Each process therefore needs a well-defined scheduling state: running, ready, or blocked;

  4. When a process finishes, or starts waiting for I/O, the next process takes the stage and is loaded onto the CPU.

  5. Through this set of logic, operating systems implement an important concept: concurrency — the illusion that the computer is running multiple programs simultaneously.

(2) threads

Once you understand what a process is, understanding threads is a piece of cake; we have already laid enough groundwork.

I. Why threads

If processes already improve CPU utilization, why do we need threads?

The possible answers are:

  1. We finally have multi-CPU, multi-core hardware. Multiple cores make true parallelism possible, and if one process wants to use all 4 cores, it must be multithreaded.

  2. Processes are too heavy: they take up many system resources, and creating and switching processes is expensive. Some scenarios do not need separate address spaces, only separate flows of control.

(For purely CPU-bound work, multiple threads on a single-core CPU do not improve throughput either.)

II. What is a thread

WIKI

A thread is the smallest unit of execution that an operating system can schedule. In most cases it is contained within a process and is the actual unit that carries out the process's work. A thread is a single sequential flow of control within a process; a process can have multiple concurrent threads, each performing a different task. In UNIX System V and SunOS, threads are also called lightweight processes, though that term more commonly refers to kernel threads, with "thread" reserved for user threads.

Threads are the basic unit of independent scheduling and dispatch. They may be kernel threads, scheduled by the operating-system kernel (such as Win32 threads); user threads, scheduled by a user process (such as POSIX threads on Linux); or a hybrid scheduled by both kernel and user processes (such as Windows 7 threads).

Multiple threads in the same process share all of that process's system resources, such as the virtual address space, file descriptors, and signal handling. But each thread in the process has its own call stack, its own register context, and its own thread-local storage.

A process can have many threads, each performing a different task in parallel. If a process has a lot of work that calls for many threads across many cores, the benefit of multithreaded programming on a multi-core, multi-CPU, or hyper-threading machine is obvious: higher program throughput. Think of it in terms of people working: cores are like people, and the more people you have, the more things get done at the same time. Even on a single-core, single-CPU machine, multithreading lets you separate the parts of a process that frequently block (I/O, human-computer interaction) from the compute-intensive parts, running the heavy computation in dedicated worker threads. That is not as good as true multi-core parallelism, but it still improves the program's execution efficiency.

In plain words

A thread is a flow of control within a process.

III. Resources owned by a thread

The rule of thumb: any resource that would make threads interfere with one another if shared is made exclusive to each thread; every other resource is shared by all threads in the process.

1. Shared resources

  • Address space
  • The global variable
  • The program code
  • The heap
  • File descriptor
  • …

2. Exclusive (per-thread) resources

  • Program counter (each thread executes different code, so entry points differ)
  • Registers (each thread's CPU context differs)
  • Stack (local variables, parameters, and return values differ per thread)
  • Status word (different threads are in different states)

IV. Thread status

Frankly, in Linux both processes and threads are represented by a task_struct, so their states are identical.

 #define TASK_RUNNING            0
 #define TASK_INTERRUPTIBLE      1
 #define TASK_UNINTERRUPTIBLE    2
 #define __TASK_STOPPED          4
 #define __TASK_TRACED           8

If you don't feel like digging through the kernel source for the full task_struct, here is a simply annotated version.

struct task_struct {
    volatile long state;  // Specifies whether the process can be executed or broken
    unsigned long flags;  // Process flags, set when fork() is called
    int sigpending;    // Whether there is a signal waiting to be processed on the process
    mm_segment_t addr_limit; // Process address space, which distinguishes kernel process from normal process in memory location
                            //0-0xBFFFFFFF for user-thread
                            //0-0xFFFFFFFF for kernel-thread
    // Scheduling flag, indicating whether the process needs to be rescheduled. If it is not 0, scheduling will occur when the process returns from kernel state to user state
    volatile long need_resched;
    int lock_depth;  // Lock depth
    long nice;       // The basic time slice of the process
    // Scheduling policy: SCHED_FIFO and SCHED_RR for real-time processes, SCHED_OTHER for ordinary ones
    unsigned long policy;
    struct mm_struct *mm; // Process memory management information
    int processor;
    // CPUS_runnable is 0 if the process is not running on any CPU, otherwise 1. This value is updated when the run queue is locked
    unsigned long cpus_runnable, cpus_allowed;
    struct list_head run_list; // A pointer to the run queue
    unsigned long sleep_time;  // The sleep time of the process
    // It is used to connect all the processes in the system into a bidirectional circular linked list, whose root is init_task
    struct task_struct *next_task, *prev_task;
    struct mm_struct *active_mm;
    struct list_head local_pages;       // Point to a local page
    unsigned int allocation_order, nr_local_pages;
    struct linux_binfmt *binfmt;  // The format of the executable that the process is running
    int exit_code, exit_signal;
    int pdeath_signal;     // The signal that the parent sends to the child when it terminates
    unsigned long personality;
    //Linux can run iBCS2-compliant programs generated by other UNIX operating systems
    int did_exec:1; 
    pid_t pid;    // Process identifier, used to represent a process
    pid_t pgrp;   // Process group id, which indicates the process group to which the process belongs
    pid_t tty_old_pgrp;  // Identifies the group where the process control terminal resides
    pid_t session;  // The session id of the process
    pid_t tgid;
    int leader;     // Indicates whether the process is the session manager
    struct task_struct *p_opptr, *p_pptr, *p_cptr, *p_ysptr, *p_osptr;
    struct list_head thread_group;   // Thread linked list
    struct task_struct *pidhash_next; // Used to chain the process into the HASH table
    struct task_struct **pidhash_pprev;
    wait_queue_head_t wait_chldexit;  // for wait4()
    struct completion *vfork_done;  // For vfork()
    unsigned long rt_priority; // Real-time priority, which is used to calculate the weight value of real-time process scheduling

    //it_real_value, it_real_incr are used for the REAL timer, in jiffies. The system uses it_real_value
    // to set the timer's first expiry; when the timer expires, the SIGALRM signal is sent to the process
    // and it_real_incr resets the expiry time. it_prof_value, it_prof_incr are used for the PROF timer,
    // in jiffies. While the process runs, in any state, each tick decreases it_prof_value by one; when it
    // reaches 0, the SIGPROF signal is sent to the process and the time is reset from it_prof_incr.
    //it_virt_value, it_virt_incr are used for the VIRTUAL timer, in jiffies. While the process runs,
    // each tick decreases it_virt_value by one; when it reaches 0, SIGVTALRM is sent to the process and
    // it_virt_incr resets the initial value.
    unsigned long it_real_value, it_prof_value, it_virt_value;
    unsigned long it_real_incr, it_prof_incr, it_virt_incr;
    struct timer_list real_timer;   // Pointer to the real-time timer
    struct tms times;      // Record the time consumed by the process
    unsigned long start_time;  // When the process was created
    // Record the user mode time and core mode time consumed by the process per CPU
    long per_cpu_utime[NR_CPUS], per_cpu_stime[NR_CPUS]; 
    // Page-fault and swap statistics:
    //min_flt, maj_flt accumulate the process's minor page faults (copy-on-write and anonymous pages)
    // and major page faults (pages read from mapped files or the swap device); nswap records the
    // cumulative number of pages the process has swapped out, i.e., written to the swap device.
    //cmin_flt, cmaj_flt, cnswap record the totals of terminated children's minor faults, major faults,
    // and swapped-out pages; the parent accumulates a child's counts into these fields when it reaps it
    unsigned long min_flt, maj_flt, nswap, cmin_flt, cmaj_flt, cnswap;
    int swappable:1; // Indicates whether the virtual address space of the process can be swapped out
    // Process credentials
    //uid, gid are the user id and group id of the user running the process, usually those of its creator
    //euid, egid are the effective uid and gid
    //fsuid, fsgid are the file-system uid and gid, normally equal to the effective uid and gid; they
    // are used when checking permissions for file-system access.
    //suid, sgid are the saved (backup) uid and gid
    uid_t uid,euid,suid,fsuid;
    gid_t gid,egid,sgid,fsgid;
    int ngroups; // Record how many user groups the process is in
    gid_t groups[NGROUPS]; // Record the group in which the process resides
    // The power of the process is the effective bit set, the inherited bit set, and the allowed bit set
    kernel_cap_t cap_effective, cap_inheritable, cap_permitted;
    int keep_capabilities:1;
    struct user_struct *user;
    struct rlimit rlim[RLIM_NLIMITS];  // Information about process-related resource limits
    unsigned short used_math;   // Whether to use FPU
    char comm[16];   // The executable name of the running process
     // File system information
    int link_count, total_link_count;
    // The control terminal (tty) of the process; NULL if the process has no control terminal
    struct tty_struct *tty;
    unsigned int locks;
    // Process communication information
    struct sem_undo *semundo;  // All undo operations on semaphores
    struct sem_queue *semsleeping; // When a process is suspended due to a semaphore operation, it records the waiting operation in this queue
    // The CPU state of the process; saved here when the process is switched out
    struct thread_struct thread;
      // File system information
    struct fs_struct *fs;
      // Open the file information
    struct files_struct *files;
      // Signal handler function
    spinlock_t sigmask_lock;
    struct signal_struct *sig; // Signal handler function
    sigset_t blocked;  // Signal that the process is currently blocking, one bit for each signal
    struct sigpending pending;  // Whether there is a signal waiting to be processed on the process
    unsigned long sas_ss_sp;
    size_t sas_ss_size;
    int (*notifier)(void *priv);
    void *notifier_data;
    sigset_t *notifier_mask;
    u32 parent_exec_id;
    u32 self_exec_id;

    spinlock_t alloc_lock;
    void *journal_info;
};

4. Back to that stock answer

The stock answer says: the process is the smallest unit of resource allocation, and the thread is the smallest unit of CPU scheduling. All the words I have written above exist to explain exactly why that sentence is true, and why we are entitled to say it.

That sentence used to feel hopelessly abstract; now it feels downright familiar. It really is a good summary.

Finally, to summarize the full text:

The operating system encapsulates system resources (CPU, memory, files, and so on) in the concept of a process. A process appears to own these resources (an illusion, of course), so the process is the smallest unit of resource allocation.

Each thread can be scheduled onto a CPU individually, and in a multi-core environment multithreading achieves true parallelism, so the thread is the smallest unit of CPU scheduling.

Next time, we will talk about the problems and usage scenarios of multi-process and multi-thread programming, and how to put processes and threads to work.

5. Reference materials

  1. en.wikipedia.org/
  2. www.zhihu.com/
  3. Operating Systems: Three Easy Pieces — Remzi H. Arpaci-Dusseau and Andrea C. Arpaci-Dusseau