Modern operating systems react to unexpected situations (data arriving from a disk, a hardware timer going off, and so on) by making abrupt changes in the control flow. In general, we call these abrupt changes exceptional control flow (ECF). Exceptional control flow occurs at all levels of a computer system. For example, at the hardware level, an event detected by the hardware triggers an abrupt transfer of control to an exception handler. At the operating system level, the kernel transfers control from one user process to another through a context switch. At the application level, one process can send a signal to another process, and the receiver abruptly transfers control to a signal handler. A program can also react to errors by sidestepping the usual stack discipline and making a nonlocal jump to an arbitrary location in another function.

Why do you need to understand ECF?

  • Helps you understand important system concepts
  • Helps you understand how your application interacts with the operating system
  • Helps you understand concurrency
  • Helps you understand how software exceptions work (e.g. the try-catch-throw mechanism in C++ and Java)

Exceptions

An exception is a form of exceptional control flow that is implemented partly by the hardware and partly by the operating system.

An exception is an abrupt change in the control flow in response to some change in the processor’s state. In the figure above, the processor is executing the current instruction, I_curr, when a significant change in the processor’s state occurs. The state is encoded in various bits and signals inside the processor, and a change in state is called an event. The event may be directly related to the execution of the current instruction, such as a virtual memory page fault, an arithmetic overflow, or a division by zero, or it may be unrelated to the current instruction, such as a system timer going off or an I/O request completing.

In any case, when the processor detects that an event has occurred, it makes an indirect procedure call through a jump table called the exception table to an operating system subroutine (the exception handler) designed specifically to process that kind of event. When the exception handler finishes processing, one of the following happens, depending on the type of event that caused the exception:

  • The handler returns control to I_curr, which is re-executed (for example, after a page fault)
  • The handler returns control to I_next, the instruction that would have executed next (for example, after a signal from an I/O device)
  • The handler aborts the interrupted program (for example, after an unrecoverable error)

Exception handling

Each possible type of exception in the system is assigned a unique, non-negative integer exception number. Some of these numbers are assigned by the processor’s designers, and the others by the designers of the operating system kernel (the memory-resident part of the operating system). Examples of the former include division by zero, page faults, memory access violations (such as a segmentation fault), breakpoints, and arithmetic overflows; examples of the latter include system calls and signals from external I/O devices. When the system boots (after a reset or power-up), the operating system allocates and initializes a jump table called the exception table, in which entry k contains the address of the handler for exception k. The starting address of the exception table is kept in a special CPU register called the exception table base register.
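The table lookup described above can be imitated in ordinary user-level C with an array of function pointers. This is only a sketch of the idea: the exception numbers, handler names, and the `enum resume` return values are made up for illustration and do not match any real processor or kernel.

```c
#include <stddef.h>

/* What should happen to the interrupted program after the handler runs. */
enum resume { RERUN_CURRENT, RUN_NEXT, ABORT_PROGRAM };

typedef enum resume (*handler_t)(void);

/* Hypothetical handlers; a real kernel would do actual work here. */
static enum resume on_divide_error(void) { return ABORT_PROGRAM; }
static enum resume on_page_fault(void)   { return RERUN_CURRENT; }
static enum resume on_io_interrupt(void) { return RUN_NEXT; }

/* Entry k holds the address of the handler for (made-up) exception k. */
static handler_t exception_table[] = {
    on_divide_error,   /* exception 0 */
    on_page_fault,     /* exception 1 */
    on_io_interrupt    /* exception 2 */
};

#define NUM_EXCEPTIONS (sizeof exception_table / sizeof exception_table[0])

/* Index the table by exception number and make an indirect call. */
enum resume dispatch(size_t k)
{
    if (k >= NUM_EXCEPTIONS)
        return ABORT_PROGRAM;      /* no handler installed for this number */
    return exception_table[k]();
}
```

Calling `dispatch(1)` runs the page-fault handler and reports that the faulting instruction should be re-executed. On real hardware the lookup is done by the processor itself, using the exception table base register rather than a C array.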

Exceptions are similar to procedure calls, but there are still important differences:

  • In a procedure call, the processor pushes the return address onto the stack before branching to the callee. For an exception, the return address is either the current instruction or the next instruction, depending on the class of exception
  • The processor also pushes some additional processor state on the stack, which is needed to restart execution of the interrupted program when the processor returns
  • If control is transferred from user programs to the kernel, then all these items are pushed onto the kernel stack
  • Exception handlers run in kernel mode, which means that the exception handler has full access to all system resources (Q: what about user-specified exception handlers?).

When the exception handler completes, it optionally returns to the interrupted program by executing a special “return from interrupt” instruction, which pops the appropriate state back into the processor’s control and data registers. If the exception interrupted a user program, the instruction also restores user mode before returning control to the interrupted program.

Classes of exceptions

Exceptions fall into four categories: interrupt, trap, fault, and abort:

  • Interrupt: Occurs asynchronously as a result of a signal from an I/O device external to the processor (such as the completion of a disk read). Typically, the device signals a pin on the processor chip and places an exception number on the system bus that identifies the device that caused the interrupt. After the current instruction completes, the processor notices that the interrupt pin has gone high, reads the exception number from the system bus, and calls the appropriate interrupt handler. When the handler returns, execution continues with the next instruction, I_next.
  • Trap and system call: An intentional exception that occurs as a result of executing an instruction. The most important use of traps is the system call, through which a user program requests a kernel service such as read, write, fork, execve, or exit. (A library function such as malloc is not itself a system call, although it may make system calls internally.) The processor provides a special “syscall n” instruction for this purpose, where n is the system call number; the operating system keeps a system call table in which entry i holds the address of the handler for system call i. When the handler completes, the processor switches back to user mode and executes the next instruction, I_next. An ordinary function runs in user mode and uses the same stack as its caller, whereas a system call runs in kernel mode, which allows it to execute privileged instructions and to use a stack defined in the kernel.
  • Fault: Results from an error condition that a fault handler might be able to correct. When a fault occurs, the processor transfers control to the fault handler. If the handler is able to correct the error, it returns control to the faulting instruction, which is re-executed. Otherwise, the handler returns to an abort routine in the kernel, which terminates the application that caused the fault. The page fault is a common example.
  • Abort: Results from an unrecoverable fatal error, usually a hardware error such as a parity error that occurs when DRAM or SRAM bits are corrupted. The abort handler never returns control to the application; instead, it returns to the kernel’s abort routine.

On x86-64 Linux, a system call wrapper function first writes the system call number to register %rax, then writes the arguments (for example, the file descriptor, buffer address, and byte count for write) to registers %rdi, %rsi, %rdx, and so on, and finally executes the "syscall" instruction to invoke the system call.
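To see a system call made by number rather than through its usual wrapper, one option on Linux is the C library's generic `syscall` function, which loads the number and arguments into registers and traps into the kernel. The sketch below assumes Linux with glibc; the function names `raw_write` and `raw_getpid` are invented for this example.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>   /* SYS_write, SYS_getpid */
#include <unistd.h>        /* syscall, STDOUT_FILENO */

/* Write len bytes from buf to standard output by system call number. */
long raw_write(const char *buf, unsigned long len)
{
    return syscall(SYS_write, STDOUT_FILENO, buf, len);
}

/* Ask the kernel for the process ID without using the getpid wrapper. */
long raw_getpid(void)
{
    return syscall(SYS_getpid);
}
```

For instance, `raw_write("hello\n", 6)` should print hello and return the number of bytes written, just as the ordinary `write` wrapper would.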

Processes

The classic definition of a process is an instance of a program in execution. Every program in the system runs in the context of some process. The context consists of the state that the program needs in order to run correctly. This state includes the program’s code and data stored in memory, its stack, the contents of its general-purpose registers, its program counter, environment variables, and the set of open file descriptors.

Logical control flow

Processes take turns using the processor. Each process executes a portion of its flow and is then preempted (temporarily suspended) while other processes take their turns. To a program running in the context of one of these processes, it appears to have exclusive use of the processor.

Concurrent flow

A logical flow whose execution overlaps in time with another flow is called a concurrent flow, and the two flows are said to run concurrently. The general phenomenon of multiple flows executing concurrently is known as concurrency. The notion of a process taking turns with other processes is also known as multitasking. Each time period during which a process executes a portion of its flow is called a time slice.

Private address space

The process also provides an illusion for each program: it appears to have exclusive use of the system address space. A process provides each program with its own private address space. In general, the memory byte associated with an address in this space (also known as the virtual address space) cannot be read or written by other processes.

Although the contents of the memory associated with each private address space are generally different, each such space has the same general organization. The bottom portion of the address space is reserved for the user program, with the usual code, data, heap, and stack segments. The code segment always begins at address 0x400000. The top portion of the address space is reserved for the kernel (the memory-resident part of the operating system). This part of the address space contains the code, data, and stack that the kernel uses when it executes instructions on behalf of the process, for example when the application makes a system call.

User mode and kernel mode

To restrict the instructions that an application can execute and the portions of the address space that it can access, the processor uses a mode bit in a control register to describe the privileges that the process currently has. When the mode bit is set, the process is running in kernel mode and can execute any instruction in the instruction set and access any memory location in the system. When the mode bit is not set, the process is running in user mode and may not execute privileged instructions (such as halting the processor, changing the mode bit, or initiating an I/O operation), nor may it directly reference code and data in the kernel area of the address space. User programs must instead access kernel code and data indirectly through system calls.

The only way for a process to change from user mode to kernel mode is through an exception such as an interrupt, a fault, or a trapping system call. When the exception occurs, control passes to the exception handler, and the processor changes the mode from user mode to kernel mode. When the handler returns to the application code, the processor changes the mode from kernel mode back to user mode.

The /proc file system in Linux allows user-mode processes to access the contents of kernel data structures. It exports the contents of many kernel data structures as a hierarchy of text files that user programs can read.

  • /proc/cpuinfo
  • /proc/$pid/maps, and so on
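Because the files under /proc are plain text, a user program can read them with ordinary file I/O. As a small sketch, the function below (its name, `pid_from_proc`, is made up for this example) looks up the "Pid:" line in /proc/self/status, which on Linux reports the ID of the process doing the reading.

```c
#include <stdio.h>

/* Return the value of the "Pid:" field in /proc/self/status,
 * or -1 if the file cannot be read or the field is missing. */
long pid_from_proc(void)
{
    FILE *fp = fopen("/proc/self/status", "r");
    char line[256];
    long pid = -1;

    if (fp == NULL)
        return -1;
    while (fgets(line, sizeof line, fp) != NULL) {
        /* Matches only a line that starts with "Pid:" (not "PPid:"). */
        if (sscanf(line, "Pid: %ld", &pid) == 1)
            break;
    }
    fclose(fp);
    return pid;
}
```

In a single-threaded program on Linux the result should equal the value returned by getpid(); on systems without /proc the function simply returns -1.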

Think about:

  • Is /proc stored on disk? If not, how does it work?
  • Implement a program that mimics /proc by writing the current process’s ID and memory usage to a file.

Context switch

The kernel maintains a context for each process. The context is the state that the kernel needs in order to restart a preempted process: the general-purpose registers, the floating-point registers, the program counter, the user stack, the status registers, the kernel stack, and various kernel data structures, such as a page table that describes the address space, a process table that contains information about the current process, and a file table that contains information about the files the process has opened.

System calls, such as I/O reads and writes, may cause context switches. Interrupts may also cause a context switch. For example, all operating systems have some mechanism for generating periodic timer interrupts, typically every 1 ms or 10 ms. Each time a timer interrupt occurs, the kernel can decide that the current process has run long enough and switch to a new process.
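One way to observe context switches from user code is the `getrusage` call, which on Linux and the BSDs reports counts of voluntary (`ru_nvcsw`) and involuntary (`ru_nivcsw`) context switches. The sketch below sleeps briefly and returns how many voluntary switches were recorded in the meantime; the function name and the 10 ms interval are arbitrary choices for illustration.

```c
#include <sys/resource.h>
#include <sys/time.h>
#include <time.h>

/* Return the number of voluntary context switches recorded while the
 * process sleeps for about 10 ms, or -1 if getrusage fails. */
long voluntary_switches_during_sleep(void)
{
    struct rusage before, after;
    struct timespec ts = { 0, 10 * 1000 * 1000 };   /* 10 ms */

    if (getrusage(RUSAGE_SELF, &before) < 0)
        return -1;
    nanosleep(&ts, NULL);        /* blocks, so the kernel can run another process */
    if (getrusage(RUSAGE_SELF, &after) < 0)
        return -1;
    return after.ru_nvcsw - before.ru_nvcsw;
}
```

On a typical Linux system the result is expected to be at least 1, because sleeping gives up the processor, but the exact count depends on the kernel and on whether the platform maintains these counters at all.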

System call error handling

When UNIX system-level functions have an error, they typically return -1 and set the global variable errno to identify the error. The program should always check for errors.

The strerror function returns a text string that describes the error associated with a particular value of errno.

if ((pid = fork()) < 0) {
    fprintf(stderr, "fork error: %s\n", strerror(errno));
    exit(0);
}
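Checking every call by hand quickly clutters the code, so one common approach is to hide the check in a wrapper with the same signature. The sketch below follows the spirit of the capitalized wrappers (Fork, Sigprocmask, and so on) that the later listings take from csapp.h; treat it as an illustration of the pattern rather than the exact library source.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Report a UNIX-style error (one that sets errno) and terminate. */
void unix_error(const char *msg)
{
    fprintf(stderr, "%s: %s\n", msg, strerror(errno));
    exit(1);
}

/* Same interface as fork, but never returns a negative value:
 * on failure it prints a message and exits instead. */
pid_t Fork(void)
{
    pid_t pid = fork();

    if (pid < 0)
        unix_error("fork error");
    return pid;
}
```

With the wrapper in place, the call site shrinks to a single line such as `pid = Fork();`, and the error check can no longer be forgotten.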

Process control

UNIX systems provide a large number of system calls to manipulate processes from C programs.

pid_t getpid(void);
pid_t getppid(void);
pid_t fork(void);
void exit(int status);

The newly created child process is almost, but not quite, identical to the parent. The child gets an identical (but separate) copy of the parent’s user-level virtual address space, including the code and data segments, heap, shared libraries, and user stack. The child also gets identical copies of any of the parent’s open file descriptors, which means the child can read and write any file that was open in the parent when it called fork. Because the parent and the child each have their own private address space, any subsequent changes that either one makes to its memory are not reflected in the memory of the other.

The fork function is called once but returns twice: once in the parent process and once in the newly created child process. In programs with multiple calls to fork, this can be confusing. How many times does the following program print hello?
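The separation of the two address spaces can be checked with a short experiment. In this sketch (the function name is made up for the example), the child overwrites its copy of a local variable and exits, and the parent then looks at its own copy.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Return the parent's value of x after the child has changed its own
 * copy, or -1 if fork or waitpid fails. */
int parent_value_after_fork(void)
{
    int x = 1;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {          /* child */
        x = 100;             /* changes only the child's private copy */
        _exit(0);
    }
    if (waitpid(pid, NULL, 0) < 0)   /* parent: wait for the child to finish */
        return -1;
    return x;                /* still 1 in the parent */
}
```

The parent still sees 1 even though the child assigned 100, because each process has its own private copy of x after the fork.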

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(){
    fork();
    fork();
    printf("hello\n");
    exit(0);
}

When a process terminates for any reason, the kernel does not remove it from the system immediately. Instead, the process is kept in a terminated state until it is reaped by its parent (typically when the parent calls wait or waitpid, often in response to a SIGCHLD signal). When the parent reaps the terminated child, the kernel passes the child’s exit status to the parent and then discards the terminated process, at which point it ceases to exist. A terminated process that has not yet been reaped is called a zombie. Zombies still consume system memory resources, so we should always take care to reap the children we create.

If a parent process terminates, the kernel arranges for the init process to become the adoptive parent of any orphaned children. The init process, which has a PID of 1, is created by the kernel during system start-up, never terminates, and is the ancestor of every process. If a parent terminates without reaping its zombie children, the kernel arranges for the init process to reap them.

// Returns the PID of the reaped child if successful. If pid = -1, waits for any child process.
pid_t waitpid(pid_t pid, int *statusp, int options);
// Equivalent to waitpid(-1, &status, 0)
pid_t wait(int *statusp);
// Suspends the process for secs seconds
unsigned int sleep(unsigned int secs);
// Suspends the process until it receives a signal
int pause(void);
// Loads and runs filename
int execve(const char *filename, const char *argv[], const char *envp[]);
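As an illustration of reaping, the sketch below creates a few children that exit with distinct status codes and then collects them with waitpid until none remain. The child count, the exit codes, and the function name are arbitrary choices for the example.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define N 3   /* number of children; arbitrary */

/* Fork N children that exit with codes 10, 11, ..., reap them all, and
 * return the sum of their exit codes (or -1 on an unexpected error). */
int reap_children(void)
{
    int status, sum = 0;
    pid_t pid;

    for (int i = 0; i < N; i++) {
        pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0)
            _exit(10 + i);                 /* child terminates immediately */
    }
    /* pid == -1 means "wait for any child"; the order of reaping is not defined. */
    while ((pid = waitpid(-1, &status, 0)) > 0) {
        if (WIFEXITED(status))
            sum += WEXITSTATUS(status);    /* exit code passed to exit/_exit */
    }
    /* waitpid returns -1 with errno set to ECHILD once no children remain. */
    return errno == ECHILD ? sum : -1;
}
```

With three children the function returns 33 (10 + 11 + 12), and no zombies are left behind because every child has been reaped.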

execve loads and runs a new program in the context of the current process. It overwrites the address space of the current process but does not create a new process, and the new program inherits all of the file descriptors that were open when execve was called. Refer to the Links section.
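Since execve replaces the calling program, it is usually paired with fork so that the original program survives in the parent. The sketch below assumes that /bin/sh exists, as it does on typical UNIX-like systems; the function name `run_shell` is made up for the example.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run "/bin/sh -c command" in a child process and return the child's
 * exit status, or -1 if something goes wrong. */
int run_shell(const char *command)
{
    char *argv[] = { "sh", "-c", (char *)command, NULL };
    char *envp[] = { NULL };               /* empty environment for the new program */
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        execve("/bin/sh", argv, envp);     /* on success, never returns */
        _exit(127);                        /* reached only if execve failed */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, `run_shell("exit 7")` returns 7: the child keeps the PID that fork gave it, but its code and data now belong to the shell.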

Signals

A Linux signal is a higher-level software form of exceptional control flow that allows processes and the kernel to interrupt other processes.

The figure above shows the signals supported on Linux systems, the first 30 of which are the most common in practice. Each signal type corresponds to some kind of system event. Low-level hardware exceptions are processed by the kernel’s exception handlers and are not normally visible to user processes. Signals provide a mechanism for exposing the occurrence of such exceptions to user processes. For example, if a process attempts to divide by zero, the kernel sends it a SIGFPE signal; typing Ctrl-C sends SIGINT; typing Ctrl-Z sends SIGTSTP; SIGKILL forcibly terminates a process (it can be neither caught nor ignored); and SIGCHLD is sent to a parent when one of its children terminates or stops.

Transmitting a signal to a destination consists of two distinct steps.

  • Sending a signal: The kernel sends (delivers) a signal to a destination process by updating some state (the process’s signal bit vector) in the context of the destination process. A signal is sent for one of two reasons: the kernel has detected a system event, such as a divide-by-zero error or the termination of a child process; or a process has called kill to explicitly ask the kernel to send a signal to the destination process. A process can send a signal to itself.
  • Receiving a signal: A destination process receives a signal when the kernel forces it to react in some way to the delivery of the signal. The process can ignore the signal, terminate, or catch the signal by executing a user-level function called a signal handler.

A signal that has been sent but not yet received is called a pending signal. At any point in time, there can be at most one pending signal of a particular type. Therefore, if a process already has a pending signal of type k, any subsequent signals of type k sent to that process are not queued; they are simply discarded.

For each process, the kernel maintains the set of pending signals in the pending bit vector and the set of blocked signals in the blocked bit vector. The kernel sets bit k in pending whenever a signal of type k is sent, and clears bit k in pending whenever a signal of type k is received.

Sending signals

kill -$sig $pid                   // from the shell
int kill(pid_t pid, int sig);     // from a C program

When we start a job from the shell (for example, ls | sort), the shell starts two processes, and both belong to the same process group. When we press Ctrl-C, the kernel sends SIGINT to every process in the foreground process group.
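From a C program, signals are sent with the kill function. As a sketch, the function below (its name is made up for this example) forks a child that waits forever, sends it SIGKILL, and then uses waitpid to find out which signal ended the child.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that waits forever, kill it, and return the number of
 * the signal that terminated it (or -1 on error). */
int kill_child(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        for (;;)
            pause();                  /* child: sleep until a signal arrives */
    }
    if (kill(pid, SIGKILL) < 0)       /* parent: send SIGKILL to the child */
        return -1;
    if (waitpid(pid, &status, 0) < 0) /* reap the child */
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

The return value is SIGKILL, because that signal cannot be caught, blocked, or ignored by the child. Passing a negative pid to kill addresses a whole process group instead of a single process.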

Receiving signals

When the kernel switches process p from kernel mode to user mode (for example, when returning from a system call or completing a context switch), it checks the set of unblocked pending signals for p. If this set is empty, the kernel passes control to the next instruction, I_next, in the logical control flow of p. However, if the set is nonempty, the kernel chooses some signal k in the set (typically the one with the smallest number) and forces p to receive signal k. Receipt of the signal triggers some action by the process. Once the process completes the action, control passes back to I_next in the logical control flow of p. Each signal type has a predefined default action (a program can override the default action for most signals by installing a handler, but not for SIGSTOP or SIGKILL), which is one of the following:

  • The process terminates (for example, on SIGKILL)
  • The process terminates and dumps core
  • The process stops (suspends) until restarted by a SIGCONT signal
  • The process ignores the signal (for example, SIGCHLD)

A signal handler can be interrupted by another signal handler.

Blocking and unblocking signals

// how: SIG_BLOCK adds the signals in set to the blocked set, SIG_UNBLOCK removes them from it,
// and SIG_SETMASK replaces the blocked set with set.
// If oldset is not NULL, the previous value of the blocked bit vector is stored in oldset.
int sigprocmask(int how, const sigset_t *set, sigset_t *oldset);

Principles for writing signal handlers

Signal handling is tricky: handlers run concurrently with the main program and share the same global variables, so they can interfere with the main program and with each other; different systems have different signal-handling semantics; and a handler can itself be interrupted by other signals. Therefore, we should generally follow these guidelines when writing signal handlers:

  • Keep handlers as simple as possible
  • Call only async-signal-safe functions in handlers (functions that are either reentrant or cannot be interrupted by a signal handler). printf, malloc, and exit are not async-signal-safe
  • Save and restore errno. Many async-signal-safe functions set errno when they return with an error, which could interfere with other parts of the program that rely on errno
  • Protect accesses to shared global data structures by blocking all signals. If a handler shares a global data structure with the main program or with other handlers, block all signals while accessing that structure
  • Declare global variables with volatile
  • Declare flags with sig_atomic_t
  • Remember that signals are not queued: receiving a signal means that at least one event of that type has occurred, not exactly one (because duplicate pending signals are discarded)

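In the following example, the job list is a global data structure shared by the main program and the SIGCHLD handler. Because operations on the job list are generally not atomic, the program blocks all signals around each addjob and deletejob call. It also blocks SIGCHLD before calling fork, so that the handler cannot run deletejob for a child before the parent has run addjob for it.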

#include "csapp.h"

void initjobs() { }
void addjob(int pid) { }
void deletejob(int pid) { }

/* $begin procmask2 */
void handler(int sig)
{
    int olderrno = errno;
    sigset_t mask_all, prev_all;
    pid_t pid;

    Sigfillset(&mask_all);
    while ((pid = waitpid(-1, NULL, 0)) > 0) { /* Reap a zombie child */
        Sigprocmask(SIG_BLOCK, &mask_all, &prev_all);
        deletejob(pid); /* Delete the child from the job list */
        Sigprocmask(SIG_SETMASK, &prev_all, NULL);
    }
    if (errno != ECHILD)
        Sio_error("waitpid error");
    errno = olderrno;
}

int main(int argc, char **argv)
{
    int pid;
    sigset_t mask_all, mask_one, prev_one;

    Sigfillset(&mask_all);
    Sigemptyset(&mask_one);
    Sigaddset(&mask_one, SIGCHLD);
    Signal(SIGCHLD, handler);
    initjobs(); /* Initialize the job list */

    while (1) {
        Sigprocmask(SIG_BLOCK, &mask_one, &prev_one); /* Block SIGCHLD */
        if ((pid = Fork()) == 0) { /* Child process */
            Sigprocmask(SIG_SETMASK, &prev_one, NULL); /* Unblock SIGCHLD */
            Execve("/bin/date", argv, NULL);
        }
        Sigprocmask(SIG_BLOCK, &mask_all, NULL); /* Parent process */
        addjob(pid); /* Add the child to the job list */
        Sigprocmask(SIG_SETMASK, &prev_one, NULL); /* Unblock SIGCHLD */
    }
    exit(0);
}
/* $end procmask2 */

Non-local jump

C provides a form of user-level exceptional control flow, called a nonlocal jump, that transfers control directly from one function to another currently executing function without going through the normal call-and-return sequence. Nonlocal jumps are provided by the setjmp and longjmp functions.

int setjmp(jmp_buf env);
void longjmp(jmp_buf env, int retval);
int sigsetjmp(sigjmp_buf env, int savesigs);
void siglongjmp(sigjmp_buf env, int retval);

The setjmp function saves the current calling environment in the env buffer for later use by longjmp, and returns 0. The calling environment includes the program counter, the stack pointer, and the general-purpose registers. Note that the value returned by setjmp should not be assigned to a variable (working out why is left to the reader), but it can safely be used as the test in a switch or conditional statement.

rc = setjmp(env);       // Wrong

The longjmp function restores the calling environment from the env buffer and then triggers a return from the most recent setjmp call that initialized env. That setjmp call then returns with the nonzero return value retval.

The setjmp function is called once but returns multiple times: once when setjmp is first called and the calling environment is saved in the env buffer, and once for each corresponding longjmp call. The longjmp function, on the other hand, is called once but never returns. An important application of nonlocal jumps is to permit an immediate return from a deeply nested function call, usually as a result of detecting some error condition. Instead of laboriously unwinding the call stack, we can use a nonlocal jump to return directly to a common, localized error handler.

/* $begin setjmp */
#include "csapp.h"

jmp_buf buf;

int error1 = 0; 
int error2 = 1;

void foo(void), bar(void);

int main()
{
    switch (setjmp(buf)) {
    case 0:
        foo();
        break;
    case 1:
        printf("Detected an error1 condition in foo\n");
        break;
    case 2:
        printf("Detected an error2 condition in foo\n");
        break;
    default:
        printf("Unknown error condition in foo\n");
    }
    exit(0);
}

/* Deeply nested function foo */
void foo(void)
{
    if (error1)
        longjmp(buf, 1);
    bar();
}

void bar(void)
{
    if (error2)
        longjmp(buf, 2);
}
/* $end setjmp */

The ability of longjmp to skip over all intermediate calls can have serious consequences. If an intermediate function allocated some resource (memory, a network connection, and so on) and intended to release it at the end of the function, the release code is skipped and the resource leaks.

Another important application of nonlocal jumps is to branch out of a signal handler to a specific code location, rather than returning to the instruction that was interrupted by the arrival of the signal. For example, we can use sigsetjmp and siglongjmp to implement a soft restart.

/* $begin restart */
#include "csapp.h"

sigjmp_buf buf;

void handler(int sig)
{
    siglongjmp(buf, 1);
}

int main()
{
    /* sigsetjmp returns 0 when called directly and nonzero when siglongjmp jumps back here */
    if (!sigsetjmp(buf, 1)) {
        Signal(SIGINT, handler);
        Sio_puts("starting\n");
    }
    else
        Sio_puts("restarting\n");

    while (1) {
        Sleep(1);
        Sio_puts("processing...\n");
    }
    exit(0); /* Control never reaches here */
}
/* $end restart */

The exception mechanisms provided by C++ and Java are higher-level, more structured versions of C’s setjmp and longjmp functions. You can think of a catch clause inside a try statement as being akin to a setjmp call. Similarly, a throw statement is akin to a longjmp call.

Here is an example of a try-catch-throw. The program will always print “KeyboardInterrupt.”

#include <setjmp.h>
#include <signal.h>
#include <stdio.h>

jmp_buf ex_buf__;

#define TRY do { if (!setjmp(ex_buf__)) {
#define CATCH } else {
#define ETRY } } while (0)
#define THROW longjmp(ex_buf__, 1)

void sigint_handler(int sig)
{
    THROW;
}

int main(void)
{
    if (signal(SIGINT, sigint_handler) == SIG_ERR) {
        return 0;
    }
    TRY {
        raise(SIGINT); /* raise(sig) sends signal sig to the calling process */
    } CATCH {
        printf("KeyboardInterrupt");
    } ETRY;
    return 0;
}

When the macro is expanded, the code looks like this:

jmp_buf ex_buf__;

void sigint_handler(int sig)
{
    longjmp(ex_buf__, 1);
}

int main(void)
{
    if (signal(SIGINT, sigint_handler) == ((_crt_signal_t)-1)) {
        return 0;
    }
    do {
        if (!_setjmp(ex_buf__)) {
            {
                raise(SIGINT);
            }
        } else {
            {
                printf("KeyboardInterrupt");
            }
        }
    } while (0);
    return 0;
}

Tools for manipulating processes

Linux systems provide a number of useful tools for monitoring and manipulating processes.

  • strace: Prints a trace of each system call made by a running program and its children, for example: strace cat /dev/null
  • ps: Lists the processes currently in the system (including zombies)
  • top: Prints information about the resource usage of current processes
  • pmap: Displays the memory map of a process
  • /proc: A virtual file system that exports the contents of numerous kernel data structures in an ASCII text form that user programs can read. For example, “cat /proc/loadavg” shows the current load average on the system

Every ordinary anomaly we experience may be a continuous miracle.