The environment for UNIX processes

When the kernel starts a C program, a startup routine runs before the main function. At a minimum, the startup routine sets up the command-line arguments and the environment variables, and then calls main. There are five ways a UNIX process can terminate:

  1. Return from the main function.
  2. Call the exit function, which is also what normally happens after main returns.
  3. Call the _exit function.
  4. Call abort.
  5. Be terminated by a signal. Case 4 is a special case of this one, because abort generates the SIGABRT signal.

The difference between the exit and _exit functions

exit is declared in the header <stdlib.h>.

_exit is declared in the header <unistd.h>. It is a system call that handles UNIX-specific details. _exit enters the kernel immediately, whereas exit first runs the registered termination handlers and closes all standard I/O streams (calling fclose, which flushes their buffers), and only then enters the kernel (typically by calling _exit).

Function: int atexit(void (*func)(void)); declared in the header <stdlib.h>. It returns 0 on success and a nonzero value on error.

ANSI C requires that a process be able to register at least 32 termination handlers. The exit function calls them in the reverse order of their registration.
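The reverse-order rule and the difference between exit and _exit can be observed with a small experiment. The following is only a sketch: run_child, handler_a, handler_b and g_out_fd are names invented for this illustration. A child process registers two handlers that each write one letter into a pipe, then terminates with either exit or _exit, and the parent collects what was written.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical global: the pipe descriptor the handlers write to.
static int g_out_fd = -1;

static void handler_a() { ssize_t n = write(g_out_fd, "A", 1); (void)n; }
static void handler_b() { ssize_t n = write(g_out_fd, "B", 1); (void)n; }

// Forks a child that registers handler_a and then handler_b, and terminates
// with exit() (use_exit == true) or _exit() (use_exit == false).
// Returns the bytes the handlers wrote, as read by the parent.
std::string run_child(bool use_exit) {
    int fd[2];
    if (pipe(fd) != 0) return "pipe failed";
    fflush(nullptr);  // keep buffered output from being flushed twice
    pid_t pid = fork();
    if (pid < 0) return "fork failed";
    if (pid == 0) {
        close(fd[0]);
        g_out_fd = fd[1];
        atexit(handler_a);  // registered first, so it runs last
        atexit(handler_b);  // registered last, so it runs first
        if (use_exit) exit(0);
        _exit(0);           // goes straight to the kernel, no handlers run
    }
    close(fd[1]);
    std::string out;
    char buf[16];
    ssize_t n;
    while ((n = read(fd[0], buf, sizeof(buf))) > 0) out.append(buf, n);
    close(fd[0]);
    waitpid(pid, nullptr, 0);
    return out;
}
```

Calling run_child(true) should show handler_b running before handler_a, while run_child(false) should show that neither handler runs.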

Storage space layout

The system allocates a virtual address space for each process, and the storage layout of a C program is established within this virtual address space. The mapping from the virtual address space to physical memory is done by the operating system; related concepts such as paging, segmentation, and page swapping are skipped here.

Text segment: the machine instructions that are executed. It is read-only and can be shared.

Initialized data segment: variables that are given explicit initial values in the program.

Uninitialized data segment: also known as the BSS segment. It is set to 0 by exec and does not need to be stored in the program file on disk.

Heap: the area used for dynamic memory allocation.

void* malloc(size_t size); Allocates the specified number of bytes of dynamic memory; the initial contents are unspecified.

void* calloc(size_t nobj, size_t size); Allocates storage for the specified number of objects of the specified size, with every bit set to zero.

void* realloc(void* ptr, size_t new_size); Changes the size of a previously allocated area. The existing contents may be moved to a larger area, and the contents of the newly added part are indeterminate.

All three functions return a null pointer on error. The pointer they return is suitably aligned to satisfy the most stringent alignment requirement of any object type.

void free(void* ptr); Releases storage obtained from the three functions above.

Stack: automatic variables and the information saved each time a function is called.

Command-line arguments and environment variables: typically stored at the top of the address space, above the stack.
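As an illustration of the heap allocation functions described above, here is a minimal sketch. The function name grow_and_sum is made up for this example. It allocates a zeroed array with calloc, grows it with realloc, initializes the indeterminate tail explicitly, and releases the storage with free.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: fills a calloc'ed array with 1..n, doubles its size
// with realloc, and returns the sum of all elements (-1 on allocation failure).
long grow_and_sum(std::size_t n) {
    int* a = static_cast<int*>(std::calloc(n, sizeof(int)));  // all bits zero
    if (a == nullptr) return -1;
    for (std::size_t i = 0; i < n; ++i) a[i] = static_cast<int>(i + 1);

    // realloc may move the block. The old contents are preserved, but the
    // added tail is indeterminate. Keep the old pointer until realloc succeeds.
    int* bigger = static_cast<int*>(std::realloc(a, 2 * n * sizeof(int)));
    if (bigger == nullptr) {
        std::free(a);
        return -1;
    }
    a = bigger;
    for (std::size_t i = n; i < 2 * n; ++i) a[i] = 0;  // initialize the tail

    long sum = 0;
    for (std::size_t i = 0; i < 2 * n; ++i) sum += a[i];
    std::free(a);
    return sum;
}
```

Assigning the result of realloc to a separate pointer first avoids leaking the original block when realloc fails.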

Environment variables

char* getenv(const char* name);

Declared in the header <stdlib.h>. Returns a pointer to the value associated with name, or a null pointer if name does not exist.
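Because getenv reports a missing variable with a null pointer, callers usually check the result before using it. A minimal sketch follows; env_or is a hypothetical wrapper name, not a library function.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper: returns the value of the environment variable `name`,
// or `fallback` when getenv returns a null pointer.
std::string env_or(const char* name, const std::string& fallback) {
    const char* value = std::getenv(name);
    return value != nullptr ? std::string(value) : fallback;
}
```

For instance, env_or("HOME", "/") yields the home directory when HOME is set and "/" otherwise.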

Process control

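Process 0 is the scheduler process. Process 1 is the init process, invoked by the kernel at the end of the boot procedure. It is a user process that runs with superuser privileges and becomes the parent of every orphaned process. Each process has six identifiers: the process ID, the parent process ID, the real user ID, the effective user ID, the real group ID, and the effective group ID.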

Understanding the real user/group ID, the effective user/group ID, and the saved user/group ID.

The real user/group ID of a process is determined by the user who started the process; we refer to them as RUID and RGID. The program being executed is itself a file, so it has a file owner ID and a corresponding group ID, which we call st_uid and st_gid.

The effective user/group ID is established when the program is executed and, by default, equals the real user/group ID. It is the ID used for permission checks while the program runs. The owner of a program and the user running it may differ: a program written by user A may need user A's permissions to access certain resources, yet lack them when another user executes it. The effective user/group ID exists so that permission checks can be made against an ID other than the real one, together with a controlled mechanism for changing it when the program is executed.

This mechanism must be secure. If any user could arbitrarily choose the effective user ID and effective group ID for a program they execute, the permission system would be pointless. The mechanism must therefore be granted by the program's owner: if user B wants to use user A's program, user A must explicitly allow other users to temporarily obtain A's permissions while running it. This is the purpose of the set-user-ID and set-group-ID bits. When the owner sets the corresponding bit on the program file, the effective user/group ID of a process executing it is set to the file's st_uid and st_gid.

So what is the saved user/group ID for? It stores a copy of the effective user/group ID. A program sometimes needs to adjust its permissions several times: at some point it sets the effective user/group ID to the real user/group ID, and later it needs the earlier effective user/group ID back for a permission check. The saved user/group ID keeps that initial value available after the effective user/group ID has been changed.

getresuid and getresgid read all three values in one call; setresuid and setresgid set them. Of course, not every program can set these values arbitrarily. With setuid and setgid, a process without superuser privileges cannot change the real user/group ID, and can only set the effective user/group ID to the real user/group ID or to the saved user/group ID. This gives ordinary users a suitable, limited mechanism for switching permissions. Refer to the article: https://blog.csdn.net/hubinbi…

pid_t fork(void); declared in the header <unistd.h>.

Returns 0 in the child, the child's process ID in the parent, and -1 on error. fork creates a new process, and the child gets a copy of the parent's data space, heap, and stack. Note how fork interacts with I/O. All descriptors open in the parent are duplicated in the child, and each descriptor in the child shares a file table entry with the same descriptor in the parent. This sharing means that parent and child use the same offset for a shared file. Consequently, if parent and child write to the same file without synchronizing, their output is intermixed. So either the parent waits for the child to finish, or each process leaves the descriptors used by the other alone.
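The shared file offset can be checked directly. In the sketch below, the function name offset_after_child_write and the /tmp path template are assumptions made for the illustration. The child writes through the inherited descriptor, and the parent then asks for its own current offset on the same descriptor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Returns the parent's file offset after the child has written `n` bytes
// through the descriptor inherited across fork, or -1 on error.
off_t offset_after_child_write(std::size_t n) {
    char path[] = "/tmp/fork_offset_XXXXXX";  // assumes /tmp is writable
    int fd = mkstemp(path);
    if (fd < 0) return -1;
    unlink(path);  // the file disappears once the descriptor is closed

    pid_t pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        // Child: write n bytes, advancing the offset in the shared
        // file table entry.
        for (std::size_t i = 0; i < n; ++i) {
            if (write(fd, "x", 1) != 1) _exit(1);
        }
        _exit(0);
    }
    // Parent: wait for the child, so the two never write concurrently.
    waitpid(pid, nullptr, 0);
    off_t pos = lseek(fd, 0, SEEK_CUR);
    close(fd);
    return pos;
}
```

The parent never writes, yet its offset has moved by the number of bytes the child wrote, which is why unsynchronized writers end up interleaving their output.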

The difference between vfork and fork: vfork also creates a child process, but instead of copying the parent's address space, the child runs in the parent's address space. vfork guarantees that the child runs first and that the parent does not resume until the child calls exec or exit. vfork was designed to be followed by exec, because it avoids copying the parent's address space, which the child would discard immediately. Nowadays fork is usually implemented with copy-on-write, so fork+exec is much less expensive than it used to be. Note: if the child returns from the function that called vfork instead of calling exec or exit, the parent is brought down with it, because the two share the same stack. Returning from the main function runs the destructors of its local variables and pops the stack frame, which does not happen when exit is called directly. For more on the difference between fork and vfork, see: https://www.cnblogs.com/19322…

The exec family of functions replaces the program running in the current process with a new program. There are six functions in this family; their interfaces differ somewhat, but all of them ultimately invoke the execve system call. How open files are handled depends on the close-on-exec flag (FD_CLOEXEC) that every file descriptor in the process has. If the flag is set, the descriptor is closed when exec is executed; otherwise it remains open. Unless the flag is set explicitly with fcntl, the default is to leave the descriptor open across exec, which is what makes I/O redirection possible. See the link: https://blog.csdn.net/amoscyk…
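The close-on-exec flag is read and written with the F_GETFD and F_SETFD commands of fcntl. The following sketch shows one way to toggle it; set_cloexec and is_cloexec are hypothetical helper names.

```cpp
#include <cassert>
#include <fcntl.h>
#include <unistd.h>

// Sets or clears FD_CLOEXEC on `fd`. Returns 0 on success, -1 on error.
int set_cloexec(int fd, bool on) {
    int flags = fcntl(fd, F_GETFD);  // read the current descriptor flags
    if (flags < 0) return -1;
    flags = on ? (flags | FD_CLOEXEC) : (flags & ~FD_CLOEXEC);
    return fcntl(fd, F_SETFD, flags) < 0 ? -1 : 0;
}

// Reports whether `fd` would be closed by a successful exec.
bool is_cloexec(int fd) {
    int flags = fcntl(fd, F_GETFD);
    return flags >= 0 && (flags & FD_CLOEXEC) != 0;
}
```

Reading the flags before modifying them preserves any other descriptor flags the system may define.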

File IO

int dup(int fd); int dup2(int fd1, int fd2); declared in the header <unistd.h>.

These two functions duplicate an existing file descriptor. dup returns a new descriptor, which is guaranteed to be the lowest-numbered descriptor currently available. dup2 lets you specify the value of the new descriptor: it copies fd1 to fd2, first closing fd2 if it is already open. If fd1 equals fd2, dup2 returns fd2 without closing it. Both return the new file descriptor on success or -1 on failure.
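A typical use of dup and dup2 is temporary redirection of standard output. The sketch below is an illustration with an invented function name, write_via_redirected_stdout, and it assumes the message is small enough to fit in the pipe buffer. It saves descriptor 1 with dup, points it at a pipe with dup2, writes, and then restores the original descriptor.

```cpp
#include <cassert>
#include <string>
#include <unistd.h>

// Hypothetical helper: redirects STDOUT_FILENO to a pipe, writes `msg` to
// descriptor 1, restores the original stdout, and returns what reached the
// pipe. Assumes `msg` fits in the pipe buffer, so write() does not block.
std::string write_via_redirected_stdout(const std::string& msg) {
    int fd[2];
    if (pipe(fd) != 0) return "";
    int saved = dup(STDOUT_FILENO);  // lowest-numbered free descriptor
    if (saved < 0) {
        close(fd[0]);
        close(fd[1]);
        return "";
    }
    dup2(fd[1], STDOUT_FILENO);  // descriptor 1 now refers to the pipe
    close(fd[1]);                // descriptor 1 keeps the write end open

    ssize_t written = write(STDOUT_FILENO, msg.data(), msg.size());
    (void)written;

    dup2(saved, STDOUT_FILENO);  // restore; this closes the pipe's write end
    close(saved);

    std::string out;
    char buf[256];
    ssize_t n;
    while ((n = read(fd[0], buf, sizeof(buf))) > 0) out.append(buf, n);
    close(fd[0]);
    return out;
}
```

Restoring with dup2 closes the last copy of the pipe's write end, so the read loop sees end of file instead of blocking.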

Interprocess communication

1. Pipe

A pipe has two limitations: it is half-duplex (data flows in one direction only), and it can only be used between processes that have a common ancestor. Example:

    #include <cassert>
    #include <cstring>
    #include <iostream>
    #include <signal.h>
    #include <sys/wait.h>
    #include <unistd.h>

    typedef void Sigfunc(int);

    // Install a handler with sigaction and return the previous one.
    Sigfunc* signaler(int signo, Sigfunc* func)
    {
        struct sigaction act, oact;
        act.sa_handler = func;
        sigemptyset(&act.sa_mask);
        act.sa_flags = 0;
        if (sigaction(signo, &act, &oact) < 0) {
            return SIG_ERR;
        }
        return oact.sa_handler;
    }

    // Reap every terminated child without blocking.
    void child_handle(int)
    {
        int stat;
        while (waitpid(-1, &stat, WNOHANG) > 0) {
        }
    }

    int main()
    {
        signaler(SIGPIPE, SIG_IGN);
        signaler(SIGCHLD, child_handle);

        int fd[2];
        int result = pipe(fd);
        assert(result == 0);

        pid_t pid = fork();
        assert(pid >= 0);

        if (pid == 0) {
            // Child: read from the pipe through standard input.
            close(fd[1]);
            if (fd[0] != STDIN_FILENO) {
                result = dup2(fd[0], STDIN_FILENO);
                assert(result >= 0);
                close(fd[0]);
            }
            ssize_t n;
            char buf[64];
            while ((n = read(STDIN_FILENO, buf, sizeof(buf) - 1)) > 0) {
                buf[n] = '\0';
                std::cout << buf;
            }
        } else {
            // Parent: write one message, then close the write end so that
            // the child sees end of file.
            const char* ptr = "hello world! \n";
            close(fd[0]);
            ssize_t n = write(fd[1], ptr, strlen(ptr));
            assert(n == static_cast<ssize_t>(strlen(ptr)));
            close(fd[1]);
        }
        return 0;
    }

2. FIFO

Unlike a pipe, a FIFO is not restricted to processes with a common ancestor: unrelated processes can exchange data through it. The constant PIPE_BUF specifies the maximum amount of data that can be written atomically to a FIFO.
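A FIFO is created with mkfifo and then opened by path name like an ordinary file. In the sketch below the writer is a forked child only to keep the example in one file; it opens the FIFO by name, exactly as an unrelated process would. The function name fifo_round_trip and the /tmp location are assumptions made for this illustration.

```cpp
#include <cassert>
#include <cstdlib>
#include <fcntl.h>
#include <string>
#include <sys/stat.h>
#include <sys/wait.h>
#include <unistd.h>

// Sends `msg` through a freshly created FIFO and returns what the reader got.
std::string fifo_round_trip(const std::string& msg) {
    char dir[] = "/tmp/fifo_demo_XXXXXX";  // assumes /tmp is writable
    if (mkdtemp(dir) == nullptr) return "";
    const std::string path = std::string(dir) + "/fifo";
    if (mkfifo(path.c_str(), 0600) != 0) {
        rmdir(dir);
        return "";
    }

    std::string out;
    pid_t pid = fork();
    if (pid == 0) {
        // Writer: open() blocks until some process opens the FIFO for reading.
        int wfd = open(path.c_str(), O_WRONLY);
        if (wfd < 0) _exit(1);
        ssize_t n = write(wfd, msg.data(), msg.size());
        close(wfd);
        _exit(n == static_cast<ssize_t>(msg.size()) ? 0 : 1);
    }
    if (pid > 0) {
        // Reader: open() blocks until the writer has opened its end.
        int rfd = open(path.c_str(), O_RDONLY);
        if (rfd >= 0) {
            char buf[256];
            ssize_t n;
            while ((n = read(rfd, buf, sizeof(buf))) > 0) out.append(buf, n);
            close(rfd);
        }
        waitpid(pid, nullptr, 0);
    }
    unlink(path.c_str());
    rmdir(dir);
    return out;
}
```

The FIFO is created inside a private temporary directory so that the name cannot collide with an existing file.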

3. Three kinds of IPC

The three kinds (message queues, semaphores, and shared memory) have the following characteristics:

  1. There is no access count, so an IPC object is not deleted promptly when the last process using it terminates.
  2. The objects are not known by name in the file system, and they cannot be used with I/O multiplexing functions.
  3. The kernel ID that identifies an object is not easy to share. It is either assigned by the system and then passed through a file, or derived from a key that is specified explicitly (which may collide with a key already in use) or generated by the ftok function.
  4. Message queues have the advantages of carrying messages with priorities and of being record oriented.

For the specific usage of message queues, see https://www.jianshu.com/p/7e3…

For the specific usage of semaphores, see https://blog.csdn.net/xiaojia… A semaphore is essentially a counter used by multiple processes to protect shared data. The theoretical model of a semaphore is not complicated, but the implementation in Linux is rather tedious.

Shared memory allows two or more processes to share the same memory area. It provides no synchronization mechanism of its own, so it is usually used together with semaphores. https://blog.csdn.net/qq_2766…

Unix domain sockets

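struct sockaddr_un {
sa_family_t sun_family; // AF_LOCAL (also spelled AF_UNIX)
char sun_path[104]; // null-terminated path name; the array size varies by system (108 on Linux)
};

Declared in the header file: <sys/un.h>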

    #include <cassert>
    #include <cstring>
    #include <iostream>
    #include <sys/socket.h>
    #include <sys/un.h>
    #include <unistd.h>

    int main()
    {
        sockaddr_un server_addr;
        memset(&server_addr, 0, sizeof(server_addr));
        server_addr.sun_family = AF_LOCAL;
        strncpy(server_addr.sun_path, "/home/pn/unix222", sizeof(server_addr.sun_path) - 1);

        int fd = socket(AF_LOCAL, SOCK_STREAM, 0);
        assert(fd >= 0);

        // bind fails if the path already exists, so remove any stale
        // socket file first.
        unlink(server_addr.sun_path);
        socklen_t len = sizeof(server_addr);
        int result = bind(fd, (struct sockaddr*)&server_addr, len);
        assert(result == 0);

        // Read the bound address back to confirm the path name.
        sockaddr_un server2;
        socklen_t llen = sizeof(server2);
        result = getsockname(fd, (struct sockaddr*)&server2, &llen);
        assert(result == 0);
        std::cout << "name : " << server2.sun_path << '\n';

        close(fd);
        return 0;
    }

Signals

Signals are asynchronous: from the point of view of the process, a signal can arrive at any time. A process can choose one of three ways to handle a signal: (1) Ignore the signal; SIGKILL and SIGSTOP cannot be ignored. (2) Catch the signal by providing a signal handler function that is called when the signal occurs. (3) Let the default action apply, which for most signals is to terminate the process and for some is to ignore the signal.

Early unreliable signaling mechanisms

typedef void Sigfunc(int); Sigfunc* signal(int, Sigfunc*); Return value: the previous disposition of the signal on success, or SIG_ERR on failure. The first parameter is the signal being configured. The second parameter can be SIG_IGN, SIG_DFL, or the address of a custom signal handler. Feature 1: each time a signal is handled, its action is reset to the default. Even if the handler calls signal again to reinstall itself, there is a window between entering the handler and that call, and a signal arriving in this window gets the default action. In addition, the common pattern of setting a variable in the signal handler and testing it in the main program to learn whether the signal occurred is also flawed under this mechanism, because the signal can arrive between the test and the action that depends on it. In summary, a purely asynchronous mechanism needs complete and sound synchronization to be usable, and signal does not provide it.

The arrival of a signal interrupts some slow system calls. If a process catches a signal while it is blocked in a slow system call, the system call is interrupted and does not continue; it returns an error with errno set to EINTR. The rationale is that a signal has occurred and the process has caught it, so something has happened, and that is a good moment to wake up the blocked system call. Typical examples in network programming are the connect and accept functions. With the early signal function, the details of restarting interrupted system calls varied from platform to platform and were generally confusing.

Reliable signal mechanism

We need to define some terms that we use when talking about signals:

  1. A signal is generated for a process when the event that causes the signal occurs. When a signal is generated, the kernel usually sets a flag of some kind in the process table. When the action for a signal is taken, we say that the signal has been delivered to the process. During the interval between its generation and its delivery, the signal is said to be pending.
  2. A process can choose to block the delivery of a signal. If a blocked signal is generated for a process and its action is either the default action or to catch the signal, the signal remains pending until the process unblocks it or changes its action to ignore. Because the decision is made at delivery time, a process can change the action for a signal before it is delivered. The process can call the sigpending function to determine which signals are blocked and pending. Signals generated more than once before delivery are not queued: most UNIX kernels deliver such a signal only once.
  3. Each process maintains a signal mask that records the set of signals the process is currently blocking.

The kill function sends a signal to a process or a group of processes. int kill(pid_t pid, int signo); declared in the header <signal.h>.

Returns 0 on success and -1 on error.

POSIX.1 defines the data type sigset_t to hold a signal set, and five functions to manipulate signal sets: sigemptyset and sigfillset for initialization, sigaddset and sigdelset to add and remove a signal, and sigismember to test membership.

The sigprocmask function examines or changes the signal mask of the process. int sigprocmask(int how, const sigset_t* set, sigset_t* oset); declared in the header <signal.h>.

First, if oset is a non-null pointer, the current signal mask of the process is returned through oset. Second, if set is a non-null pointer, the how parameter indicates how the current signal mask is modified: SIG_BLOCK adds the signals in set to the mask, SIG_UNBLOCK removes them, and SIG_SETMASK replaces the mask with set.

The sigpending function returns the set of signals that are currently pending for the process, that is, generated but not delivered because they are blocked. int sigpending(sigset_t* set); Returns 0 on success and -1 on error.
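Blocking, pending, and delivery can be observed together in a short experiment. This is a sketch under assumptions: the names BlockDemoResult, block_raise_unblock, on_usr1 and g_usr1_count are invented for it, it assumes a single-threaded process, and the single delivery it expects reflects the usual behaviour for standard (non-real-time) signals on systems such as Linux. It blocks SIGUSR1, raises it twice, checks sigpending, and then restores the mask.

```cpp
#include <cassert>
#include <signal.h>

static volatile sig_atomic_t g_usr1_count = 0;

// Handler: only counts how many times SIGUSR1 was delivered.
static void on_usr1(int) { g_usr1_count = g_usr1_count + 1; }

struct BlockDemoResult {
    bool pending_while_blocked;  // sigpending reported SIGUSR1, handler not run
    int delivered_after_unblock; // handler invocations once the mask is restored
};

BlockDemoResult block_raise_unblock() {
    struct sigaction act{}, old_act{};
    act.sa_handler = on_usr1;
    sigemptyset(&act.sa_mask);
    sigaction(SIGUSR1, &act, &old_act);

    sigset_t block, old_mask, pending;
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    sigprocmask(SIG_BLOCK, &block, &old_mask);  // add SIGUSR1 to the mask

    g_usr1_count = 0;
    raise(SIGUSR1);
    raise(SIGUSR1);  // a second instance of a pending signal is not queued

    sigpending(&pending);
    BlockDemoResult r;
    r.pending_while_blocked =
        sigismember(&pending, SIGUSR1) == 1 && g_usr1_count == 0;

    // Restoring the old mask unblocks SIGUSR1; the pending signal is
    // delivered before sigprocmask returns.
    sigprocmask(SIG_SETMASK, &old_mask, nullptr);
    r.delivered_after_unblock = g_usr1_count;

    sigaction(SIGUSR1, &old_act, nullptr);  // put the old action back
    return r;
}
```

Saving the old mask and restoring it with SIG_SETMASK, rather than calling SIG_UNBLOCK, avoids unblocking a signal that the caller had blocked on purpose.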

The sigaction function supersedes the earlier signal function; it examines or modifies the action associated with a given signal. int sigaction(int signum, const struct sigaction* act, struct sigaction* oldact); Returns 0 on success and -1 on error. If act is non-null, the action for signal signum is modified; if oldact is non-null, the previous action is returned through it. About the structure sigaction:

struct sigaction {
  void (*sa_handler)(int);
  sigset_t sa_mask;
  int sa_flags;
};

The sa_handler member is either the address of a handler, SIG_IGN, or SIG_DFL. When it is a handler, the signals in sa_mask are added to the signal mask of the process before the handler is called, and the mask is restored to its previous value when the handler returns. This makes it possible to block certain signals while the handler runs. By default, the signal being delivered is itself blocked during the handler. An action installed with sigaction stays in effect until it is explicitly changed by another call to sigaction, unlike the earlier unreliable mechanism. The sa_flags field holds options for handling the signal, detailed in Section 10.14 of Advanced Programming in the UNIX Environment. One option worth noting is SA_RESTART: system calls interrupted by this signal are restarted automatically. That is all for this introduction to signals; the rest deserves careful study. I found the chapter on signals annoying because it is so tedious.
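The effect of SA_RESTART can be made visible with a blocking read on an empty pipe. The sketch below rests on several assumptions: a single-threaded process, a 100 ms timer that fires after read has started blocking, and the invented names read_interrupted_by_alarm, on_alarm and g_wake_fd. The handler writes one byte into the pipe, so a restarted read has something to return.

```cpp
#include <cassert>
#include <cerrno>
#include <signal.h>
#include <sys/time.h>
#include <unistd.h>

static int g_wake_fd = -1;  // write end of the pipe, used by the handler

static void on_alarm(int) {
    int saved_errno = errno;
    char c = 'x';
    ssize_t n = write(g_wake_fd, &c, 1);  // write is async-signal-safe
    (void)n;
    errno = saved_errno;
}

// Blocks in read() on an empty pipe while SIGALRM arrives about 100 ms later.
// Returns read()'s result, or -errno if read() failed.
long read_interrupted_by_alarm(bool restart) {
    int fd[2];
    if (pipe(fd) != 0) return -errno;
    g_wake_fd = fd[1];

    struct sigaction act{}, old_act{};
    act.sa_handler = on_alarm;
    sigemptyset(&act.sa_mask);
    act.sa_flags = restart ? SA_RESTART : 0;
    sigaction(SIGALRM, &act, &old_act);

    struct itimerval timer{};         // one-shot timer, no interval
    timer.it_value.tv_usec = 100000;  // 100 ms
    setitimer(ITIMER_REAL, &timer, nullptr);

    char c;
    ssize_t n = read(fd[0], &c, 1);   // blocks until the signal arrives
    long result = n >= 0 ? n : -errno;

    sigaction(SIGALRM, &old_act, nullptr);
    close(fd[0]);
    close(fd[1]);
    return result;
}
```

Without SA_RESTART the read is expected to fail with EINTR; with it, the kernel restarts the read, which then finds the byte written by the handler.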

Signal processing in the era of multithreading

If a signal is generated by an exception (a program error such as SIGPIPE or SIGSEGV), only the thread that caused the exception receives and handles it. If a signal is generated internally with pthread_kill, only the target thread named in the pthread_kill arguments receives and handles it. If it is an external signal generated with the kill command, usually SIGINT, SIGHUP, or another job-control signal, the threads are searched until one that does not block the signal is found, and that thread handles it (usually the search starts from the main thread). Note that only one thread receives the signal.

Second, each thread has its own signal mask, but all threads share the process's signal actions. This means a thread can call pthread_sigmask (not sigprocmask) to decide which signals it blocks, but it cannot call sigaction to specify how that single thread handles a signal. If a thread calls sigaction to set the action for a signal, every thread in the process that does not block the signal handles it in the same way when it receives it. Also note that a new thread inherits the signal mask of the thread that created it. See the link: https://www.cnblogs.com/codin… It can be seen that the standard follows these principles for signal handling in the multithreaded era: if the source or the target of the signal is unambiguous, the signal is handled by exactly that thread; if the signal is directed at the process as a whole, it is delivered to one thread that does not block it, and to only one.
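One common way to apply per-thread masks, which the text above does not describe and which is shown here only as a sketch, is to block a signal in every thread and let one dedicated thread accept it synchronously with sigwait. The function name receive_in_dedicated_thread is invented, and the example assumes that no other thread in the process leaves SIGUSR2 unblocked.

```cpp
#include <cassert>
#include <pthread.h>
#include <signal.h>
#include <thread>
#include <unistd.h>

// Blocks SIGUSR2 in the calling thread, starts a thread that inherits this
// mask and waits with sigwait, sends a process-directed SIGUSR2, and returns
// the signal number the waiting thread received (-1 on error).
int receive_in_dedicated_thread() {
    sigset_t set, old_mask;
    sigemptyset(&set);
    sigaddset(&set, SIGUSR2);
    if (pthread_sigmask(SIG_BLOCK, &set, &old_mask) != 0) return -1;

    int received = 0;
    std::thread waiter([&set, &received] {
        int signo = 0;
        // sigwait removes the signal from the pending set and returns it,
        // without running any handler.
        if (sigwait(&set, &signo) == 0) received = signo;
    });

    kill(getpid(), SIGUSR2);  // process-directed: stays pending until sigwait
    waiter.join();

    pthread_sigmask(SIG_SETMASK, &old_mask, nullptr);
    return received;
}
```

Because the signal is blocked before the new thread is created, it cannot be delivered asynchronously to either thread, regardless of which one runs first.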

In addition, the Linux kernel provides signalfd, which abstracts signals as a file: when a signal is generated, the file descriptor becomes readable, so I/O multiplexing can handle ordinary file descriptors and the signal descriptor together. See: http://www.man7.org/linux/man… Translation: https://blog.csdn.net/yusiguy… I personally prefer this approach.
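A minimal, Linux-only sketch of this approach follows. The function name read_signal_via_fd is made up for the illustration, and poll stands in for whichever multiplexing call a real program would use. The signal has to be blocked first; otherwise its normal action would still be taken.

```cpp
#include <cassert>
#include <poll.h>
#include <signal.h>
#include <sys/signalfd.h>
#include <unistd.h>

// Blocks SIGUSR1, raises it, and reads it back through a signalfd.
// Returns the signal number that was read, or -1 on error or timeout.
int read_signal_via_fd() {
    sigset_t mask, old_mask;
    sigemptyset(&mask);
    sigaddset(&mask, SIGUSR1);
    if (sigprocmask(SIG_BLOCK, &mask, &old_mask) != 0) return -1;

    int result = -1;
    int sfd = signalfd(-1, &mask, SFD_CLOEXEC);  // -1 creates a new descriptor
    if (sfd >= 0) {
        raise(SIGUSR1);  // stays pending because it is blocked

        struct pollfd pfd;
        pfd.fd = sfd;
        pfd.events = POLLIN;
        pfd.revents = 0;
        // A pending signal makes the descriptor readable.
        if (poll(&pfd, 1, 1000) == 1 && (pfd.revents & POLLIN)) {
            struct signalfd_siginfo info;
            if (read(sfd, &info, sizeof(info)) ==
                static_cast<ssize_t>(sizeof(info))) {
                result = static_cast<int>(info.ssi_signo);
            }
        }
        close(sfd);
    }
    // Reading consumed the signal, so nothing is delivered on unblock.
    sigprocmask(SIG_SETMASK, &old_mask, nullptr);
    return result;
}
```

In an event loop, the signalfd descriptor would simply be one more entry in the array passed to poll, next to sockets and pipes.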