Preface

Welcome to our GitHub repository. Star us at github.com/bin39232820… The best time to plant a tree was ten years ago; the second best time is now.

Tips

This interview guide series mostly does not dig into the details; it is my way of reviewing knowledge from the interviewee's point of view, so I assume that you already know most of what is covered here.

www.processon.com/view/link/6…

This is the address of the mind map.


The operating system is genuinely important, yet it is also an area we tend to neglect. Today we will go through some common interview questions, but to really understand this material I still recommend studying CSAPP systematically. Let's go!

Below is a list of the previous articles in this series:

  • 2021-Java Backend Engineer Interview Guide (Introduction)
  • 2021-Java Backend Engineer Interview Guide
  • 2021-Java Backend Engineer Interview Guide -(Concurrency – Multithreading)
  • 2021-Java Backend Engineer Interview Guide -(JVM)
  • 2021-Java Backend Engineer Interview Guide -(MySQL)
  • 2021-Java Backend Engineer Interview Guide -(Redis)
  • Java Backend Engineer Interview Guide -(Elasticsearch)
  • 2021-Java Backend Engineer Interview Guide -(Message Queue)
  • 2021-Java Backend Engineer Interview Guide -(SSM)
  • 2021-Java Backend Engineer Interview Guide (SpringBoot+SpringCloud)
  • 2021-Java Backend Engineer Interview Guide -(Distributed Theory +Zookeeper)
  • 2021-Java Backend Engineer Interview Guide -(Computer Networking)

What is an operating system?

  • An Operating System (OS) is a program that manages a computer's hardware and software resources, and it is the cornerstone of the computer system.
  • An operating system is essentially a software program running on the computer that manages its hardware and software resources. For example, every application running on your computer accesses memory, disks, and other hardware through the operating system.
  • The operating system hides the complexity of the hardware layer; it acts as the manager in charge of how the hardware is used.
  • The kernel is the core part of the operating system; it is responsible for memory management, hardware management, file system management, and application management.

Tell us roughly what the kernel does

  • The kernel is the core part of the operating system; it is responsible for memory management, hardware management, file system management, and application management.

  • The kernel is the bridge between application programs and the hardware, and it largely determines the performance and stability of the operating system.

A quick talk about CPUs

  • The CPU (Central Processing Unit) is the computing and control core of a computer and can be called the brain of the computer.
  • The CPU mainly consists of two parts: the control unit and the arithmetic logic unit.
  • The CPU's fundamental task is to execute instructions, which ultimately appear to the computer as sequences of zeros and ones.

Talk about user mode, kernel mode, and system calls

Ha ha, Xiao Liuliu here has actually been asked this one.

First of all, both kernel mode and user mode are concepts defined with respect to processes.

Based on the privileges a process has when accessing resources, we can divide the processes running on a system into two levels:

  • User mode: a process running in user mode can directly access only the data of user programs; its access to system resources is restricted.
  • Kernel mode: a process or program running in kernel mode can access almost any resource of the computer without restriction.

Most of the programs we run execute in user mode. What happens when we need to call a system-level function provided by the operating system? Then we need a system call!

In other words, in the user programs we run, any operation involving kernel-level resources (such as file management, process control, or memory management) must issue a service request to the operating system through a system call, and the operating system completes it on our behalf.

These system calls can be roughly divided into the following categories by function (a small Java illustration follows the list):

  • Device management: Requests, releases, and starts devices.
  • File management: Reads, writes, creates, and deletes files.
  • Process control: creates, cancels, blocks, and wakes up processes.
  • Process communication: Completes the function of message passing or signal passing between processes.
  • Memory management: Allocates and reclaims memory, and obtains the size and address of the memory occupied by a job.
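As a minimal illustration of the point above (my own sketch, not from the original text): even a plain file read in Java ends up as a series of system calls. The file path below is an arbitrary example.

import java.io.FileInputStream;
import java.io.IOException;

public class SyscallDemo {
    public static void main(String[] args) throws IOException {
        // Opening the stream asks the kernel to open the file
        // (roughly the open/openat system call on Linux).
        try (FileInputStream in = new FileInputStream("/etc/hostname")) {
            byte[] buf = new byte[64];
            // User code cannot touch the disk directly from user mode;
            // the JVM issues the read system call and the kernel copies
            // the data into our buffer.
            int n = in.read(buf);
            System.out.println("read " + n + " bytes via system calls");
        } // closing the stream issues the close system call
    }
}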

We know that Java threads get switched on and off the CPU. Do you know what the operating system is doing during such a switch?

As we all know, Linux is a multitasking operating system that allows far more tasks than there are CPUs to run "simultaneously". Of course, these tasks are not actually running at the same time: the system allocates the CPUs to them in turn for very short time slices, creating the illusion of multitasking.

Before each task runs, the CPU needs to know where the task was loaded and where it should start executing. In other words, the system needs to set up the CPU registers and the program counter for it.

The CPU registers and the program counter together form the CPU context, because they are the environment the CPU depends on before running any task.

  • A CPU register is a small but extremely fast piece of memory built into the CPU.
  • The program counter stores the location of the instruction the CPU is currently executing, or of the next instruction to be executed.
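To make the cost of switching concrete, here is a rough sketch of my own (not from the original article) that forces frequent switches between two Java threads. The measured figure is only an upper bound, since it also includes queue and scheduling overhead.

import java.util.concurrent.SynchronousQueue;

public class ContextSwitchDemo {
    public static void main(String[] args) throws InterruptedException {
        SynchronousQueue<Integer> ping = new SynchronousQueue<>();
        SynchronousQueue<Integer> pong = new SynchronousQueue<>();
        final int rounds = 100_000;

        Thread partner = new Thread(() -> {
            try {
                for (int i = 0; i < rounds; i++) {
                    pong.put(ping.take()); // hand the token straight back
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        partner.start();

        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            ping.put(i);   // wakes the partner: one switch
            pong.take();   // waits for the partner: another switch
        }
        long elapsed = System.nanoTime() - start;
        partner.join();

        // Each round forces at least two context switches (main -> partner -> main).
        System.out.printf("~%d ns per switch (rough upper bound)%n", elapsed / (rounds * 2L));
    }
}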

Talk about Linux file systems

In Linux, everything the operating system manages, whether a network interface card, a disk drive, a printer, an I/O device, a regular file, or a directory, is treated as a file. In other words, there is an important idea in Linux: everything is a file.

Tell me about the inode

Inodes are the basis of Linux/Unix file systems. So, what is an inode? What does it do?

The minimum storage unit of a disk is the sector. A block consists of multiple sectors, and file data is stored in blocks. The most common block size is 4 KB, made up of eight consecutive sectors (each storing 512 bytes). A file may occupy multiple blocks, but a block can hold data from only one file.

Files are stored in blocks, but we also need somewhere to store the file's metadata, such as where its blocks are located, who owns it, its creation time, permissions, size, and so on. The area that stores this metadata is called the inode (index node). Every file has an inode that holds its metadata.

You can run the stat command to view the inode information of a file. Every inode has a number; Linux/Unix systems do not identify files by file name but by inode number.

In short, the inode keeps track of how many blocks a file is divided into, where each block is located, who owns the file, when it was created, its permissions, and its size.

A quick summary:

  • Inode: records file attributes. You can run the stat command to view inode information.
  • Block: holds the actual contents of a file. If a file is larger than one block, it occupies multiple blocks, but a block can hold data from only one file (since blocks are pointed to by inodes, sharing a block between two files would cause confusion).
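Besides the stat command, you can read the inode number from Java through the JDK's "unix" attribute view. The sketch below is my own illustration; it only works on Unix-like platforms, and the file path is an arbitrary example.

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class InodeDemo {
    public static void main(String[] args) throws Exception {
        Path path = Paths.get("/etc/hosts"); // any existing file will do
        // The "unix" attribute view is JDK-specific and only present on Unix-like systems.
        Long inode = (Long) Files.getAttribute(path, "unix:ino");
        System.out.println("inode number of " + path + ": " + inode);
    }
}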

How many states does a process have?

  • New (create state): the process is being created and has not yet reached the ready state.
  • Ready: the process has obtained all the resources it needs except the processor, and can run as soon as it is given processor time (a time slice).
  • Running: the process is executing on the processor (on a single-core CPU, only one process is running at any moment).
  • Waiting (blocked): the process is suspended while waiting for an event, such as a resource becoming available or an I/O operation completing. It cannot run even if the processor is idle.
  • Terminated: the process is being removed from the system, either because it finished or because it was interrupted for some other reason. (A rough Java analogy follows below.)
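Since this is a Java interview guide, here is a sketch of my own showing how Java's thread states loosely mirror these process states; the mapping is only an analogy, not one-to-one.

public class ThreadStateDemo {
    public static void main(String[] args) throws InterruptedException {
        Object lock = new Object();
        Thread t = new Thread(() -> {
            synchronized (lock) {
                try { lock.wait(); } catch (InterruptedException ignored) { }
            }
        });
        System.out.println(t.getState()); // NEW: created but not yet started
        t.start();
        Thread.sleep(100);                // give t time to reach wait()
        System.out.println(t.getState()); // WAITING: parked until an event (notify)
        synchronized (lock) { lock.notify(); }
        t.join();
        System.out.println(t.getState()); // TERMINATED
    }
}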

What are the ways in which processes communicate, and what is Java's approach?

  • Pipes/Anonymous pipes: used for communication between related processes, such as a parent and child or two siblings.
  • Named pipes: anonymous pipes have no name and can only be used between related processes; named pipes were introduced to overcome this shortcoming. A named pipe is strictly first-in, first-out, exists as a file on disk, and can be used for communication between any two processes on the same host.
  • Signals: a relatively involved communication method used to notify the receiving process that some event has occurred.
  • Message queues: linked lists of messages in a specific format, stored in the kernel and identified by message queue identifiers. Pipes (anonymous pipes exist only in memory; named pipes exist as files on disk) deliver data strictly first-in, first-out, whereas a message queue lives in the kernel and is not removed until the kernel restarts (i.e., the operating system reboots) or the queue is explicitly deleted. Messages in a queue can also be read selectively by message type rather than strictly in FIFO order, which gives message queues an advantage over FIFOs. Message queues overcome the limitations of signals (which carry little information) and pipes (which carry only unformatted byte streams with limited buffer size).
  • Semaphores: a semaphore is a counter used to coordinate access to shared data by multiple processes. Semaphores are intended for interprocess synchronization and are mainly used to avoid race conditions.
  • Shared memory: multiple processes can access the same region of memory, and each process promptly sees updates made by the others. This approach needs to be paired with some synchronization mechanism, such as a mutex or semaphore. It is arguably the most useful form of interprocess communication (a small Java sketch follows this list).
  • Sockets: used for communication between a client and a server over a network. A socket is the basic unit of TCP/IP network communication and can be seen as an endpoint of two-way communication between processes on different hosts; put simply, it is an agreement between the two communicating sides, and the communication is carried out through socket-related functions.
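As a small sketch of my own (the file path is an arbitrary example): Java's closest analogue to shared-memory IPC is a memory-mapped file, which two JVM processes can map to exchange data. As noted above, real use would still need a synchronization mechanism on top.

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class SharedMemoryWriter {
    public static void main(String[] args) throws Exception {
        try (RandomAccessFile file = new RandomAccessFile("/tmp/ipc-demo", "rw");
             FileChannel channel = file.getChannel()) {
            // Map 64 bytes of the file into this process's address space.
            MappedByteBuffer shared = channel.map(FileChannel.MapMode.READ_WRITE, 0, 64);
            shared.put("hello from process A".getBytes());
            shared.force(); // flush, so a second process mapping the same file sees the bytes
        }
    }
}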

CPU addressing? Why a virtual address space?

Modern processors use a scheme called virtual addressing. With virtual addressing, the CPU must translate each virtual address into a physical address before it can access real physical memory. The hardware that performs this translation is the Memory Management Unit (MMU) inside the CPU.

Exposing physical addresses directly causes serious problems, such as the risk of corrupting the operating system and the difficulty of running multiple programs at the same time.

Using virtual addresses to access memory has the following advantages:

  • A program can use a series of contiguous virtual addresses to access large memory buffers that are not contiguous in physical memory.
  • A program can use a series of virtual addresses to access memory buffers larger than the available physical memory. When physical memory runs low, the memory manager saves pages of physical memory (typically 4 KB in size) to a disk file, and pages of data or code move between physical memory and disk as needed.
  • The virtual addresses used by different processes are isolated from each other. Code in one process cannot change physical memory that is being used by another process or the operating system.

Talk about the Unix IO model

In Linux, when an IO access (a read, for example) occurs, the data is first copied into a buffer in the operating system kernel, and then copied from that kernel buffer into the application's address space. So a read operation goes through two phases:

  • Waiting for the data to be ready
  • Copying the data from the kernel to the process

Because of these two phases, Linux provides the following five network IO models:

  • Blocking IO model
  • Nonblocking IO model
  • IO multiplexing model
  • Signal-driven IO model
  • Asynchronous IO model

Let’s talk about each of these IO models

Blocking IO model

In Linux, all IO operations are blocking by default. A typical read operation would look like this:

When the user process issues the recvfrom system call, the kernel begins the first phase of IO: preparing the data. (For network IO, the data often has not arrived yet; for example, a complete UDP packet may not have been received, so the kernel must wait for enough data to arrive.) The data is copied into a buffer in the operating system kernel, and this takes time. On the user side, the whole process is blocked (by the process's own choice, of course). Once the kernel has the data ready, it copies the data from the kernel into a user-space buffer, returns the result, and the user process unblocks and starts running again.

So blocking IO is characterized by being blocked in both phases of IO execution:

  • Waiting for the data to be ready
  • Copying the data from the kernel to the process

Nonblocking IO model

On Linux, you can set a socket to be non-blocking. In Java you can do this:

import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;

InetAddress host = InetAddress.getByName("localhost");
Selector selector = Selector.open();
ServerSocketChannel serverSocketChannel = ServerSocketChannel.open();
serverSocketChannel.configureBlocking(false); // put the channel into non-blocking mode
serverSocketChannel.bind(new InetSocketAddress(host, 1234)); // port 1234 as an example
serverSocketChannel.register(selector, SelectionKey.OP_ACCEPT);

Setting the socket to non-blocking tells the kernel: when the requested I/O cannot be completed, do not put the process to sleep; instead return an error code (EWOULDBLOCK), so the request does not block.

When the user process issues the recvfrom system call and the data in the kernel is not ready yet, the call does not block the user process; it immediately returns an EWOULDBLOCK error. From the user process's point of view, a read request returns a result right away rather than waiting. When the user process sees the EWOULDBLOCK error, it knows the data is not ready and can issue read again. Once the kernel has the data ready and receives another system call from the user process, it copies the data into the user-space buffer and returns.

As you can see, the user process keeps testing whether the data is ready and, if not, keeps polling until it is. Although each I/O request returns immediately, the user thread must continuously poll and re-issue the request while waiting for data, which consumes a large amount of CPU.

As a result, the non-blocking IO feature requires the user process to constantly ask the kernel for data:

  • Waiting for the data to be ready “non-blocking”
  • Copying the data from the kernel to the process

This model is rarely used directly, but the non-blocking I/O feature is used in other I/O models. This approach makes little sense for individual I/O requests, but paves the way for I/O multiplexing.

I/O multiplexing

IO multiplexing refers to select, poll, and epoll; in some places this IO mode is also called event-driven IO. The advantage of select/epoll is that a single process can handle the IO of many network connections simultaneously. The basic principle is that the select, poll, or epoll function polls all the sockets it is responsible for and notifies the user process when data arrives on any of them.

When a user process calls select, the whole process blocks while the kernel "monitors" all the sockets registered with select; select returns as soon as data is ready on any of them. The user process then calls read to copy the data from the kernel into user space.

So I/O multiplexing is characterized by a mechanism in which one process can wait on multiple file descriptors at the same time, and select() returns as soon as any one of those file descriptors (socket descriptors) becomes readable.

This flow does not look very different from the blocking IO diagram; in fact, for a single connection it is slightly worse, because two system calls (select and recvfrom) are needed where blocking IO needs only one (recvfrom). The real advantage of select is that a single thread can handle I/O requests on many sockets at once: the user registers multiple sockets and then repeatedly calls select to find the ones that are ready, thereby processing many I/O requests in the same thread. Under the synchronous blocking model, achieving this requires multithreading.

So a web server using select/epoll does not necessarily outperform one using multithreading plus blocking IO, and may even have higher latency when the number of connections is not very large. The advantage of select/epoll is not that it handles a single connection faster, but that it can handle many more connections.

In practice, in the IO multiplexing model each socket is usually set to non-blocking. However, as shown above, the user process is in fact blocked the whole time; it is just blocked by the select call rather than by socket IO.

So for the IO multiplexing model:

  • Waiting for the data to be ready "blocked" (by select rather than by the socket)
  • Copying the data from the kernel to the process "blocked"
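Building on the registration snippet shown earlier, here is a minimal sketch of my own of the event loop that goes with it; the port is an arbitrary example and error handling is omitted for brevity.

import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

public class SelectorLoop {
    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.configureBlocking(false);
        server.bind(new InetSocketAddress(1234)); // example port
        server.register(selector, SelectionKey.OP_ACCEPT);

        ByteBuffer buffer = ByteBuffer.allocate(1024);
        while (true) {
            selector.select(); // blocks until at least one registered channel is ready
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove();
                if (key.isAcceptable()) {
                    SocketChannel client = server.accept();
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {
                    SocketChannel client = (SocketChannel) key.channel();
                    buffer.clear();
                    if (client.read(buffer) == -1) {
                        client.close(); // peer closed the connection
                    }
                }
            }
        }
    }
}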

Asynchronous IO model

Next let’s look at the asynchronous IO flow in Linux:

As soon as the user process issues the aio_read call, it can go off and do other things. From the kernel's perspective, when it receives an asynchronous read it returns immediately, so the user process is not blocked at all. The kernel then waits for the data to become ready and copies it into the user's memory. When this is done, the kernel sends a signal to the user process telling it that the read operation is complete.

The asynchronous I/O model implements this mechanism using the Proactor design pattern.

So for the asynchronous IO model:

  • Waiting for the data to be ready “non-blocking”
  • Copying the data from the kernel to the process “Non-blocking”
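In Java, NIO.2's asynchronous channels expose this style of API; note that on Linux the JDK simulates asynchronous file IO with an internal thread pool rather than kernel AIO. The sketch below is my own illustration, and the file path is an arbitrary example.

import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.CompletionHandler;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.CountDownLatch;

public class AsyncReadDemo {
    public static void main(String[] args) throws Exception {
        CountDownLatch done = new CountDownLatch(1);
        AsynchronousFileChannel channel = AsynchronousFileChannel.open(
                Paths.get("/etc/hosts"), StandardOpenOption.READ); // example file
        ByteBuffer buffer = ByteBuffer.allocate(1024);

        // The call returns immediately; we are notified later via the handler.
        channel.read(buffer, 0, null, new CompletionHandler<Integer, Void>() {
            @Override
            public void completed(Integer bytesRead, Void attachment) {
                System.out.println("read " + bytesRead + " bytes asynchronously");
                done.countDown();
            }

            @Override
            public void failed(Throwable exc, Void attachment) {
                exc.printStackTrace();
                done.countDown();
            }
        });

        System.out.println("read submitted; the main thread is free to do other work");
        done.await(); // only so the demo does not exit before the callback fires
        channel.close();
    }
}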

Signal-driven IO Model

First we allow the socket to do signal-driven I/O and install a signal handler. The process continues to run without blocking. When the data is ready, the process receives a SIGIO signal and can process the data by calling the I/O operation function in the signal handler function.

But this IO model is not used much in practice, so I will not go into it here.

The difference between select, poll, and epoll

Select ==> time complexity O(n). Select only tells us that an I/O event occurred, not which streams caused it (it could be one, several, or even all of them), so we have to poll all streams indiscriminately to find the ones ready to read or write, and then operate on them. Select therefore has O(n) undifferentiated polling complexity: the more streams handled at the same time, the longer each polling pass takes.

Poll ==> time complexity O(n). Poll is essentially no different from select: it copies the array passed in by the user into kernel space and then queries the state of each fd. However, because its fd set is based on a linked list, it has no limit on the maximum number of connections.

Epoll ==> time complexity O(1). Epoll can be understood as "event poll". Unlike busy polling and undifferentiated polling, epoll notifies us of exactly which I/O events occurred on which streams, so epoll is in fact event-driven (each event is associated with an fd), and every operation we perform on those streams is meaningful. This reduces the complexity to O(1).

Select, poll, and epoll are all mechanisms for I/O multiplexing: a mechanism that monitors multiple descriptors and, as soon as one becomes ready for reading or writing, tells the program so it can perform the corresponding operation. But select, poll, and epoll are all synchronous I/O in nature, because after the readiness notification the process still has to do the reading and writing itself, and that read/write step blocks. Asynchronous I/O, by contrast, does not require the process to do the reading and writing itself: the asynchronous I/O implementation takes care of copying the data from the kernel into user space.

At the end

Ha ha, there is far more to operating systems than this, but these are the questions that come up most often in interviews. What really matters is steadily building up the fundamentals. Let's keep at it together; the CSAPP course on Bilibili (B station) is recommended. Ha ha

The daily plea for likes

Ok everybody, that's all for this article. If you've read all the way here, you're a true fan.

Creating content is not easy; your support and recognition are the greatest motivation for my writing. See you in the next article!

Six Pulse Excalibur | original article. If there are any errors in this blog, please offer your criticism; much obliged!