This is the 11th day of my participation in the August More Text Challenge

Operating system questions are common in programmer interviews and are a very important part of preparation.

So I have put together a series on operating system interview questions, drawing on a large number of online resources and classic reference books, with the aim of explaining each question clearly and emphasizing understanding over rote memorization. This article is both a summary of my own study and, I hope, a help to friends who are preparing for interviews.

This article is divided into processes and threads, memory management, IO, and Linux.

Processes and threads

What are processes and threads

  • A process is the encapsulation of a running program and the basic unit of system resource scheduling and allocation
  • A thread is a subtask of a process, the basic unit of CPU scheduling and dispatch, and the way concurrency is realized within a process
  • A process can contain multiple threads; threads depend on their process and share its memory

Processes are the smallest unit of resource allocation and threads are the smallest unit of CPU scheduling

  • All process-related resources are recorded in the PCB (process control block)
  • The process is the unit scheduled to preempt the processor; threads belong to a process and share its resources
  • A thread consists only of a stack, registers, a program counter, and a TCB (thread control block)

Conclusion:

  • A thread is not regarded as a standalone application, whereas a process is
  • Processes have independent address spaces and do not affect each other; threads are just different execution paths within a process
  • Threads have no independent address space, so a multi-process program is more robust than a multithreaded one
  • Process switching is more expensive than thread switching

How many states does a process have?

Processes are generally divided into roughly five states, much like threads!

  • Create State (new) : The process is being created and has not yet reached the ready state.
  • Ready: The process is ready to run. That is, the process has all the resources it needs except the processor and can run once it gets the processor resources (the time slice allocated by the processor).
  • Running: Processes are running on the processor (only one process is running at any time on a single-core CPU).
  • Waiting: A state in which a process is suspended while waiting for an event, such as for a resource to become available or for an I/O operation to complete. The process cannot run even if the processor is idle.
  • Terminated: The process is disappearing from the system. The process may end or be interrupted for other reasons.

Thread safety

  • An operation is thread-safe if it can be used in a multithreaded environment and still produce the correct result
  • Thread-safe operations behave as if the threads executed sequentially rather than concurrently (e.g. `i += 1`)
  • In general, whenever there are write operations, you must consider how to make access to the shared data safe for multiple threads

How threads are synchronized

Thread synchronization is needed when two or more concurrently executing threads share a critical resource; the threads must be synchronized to avoid conflicts over its use. Operating systems generally provide the following three mechanisms for thread synchronization:

  1. Mutex: with the mutual-exclusion mechanism, only the thread that owns the mutex may access the shared resource. Because there is only one mutex, the shared resource cannot be accessed by multiple threads at the same time. The `synchronized` keyword and the various `Lock` classes in Java are examples of this mechanism.
  2. Semaphore: allows multiple threads to access the same resource at the same time, but controls the maximum number of threads that may access it simultaneously
  3. Event (wait/notify): wait/notify operations keep multiple threads synchronized and also make it convenient to take thread priorities into account
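As a sketch of the mutex approach, here is a minimal Python example (Python is used for the examples in this article) where a `threading.Lock` protects a shared counter; without the lock, the read-modify-write in `counter += 1` could interleave across threads and lose updates:

```python
import threading

counter = 0
lock = threading.Lock()  # the mutex: only one thread may hold it at a time

def increment(n):
    global counter
    for _ in range(n):
        with lock:        # acquire the mutex; it is released when the block exits
            counter += 1  # critical section: the read-modify-write is now atomic

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 400000; without the lock, updates could be lost
```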

The way in which processes communicate

Inter-process communication (IPC) transmits signals or data between processes:

  • Pipes (anonymous pipes and named pipes)
  • Signal: for example, pressing Ctrl+C generates a SIGINT signal that terminates the program
  • Message queue
  • Shared memory
  • Semaphore
  • Socket: the most common method; it is what all of our web applications use

More detailed version:

  1. Pipe: slow and limited in capacity; only parent and child processes can communicate
  2. FIFO (named pipe): any two processes can communicate, but it is still slow
  3. Message queue: capacity is limited by the system; note that when reading you must account for data left unread from earlier messages
  4. Semaphore: cannot carry complex messages; only usable for synchronization
  5. Shared memory: capacity is easy to control and access is fast, but you must maintain synchronization yourself. For example, while one process is writing, other processes must be careful about reading and writing, which is the same problem as thread safety within a process. Shared memory could also be used for communication between threads, but there is no need: threads within one process already share the same memory.

References:

  • www.cnblogs.com/zgq0/p/8780…
  • www.jianshu.com/p/c1015f5ff…

Process scheduling algorithm

To decide which process executes first and which last so as to maximize CPU utilization, computer scientists have defined the following scheduling algorithms:

  • First come, first served (FCFS): select the process that entered the ready queue first, allocate resources to it, and let it run until it completes or is blocked by some event and gives up the CPU.
  • Shortest job first (SJF): select the process in the ready queue with the shortest estimated running time, allocate resources to it, and let it run until it completes or is blocked and gives up the CPU, then reschedule.
  • Time-slice scheduling, also called round robin (RR): the oldest, simplest, fairest, and most widely used algorithm. Each process is assigned a period of time, called its time slice, which is the amount of time the process is allowed to run.
  • Multilevel feedback queue: the algorithms above each have limitations; for example, shortest job first favors short processes and neglects long ones. The multilevel feedback queue lets high-priority jobs get a response while short jobs (processes) still complete quickly, so it is widely recognized as a good process scheduling algorithm; UNIX adopts it.
  • Priority scheduling: assign each process a priority and execute the highest-priority process first, and so on. Processes with equal priority run in FCFS order. Priority can be determined by memory requirements, time requirements, or any other resource requirement.
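The round-robin idea can be sketched as a small simulation; the process names and burst times below are made up purely for illustration:

```python
from collections import deque

def round_robin(bursts, quantum):
    """Simulate RR; bursts maps a process name to its remaining CPU time."""
    queue = deque(bursts.items())
    order = []
    while queue:
        name, remaining = queue.popleft()
        order.append(name)                   # the process runs for one time slice
        remaining -= quantum
        if remaining > 0:
            queue.append((name, remaining))  # not finished: back of the queue
    return order

schedule = round_robin({"A": 3, "B": 5, "C": 2}, quantum=2)
print(schedule)  # ['A', 'B', 'C', 'A', 'B', 'B']
```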

How do you use multithreading in Python

  • The threading.Thread class is used to create threads
  • The start() method starts the thread
  • You can use join() to wait for the thread to terminate
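A minimal example of those three steps:

```python
import threading

results = []

def worker(n):
    results.append(n * n)  # some work done on the new thread

t = threading.Thread(target=worker, args=(4,))  # create the thread
t.start()                                       # start it
t.join()                                        # wait for it to terminate
print(results)  # [16]
```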

How to use multiple processes in Python

Python has a GIL, so CPU-intensive programs should be implemented with multiple processes

  • The multiprocessing module provides multi-process support
  • The multiprocessing.Process class implements multiple processes
  • Usually used in CPU-intensive applications to avoid GIL effects

Memory Management

What is paging

todo

What is segmentation

Segmentation exists to satisfy the logical structure of a program (for example, keeping code, data, and stack in separate segments)

The difference between paging and segmentation

todo

What is virtual memory

Virtual memory works by moving temporarily unused memory contents out to the hard drive:

  • By the principle of locality, only the parts of a program needed at the moment are loaded into memory
  • Contents not currently needed in memory are swapped out to the hard disk
  • The system thus appears to provide far more capacity than physical memory; this is called virtual memory

What is memory thrashing (jitter)?

In essence, it is frequent page scheduling behavior

  • Frequent page scheduling: the process continually generates page faults
  • A page is replaced, then needed again, over and over
  • Causes: too many programs running, or a bad page-replacement policy. Remedies: terminate processes or add physical memory

Python’s garbage collection mechanism

  • Based on reference counting (drawback: it cannot handle circular references)
  • Mark-and-sweep and generational collection are introduced to solve the circular-reference problem
  • Reference counting is the primary mechanism; mark-and-sweep and generational collection are supplementary
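The circular-reference point can be demonstrated with the standard `gc` module; the `Node` class here is a made-up example:

```python
import gc

class Node:
    def __init__(self):
        self.partner = None

a, b = Node(), Node()
a.partner, b.partner = b, a  # a reference cycle: each object keeps the other alive

del a, b                     # reference counts never drop to zero here...
collected = gc.collect()     # ...but the cycle detector reclaims the pair
print(collected)
```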

IO article

How to improve concurrency

Some common ways to improve concurrency

  • Multithreaded model: create a new thread to handle each request
  • Multi-process model: create a new process to handle each request
  • IO multiplexing: a single process handles multiple socket requests at the same time
  • Thread/process creation is expensive; this can be solved with a thread pool
  • Threads and processes consume resources, so it is hard to create very many at once
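A thread-pool sketch using the standard `concurrent.futures` module; `handle` is a hypothetical request handler standing in for real per-request work:

```python
from concurrent.futures import ThreadPoolExecutor

def handle(request):
    # stand-in for per-request work (e.g. blocking IO)
    return "handled " + request

# reuse a fixed set of threads instead of paying creation cost per request
with ThreadPoolExecutor(max_workers=4) as pool:
    replies = list(pool.map(handle, ["req1", "req2", "req3"]))

print(replies)
```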

What is IO multiplexing

A mechanism provided by the operating system to listen on multiple sockets at the same time

  • To achieve high concurrency you need a mechanism to process multiple sockets concurrently
  • select/poll/epoll are the common mechanisms on Linux
  • Multiple sockets can be handled with a single thread or process

IO involves two stages:

  • The kernel waits for the data to arrive
  • The data is copied from the kernel into the user process
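These ideas can be seen in a small sketch with Python's `selectors` module, which wraps select/poll/epoll depending on the platform; the `socketpair` stands in for real client connections:

```python
import selectors
import socket

sel = selectors.DefaultSelector()  # epoll on Linux, kqueue on BSD/macOS
a, b = socket.socketpair()         # two connected sockets standing in for clients
a.setblocking(False)
b.setblocking(False)
sel.register(a, selectors.EVENT_READ)  # one selector watches both sockets
sel.register(b, selectors.EVENT_READ)

b.send(b"ping")                    # makes socket `a` readable

received = None
for key, _ in sel.select(timeout=1):   # stage 1: wait until some fd has data
    received = key.fileobj.recv(1024)  # stage 2: copy data out of the kernel

print(received)
sel.close()
a.close()
b.close()
```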

BIO,NIO,AIO

  • BIO: Synchronous blocking call
  • NIO: Synchronous non-blocking. (Widely used in Java)
  • AIO: asynchronous I/O.

BIO (Blocking IO): InputStream and OutputStream, Reader and Writer

NIO (Non-blocking IO): builds multiplexed, synchronous, non-blocking IO operations

  • nio-channels
  • nio-buffer
  • nio-selector
  • The underlying layer of NIO uses IO multiplexing

AIO: Based on event and callback mechanisms

BIO, NIO, AIO

Attribute \ model   Blocking BIO               Non-blocking NIO               Asynchronous AIO
Blocking            Blocking and synchronous   Non-blocking but synchronous   Non-blocking and asynchronous
Thread ratio        1:1                        1:N                            0:N
Complexity          Simple                     More complex                   Complex
Throughput          Low                        High                           High

Five IO models for operating systems

The first four are all synchronous

A restaurant analogy for the five models:

  • Blocking IO: you order a meal and stand there waiting until it is ready
  • Non-blocking IO: you order, go do something else, and come back to the restaurant from time to time to check
  • IO multiplexing: a waiter watches many orders at once and tells you when yours is ready
  • Signal-driven IO: you take a number, go do something else, and the restaurant messages you on WeChat when the meal is ready
  • Asynchronous IO: you place the order, go do something else, and the finished meal is delivered straight to you

Differences between SELECT and epoll

  • The upper limit on the number of file descriptors
  • Polling versus callbacks

The select model

  • Flow: application -> select -> unblock -> loop over all connections to find those with read events -> socket recv
  • Defects of select: the 1024 fd array limit, and each wakeup requires traversing all fds to find the sockets that actually have events

Epoll model

  • Epoll's improvements: no 1024 array limit; handler operations run via asynchronous callbacks
  • Flow: application -> epoll -> a ready socket connection triggers its handler callback -> handler runs

Epoll performs much better than select; Nginx uses the epoll model

select, poll, epoll

  • The maximum number of connections a single process can open
    • select: the maximum is defined by the FD_SETSIZE macro, by default the size of 32 integers (32×32 = 1024 on 32-bit machines; likewise 32×64 = 2048 on 64-bit machines). You can modify it and recompile the kernel, but performance is not guaranteed and further testing would be required
    • poll: essentially no different from select, but it has no limit on the maximum number of connections because it stores fds in a linked list
    • epoll: the number of connections is capped, but the cap is very large; a machine with 1 GB of memory can open about 100,000 connections
  • IO efficiency as the number of FDs grows
    • select: each call traverses all connections linearly, so traversal slows down linearly as the number of FDs increases
    • poll: same as above
    • epoll: implemented with a callback on each FD, so only active sockets invoke their callbacks. There is no linear degradation when few sockets are active, but when all sockets are active there may be performance issues.
  • Message passing
    • select: the kernel must pass events to user space, which requires a kernel copy
    • poll: same as above
    • epoll: the kernel and user space share a region of memory, which gives higher performance

Linux article

What is an operating system

  1. The operating system (OS) is a program that manages a computer's hardware and software resources and is the cornerstone of the computer.
  2. An operating system is essentially a software program running on the computer that manages its hardware and software resources. For example, every application running on your computer accesses memory, disk, and other hardware through the operating system.
  3. The operating system hides the complexity of the hardware layer; it is like a manager in charge of how the hardware is used.
  4. The kernel is the core of the operating system. It is responsible for memory management, hardware management, file system management, and application management. The kernel is the bridge between applications and hardware and determines the performance and stability of the system.

Linux Architecture

  • The architecture is divided mainly into user space (upper-layer user activity) and kernel space
  • Kernel: essentially a program that manages the computer's hardware (CPU, memory, disk, network)
  • System calls: the interface for accessing the kernel; operations that cannot be simplified any further
  • Common libraries: combinations of system calls
  • Shell: the command interpreter, which is programmable

What is a system call

Before introducing system calls, let’s take a look at user and system states.

According to the characteristics of process access to resources, we can divide the process running on the system into two levels:

  1. User mode: a process running in user mode can only access the data of the user program directly; its access to system resources is restricted.
  2. Kernel mode: The process or program running in the kernel mode can access almost any resources of the computer without restriction.

Now that we’ve talked about user and system states, what is a system call?

Most of the programs we run execute in user mode. What if one of them needs to invoke a system-level function provided by the operating system? Then it needs a system call!

In other words, in the user programs we run, any operation involving kernel-level resources (such as file management, process control, or memory management) must request service from the operating system through a system call, and the operating system completes it.

These system calls can be roughly divided into the following categories by function:

  • Device management. Request or release devices, and start devices.
  • File management. Read, write, create, and delete files.
  • Process control. Complete the process creation, cancellation, blocking, and wake up functions.
  • Process communication. Completes functions such as messaging or signaling between processes.
  • Memory management. The system allocates and reclaims memory, and obtains the size and address of the memory occupied by a job.
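A few of these categories can be seen through Python's `os` module, whose functions map closely onto the underlying system calls (file management here; `demo.txt` is a throwaway file created by the example):

```python
import os

# each call below maps onto a kernel system call of the same name
fd = os.open("demo.txt", os.O_CREAT | os.O_WRONLY | os.O_TRUNC)  # open(2)
os.write(fd, b"hello syscall")                                   # write(2)
os.close(fd)                                                     # close(2)

fd = os.open("demo.txt", os.O_RDONLY)
data = os.read(fd, 100)                                          # read(2)
os.close(fd)
os.unlink("demo.txt")                                            # unlink(2)

print(data)
```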

Common Linux Commands

  • top
  • ps -ef
  • jobs
  • pgrep
  • cat /proc/meminfo
  • free
  • vmstat
  • df
  • du
  • netstat
  • route
  • lsof -i:8000 (finds the process occupying port 8000)

Learning skills

  • –help
  • man: view the manual pages

Common Shell Commands

How do I find a particular file

find

find path [options] params

Function: Searches for files in the specified directory

find -name "target3.java"

Recursively searches the current directory

find / -name "target3.java"

Recursively searches from the root directory

find ~ -name "target*"

Fuzzy (wildcard) matching

find ~ -iname "target*"

Same, but ignoring case

man find

Retrieving file contents

grep

grep [options] pattern file

Look for strings in the file that match the criteria

grep "mooc" target*

The pipe operator |

Instructions can be concatenated with the output of one instruction as the input of the next

find ~ | grep “target”

Points to note when using pipes:

  • Only the standard output of the previous command is passed on, not its error output
  • The command on the right must be able to read the standard input stream, otherwise the data is discarded in transit
  • Commands that can: sed, awk, grep, cut, head, top, less, more, wc, join, sort, split, etc.

grep 'xxx[true]' xx.log | grep -o 'engine[[0-9a-z]*]'

ps -ef | grep tomcat

ps -ef | grep tomcat | grep -v "grep" filters out lines containing "grep" (i.e. the grep process itself)

Collect statistics on file contents

awk

Data extraction and statistics

Replace file contents in batches

sed

Suitable for processing the line content of text

References

  • Xiao Lin coding
  • www.cyc2018.xyz/%E8%AE%A1%E…
  • segmentfault.com/a/119000000…
  • A good article about Java NIO and the IO models: www.cnblogs.com/ljl150/p/12…
  • Meituan technical team, NIO analysis: tech.meituan.com/2016/11/04/…