Original video address: www.youtube.com/watch?v=Ftg…

Overview of the content of this article:

  • File descriptor
  • Operating System Basics
  • File descriptors and processes
  • Network and sockets

File descriptor

On Unix, Everything is a file. For example, /dev, /proc, etc. You can get almost all the data your system can get by manipulating files. To interact with files, you need file descriptors (fd, which is a number unique within a process).

You can do this with a file.

Figuratively speaking, file descriptors are like a ticket that you hand to the operating system every time you need to do something, and the operating system does it for you.

There are several standard file descriptors defined by the system:

  • 0: standard input, corresponding to Pythonsys.stdin
  • 1: indicates standard outputsys.stdout
  • 2: indicates standard error outputsys.stderr

Then let’s reinforce it with a few examples:

with open('example.txt') as f:
    print(f.read())
Copy the code

The above application uses two fd’s, one created by open and the other by print which uses the fd of standard output, i.e. 1.

You can also change the file permissions according to fd:

with open('example.txt') as f:
    stat = os.stat(f.fileno())
    os.chmod(f.fileno(), 0o640)
Copy the code

You can even control the hardware: the following code can be used to eject CDROM.

Operating System Basics

In the dark Middle Ages… Applications can control the hardware directly, which means that if they crash, the entire system will crash and the computer will need to be restarted.

Now, thanks to the isolation effect of the operating system, applications cannot access hardware directly and must request access to hardware resources from the operating system through system calls. The operating system has a permission management mechanism and can choose not to accept your request.

syscall

Each time syscall is run, the user space is transferred to the kernel space, and the user space is transferred back to the user space. This means that context switch is required, so the speed is relatively low (the number of clock cycles required is in the hundreds of orders of magnitude).

Back to the previous example:

# read.py
with open('example.txt') as f:
    print(f.read())
Copy the code

There is a tool to see if a program has called syscall:

$ strace ./read.py
Copy the code

Open file tables and FD tables

Open File table is a global table where all open files of processes are located. Each process has its own FD table, which actually points to the open file table:

The same process opens the same file for the second time, but the fd is different.

The contents of a FD table are simple, and the actual detailed data is stored in an Open File table.

fork()

Fork creates a child. The child inherits the parent’s FD and copies the parent’s memory by copying opn write. Inheriting fd means:

After pid = os.fork(), the child inherits the parent’s file descriptor: f above. They point to the same file in the open file table, which means their offsets are the same:

exec()

Replacing the current program’s memory, fd is also inherited unless cloexec is set. The child process inherits the parent process’s FD, which is equivalent to leaking the parent process’s FD to the child process, which will bring some security risks. So a pwp446 proposal was drafted to provide an option to close the parent FDS: the subprocess module has a close_fds parameter, which is set to True to indicate that all FDS except 0,1, and 2 are closed when exec is called. For more information, see the pep446 article and the official subprocess documentation.

Network and sockets

Network communication requires a pair of network sockets, one running on the server and the other on the client. The socket process on the server side is usually bound to a specific port, such as 80,3306, 5432, etc. It will always listen on this port, after listening to the request to trigger the corresponding logic. The result is finally returned to the client. The socket on the client side is usually temporarily bound to a high segment port, and when the request ends (the process ends) the list is cancelled, freeing up precious port resources.

The socket module in Python needs to specify two parameters: IPV4 or IPV6, UDP or TCP? A simple server example:

The client example:

unix sockets

Unix sockets and Network sockets are essentially the same, except that they only allow processes on the same machine to communicate. One side of a UNIX domain socket knows that the other process is on the same machine, so it does not need operations such as validating measures, routing operations, etc., so it is relatively faster and lighter.

Here’s how it works in Python: Bind and connect are now files instead of IP addresses and port numbers. Because it is a file, so you can set read and write permissions! For example, you can restrict reading from the socket to a specific group of users.

If you love computer science and basic logic like me, welcome to follow my wechat official account: