The kernel must understand (2): file system exploration

Preface

This time, the topic is file systems. File systems matter: improving disk utilization and reducing disk wear are among the problems they exist to solve. There are many file systems in use, such as ext4, XFS, and NTFS. Some are used mainly in China, such as TFS from Tencent (the "Goose Factory"). Then there is ZFS, which Sun called "the last word in file systems", and Btrfs, which is often compared with ZFS because it offers similar features.

The architecture diagram of the Linux file system components above was drawn by combining my own experience with various references. As you can see, the most important piece is VFS (the virtual file system), which lets Linux support multiple file systems at once. For example, if you dual-boot Linux Mint and Windows, Mint can see the Windows NTFS partition, but back in Windows you cannot see the Mint partition.

Which file systems does Linux support? Look in the fs directory of the kernel source. Linux supports many file systems, and each one has its own subdirectory there (the entries shown in blue).

File system structure

I won't cover disk sectors here. There may be a later article on storage media, SSD internals, and the like. Let's skip the hardware and start with the file system structure. The ext family is the default file system on Linux, although ext2/ext3 actually differ quite a bit from ext4.

Superblock: records overall information about the file system, including the total number of inodes and blocks, how many are in use and how many remain, and the file system's format and related information.

Inode table: After the superblock, the inode table stores all inodes.

Data block area: the data blocks follow the inode table. File contents are stored in this area. All blocks in a file system are the same size.

Inode: records a file's attributes and the numbers of the blocks that hold its data. Each inode corresponds to one file or directory and contains its length, creation and modification times, permissions, ownership, and location on disk.

Block: holds the actual file contents. A large file can easily be spread across thousands of disk blocks, which are usually not contiguous. If the file is too fragmented, read/write performance drops sharply.

OK, I guess you are a visual thinker like me, so here is a picture:

As you can see, this is a multi-level indexed file system. The inode table points to inodes, and each inode points to one or more blocks. Note that the diagram shows only direct pointers; multi-level (indirect) pointers are covered later. The biggest concern is that the blocks an inode points to end up too scattered. Appending data to the end of an existing file tends to work better than creating many new files.

Create a file and inode

Create a new file and a new directory, then view their information with the stat command.

touch hello
stat hello
mkdir hellodir
stat hellodir

The output shows several details. For example, a new directory starts at 4 KB and occupies 8 blocks. stat counts blocks in 512-byte sectors, so 8 * 512 B = 4 KB, which matches the 4 KB IO block. This corresponds to the generic block layer in the first figure. For an ordinary file system you do not need to dwell on these numbers.
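The same fields can be read programmatically. The following is a minimal sketch, assuming a Linux system and Python's standard os.stat; the helper name describe and the temporary paths are made up for illustration, and the actual numbers depend on the file system in use.

```python
import os
import tempfile

def describe(path):
    """Return the stat fields discussed above for one path."""
    st = os.stat(path)
    return {
        "inode": st.st_ino,
        "size_bytes": st.st_size,
        "blocks_512B": st.st_blocks,            # counted in 512-byte units
        "allocated_bytes": st.st_blocks * 512,  # space actually reserved
        "io_block": st.st_blksize,              # preferred IO size
    }

with tempfile.TemporaryDirectory() as tmp:
    file_path = os.path.join(tmp, "hello")
    dir_path = os.path.join(tmp, "hellodir")
    open(file_path, "w").close()   # same effect as: touch hello
    os.mkdir(dir_path)             # same effect as: mkdir hellodir
    file_info = describe(file_path)
    dir_info = describe(dir_path)

print("hello   :", file_info)
print("hellodir:", dir_info)
```

On an ext4 volume the directory would typically report 4096 bytes and 8 blocks, but other file systems may report different values.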

File Creation process

Creating a file takes four steps:

Store the attributes: the kernel first finds a free inode (for example, 1049143 in the stat output above) and records the file's information in it. The size, owner, and creation time can all be seen with the stat command.

Store the data: the kernel places the file's contents in free logical blocks. For a 1 B file, that is one block, or 8 sectors. Internal fragmentation already shows up here: 1 B of data occupies a whole 4 KB block.

Record the allocation: if the data is stored in three blocks, their block numbers are recorded in order in the inode's block list, occupying its first three slots. The whole file can then be read in one pass, as in the second figure. FAT, by contrast, has no inodes. It records, for each block, the position of the next block, forming a chain. USB drives commonly use FAT.

Add the file name to a directory: the directory entry links the file name to the inode, and through it to the file's attributes and contents. Looking up the file name yields the file's inode. In other words, it links what a person sees to what the machine sees, much like a domain name and an IP address. If you could tell files apart by inode number alone, step 4 would be unnecessary.

Creating a directory is similar, with a little extra work. The new directory gets a . entry (pointing to itself) and a .. entry (pointing to the parent directory), and its own inode is then added to the parent directory. Just look at the picture.
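The link between names and inodes can be observed from user space. The sketch below is only an illustration, assuming a POSIX system; the helper name name_to_inode and the file names are made up. It creates a 1-byte file and a subdirectory, then checks which inode each name, including . and .., resolves to.

```python
import os
import tempfile

def name_to_inode(directory):
    """Map each name in a directory to the inode number it resolves to."""
    return {
        name: os.lstat(os.path.join(directory, name)).st_ino
        for name in os.listdir(directory)
    }

with tempfile.TemporaryDirectory() as parent:
    file_path = os.path.join(parent, "hello")
    child_dir = os.path.join(parent, "hellodir")
    with open(file_path, "w") as f:
        f.write("x")               # a 1-byte file
    os.mkdir(child_dir)

    names = name_to_inode(parent)
    file_size = os.stat(file_path).st_size
    child_inode = os.stat(child_dir).st_ino
    parent_inode = os.stat(parent).st_ino
    # "." resolves to the directory itself, ".." to its parent
    dot_inode = os.stat(os.path.join(child_dir, ".")).st_ino
    dotdot_inode = os.stat(os.path.join(child_dir, "..")).st_ino

print(names)
print("'.'  is the directory itself:", dot_inode == child_inode)
print("'..' is the parent directory:", dotdot_inode == parent_inode)
```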

Inode analysis

The df -i command shows the total number of inodes and how many are in use.

df -i

Running dumpe2fs on the device shows the inode size, 256 bytes in this case.

How does an inode record a file's blocks, and what is the maximum file size? An inode has 12 direct pointers, one indirect, one double indirect, and one triple indirect. Each block pointer is 4 B, so a 4 KB block holds 1K (1024) pointers.

Direct: 12 * 4K = 48 KB

Indirect: (4K / 4) * 4K = 4 MB

Double indirect: (4K / 4) * (4K / 4) * 4K = 4 GB

Triple indirect: (4K / 4) * (4K / 4) * (4K / 4) * 4K = 4 TB
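The four terms above can be added up in a few lines. This is a worked calculation only, assuming 4 KB blocks, 4-byte block pointers, and 12 direct pointers as described; the function and constant names are made up for illustration.

```python
BLOCK_SIZE = 4 * 1024      # 4 KB blocks, as in the text
POINTER_SIZE = 4           # each block pointer takes 4 bytes
DIRECT_POINTERS = 12       # direct pointers held in the inode

def max_file_size(block_size=BLOCK_SIZE, pointer_size=POINTER_SIZE):
    """Bytes addressable through each pointer level, plus the total."""
    per_block = block_size // pointer_size   # pointers per index block
    sizes = {
        "direct": DIRECT_POINTERS * block_size,
        "indirect": per_block * block_size,
        "double indirect": per_block ** 2 * block_size,
        "triple indirect": per_block ** 3 * block_size,
    }
    sizes["total"] = sum(sizes.values())
    return sizes

sizes = max_file_size()
for level, size in sizes.items():
    print(f"{level:>16}: {size:>16,d} bytes")
```

The triple indirect term dominates, which is why the total is only slightly above 4 TB.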

So the limit is about 4 TB. Is that enough? File systems keep growing, and this limit is outdated. On ext4 a single file can reach 16 TB and the file system 1 EB. Note, though, that the ext4 authors have described it as a transitional file system and expect Btrfs to be the better long-term choice. XFS, used by CentOS, is also quite good.

Open the file

Once a file is created, of course, it has to be opened. Let's start with two commands:

sysctl -a | grep fs.file-max

ulimit -n

The first command shows the maximum number of files that can be open across the whole system, which is a system-level limit.

The second command shows the maximum number of files a single process can open, which is a user-level limit.
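Both limits can also be read from a program. The sketch below assumes Linux: it uses Python's standard resource module for the per-process limit and reads /proc/sys/fs/file-max for the system-wide one, returning None if that file is not available. The helper names are made up for illustration.

```python
import resource

def per_process_limit():
    """Soft and hard limits on open file descriptors for this process.

    The soft limit is the value that `ulimit -n` reports.
    """
    return resource.getrlimit(resource.RLIMIT_NOFILE)

def system_wide_limit(path="/proc/sys/fs/file-max"):
    """System-wide limit on open files, or None if it cannot be read."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None

soft, hard = per_process_limit()
file_max = system_wide_limit()

print("per-process soft limit:", soft)
print("per-process hard limit:", hard)
print("system-wide file-max  :", file_max)
```

A process can raise its soft limit up to the hard limit, while the system-wide value is an administrator setting.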

Process descriptor (task_struct):

To manage processes, the operating system needs a clear description of each one. For this it uses a data structure commonly known as a process descriptor or process control block (PCB). Put simply, the PCB is the structure the operating system uses to describe a process.

The Linux kernel manages processes through the task_struct structure, its process descriptor, which contains all the information about a process. It is defined in include/linux/sched.h. It is not the focus this time, but task_struct is both important and complex.

Each process is assigned its own task_struct, so the operating system can track the process at any time.

File descriptor table (files_struct): this table records the files a process has open. Each entry holds a pointer to an entry in the file table, which lives in kernel space. It gives the user a simple handle, the file descriptor (fd), for accessing a file. For example, when a process opens a file with open(), the kernel adds an entry to the table. Opening the same file more than once adds multiple entries. Calling dup() also adds an entry.

File table: each file table entry holds the current read/write offset of the open file. It also holds the access mode and similar state. For example, if a process opens a file with O_RDONLY, that is recorded in the corresponding file table entry. Each entry also has a pointer to the file's inode in the inode table. As the earlier figure shows, all of these structures are linked together, with the inode at the core.

In the figure above:

In process A, file descriptors 1 and 2 both refer to the same open file table entry. This can result from a call to dup(), dup2(), or fcntl(). Calling open() again on the same file does not produce this sharing, because each open() creates a new file table entry.

File descriptor 0 of process A and file descriptor 2 of process B both refer to the same open file table entry. This can happen after a fork() call (processes A and B are then parent and child), or when one process passes an open file descriptor to another over a UNIX domain socket. Processes that independently call open() on the same file do not share an entry, even if they are handed the same descriptor number.

Finally, descriptor 0 of process A and descriptor 255 of process B refer to different open file table entries, but those entries point to the same entry in the inode table, that is, to the same file. This happens because each process made its own open() call on the same file. The same thing happens when a single process opens the same file twice.

Why go through these cases? Without understanding them, reads and writes from multiple processes or threads can easily interfere with each other.
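The difference between sharing a file table entry and sharing only an inode shows up in the file offset. The following is a minimal single-process sketch, assuming a POSIX system and Python's os module; the temporary file and variable names are made up. A descriptor created with dup() shares the offset with the original, while a second open() gets its own offset. The fork() case is not shown here.

```python
import os
import tempfile

# Prepare a small file with known contents.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abcdefgh")
    path = tmp.name

fd1 = os.open(path, os.O_RDONLY)
fd_dup = os.dup(fd1)                # shares fd1's file table entry
fd2 = os.open(path, os.O_RDONLY)    # separate open(): new file table entry

first = os.read(fd1, 4)             # moves the shared offset to 4
via_dup = os.read(fd_dup, 4)        # continues from offset 4
via_open = os.read(fd2, 4)          # own offset, starts at 0

# Both file table entries still lead to the same inode.
same_inode = os.fstat(fd1).st_ino == os.fstat(fd2).st_ino

for fd in (fd1, fd_dup, fd2):
    os.close(fd)
os.unlink(path)

print(first, via_dup, via_open, same_inode)
```

This is why two writers sharing one entry advance each other's position, while two writers with separate entries can silently overwrite each other's data unless they coordinate, for example with O_APPEND or locking.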


Conclusion

The inode is the core around which this structural look at the file system revolves. Two or more follow-up articles are planned, ending with a simple user-space file system. If you found this useful, please like or follow.
