Linux file system details

Everything in Linux is a file

In Linux, all sorts of things are exposed as byte streams in the file system namespace: documents, directories (called folders in Mac OS X and Windows), keyboards, monitors, hard drives, removable media, printers, modems, virtual terminals, and even input/output resources such as interprocess communication (IPC) and network communication. Because everything can be viewed as a file, the biggest benefit is that one set of Linux tools, utilities, and APIs covers all of these resources: the same calls (read, write) and the same tools (cat, redirection, pipes) handle most resources in Unix.

The goal of system design is often to find the atomic operations. Once they are pinned down, the design becomes simple and orderly. As an abstraction, a “file” has very simple atomic operations, essentially just read and write, which makes it a very good model. With this model, API design is simplified: users access any resource in a common way, and a layer underneath adapts each kind of resource to that interface.

Files were introduced in modern operating systems so that information could be stored for a long time independently of any process. A file is a logical unit of information created by a process, and it can be used concurrently by multiple processes. On UNIX systems, the operating system provides a common set of APIs for I/O on text and images on disk, on input devices such as the mouse and keyboard, and on network connections, all of which can be processed uniformly as byte streams. In other words, almost everything on a UNIX system except processes is a file, and Linux retains this feature. To make files easier to manage, Linux also has the concept of directories (sometimes called folders). Directories let files be organized by category, and they give the Linux file system its hierarchical directory tree.
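As a small illustration of the “same API for every resource” idea, the sketch below drains a regular file and an anonymous pipe with the same os.read loop. The helper name read_all and the sample messages are made up for this example; it assumes a Unix-like system.

```python
import os
import tempfile

def read_all(fd, chunk=4096):
    """Drain a file descriptor with os.read, whatever kind of object it refers to."""
    parts = []
    while True:
        data = os.read(fd, chunk)
        if not data:              # an empty read means end of stream
            break
        parts.append(data)
    return b"".join(parts)

# A regular file on disk.
tmp_fd, tmp_path = tempfile.mkstemp()
os.write(tmp_fd, b"hello from a regular file\n")
os.close(tmp_fd)
file_fd = os.open(tmp_path, os.O_RDONLY)
from_file = read_all(file_fd)
os.close(file_fd)
os.remove(tmp_path)

# An anonymous pipe (an IPC resource).
read_end, write_end = os.pipe()
os.write(write_end, b"hello from a pipe\n")
os.close(write_end)               # closing the writer lets the reader see end of stream
from_pipe = read_all(read_end)
os.close(read_end)

print(from_file, from_pipe)
```

The loop never needs to know which kind of resource sits behind the descriptor, which is the point of the abstraction.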

In Linux, everything is a file, and understanding the file system is an essential prerequisite for learning Linux.

The file systems most commonly associated with Linux are ext2 and ext3, but this article does not start with them. Instead, it starts with the hard disk, which is the physical foundation of the file system, and builds up an understanding of the Linux file system step by step.

1. Physical storage mechanism of a mechanical disk

Most file storage in modern computers has traditionally been provided by mechanical hard disks. (Today’s SSDs and flash storage inherit much of their conceptual and logical model from mechanical hard drives, so it is fine to use the mechanical disk as the mental model.)

A mechanical hard disk can store information because a magnetic storage medium can be magnetized, stays magnetized for a long time, can have its magnetized state read back, and can be re-magnetized again and again. Magnetization has exactly two directions, so it can represent 0 and 1. A hard disk is therefore built by forming the magnetic medium into platters, each carrying a huge number of magnetic storage units, and using magnetic read/write heads to write and read the platters (in principle, this is similar to playing a vinyl record).

A drive contains an enormous number of magnetic storage units (a 1 TB drive has roughly 8 trillion), so a set of rules is needed to organize how information is accessed (just as the information in a book is divided into chapters and pages, and each page is read from left to right and top to bottom). This gives rise to the following physical and logical concepts:

A hard disk is composed of multiple stacked platters, and the platter surfaces are numbered

The storage units on each platter are arranged in numbered concentric circles called tracks

Each track is divided into sectors of 512 x 8 bits (512 bytes, 0.5 KB) each. The sector is the smallest physical unit of storage on a hard disk

N sectors form a cluster, where N depends on the file system or its configuration. A cluster is the smallest storage unit in a file system

The tracks at the same position on all platter surfaces form a cylinder, which is traditionally the smallest unit of partitioning
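The relationship between cylinders, heads, sectors, and clusters can be sketched with a little arithmetic. This is an idealized model that assumes every track holds the same number of sectors, which real drives do not guarantee, and the geometry values are hypothetical.

```python
SECTOR_BYTES = 512  # sector size given in the text

def chs_capacity_bytes(cylinders, heads, sectors_per_track):
    """Capacity of an idealized disk with the same sector count on every track."""
    return cylinders * heads * sectors_per_track * SECTOR_BYTES

def sectors_per_cluster(cluster_bytes):
    """How many 512-byte sectors make up one cluster of the given size."""
    if cluster_bytes % SECTOR_BYTES:
        raise ValueError("cluster size must be a multiple of the sector size")
    return cluster_bytes // SECTOR_BYTES

# Hypothetical geometry, chosen only for illustration.
capacity = chs_capacity_bytes(cylinders=1024, heads=16, sectors_per_track=63)
print(capacity)                   # 528482304 bytes
print(sectors_per_cluster(4096))  # 8 sectors in a 4 KB cluster
```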

When a file is read or written, the request is first directed to the right partition. The file’s inode number is used to look up the blocks that hold the data, which map to tracks and sectors, and the heads then read or write cylinder by cylinder. The read/write control system of a mechanical hard disk is a breathtaking piece of precision engineering (trillions of storage units on the platters, tracks only tens of nanometers wide, and platters spinning at thousands of revolutions per minute). The logic of reading and writing also has many interesting details (for example, consecutive sector numbers are not necessarily physically adjacent), which are worth looking up.

Having a hard disk doesn’t mean Linux can immediately use it for storage; the disk must first be integrated into the Linux file system.

2. Linux file system

Linux manages the data and hardware resources of the computer in the form of files (“everything is a file”), which is reflected in the Linux file types: regular files, directory files (i.e., folders), device files, link files, pipe files, socket files (data communication interfaces), and so on. Linux organizes all of these files in a directory tree, a structure that branches down from the root directory (/). In this article, “file system” in the narrow sense means an on-disk format such as ext2. “Everything is a file” together with the directory tree forms the Linux file system in the broad sense, which lets the operating system use system resources easily. A file system in the narrow sense therefore covers much less than the Linux file system in the broad sense. The broad sense is mainly about using files as the carrier for everything in the operating system: file systems are mounted into the operating system, and the whole operating system in turn lives in file systems. This article does not dwell on the distinction, and in most places the two senses can be read interchangeably.

1. File types in Linux:

1.1. Regular files (-)

From the perspective of Linux, application-level files such as MP4, PDF, and HTML are all regular files. Linux users can view, modify, and delete regular files according to their access rights.

1.2. Directory files (d, directory)

Directory files can be hard for Windows users to grasp: a directory is also a type of file. A directory file contains the names of the files in that directory and pointers to those files. Opening a directory is really opening a directory file. As long as you have permission to enter it, you can access the files in the directory (the execute permission on a directory plays the role of access permission). Only the kernel can modify the contents of a directory file directly. Although users cannot edit it, Vim can be used to view the contents of a directory.

1.3. Symbolic links (l, symbolic link)

This type of file is similar to a shortcut in Windows. It is an indirect pointer to another file and is also known as a soft link.

1.4. Block device files (b, block) and character device files (c, char)

These files usually live in the /dev directory and are used to read from devices and interact with peripherals. For example, disk drives are block device files, and serial devices are character device files. Every device in the file system is either a block device file or a character device file.

1.5. FIFO (p, pipe)

Pipe files are mainly used for interprocess communication. For example, the mkfifo command creates a FIFO file; process A can then read data from the FIFO while process B writes data into it. A FIFO is first in, first out: data is read in the order it was written.

1.6. Sockets (s)

These files usually live in the /var/run directory. They are used for communication between processes, and their presence indicates that the related process is running.
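The type letters in the headings above are the first character of the mode string that ls prints. A minimal sketch of how a program can read that letter follows; the helper name type_letter and the temporary file names are invented for the example, and it assumes a Unix-like system where os.mkfifo and os.symlink are available.

```python
import os
import stat
import tempfile

def type_letter(path):
    """Return the type character that ls shows in the first column for path."""
    mode = os.lstat(path).st_mode      # lstat: do not follow symbolic links
    return stat.filemode(mode)[0]

with tempfile.TemporaryDirectory() as d:
    regular = os.path.join(d, "plain.txt")
    link = os.path.join(d, "link")
    fifo = os.path.join(d, "fifo")
    with open(regular, "w") as f:
        f.write("data\n")
    os.symlink(regular, link)          # a soft link to the regular file
    os.mkfifo(fifo)                    # a named pipe
    letters = {name: type_letter(p) for name, p in
               [("dir", d), ("regular", regular), ("link", link), ("fifo", fifo)]}

print(letters)
```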

Linux files do not have extensions in the Windows sense. Whether a Linux file can be executed depends on its permissions, not its name. If the permissions include x, for example -rwxr-xr-x, the file can be executed. This differs from Windows, where executability is determined by extensions such as .com, .exe, and .bat. However, being allowed to execute is not the same as executing successfully. For example, the install.log file in root’s home directory is a text file. Will it run successfully after its permissions are changed to -rwxrwxrwx? Of course not, because its content is not executable data. So x means that the file is allowed to execute, but whether execution succeeds depends on the contents of the file.

We still like to know what a file is from its name, so files are usually given a suitable extension to indicate their type. A file name on Linux therefore only hints at what the file might be for; whether it can actually be executed still depends on its permissions. For example, the common /bin/ls command, which lists file attributes, can no longer be executed once its execute permission is removed. This problem most often shows up after file transfers. If you download an executable from the Internet and it does not run on your Linux system, its permissions may have been changed, because file attributes and permissions can indeed change when a file is transferred over the network to your Linux system.
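The point that the permission bits, not the name, decide whether a file may be executed can be checked with a short sketch. The helper has_execute_bit is a name chosen for this example, and the .exe suffix is deliberately misleading.

```python
import os
import stat
import tempfile

def has_execute_bit(path):
    """True if any of the owner/group/other x bits is set."""
    return bool(os.stat(path).st_mode & 0o111)

fd, path = tempfile.mkstemp(suffix=".exe")   # the extension grants nothing
os.close(fd)

os.chmod(path, 0o644)
before = (stat.filemode(os.stat(path).st_mode), has_execute_bit(path))

os.chmod(path, 0o755)
after = (stat.filemode(os.stat(path).st_mode), has_execute_bit(path))

os.remove(path)
print(before, after)
```

Even with the x bits set, this empty file would not run successfully, which matches the distinction drawn above between being allowed to execute and executing successfully.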

2. Linux directory tree

For both Linux and its users, all usable computer resources appear in the logical structure of the directory tree, and accessing a resource amounts to accessing the directory tree. In the case of a hard disk, every access to the disk becomes access to a node in the directory tree (a folder), and the user does not need to know whether that node is a whole hard disk or a folder on one. The logical structure of the directory tree is very simple: it starts at the root directory (/) and keeps branching into subdirectories.

3. Disk partitions

Partitioning is the first step in integrating a hard disk into the file system. In essence, it turns the physical concept of a “hard disk” into the logical concept of a “partition”, in preparation for formatting. Partitioning is not strictly necessary: a whole hard drive can be used as a single partition. However, partitions are very useful for data safety and system performance, so hard disks are generally partitioned.

Any discussion of partitions has to start with the most important sector on each hard disk, the first one. This sector holds the Master Boot Record (MBR), which consists of boot code and the partition table. The boot code occupies 446 bytes, the partition table occupies 64 bytes, and the remaining 2 bytes are a boot signature. The boot code is the most basic boot loader, which is the key to booting the system and is explained in more detail in the appendix. The partition table records information about the disk’s partitions, but because it is only 64 bytes (four entries of 16 bytes each), it can record at most four partitions (partitioning is really just setting the partition table).
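The layout of the first sector can be made concrete with a small parser. The sketch below builds a synthetic 512-byte sector in memory rather than reading a real disk. The 16-byte entry layout (status byte, CHS start, type byte, CHS end, first LBA, sector count) is the classic MBR format as I understand it and is not spelled out in the article, and the partition values are hypothetical.

```python
import struct

SECTOR_SIZE = 512
BOOT_CODE_SIZE = 446
ENTRY_SIZE = 16
ENTRY_FORMAT = "<B3sB3sII"   # status, CHS start, type, CHS end, first LBA, sector count

def parse_partition_table(sector):
    """Return the non-empty partition entries of a 512-byte MBR sector."""
    if len(sector) != SECTOR_SIZE or sector[510:512] != b"\x55\xaa":
        raise ValueError("not a valid MBR sector")
    entries = []
    for i in range(4):                       # 64 bytes / 16 bytes = 4 slots
        start = BOOT_CODE_SIZE + i * ENTRY_SIZE
        status, _, ptype, _, first_lba, count = struct.unpack(
            ENTRY_FORMAT, sector[start:start + ENTRY_SIZE])
        if ptype != 0:                       # type 0 marks an unused slot
            entries.append({"slot": i + 1, "bootable": status == 0x80,
                            "type": ptype, "first_lba": first_lba,
                            "sectors": count})
    return entries

# Synthetic sector: one bootable Linux partition (type 0x83), values made up.
entry = struct.pack(ENTRY_FORMAT, 0x80, b"\0\0\0", 0x83, b"\0\0\0", 2048, 204800)
sector = bytes(BOOT_CODE_SIZE) + entry + bytes(3 * ENTRY_SIZE) + b"\x55\xaa"
table = parse_partition_table(sector)
print(table)
```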

Four partitions are too few, which is where extended partitions come in. Since the partition table in the first sector can record only four entries, can additional sectors be used to record more partition information? That is exactly what an extended partition does. An ordinary, directly usable partition is called a primary partition. An extended partition differs from a primary partition in that it holds no data itself and only provides space for logical partitions: once a partition is designated as extended, it can be further divided into multiple logical partitions. The operating system imposes these rules:

Each of the four entries can be a primary partition or an extended partition
There can be at most one extended partition (and there is no need for more than one)
An extended partition can be further divided into multiple logical partitions
An extended partition is only a logical container and cannot be accessed directly: it cannot be formatted or used to store data. Only primary partitions and logical partitions can be formatted and used for data
The number of logical partitions varies with the operating system. In Linux, IDE disks support up to 59 logical partitions (numbers 5 to 63), while SATA disks support up to 11 logical partitions (numbers 5 to 15). The extended partition is divided into as many logical partitions as are needed, up to that limit

Is it possible to do without a primary partition? I don’t know, but it doesn’t seem to matter. It is also a good idea to create a dedicated swap partition, which has its own special partition type. When physical memory holds data that the CPU does not access frequently, that data is moved out to the swap space on the hard disk, freeing the faster physical memory for the programs that really need it.

4. Formatting

Linux supports many different file systems, such as ext2, ext3, XFS, FAT, and so on. Access to all of them goes through the VFS (virtual file system), which can access and manage many different file systems. So once you have a partition, you need to format it with a concrete file system that the VFS can access.

Ext2, the standard Linux file system, is an “inode-based file system”.

As we know, a file in an operating system has many attributes besides its actual content, such as file permissions (rwx) and file attributes (owner, group, timestamps, etc.) in Linux. File systems typically store the attributes and the actual content in separate places. In an inode-based file system, permissions and attributes are placed in inodes, and the actual data is placed in data blocks. Both inodes and data blocks are numbered, and the Ext2 file system is built on that numbering.

The file system has a boot sector at the front where a boot manager can be installed. This design allows different boot loaders to be installed at the front of individual file systems without overwriting the disk’s single MBR, which is what makes multiboot possible.

Each partition is further divided into separate block groups, each with its own inode/block system. If a file system is hundreds of gigabytes, it does not make sense to put all the inodes and blocks together, because there would be so many of them that they would be hard to manage. Partitions are sized by the user, but there is an optimal size for the computer’s own management, so the computer subdivides each partition into block groups. (Doesn’t that risk making a mess? Is there a mechanism to clean it up?) Each block group is divided into six parts. In addition to the inode table and data blocks, there are four auxiliary parts that optimize and improve system performance. The whole partition is therefore divided roughly as follows:

1. inode table

The inode table records the attributes of each file and the numbers of the blocks that hold its actual data. Each inode records at least the following:

Size of the file
Block numbers (one or more) of the actual content
Access mode (read/write/execute)
Owner and group
Timestamps: creation or status change time, last read time, and last modified time
No file name! The file name is stored in the block of the directory

Each file occupies one inode, and each inode is numbered. Note that “file” here does not mean only regular files: a directory (folder) is also a file. The number and size of inodes are fixed at format time. Each inode has a fixed size of 128 bytes (newer file systems such as ext4 and XFS can use 256 bytes). The number of files a file system can hold depends on the number of inodes, so on Linux it is possible to run out of inodes while disk space is still available. When the system reads a file, it first finds the inode and checks whether the permissions recorded there allow the user access. Only if they do does it go on to read the contents of the blocks.

An inode needs 4 bytes to record one block number. Suppose I have a 400 MB file and each block is 4 KB: I need at least 100,000 block numbers to record it! Where would an inode find that much space? For this reason, the area where the inode records block numbers is cleverly defined as 12 direct pointers, one indirect pointer, one double indirect pointer, and one triple indirect pointer (see appendix for details).
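The arithmetic behind the 12 direct, one indirect, one double indirect, and one triple indirect scheme can be worked through in a few lines. This is a sketch that assumes 4-byte block numbers, as stated above; the function names are made up. The reach is computed only for 1 KB blocks, because with larger block sizes other limits of the file system are reached before the pointer scheme is exhausted.

```python
POINTER_BYTES = 4      # bytes per block number, as stated in the text
DIRECT_POINTERS = 12

def addressable_blocks(block_size):
    """Blocks reachable through 12 direct plus single/double/triple indirect pointers."""
    per_block = block_size // POINTER_BYTES    # block numbers that fit in one block
    return DIRECT_POINTERS + per_block + per_block ** 2 + per_block ** 3

def block_numbers_needed(file_size, block_size):
    """Number of data blocks (and so block numbers) a file of file_size bytes needs."""
    return -(-file_size // block_size)         # ceiling division

needed = block_numbers_needed(400 * 1024 * 1024, 4 * 1024)
max_bytes_1k = addressable_blocks(1024) * 1024
print(needed)                       # 102400 block numbers for the 400 MB file
print(max_bytes_1k / 1024 ** 3)     # roughly 16, in GiB
```

With 1 KB blocks each pointer block holds 256 block numbers, so the pointers reach 12 + 256 + 256² + 256³ blocks, which is where a single-file limit of about 16 GB comes from.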

2. data block

The size of a block is fixed at format time, and each block is numbered so that inodes can record it. In principle, the Ext2 file system supports block sizes of 1 KB, 2 KB, and 4 KB. The block size determines both the maximum size of a single file and the maximum total size of the file system, and the two limits differ:

Block size                               1 KB     2 KB     4 KB
Maximum size of a single file            16 GB    256 GB   2 TB
Maximum total size of the file system    2 TB     8 TB     16 TB

Each block can hold data from at most one file, but a file can occupy multiple blocks if it is large. If a file is smaller than a block, the remaining capacity of that block cannot be used, so disk space is wasted. If your files are all very small but the partition is formatted with the largest block size of 4 KB, a lot of disk capacity can be wasted. So should the block size be set to 1 KB? That is not ideal either: with small blocks, large files take up more blocks and inodes have to record more block numbers, which can hurt the read/write performance of the file system. In practice, disks are so large nowadays that 4 KB blocks are generally used.
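The space wasted by small files can be estimated with a quick calculation. The workload below (10,000 files of 50 bytes) is hypothetical, and the model simply rounds each file up to a whole number of blocks.

```python
def allocated_bytes(file_size, block_size):
    """Disk space consumed by a file when blocks cannot be shared between files."""
    blocks = -(-file_size // block_size)       # ceiling division
    return blocks * block_size

def wasted_bytes(file_sizes, block_size):
    """Total unused space in the last block of every file."""
    return sum(allocated_bytes(s, block_size) - s for s in file_sizes)

# Hypothetical workload: 10,000 files of 50 bytes each.
sizes = [50] * 10_000
waste_4k = wasted_bytes(sizes, 4096)
waste_1k = wasted_bytes(sizes, 1024)
print(waste_4k, waste_1k)   # 40460000 9740000
```

In this made-up case, 4 KB blocks waste about 40 MB to store 500 KB of data, while 1 KB blocks waste under 10 MB, which is the trade-off discussed above.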

3. superblock

The superblock is 1024 bytes in size and records the following information:

Total number of blocks and inodes
Number of unused and used inodes/blocks
Size of a block and of an inode (blocks are 1 KB, 2 KB, or 4 KB; inodes are 128 bytes or 256 bytes)
Other file system information: the mount time, the time data was last written, and the time the disk was last checked (fsck)
A valid bit: 0 if the file system is mounted, 1 if it is not

The superblock is very important. Without it, the file system cannot be used, so if the superblock is damaged, rescuing the file system may take a lot of time. Each block group may contain a superblock, yet we also say that a file system should have only one superblock. What’s going on? In fact, only the first block group is certain to contain a superblock. Later block groups do not necessarily contain one, and when they do, it serves mainly as a backup of the superblock in the first block group, so that a damaged superblock can be recovered.
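To show how such a record might be decoded, the sketch below parses a few counters from a synthetic superblock built in memory. The field offsets (inode and block counts at the start, block size stored as a power-of-two shift at byte 24, magic number 0xEF53 at byte 56) follow the ext2 on-disk layout as I understand it; they are not given in the article, so treat them as assumptions. All of the counts are invented.

```python
import struct

SUPERBLOCK_OFFSET = 1024   # assumed: the superblock starts 1024 bytes into the partition
SUPERBLOCK_SIZE = 1024
EXT2_MAGIC = 0xEF53

def parse_superblock(image):
    """Decode a handful of fields from the ext2 superblock in a raw image."""
    sb = image[SUPERBLOCK_OFFSET:SUPERBLOCK_OFFSET + SUPERBLOCK_SIZE]
    inodes, blocks, _reserved, free_blocks, free_inodes, _first, log_size = \
        struct.unpack_from("<7I", sb, 0)
    (magic,) = struct.unpack_from("<H", sb, 56)
    if magic != EXT2_MAGIC:
        raise ValueError("ext2 magic number not found")
    return {"inodes": inodes, "blocks": blocks, "free_blocks": free_blocks,
            "free_inodes": free_inodes, "block_size": 1024 << log_size}

# Synthetic image with made-up counts and a block size of 4 KB (shift of 2).
sb = bytearray(SUPERBLOCK_SIZE)
struct.pack_into("<7I", sb, 0, 2560, 10240, 512, 9000, 2500, 0, 2)
struct.pack_into("<H", sb, 56, EXT2_MAGIC)
image = bytes(SUPERBLOCK_OFFSET) + bytes(sb)
info = parse_superblock(image)
print(info)
```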

4. Filesystem Description

This section describes the start and end block numbers of each block group, and which block numbers each part (superblock, block bitmap, inode bitmap, and data blocks) occupies.

5. block bitmap

Which block do you use when you want to add a new file? An empty one, of course. So how do you know which blocks are empty? This is done through the block bitmap, which records which blocks are free so that the system can quickly find space. Likewise, when you delete a file, the blocks it used to occupy have to be released: the flag bits for those block numbers in the block bitmap are changed back to “not in use”.

6. inode bitmap

Just as the block bitmap records which block numbers are used and unused, the inode bitmap records which inode numbers are used and unused.
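A bitmap of this kind is easy to model. The class below is a minimal sketch of the idea, not the ext2 implementation: it keeps one bit per block (or inode), hands out the first free number, and clears the bit again on release. The class and method names are chosen for this example.

```python
class Bitmap:
    """One bit per block (or inode): 1 means in use, 0 means free."""

    def __init__(self, count):
        self.count = count
        self.bits = bytearray((count + 7) // 8)

    def is_used(self, n):
        return bool(self.bits[n // 8] & (1 << (n % 8)))

    def allocate(self):
        """Mark the first free entry as used and return its number."""
        for n in range(self.count):
            if not self.is_used(n):
                self.bits[n // 8] |= 1 << (n % 8)
                return n
        raise OSError("no free entries left")

    def free(self, n):
        """Flip the entry back to 'not in use', as happens when a file is deleted."""
        self.bits[n // 8] &= ~(1 << (n % 8)) & 0xFF

block_bitmap = Bitmap(16)
first = block_bitmap.allocate()    # 0
second = block_bitmap.allocate()   # 1
block_bitmap.free(first)
reused = block_bitmap.allocate()   # 0 again, because it was released
print(first, second, reused)
```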

5. Mount

After a partition is formatted with a file system, it can be used by the Linux operating system. However, Linux cannot find it yet, so the file system has to be registered with the operating system. This operation is called “mount”. Mounting uses a directory as the entry point (much like choosing an existing directory as a proxy) and places the file system under that directory: entering the directory reads the contents of the file system, as if the entire file system were just one folder (directory) in the directory tree. The directory that serves as the entry point is called the mount point.

Since the root directory is the most important directory in the entire Linux system, some partition must be mounted at the root directory. Other directories can serve as mount points for other partitions according to the user’s needs.

In summary, the hard disk is partitioned and formatted so that each partition becomes a file system. Mounting a file system lets the Linux operating system access the hard disk through the VFS as if it were a normal folder. Next, a practical example of reading a file in the directory tree gives a closer look at directory files and regular files.
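One way to see mount points from a program is to compare device numbers: a directory that lives on a different device than its parent is the entry point of another file system. The sketch below uses that heuristic (the standard library function os.path.ismount works along similar lines); the function name is_mount_point is made up, and bind mounts of the same file system are not detected by this simple check.

```python
import os
import tempfile

def is_mount_point(path):
    """Heuristic: a directory is a mount point if it sits on a different
    device than its parent, or if it is its own parent (the root)."""
    here = os.lstat(path)
    parent = os.lstat(os.path.join(path, ".."))
    return here.st_dev != parent.st_dev or here.st_ino == parent.st_ino

root_is_mount = is_mount_point("/")
with tempfile.TemporaryDirectory() as d:
    sub = os.path.join(d, "sub")
    os.mkdir(sub)                       # an ordinary directory, nothing mounted on it
    plain_dir_is_mount = is_mount_point(sub)

print(root_is_mount, plain_dir_is_mount)
```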

6. Read the directory tree

First, a few things we need to know:

Each file (whether a regular file or a directory file) occupies one inode
One or more blocks are assigned to a file based on the size of its contents
After a file is created, its complete information is spread across three places, two of which are newly allocated: 3.1 The file name is recorded in the block of the directory in which the file resides; nothing new is allocated for it. 3.2 The file’s attributes, permissions, and the block numbers of its contents are recorded in an inode, which is newly allocated. 3.3 The file’s actual contents are stored in data blocks, which are also newly allocated
Because the file name is recorded in the directory’s block, adding, deleting, or renaming a file depends on the w permission of the directory. In Linux/Unix, the file name is therefore just an alias or nickname for the file, kept for the user’s convenience; once a file is open, the system identifies it by its inode rather than by its name. The most obvious benefit is that you can rename a file that is in use, move it to another directory, or even put it in the trash without affecting its current use, which is unimaginable in Windows. For example, if you open a Word file and then try to rename it, Windows refuses and tells you to close the file first! The Mac, which also uses an inode design, has no such problem.
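The claim that a name is only a label for an inode can be observed directly. In this sketch a file is renamed while it is open: the inode number stays the same and the open handle keeps working. The file names are arbitrary, and the example assumes a Unix-like system.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    old = os.path.join(d, "report.txt")
    new = os.path.join(d, "renamed.txt")
    with open(old, "w") as f:
        f.write("still here\n")

    reader = open(old)                  # keep the file open across the rename
    inode_before = os.stat(old).st_ino
    os.rename(old, new)                 # only the directory entry changes
    inode_after = os.stat(new).st_ino
    content = reader.read()             # the open handle is unaffected
    reader.close()

print(inode_before == inode_after, content)
```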

Creating a File

When a regular file is created in ext2, the file system allocates one inode and a number of blocks proportional to the file size

For example, if a block is 4 KB and I want to create a 100 KB file, Linux allocates one inode and 25 blocks to store the file
Note, however, that since the inode has only 12 direct pointers, one more block is required to record the remaining block numbers, for 26 blocks in total
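The block count in this example can be reproduced with a short calculation. The sketch assumes 12 direct pointers and 4-byte block numbers, and it models single and double indirection only; the function name is invented for this illustration.

```python
DIRECT_POINTERS = 12
POINTER_BYTES = 4

def blocks_to_allocate(file_size, block_size):
    """Return (data_blocks, indirect_blocks) for a file, up to double indirection."""
    per_block = block_size // POINTER_BYTES       # block numbers per pointer block
    data = -(-file_size // block_size)            # ceiling division
    remaining = max(0, data - DIRECT_POINTERS)    # blocks not covered by direct pointers
    indirect = 0
    if remaining > 0:
        indirect += 1                             # the single indirect block
        remaining = max(0, remaining - per_block)
    if remaining > 0:
        # one double indirect block plus one pointer block per per_block data blocks
        pointer_blocks = -(-remaining // per_block)
        if pointer_blocks > per_block:
            raise ValueError("file needs triple indirection, not modeled here")
        indirect += 1 + pointer_blocks
    return data, indirect

data, indirect = blocks_to_allocate(100 * 1024, 4 * 1024)
print(data, indirect, data + indirect)   # 25 1 26
```

The 100 KB file needs 25 data blocks; 12 are reached through direct pointers and the other 13 through one indirect block, giving 26 blocks.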

Creating a directory

When an ext2 file system creates a directory (that is, a new directory file), it allocates one inode and at least one block to the directory

The inode records the permissions and attributes of the directory and the allocated block number
The block records the file names in the directory and the inode number of each name
Two further records are automatically generated in the block: the . entry, whose inode number points to the directory itself, and the .. entry, whose inode number points to the parent directory
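The . and .. entries can be checked from user space by comparing inode numbers. This is a small sketch using a temporary directory; the directory names are arbitrary.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as parent:
    child = os.path.join(parent, "child")
    os.mkdir(child)

    child_inode = os.stat(child).st_ino
    parent_inode = os.stat(parent).st_ino
    dot_inode = os.stat(os.path.join(child, ".")).st_ino      # the directory itself
    dotdot_inode = os.stat(os.path.join(child, "..")).st_ino  # its parent

print(dot_inode == child_inode, dotdot_inode == parent_inode)
```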

The process of reading a file from a directory tree

Because file names are recorded in the block of the directory, reading a file means first going through the inode and block of the directory to find the inode number of the file, and only then reading the file’s own blocks.
Since the directory tree starts from the root directory, the operating system first finds the inode number of the mount point through the mount information. It reads the inode of the root directory, uses it to read the root directory’s block, and then works down layer by layer until it reaches the right file. For example, how does the system read /etc/passwd? Take a look at this file and the directories on its path:

$ ll -di / /etc /etc/passwd
      128 dr-xr-xr-x.  17 root root 4096 May  4 17:56 /
 33595521 drwxr-xr-x. 131 root root 8192 Jun 17 00:20 /etc
 36628004 -rw-r--r--.   1 root root 2092 Jun 17 00:20 /etc/passwd

Thus, the reading process of the file is as follows:

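/ inode: the mount information leads to the root directory’s inode, number 128, whose permissions allow us to read the contents of its block (r and x).
/ block: the block number obtained in the previous step is read, and its contents give the inode number of the etc/ directory (33595521).
etc/ inode: inode 33595521 is read; its permissions include r and x, so the block of etc/ can be read.
etc/ block: the block number obtained in the previous step is read, and its contents give the inode number of the passwd file (36628004).
passwd inode: inode 36628004 is read; its permissions include r, so the block contents of passwd can be read.
passwd block: finally, the contents of the block are read.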
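The lookup sequence can be mimicked with a toy in-memory model. The inode numbers and permission modes are taken from the listing above; everything else (the dictionary layout, the function names, the sample file content) is invented for illustration and does not reflect how the kernel stores data. Permissions are checked for the “other” class only, to keep the sketch short.

```python
# Toy model: inode number -> inode; directory "blocks" map names to inode numbers.
INODES = {
    128:      {"type": "dir",  "mode": 0o555, "entries": {"etc": 33595521}},
    33595521: {"type": "dir",  "mode": 0o755, "entries": {"passwd": 36628004}},
    36628004: {"type": "file", "mode": 0o644,
               "data": b"root:x:0:0:root:/root:/bin/bash\n"},  # made-up content
}
ROOT_INODE = 128   # found through the mount information

def resolve(path):
    """Walk path one component at a time and return the inode numbers visited."""
    visited = [ROOT_INODE]
    current = INODES[ROOT_INODE]
    for name in [p for p in path.split("/") if p]:
        if current["type"] != "dir":
            raise NotADirectoryError(name)
        if current["mode"] & 0o005 != 0o005:     # need r and x on the directory
            raise PermissionError(name)
        number = current["entries"][name]        # read the directory's block
        visited.append(number)
        current = INODES[number]                 # read the next inode
    return visited

def read_file(path):
    """Resolve path, check read permission, and return the file's data."""
    inode = INODES[resolve(path)[-1]]
    if not inode["mode"] & 0o004:                # need r on the file
        raise PermissionError(path)
    return inode["data"]

print(resolve("/etc/passwd"))    # [128, 33595521, 36628004]
print(read_file("/etc/passwd"))
```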

On a free day I was reading articles about the Linux file system and came across this one. After reading it in full I had an aha moment and was deeply impressed by how thoroughly the original author understands Linux, so I excerpted it for my public account with a few small edits, for the benefit of fellow Linux enthusiasts.
