All applications need to store and retrieve information. While a process is running, it can keep a limited amount of information within its own address space, so the first problem is that storage capacity is limited by the size of the virtual address space. For some applications this is enough; for many others it is far too small.

The second problem is that the information is lost when the process terminates. For some applications, such as databases, information must be kept for a long time: it should not disappear when the process that uses it terminates, and it should even survive a crash of the application.

The third problem is that multiple processes often need to access the same information at the same time. The way to solve this is to make the information independent of any single process.

Therefore, we have three basic needs for long-lived information:

  • It must be possible to store a very large amount of information
  • The information must survive the termination of the process that uses it
  • Multiple processes must be able to access the information simultaneously

Magnetic disks have long been the devices used for this kind of long-term storage, and in recent years solid-state drives have become increasingly popular.

Solid-state drives not only have no moving parts that can break, but also provide fast random access. By contrast, tapes and CDs are also widely used, but their performance is relatively poor and they are mostly used for backups. We will talk about disks in more detail later; for now, think of a disk as a linear sequence of fixed-size blocks that supports two operations:

  • Read block k
  • Write block k

Disks actually support more operations, but as long as read and write are available, the problem of long-term storage can in principle be solved.
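As a rough illustration of this block interface, the sketch below treats a disk (or an ordinary disk-image file) opened with open() as an array of fixed-size blocks. BLOCK_SIZE and the helper names are assumptions chosen for the example, not part of any particular system.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

#define BLOCK_SIZE 4096   /* assumed block size */

/* Read block k into buf (buf must hold BLOCK_SIZE bytes). */
ssize_t read_block(int fd, uint64_t k, void *buf) {
    return pread(fd, buf, BLOCK_SIZE, (off_t)(k * BLOCK_SIZE));
}

/* Write block k from buf. */
ssize_t write_block(int fd, uint64_t k, const void *buf) {
    return pwrite(fd, buf, BLOCK_SIZE, (off_t)(k * BLOCK_SIZE));
}
```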

However, there are other things that are awkward to do with a raw disk, especially on large systems with many programs or many users (such as servers). Questions like the following quickly arise:

  • How do you find this information?

  • How do you guarantee that one user won’t read another user’s data?

  • How do you know which blocks are free?

We can introduce a new abstraction to deal with these problems: the file. Processes (and threads), address spaces, and files are the most important concepts in operating systems. If you really dig into these three concepts, you are well on your way to becoming an operating system expert.

Files are logical units of information created by a process. A disk can contain thousands or even millions of files, and each file is independent of the others. In fact, if you can think of each file as a separate address space, then you can really understand the concept of files.

Processes can read existing files and create new ones as needed. Information stored in files must be persistent, that is, unaffected by process creation and termination. A file disappears only when its owner explicitly deletes it. Although reading and writing are the most basic operations, there are many others, some of which are described below.

Files are managed by the operating system. The construction, naming, access, use, protection, implementation and management of files are the main contents of the operating system design. In general, the part of the operating system that handles files is called the file system, and that’s what we’re talking about.

From the user's perspective, what matters is what files consist of, how they are named, how they are protected, and what operations can be performed on them. Whether linked lists or bitmaps are used to keep track of free storage, on the other hand, is of little interest to users, although it is of vital importance to the system designer. Let's take a look at these topics.

Files

File naming

A file is an abstraction mechanism: it provides a way to store information and read it back later. Perhaps the most important characteristic of any abstraction mechanism is the way the objects it manages are named. When a process creates a file, it gives the file a name. When the process terminates, the file continues to exist and can be accessed by other processes using that name.

File naming rules vary from operating system to operating system, but all modern operating systems allow strings of 1 to 8 letters as legal filenames.

Some file systems distinguish between uppercase and lowercase letters; most do not. UNIX falls into the first category, and the venerable MS-DOS falls into the second (incidentally, despite its age MS-DOS is still widely used in embedded systems, so it is by no means obsolete). Thus a UNIX system can have three different files named maria, Maria, and MARIA, whereas in MS-DOS all of these names refer to the same file.

A side note on Windows file systems is in order here. Windows 95 and Windows 98 both use the MS-DOS file system, called FAT-16, and therefore inherit many of its properties, such as the way filenames are constructed. Windows 98 introduced extensions to FAT-16, leading to FAT-32, but the two are very similar. In addition, Windows NT, Windows 2000, Windows XP, Windows Vista, Windows 7, and Windows 8 all still support the FAT file systems, even though they are now somewhat dated. However, these newer operating systems also have a much more advanced native file system (NTFS) with different properties, such as Unicode-based filenames. Windows 8 actually ships with yet another file system, ReFS (Resilient File System), but it is mainly intended for the Server version of Windows 8. Unless otherwise stated, when we refer to the MS-DOS or FAT file systems we mean FAT-16 and FAT-32 as used by Windows. As a reminder, there is also a newer FAT-like file system, exFAT, Microsoft's extension of FAT-32 optimized for flash memory and large file systems; exFAT is currently the only Microsoft file system that OS X can both read and write.

Many operating systems support two-part filenames separated by a period, as in prog.c. The part after the period is called the file extension, and it usually indicates something about the file. In MS-DOS, for example, filenames are one to eight characters, plus an optional extension of one to three characters. In UNIX, the length of the extension, if any, is up to the user, and a file may even have two or more extensions, as in homepage.html.zip, where .html indicates a Web page and .zip indicates that the file homepage.html has been compressed with the zip program. The following table shows some common file extensions and their meanings.

| Extension | Meaning |
| --- | --- |
| .bak | Backup file |
| .c | C source program file |
| .gif | Image in the Graphics Interchange Format |
| .hlp | Help file |
| .html | World Wide Web HyperText Markup Language document |
| .jpg | Still image encoded with the JPEG standard |
| .mp3 | Music file in the MP3 audio encoding format |
| .mpg | Movie encoded with the MPEG standard |
| .o | Object file (compiler output, not yet linked) |
| .pdf | File in PDF format |
| .ps | PostScript file |
| .tex | Input file for the TEX formatting program |
| .txt | Text file |
| .zip | Compressed archive |

On UNIX systems, file extensions are conventions that are not enforced by operating systems.

A file named file.txt might be a text file, but that name is more a reminder to its owner than information for the computer. On the other hand, a C compiler may insist that the files it compiles end in .c, and refuse to compile them otherwise. The operating system itself does not care.

For programs that handle many file types, such conventions are extremely useful. For example, the C compiler can compile and link various kinds of files, including C source files and assembly language files. Here extensions become essential: the compiler uses them to tell which files are C sources, which are assembler files, and which are something else.

In contrast to UNIX, Windows attaches meaning to extensions. Users (or processes) can register extensions with the operating system and specify which program owns each one. When the user double-clicks a file name, the program assigned to that extension is started with the file as a parameter. For example, double-clicking file.docx starts Word with file.docx as the initial file.

File structure

Files can be constructed in a number of ways. The following figure lists three commonly used constructs

In the figure above, (a) is an unstructured sequence of bytes. The operating system does not know or care what the sequence contains; all it sees are bytes. Any meaning in the file's contents is imposed by user programs. Both UNIX and Windows use this approach.

Treating a file as a sequence of bytes provides maximum flexibility. User programs can put anything they want in their files and name them any way that is convenient. The operating system neither helps nor gets in the way; the latter matters to users who want to do unusual things. All versions of UNIX (including Linux and OS X) and Windows use this file model.

Figure (b) shows the first step up in structure. In this model, a file is a sequence of fixed-length records, each with its own internal structure. The central idea of treating a file as a sequence of records is that a read returns one record and a write overwrites or appends one record. The third file structure is shown in figure (c): a file consists of a tree of records, not necessarily all the same length, each containing a key field at a fixed position within the record. The tree is sorted on the key field, allowing rapid lookup of a particular key.

In the record-tree structure the next record can be fetched, but the key operation is looking up a record by its key. As shown in figure (c), the user can ask for, say, the record whose key is PONY without worrying about its exact position in the file. Users can also add new records to the file, but it is the operating system, not the user, that decides where to place them.

File types

Many operating systems support several types of files. UNIX (including OS X) and Windows, for example, have regular files and directories. UNIX additionally has character special files and block special files. Regular files are the ones that contain user information; most of the files users work with are regular files, including executables, text files, and image files. When data is read from or written to a regular file, the kernel performs the operation according to the rules of the file system; writes may be delayed, journaled, or handled in other ways.

Character special files are related to input/output and are used to model serial I/O devices such as terminals, printers, and networks. Block special files are used to model disk-like devices. Here we are mainly concerned with regular files.

Regular files are generally either ASCII files or binary files. ASCII files consist of lines of text. In some systems each line ends with a carriage return (ASCII 13, control character CR, escape \r); others use the newline character (ASCII 10, control character LF, escape \n); some systems (such as Windows) use both.

ASCII files have the advantage of being displayed and printed, and can be edited with any text editor. Further, if many applications use ASCII as input and output, it is easy to connect multiple programs, and the output of one program may be the input of another, like a pipe.

The other kind is the binary file, which is incomprehensible when printed. Below is a binary file format taken from early UNIX. Although technically the file is just a sequence of bytes, the operating system will execute it only if it has the proper format.

This file has five sections: header, text, data, relocation bits, and symbol table. The header begins with a magic number identifying the file as an executable (to prevent accidental execution of a file not in this format). Then come the sizes of the various parts of the file, the address at which execution starts, and some flag bits. Following the header are the text and data of the program itself; they are loaded into memory and relocated using the relocation bits. The symbol table is used for debugging.
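To make the layout concrete, here is a hedged sketch of what such an executable header might look like in C; the field names and widths are illustrative and do not reproduce the exact historical a.out format.

```c
#include <stdint.h>

/* Illustrative a.out-style executable header (assumed layout). */
struct exec_header {
    uint32_t magic;        /* magic number: marks the file as an executable   */
    uint32_t text_size;    /* size of the program text (code) section         */
    uint32_t data_size;    /* size of the initialized data section            */
    uint32_t bss_size;     /* size of uninitialized data                      */
    uint32_t symtab_size;  /* size of the symbol table (used for debugging)   */
    uint32_t entry_point;  /* address at which execution starts               */
    uint32_t reloc_flags;  /* flag bits, e.g. whether relocation info follows */
};
```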

Another kind of binary file is the archive, which consists of compiled but not yet linked library procedures (modules). Each module is preceded by a header recording its name, creation date, owner, protection code, and size. As with executables, these module headers are full of binary numbers; copying them to a printer would produce gibberish.

Every operating system must recognize at least one file type: its own executable. The old TOPS-20 system (for the DECsystem 20) even examined the creation time of any file to be executed, located the source file, and checked whether the source had been modified since the executable was created. If it had, the source was automatically recompiled. In UNIX terms, the make program was built into the shell. The file extensions were mandatory so that the operating system could tell which binary was derived from which source.

What is make? In software development, make is a build-automation tool that builds executable programs and libraries from source code by reading files called Makefiles, which specify how to derive the target program. Although integrated development environments and language-specific compiler features can also manage the build process, make is still widely used, especially on Unix and Unix-like operating systems.

When a program reads or writes data from a file, the request is forwarded to the kernel driver. If the file is a regular file, the data is processed by the file system driver and is usually stored in an area on disk or other storage media, where the data read from the file is the data previously written.

When data is read from or written to a device file, the request is handled by a device driver. Each device file has an associated device number that identifies which driver to use. What the device does with the data is its own business.

  • Block devices, also called block special files, usually behave much like regular files: they are arrays of bytes, and the value read at a given location is the last value written there. Data from block devices can be cached in memory and read from the cache, and writes can be buffered. Block devices are usually seekable. The idea behind a block device is that the corresponding hardware reads or writes an entire block at a time, such as a sector on a disk (a sketch of how a program can tell these device types apart follows this list).
  • Character devices, also called character special files, behave more like pipes or serial ports. Writing a byte to a character device might cause it to appear on a screen, be sent out a serial port, or be converted to sound.
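As a small illustration (not tied to any particular system beyond POSIX), the following sketch uses stat(2) to tell regular files, block special files, and character special files apart; the paths are only examples.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Print which kind of file system object a path refers to. */
static void classify(const char *path) {
    struct stat st;
    if (stat(path, &st) == -1) { perror(path); return; }
    if (S_ISREG(st.st_mode))       printf("%s: regular file\n", path);
    else if (S_ISBLK(st.st_mode))  printf("%s: block special file\n", path);
    else if (S_ISCHR(st.st_mode))  printf("%s: character special file\n", path);
    else if (S_ISDIR(st.st_mode))  printf("%s: directory\n", path);
    else                           printf("%s: other\n", path);
}

int main(void) {
    classify("/dev/sda");   /* typically a block device     */
    classify("/dev/tty");   /* typically a character device */
    classify("/etc/hosts"); /* a regular file               */
    return 0;
}
```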

Directories are system files used to manage the structure of the file system; a directory is the place where files are kept on the computer. Directories appear in hierarchical file systems such as those of Linux, MS-DOS, and UNIX.

It shows all the directories and subdirectories (for example, the BIG directory inside the CDN directory). The current directory is the root of drive C. It is called the root directory because there is nothing above it, and all the other directories branch out beneath it.

File access

Early operating systems provided only one kind of file access: sequential access. In these systems a process could read all the bytes or records of a file in order, but could not skip around and read them out of order. A sequentially accessed file could be rewound to the beginning and read as many times as needed. Sequential access was convenient when the storage medium was magnetic tape rather than disk.

When files came to be stored on disk, it became possible to read the bytes or records of a file out of order, or to access records by key rather than by position. A file whose contents can be read in any order is called a random access file, and many applications require this.

Random file access is essential for many applications, such as database systems. If a passenger calls to book a flight, the booking program must be able to access the flight’s records directly, rather than reading thousands of records for other flights.

There are two ways to specify where to start reading. In the first, every read operation gives the position in the file to read from. In the second, a special seek operation sets the current position, and the file is then read sequentially from that position. UNIX and Windows use the latter approach.
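A minimal sketch of the seek-based style follows, assuming fixed-size records of an arbitrary RECORD_SIZE; lseek() sets the current position and the subsequent read() proceeds from there.

```c
#include <sys/types.h>
#include <unistd.h>

#define RECORD_SIZE 128   /* assumed fixed record size */

/* Random access: seek to record n, then read it into buf. */
ssize_t read_record(int fd, long n, char *buf) {
    if (lseek(fd, (off_t)(n * RECORD_SIZE), SEEK_SET) == (off_t)-1)
        return -1;
    return read(fd, buf, RECORD_SIZE);
}
```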

File attributes

A file has a name and data. In addition, all operating systems store other file-related information, such as the date and time the file was created and the file's size. We call these items the file's attributes; some people call them metadata. The set of attributes varies considerably from system to system. Many attributes are flags with only two states, set and clear. The table below shows some common attributes.

| Attribute | Meaning |
| --- | --- |
| Protection | Who can access the file and in what way |
| Password | Password needed to access the file |
| Creator | ID of the person who created the file |
| Owner | Current owner |
| Read-only flag | 0 for read/write, 1 for read only |
| Hidden flag | 0 for normal, 1 for do not display in listings |
| System flag | 0 for normal file, 1 for system file |
| Archive flag | 0 for has been backed up, 1 for needs to be backed up |
| ASCII/binary flag | 0 for ASCII file, 1 for binary file |
| Random access flag | 0 for sequential access only, 1 for random access |
| Temporary flag | 0 for normal file, 1 for delete the file when the process exits |
| Lock flag | 0 for unlocked, 1 for locked |
| Record length | Number of bytes in a record |
| Key position | Offset of the key within each record |
| Key length | Number of bytes in the key field |
| Creation time | Date and time the file was created |
| Time of last access | Date and time the file was last accessed |
| Time of last change | Date and time the file was last modified |
| Current size | Number of bytes in the file |
| Maximum size | Number of bytes the file may grow to |

No system can have all of the above attributes at the same time, but each attribute is adopted in some system.

The first four attributes (protection, password, creator, owner) are related to file protection and indicate who may access the file and who may not.

File Protection: Methods used to protect valuable data on a computer. File protection is achieved by password protection of files or by providing permissions only to specific users or groups.

On some systems, the user must supply a password to access a file. The flags are bits or short fields that control or enable some specific property.

  • Hidden flag: the file does not appear in file listings.
  • Archive flag: records whether the file has been backed up. The backup program clears it, and the operating system sets it whenever the file is modified; in this way the backup program knows which files need backing up.
  • Temporary flag: marks a file for automatic deletion when the process that created it terminates.

Fields such as record-length, key-position, and key-length can only appear in files that use keywords to look up records. They provide the information needed to find the keywords.

The various time fields record when the file was created, when it was last accessed, and when it was last modified, and they serve different purposes. For example, a source file modified after its object file was generated needs to be recompiled; these fields provide the necessary information.

The current size field gives the file's present size. Some older mainframe operating systems required the maximum size to be specified when a file was created, so the operating system could reserve that much storage in advance; workstation and personal computer systems do not have this requirement.

File operations

The purpose of files is to store information and make it easy to retrieve later. Different systems provide different operations for storage and retrieval. Below are some of the most common system calls related to files; a short sketch using several of them follows the list.

  1. Create: creates a file with no data. The purpose of the call is to announce that the file is coming and to set some of its attributes.
  2. Delete: when a file is no longer needed, it has to be deleted to free up disk space. There is always a system call for this purpose.
  3. Open: before using a file, a process must open it. The purpose of this call is to let the system fetch the file's attributes and list of disk addresses into main memory for quick access on later calls.
  4. Close: when all accesses are finished, the attributes and disk addresses are no longer needed, so the file should be closed to free internal table space. Many systems limit the number of files a process may have open, to encourage users to close files they no longer use. A disk is written in blocks, and closing a file forces the file's last block to be written out, even if that block is not yet full.
  5. Read: data is read from the file, usually from the current position. The caller must specify how much data is needed and provide a buffer to put it in.
  6. Write: data is written to the file, usually at the current position. If the current position is the end of the file, the write simply extends the file. If the current position is in the middle of the file, the existing data is overwritten and lost forever.
  7. Append: a restricted form of write that can only add data to the end of the file.
  8. Seek: for random access files, a way is needed to specify where to take the data from. The usual approach is a seek system call that moves the current position to a specific place in the file; after the seek, data can be read from, or written to, that position.
  9. Get attributes: processes often need to read a file's attributes while they run.
  10. Set attributes: some attributes can be set by the user, even after the file has been created; the set attributes system call makes this possible.
  11. Rename: users frequently need to change the name of an existing file; the rename system call exists for this purpose.
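The following hedged sketch strings several of these calls together (create, write, seek, read, close, rename, delete); error handling is abbreviated and the file name is just an example.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    char buf[5];
    int fd = open("demo.txt", O_RDWR | O_CREAT | O_TRUNC, 0644); /* create */
    if (fd < 0) { perror("open"); return 1; }
    write(fd, "hello", 5);          /* write at the current position          */
    lseek(fd, 0, SEEK_SET);         /* seek back to the beginning             */
    read(fd, buf, 5);               /* read the data just written             */
    close(fd);                      /* close, freeing table space             */
    rename("demo.txt", "demo.bak"); /* rename                                 */
    unlink("demo.bak");             /* delete (UNIX calls this unlink)        */
    return 0;
}
```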

Directories

To keep track of files, file systems usually provide directories or folders, which in many systems are themselves files. In this section we discuss directories: their organization, their properties, and the operations that can be performed on them.

Single-level directory systems

In its simplest form, a directory system consists of one directory containing all the files. This directory is called the root directory, and since it is the only one, its name hardly matters. Such systems were common on the earliest personal computers, in part because there was only one user. Here is an example of a single-level directory system.

There are four files in this directory. The advantages of this design are simplicity and the ability to locate files quickly, since there is only one place to look. This kind of directory organization is still commonly used on simple embedded devices such as digital cameras and some portable music players.

Hierarchical directory system

A single-level directory is adequate for simple applications, but it does not suit modern computers, which hold thousands of files and folders; putting them all in the root directory would make them very hard to find. To solve this problem, hierarchical directory systems, also known as directory trees, emerged. With this approach, files can be grouped into as many directories as needed. Furthermore, if several users share a file server, as on a corporate network, each user can have a private root directory for his or her own directory tree. The resulting structure looks like this.

The root directory contains directories A, B, and C, which belong to different users. Two of the users create subdirectories. Users can create as many subdirectories as they want, and modern file systems are organized this way.

Path names

When the file system is organized as a directory tree, some way is needed to specify file names. Two methods are commonly used. The first is to give each file an absolute path name consisting of the path from the root directory to the file. For example, /usr/ast/mailbox means that the root directory contains a subdirectory usr, which in turn contains a subdirectory ast, which contains the file mailbox. Absolute path names always start at the root directory and are unique. In UNIX the components of a path are separated by /; in Windows the separator is \; in MULTICS it is >. Thus the same path name is written in the three systems as follows:

Windows \usr\ast\mailbox 
UNIX /usr/ast/mailbox 
MULTICS >usr>ast>mailbox

Either way, if the first character of the pathname is a delimiter, it is an absolute path.

The other way to specify a file name is to use a relative path name, which is used together with the working directory (also called the current directory). A user can designate one directory as the current working directory, and path names that do not begin at the root are interpreted relative to it. For example, if the current directory is /usr/ast, then the file whose absolute path is /usr/ast/mailbox can be referred to simply as mailbox. In other words, if the working directory is /usr/ast, the UNIX command

cp /usr/ast/mailbox  /usr/ast/mailbox.bak

and the command

cp mailbox mailbox.bak

mean the same thing. Relative paths are usually more convenient and shorter, and they accomplish exactly what absolute paths do.

Some programs need access to a particular file without worrying about the current working directory. In this case, absolute pathnames should be used.

Most operating systems that support a hierarchical directory structure have two special entries in every directory, "." and "..", pronounced dot and dotdot. Dot refers to the current directory; dotdot refers to its parent (except in the root directory, where it refers to itself). Look at the directory tree below to see how they are used.

A process whose working directory is /usr/ast can use .. to move up the tree; for example, it can use the command

cp ../lib/dictionary .

to copy the file /usr/lib/dictionary into its own directory. The first path tells the system to go up (to the usr directory) and then down into lib to find the file dictionary.

The second argument, ".", names the current working directory. When the cp command is given a directory name as its last argument, it copies all the files into that directory. Of course, a more common way to perform the copy is to type

cp /usr/lib/dictionary .

The use of "." here saves the user the trouble of typing dictionary a second time. Nevertheless, typing

cp /usr/lib/dictionary dictionary

also works, just as typing

cp /usr/lib/dictionary /usr/lib/dictionary

does. All of these commands do exactly the same job.

Directory operations

The system calls for managing directories vary more from system to system than those for managing files. To give an impression of what they are and how they work, here is a sample (taken from UNIX); a small example using the directory-reading calls follows the list.

  1. Create: a directory is created. It is empty except for the entries . and .., which the system puts there automatically.
  2. Delete: a directory is deleted. Only an empty directory can be deleted; a directory containing only . and .. is considered empty, and those two entries cannot normally be deleted themselves.
  3. Opendir: directories can be read, for example to list all the files they contain. Just as a file must be opened before it can be read, a directory must be opened before it can be read.
  4. Closedir: when a directory has been read, it should be closed to free internal table space.
  5. Readdir: returns the next entry of an open directory. It used to be possible to read directories with the ordinary read system call, but that approach forces the programmer to know and deal with the internal structure of directories. In contrast, readdir always returns one entry in a standard format, no matter which directory structure is used.
  6. Rename: directories are similar to files in many ways, and they can be renamed just as files can.
  7. Link: linking allows the same file to appear in more than one directory. This system call specifies an existing file and a path name and creates a link from the existing file to the name given by the path; in this way the same file can appear in multiple directories. A link of this kind is sometimes called a hard link.
  8. Unlink: a directory entry is removed. If the file being unlinked appears in only one directory, it is removed from the file system altogether. If it appears in several directories, only the link with the specified path name is removed; the links under the other names remain. In UNIX, the system call for deleting a file is in fact unlink.
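Here is a small example of the opendir/readdir/closedir calls, listing the entries of the current directory (including . and ..):

```c
#include <dirent.h>
#include <stdio.h>

int main(void) {
    DIR *dir = opendir(".");                 /* opendir: open the directory       */
    if (dir == NULL) { perror("opendir"); return 1; }
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL)   /* readdir: next directory entry     */
        printf("%s\n", entry->d_name);       /* prints "." and ".." as well       */
    closedir(dir);                           /* closedir: free internal resources */
    return 0;
}
```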

File system implementation

Now that you have a basic understanding of the file, it’s time to move on to the implementation of the file system. Previously, users were concerned about how files were named, what operations could be performed, what the directory tree was, and how to find the correct file path. Designers are concerned about how files and directories are stored, how disk space is managed, and how file systems run smoothly. Let’s discuss these issues.

File system layout

File systems are stored on disks. Most disks can be divided into one or more partitions (disk partitioning or disk slicing), each of which holds an independent file system; different partitions may use different file systems. Sector 0 of the disk is called the Master Boot Record (MBR) and is used to boot the computer. The end of the MBR contains the partition table, which gives the starting and ending addresses of each partition. System administrators use a program called a partition editor to create, resize, delete, and manipulate partitions. One disadvantage of this approach is that it is difficult to size partitions properly, so one partition may have a great deal of free space while another is almost completely full.

The MBR scheme is used by DOS, Microsoft Windows, and Linux. Since the mid-2010s, most new computers have switched to the GUID Partition Table (GPT) partitioning scheme.

Below is a disk partitioned with GParted; one of the partitions in the table is marked as active.

When the computer starts to boot, the BIOS reads in and executes the MBR.

Boot block

The first thing the MBR program does is locate the active partition, read in its first block, called the boot block, and execute it. The program in the boot block loads the operating system contained in that partition. For uniformity, every partition starts with a boot block even if it does not contain a bootable operating system. The boot block occupies the first 4096 bytes of the file system, starting at byte offset 0 on the disk, and it can be used to start the operating system.

In computing, booting is the process of starting a computer, triggered either by hardware (such as pressing the power button) or by a software command. After power-on, the CPU cannot execute anything because main memory contains no software, so some software must first be loaded into memory. In other words, a software loading process takes place right after the machine is switched on.

Restarting the computer is called rebooting; returning from sleep or hibernation does not involve booting.

In addition to starting with the boot block, the layout of the disk partitions varies from file system to file system. Usually a file system contains a few attributes, such as the following

Superblock

Immediately after the boot block comes the superblock, which is 4096 bytes in size and starts at byte offset 4096 on the disk. The superblock contains all the key parameters of the file system, for example:

  • The size of the file system
  • The number of data blocks in the file system
  • A flag indicating the state of the file system
  • The allocation group size

Superblocks are read into memory at computer startup or when the file system is first used.
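As a purely hypothetical illustration, a superblock holding the parameters listed above might be declared as follows; real file systems such as ext2 use more fields and different names.

```c
#include <stdint.h>

/* Hypothetical superblock layout (illustrative field names only). */
struct superblock {
    uint32_t magic;        /* identifies the file system type    */
    uint64_t fs_size;      /* size of the file system            */
    uint64_t block_count;  /* number of data blocks              */
    uint32_t state;        /* flag indicating file system status */
    uint32_t group_size;   /* allocation group size              */
};
```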

Free space block

This is followed by information about free blocks in the file system, for example, in bitmaps or pointer lists.

Bitmap (bit vector)

A bitmap or bit vector is a series of bits in which each bit corresponds to one disk block. A bit can take two values: 0 means the block is allocated and 1 means the block is free. The example disk layout shown below (allocated blocks in green) can be represented by the 16-bit bitmap 0000111000000110.
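A sketch of the bitmap bookkeeping, using the convention given in the text (1 = free, 0 = allocated); the block count and helper names are assumptions for the example.

```c
#include <stdint.h>

#define NBLOCKS 16
static uint8_t bitmap[(NBLOCKS + 7) / 8];   /* starts all-zero: every block "allocated" */

static int  is_free(int b)   { return (bitmap[b / 8] >> (b % 8)) & 1; }
static void mark_free(int b) { bitmap[b / 8] |=  (uint8_t)(1 << (b % 8)); }
static void mark_used(int b) { bitmap[b / 8] &= (uint8_t)~(1 << (b % 8)); }

/* Find and allocate the first free block, or return -1 if none is free. */
static int alloc_block(void) {
    for (int b = 0; b < NBLOCKS; b++)
        if (is_free(b)) { mark_used(b); return b; }
    return -1;
}
```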

Linked list management

In this approach, the free disk blocks are linked together: each free block contains a pointer to the next free block. The block number of the first free block is stored in a separate location on disk and is also cached in memory.

Fragments

It is worth mentioning the concept of fragments here. A single piece of data that has been broken up is generally called a fragment. Disk blocks can be further divided into fixed-size allocation units, and fragments are simply pieces of a file that are not adjacent to each other on the drive. An example may help. When you create a file on a Windows computer, that file can live anywhere: on the desktop, in a folder on disk, or elsewhere, and you can open it, edit it, and delete it. You might imagine that it is all stored in one place, but in fact part of the file may be stored in one region of the hard drive and other parts in another region; when you open the file, the drive quickly gathers all the pieces together so the rest of the system can use it.

inode

Next come the inodes (index nodes). The inodes form an array in which each file has one entry; inodes matter because they tell you everything about a file. Each inode stores the attributes of a file and the locations of its disk blocks.

An easy way to see them is the ls -lai command. Let's take a look at the root file system:

The inode contains the following information (a struct sketch follows the list):

  • Mode/permissions (protection)
  • Owner ID
  • Group ID
  • File size
  • Number of hard links to the file
  • Last access time
  • Last modification time
  • Last inode change time
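A hedged sketch of such an inode as a C structure, mirroring the list above; the field names and the number of direct block pointers are illustrative, not those of any specific file system.

```c
#include <stdint.h>
#include <time.h>

#define NDIRECT 12   /* assumed number of direct block pointers */

struct inode {
    uint16_t mode;           /* type and permissions (protection)  */
    uint32_t uid;            /* owner ID                           */
    uint32_t gid;            /* group ID                           */
    uint64_t size;           /* file size in bytes                 */
    uint16_t nlink;          /* number of hard links to the file   */
    time_t   atime;          /* last access time                   */
    time_t   mtime;          /* last modification time             */
    time_t   ctime;          /* last inode change time             */
    uint32_t block[NDIRECT]; /* disk block locations of the data   */
};
```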

The file system is divided into two kinds of objects, inodes and blocks, and once it is created the number of each is fixed: you can neither increase the number of inodes on a partition nor the number of disk blocks.

Immediately following the inode is the root directory, which holds the root of the file system directory tree. Finally, the rest of the disk houses all the other directories and files.

File implementation

The most important problem is keeping track of which disk blocks are used for each file. Different systems take different approaches. We’ll explore these ways below. The main idea behind allocation is efficient use of file space and fast access to files. There are three main allocation schemes

  • Contiguous allocation
  • Linked list allocation
  • Indexed allocation

Contiguous allocation

The simplest allocation scheme is to store each file on disk as a series of contiguous data blocks. Therefore, on a disk with 1KB blocks, 50 contiguous blocks will be allocated for 50 KB files.

The figure shows the first 40 blocks of a disk, starting with block 0 on the far left. Initially the disk was empty and no files had been loaded. Then a four-block file A was written starting at the beginning of the disk (block 0), followed by a six-block file B starting right after the end of A.

Note that each file begins at the start of a new block, so if file A really occupied only 3½ blocks, some space at the end of its last block would be wasted. In the figure there are seven files in all, each starting at the block that follows the end of the previous file.

Contiguous disk space allocation has two significant advantages.

  • First, contiguous storage is simple to implement. You only need to remember two numbers: the disk address of the first block and the number of blocks in the file. Given the number of the first block, the number of any other block can be found by simple addition.

  • Second, read performance is excellent: the entire file can be read in a single operation. Only one seek is needed (to the first block); after that, no further seeks or rotational delays are required, so data arrives at the full bandwidth of the disk.

Thus contiguous allocation is simple to implement and performs well.
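The "simple addition" mentioned above amounts to the following sketch: with the first block number and the block count stored for the file, any logical block maps to a disk block by one addition.

```c
#include <stdint.h>

/* What a directory entry must remember under contiguous allocation. */
struct contig_file {
    uint64_t first_block;  /* disk address of the first block */
    uint64_t num_blocks;   /* number of blocks in the file    */
};

/* Disk block holding logical block n of the file, or -1 if out of range. */
static int64_t block_of(const struct contig_file *f, uint64_t n) {
    if (n >= f->num_blocks) return -1;
    return (int64_t)(f->first_block + n);
}
```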

Unfortunately, contiguous allocation also has an obvious drawback: over time the disk becomes fragmented. The figure below illustrates this phenomenon.

Here two files, D and F, have been deleted. When a file is deleted, the blocks it occupied are freed, leaving runs of free blocks in the disk space. The disk is not compacted on the spot to squeeze out the holes, since that would mean copying all the blocks that follow a hole, potentially millions of them, which is far too expensive.

Initially this fragmentation is not a problem, because each new file is written at the end of the previous one. Eventually, however, the disk fills up, and it becomes necessary either to compact the disk or to reuse the free space in the holes. Compacting the disk is prohibitively expensive; maintaining a free list and reusing the holes is feasible. There is a catch, though: to match a file to a hole of the right size, you need to know the file's final size.

Imagine the consequence of such a design. The user starts a word processor to create a document, and the first thing the program asks is how big the final document will be. The question must be answered or the program will not continue, and if the size chosen turns out to be too small, the program has to terminate because the reserved disk space is used up. So, in real life, are there any media that do use contiguous allocation?

Contiguous allocation is widely used on CD-ROMs.

A Compact Disc Read-Only Memory (CD-ROM) is an optical disc used with computers. Data can be written to the disc only once; the information is held permanently on the disc and is read with a disc drive.

DVDs are a bit more complicated. In principle a 90-minute movie could be encoded as a single file of roughly 4.5 GB, but the Universal Disk Format (UDF) file system used on them represents file lengths with a 30-bit number, limiting files to 1 GB. As a result, DVD movies are generally stored as three or four consecutive 1-GB pieces. These file pieces that make up a single movie are called extents.

As we have mentioned again and again, history has a way of repeating itself. Many years ago contiguous allocation really was used in disk file systems because of its simplicity and high performance. The idea was later abandoned because users did not want to specify a file's size at creation time. With the arrival of optical media such as CD-ROMs, DVDs, and Blu-ray discs, however, contiguous allocation became popular again. The moral is that technology never truly becomes obsolete; what seems old now may become popular again at some future stage.

Linked list allocation

The second way to store files is to keep each one as a linked list of disk blocks, as shown below.

The first word of each block is used as a pointer to the next block; the rest of the block holds data. If the picture above is hard to read, the figure showing the whole linked list allocation scheme may help.

Unlike contiguous allocation, this method can make full use of every disk block: except in a file's last block, no space is lost to disk fragmentation. Also, it is enough for the directory entry to store the address of the first block; all the other blocks of the file can be found from there.

On the other hand, with linked list allocation sequential reading is convenient but random access is very slow (a key difference between the array and linked list data structures).

Another problem is that, because the pointer takes up a few bytes, the amount of data stored in each block is no longer a power of two. This is not fatal, but it reduces efficiency: since the first few bytes of every block are used by the pointer, reading a full block's worth of data requires gathering information from two blocks and piecing it together, which adds the overhead of extra lookups and copies.

Linked list allocation using a table in memory

Both contiguous allocation and plain linked list allocation have drawbacks that cannot be ignored, so a table in memory was proposed to solve the allocation problem. Taking the pointer word out of each disk block and putting it in a table in memory removes both disadvantages of the linked list. Here is an example.

The figure shows the contents of such a table for the linked lists of disk blocks of two files. File A uses disk blocks 4, 7, 2, 10, and 12, in that order, and file B uses 6, 3, 11, and 14. That is, file A starts at block 4 and the chain can be followed to find all of its disk blocks; similarly, starting at block 6 and following the chain to the end yields all the disk blocks of file B. Note that both chains are terminated with a special marker (-1) that is not a valid block number. A table like this in main memory is called a File Allocation Table (FAT).

With this organization the entire block is available for data. Furthermore, random access becomes much easier: although the chain must still be followed to find a given offset within the file, the whole chain is in memory, so no disk references are needed. As before, no matter how large the file is, the directory entry needs to hold only a single integer (the starting block number), from which all the file's blocks can be found.

The downside of this approach is that the entire table must be kept in memory to make it work. For a 1-TB disk with 1-KB blocks, the table needs about 1 billion entries, one for each of the billion disk blocks. Each entry must be at least 3 bytes, and often 4 bytes for faster lookup, so the table occupies 3 GB or 2.4 GB of memory, depending on whether the system is optimized for space or time. The FAT scheme therefore does not scale well to large disks. It was the original MS-DOS file system, though, and it is still fully supported by all versions of Windows.
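A small sketch of following a FAT chain, using the example block numbers from the text and -1 as the end-of-file marker:

```c
#include <stdio.h>

#define FAT_EOF (-1)

/* fat[b] holds the number of the block that follows block b in its file. */
static int fat[16] = { [4] = 7, [7] = 2, [2] = 10, [10] = 12, [12] = FAT_EOF,
                       [6] = 3, [3] = 11, [11] = 14, [14] = FAT_EOF };

static void print_chain(int first_block) {
    for (int b = first_block; b != FAT_EOF; b = fat[b])
        printf("%d ", b);
    printf("\n");
}

int main(void) {
    print_chain(4);   /* file A: 4 7 2 10 12 */
    print_chain(6);   /* file B: 6 3 11 14   */
    return 0;
}
```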

inode

The last way to keep track of which disk blocks belong to which file is to associate with each file a data structure called an inode (index node). Each file is identified by an integer, its inode number.

Here is a description of a simple example.

Given the inode, it is possible to find all the blocks of the file.

This mechanism has a significant advantage over the in-memory table: the inode needs to be in memory only while the corresponding file is open. If each inode occupies n bytes and at most k files can be open at once, the inodes of the open files occupy only kn bytes in total, and only that much space needs to be reserved in advance.

This array takes up far less space than the FAT described above. The reason is that the table holding the linked list of all disk blocks is proportional to the size of the disk itself: if the disk has N blocks, the table needs N entries, and it grows linearly with the disk. In contrast, the inode scheme needs an array in memory whose size is proportional to the maximum number of files that may be open at once; it does not matter whether the disk is 100 GB, 4000 GB, or 10,000 GB.

One problem with inodes is that if each one holds a fixed number of disk addresses, what happens when a file grows beyond that limit? One solution is to reserve the last disk address not for a data block but for the address of a block containing further disk block addresses, as shown in the figure above. A more advanced solution is to have two or more such blocks of addresses, or blocks that point to yet other blocks full of addresses. The Windows NTFS file system takes a similar approach, differing only in that small files can be stored entirely within the (large) inode itself.
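A hedged sketch of that mapping: direct addresses cover the first blocks of the file, and a single indirect block supplies the rest. The sizes and the stub read_block() are assumptions for illustration.

```c
#include <stdint.h>

#define NDIRECT      12
#define PTRS_PER_BLK 1024   /* disk addresses that fit in one block (assumed) */

struct inode_addr {
    uint32_t direct[NDIRECT];  /* direct block addresses                 */
    uint32_t indirect;         /* block holding further block addresses  */
};

/* Stub standing in for a real disk read; a file system would fetch the block here. */
static void read_block(uint32_t blockno, uint32_t *buf) {
    (void)blockno;
    for (int i = 0; i < PTRS_PER_BLK; i++) buf[i] = 0;
}

/* Map logical block n of the file to a disk block number. */
static uint32_t bmap(const struct inode_addr *ip, uint32_t n) {
    if (n < NDIRECT)
        return ip->direct[n];
    uint32_t table[PTRS_PER_BLK];
    read_block(ip->indirect, table);   /* fetch the indirect block            */
    return table[n - NDIRECT];         /* assumes n is within the file's range */
}
```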

NTFS stands for New Technology File System, a proprietary file system developed by Microsoft. NTFS replaced FAT (File Allocation Table) and HPFS (High Performance File System) and improved on them, for example by supporting metadata and by using advanced data structures to improve performance, reliability, and disk space utilization.

Directory implementation

A file can only be read after it has been opened. When a file is opened, the operating system uses the path name supplied by the user to locate the corresponding directory entry on disk. The directory entry provides the information needed to find the file's disk blocks. Depending on the system, this information may be the disk address of the entire file, the number of the first block (for the two linked list schemes), or the number of the inode. In every case, the main function of the directory system is to map the ASCII name of a file onto the information needed to locate its data.

A closely related issue is where attributes should be stored. Each file system contains different file properties, such as who owns the file and when it was created, and where it needs to be stored. One obvious way to do this is to simply store file attributes in a directory. There are some systems that do exactly that, as follows.

In this simple design, a directory has a list of fixed-size directory entries, one for each file, containing a fixed-length filename, a structure for file attributes, and one or more disk addresses that specify the location of disk blocks.

For systems that use inodes, the attributes can instead be stored in the inodes rather than in the directory entries. In that case each directory entry can be shorter: just a file name and an inode number. This approach is shown below.
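Such a short directory entry might look like the following sketch; the 14-character name field matches the UNIX Version 7 limit mentioned below, but the exact layout is illustrative.

```c
#include <stdint.h>

#define NAME_MAX_LEN 14

/* Fixed-length directory entry: just a name and an inode number. */
struct dir_entry {
    char     name[NAME_MAX_LEN];  /* file name, padded with zero bytes */
    uint16_t inode;               /* number of the file's inode        */
};
```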

So far we have assumed that files have short, fixed-length names. In MS-DOS, a name consists of a base of one to eight characters and an optional extension of one to three characters. In UNIX Version 7, file names were 1 to 14 characters, including any extensions. However, almost all modern operating systems support longer, variable-length file names. How can these be implemented?

The simplest approach is to set a limit on file name length, typically 255 characters, and then use the design shown above with 255 characters reserved for each file name. This is simple, but it wastes a great deal of directory space, since few files have names that long. A different structure is therefore needed.

One alternative is to give up the idea that all directory entries are the same size. With this method, each directory entry contains a fixed portion, typically starting with the length of the entry and followed by data in a fixed format, usually including the owner, creation time, protection information, and other attributes. This fixed-length header is followed by the actual file name, however long it may be, as shown below.

The example above is for a SPARC machine, which stores values in big-endian order.

A processor can store a sequence of bytes in big-endian or little-endian order. Big-endian stores the most significant byte first, while little-endian stores the least significant byte first.

In this example there are three files, project-budget, personnel, and foo. Each file name is terminated by a special character (usually 0), represented in the figure by a cross in a box. To allow each directory entry to begin on a word boundary, each file name is padded out to an integral number of words, as shown in the figure below.

The disadvantage of this method is that when a file is removed, a variable-sized gap is left in the directory, and the next file to be entered may not fit into it.

This is essentially the same problem we saw with contiguous disk files above, except that compacting the directory is feasible here because it is entirely in memory. Another problem is that a single directory entry may be spread over several pages, so a page fault may occur while a file name is being read.

Another way to handle variable-length names is to make the directory entries themselves fixed-length and keep the file names together in a heap at the end of the directory, as shown in the figure above. The advantage of this approach is that when an entry is removed, the next file entered will normally fit into the freed space. Of course, the heap must be managed, and page faults can still occur while file names are being processed.

In all the designs so far, directories are searched linearly from beginning to end when a file name has to be looked up. For very long directories, linear search is inefficient. One way to speed up lookup is to use a hash table in each directory. Suppose the table has n slots. To enter a file name, the name is hashed to a value between 0 and n-1, for example by dividing it by n and taking the remainder, or by summing the words that make up the name and doing the same.

Either way, the table slot corresponding to the hash value is inspected when a file is added. If it is unused, a pointer to the file's directory entry is placed there (the directory entries follow the hash table). If the slot is already in use, a linked list is constructed (the same idea as in a HashMap), headed at that slot and chaining together all the entries that hash to the same value.

The process of finding a file is similar to adding a file. The file name is first hashed, the hash table is checked to see if the hash value exists, and if so, all hash items on the chain are checked to see if the file name exists. If the hash is not on the chain, the file is not in the directory.

The advantage of a hash table is that lookup is very fast; the disadvantage is more complex administration. Hash tables are really only worth considering when directories routinely contain a very large number of entries.
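A sketch of the hashing idea: map a file name to one of n buckets and then search only that bucket's chain. The particular hash function is an assumption.

```c
#include <string.h>

#define NBUCKETS 64

/* Hash a file name to a bucket index between 0 and NBUCKETS - 1. */
static unsigned hash_name(const char *name) {
    unsigned h = 0;
    size_t len = strlen(name);
    for (size_t i = 0; i < len; i++)
        h = h * 31 + (unsigned char)name[i];   /* combine the characters */
    return h % NBUCKETS;
}
```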

Another way to speed up searching in large directories is to cache the results of previous lookups. Before a search starts, the cache is checked to see whether the file name is already there; if so, the file can be located immediately. Of course, caching only pays off when lookups are concentrated on a relatively small number of files.

Shared files

When several users work together on a project, they often need to share files. It is therefore convenient for a shared file to appear simultaneously in directories belonging to different users. The figure below was shown earlier, with one change: a file belonging to C now also appears in B's directory.

With this organization, the connection between B's directory and the shared file is called a link. The file system itself is now a Directed Acyclic Graph (DAG) rather than a tree.

In graph theory, a directed graph is a directed acyclic graph if, starting from any vertex, it is impossible to follow a sequence of edges back to that same vertex. We will not go into graph theory here; you can Google it if you are interested.

Organizing a file system as a directed acyclic graph complicates maintenance, but it’s a price you have to pay.

Sharing files is convenient, but it also causes problems. If directories contain disk addresses, a copy of those disk addresses must be made in directory B when the file is linked. If B or C later appends to the file, the new blocks are listed only in the directory of the user who did the appending; the change is invisible to the other user, defeating the purpose of sharing.

There are two solutions to this problem.

  • In the first solution, the disk blocks are not listed in directories but in a small data structure associated with the file itself; the directories then merely point to that structure. This is the approach used in UNIX, where the small data structure is the inode.

  • In the second solution, B links to one of C's files by having the system create a new file of type LINK and place it in B's directory. The new file contains only the path name of the file it is linked to. When B reads the file, the operating system sees that the file in B's directory is of type LINK, looks up the path name it contains, and reads that file instead. This is called a symbolic link. (A sketch of creating both kinds of links follows.)
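Both kinds of link can be created with standard POSIX calls, as in the sketch below; the paths are only examples.

```c
#include <stdio.h>
#include <unistd.h>

int main(void) {
    /* Hard link: a second directory entry pointing at the same inode. */
    if (link("/home/c/shared.txt", "/home/b/shared.txt") == -1)
        perror("link");

    /* Symbolic link: a new file of type LINK containing only the path name. */
    if (symlink("/home/c/shared.txt", "/home/b/shared-symlink") == -1)
        perror("symlink");
    return 0;
}
```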

Each of these methods has drawbacks. In the first method, when B links to the shared file, the inode still records C as the file's owner; creating a link does not change the ownership, as shown in the figure below.

The first situation is shown in figure (a): the owner of the file is C. When B links to the file, the owner does not change; only the count in the inode is increased by 1, so the system knows how many directory entries currently point to the file. Then C tries to delete the file, and here lies the problem. If C removes the file and clears the inode, B will have a directory entry pointing to an invalid inode. If the inode is later reassigned to another file, B's link will point to the wrong file. The system can tell from the count in the inode that the file is still referenced, but it has no way to find all the directory entries for the file in order to delete them. Pointers back to the directories cannot be stored in the inode, because there may be an unlimited number of such directories.

The only thing to do is delete C's directory entry but leave the inode in place with its count set to 1, as shown in figure (c). Now only B has a directory entry pointing to the file, yet the owner recorded for the file is still C. If the system does accounting, C will keep being charged for the file until B decides to delete it; only then, when the count drops to 0, is the file actually removed.

This problem does not occur with symbolic links; only the real file owner has a pointer to the inode. The user linked to the file has only the pathname and no pointer to the inode. When the file owner deletes the file, the file is destroyed. Subsequent attempts to access the file through symbolic links will fail because the system cannot find the file. Deleting symbolic links does not affect the file.

The problem with symbolic links is that they require additional overhead. You must read the file containing the path, and then scan the path section by section until you find the inode. These operations may require many additional disk accesses. In addition, each symlink requires an additional inode, as well as an additional disk block to store the path, although if the path name is short, the system can store it in the inode as an optimization. Symbolic links have the advantage of connecting files on machines anywhere in the world simply by providing a machine’s network address and the path where the files reside on that machine.

Links, whether symbolic or not, introduce another problem. When links are allowed, a file can have two or more paths. A program that finds all the files in a directory and its subdirectories will locate a linked file more than once. For example, a program that dumps the files in a directory and its subdirectories onto tape may copy a linked file several times; worse, if the tape is then read into another machine, unless the dump program is clever the linked file will be copied onto the disk twice instead of being linked together.

Log-structured file systems

Changes in technology put pressure on existing file systems. CPUs keep getting faster, disks keep getting bigger and cheaper (but not much faster), and memory sizes grow exponentially. The one parameter that has not improved is disk seek time (except on solid-state drives, which have no seek time at all).

The combination of these factors means that many file systems hit a performance bottleneck. Berkeley designed a new kind of file system to alleviate this problem: the Log-structured File System (LFS).

Log-structured file systems were introduced by Rosenblum and Ousterhout in the early 1990s to address the following problems:

  • Growing system memory sizes
  • Sequential I/O performance far exceeding random I/O performance
  • The inefficiency of existing file systems
  • File systems that did not work well with RAID (virtualization)

At the time, file systems such as UNIX's and FFS performed a large number of random reads and writes (creating a new file in FFS takes at least five random writes) and thus became a performance bottleneck for the whole system. At the same time, because of the page cache, the authors argued that random reads were not the main problem: as memory grows, most reads can be served from the cache. The main goal of LFS is therefore to reduce random writes to the disk.

In this design, inodes even have the same structure as in UNIX, but now they are scattered throughout the log rather than in a fixed location on disk. So, inodes are well positioned. To find inodes, an inode map(inode map) indexed by inodes is maintained. Entry I points to the ith inode on the disk. This mapping is kept on disk, but also in the cache, so the most frequently used parts are in memory most of the time.
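To make this concrete, here is a minimal sketch of an inode-map update and lookup. The names (imap, imap_update, imap_lookup) and the fixed-size in-memory array are invented for illustration; a real LFS stores the map itself in the log and caches only the hot parts.

#include <stdint.h>
#include <stdio.h>

/* Hypothetical in-memory slice of an LFS inode map: entry i records where
 * the latest copy of inode i currently lives in the log. */
#define IMAP_ENTRIES 4096
static uint64_t imap[IMAP_ENTRIES];   /* inode number -> disk address */

/* When an inode is rewritten, it is appended to the log at a new address,
 * so the map must be updated to point at the new copy. */
static void imap_update(uint32_t ino, uint64_t new_disk_addr)
{
    imap[ino] = new_disk_addr;
}

/* Finding an inode is now a map lookup instead of simple arithmetic on a
 * fixed on-disk inode table. */
static uint64_t imap_lookup(uint32_t ino)
{
    return imap[ino];
}

int main(void)
{
    imap_update(7, 123456);   /* inode 7 first written at log address 123456 */
    imap_update(7, 987654);   /* rewritten later, further down the log       */
    printf("inode 7 is now at %llu\n", (unsigned long long)imap_lookup(7));
    return 0;
}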

Log structure File systems use four data structures: Inode, Inode Map, Segment, and Segment Usage Table.

So far, all writes are first buffered in memory and appended to the end of the log; periodically the buffered writes are flushed to disk as a single segment. To open a file, the inode map is used to locate the file's inode; once the inode has been found, the addresses of the file's blocks can be found from it. All of those blocks live in segments somewhere in the log.

In the real world disk capacity is finite, so eventually the log will fill the entire disk and no new segments can be written. Fortunately, many existing segments contain blocks that are no longer needed. For example, if a file is overwritten, its inode now points to the new blocks, but the old blocks still take up space in previously written segments.

To deal with this, LFS has a cleaner thread that cycles through the log and compacts it. It starts by reading the summary of the first segment in the log to see which inodes and file blocks are in it. It then checks the current inode map to see whether each block is still in use. If not, the data is discarded; if it is still in use, the inode and blocks are read into memory to be written out with the next segment. The original segment is then marked as free, so the log can reuse it for new data. In this way the cleaner works its way along the log, removing old segments from the back while live data is carried forward into new segments. The whole disk thus forms a big circular buffer, with the writer thread adding new segments at the front and the cleaner thread reclaiming old segments behind it.
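A drastically simplified sketch of one cleaning pass is shown below. The structures and names (seg_block, clean_segment) are hypothetical, and the "live" flag stands in for the real check against the inode map and segment usage table.

#include <stdint.h>
#include <stdio.h>

#define BLOCKS_PER_SEGMENT 8

struct seg_block {
    uint32_t ino;    /* owning inode, 0 = unused slot        */
    int      live;   /* 1 if the inode map still points here */
};

struct segment {
    struct seg_block blocks[BLOCKS_PER_SEGMENT];
};

/* Copy live blocks forward (here we just count them) and free the segment
 * so it can hold new writes. Returns the number of blocks salvaged. */
static int clean_segment(struct segment *seg)
{
    int salvaged = 0;
    for (int i = 0; i < BLOCKS_PER_SEGMENT; i++) {
        if (seg->blocks[i].ino != 0 && seg->blocks[i].live) {
            /* In a real cleaner this block would be appended to the
             * current output segment and the inode map updated. */
            salvaged++;
        }
        seg->blocks[i].ino = 0;       /* the segment becomes free space */
        seg->blocks[i].live = 0;
    }
    return salvaged;
}

int main(void)
{
    struct segment s = {{{1, 1}, {1, 0}, {2, 1}, {0, 0}}};
    printf("salvaged %d live blocks\n", clean_segment(&s));
    return 0;
}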

Journaling file system

While log-structured file systems are an elegant design, they are not widely used, partly because they are incompatible with existing file systems. Nevertheless, one idea borrowed from them has caught on widely: keep a log (journal) of what the file system is going to do before it does it. Such systems are called journaling file systems. Microsoft's NTFS and Linux's ext3 use journaling, and OS X offers a journaling file system as an option. To see how this works, consider an example such as removing a file, which in UNIX requires three steps:

  • Remove the file's entry from its directory
  • Release the inode to the pool of free inodes
  • Return all the file's disk blocks to the pool of free disk blocks

Similar steps exist in Windows. In the absence of a crash, the order in which these steps are carried out does not matter; after a crash, it does. Suppose the system crashes after the first step. The inode and the file's blocks are no longer reachable from any file, but they have not been returned to the free pools either; they are simply lost, reducing the available resources. If the crash occurs after the second step, only the disk blocks are lost. A journaling file system keeps a log of the changes it is about to make before making them, so after an event such as a system crash or power failure the log can be used to quickly repair any damage.

Without such a journal, you generally have to run the fsck (File System Consistency Check) utility after a crash.

For the log to work correctly, the logged operations must be idempotent, meaning they can be repeated as often as necessary without harm. Operations such as "update the bitmap to mark inode k or block n as free" can be repeated any number of times. Similarly, searching a directory and removing any entry called foobar is idempotent. Conversely, adding the blocks just freed from inode k to the end of the free list is not idempotent, since they may already have been added there.
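The following toy example illustrates the difference, assuming a free bitmap and a free list with invented names; it is not any real file system's code. Replaying the bitmap update twice is harmless, while replaying the list append twice corrupts the list.

#include <stdint.h>
#include <stdio.h>

#define NBLOCKS 64
static uint8_t free_bitmap[NBLOCKS / 8];   /* bit = 1 means the block is free */

/* Idempotent: marking block n free twice leaves the bitmap in exactly the
 * same state as marking it once, so replaying this log record is harmless. */
static void mark_block_free(int n)
{
    free_bitmap[n / 8] |= (uint8_t)(1u << (n % 8));
}

/* NOT idempotent: appending block n to a free *list* twice leaves two copies
 * of the same block in the list, corrupting it if the log is replayed. */
static int free_list[NBLOCKS];
static int free_count;
static void append_block_to_free_list(int n)
{
    free_list[free_count++] = n;
}

int main(void)
{
    mark_block_free(5);
    mark_block_free(5);             /* same result: block 5 is still just "free" */
    append_block_to_free_list(5);
    append_block_to_free_list(5);   /* oops: block 5 is now listed twice         */
    printf("bitmap byte 0 = 0x%02x, free_count = %d\n", free_bitmap[0], free_count);
    return 0;
}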

To increase reliability, a file system can introduce the concept of atomic transactions in a database. Using this concept, a set of actions can be defined between the start and end of a transaction operation. In this way, the file system knows that it must do all or none of the actions.

Virtual file system

Many different file systems are in use, often on the same computer and even under the same operating system. The primary file system on Windows is NTFS, but NTFS is not the only file system Windows has to deal with: there may be an old FAT-32 or FAT-16 drive or partition containing data that is still needed, a flash drive, an old CD-ROM or DVD (each with its own unique file system), and so on. Windows handles these different file systems by assigning each one a drive letter, such as C:, D:, and so on. The drive letter may be explicit or implicit: when you name a file by its full location the drive letter is explicit, and when a process opens a file the drive letter is implicit, telling Windows which file system to pass the request to.

UNIX, by contrast, takes a different approach and integrates multiple file systems into a single unified structure. A Linux system can use ext2 as its root file system, with an ext3 partition mounted on /usr, another hard disk with a ReiserFS file system mounted on /home, and an ISO 9660 CD-ROM temporarily mounted on /mnt. From the user's point of view there is a single file-system hierarchy; the fact that it is made up of multiple file systems is invisible to users and processes.

UNIX systems use a Virtual File System (VFS) to organize multiple file systems into an orderly structure. The key idea is to abstract out the part that is common to all file systems and put that code into a separate layer, which calls down into the concrete file systems to actually manage the data. The overall architecture of a VFS is shown below.

Once again the old computer-science saying applies: any problem can be solved with another level of indirection. All file-related system calls are initially directed at the virtual file system. These calls, coming from user processes, are the standard POSIX calls such as open, read, write, lseek, and so on. So the VFS's upper interface, the one facing user processes, is the POSIX interface.

The VFS also has a lower interface to the concrete file systems, the one labeled VFS in the figure above. This interface consists of a set of function calls that the VFS can make to have each file system do its work. Therefore, to create a new file system that works with the VFS, its designers must make sure it provides the functions the VFS requires. An obvious example is a function that reads a specific block from disk, puts it into the file system's buffer cache, and returns a pointer to it. Thus the VFS has two distinct interfaces: the upper one to user processes and the lower one to the concrete file systems.

When the system boots, the root file system is registered with the VFS. In addition, when other file systems are mounted, whether at boot time or during operation, they too must register with the VFS. When a file system registers, it essentially provides the list of addresses of the functions the VFS requires, either as one long call vector (table) or as several of them, one per VFS object. Once a file system has registered with the VFS, the VFS knows how to get it to, say, read a block: it simply calls the corresponding function in the vector supplied at registration time.
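In practice, such a "list of function addresses" is just a table of function pointers. The sketch below uses invented names (vfs_file_ops, vfs_read, toyfs), loosely inspired by how real kernels structure these tables, to show how the VFS can dispatch a generic call to whichever concrete file system registered the open file.

#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical VFS-style operations table: each concrete file system fills
 * in one of these when it registers with the VFS. */
struct vfs_file;
struct vfs_file_ops {
    ssize_t (*read)(struct vfs_file *f, void *buf, size_t len);
    int     (*release)(struct vfs_file *f);
};

/* Roughly a vnode: the VFS sees only the ops pointer and some private data. */
struct vfs_file {
    const struct vfs_file_ops *ops;
    void *fs_private;                 /* e.g. the concrete file system's inode */
};

/* Generic VFS entry point: dispatch to whichever file system owns f. */
static ssize_t vfs_read(struct vfs_file *f, void *buf, size_t len)
{
    return f->ops->read(f, buf, len);
}

/* A toy "file system" that just serves bytes from a string in memory. */
static ssize_t toyfs_read(struct vfs_file *f, void *buf, size_t len)
{
    const char *data = f->fs_private;
    size_t n = strlen(data) < len ? strlen(data) : len;
    memcpy(buf, data, n);
    return (ssize_t)n;
}
static int toyfs_release(struct vfs_file *f) { (void)f; return 0; }

static const struct vfs_file_ops toyfs_ops = { toyfs_read, toyfs_release };

int main(void)
{
    static char msg[] = "hello from toyfs";
    char buf[32] = {0};
    struct vfs_file f = { &toyfs_ops, msg };
    vfs_read(&f, buf, sizeof(buf) - 1);
    printf("%s\n", buf);
    return 0;
}

The point of the design is that vfs_read never needs to know which file system it is talking to; supporting NTFS, ext3, or an ISO 9660 driver just means supplying a different operations table at registration time.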

Once the file system is mounted, it is ready to use. For example, if a filesystem is loaded into /usr and a process calls it:

open("/usr/include/unistd.h",O_RDONLY)

While the path is being resolved, the VFS sees that a new file system has been mounted on /usr and locates its superblock by searching the list of superblocks of mounted file systems. It then finds the root directory of the mounted file system and looks up the path include/unistd.h there. The VFS then creates a vnode and calls the concrete file system to return all the information in the file's inode. This information is copied into the vnode (in memory), along with other information, most importantly the pointer to the table of functions that operate on vnodes, such as read, write, and close.

After the vnode has been created, the VFS makes an entry in the file-descriptor table of the calling process and sets it to point to the new vnode. Finally, the VFS returns the file descriptor to the caller, which can then use it to read, write, and close the file.

Later, when the process does a read using the file descriptor, the VFS locates the vnode via the process and file-descriptor tables, follows the pointer to the table of functions, and calls the read function of the concrete file system, which fetches the requested block. The VFS has no idea whether the data is coming from the local disk, from a remote file system over the network, from a CD-ROM, a USB stick, or something else. The relevant data structures are shown in the figure below.

Starting from the caller's process number and file descriptor, the VFS finds the vnode, follows the read function pointer, and so reaches the read function inside the concrete file system.

File system management and optimization

Making a file system work is one thing; making it work efficiently and robustly in real life is quite another. So let's look at file system management and optimization.

Disk Space Management

Files are normally stored on disk, so managing disk space is a major concern for operating system designers. There are two general strategies for storing an n-byte file: allocate n consecutive bytes of disk space, or split the file into a number of blocks that need not be contiguous. The same trade-off appears in memory management, between pure segmentation and paging.

As we will see, storing a file as a contiguous sequence of bytes has an obvious problem: as the file grows, it may have to be moved on the disk. Memory segmentation has the same problem, except that moving a segment in memory is a relatively fast operation, while moving a file from one disk location to another is not. For this reason, nearly all file systems chop files up into fixed-size blocks that need not be adjacent.

The block size

Once files are stored in fixed-size chunks, the question arises: What is the size of the chunk? Depending on the disk organization, sectors, tracks, and cylinders can obviously be used as allocation units. Page size is also a major factor in paging systems.

Having a large block size (say, an entire cylinder) means that every file, even a 1-byte file, ties up a whole allocation unit, so small files waste a large amount of disk space. A small block size, on the other hand, means that most files span multiple blocks, and reading them requires multiple seeks and rotational delays, which hurts performance. So allocation units that are too large waste space; units that are too small waste time.

Keeping track of free blocks

Once the block size is specified, the next problem is how to keep track of free blocks. Two approaches are widely used, as shown in the figure below

The first approach is to use a linked list of disk blocks, with each block of the list holding as many free-block numbers as will fit. With 1-KB blocks and 32-bit block numbers, each block of the free list holds the numbers of 255 free blocks (one slot is needed for the pointer to the next block of the list). Consider a 1-TB disk, which has about a billion disk blocks; storing all those addresses at 255 per block requires about 4 million blocks. Generally, free blocks themselves are used to hold the free list, so the storage is essentially free.

The other technique for managing free space is the bitmap. A disk with n blocks needs a bitmap with n bits; free blocks are represented by 1s and allocated blocks by 0s. For the 1-TB disk example, a billion bits are needed, which takes about 130,000 1-KB blocks to store. It is not surprising that the bitmap requires less space, since it uses 1 bit per block rather than the 32 bits per block of the linked-list scheme. Only when the disk is nearly full does the linked list require fewer blocks than the bitmap.
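Since the bitmap scheme is so simple, a small sketch is easy to give. The names and the tiny 1024-block "disk" below are invented; the convention of 1 = free matches the text, and alloc_near shows why a bitmap makes it easy to pick a free block close to a given one (which becomes relevant later when we discuss reducing disk-arm motion).

#include <stdint.h>
#include <string.h>
#include <stdio.h>

#define NBLOCKS 1024                      /* toy disk: 1024 blocks */
static uint8_t bitmap[NBLOCKS / 8];       /* bit = 1 => block free */

static void set_free(int n) { bitmap[n / 8] |=  (uint8_t)(1u << (n % 8)); }
static void set_used(int n) { bitmap[n / 8] &= (uint8_t)~(1u << (n % 8)); }
static int  is_free(int n)  { return (bitmap[n / 8] >> (n % 8)) & 1; }

/* Find and allocate a free block near `hint` (e.g. the file's last block). */
static int alloc_near(int hint)
{
    for (int d = 0; d < NBLOCKS; d++) {
        int cand = (hint + d) % NBLOCKS;
        if (is_free(cand)) {
            set_used(cand);
            return cand;
        }
    }
    return -1;                            /* disk full */
}

int main(void)
{
    memset(bitmap, 0xff, sizeof(bitmap)); /* start with everything free */
    set_used(10);
    printf("allocated block %d near 10\n", alloc_near(10));   /* prints 11 */
    return 0;
}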

If free blocks tend to come in long runs of consecutive blocks, the free list can be modified to track runs of blocks rather than single blocks: an 8-, 16-, or 32-bit count can be attached to each entry, giving the number of consecutive free blocks. In the best case, the free space of an entire disk can be described by just two numbers: the address of the first free block and the count of free blocks that follow it. On the other hand, if the disk is badly fragmented, tracking runs is less efficient than tracking single blocks, because each entry now stores a count as well as an address.

This situation illustrates a problem often encountered by operating system designers. There are many data structures and algorithms that can be used to solve problems, but choosing the best solution requires data that the designer cannot have up front. It is obtained only after the system is deployed and used.

Now, back to the free-list method: only one block of pointers is kept in main memory. When a file is created, the blocks it needs are taken from this pointer block; when the block runs out, a new block of pointers is read in from disk. Similarly, when a file is deleted, its blocks are freed and added to the pointer block in memory; when this block fills up, it is written out to disk.

In certain cases, this approach results in unnecessary disk IO, as shown in the figure below

Suppose the pointer block in memory has room for only two more entries, as shown above. If a file consisting of three blocks is now freed, the pointer block overflows and has to be written to disk, leading to the situation shown in the following figure.

If a three-block file is now written, the full pointer block has to be read in again, taking us back to the situation of figure A above. If the three-block file just written was a temporary file, freeing it requires yet another disk write to put the full pointer block back on disk. In short, when the in-memory pointer block is nearly empty, a series of short-lived temporary files can cause a lot of disk I/O.

An alternative that avoids most of this disk I/O is to split the full pointer block: when three blocks are freed, instead of going from A to B we go from A to C, as shown below.

Now the system can handle a series of temporary files without any disk I/O. If the pointer block in memory fills up, it is written to disk and a half-full pointer block is read in. The idea is to keep most of the pointer blocks on disk full (minimizing disk usage) but keep the one in memory about half full, so that it can handle both file creation and file deletion without any disk I/O on the free list.

With a bitmap, it is also possible to keep just one block of it in memory, going to disk for another block only when the in-memory one becomes completely full or empty. Doing all allocation from a single block of the bitmap has the side benefit that the allocated disk blocks end up close together, which reduces disk-arm motion. Since the bitmap is a fixed-size data structure, if the kernel is paged, the bitmap can be kept in virtual memory and its pages brought in as needed.

Disk quota

To keep individual users from hogging too much disk space, multi-user operating systems usually provide a mechanism for enforcing disk quotas. The system administrator assigns each user a maximum number of files and blocks, and the operating system makes sure users do not exceed their quotas. The mechanism works roughly as follows.

When a user opens a file, the operating system finds the file's attributes and disk addresses and puts them into the open-file table in main memory. One of the attributes tells who the owner is; any growth of the file is charged to the owner's quota.

A second table holds the quota record of every user with a currently open file, even if the file was opened by someone else. As shown in the figure above, this table is an extract of the disk quota file for the owners of the currently open files; when all of a user's files are closed, the record is written back to the quota file.

When a new entry is made in the open-file table, a pointer to the owner's quota record is stored with it. Every time a block is added to a file, the total number of blocks charged to the owner is incremented and checked against both the hard and the soft limit. The soft limit may be exceeded; the hard limit may not. An attempt to add more data to a file whose owner has reached the hard block limit results in an error. An analogous check exists for the number of files.

What are hard and soft limits? The hard limit is the ceiling on the soft limit; the soft limit is what is actually enforced for the session or process. The administrator sets the hard limit as the absolute maximum, and users and processes can then set their own soft limits below it to restrict their resource usage as they see fit.
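A sketch of the per-block quota check described above might look like the following; the structure layout and names are hypothetical rather than taken from any particular UNIX.

#include <stdio.h>

/* Hypothetical in-memory quota record for one user. */
struct quota_record {
    long blocks_used;
    long block_soft_limit;    /* may be exceeded, triggers a warning   */
    long block_hard_limit;    /* absolute ceiling, may not be exceeded */
    int  warnings_left;       /* decremented at login while over soft  */
};

/* Called whenever a block is about to be added to one of the user's files.
 * Returns 0 on success, -1 if the hard limit would be exceeded. */
static int charge_block(struct quota_record *q)
{
    if (q->blocks_used + 1 > q->block_hard_limit) {
        fprintf(stderr, "quota: hard block limit reached\n");
        return -1;                       /* the write fails with an error */
    }
    q->blocks_used++;
    if (q->blocks_used > q->block_soft_limit)
        fprintf(stderr, "quota: soft block limit exceeded (warning)\n");
    return 0;
}

int main(void)
{
    struct quota_record q = { 98, 100, 101, 3 };
    for (int i = 0; i < 5; i++)
        if (charge_block(&q) < 0)
            break;
    printf("blocks used: %ld\n", q.blocks_used);
    return 0;
}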

When a user tries to log in, the quota file is checked to see whether the user has exceeded the soft limit on either the number of files or the number of disk blocks. If either limit has been violated, a warning is displayed and the count of remaining warnings is reduced by one. If the count ever reaches zero, the user has ignored the warnings one time too many and is not permitted to log in again; getting permission to log in once more requires a discussion with the system administrator.

This scheme has the property that users may exceed their soft limits during a login session, as long as they remove the excess before logging out, but they may never exceed their hard limits under any circumstances.

File system Backup

Destruction of a file system is often far more serious than destruction of the computer itself. If a file system is irrecoverably lost, whether through hardware or software failure, restoring the information is difficult, time-consuming, and in many cases impossible. Since a file system cannot be made immune to destruction, backups must be made before disaster strikes; but backing up is not as simple as it sounds, so let's walk through the process.

Many people consider backing up their files not worth the time and effort, until one day their disk dies and they discover how serious the loss is. Companies, by contrast, generally do a much better job in this respect. Backups (for example, to tape) are generally made to deal with one of two potential problems:

  • Recovering from an unexpected disaster

This problem is mainly caused by external factors such as disks breaking, floods, and fires.

  • Recovery from a bad operation

The second problem is usually a user accidentally deleting files that later need to be restored. This happens so often that the Windows designers dedicated a special directory, the Recycle Bin, to the delete command: a deleted file does not actually disappear from the disk, but is moved to this special directory, from which it can be restored later if needed. Backups address the more extreme case, allowing files that were deleted or damaged days or weeks ago to be restored from an earlier backup of the disk.

Making backups takes a long time and a lot of space, which raises several issues. First, should the entire file system be backed up or only part of it? In most installations only specific directories and everything under them are backed up, not the entire file system.

Second, it is wasteful to back up files that have not changed since the previous backup, which leads to the idea of incremental dumps. The simplest form of incremental dump is to make a complete dump periodically and to back up, each day, only the files that have changed since the last complete dump.

Periodicity: such as a week or a month

A slightly better scheme is to back up only the files that have changed since they were last dumped. This greatly reduces dump time, but makes recovery more complicated: the most recent full dump has to be restored first in its entirety, followed by the incremental dumps in reverse order. To ease recovery, more elaborate dump schemes are often used.
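At its core, an incremental dump just asks "has this file changed since time T?". A minimal sketch using the standard POSIX stat() call is shown below; the surrounding bookkeeping (recording the dump time, writing to tape) is omitted and the helper name is made up.

#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

/* Return 1 if `path` has been modified since the previous dump at `last_dump`,
 * i.e. it should be included in this incremental dump. */
static int needs_dump(const char *path, time_t last_dump)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return 0;                      /* vanished since we listed it: skip */
    return st.st_mtime > last_dump;    /* modified after the last dump?     */
}

int main(void)
{
    time_t last_dump = time(NULL) - 7 * 24 * 3600;   /* pretend: a week ago */
    const char *candidates[] = { "/etc/hostname", "/etc/passwd" };
    for (int i = 0; i < 2; i++)
        printf("%s: %s\n", candidates[i],
               needs_dump(candidates[i], last_dump) ? "dump" : "skip");
    return 0;
}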

Third, since the amount of data to be dumped is usually huge, it is tempting to compress it before writing it to tape. However, with many compression algorithms, a single bad spot on the backup medium can break the decompression and make an entire file, or even the entire tape, unreadable. Think carefully, therefore, before deciding to compress the backup stream.

Fourth, it is difficult to make a backup of an active file system. If files and directories are added, removed, or modified while the dump is in progress, the result may be inconsistent. Since a dump can take hours, it may be necessary to take the system offline overnight to do the backup, an approach that is not always acceptable. For this reason, algorithms have been devised that take quick snapshots of the file system state: they copy the critical data structures, and then require subsequent changes to files and directories to copy the affected blocks instead of updating them in place.

There are two strategies for dumping a disk to a backup disk or tape: a physical dump or a logical dump. A physical dump starts at block 0 of the disk, writes all the disk blocks to the output in order, and stops after copying the last one. Such a program is so simple that it can probably be made completely foolproof, something that can likely not be said of any other useful program.

Another consideration with physical dumps is bad blocks. It is impossible to manufacture large disks without defects, so some bad blocks always exist. Sometimes they are detected and marked during low-level formatting, and the solution is to substitute spare blocks from the end of the disk for them.

However, some blocks go bad after formatting, in which case the operating system eventually detects them. Usually it handles the problem by creating a "file" consisting of all the bad blocks, ensuring that they never appear in the free pool and are never allocated. The dump program must then take care never to try to read this bad-block file. If, on the other hand, the disk controller remaps all bad blocks transparently, physical dumps still work without trouble.

Windows systems also have paging files and hibernation files; they play no role in restoring files and should not be backed up in the first place.

Physical dump and logical dump

The main advantage of a physical dump is that it is simple and extremely fast (it runs essentially at disk speed). The disadvantages are that it cannot skip selected directories, cannot make incremental dumps, and cannot restore individual files on request. For these reasons, most installations use logical dumps rather than physical dumps.

A logical dump starts at one or more specified directories and recursively dumps the files and directories found there that have changed since some given date. Thus, in a logical dump, the dump medium gets a carefully identified series of directories and files, which makes it easy to restore a specific file or directory on request.

Since logical dumps are the most common approach, let’s look at a general algorithm for logical dumps. This algorithm is widely used on UNIX systems, as shown in the figure below

The figure shows the file system to be dumped, with boxes representing directories and circles representing files. The shaded (yellow) items have been modified since the last dump. Each directory and file is labeled with its inode number.

The algorithm dumps every directory (even unmodified ones) that lies on the path to a modified file or directory, for two reasons. First, it makes it possible to restore the dumped files into the file system of a different computer; in this way the dump and restore programs can be used to transfer entire file systems between machines. Second, it makes it possible to restore individual files incrementally.

The logical dump algorithm maintains a bitmap indexed by inode number, with several bits per inode. As the algorithm proceeds, bits in this map are set and cleared. The algorithm runs in four phases. Phase 1 starts at the starting directory (the root in this example) and examines all of its entries. For each modified file, the algorithm marks the file's inode in the bitmap. It also marks, and recursively descends into, every directory, whether modified or not.

At the end of phase 1, all modified files and all directories are marked in the bitmap, as shown below

Phase 2 conceptually walks the tree again recursively, unmarking every directory that has no modified files or directories anywhere in it or under it. The result of this phase is shown below.

Note that the directories with inode numbers 10, 11, 14, 27, 29, and 30 are now unmarked, because nothing under them has been modified; they will not be dumped. By contrast, directories 5 and 6 will be dumped even though they themselves have not been modified, because they are needed to restore today's changes onto a fresh machine. For efficiency, phases 1 and 2 can be combined into a single tree walk.

At this point it is known exactly which directories and files must be dumped; these are the marked items in figure B above. Phase 3 scans the inodes in increasing order of inode number and dumps all the directories marked for dumping, as shown below.

For recovery, each dumped directory is prefixed with the attributes of the directory (owner, time).

Finally, in phase 4, the files marked in the figure above are dumped, again prefixed by their file attributes. At this point the dump is complete.
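Phases 1 and 2 can be pictured as a single recursive marking pass over the tree, which is also how they can be combined in practice. The sketch below uses a toy in-memory tree with invented field names; a real implementation would walk on-disk directories and keep the marks in a bitmap indexed by inode number.

#include <stdio.h>

#define MAX_CHILDREN 4
struct node {
    int ino;
    int is_dir;
    int modified;                     /* changed since the last dump?   */
    int marked;                       /* will end up on the dump medium */
    struct node *child[MAX_CHILDREN];
};

/* Combined phases 1 and 2: mark modified files, and keep a directory marked
 * only if something modified lives at or below it. Returns 1 if this subtree
 * ends up marked. */
static int mark(struct node *n)
{
    if (n == NULL)
        return 0;
    int keep = n->modified && !n->is_dir;     /* modified file: always dump */
    for (int i = 0; i < MAX_CHILDREN; i++)
        keep |= mark(n->child[i]);            /* anything marked below?     */
    n->marked = keep || (n->modified && n->is_dir);
    return n->marked;
}

int main(void)
{
    struct node f    = { 7, 0, 1, 0, {0} };        /* modified file          */
    struct node d    = { 3, 1, 0, 0, { &f } };     /* unmodified dir above it */
    struct node root = { 1, 1, 0, 0, { &d } };
    mark(&root);
    printf("root marked=%d dir marked=%d file marked=%d\n",
           root.marked, d.marked, f.marked);       /* all 1: path is kept */
    return 0;
}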

Restoring a file system from the dump is straightforward. To start, an empty file system is created on the disk. Then the most recent full dump is restored. Since the directories come first on the tape, they are restored first, giving the skeleton of the file system, after which the files themselves are restored. This process is then repeated with the first incremental dump made after the full dump, then the next one, and so on.

Although logical dumping is simple, there are a few tricky issues. For one, since the free-block list is not a file, it is not dumped and must be rebuilt from scratch after all the dumped files have been restored.

Another issue is links. If a file is linked into two or more directories, the file should be restored only once, and all the directories that are supposed to point to it must be made to do so.

Another problem is that UNIX files may contain holes. It is legal to open a file, write a few bytes, seek to an address far into the file, and write a few more bytes; the blocks in between are not part of the file and should be neither dumped nor restored.

Finally, special files, named pipes, and similar files should not be dumped regardless of which directory they belong to.

File system consistency

One factor affecting reliability is file system consistency. Many file systems read disk blocks, modify them, and write them back later. If the system crashes before all the modified blocks have been written out, the file system may be left in an inconsistent state. This is especially serious if some of the unwritten blocks are inode blocks, directory blocks, or blocks containing the free list.

To deal with this, most computers have a utility program that checks file system consistency; UNIX has fsck, and Windows has sfc (and other tools). It can be run whenever the system boots, especially after a crash.

Two kinds of consistency checks can be made: block consistency and file consistency. To check block consistency, the program builds two tables, each containing one counter per block, initially set to 0. The counters in the first table keep track of how many times each block is present in some file; the counters in the second table record how often each block appears in the free list (or the free bitmap).

The program then reads all the inodes using a raw device, which ignores the file structure and just returns all the disk blocks starting at block 0. Starting from an inode, it is possible to build up a list of all the block numbers used by the corresponding file; as each block number is encountered, its counter in the first table is incremented. The program then examines the free list or bitmap to find all the blocks that are not in use; each occurrence of a block in the free list causes its counter in the second table to be incremented.
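The essence of the block check is just the two counter tables and a final comparison pass. Here is a toy sketch with invented sizes and a hand-made inconsistent scenario, covering the cases discussed below.

#include <stdio.h>

#define NBLOCKS 16

/* Hypothetical two-table block check:
 * in_use[b]  = how many times block b is referenced by some file's inode
 * in_free[b] = how many times block b appears in the free list            */
static int in_use[NBLOCKS], in_free[NBLOCKS];

static void check(void)
{
    for (int b = 0; b < NBLOCKS; b++) {
        if (in_use[b] + in_free[b] == 0)
            printf("block %2d: missing (add it to the free list)\n", b);
        else if (in_free[b] > 1)
            printf("block %2d: duplicated in the free list (rebuild it)\n", b);
        else if (in_use[b] > 1)
            printf("block %2d: shared by %d files (copy it!)\n", b, in_use[b]);
        else if (in_use[b] == 1 && in_free[b] == 1)
            printf("block %2d: both in use and free (remove from free list)\n", b);
        /* otherwise the block is consistent: exactly one 1 in one table */
    }
}

int main(void)
{
    in_use[5] = 2;                 /* same block claimed by two files */
    in_free[4] = 2;                /* listed as free twice            */
    /* block 2 is referenced nowhere: a "missing" block               */
    for (int b = 0; b < NBLOCKS; b++)
        if (b != 2 && b != 4 && b != 5 && in_use[b] == 0)
            in_free[b] = 1;        /* everything else is simply free  */
    check();
    return 0;
}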

If the file system is consistent, every block will have a 1 either in the first table or in the second table, as shown in the figure below.

But when the system crashes, the two tables might look like this

Disk block 2 does not occur in either table; it is said to be a missing block. Although missing blocks do no real harm, they waste space and reduce the disk's capacity. The fix is easy: the file system checker just adds them to the free list.

Another possible scenario is shown below

Here, block 4 occurs twice in the free list. The solution is equally simple: just rebuild the free list.

The worst case scenario is the same data block in two or more files, as shown below

For example, if either of the files containing block 5 is deleted, block 5 will be added to the free list, leading to a block that is both in use and free at the same time. If both files are deleted, the block will appear in the free list twice.

The appropriate action for the file system checker is to allocate a free block, copy the contents of block 5 into it, and insert the copy into one of the files. The information content of the files is unchanged (although one of them is almost certainly garbled), but the file system structure is at least made consistent. The error should be reported to the user for inspection.

In addition to checking that every block is correctly counted, the checker also examines the directory system. It, too, uses a table of counters, but now there is one counter per file rather than per block. The program starts at the root directory and recursively descends the tree, inspecting every directory in the file system; for every file it finds in a directory, it increments that file's counter.

Note that a file may appear in two or more directories because of hard links. Symbolic links do not count and do not cause the target file's counter to be incremented.

When the checker is done, it has a table indexed by inode number telling how many directories contain each file. It then compares these numbers with the link counts stored in the inodes. If an inode's link count is greater than the number of directory entries, then even if all those directory entries are removed, the count will not drop to zero and the inode will never be released. This error is not serious, but it wastes space, since there are files that do not belong to any directory.

The other kind of error is potentially catastrophic. If two directory entries point to the same file but the inode's link count is only 1, removing either directory entry drops the count to 0. When that happens, the file system marks the inode as unused and releases all of its blocks, leaving the other directory pointing to an unused inode whose blocks may soon be handed out to another file.

File System Performance

Disk accesses are far slower than memory accesses, so it is worth looking at the numbers again.

Reading a 32-bit word from memory takes about 10 ns. Reading from a hard disk proceeds at roughly 100 MB/s, which already makes each 32-bit word four times slower to fetch, and on top of that come 5-10 ms of seek time and rotational delay. If only a single word is needed, memory access is on the order of a million times faster than disk access. Given this gap, optimizing disk usage is essential; we discuss several techniques below.

The cache

The most common technique to reduce the number of disk accesses is to use block cache or buffer cache. A cache is a series of blocks that logically belong to disk but are actually held in memory for performance reasons.

There are various algorithms for managing the cache, but a common one is to check every read request to see whether the needed block is already in the cache. If it is, the read can be satisfied without a disk access. If it is not, the block is first read into the cache and then copied to wherever it is needed; subsequent requests for the same block are then satisfied from the cache.

The operation of caching is shown below

Since the cache holds many blocks, some way is needed to determine quickly whether a given block is present. The usual approach is to hash the device and disk address and look the result up in a hash table; blocks with the same hash value are chained together on a linked list (doesn't this data structure look a lot like a HashMap?), so the collision chain can be searched for the desired block.
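A minimal sketch of such a hashed block cache is shown below; the structure and names are invented, and no replacement policy or dirty-block handling is included.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define HASH_BUCKETS 64

struct buf {
    int dev;
    long blockno;
    struct buf *hash_next;       /* collision chain within one bucket */
    char data[1024];
};

static struct buf *bucket[HASH_BUCKETS];

static unsigned hash(int dev, long blockno)
{
    return (unsigned)(dev * 31 + blockno) % HASH_BUCKETS;
}

/* Return the cached block, or NULL if it must be read from disk first. */
static struct buf *cache_lookup(int dev, long blockno)
{
    for (struct buf *b = bucket[hash(dev, blockno)]; b; b = b->hash_next)
        if (b->dev == dev && b->blockno == blockno)
            return b;                      /* cache hit: no disk access */
    return NULL;                           /* cache miss                */
}

static void cache_insert(struct buf *b)
{
    unsigned h = hash(b->dev, b->blockno);
    b->hash_next = bucket[h];
    bucket[h] = b;
}

int main(void)
{
    struct buf *b = calloc(1, sizeof(*b));
    b->dev = 0; b->blockno = 7;
    cache_insert(b);
    printf("block 7: %s\n", cache_lookup(0, 7) ? "hit" : "miss");
    printf("block 8: %s\n", cache_lookup(0, 8) ? "hit" : "miss");
    free(b);
    return 0;
}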

When the cache is full and a new block must be brought in, some block has to be evicted from the cache, and if it has been modified since it was read in, it must first be written back to disk. This situation is very much like paging, and all the usual page-replacement algorithms described earlier (see mp.weixin.qq.com/s/5-k2BJDgE… if you need a refresher) apply: FIFO, second chance, LRU, the clock algorithm, the aging algorithm, and so on. They all work well for the block cache too.

Block read-ahead

A second technique for noticeably improving file system performance is to try to get blocks into the cache before they are needed, raising the hit rate. Many files are read sequentially. When the file system is asked for block k of a file, it delivers it, but when it is done it checks the cache to see whether block k + 1 is already there. If it is not, it schedules a read for block k + 1, in the hope that the block will already be in the cache by the time it is needed.

Of course, this read-ahead strategy only works for files that are actually being read sequentially. For a randomly accessed file, read-ahead does not help at all; it can even get in the way.
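A simple way to implement this caveat is to keep a per-file flag that tracks whether accesses look sequential and to prefetch only while the flag is set. The sketch below is a hypothetical illustration of that heuristic, not any particular file system's code.

#include <stdio.h>

struct open_file {
    long last_block_read;     /* -1 before the first read                 */
    int  sequential;          /* heuristic flag, cleared on a random seek */
};

static void note_read(struct open_file *f, long k)
{
    f->sequential = (f->last_block_read == -1 || k == f->last_block_read + 1);
    f->last_block_read = k;
}

static void maybe_prefetch(struct open_file *f, long k)
{
    if (f->sequential)
        printf("schedule asynchronous read of block %ld\n", k + 1);
}

int main(void)
{
    struct open_file f = { -1, 0 };
    long pattern[] = { 0, 1, 2, 9, 10 };   /* sequential reads, then a seek */
    for (int i = 0; i < 5; i++) {
        note_read(&f, pattern[i]);
        maybe_prefetch(&f, pattern[i]);    /* no prefetch right after the seek */
    }
    return 0;
}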

Reducing disk arm motion

Caching and read-ahead are not the only ways to improve file system performance. Another important technique is to place blocks that are likely to be accessed in sequence close to each other, preferably in the same cylinder, so as to reduce disk-arm motion. When writing an output file, the file system has to allocate blocks one at a time, as they are needed. If the free blocks are recorded in a bitmap, and the whole bitmap is in main memory, it is easy to choose a free block as close as possible to the previous one. With a free list, part of which lives on disk, it is much harder to allocate adjacent blocks.

Even with a free list, however, some block clustering can be done. The trick is to track disk storage in units of consecutive groups of blocks rather than single blocks. If a sector holds 512 bytes, the system might use 1-KB blocks (2 sectors) but allocate disk storage in units of 2 blocks (4 sectors). This is not the same as having 2-KB disk blocks, since the cache and the disk transfers still use 1-KB blocks, but reading a file sequentially on an otherwise idle system halves the number of seeks, which improves performance considerably. A variation on the same theme takes rotational positioning into account: when allocating blocks, the system tries to place consecutive blocks of a file in the same cylinder.

Another performance bottleneck in systems that use inodes, or any inode-like system, is that reading a very short file also requires two disk accesses: one to the inode and one to the block. Typically, inodes are placed as shown below

All the inodes are near the start of the disk, so the average distance between an inode and its blocks is about half the number of cylinders, which requires long seeks.

One easy improvement is to put the inodes in the middle of the disk rather than at the start, halving the average seek between an inode and its first block. Another idea, shown in figure B, is to divide the disk into cylinder groups, each with its own inodes, data blocks, and free list.

Of course, talking about seek time and rotational delay only makes sense if the disk has a moving arm. More and more computers use solid-state drives (SSDs), which have no moving parts at all; for them, many of the problems of traditional hard disks disappear, since random accesses and sequential accesses run at similar speeds. Unfortunately, SSDs raise new problems of their own.

Disk defragmentation

After the operating system has been installed, files are continually created and deleted, and the disk becomes fragmented: the blocks of a newly created file end up scattered all over the disk, which hurts performance, and deleting files frees their blocks, leaving holes behind.

Disk performance can be restored by moving files so that they are adjacent and by placing all (or at least most) of the free space in one or more large contiguous regions. Windows has a program, defrag, that does exactly this, and Windows users should run it regularly, except on SSDs.

Defragmenters run harmlessly on ordinary hard-disk file systems. Linux file systems (especially ext2 and ext3) generally suffer less from fragmentation than Windows because of the way they choose disk blocks, so manual defragmentation is rarely needed. Moreover, SSDs are not really affected by fragmentation at all; in fact, defragmenting an SSD is counterproductive: it does not improve performance but does add wear, so it only shortens the SSD's life.
