System Architecture Design Notes (5) - Operating system file management

The management of computer by operating system includes two aspects: hardware resources and software resources. Hardware resource management includes CPU management, memory management, and device management. It mainly solves the problem of effective and rational use of hardware resources.

Software resources include all kinds of system programs, all kinds of applications, all kinds of user programs, and also include a lot of documentation materials, library functions and so on. Each software resource itself is a collection of relevant information with certain logical meaning, which is stored in the form of files in the operating system.

One of the important functions of computer system is to process a large amount of information quickly, so the organization, access and protection of data has become a very important content. The file system is an important part of the operating system for organizing, accessing, and protecting data.

The functions of file management include: creating, modifying and deleting files; Access files by filename; To decide the storage location, storage form and access permission of file information; Manage the communication between files and provide file sharing, protection and confidentiality. Allow multiple users to work together without causing confusion.

File sharing means that a file can be shared by multiple users to reduce repetitive work, save file storage space, and reduce the number of input/output files.
File protection is mainly to prevent damage to files caused by improper operations.
The files are kept secret to prevent unauthorized users from accessing them.

The protection and confidentiality of files is actually a problem of users’ access control to files. Two levels of control are generally set up for file access: the first level is visitor identification, that is, to specify who can access; Level 2 is the recognition of access rights, that is, the right to participate in what the visitor can do to the file.

1 Logical structure of the file

The file structure refers to the file organization form, which is called the logical structure of the file from the perspective of the user.

Files are logically organized for user convenience. The logical structure of general files can be divided into two types: unstructured character stream files and structured record files. A record file consists of records, that is, the information in a file is divided into multiple records, and the information is organized and used by records.

Record files include sequential files, indexed sequential files, indexed files, and direct files. (1) Sequential files. Most files are sequential files. The record length of the sequential file is fixed, and the type length and order of the data items in the record are fixed. Generally, there is a data item that can uniquely identify the record, called key, and the record is organized by the convention order of key values. Sequential files are often used in batch applications and do not perform well for querying or updating a record. (2) Index sequential files. The index order file is organized based on the convention order of the keys and maintains the index and overflow regions of the keys. Key indexes can also be multilevel indexes. Indexed sequential files are suitable for both interactive and batch applications. (3) Index file. Index files are organized based on a key data item of the record. Many applications need to access files according to other data items. For this purpose, index file method is often adopted, that is, to build indexes for the records in the main file according to the required data items (one or several). The index file itself is a sequential file organization. (4) Direct documents. Direct files are also called Hash files. Records are accessed directly (randomly) with their physical address on the direct access storage device. Direct files are often used in applications that require high-speed access to files and access only one record at a time.

2 Physical structure of the file

The physical structure of files refers to how files are stored on storage devices. The physical structure of files focuses on improving memory efficiency and reducing access time. File storage devices are usually divided into physical blocks of equal size, which are the basic units for allocating and transferring information. The physical structure of a file depends on the blocking policy and file allocation policy of the file storage device, and determines the storage location of file information on the storage device. Common file allocation policies are as follows:

(1) Sequential allocation (continuous allocation). That’s the easiest way to do it. A set of contiguous physical blocks is pre-allocated when a file is created, and the information (or records) is then stored in the physical blocks in order of the information (or records) in the logical file. In this way, only need to know the starting position of the file on the file storage device and the file length, can access, this allocation method is suitable for sequential access, in the continuous access to adjacent information, the access speed is fast. Its disadvantage is that the information length of the file must be specified when the file is created, and cannot be dynamically increased later. It is generally not suitable for files that need to be modified frequently.

(2) link allocation (series allocation). This is done individually by individual physical blocks. Each physical block (usually the last unit) has a pointer to the address of the next physical block to which it is connected, so that all physical blocks are linked together to form a link queue. When creating a linked file, you do not need to specify the length of the file. In the description of the file, you only need to specify the number of the first physical block of the file, and the length of the linked file can be dynamically increased. A block of information can be inserted or deleted by simply adjusting the pointer between the physical blocks. The advantage of link allocation is that it can solve the problem of memory fragmentation and improve the utilization of storage space. Because link files can only be searched according to the order of link Pointers in the queue, the search efficiency is low, and generally only applies to sequential access, not random access. (3) Index allocation. This is another method of discontinuous allocation of file storage. A system that uses the index allocation method creates an index table for each file. Each entry in the index table indicates the logical block number and the corresponding physical block number of the file information. Index allocation can not only meet the requirement of file dynamic growth, but also realize random access conveniently and rapidly. For some large files, index table allocation problems can occur when the index table size exceeds one physical block. Multi-level (indirect indexing) techniques are commonly used, in which the physical blocks indicated by the index table store not the file location but the physical block address where the file information is stored. If a physical block can store N addresses, then a level of indirect indexing reduces the addressable file length to n ^ 2 blocks. Secondary or even tertiary indirect indexes can be used for larger files (for example, UNIX operating systems use tertiary index structures).

Index files have the advantage of being suitable for both sequential and random access. The disadvantage is that indexed tables increase the storage space overhead. In addition, to access a file, you need to access the disk twice, once to access the index table, and once to access the file information based on the physical block number provided by the index table. An improved way to improve efficiency is to pre-load the index table into memory before operating on a file. In this way, file access can be directly determined from the index table in memory corresponding to the physical block number, thus only need to access the disk once.

3 File Storage Device Management (Storage Space Management)

File storage device management means that the operating system must manage storage space effectively. Since file storage devices are divided into many physical blocks of the same size and exchange information in blocks, the management of file storage devices is essentially a problem of organizing and managing free blocks. It includes the organization of free blocks, the allocation of free blocks and the recovery of free blocks.

There are three different free block management methods: indexing, linking, and bitmap.

(1) Index method. Indexing takes free blocks as files and uses indexing techniques. To be effective, an index corresponds to a free area or a number of free blocks. In this way, each free block on disk corresponds to an entry in the index table, which effectively supports every file allocation method.

(2) Link method. Linking method is divided into two types: free disk block chain and free disk area chain:

[1] Free disk block chain: Use a linked list to organize free blocks together. When the applicant needs free blocks, the allocator picks the required free blocks from the head of the chain. Instead, the hypervisor hooks the reclaimed free blocks to the end of the queue one by one, and this applies to each file allocation method. Free blocks can be linked by the order in which they are released, or by the size of the free block. The latter favors requests for consecutive free blocks, but there is a bit more overhead when allocating requests and reclaiming free blocks.

[2] Free zone chain: All free zones on the disk are drawn into a chain, and each zone contains several Pointers to indicate the next free zone and information about the size of the zone. When allocating disk blocks, the first adaptation algorithm (explicit chaining method) is usually used. When you recycle, merge the recycle area with the free disk area.

(3) bitmap method. This method is to establish a Bitmap on the external memory to record the use of file memory. Each bit corresponds to only one physical block on the file storage. 0 and 1 indicate free and occupied respectively. The physical blocks on file storage are numbered: 0, 1, 2… . If a word in the system is 32 bits long, that is, a word can represent 32 physical blocks. Given 4096 physical blocks, it takes 128 words (4096/32=128) to represent these physical blocks.

A bitmap represents the usage of a block on a disk using one bit of binary, as shown in Figure 2. If the value is 0, the corresponding disk block is free. If the value is 1, it indicates that the vm has been allocated. A set composed of the corresponding bits of all disk blocks is called a bitmap.

The following is an example: A disk group has 10 disks, and each disk has 100 tracks. Each track has 32 sectors. Assume that the size of a physical block is two sectors, and the physical block is allocated.

If bitmaps are used to manage disk space, how many bytes of space do bitmaps take up?
If blank files are used to manage disk space and each entry of the blank file directory occupies 5 bytes, when the number of blank files is greater than the number of bytes occupied by the blank file directory is greater than the number of bytes occupied by the bitmap?

Analysis process:

Total sectors = 10 * 100 * 32=32000; If each physical block has two sectors, the total number of physical blocks = 3200/2 =16000.
Assume bitmaps are used to manage disk space. A byte is usually 8 bits, so all of these physical blocks require 2000 (16000/8) bytes of space.
Assume that a blank file is used to manage disk space. If an entry occupies five bytes, 2000/5=400. That is, when the number of blank text items is greater than 400, the number of bytes occupied by the file directory is greater than the number of bytes occupied by the bitmap.

(4) Group linking method Group linking method is a free disk block management method combining the free table and free linked list method, which is suitable for large file systems.

4 Tree directory structure

In the computer file system, the tree directory structure is generally adopted. In the tree directory structure, the root of the tree is the root directory, the data file is the leaf, and all other directories are the nodes of the tree.

The root directory is hidden in a partition of a hard disk. The root directory is at the top level. The subdirectories it contains are level 1 subdirectories. Each level 1 subdirectory can contain several level 2 subdirectories… Such an organizational structure is called a directory tree.

The current disk and current directory are the default operation objects. If the user does not specify the operation object, the system points the user command to the current disk and the current directory.

A path is a sequence of all directories in the directory tree from the root or current directory to the access object (directory or file). For example, c:\temp\ is a path in Windows. In the tree directory structure, there is only one unique path from the root directory to any data file. Starting from the root, all directory names and data files are connected by “/” (UNIX/Linux system) or “\” (Windows system) to form the path name of the data file. And the path name of each data file is unique. In this way, the problem of file duplication can be solved.

The path starting from the root of the tree is an absolute path, which is not very convenient if the file system has many levels. Therefore, the relative path is introduced, that is, the path starts from the current directory, then passes through the intermediate directory files, and finally reaches the data file to be accessed.

An absolute path gives a complete description of the location of a file or directory, usually starting at the top of the hierarchy (root) and usually starting with a “/” (UNIX/Linux systems) or a drive letter (Windows systems). Relative paths typically start at the current location in the directory structure and are generally shorter than absolute paths.

The parent directory is the directory at the upper level of the current path. Each directory has a “representing the current directory. File and “.. “representing the parent directory of the current directory. File, relative path name is generally from “..” In the first place.

In Windows, there are two formats of files, FAT32 (FAT16) files and NTFS files. NTFS generates less disk fragmentation than FAT32, is more secure, and supports a larger single file capacity than 4GB, which is especially suitable for today’s mass storage. NTFS can support partitions (called volumes if dynamic disks are used) up to 2TB in size, while FAT32 in Windows 2000 supports partitions up to 32GB in size.

mo4tech.com (Moment For Technology) is a global community with thousands techies from across the global hang out!Passionate technologists, be it gadget freaks, tech enthusiasts, coders, technopreneurs, or CIOs, you would find them all here.

System Architecture Design Notes (5) – Operating system file management

1 Logical structure of the file

2 Physical structure of the file

3 File Storage Device Management (Storage Space Management)

4 Tree directory structure

System Architecture Design Notes (5) – Operating system file management

1 Logical structure of the file

2 Physical structure of the file

3 File Storage Device Management (Storage Space Management)

4 Tree directory structure

Related Posts

Fast search data crawler technology example installation teaching

Diy web server Development (2)

Old languages are still strong. Why can’t new languages like GO and Kotlin break the game?