Have you ever run into this: a folder holds so many files that ls takes a long, long time to list them? If so, have you thought about why, and how to fix it? To understand the cause of this problem, we need to start with the disk space that folders themselves occupy.

Inode consumption verification

In "How much disk space does a new empty file take?" I mentioned that each file consumes a small amount of space in its parent folder. Folders consume inodes too. Let's take a look at the current inode usage:

# df -i  
Filesystem            Inodes   IUsed   IFree IUse% Mounted on
......
/dev/sdb1            2147361984 12785020 2134576964    1% /search

Now create an empty folder:

# mkdir temp
# df -i
Filesystem            Inodes   IUsed   IFree IUse% Mounted on
......
/dev/sdb1            2147361984 12785021 2134576963    1% /search

As you can see from IUsed, an empty folder consumes one inode, just like an empty file. But an inode is tiny (only 256 bytes on my machine), so it can't be what makes the ls command hang.

Block consumption verification

Where is a folder's name stored? Just like for files in "How much disk space does a new empty file take?", an ext4_dir_entry_2 (for ext4, defined in fs/ext4/ext4.h) is consumed and placed in a block belonging to its parent directory. From this you can quickly guess: if you create a bunch of files inside a directory, their entries will occupy the directory's own blocks. Let's try it out:

# mkdir test
# cd test
# du -h
4.0K    .

Four kilobytes means one block has been consumed. Empty files don't consume blocks, but an empty directory consumes one right away, because every directory must contain two entries by default: "." and "..". And the 4K isn't necessarily this size on your machine; it's the block size chosen when the filesystem was formatted.
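You can read that block size back from the filesystem itself. A quick check (the tune2fs variant needs root, and /dev/sdb1 is the device from the df output above):

```shell
# Read the block size of the filesystem under the current path (no root needed):
stat -fc 'block size: %s' .

# Or read it from the ext4 superblock directly (needs root):
# tune2fs -l /dev/sdb1 | grep 'Block size'
```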

Let’s create two new empty files and check again:

# touch aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab
# touch aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
# du -h
4.0K    .

Not much seems to have changed. This is because:

  • First, new empty files don't occupy blocks of their own, so du still shows only the blocks occupied by the directory.
  • Second, the 4KB block allocated to the directory earlier still has enough free space to hold the two new directory entries.

Let's try creating a lot more. Use a script to create 100 empty files, each with a 32-byte name:

#! /bin/bash
for((i=1; i<=100; i++));
do
    file="tempDir/"$(echo $i|awk '{printf("%032d",$0)}')
    echo $file
    touch $file
done
# du -h
12K    .

Haha, now we can see that the directory takes up 12K, that is, three blocks of disk space. By the time we get to 10,000 files:

# du -h
548K     .

Each ext4_dir_entry_2 contains not only the file name but also information such as the inode number:

struct ext4_dir_entry_2 {
        __le32  inode;                  /* Inode number */
        __le16  rec_len;                /* Directory entry length */
        __u8    name_len;               /* Name length */
        __u8    file_type;
        char    name[EXT4_NAME_LEN];    /* File name */
};

Let's calculate: average space per file = 548K / 10000 ≈ 56 bytes. In other words, a bit more than our 32-byte file names, which is roughly what we'd expect. Note also: the longer a file name, the more space its entry consumes in the parent directory.
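As a sanity check on that number, we can compute the theoretical on-disk size of one entry from the struct above: the fixed header is 8 bytes (inode 4 + rec_len 2 + name_len 1 + file_type 1), the name follows, and ext4 rounds rec_len up to a multiple of 4:

```shell
# Padded size of one ext4_dir_entry_2 for a given name length:
# 8-byte fixed header + name, rounded up to a 4-byte boundary.
name_len=32
entry=$(( (8 + name_len + 3) / 4 * 4 ))
echo "$entry"    # prints 40
```

So each of our entries needs at least 40 bytes; the measured average of ~56 bytes is larger because directory blocks are not packed perfectly full (and, once a directory grows large, it also carries index blocks).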

Conclusion

A folder of course also consumes disk space.

  • First, it consumes an inode, which on my machine is 256 bytes.
  • It consumes a directory entry (an ext4_dir_entry_2) in its parent directory, which stores its inode number and name.
  • If you create files or subdirectories inside it, it needs blocks of its own to hold an array of ext4_dir_entry_2 entries.

The more files and subdirectories a directory holds, the more blocks it needs to allocate. In addition, ext4_dir_entry_2 is not fixed-size: the longer a file or subdirectory name, the more space a single directory entry consumes.

As for the problem in the opening paragraph, you can now see why. The culprit is the directory's blocks: a folder with many files under it, especially files with long names, consumes a lot of blocks. While the folder is being traversed, any block that misses the page cache falls through to disk and triggers real IO. From your point of view, ls just hangs after you run it.

So you may ask: I really do have to store a lot of files, what do I do? The fix is simple: create more folders, and don't store too many files in any single directory. In engineering practice, the common approach is to hash files into multiple directories, with one or even two levels of hashing, keeping the number of files per directory below 100,000 or even 10,000.
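A minimal sketch of that hashing scheme (the data directory and the file name here are made up for illustration): take the first four hex characters of the name's md5 to pick a two-level bucket, giving 256 × 256 = 65536 directories:

```shell
#!/bin/bash
# Place a file into a two-level hashed directory tree so that no
# single directory accumulates too many entries.
name="photo_001.jpg"                            # illustrative file name
h=$(printf '%s' "$name" | md5sum | cut -c1-4)   # first 4 hex chars of md5
d1=$(printf '%s' "$h" | cut -c1-2)
d2=$(printf '%s' "$h" | cut -c3-4)
dir="data/$d1/$d2"                              # e.g. data/3f/a2 (bucket varies)
mkdir -p "$dir"
touch "$dir/$name"
echo "$dir/$name"
```

Since the bucket is derived deterministically from the name, reads use the same computation to locate the file without any lookup table.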

Ext bug

It looks like we’re done for today, so let’s delete all the files we just created and take a look.

# rm -f *
# du -h
72K     .

Wait, what's going on? All the files in this folder have been deleted; why does it still occupy 72KB of disk space? This question stayed with me for a long time before I finally worked it out.

The key is the rec_len field in ext4_dir_entry_2. It stores the length of the current ext4_dir_entry_2 record, so that when the operating system traverses the folder, it can add this length to the current pointer to find the entry for the next file. The advantage is that traversal is very convenient, somewhat like walking a linked list, one entry after another. But deletion becomes a little troublesome: the current entry cannot simply be removed, or the chain would be broken. So when Linux deletes a file, it doesn't reclaim the whole ext4_dir_entry_2 object; it just sets the inode field in the directory entry to 0. In effect this is a logical (soft) delete, the same trick you often use in engineering, and it means the blocks the directory has claimed are never given back. The XFS filesystem doesn't have this problem, but I haven't studied how it avoids it. If you know the answer, please leave a comment!



More articles in the hard disk series:

  • 1. Disk opening: remove the hard coat of mechanical hard disk!
  • 2. Disk partitioning is also a technical trick
  • 3. How can we solve the problem that mechanical hard disks are slow and easy to break?
  • 4. Disassemble the SSD structure
  • 5. How much disk space does a new empty file occupy?
  • 6. How much disk space does a 1-byte file occupy
  • 7. Why is the ls command stuck when there are too many files?
  • 8. Understand formatting principles
  • 9. Read file How much disk IO actually occurs per byte?
  • 10. When to initiate disk I/O write after one byte of the write file?
  • 11. Mechanical hard drive random IO is slower than you might think
  • 12. How much faster is an SSD server than a mechanical one?

My public account is "Developing Internal Skills". There I don't simply present technical theory, nor only practical experience, but combine the two: using practice to deepen the understanding of theory, and theory to improve your practical skills. Welcome to follow, and please share it with your friends!