Preface

Oddly enough, on my first day back at work, the distributed file system in our production environment crashed. I had just reached my desk and sat down when the phone rang. It was operations: "Hey, Ice, take a look, quick!" An incident, I'm not even in operations, and you're calling me to look at production? It turned out the O&M colleague hadn't come in to work yet. Fine, I took it on: I quickly tidied up my workstation, got out my computer, logged in to the server and went at it like a tiger. Ten minutes later the problem was fixed, and all that remained was the asynchronous replication of the images and videos.

Today I'd like to share the problem we hit with the distributed file system in production, and how I tracked it down and fixed it in 10 minutes. Note that this article is not written against the production environment itself, but against an environment I simulated afterwards on a local virtual machine. The approach to solving the problem is the same.

Well, I suspect the O&M guy is in line for a 3.25 (a poor performance rating)!!

This article has been included in:

Github.com/sunshinelyz…

Gitee.com/binghe001/t…

Locating the problem

After logging in to the server and checking the system access log, I found the following exception in the log file:

org.csource.common.MyException: getStoreStorage fail, errno code: 28
	at org.csource.fastdfs.StorageClient.newWritableStorageConnection(StorageClient.java:1629)
	at org.csource.fastdfs.StorageClient.do_upload_file(StorageClient.java:639)
	at org.csource.fastdfs.StorageClient.upload_file(StorageClient.java:162)
	at org.csource.fastdfs.StorageClient.upload_file(StorageClient.java:180)

Clearly, the system cannot upload files. This log message is very important and played a crucial role in troubleshooting: errno 28 corresponds to ENOSPC ("No space left on device") on Linux, a first hint that disk space is involved.

Analyzing the cause

Since uploads were failing, I first checked whether files uploaded earlier could still be accessed. They could, which confirmed that the problem was limited to uploading new files.

The distributed file system in production normally handles uploads without trouble, so the most likely cause was insufficient disk space on the server. I followed up on that.

Sure enough, when I ran df -h to check the server's storage usage, it had reached 91%.
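To repeat this check from a script rather than reading df -h by eye, a small helper like the one below can extract the usage figure for whichever filesystem holds the storage directory. This is a minimal sketch; the function name disk_used_pct is my own, and it assumes a df that supports the POSIX -P output format (GNU coreutils and BSD df both do).

```shell
#!/usr/bin/env bash
# Print the "Capacity" (Use%) value, without the % sign, for the filesystem
# that contains the given path. -P selects the POSIX output format, which
# keeps each filesystem on a single line even when the device name is long.
disk_used_pct() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}
```

For example, disk_used_pct /opt/fastdfs_storage_data should print a value like 91 on a server in the state described above.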

So disk space could well be the cause. Next, I checked whether it really was.

I opened tracker.conf in /etc/fdfs/ and saw that the reserved storage space was set to 10% (note: the distributed file system here is FastDFS).
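The reserved value can also be read straight from the configuration. The sketch below assumes the option is named reserved_storage_space, as in the stock FastDFS tracker.conf; the helper name reserved_space is hypothetical. FastDFS also accepts an absolute size instead of a percentage here, in which case the function simply returns that string.

```shell
#!/usr/bin/env bash
# Print the value of reserved_storage_space from a tracker.conf file.
# Commented-out lines (starting with #) do not match, only the first active
# setting is used, and all whitespace is stripped from the result.
reserved_space() {
  sed -n 's/^[[:space:]]*reserved_storage_space[[:space:]]*=//p' "$1" \
    | head -n 1 | tr -d '[:space:]'
}
```

For the configuration described above, reserved_space /etc/fdfs/tracker.conf prints 10%.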

This confirms that the problem is caused by insufficient disk space.

The reason: 91% of the server's disk space is in use, so only 9% is free, while the FastDFS configuration reserves 10%. When a file is uploaded, the system detects that the remaining disk space is below the 10% reserve, throws an exception and refuses the upload.
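The rule is simple arithmetic, and the sketch below models it in shell. It is a simplification: it assumes the reserve is given as a percentage and that df's Use% is a fair proxy for the free space FastDFS sees. As I understand the comment in the stock tracker.conf, uploads to a group are refused when a storage server's free space is less than or equal to reserved_storage_space, so the check uses <=. The function name uploads_blocked is hypothetical.

```shell
#!/usr/bin/env bash
# Simplified model of the reserved-space rule. Arguments are integer
# percentages without the % sign: disk usage, then the reserved share.
# Returns 0 (true) when uploads would be refused, 1 otherwise.
uploads_blocked() {
  local used_pct=$1 reserved_pct=$2
  local free_pct=$(( 100 - used_pct ))
  if (( free_pct <= reserved_pct )); then
    echo "blocked: free ${free_pct}% <= reserved ${reserved_pct}%"
    return 0
  fi
  echo "ok: free ${free_pct}% > reserved ${reserved_pct}%"
  return 1
}
```

With the numbers from this incident, uploads_blocked 91 10 prints "blocked: free 9% <= reserved 10%".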

Now that the cause of the problem has been identified, it’s time to fix it.

Solving the problem

There are two ways to solve this problem: delete unnecessary files, or expand the disk space.

Delete unnecessary files

Use this approach with care; I only cover it briefly here, with a few commands for deleting files recursively.

Recursively delete .pyc files:

find . -name '*.pyc' -exec rm -rf {} \;

Print files of a specific size (145800 bytes) under the current directory:

find . -name "*" -size 145800c -print

Recursively delete files of a specific size (145800 bytes):

find . -name "*" -size 145800c -exec rm -rf {} \;

Recursively delete files of a specific size and print their paths:

find . -name "*" -size 145800c -print -exec rm -rf {} \;

Here is a brief explanation of the options used above:

  • "." starts the recursive search from the current directory
  • -name '*.exe' matches files or directories by name, here everything ending in .exe
  • -type f restricts matches to regular files (not used in the commands above)
  • -print prints the path of each match
  • -size 145800c matches files of exactly 145800 bytes (the c suffix means bytes)
  • -exec rm -rf {} \; runs rm -rf on each match, with {} replaced by its path
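Because -exec rm -rf deletes immediately, I would suggest previewing the matches first. A minimal two-step sketch, assuming GNU or BSD find (the -delete action is a common extension, not POSIX); the function names and their directory/pattern arguments are placeholders of my own:

```shell
#!/usr/bin/env bash
# Step 1: list regular files under DIR whose names match PATTERN.
# Nothing is deleted.
preview_delete() {
  find "$1" -type f -name "$2" -print
}

# Step 2: after reviewing the preview, delete the same set of files,
# printing each path as it is removed. -type f leaves directories alone.
do_delete() {
  find "$1" -type f -name "$2" -print -delete
}
```

For example, run preview_delete . '*.pyc', check the list, then run do_delete . '*.pyc'. The -size test shown earlier can be added to both find commands in the same way.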

Expanding disk space

I (Glacier) recommend this approach, and it is the one I used to fix the failure in production.

Checking the server's disks, I found that /data had 5 TB of space. Why hadn't the O&M folks pointed the file system's data storage directory at /data? So I started migrating the FastDFS data storage directory to /data, as shown below.

Note: here I simply simulate the migration by moving /opt/fastdfs_storage_data (along with the other FastDFS directories under /opt) to /data.

(1) Copy files and migrate data

cp -r /opt/fastdfs_storage_data  /data
cp -r  /opt/fastdfs_storage  /data
cp -r /opt/fastdfs_tracker /data 
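The cp -r commands above copy the directories, but they do not preserve ownership or timestamps, which can matter if FastDFS runs as a non-root user. A variant I would consider, sketched below under that assumption, uses cp -a (which preserves ownership, mode, timestamps and copies symlinks as symlinks) and then compares entry counts as a quick sanity check. The function name copy_and_verify is hypothetical, and files written under /opt while the copy runs would not be picked up.

```shell
#!/usr/bin/env bash
# Copy SRC into DEST_PARENT with cp -a, then check that the copy contains
# the same number of entries (files, directories and symlinks) as SRC.
copy_and_verify() {
  local src=${1%/} dest_parent=$2
  cp -a "$src" "$dest_parent" || return 1
  local copy="$dest_parent/$(basename "$src")"
  local n_src n_copy
  n_src=$(find "$src" | wc -l)
  n_copy=$(find "$copy" | wc -l)
  if [ "$n_src" -ne "$n_copy" ]; then
    echo "mismatch: $n_src entries in $src, $n_copy in $copy" >&2
    return 1
  fi
  echo "copied $n_src entries to $copy"
}
```

Usage: copy_and_verify /opt/fastdfs_storage_data /data, and likewise for the other two directories.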

(2) Update the paths

The following FastDFS configuration files need to be modified: /etc/fdfs/storage.conf, /etc/fdfs/mod_fastdfs.conf, /etc/fdfs/client.conf and /etc/fdfs/tracker.conf.

  • /etc/fdfs/storage.conf
store_path0=/data/fastdfs_storage_data
base_path=/data/fastdfs_storage
  • /etc/fdfs/mod_fastdfs.conf
# store_path0 appears in two places in this file; change both
store_path0=/data/fastdfs_storage_data
base_path=/data/fastdfs_storage
  • /etc/fdfs/client.conf
base_path=/data/fastdfs_tracker
  • /etc/fdfs/tracker.conf
base_path=/data/fastdfs_tracker
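Editing four files by hand is easy to get wrong, so the same substitution can be scripted with sed. This sketch assumes every path to change starts with /opt/fastdfs_, as in this setup; the function name rewrite_paths is hypothetical. Each file keeps a .bak backup, and the attached -i.bak form is accepted by both GNU and BSD sed.

```shell
#!/usr/bin/env bash
# Replace every occurrence of OLD_PREFIX with NEW_PREFIX in the given files,
# keeping a .bak copy of each original. '|' is the sed delimiter so slashes
# in paths need no escaping; the prefixes must not contain '|', '&' or
# other sed regex metacharacters.
rewrite_paths() {
  local old=$1 new=$2
  shift 2
  local f
  for f in "$@"; do
    sed -i.bak "s|${old}|${new}|g" "$f" || return 1
  done
}
```

Usage: rewrite_paths /opt/fastdfs_ /data/fastdfs_ /etc/fdfs/storage.conf /etc/fdfs/mod_fastdfs.conf /etc/fdfs/client.conf /etc/fdfs/tracker.conf, then review the result with grep -n fastdfs_ /etc/fdfs/*.conf.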

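Re-create the symbolic link between M00 and the storage directory. Because cp -r copies the old M00 link as a link that still points into /opt, use -f to replace it and -n so that ln does not create the new link inside the directory the old one points to:

ln -sfn /data/fastdfs_storage_data/data /data/fastdfs_storage_data/data/M00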
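After re-creating the link, it is worth confirming that M00 resolves to the new data directory rather than to the old one under /opt. A minimal check, assuming readlink -f is available (GNU coreutils and recent macOS provide it); check_m00 is a hypothetical name:

```shell
#!/usr/bin/env bash
# Confirm that DATA_DIR/M00 is a symlink that resolves back to DATA_DIR.
# readlink -f canonicalizes both paths, so relative links and symlinked
# parent directories still compare correctly.
check_m00() {
  local data_dir=$1
  local link="$data_dir/M00"
  if [ ! -L "$link" ]; then
    echo "$link is missing or not a symlink" >&2
    return 1
  fi
  if [ "$(readlink -f "$link")" != "$(readlink -f "$data_dir")" ]; then
    echo "$link points to $(readlink "$link"), not $data_dir" >&2
    return 1
  fi
  echo "$link -> $(readlink "$link")"
}
```

Usage: check_m00 /data/fastdfs_storage_data/data.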

(3) Kill the processes and restart the services (tracker and storage)

Run the following commands in sequence

  pkill -9 fdfs
  service fdfs_trackerd start
  service fdfs_storaged start
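To confirm that both daemons came back up, one option is to look for their listening ports. This sketch assumes the FastDFS default ports, 22122 for the tracker and 23000 for storage (check the port setting in tracker.conf and storage.conf if yours differ). It reads ss -ltn output from standard input so it can be tested without a running server; port_listening is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Succeed if TCP port $1 appears in the "Local Address:Port" column of
# `ss -ltn` output read from stdin. The header line is skipped, and the port
# is taken after the last ':' so IPv4, IPv6 and wildcard addresses all work.
port_listening() {
  awk -v port="$1" '
    NR > 1 { n = split($4, a, ":"); if (a[n] == port) found = 1 }
    END { exit(found ? 0 : 1) }'
}
```

Usage: ss -ltn | port_listening 22122 && echo "tracker listening", and the same with 23000 for storage.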

(4) Update the nginx configuration for the file read path

location ~/group1/M00{
	root /data/fastdfs_storage_data/data;
}

(5) Reload nginx

cd /opt/nginx/sbin
./nginx -s reload
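Since a typo in the location block would break file downloads, I would run nginx -t before reloading. A small wrapper, assuming the /opt/nginx/sbin/nginx path used in this article as the default; reload_nginx is a hypothetical name:

```shell
#!/usr/bin/env bash
# Test the nginx configuration first and reload only if the test passes.
# The binary path defaults to the install location used in this article.
reload_nginx() {
  local nginx_bin=${1:-/opt/nginx/sbin/nginx}
  if ! "$nginx_bin" -t; then
    echo "nginx config test failed; not reloading" >&2
    return 1
  fi
  "$nginx_bin" -s reload
}
```

Usage: reload_nginx, or reload_nginx /path/to/nginx for a different install.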

OK, problem solved: operations can upload images and videos normally again.

If you have any questions, leave a comment below or add me on WeChat (SUN_shine_LYz), and I'll add you to the group so we can discuss technology and improve together.