Contents

  • Background
  • Architecture design
  • File transfer protocol
  • Block transfer design
  • NameNode federated architecture
  • Online playback of performance parameters
  • Problem inventory in production environment

Background

Here is the story: in my confucianist architecture class, there is a project in which students develop a production-grade middleware from scratch, a [distributed massive small file system supporting 100 million pictures].

After I finished teaching the project, the technical team deployed it to the production environment and completed a series of stress tests. The project code and stress-test reports were then open-sourced on the code cloud: Ruape middleware – self-developed distributed small file storage system

Some time later, a VC investor unexpectedly contacted us and asked whether we were considering commercializing the distributed massive small file system.

However, the author declined in the end. For those who are interested in this middleware course and want to join the open source community, please scan the QR code (Giotto1245) and we will iterate on this project together.

Many students are not yet familiar with this self-developed distributed small file system for massive data. So what is this small file system? How was its architecture designed? Which techniques does it use? What problems came up during production deployment, and how were they solved? Here is a brief introduction.

Architecture design

The architecture of the distributed small file system for massive data mainly covers:

  • High-concurrency architecture

  • High-availability architecture

  • High-performance architecture

  • Scalable architecture

The overall structure of the project is as follows:

File transfer protocol

File transfer is required in several scenarios in a cluster. For example, clients and DataNodes transfer files when uploading and downloading, and BackupNodes and the NameNode transfer FsImage files.

So we designed a dedicated protocol for file transfer. Each file-transfer network packet contains the packet type, the file metadata, and the file content as binary data.
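As a rough illustration of such a packet, here is a minimal Java sketch. The field order, the length prefixes, the type codes, and the choice of file name and size as the metadata are all assumptions made for this example; the article does not specify the project's actual wire format.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical layout (an assumption, not the project's real format):
 * [type:int][nameLen:int][name bytes][fileSize:long][bodyLen:int][body bytes]
 */
final class FilePacket {
    // Hypothetical type codes; the real protocol's values are not given.
    static final int TYPE_UPLOAD = 1;
    static final int TYPE_DOWNLOAD = 2;

    final int type;        // packet type
    final String fileName; // file metadata: name
    final long fileSize;   // file metadata: total file size in bytes
    final byte[] content;  // file content as binary data

    FilePacket(int type, String fileName, long fileSize, byte[] content) {
        this.type = type;
        this.fileName = fileName;
        this.fileSize = fileSize;
        this.content = content;
    }

    /** Serializes the packet into one byte array using length-prefixed fields. */
    byte[] encode() {
        byte[] name = fileName.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + 4 + name.length + 8 + 4 + content.length);
        buf.putInt(type);
        buf.putInt(name.length);
        buf.put(name);
        buf.putLong(fileSize);
        buf.putInt(content.length);
        buf.put(content);
        return buf.array();
    }

    /** Parses a byte array produced by encode() back into a packet. */
    static FilePacket decode(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int type = buf.getInt();
        byte[] name = new byte[buf.getInt()];
        buf.get(name);
        long fileSize = buf.getLong();
        byte[] content = new byte[buf.getInt()];
        buf.get(content);
        return new FilePacket(type, new String(name, StandardCharsets.UTF_8), fileSize, content);
    }
}
```

Length prefixes let the receiver know where the metadata ends and the binary content begins, so a packet encoded with `encode()` can be read back with `FilePacket.decode(bytes)`.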

Block transfer design

To solve the problem of oversized messages and low transmission efficiency, we designed a block transfer protocol. If the response that the server writes back is large (exceeding the maximum message length) and the request supports block transfer, the server splits the response into n packets. As each packet arrives, the NetClient checks whether the response is complete. Once all the response packets have been received, they are merged into one packet and returned to the user.
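The split-and-merge idea can be sketched in Java as follows. This is only an illustration under stated assumptions: the `Chunk` fields (request id, index, total), the `supportsChunking` flag, and the completeness check by counting received indexes are hypothetical stand-ins for whatever mechanism the real NetClient uses.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** One piece of a split response (hypothetical fields). */
final class Chunk {
    final long requestId; // which request this piece answers
    final int index;      // position of this piece, starting at 0
    final int total;      // how many pieces the response was split into
    final byte[] data;

    Chunk(long requestId, int index, int total, byte[] data) {
        this.requestId = requestId;
        this.index = index;
        this.total = total;
        this.data = data;
    }
}

final class ChunkedTransfer {
    /** Server side: split only if the payload is too large and the request allows it. */
    static List<Chunk> split(long requestId, byte[] payload, int maxLen, boolean supportsChunking) {
        if (maxLen <= 0) throw new IllegalArgumentException("maxLen must be positive");
        if (!supportsChunking || payload.length <= maxLen) {
            return Collections.singletonList(new Chunk(requestId, 0, 1, payload));
        }
        int total = (payload.length + maxLen - 1) / maxLen; // ceiling division
        List<Chunk> chunks = new ArrayList<>(total);
        for (int i = 0; i < total; i++) {
            int from = i * maxLen;
            int to = Math.min(from + maxLen, payload.length);
            chunks.add(new Chunk(requestId, i, total, Arrays.copyOfRange(payload, from, to)));
        }
        return chunks;
    }
}

/** Client side: one assembler per request id (an assumption of this sketch). */
final class ChunkAssembler {
    // TreeMap keeps pieces ordered by index even if they arrive out of order.
    private final Map<Integer, byte[]> parts = new TreeMap<>();

    /** Returns the merged payload once every piece has arrived, otherwise null. */
    byte[] accept(Chunk c) {
        parts.put(c.index, c.data);
        if (parts.size() < c.total) return null;
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] p : parts.values()) out.write(p, 0, p.length);
        return out.toByteArray();
    }
}
```

With these definitions, a 10-byte payload and a maximum length of 4 is split into three chunks of 4, 4, and 2 bytes, and the assembler returns the original bytes only after the third chunk is accepted.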

NameNode federated architecture

To relieve the memory pressure caused by a massive number of small files, we developed a federated NameNode architecture. Simply put, multiple NameNodes form a cluster, and each NameNode stores part of the in-memory directory tree.
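To make the idea of each NameNode holding only part of the metadata concrete, here is a minimal Java sketch. The routing rule (hash of the file path modulo the node count) is an assumption chosen for simplicity, and the class and method names are hypothetical; the article does not say how the project actually partitions the directory tree.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class FederatedNameNodes {
    // One in-memory map per NameNode, standing in for that node's slice
    // of the directory tree (path -> file size in bytes).
    private final List<Map<String, Long>> shards = new ArrayList<>();

    FederatedNameNodes(int nodeCount) {
        if (nodeCount <= 0) throw new IllegalArgumentException("nodeCount must be positive");
        for (int i = 0; i < nodeCount; i++) shards.add(new HashMap<>());
    }

    /** Picks the NameNode responsible for a path (assumed rule: hash modulo node count). */
    int route(String path) {
        return Math.floorMod(path.hashCode(), shards.size());
    }

    /** Records file metadata on the single NameNode that owns the path. */
    void create(String path, long size) {
        shards.get(route(path)).put(path, size);
    }

    /** Looks up metadata on the owning NameNode; returns null if the path is unknown. */
    Long lookup(String path) {
        return shards.get(route(path)).get(path);
    }

    /** Number of file entries held by one NameNode. */
    int filesOn(int node) {
        return shards.get(node).size();
    }
}
```

Because every path maps to exactly one node, the per-node memory footprint shrinks as nodes are added. A plain modulo rule remaps most paths when the node count changes, so a real deployment would likely need a more stable partitioning scheme.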

Online playback of performance parameters

After development was complete, we set up full-link monitoring for file upload and download, covering instantaneous QPS, JVM, network, I/O, and CPU.
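As one example of what measuring instantaneous QPS can look like, here is a minimal Java sketch of a per-second request counter. The class name and the definition used here (the number of requests recorded during the most recently completed second) are assumptions for illustration; the article does not describe the project's monitoring code.

```java
/** Hypothetical meter: reports the request count of the last completed second. */
final class QpsMeter {
    private long currentSecond = -1; // second currently being counted
    private long currentCount = 0;   // requests seen in currentSecond
    private long lastCompletedQps = 0;

    /** Call once per handled request, passing the current time in milliseconds. */
    synchronized void record(long nowMillis) {
        roll(nowMillis / 1000);
        currentCount++;
    }

    /** Returns the count for the second just before the one containing nowMillis. */
    synchronized long qps(long nowMillis) {
        roll(nowMillis / 1000);
        return lastCompletedQps;
    }

    // Moves to a new second. If time skipped ahead by more than one second,
    // the second just before "now" had no traffic, so its QPS is 0.
    private void roll(long second) {
        if (second == currentSecond) return;
        lastCompletedQps = (second == currentSecond + 1) ? currentCount : 0;
        currentSecond = second;
        currentCount = 0;
    }
}
```

In a running service, `record(System.currentTimeMillis())` would be called in the upload and download handlers, and a monitoring thread would poll `qps(System.currentTimeMillis())`. Passing the time in explicitly keeps the sketch easy to test.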

Problem inventory in production environment

After the project was deployed to production, we ran into the following online issues. All of them have been debugged and fixed, and we kept detailed fault records and analyses, which have been compiled into a set of documents.

  • OOM Killer kills the Spring Boot transmitter

  • Request responses time out because the bandwidth is saturated

  • Uneven DataNode traffic

  • 100% CPU usage caused by too many threads

  • Trade-off between throughput and consistency for NameNode file upload requests

  • How to optimize the throughput drop caused by disk flushing
