Ruape's self-developed middleware project wins the favor of venture capital!

Background

Here is the story: my Ruape architecture course includes a project in which students develop a production-grade middleware from scratch [a distributed massive small file system supporting 100 million pictures].

After I finished teaching the project, the technical team deployed it to a production environment and ran a series of stress tests. The project code and the stress test reports were then open-sourced on Gitee (Code Cloud): Ruape middleware – self-developed distributed small file storage system

Then, some time later, a VC investor unexpectedly got in touch and asked whether we were considering commercializing the distributed massive small file system.

In the end, however, I declined. If you are interested in this middleware course and want to join the open source community, please scan the QR code (Giotto1245) and we will iterate on this project together.

Many students are not yet familiar with this self-developed distributed massive small file system. So what is this small file system? How was its architecture designed? Which techniques does it use? What problems came up during production deployment, and how were they solved? Here is a brief introduction.

Architecture design

The architecture of the distributed massive small file system mainly covers:

High-concurrency architecture
High-availability architecture
High-performance architecture
Scalable architecture

The overall structure of the project is as follows:

File transfer protocol

File transfer is needed in several scenarios within the cluster. For example, clients and DataNodes transfer files when uploading and downloading, and BackupNodes and the NameNode transfer FsImage files.

So we designed a dedicated protocol for file transfer. A file transfer network packet consists of the packet type, the file metadata, and the binary file content.
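As a rough illustration of the three-part packet described above, here is a minimal Python sketch. The article does not give the actual wire format, so the header layout (1-byte type, 4-byte metadata length, 8-byte content length), the JSON metadata encoding, and the names `PACKET_TYPE_UPLOAD`, `encode_packet`, and `decode_packet` are all assumptions made for the example. Python is used only for brevity.

```python
import json
import struct

# Hypothetical packet type code; the real protocol's values are not given.
PACKET_TYPE_UPLOAD = 1

# Assumed header layout: 1-byte packet type, 4-byte metadata length,
# 8-byte content length, all big-endian with no padding.
HEADER = struct.Struct(">BIQ")


def encode_packet(packet_type: int, metadata: dict, content: bytes) -> bytes:
    """Serialize packet type, file metadata, and file bytes into one packet."""
    meta_bytes = json.dumps(metadata).encode("utf-8")
    header = HEADER.pack(packet_type, len(meta_bytes), len(content))
    return header + meta_bytes + content


def decode_packet(data: bytes):
    """Parse a packet produced by encode_packet back into its three parts."""
    packet_type, meta_len, content_len = HEADER.unpack_from(data, 0)
    meta_start = HEADER.size
    content_start = meta_start + meta_len
    metadata = json.loads(data[meta_start:content_start].decode("utf-8"))
    content = data[content_start:content_start + content_len]
    if len(content) != content_len:
        raise ValueError("truncated packet")
    return packet_type, metadata, content
```

Length-prefixing the metadata and the content lets the receiver know exactly how many bytes to read for each part, which is one common way to frame packets on a stream connection.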

Block transfer design

To avoid oversized messages and the poor transfer efficiency they cause, we designed a block transfer protocol. If the response that the server writes back exceeds the maximum message length and the request supports block transfer, the server splits the response into n packets. Each time a packet arrives, the NetClient checks whether the response is complete. Once all the response packets have been received, it merges them into a single packet and returns it to the user.
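The split-and-merge flow can be sketched roughly as follows. This is an illustrative Python sketch, not the project's NetClient: the `Chunk` fields, the `split_response` function, the `ChunkAssembler` class, and the tiny `MAX_MESSAGE_LENGTH` are hypothetical, and completeness is checked here simply by counting received chunks against a total carried in each chunk, which is only one possible mechanism.

```python
from dataclasses import dataclass

MAX_MESSAGE_LENGTH = 4  # assumed limit, kept tiny for demonstration


@dataclass
class Chunk:
    request_id: int  # ties chunks of one response together
    index: int       # position of this chunk within the response
    total: int       # number of chunks the response was split into
    body: bytes


def split_response(request_id, body, supports_chunking, max_len=MAX_MESSAGE_LENGTH):
    """Server side: split an oversized response if the request allows it."""
    if len(body) <= max_len or not supports_chunking:
        return [Chunk(request_id, 0, 1, body)]
    pieces = [body[i:i + max_len] for i in range(0, len(body), max_len)]
    return [Chunk(request_id, i, len(pieces), p) for i, p in enumerate(pieces)]


class ChunkAssembler:
    """Client side: buffer chunks until a response is complete, then merge."""

    def __init__(self):
        self._pending = {}  # request_id -> {chunk index: chunk body}

    def receive(self, chunk):
        parts = self._pending.setdefault(chunk.request_id, {})
        parts[chunk.index] = chunk.body
        if len(parts) < chunk.total:
            return None  # still waiting for more chunks
        del self._pending[chunk.request_id]
        return b"".join(parts[i] for i in range(chunk.total))
```

Calling `receive` for each incoming chunk returns `None` until the last missing chunk arrives, at which point it returns the merged response body. Keying the buffer by index means chunks can arrive in any order.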

NameNode federated architecture

To relieve the memory pressure caused by a massive number of small files, we developed a federated NameNode architecture. Put simply, multiple NameNodes form a cluster, and each NameNode stores one part of the overall in-memory directory tree.
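One way to picture the federation is a router that assigns every file path to exactly one NameNode. The article does not say how the directory tree is partitioned, so the sketch below assumes simple hash-based routing with CRC32. The `FederatedNameNodes` class and its methods are hypothetical, and each NameNode is represented by a plain dictionary.

```python
import zlib


class FederatedNameNodes:
    """Toy model of a NameNode cluster that shards metadata by file path."""

    def __init__(self, node_count):
        # One dict per NameNode stands in for its slice of the directory tree.
        self.nodes = [dict() for _ in range(node_count)]

    def node_index(self, path):
        # Assumed routing rule: a stable hash of the path, modulo cluster size.
        return zlib.crc32(path.encode("utf-8")) % len(self.nodes)

    def put(self, path, metadata):
        self.nodes[self.node_index(path)][path] = metadata

    def get(self, path):
        return self.nodes[self.node_index(path)].get(path)
```

Because each path lands on exactly one node, no single NameNode has to hold the full tree in memory. A plain modulo scheme like this remaps most paths when the node count changes, so a real deployment would likely need a more careful partitioning strategy.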

Performance metrics in production

After development was complete, we set up full-link monitoring of file uploads and downloads, covering instantaneous QPS, JVM, network, IO, and CPU metrics.

Inventory of production issues

After the project was deployed to production, we ran into the online issues listed below. Each one has been debugged and fixed, and we kept detailed fault records and analyses, which were compiled into a set of documents.

OOMKiller kills the Spring Boot transmitter
Request responses time out because the bandwidth is saturated
Uneven DataNode traffic
100% CPU usage caused by too many threads
Trade-off between throughput and consistency for NameNode file upload requests
How to optimize the throughput drop caused by disk flushing


