Preface

Today our daily lives are surrounded by many kinds of networks, such as telephone networks, enterprise networks, home networks, and various local area networks, which together make up what we call the Internet. The Internet is therefore composed of networks of different types, regions, and domains. It has no central control point; instead, it consists of a large number of separate but interconnected nodes. This is a decentralized model, and it is a useful analogy for the concept of distribution discussed in this book.

The concept of distribution arose from the existence of networks. Traditional computing is centralized: a powerful server handles a large number of computing tasks. Such supercomputers are expensive to build and maintain, and they have significant bottlenecks. In contrast, a system may split a task that requires massive computing power into many small blocks, assign those blocks to different computing nodes within the same system for processing, and, if necessary, merge the separately computed results into a final result. Such a system is called a distributed system. Its nodes communicate and coordinate with one another in a variety of ways, and network messages are one of the most commonly used means.
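The split, assign, and merge steps described above can be sketched in a few lines of C++. This is only an illustration under stated assumptions: threads on one machine stand in for computing nodes, summing integers stands in for the real workload, and the names `process_block` and `distributed_sum` are hypothetical. A real distributed system would send each block to a remote node over the network instead.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical work done by one "node": sum the block [begin, end) of data.
long long process_block(const std::vector<int>& data,
                        std::size_t begin, std::size_t end) {
    long long sum = 0;
    for (std::size_t i = begin; i < end; ++i) {
        sum += data[i];
    }
    return sum;
}

// Split data into at most node_count blocks, process each block
// concurrently (threads stand in for nodes), then merge the partial results.
long long distributed_sum(const std::vector<int>& data, std::size_t node_count) {
    assert(node_count > 0);
    // Round up so that every element belongs to some block.
    const std::size_t block_size = (data.size() + node_count - 1) / node_count;

    // Step 1 and 2: split the task and assign each block to a worker.
    std::vector<std::future<long long>> partials;
    for (std::size_t begin = 0; begin < data.size(); begin += block_size) {
        const std::size_t end = std::min(begin + block_size, data.size());
        partials.push_back(std::async(std::launch::async, [&data, begin, end] {
            return process_block(data, begin, end);
        }));
    }

    // Step 3: consolidate the separately computed results.
    long long total = 0;
    for (auto& partial : partials) {
        total += partial.get();
    }
    return total;
}
```

Calling `distributed_sum(values, 4)` on a vector `values` splits it into up to four blocks and returns the same total as a sequential sum. The merge step here is a simple addition; other workloads would need their own way of combining partial results.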

From the above description, we can think of a distributed system as one that uses hardware resources and software components on a network for computing, with the computing nodes communicating with one another in some way. This is a brief overview of the concept of distributed systems from a computer science perspective.

If we consider the network as the key factor, we can allocate computation to different computing nodes in the network to make full use of the network's computing resources, and these nodes may be located in different regions, physically some distance apart. Although this explanation is less formal, it vividly illustrates the basic property of distribution from another angle, namely that the nodes themselves are distributed.

The body of the book

This book takes the most basic concepts of cloud computing and big data as its starting point and gradually introduces the knowledge needed to program a high-performance distributed real-time processing system. The architecture and internal implementation of the real-time processing system are described in detail.

Chapter 1 introduces some basic concepts of distributed systems and some important background needed to develop a real-time processing system.

Chapter 2 introduces the basic concepts of communication in distributed systems, including TCP/IP and sockets, and lays the groundwork for the later development of the network library Meshy.

Chapter 3 introduces the high-level abstractions required for communication in distributed systems, including common communication models such as remote procedure calls (RPC), RESTful interfaces, and message queues. It also introduces basic serialization concepts and solutions, and develops a simple bulletin board service using Thrift, establishing the communication abstractions and basic framework concepts needed for the development of Hurricane.

Chapter 4 introduces the basic and advanced knowledge needed for high-performance programming in C++, including memory resource management, encoding solutions, concurrent and asynchronous processing, and memory management techniques, as well as the C++11 features related to memory management, encoding, and the threading model.

Chapter 5 introduces the basic concepts of distributed processing systems, including the difference between batch processing and real-time processing and a basic introduction to Hadoop and Storm and their underlying models. It closes with the basic idea of reliable message processing.

Chapter 6 introduces the overall architecture and interface design of the real-time processing system, including the message source, message processor, data collector, tuple, and serialization interfaces.

Chapter 7 introduces the design and implementation of the service components, including Executors and their message queues, dynamic loading, and the design and implementation of Tasks.

Chapter 8 introduces the design and implementation of the management services, including the architecture and implementation of the cluster manager (President) and the node manager (Manager).

Chapter 9 introduces the implementation of the interfaces for each part of the real-time processing system, including the message source, the message processing unit, and the data collector.