Introduction

Sometimes a system mainly processes and transforms pieces of input data independently of one another. In that case, the input is transformed and then written to a specified output.

Data processing tasks like this come up often in daily work, and a data flow architecture suits them well.

Data flow architecture

In practice there are many kinds of flows; the most common are I/O streams, I/O buffers, and pipes. Different components or modules are connected through these flows. The data flow can form a topology with cycles, a linear structure without cycles, or a tree.

The main purpose of the data flow architecture is to enable reuse and easy modification. It suits a series of well-defined, independent data transformations or computations on sequentially defined inputs and outputs, as in compilers and business data processing applications. In general, there are three basic data flow structures.

Sequential batch

Sequential batch processing is the most common and most fundamental data flow architecture. The data passes through the processing units one by one as a whole: only after one unit has finished its work does the data enter the next unit.

Let’s look at the flow chart of sequential batch processing:

Data is transferred from one processor to the next as a whole, and the interaction happens mainly through temporary files. The output of each processor becomes the input of the next, and after this repeated processing the desired result is obtained.

The advantage of sequential batch processing is that each process is independent, and they are combined to form an overall sequential processing architecture.

The disadvantage is that there is no parallelism: execution is strictly serial, so throughput is limited. The processors interact only through intermediate files, so the degree of interaction is low.
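As a rough illustration of the batch style described above, here is a minimal Java sketch. The class name, the two stages (cleaning and upper-casing), and the temp-file prefixes are all assumptions made up for this example; the point is only that each stage reads a whole file, writes a whole file, and the next stage starts after the previous one has finished.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical example: two batch stages that talk only through temporary files.
class SequentialBatch {

    // Stage 1: read the whole input file, trim lines, drop blank ones, write a new file.
    static Path cleanStage(Path input) {
        try {
            List<String> cleaned = Files.readAllLines(input).stream()
                    .map(String::trim)
                    .filter(line -> !line.isEmpty())
                    .collect(Collectors.toList());
            Path output = Files.createTempFile("stage1-", ".txt");
            Files.write(output, cleaned);
            return output;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Stage 2: read the whole output of stage 1, upper-case every line, write a new file.
    static Path upperCaseStage(Path input) {
        try {
            List<String> upper = Files.readAllLines(input).stream()
                    .map(String::toUpperCase)
                    .collect(Collectors.toList());
            Path output = Files.createTempFile("stage2-", ".txt");
            Files.write(output, upper);
            return output;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Runs the stages strictly one after another; nothing overlaps in time.
    static List<String> run(List<String> rawLines) {
        try {
            Path source = Files.createTempFile("input-", ".txt");
            Files.write(source, rawLines);
            Path afterClean = cleanStage(source);
            Path afterUpper = upperCaseStage(afterClean);
            return Files.readAllLines(afterUpper);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("  alpha ", "", "beta"))); // prints [ALPHA, BETA]
    }
}
```

Because the only contract between the stages is the file format, either stage could be replaced by a different program without touching the other. The sketch leaves the temporary files in place; a real job would normally clean them up.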

Pipes and filters

In sequential batch processing, the processors can differ greatly from one another, and they are generally separate systems. If you are handling data flow tasks within a single system, pipes and filters are the better fit.

Java 8 introduced streams and stream pipelines. A collection can be converted to a stream, and by chaining operations on that stream the data can be transformed into the desired result.
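A small sketch of such a stream pipeline, using only the standard java.util.stream API. The class and method names are made up for this example: the list is the data source, filter and map are the intermediate steps, and collect acts as the sink.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical example of a Java 8 stream pipeline: source -> filter -> map -> sink.
class StreamPipeline {

    static List<Integer> squaresOfEvens(List<Integer> numbers) {
        return numbers.stream()                 // data source: turn the collection into a stream
                .filter(n -> n % 2 == 0)        // keep only even numbers
                .map(n -> n * n)                // transform each remaining element
                .collect(Collectors.toList());  // data sink: gather the results
    }

    public static void main(String[] args) {
        System.out.println(squaresOfEvens(Arrays.asList(1, 2, 3, 4, 5, 6))); // prints [4, 16, 36]
    }
}
```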

This approach emphasizes incremental transformation of data by successive components. The flow is driven by the data itself, and the whole system can be decomposed into data sources, filters, pipes, and data sinks.

The connection between modules is a data stream, a first-in/first-out buffer that can be a byte stream, a character stream, or any other stream of that kind. The main advantages of this architecture are its concurrency and incremental execution.

In this mode, the most important component is the filter, which is an independent data stream converter. It reads data from the input stream, processes it, and writes the transformed data to the pipe for the next filter. It works incrementally and starts working as soon as data arrives through the connected pipe.

In the figure above, the data enters through a pipe and passes through the filters to produce the final result.
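The arrangement of pipes and filters described above could be sketched in Java roughly as follows. This is an illustration under assumptions, not a reference implementation: each pipe is modeled as a BlockingQueue (a first-in/first-out buffer), each filter runs on its own thread, and the END marker is a hypothetical convention invented here to signal the end of the stream.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

// Hypothetical sketch: pipes are FIFO queues, filters are threads between two pipes.
class PipeAndFilter {

    // Made-up end-of-stream marker; compared by identity so it never clashes with real data.
    static final String END = new String("<end>");

    // Starts a filter that transforms each item as soon as it arrives on the input pipe.
    static Thread startFilter(BlockingQueue<String> in, BlockingQueue<String> out,
                              Function<String, String> transform) {
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    String item = in.take();      // blocks until data arrives
                    if (item == END) {
                        out.put(END);             // pass the marker on and stop
                        return;
                    }
                    out.put(transform.apply(item));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
        return worker;
    }

    // Wires up source -> pipe1 -> trim -> pipe2 -> upper-case -> pipe3 -> sink.
    static List<String> run(List<String> source) {
        BlockingQueue<String> pipe1 = new LinkedBlockingQueue<>();
        BlockingQueue<String> pipe2 = new LinkedBlockingQueue<>();
        BlockingQueue<String> pipe3 = new LinkedBlockingQueue<>();
        startFilter(pipe1, pipe2, String::trim);
        startFilter(pipe2, pipe3, String::toUpperCase);
        try {
            for (String item : source) {
                pipe1.put(item);                  // data source feeds the first pipe
            }
            pipe1.put(END);
            List<String> result = new ArrayList<>();
            while (true) {
                String item = pipe3.take();       // data sink drains the last pipe
                if (item == END) {
                    break;
                }
                result.add(item);
            }
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList(" a ", "b "))); // prints [A, B]
    }
}
```

Both filters run at the same time, so the second filter can already be working on the first item while the first filter handles the second one. That is the incremental, concurrent behavior that the batch style lacks.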

There are two types of filters: active and passive. An active filter pulls data from the pipeline itself and pushes out the processed data; this mode is mainly used for UNIX pipes. A passive filter only receives the data that is pushed into the pipeline.
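One possible way to picture the difference in Java is shown below. This is a sketch under assumptions: the class name, the use of Iterator for pulling and Consumer for pushing, and the two transformations are all choices made for this example only. The active filter owns the loop and pulls its own input, while the passive filter has no loop and only reacts when something is pushed to it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch contrasting an active (pulling) and a passive (pushed-to) filter.
class ActiveVsPassive {

    // Active filter: drives the flow itself, pulling from upstream and pushing downstream.
    static void activeUpperCase(Iterator<String> upstream, Consumer<String> downstream) {
        while (upstream.hasNext()) {
            downstream.accept(upstream.next().toUpperCase());
        }
    }

    // Passive filter: does nothing on its own; it runs only when an item is pushed to it.
    static Consumer<String> passiveExclaim(Consumer<String> downstream) {
        return item -> downstream.accept(item + "!");
    }

    static List<String> run(List<String> source) {
        List<String> sink = new ArrayList<>();
        Consumer<String> passive = passiveExclaim(sink::add);
        activeUpperCase(source.iterator(), passive); // the active filter pushes into the passive one
        return sink;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("a", "b"))); // prints [A!, B!]
    }
}
```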

The advantage of this pattern is that it provides high concurrency and high throughput. The disadvantage is that it is not suitable for dynamic interaction.

Process control

There is also a third mode, neither batch nor pipeline, which chooses between different execution flows based on the input. It is similar to the conditional statements we use in programs.
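The article gives no further detail on this mode, so the following is only a guess at what a minimal version might look like in Java. The class name, the three branches, and the result strings are invented for illustration: the input is inspected first, and that inspection decides which processing path runs.

```java
// Hypothetical sketch: the input decides which execution path is taken.
class ProcessControl {

    static String process(String input) {
        String value = input == null ? "" : input.trim();
        if (value.isEmpty()) {
            return "rejected";                      // path 1: nothing to process
        }
        if (value.matches("\\d+")) {
            return "number:" + value;               // path 2: handle numeric input
        }
        return "text:" + value.toUpperCase();       // path 3: handle everything else
    }

    public static void main(String[] args) {
        System.out.println(process(" 42 "));  // prints number:42
        System.out.println(process("abc"));   // prints text:ABC
        System.out.println(process("   "));   // prints rejected
    }
}
```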

Conclusion

This article introduced several data flow architectures. I hope you found it useful.

Author: Flydean program stuff

Link to this article: www.flydean.com/07-data-flo…

Source: Flydean’s blog
