Introduction

Sometimes the main job of a system is to process and transform input data items independently of one another: each input is converted and then written to a specified output.

Such data processing tasks come up often in daily work, and a data flow architecture is a natural fit for them.

Data flow architecture

There are many kinds of streams in practice; the most common are I/O streams, I/O buffers, and pipes. Components or modules are connected through these streams. The data flow can form a topology with cycles, an acyclic linear chain, or a tree.
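As a minimal sketch of two components connected through a pipe and buffers, the Java example uses the standard `java.io` piped streams: a producer thread writes lines through a `BufferedWriter` into a `PipedOutputStream`, and the consumer reads them back through a `BufferedReader` on the connected `PipedInputStream`. The names `PipeConnection` and `transfer` are hypothetical, and a real system would usually connect separate modules rather than two threads inside one method.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class PipeConnection {

    // Sends lines from a producer thread to the calling thread through a byte pipe.
    static List<String> transfer(List<String> lines) {
        try {
            PipedInputStream pipeIn = new PipedInputStream();
            PipedOutputStream pipeOut = new PipedOutputStream(pipeIn); // connect both ends

            // Producer component: writes through a buffer, then closes the pipe
            // so that the consumer sees end-of-stream.
            Thread producer = new Thread(() -> {
                try (BufferedWriter out = new BufferedWriter(
                        new OutputStreamWriter(pipeOut, StandardCharsets.UTF_8))) {
                    for (String line : lines) {
                        out.write(line);
                        out.newLine();
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            producer.start();

            // Consumer component: reads through a buffer until end-of-stream.
            List<String> received = new ArrayList<>();
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(pipeIn, StandardCharsets.UTF_8))) {
                for (String line = in.readLine(); line != null; line = in.readLine()) {
                    received.add(line);
                }
            }
            return received;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transfer(List.of("first record", "second record")));
    }
}
```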

The main goals of a data flow architecture are reuse and ease of modification. It suits applications that apply a series of well-defined, independent transformations or computations to sequentially defined inputs and outputs, such as compilers and business data processing applications. Generally speaking, there are three basic data flow structures.

Sequential batch processing

Sequential batch processing is the most common and fundamental data flow architecture. Data passes as a whole through one processing unit after another, and enters the next unit only after the previous unit has finished processing all of it.

Let’s look at the flow chart of sequential batch processing:

Data is passed as a whole from one processor to the next, and the processors interact mainly through temporary files. The output of each processor becomes the input of the next, and the data is transformed step by step until the desired result is obtained.
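A minimal Java sketch of sequential batch processing, assuming line-based text data: each stage reads the entire previous temporary file, processes all of it, and writes a new temporary file for the next stage. The two stages (`clean`, `upper`) and the class name are hypothetical stand-ins for real processors.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

public class SequentialBatch {

    // Runs one processor over the whole input file and writes a new temporary file.
    static Path runStage(Path input, UnaryOperator<List<String>> processor) throws IOException {
        List<String> batch = Files.readAllLines(input, StandardCharsets.UTF_8); // entire batch
        Path output = Files.createTempFile("stage-", ".txt");
        Files.write(output, processor.apply(batch), StandardCharsets.UTF_8);
        return output;
    }

    // Hypothetical processor 1: trim lines and drop blank ones.
    static List<String> clean(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            String t = line.trim();
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // Hypothetical processor 2: upper-case every line.
    static List<String> upper(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) out.add(line.toUpperCase(Locale.ROOT));
        return out;
    }

    static List<String> run(List<String> rawInput) {
        List<Path> temps = new ArrayList<>();
        try {
            Path input = Files.createTempFile("input-", ".txt");
            temps.add(input);
            Files.write(input, rawInput, StandardCharsets.UTF_8);
            Path cleaned = runStage(input, SequentialBatch::clean);  // stage 1 finishes first...
            temps.add(cleaned);
            Path result = runStage(cleaned, SequentialBatch::upper); // ...then stage 2 starts
            temps.add(result);
            return Files.readAllLines(result, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            for (Path p : temps) { // remove the intermediate files
                try { Files.deleteIfExists(p); } catch (IOException ignored) { }
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("  alpha ", "", "beta")));
    }
}
```

The strictly serial structure is visible in `run`: the second stage cannot begin until the first has written its complete output file.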

The advantage of sequential batch processing is that each processing step is independent; the steps are simply chained together to form the overall sequential architecture.

The disadvantage is that the steps cannot run in parallel: execution is strictly serial, which limits throughput. In addition, the processors interact only through intermediate files, so there is little interaction between them.

Pipes and filters

In sequential batch processing, the processors usually perform quite different functions and are often separate systems. When the data flow tasks all run within a single system, pipes and filters are used instead.

Java 8 introduced streams and stream pipelines. A collection can be converted into a stream, and chaining operations on that stream transforms the data step by step into the desired result.
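For example, a minimal stream pipeline might look like this. The data and the method name `squaresOfEvens` are made up for illustration, while `stream()`, `filter`, `map`, and `Collectors.toList()` are standard `java.util.stream` API. Note that the Stream `filter` method only selects elements, which is narrower than the architectural notion of a filter; here `filter` and `map` together play the role of the filter stages.

```java
import java.util.List;
import java.util.stream.Collectors;

public class StreamPipeline {

    // Collection -> stream -> chained operations -> collected result.
    static List<Integer> squaresOfEvens(List<Integer> numbers) {
        return numbers.stream()                // turn the collection into a stream (source)
                .filter(n -> n % 2 == 0)       // keep only even numbers
                .map(n -> n * n)               // square each remaining number
                .collect(Collectors.toList()); // gather the transformed stream (sink)
    }

    public static void main(String[] args) {
        System.out.println(squaresOfEvens(List.of(1, 2, 3, 4, 5, 6))); // prints [4, 16, 36]
    }
}
```

Because intermediate operations are lazy, nothing runs until `collect` is called, and in a sequential stream like this one each element passes through `filter` and `map` before the next element is pulled from the source.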

This approach emphasizes the incremental transformation of data by successive components. Processing is driven by the arrival of data, and the whole system can be decomposed into data sources, filters, pipes, and data sinks.

Modules are connected by data streams. Each stream is a first-in, first-out buffer and may carry bytes, characters, or any other type of element. The main advantage of this architecture is concurrent, incremental execution.

The most important component in this mode is the filter, a standalone data stream transformer. It reads data from its input stream, processes it, and writes the transformed data to the output pipe for the next filter. It works incrementally, starting as soon as data arrives through the connected pipe.

The data in the figure above starts from the pipe and passes through a series of filters to get the processed result.
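The same idea can be sketched with threads and `BlockingQueue`s, assuming each filter runs in its own thread and each pipe is an unbounded FIFO queue. This is a hypothetical illustration (`PipesAndFilters`, the `END` marker, and the two filters are invented for the example), not a framework API: the source, filters, and sink run concurrently, and each filter starts working as soon as an item arrives in its input pipe.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

public class PipesAndFilters {

    // Hypothetical end-of-stream marker; assumes no real item ever equals it.
    static final String END = "<<end-of-stream>>";

    // A filter runs in its own thread: take from the input pipe, transform, put to the output pipe.
    static void startFilter(BlockingQueue<String> in, BlockingQueue<String> out,
                            Function<String, String> transform) {
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    String item = in.take();         // blocks until data arrives
                    if (item.equals(END)) {          // propagate end-of-stream and stop
                        out.put(END);
                        return;
                    }
                    out.put(transform.apply(item));  // hand the result to the next component
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
    }

    static List<String> run(List<String> input) {
        // Pipes: unbounded FIFO buffers between the components.
        BlockingQueue<String> sourceToTrim = new LinkedBlockingQueue<>();
        BlockingQueue<String> trimToUpper = new LinkedBlockingQueue<>();
        BlockingQueue<String> upperToSink = new LinkedBlockingQueue<>();

        startFilter(sourceToTrim, trimToUpper, String::trim);
        startFilter(trimToUpper, upperToSink, s -> s.toUpperCase(Locale.ROOT));

        List<String> result = new ArrayList<>();
        try {
            // Data source: the filters are already running while items are still being added.
            for (String s : input) sourceToTrim.put(s);
            sourceToTrim.put(END);
            // Data sink: collect output until end-of-stream reaches the last pipe.
            for (String s = upperToSink.take(); !s.equals(END); s = upperToSink.take()) {
                result.add(s);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("pipeline interrupted", e);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("  alpha", "beta  ", " gamma ")));
    }
}
```

Unbounded queues keep this sketch free of deadlocks, because the source finishes writing before the sink starts reading; bounded queues would add back-pressure but would require the source or the sink to run in its own thread.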

There are two types of filters: active and passive. An active filter pulls data from its input pipe itself and pushes the processed data out; UNIX pipes mostly use this model. A passive filter, by contrast, simply receives the data that the pipe pushes to it.
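A minimal sketch of the difference, assuming in-memory data: the passive filter is just a `Consumer` that does nothing until something pushes an item into it, while the active filter runs its own loop, pulling from an upstream `Iterator` and pushing results downstream. All names are hypothetical, and doubling numbers stands in for any real transformation.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

public class FilterStyles {

    // Passive filter: idle until the pipe pushes an item into accept().
    static Consumer<Integer> passiveDoubler(Consumer<Integer> downstream) {
        return item -> downstream.accept(item * 2);
    }

    // Active filter: drives the flow itself, pulling from upstream and pushing downstream.
    static void activeDoubler(Iterator<Integer> upstream, Consumer<Integer> downstream) {
        while (upstream.hasNext()) {
            downstream.accept(upstream.next() * 2);
        }
    }

    static List<Integer> viaPassive(List<Integer> input) {
        List<Integer> out = new ArrayList<>();
        Consumer<Integer> filter = passiveDoubler(out::add);
        for (Integer i : input) filter.accept(i); // the "pipe" pushes each item
        return out;
    }

    static List<Integer> viaActive(List<Integer> input) {
        List<Integer> out = new ArrayList<>();
        activeDoubler(input.iterator(), out::add); // the filter pulls each item
        return out;
    }

    public static void main(String[] args) {
        System.out.println(viaPassive(List.of(1, 2, 3)));
        System.out.println(viaActive(List.of(1, 2, 3)));
    }
}
```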

The advantage of this pattern is high concurrency and high throughput. The disadvantage is that it is poorly suited to dynamic, interactive applications.

Process control

The third mode is neither batch processing nor a pipeline: it selects different execution paths depending on the input, much like a conditional statement in a program.
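A minimal sketch of process control, assuming line-oriented string input: the input itself decides which path runs. The three paths and their output formats (`comment`, `number:`, `text:`) are invented for illustration.

```java
public class ProcessControl {

    // Chooses a processing path based on the input, like an if/else in ordinary code.
    static String process(String input) {
        if (input.startsWith("#")) {
            return "comment";                                 // path A: ignore comments
        } else if (input.matches("-?\\d{1,9}")) {
            return "number:" + (Integer.parseInt(input) * 2); // path B: numeric record
        } else {
            return "text:" + input.trim();                    // path C: any other text
        }
    }

    public static void main(String[] args) {
        for (String in : new String[] {"# header", "21", " hello "}) {
            System.out.println(process(in));
        }
    }
}
```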

Conclusion

We have introduced several data flow architecture approaches above; we hope you found them useful.


Original link: http://www.flydean.com/07-data-flow-architecture/

This article is from Flydean’s blog
