I'm 3y, a markdown programmer with one year of CRUD experience reused for ten years 👨🏻‍💻, known for years as a professional-grade involution player

Those of you who have pulled the Austin project code recently may have noticed a new austin-stream module. That's not surprising, because it all went according to plan.

This module mainly integrates a stream processing platform (Flink) to clean and compute the tracking data in real time, so that business users and system maintainers can use the Austin message push platform more easily.

This article mainly covers the background of this integration and some of my modest experience with stream processing.

01. Why a stream processing platform

I worked with data at my previous employer and watched the evolution of the "effect data" for ads on the site.

The so-called effect data, to put it bluntly, is this: merchants run advertisements on the platform, and we need to show them the results those ads bring. The core events are "exposure", "click" and "order", and on top of these we aggregate ROI-style indicators (for example, click-through rate = clicks / exposures).

Let's walk through this "evolution" and see why a streaming platform is needed.

1. PHP stage: in the initial stage, the business and system architecture were relatively simple. "Click" and "order" events were stored in database tables, and scheduled tasks aggregated the full tables to produce the final effect data, while "exposure" data was only written into the effect data table the next day.

At this stage, because the data volume was small, full aggregation via scheduled tasks was workable, and merchants could accept the latency of the service at the time.

2. Java stage: as the business grew, PHP was gradually abandoned and a three-tier advertising structure took shape; data volume increased day by day, and the company's middleware platform matured as well. Using the binlog-consumption framework provided by the middleware team, the architecture could move away from full aggregation, and effect data could be shown to merchants much faster, with a latency of about one minute.

3. Stream processing platform stage: a stream processing platform is an abstraction over "computing", or over data processing. Built on that abstraction, it can make full use of system resources (a large task is split into several smaller tasks and distributed to different machines for execution).

4. Storm was the first stream processing platform used for the advertising effect data. It ran stably for several years, and its performance and throughput met the business needs. Later, Flink emerged, supporting SQL, exactly-once semantics, unified stream and batch processing, and so on. As it was promoted within the company, I migrated the advertising effect data from Storm to Flink, and effect data was then produced at roughly second-level latency. (The latency could actually be compressed further, but the performance cost to the DB has to be taken into account; as long as the business can accept it, it's a trade-off!)

In the third point I mentioned the "abstraction of data processing", and this is how I understand it: in Storm, a Spout is defined as the input, a Bolt as intermediate processing or the output, and the data flowing in between as Tuples, with the shuffle (grouping) mechanism controlling how data flows.

In Flink, the semantics of inputs and outputs are more explicit (and the API itself is more expressive).

These stream processing platforms abstract away the mechanics of data processing, letting us process data more conveniently and efficiently through built-in capabilities such as state management, windowing, and fault tolerance.
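To make that abstraction concrete, here is a minimal Flink DataStream job in Java. This is a sketch of mine, not code from Austin: the socket source and the word-count logic are assumptions purely for illustration.

```java
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class WordCountJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Source: read lines from a socket (explicit input semantics)
        env.socketTextStream("localhost", 9999)
                // Transform: split each line into (word, 1) pairs
                .flatMap((String line, Collector<Tuple2<String, Integer>> out) -> {
                    for (String word : line.split("\\s+")) {
                        out.collect(Tuple2.of(word, 1));
                    }
                })
                .returns(Types.TUPLE(Types.STRING, Types.INT))
                // Shuffle: route records with the same word to the same subtask
                .keyBy(t -> t.f0)
                // Stateful aggregation: running count per word
                .sum(1)
                // Sink: print results (explicit output semantics)
                .print();

        env.execute("word-count");
    }
}
```

Whatever the job does, it keeps the same source → transform → keyBy → sink shape; swapping the socket for Kafka, or the print sink for Redis, doesn't change the structure.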

02. Where does Austin use the streaming platform

The Austin system already has some tracking points designed, and this information is printed to the logs.

However, I hadn't processed this data for a long time (though one of the friends studying Austin with me once sent me a screenshot of a log, and I could tell at a glance what was wrong).

Once the stream processing platform is connected, this data can be cleaned (by sender dimension, by message-template dimension, and so on), and the cleaned data can then be exposed through an interface for display or troubleshooting, which can greatly improve troubleshooting efficiency on the server side.

03. Flink introduction

Flink has been popular since 2018, and many companies now use it as their stream processing platform for real-time big data processing. As for why I chose Flink, the reasons are as follows:

1. I already know some Flink (mainly because I'm too lazy to learn anything else, and it's still enough for now)

2. After several years of development, Flink is mature, is used by many large companies, and has an active community

3. Flink's official documentation is quite good, well suited both for learning and for troubleshooting

To install Flink, here is the docker-compose.yml:

Version: "2.2" services: JobManager: image: flink:latest ports: - "8081:8081" Command: JobManager environment: - | FLINK_PROPERTIES= jobmanager.rpc.address: jobmanager - SET_CONTAINER_TIMEZONE=true - CONTAINER_TIMEZONE=Asia/Shanghai - TZ=Asia/Shanghai taskmanager: image: flink:latest depends_on: - jobmanager command: taskmanager scale: 1 environment: - | FLINK_PROPERTIES= jobmanager.rpc.address: jobmanager taskmanager.numberOfTaskSlots: 2 - SET_CONTAINER_TIMEZONE=true - CONTAINER_TIMEZONE=Asia/Shanghai - TZ=Asia/ShanghaiCopy the code

After a direct `docker-compose up -d`, Flink is up and running; enter `ip:8081` in a browser and you'll see the Flink web UI.

A brief look at the UI shows that once we've developed locally and packaged the job into a jar, we can upload it to Flink under Submit New Job.

When writing the code, you can use the Maven command given in the official documentation to scaffold a basic Flink project.
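For reference, the quickstart archetype from the Flink documentation looks like this (the version number here is an assumption; pick whatever matches your cluster):

```bash
mvn archetype:generate \
  -DarchetypeGroupId=org.apache.flink \
  -DarchetypeArtifactId=flink-quickstart-java \
  -DarchetypeVersion=1.14.3
```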

Of course, since I've already built it, you can just pull down the code and look at the austin-stream module. If you build from scratch, also note that the plugins in the POM need to be changed (otherwise packaging will fail); refer to my POM file.

04. Austin code

The current code structure and logic are very simple; students who have never learned Flink should still be able to follow it:

At present it mainly implements real-time aggregation of the data into Redis along two dimensions: user and message template (the corresponding Redis structures are documented in the code comments).
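To give a feel for the shape of such a job, here is a minimal sketch in Java, assuming Flink's Kafka connector. It is not the actual austin-stream code: the class and field names and the parsing stub are my assumptions, and print() stands in for the real Redis sink.

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class AustinLogJobSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Source: the tracking logs that Austin writes to Kafka
        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("kafka:9092")      // assumption: your broker address
                .setTopics("austinLog")
                .setGroupId("austinLogGroup")           // assumption: any consumer group id
                .setStartingOffsets(OffsetsInitializer.latest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        env.fromSource(source, WatermarkStrategy.noWatermarks(), "austinLogSource")
                // Clean: parse the raw JSON log line into a simple model
                .map(AustinLogJobSketch::parse)
                // Aggregate per message template (the user dimension works the same way)
                .keyBy(log -> log.messageTemplateId)
                // Sink: in austin-stream this step writes counts into Redis;
                // print() stands in for the Redis sink in this sketch
                .map(log -> log.messageTemplateId + ":" + log.state)
                .print();

        env.execute("austinLogJobSketch");
    }

    // Hypothetical log model; see the real code for the actual fields
    static class AnonymousLog {
        String messageTemplateId;
        String state;
    }

    static AnonymousLog parse(String json) {
        // Real code would use a JSON library here; stubbed for brevity
        AnonymousLog log = new AnonymousLog();
        log.messageTemplateId = "demo";
        log.state = json;
        return log;
    }
}
```

The real module keys by both dimensions and writes counters into the Redis structures described in the code comments; the sketch only shows the overall pipeline.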

If you are running the Austin project, you just need to create a topic in Kafka (the topicName is austinLog) and fill in the Kafka broker and Redis information in AustinFlinkConstant.
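For reference, the topic can be created with Kafka's own CLI (the partition and replication settings here are assumptions for a local setup):

```bash
kafka-topics.sh --create \
  --bootstrap-server localhost:9092 \
  --topic austinLog \
  --partitions 1 \
  --replication-factor 1
```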

After submitting to the Flink platform, you can run:

05. What's next

After Flink's processing, the data is now written into Redis. Recently I have been writing the Controller-layer interfaces to display the cleaned data on a page.
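For a rough idea of what that layer might look like, here is a minimal sketch assuming Spring Web and a StringRedisTemplate; the endpoint path and the Redis key scheme are made up for illustration:

```java
import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

import java.util.Map;

@RestController
public class DataController {

    private final StringRedisTemplate redisTemplate;

    public DataController(StringRedisTemplate redisTemplate) {
        this.redisTemplate = redisTemplate;
    }

    // Hypothetical endpoint: return the cleaned, aggregated counts
    // that the Flink job wrote into a Redis hash for one template
    @GetMapping("/trace/template")
    public Map<Object, Object> templateData(@RequestParam String templateId) {
        return redisTemplate.opsForHash().entries(templateId);
    }
}
```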

Those of you who have seen the earlier pages probably know that I use amis, a low-code framework; I looked at amis's documentation on charts, and it renders them with ECharts.

It shouldn't be a problem; I estimate the development will be finished in a couple of days. The main work is adapting the parameters.

Recently, some friends have submitted pull requests adding WeChat service account support. I have merged the code but haven't debugged it yet. The main trouble is that I don't have a business license, so it's not easy to open a service account for debugging; I'll figure something out later.

That's all for today. If you are interested in Flink, you can read my previous articles and the introduction on the official website. I suggest you pull down Austin's code first, deploy it and experience it yourself, and then look at the theory:

1. Introduction to Flink

2. Flink back pressure mechanism

3. Flink CheckPoint

Now that you've read this far, a "like" isn't too much to ask, is it? I'm 3y. See you next time.

Follow my WeChat official account [Java3y]. Besides technology, I also talk about everyday life there; some things can only be said quietly ~ [Sparring with interviewers + writing a Java project from zero] is updating continuously at high intensity! Please star! Original content isn't easy! Likes, shares, and follows appreciated!

Austin project source code on Gitee: gitee.com/austin

Austin project source code on GitHub: github.com/austin