Mom no longer needs to worry that I can't learn big data Flink

This is the 31st original post from One Ape Small Talk.

The business side said, "... blah blah ... This requirement is very simple; how you implement it is not my concern."

Faced with such ambitious business requirements, and knowing nothing about big data, I didn't dare to ask or to object, so I could only quietly build up my knowledge and look for a solution on my own.

For those of you who follow the "One Ape Small Talk" public account, today is your lucky day, because today we will step outside the system together and through the door of Flink and big data.

What is Flink? What is Flink for? ...

I'm sure you have a million questions like these, but if you spend two minutes reading to the end, I think 99.99% of them will be answered.

Okay, please get your little stool ready, and we’ll start our story.

Let's skip the theory for now and get straight to practice. Looking around, about 90% of my colleagues use MacBooks, so this demo is based on macOS.

Sharpening the axe does not delay the woodcutting, so we prepare the environment first. Make sure a JDK is installed on your local machine, since compiling and running Flink requires at least Java 8 (JDK 1.8). Check by typing the command

java -version
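If you would rather script this check, here is a minimal sketch. The helper name java_major is made up for illustration, and it assumes the usual version strings such as 1.8.0_211 or 11.0.2 that java -version prints inside double quotes.

```shell
# Hypothetical helper: extract the major Java version from a version string.
# "1.8.0_211" -> 8, "11.0.2" -> 11
java_major() {
  v=${1#1.}              # drop the legacy "1." prefix used up to Java 8
  echo "${v%%[._-]*}"    # keep everything before the first ".", "_" or "-"
}

# java -version writes to stderr, so redirect it before parsing.
if command -v java >/dev/null 2>&1; then
  ver=$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')
  major=$(java_major "$ver")
  if [ "${major:-0}" -ge 8 ]; then
    echo "Java $ver meets the JDK 1.8 minimum"
  else
    echo "Java 8 (JDK 1.8) or later is required"
  fi
fi
```

The check only runs when a java executable is on the PATH; otherwise the snippet stays silent.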

If you don't have JDK 1.8 installed, install it in whatever way you prefer. Once this step is done, I believe the rest will go smoothly, like the great Peng bird that rises with the wind one day and soars ninety thousand miles (covers mouth and smiles).

There are many versions, and there is always one you will like. Here we pick 1.8.1, the latest version at the time of writing, for this introduction. Don't ask why; it simply caught my eye.

http://mirrors.tuna.tsinghua.edu.cn/apache/flink/flink-1.8.1/flink-1.8.1-bin-scala_2.12.tgz

Download and unpack the chosen version, then take a look at the overall layout. bin holds the start and stop scripts, conf is the configuration directory, examples contains the sample programs, lib holds the dependency libraries, and log is the log directory.

This time we will focus on bin, examples, and log.

We're all set for a test run. We run Flink in standalone mode: in the Flink home directory, enter the following command to sound the alarm and wake Flink up for work.

./bin/start-cluster.sh

No matter how many times others praise it, I want to see for myself. Open http://127.0.0.1:8081/ in a browser to take a look.

It's hard to know what lies beneath the surface, and there's no harm in knowing more. Run the jps command to find out what is behind the page.

Two processes are working behind the scenes: the JobManager and the TaskManager. Honestly, the ones I like most are those who work silently behind the scenes, so here's a shout-out to these two quiet workers.

Flink is awake and ready for our mission. Let's run a HelloWorld and see.

Bounded data processing (do I sound scholarly yet?). I have defined some words here. Flink, could you count how many times each word occurs?
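To make the task concrete, here is a rough sketch of the same counting done with plain shell tools rather than Flink. The function name count_words and the sample sentence are chosen only for illustration; this is not how the bundled WordCount.jar is implemented.

```shell
# Hypothetical helper: count occurrences of each word read from stdin.
count_words() {
  tr -cs '[:alpha:]' '\n' |          # one word per line, punctuation dropped
    tr '[:upper:]' '[:lower:]' |     # fold case so "To" and "to" match
    sort | uniq -c |                 # group identical words and count them
    awk 'NF == 2 { print $2, $1 }'   # print "word count", skip empty lines
}

printf 'To be, or not to be\n' | count_words
# be 2
# not 1
# or 1
# to 2
```

Note that sort needs the whole input before it can print anything, which is only possible because the data is bounded.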

Step 1: Prepare the data. The data comes from Flink's own example source code; we can find another time to talk about that source. It is shown here only so that everyone knows what we are asking Flink to do, so there is no need to pay special attention to the data source this time.

Step 2: Submit WordCount.jar to Flink. I have to say, Flink processes in milliseconds: it gave us feedback without making us wait a moment.

Enter the command:

./bin/flink run examples/streaming/WordCount.jar

The results are as follows:

Step 3: Open the web page and follow Flink's trail.

Step 4: Where are the results? Right where our focus has been: in the log directory.

Unbounded data processing (sounding scholarly again?). I have set up port 9000. Flink, please connect to it so that we can communicate in secret. I will send you a few words from time to time, and every 5 seconds you must count how many times I said each word, because every word is a pearl (covers mouth and smiles).
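One way to picture "count every 5 seconds" is as non-overlapping 5-second windows. The sketch below assumes each input line carries a made-up timestamp in seconds followed by one word; Flink's socket example works on lines as they arrive instead, so treat this only as an illustration of the grouping. The name window_counts is hypothetical.

```shell
# Hypothetical helper: group "timestamp word" lines into 5-second windows
# and count each word per window. Output: "window_start word count".
window_counts() {
  awk '{ start = int($1 / 5) * 5; count[start " " $2]++ }
       END { for (k in count) print k, count[k] }' |
    sort -k1,1n -k2,2
}

printf '0 flink\n1 flink\n3 love\n6 flink\n9 flink\n9 love\n' | window_counts
# 0 flink 2
# 0 love 1
# 5 flink 2
# 5 love 1
```

Seconds 0 to 4 fall into the window starting at 0 and seconds 5 to 9 into the window starting at 5, so the same word is counted separately in each window.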

Step 1: Start the local service. We start a local server with Netcat listening on port 9000 (typically nc -l 9000), and then we can keep telling Flink how much we love it.

If you get an error, install nc as the error message suggests. I'm sure fans of One Ape Small Talk can fix that in seconds.

Step 2: Submit the SocketWindowWordCount.jar program. In fact, Flink can hardly wait, so just let it go.

Open a new window and type the following command:

./bin/flink run examples/streaming/SocketWindowWordCount.jar --port 9000

Look at Flink’s shy reaction.

Step 3: See the effect, see the beauty.

Step 4: Don't be shy. Tell Flink how much you like it.

Write some text in the window where nc is running and press Enter to send one line of input to Flink.

Nice. The counts look good.

Maybe be a little more romantic, a little more honest.

Watch Flink's coy reaction on the console. Open a new window and run

tail -f log/flink*.out

The effect is really great

You can also see Flink’s shy reaction on the page.

Step 5: Once you've finished confiding, quit nc. Flink is still a little reluctant to let go.

My eyesight isn't great, so the picture above has been split up and enlarged. The effect of exiting nc is as follows.

When we disconnected nc, Flink reacted with a bit of reluctance, which looked something like this.

All right, let's call it a day! With these two Flink HelloWorld examples finished, we have all gotten started together. Flink, release your resources and take a break.

Enter the command:

./bin/stop-cluster.sh

The effect is as follows:

Practice first, theory second. Now that the HelloWorld practice is done, we might as well toss out a few concepts to play with.

Concept 1: Stream?

Mind you, we're not talking about anything rogue here. We're talking about streams of credit card transactions, sensor measurements, machine logs, user interactions on websites or mobile apps, and so on. In fact, any kind of data can form a stream of events.

Concept 2: Unbounded streams vs bounded streams?

An unbounded stream has a defined start but no defined end. It produces data endlessly. Unbounded streams must be processed continuously, that is, each event is handled promptly after it is ingested. We can't wait for all the data to arrive before processing it, because the input is infinite and will never be complete at any point in time. Processing unbounded data often requires ingesting events in a particular order, such as the order in which they occurred, so that we can reason about the completeness of the results.

A bounded stream has a defined start and a defined end. Bounded streams can be processed after all the data has been ingested. All the data in a bounded stream can be sorted, so ordered ingestion is not required. Processing bounded streams is often referred to as batch processing.
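The practical difference shows up in when a result can be emitted. As a rough illustration in plain shell (not Flink), the hypothetical running_counts helper below handles each line as soon as it is read and prints the updated count, which is the only option when the input never ends. A bounded input could instead be read to the end, sorted, and counted once.

```shell
# Hypothetical helper: emit an updated count for every incoming word,
# without waiting for the end of the input.
running_counts() {
  awk '{ seen[$1]++; print $1, seen[$1] }'
}

printf 'flink\nlove\nflink\n' | running_counts
# flink 1
# love 1
# flink 2
```

Each output line is an intermediate result, so the same word can appear several times with a growing count.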

Concept 3: So what is Flink anyway?

Apache Flink excels at processing both unbounded and bounded data sets. Precise control of time and state enables Flink's runtime to run any kind of application on unbounded streams. Bounded streams are processed internally by algorithms and data structures designed specifically for fixed-size data sets, which yields excellent performance.

Concept 4: Which streaming technology is better?

In the benchmark chart, the blue bar is the throughput of a single-threaded Storm job, and the orange bar is the throughput of a single-threaded Flink job. Flink's throughput is about 3 to 5 times that of Storm. As for Flink vs Spark, you can ask Baidu or Google; there is plenty to find.

Well, I hope today's post has taken you through the door of Flink and big data and that you got something out of it. In the end, it's the same old advice: get out of your comfort zone and keep learning. Outside the system, things have a different flavor.
