Flink is an open-source big data stream processing framework that supports both batch and stream processing and offers fault tolerance, high throughput, and low latency. This article describes how to install Flink on Windows and Linux and how to run the sample programs, covering both the local debugging environment and the cluster environment. It also introduces how to set up a Flink development project.

First, to run Flink we need to download and unzip the Flink binary package. The download address is: https://flink.apache.org/downloads.html

We can choose the Flink and Scala versions. Here we download the latest 1.9 release, Apache Flink 1.9.0 for Scala 2.12.

After a successful download, Flink can be run on a Windows system through the Windows BAT file or Cygwin.

On Linux, the installation can be single-node, cluster, or on Hadoop (YARN).

Run through the Windows BAT file

Start a CMD command-line window, go to the flink directory, and run the start-cluster.bat file in the bin directory.

Note: A Java environment is required to run Flink. Make sure the Java environment variables are configured on your system.

$ cd flink
$ cd bin
$ start-cluster.bat
Starting a local cluster with one JobManager process and one TaskManager process.
You can terminate the processes via CTRL-C in the spawned shell windows.
Web interface by default on http://localhost:8081/.

After a successful startup, we can visit http://localhost:8081/ in the browser and see the Flink management page.

Run through Cygwin

Cygwin is a Unix-like emulation environment that runs on Windows. Download address: http://cygwin.com/install.html

After the installation, start the Cygwin terminal and run the start-cluster.sh script.

$ cd flink
$ bin/start-cluster.sh
Starting cluster.

After a successful startup, we can visit http://localhost:8081/ in the browser and see the Flink management page.

Install Flink on your Linux system

Single-node Installation

On Linux, the single-node installation is the same as for Cygwin: download Apache Flink 1.9.0 for Scala 2.12, unpack it, and run start-cluster.sh.

Cluster installation

Cluster installation consists of the following steps:

1. Copy the extracted Flink directory to each machine.

2. Modify conf/flink-conf.yaml on all machines, pointing jobmanager.rpc.address at the master's hostname:

jobmanager.rpc.address: <master hostname>

3. Modify conf/slaves to list all worker nodes:

work01
work02

4. Start the cluster on master

bin/start-cluster.sh

Installation on Hadoop (YARN)

We can choose to have Flink run on a YARN cluster.

Download the Flink for Hadoop package

Make sure HADOOP_HOME is set correctly

Start bin/yarn-session.sh

Run the Flink sample program

Batch example:

Submit Flink's batch example program:

bin/flink run examples/batch/WordCount.jar

This is the batch-processing example that ships with Flink; it counts the occurrences of each word.

$ bin/flink run examples/batch/WordCount.jar
Starting execution of program
Executing WordCount example with default input data set.
Use --input to specify file input.
Printing result to stdout. Use --output to specify output path.
(a,5)
(action,1)
(after,1)
(against,1)
(all,2)
(and,12)
(arms,1)
(arrows,1)
(awry,1)
(ay,1)

By default the example runs on a built-in data set; input and output files can be specified with --input and --output.
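For reference, the core logic of this WordCount corresponds roughly to the following DataSet program. This is a minimal sketch for illustration, not the exact source that ships with Flink; the class name and the inline sample data are our own.

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

public class BatchWordCount {
    public static void main(String[] args) throws Exception {
        // set up the batch execution environment
        final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // a small inline data set; the shipped example reads --input/--output instead
        DataSet<String> text = env.fromElements("to be or not to be", "that is the question");

        DataSet<Tuple2<String, Integer>> counts = text
                // split each line into (word, 1) pairs
                .flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
                    @Override
                    public void flatMap(String line, Collector<Tuple2<String, Integer>> out) {
                        for (String word : line.toLowerCase().split("\\W+")) {
                            if (!word.isEmpty()) {
                                out.collect(new Tuple2<>(word, 1));
                            }
                        }
                    }
                })
                // group by the word (field 0) and sum the counts (field 1)
                .groupBy(0)
                .sum(1);

        // in the DataSet API, print() also triggers execution
        counts.print();
    }
}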

We can also see the job and its result in the web management page.

Stream processing example:

Start the NC server:

nc -l 9000

Submit Flink's streaming example program:

bin/flink run examples/streaming/SocketWindowWordCount.jar --port 9000

This is the stream-processing example that ships with Flink; it receives data from the socket and counts the occurrences of each word.

Type words on the nc side:

$ nc -l 9000
lorem ipsum
ipsum ipsum ipsum
bye

The output appears in the TaskManager log:

$ tail -f log/flink-*-taskexecutor-*.out
lorem : 1
bye : 1
ipsum : 4
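The logic of SocketWindowWordCount corresponds roughly to the following DataStream program. This is a minimal sketch assuming a fixed port and a 5-second tumbling window; the shipped example additionally parses the --port argument.

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

public class SocketWordCount {
    public static void main(String[] args) throws Exception {
        // set up the streaming execution environment
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // connect to the socket opened with "nc -l 9000"
        DataStream<String> text = env.socketTextStream("localhost", 9000, "\n");

        DataStream<Tuple2<String, Integer>> counts = text
                // split incoming lines into (word, 1) pairs
                .flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
                    @Override
                    public void flatMap(String line, Collector<Tuple2<String, Integer>> out) {
                        for (String word : line.split("\\s+")) {
                            out.collect(new Tuple2<>(word, 1));
                        }
                    }
                })
                // count words per key over 5-second tumbling windows
                .keyBy(0)
                .timeWindow(Time.seconds(5))
                .sum(1);

        counts.print();

        env.execute("Socket Window WordCount");
    }
}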

Stop Flink:

$ ./bin/stop-cluster.sh

Once Flink is installed, it is easy to get started: quickly build a Flink project and begin the related code development.

Build tools

Flink projects can be built using different build tools. To get started quickly, Flink provides project templates for the following build tools:

  • Maven
  • Gradle

These templates help you structure your project and create the initial build file.

Maven

Environment requirements

The only requirements are to use Maven 3.0.4 (or later) and install Java 8.x.

Create a project

Create the project using one of the following commands:

Using Maven archetypes

$ mvn archetype:generate \
    -DarchetypeGroupId=org.apache.flink \
    -DarchetypeArtifactId=flink-quickstart-java \
    -DarchetypeVersion=1.9.0

Run the Quickstart script

$ curl https://flink.apache.org/q/quickstart.sh | bash -s 1.9.0

Once the download is complete, view the project directory structure:

quickstart/
├── pom.xml
└── src
    └── main
        ├── java
        │   └── org
        │       └── myorg
        │           └── quickstart
        │               ├── BatchJob.java
        │               └── StreamingJob.java
        └── resources
            └── log4j.properties

The example project is a Maven project that contains two classes: StreamingJob and BatchJob, which are the basic skeleton of DataStream and DataSet programs, respectively. The main method is the entry point to the program and can be used either for IDE testing/execution or for deployment.
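For reference, the generated StreamingJob looks roughly like the following skeleton (a sketch; the actual generated file carries more explanatory comments):

package org.myorg.quickstart;

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class StreamingJob {

    public static void main(String[] args) throws Exception {
        // set up the streaming execution environment
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // build your DataStream pipeline here: sources, transformations, sinks

        // execute program
        env.execute("Flink Streaming Java API Skeleton");
    }
}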

We recommend that you import this project into the IDE to develop and test it. IntelliJ IDEA supports Maven projects right out of the box. If you use Eclipse, you can import Maven projects using the M2E plug-in. Some Eclipse bundles include the plug-in by default, others require you to install it manually.

Note: The default JVM heap memory may be too small for Flink, and you should increase it manually. In Eclipse, select Run Configurations -> Arguments and enter -Xmx800m in the VM Arguments box. In IntelliJ IDEA, we recommend changing the JVM options via Help | Edit Custom VM Options.

Build the project

If you want to build/package your project, run the 'mvn clean package' command in the project directory. After executing the command, you will find a JAR file that contains your application together with the connectors and libraries added as dependencies: target/<artifact-id>-<version>.jar.

Note: If you use a class other than StreamingJob as the main class/entry point of your application, we recommend that you change the mainClass setting in the pom.xml file accordingly. That way Flink can run the application from the JAR file without having to specify the main class separately.

Gradle

Environment requirements

The only requirements are to use Gradle 3.x (or later) and install Java 8.x.

Create a project

Create the project using one of the following commands:

Gradle example:

build.gradle

buildscript {
    repositories {
        jcenter() // this applies only to the Gradle 'shadow' plugin
    }
    dependencies {
        classpath 'com.github.jengelman.gradle.plugins:shadow:2.0.4'
    }
}

plugins {
    id 'java'
    id 'application'
    // shadow plugin to produce fat JARs
    id 'com.github.johnrengelman.shadow' version '2.0.4'
}

// artifact properties
group = 'org.myorg.quickstart'
version = '0.1-SNAPSHOT'
mainClassName = 'org.myorg.quickstart.StreamingJob'
description = """Flink Quickstart Job"""

ext {
    javaVersion = '1.8'
    flinkVersion = '1.9.0'
    scalaBinaryVersion = '2.11'
    slf4jVersion = '1.7.7'
    log4jVersion = '1.2.17'
}

sourceCompatibility = javaVersion
targetCompatibility = javaVersion
tasks.withType(JavaCompile) {
    options.encoding = 'UTF-8'
}

applicationDefaultJvmArgs = ["-Dlog4j.configuration=log4j.properties"]

task wrapper(type: Wrapper) {
    gradleVersion = '3.1'
}

// declare where to find the dependencies of your project
repositories {
    mavenCentral()
    maven { url "https://repository.apache.org/content/repositories/snapshots/" }
}

// NOTE: we cannot use the "compileOnly" or "shadow" configurations, since then we could not
// run the code in the IDE or via "gradle run". We also cannot exclude transitive dependencies
// from the shadowJar yet (see https://github.com/johnrengelman/shadow/issues/159).
// -> Explicitly define the libraries we want to include in the "flinkShadowJar" configuration!
configurations {
    flinkShadowJar // dependencies which go into the shadowJar

    // always exclude these (also from transitive dependencies) since they are provided by Flink
    flinkShadowJar.exclude group: 'org.apache.flink', module: 'force-shading'
    flinkShadowJar.exclude group: 'com.google.code.findbugs', module: 'jsr305'
    flinkShadowJar.exclude group: 'org.slf4j'
    flinkShadowJar.exclude group: 'log4j'
}

// declare the dependencies for your production and test code
dependencies {
    // --------------------------------------------------------------
    // Compile-time dependencies that should NOT be part of the
    // shadow jar and are provided in the lib folder of Flink
    // --------------------------------------------------------------
    compile "org.apache.flink:flink-java:${flinkVersion}"
    compile "org.apache.flink:flink-streaming-java_${scalaBinaryVersion}:${flinkVersion}"

    // --------------------------------------------------------------
    // Dependencies that should be part of the shadow jar, e.g. connectors.
    // They must be in the flinkShadowJar configuration!
    // --------------------------------------------------------------
    //flinkShadowJar "org.apache.flink:flink-connector-kafka-0.11_${scalaBinaryVersion}:${flinkVersion}"

    compile "log4j:log4j:${log4jVersion}"
    compile "org.slf4j:slf4j-log4j12:${slf4jVersion}"

    // Add test dependencies here.
    //testCompile "junit:junit:4.12"
}

// make flinkShadowJar dependencies available in the IDE and for tests:
sourceSets {
    main.compileClasspath += configurations.flinkShadowJar
    main.runtimeClasspath += configurations.flinkShadowJar

    test.compileClasspath += configurations.flinkShadowJar
    test.runtimeClasspath += configurations.flinkShadowJar

    javadoc.classpath += configurations.flinkShadowJar
}

run.classpath = sourceSets.main.runtimeClasspath

jar {
    manifest {
        attributes 'Built-By': System.getProperty('user.name'),
                'Build-Jdk': System.getProperty('java.version')
    }
}

shadowJar {
    configurations = [project.configurations.flinkShadowJar]
}

settings.gradle

rootProject.name = 'quickstart'

Or run the Quickstart script

bash -c "$(curl https://flink.apache.org/q/gradle-quickstart.sh)" -- 1.9.0 2.11

View the directory structure:

quickstart/
├── build.gradle
├── settings.gradle
└── src
    └── main
        ├── java
        │   └── org
        │       └── myorg
        │           └── quickstart
        │               ├── BatchJob.java
        │               └── StreamingJob.java
        └── resources
            └── log4j.properties

The example project is a Gradle project that contains two classes: StreamingJob and BatchJob, which are the basic skeletons of DataStream and DataSet programs, respectively. The main method is the entry point to the program and can be used both for IDE testing/execution and for deployment.
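Correspondingly, the generated BatchJob is roughly the following DataSet skeleton (again a sketch rather than the verbatim generated file):

package org.myorg.quickstart;

import org.apache.flink.api.java.ExecutionEnvironment;

public class BatchJob {

    public static void main(String[] args) throws Exception {
        // set up the batch execution environment
        final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // build your DataSet program here, then execute it
        env.execute("Flink Batch Java API Skeleton");
    }
}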

We recommend that you import this project into your IDE to develop and test it. IntelliJ IDEA supports Gradle projects after the Gradle plug-in is installed. Eclipse supports Gradle projects through the Eclipse Buildship plugin (since the Shadow plugin has Gradle version requirements, be sure to specify Gradle version >= 3.0 in the last step of the import wizard). You can also create project files from Gradle using Gradle’s IDE Integration.

Build the project

If you want to build/package the project, run the 'gradle clean shadowJar' command in the project directory. After executing this command, you will find a JAR file that contains your application together with the connectors and libraries added as dependencies: build/libs/<project-name>-<version>-all.jar.

Note: If you are using a class other than StreamingJob as the main class/entry for your application, we recommend that you modify the mainClassName configuration in the build.gradle file accordingly. This way, Flink can run the application from a JAR file without specifying the main class separately.

Flink series:

  • Introduction to Apache Flink
  • Apache Flink Architecture

For more blog posts on real-time computing, Flink, Kafka, and other related technologies, welcome to follow Real-time Streaming Computing.