Hi, I’m Tuge.

Today, a Flink beginner asked me if I had a Flink installation tutorial.

The following tutorials all use Flink version 1.13.2 and are aimed at general users:

1. Standalone deployment

Version requirements:

Version                            Node    Deployment mode
flink-1.13.2-bin-scala_2.11.tgz            standalone

1.1. Add the software installation package to the cluster

1.2. Decompress the software package

tar -zxvf flink-1.13.2-bin-scala_2.11.tgz

1.3. Configure system environment variables

# 1. Go to the directory
cd flink-1.13.2/

# 2. View the complete path and copy it
pwd

# 3. Edit system variables
sudo vim /etc/profile

# 4. Configure the environment variable
export FLINK_HOME=/home/liyaozhou/lyz/flink-1.13.2

# 5. Refresh the system environment variables
source /etc/profile

# 6. Check whether the configuration is successful
echo $FLINK_HOME
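
If you would rather not edit /etc/profile by hand, the export line can also be appended from a script. This is only a sketch: the function name add_flink_home is made up for illustration, and it assumes the profile file is writable by the current user (for /etc/profile that usually means running as root).

```shell
# Append "export FLINK_HOME=<dir>" to a profile file unless the exact
# line is already there, so running it twice does not duplicate the entry.
# add_flink_home is a hypothetical helper name, not part of Flink.
add_flink_home() {
    profile_file=$1
    flink_dir=$2
    line="export FLINK_HOME=$flink_dir"
    # -x matches the whole line, -F treats the pattern as a fixed string
    if ! grep -qxF "$line" "$profile_file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$profile_file"
    fi
}
```

For example, add_flink_home "$HOME/.profile" /home/liyaozhou/lyz/flink-1.13.2 followed by source "$HOME/.profile" would set the variable for a single user instead of system-wide.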

1.4. Configure the Flink conf files

Go to the flink-1.13.2/conf directory

1.4.1 Configure flink-conf.yaml

# 1. Configure the JobManager RPC address
jobmanager.rpc.address: 192.168.244.129

# 2. Change the TaskManager memory size
taskmanager.memory.process.size: 2048m

# 3. Change the number of task slots per TaskManager
taskmanager.numberOfTaskSlots: 4

# 4. Change the default parallelism
parallelism.default: 4
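
A job needs at least as many slots as its parallelism, so with a single TaskManager (as in this standalone setup) parallelism.default should not exceed taskmanager.numberOfTaskSlots. The sketch below checks that for a config file; conf_get and check_parallelism are hypothetical helper names, and the parsing assumes simple "key: value" lines without trailing comments.

```shell
# Print the value of a "key: value" entry (last match wins).
# $1 = key, $2 = path to a flink-conf.yaml style file
conf_get() {
    awk -v k="$1" '
        index($0, k ":") == 1 {
            v = substr($0, length(k) + 2)   # text after the colon
            sub(/^ +/, "", v)               # drop leading spaces
        }
        END { print v }' "$2"
}

# Succeed only if parallelism.default fits into the slots of one TaskManager.
check_parallelism() {
    slots=$(conf_get taskmanager.numberOfTaskSlots "$1")
    par=$(conf_get parallelism.default "$1")
    [ "$par" -le "$slots" ]
}
```

Running check_parallelism conf/flink-conf.yaml && echo ok against the values above (4 slots, parallelism 4) should succeed; with more than one TaskManager the total slot count is larger and the check is stricter than necessary.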

1.4.2 Configure masters

# Change to the IP address of the primary node
192.168.244.129:8081

1.4.3 Configure workers

# In standalone mode the worker node is the same as the master node
192.168.244.129

1.4.4 Configure zoo.cfg

# Create the snapshot directory under flink-1.13.2
mkdir tmp
cd tmp
mkdir zookeeper

# Directory where snapshots are stored
dataDir=/home/liyaozhou/lyz/flink-1.13.2/tmp/zookeeper

# ZooKeeper server
server.1=192.168.244.129:2888:3888

1.5. Start the Flink cluster

Go to the flink-1.13.2/bin directory

./start-cluster.sh
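
Once the cluster has been started, jps is a quick way to see whether the JobManager and TaskManager JVMs are running. The sketch below assumes the process names that Flink 1.13's standalone scripts show in jps (StandaloneSessionClusterEntrypoint and TaskManagerRunner); missing_flink_procs is a hypothetical helper name.

```shell
# Read `jps` output on stdin and print every expected Flink process
# that does not appear in it. Prints nothing when both are running.
missing_flink_procs() {
    jps_output=$(cat)
    for proc in StandaloneSessionClusterEntrypoint TaskManagerRunner; do
        # -w matches the name as a whole word
        printf '%s\n' "$jps_output" | grep -qw "$proc" || echo "$proc"
    done
}
```

Use it as jps | missing_flink_procs. The web UI on port 8081 of the primary node (the address configured in masters) is another way to confirm that the cluster is up.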

2. Standalone-HA cluster deployment

The cluster is deployed on two nodes.

Version                            Master node    Slave node    Deployment mode
flink-1.13.2-bin-scala_2.11.tgz                                 standalone-HA
Hadoop 2.6.4                                                    Distributed
ZooKeeper 3.4.14                                                Distributed

The ZooKeeper and Hadoop clusters are already configured.

2.1. Add the software installation package to the cluster

2.2. Decompress software packages

tar -zxvf flink-1.13.2-bin-scala_2.11.tgz

2.3. Configure system environment variables

# 1. Go to the directory
cd flink-1.13.2/

# 2. View the complete path and copy it
pwd

# 3. Edit system variables
sudo vim /etc/profile

# 4. Configure the environment variable
export FLINK_HOME=/home/liyaozhou/lyz/flink-1.13.2

# 5. Add the Hadoop configuration directory
export HADOOP_CONF_DIR=/home/liyaozhou/lyz/hadoop-2.6.4/etc/hadoop

# 6. Refresh the system environment variables
source /etc/profile

# 7. Check whether $FLINK_HOME is configured successfully
echo $FLINK_HOME

2.4. Configure the Flink conf files

Go to the flink-1.13.2/conf directory

2.4.1 Configure flink-conf.yaml

# 1. Configure the JobManager RPC address
jobmanager.rpc.address: 192.168.244.129

# 2. Change the TaskManager memory size (optional)
taskmanager.memory.process.size: 2048m

# 3. Change the number of task slots per TaskManager (optional)
taskmanager.numberOfTaskSlots: 4

# 4. Change the default parallelism (optional)
parallelism.default: 4

# 5. State backend storage mode
state.backend: filesystem

# 6. Enable checkpoints and save the snapshots to HDFS
state.backend.fs.checkpointdir: hdfs:///flink-checkpoints

# 7. Save savepoint snapshots to HDFS
state.savepoints.dir: hdfs:///flink-savepoints

# 8. Use ZooKeeper for high availability
high-availability: zookeeper

# 9. Address of the ZooKeeper cluster
high-availability.zookeeper.quorum: 192.168.244.129:2181

# 10. Store JobManager metadata in HDFS
high-availability.storageDir: hdfs:///flink/ha/

# 11. ZooKeeper client ACL: the default is open; change it to creator if ZooKeeper security is enabled
high-availability.zookeeper.client.acl: open
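
ZooKeeper HA only works when the HA mode, the quorum address and the storage directory are all set, so it can be worth checking the file before starting the cluster. A minimal sketch, assuming plain "key: value" lines that start at the beginning of the line; missing_ha_keys is a hypothetical helper name.

```shell
# Print every required ZooKeeper HA key that is missing from the
# given flink-conf.yaml. Prints nothing when all three are present.
missing_ha_keys() {
    for key in high-availability \
               high-availability.zookeeper.quorum \
               high-availability.storageDir; do
        # A key counts as present when a line starts with "<key>:"
        grep -q "^$key:" "$1" || echo "$key"
    done
}
```

Calling missing_ha_keys conf/flink-conf.yaml on the configuration above should print nothing.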

2.4.2 Configure masters

# Change to the IP address of the primary node
192.168.244.129:8081

2.4.3 Configure workers

# IP address of the worker node
192.168.244.130

2.4.4 Configure zoo.cfg

# Create the snapshot directory under flink-1.13.2
mkdir tmp
cd tmp
mkdir zookeeper

# Directory where snapshots are stored
dataDir=/home/liyaozhou/lyz/flink-1.13.2/tmp/zookeeper

# ZooKeeper server
server.1=192.168.244.129:2888:3888

2.5. Download the Hadoop dependency package

Download address: flink.apache.org/downloads.h…

Copy the package to the flink-1.13.2/lib directory

2.6. Transfer files

# Copy the Flink directory from the primary node to the slave node
scp -r flink-1.13.2 192.168.244.130:/home/liyaozhou/lyz/

Then change the RPC address in flink-conf.yaml on the secondary node.
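
With more than one slave node, the same copy can be generated from a list of hosts instead of being typed per node. The sketch below only prints the scp commands (a dry run); print_copy_cmds is a hypothetical helper name, and it assumes a file with one host per line, such as conf/workers.

```shell
# Print one scp command per host listed in a workers file.
# $1 = workers file, $2 = directory to copy, $3 = destination directory
print_copy_cmds() {
    workers_file=$1
    src_dir=$2
    dest_dir=$3
    while IFS= read -r host || [ -n "$host" ]; do
        # Skip blank lines and comment lines
        case $host in ''|'#'*) continue ;; esac
        echo "scp -r $src_dir $host:$dest_dir"
    done < "$workers_file"
}
```

Review the output of print_copy_cmds conf/workers flink-1.13.2 /home/liyaozhou/lyz/ first, and pipe it to sh once it looks right.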

2.7. Start the Flink cluster

Go to the flink-1.13.2/bin directory

./start-cluster.sh

On the login page, the address of TaskManager is

3. Flink On Yarn cluster deployment

The cluster is deployed on two nodes.

Version                            Master node    Slave node    Deployment mode
flink-1.13.2-bin-scala_2.11.tgz                                 yarn
Hadoop 2.6.4                                                    Distributed
ZooKeeper 3.4.14                                                Distributed

The ZooKeeper and Hadoop clusters are already configured.

3.1 Modifying the yarn-site.xml file in the Hadoop cluster

Note that HA in YARN mode requires the following YARN configuration in yarn-site.xml:

<!-- Maximum number of failed restart attempts for the master (JobManager) -->
    The maximum number of application master execution attempts.

<!-- Disable the YARN memory check -->
<!-- Whether to start a thread that checks the amount of virtual memory each task is using; a task that exceeds its allocated value is killed. The default is true. -->
<!-- In YARN mode, Flink can easily exceed the memory limit, and YARN then kills the job automatically. -->
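
The two settings described above can be written as standard Hadoop property elements. The sketch below prints them; note the assumptions: the property names yarn.resourcemanager.am.max-attempts and yarn.nodemanager.vmem-check-enabled are taken from Hadoop's yarn-default.xml rather than from this tutorial, the value 4 is simply chosen to match yarn.application-attempts in section 3.2, and yarn_property is a hypothetical helper name.

```shell
# Print one Hadoop <property> element for yarn-site.xml.
# $1 = property name, $2 = property value
yarn_property() {
    printf '<property>\n  <name>%s</name>\n  <value>%s</value>\n</property>\n' "$1" "$2"
}

# Assumed mapping of the two comments above to Hadoop property names
yarn_property yarn.resourcemanager.am.max-attempts 4
yarn_property yarn.nodemanager.vmem-check-enabled false
```

The printed elements belong inside the configuration element of yarn-site.xml, and YARN has to be restarted for them to take effect.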

3.2 Modifying the Flink conf configuration

Add the following two items to flink-conf.yaml:

yarn.application-attempts: 4
# Spread slots evenly across all nodes in the cluster
cluster.evenly-spread-out-slots: true

3.3 Starting a Test (Session Mode)

3.3.1 Starting a Flink Session (Tested on

# Run on the active node
bin/yarn-session.sh -d -jm 1024 -tm 1024 -s 1

# -tm indicates the memory size of each TaskManager
# -s indicates the number of slots per TaskManager

3.3.2 Logging In to the YARN Cluster page

The login URL: /cluster

3.3.3 Submitting tasks on YARN In Session mode

Note: All submitted tasks run in this session and do not request additional YARN resources.

(1) Create a wordcount.txt file, place it under flink-1.13.2, and then upload it to HDFS

hadoop fs -copyFromLocal wordcount.txt /

(2) Submit tasks

# Run on 192.168.244.129
bin/flink run examples/batch/WordCount.jar --input hdfs:///wordcount.txt
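
To sanity-check the job's result, the same input file can be counted locally with standard tools. This is a rough sketch only: local_wordcount is a hypothetical helper name, it splits on spaces and keeps case as-is, and the bundled WordCount example may tokenize the text differently.

```shell
# Count how often each space-separated word occurs in a file.
# Output format is that of `uniq -c`: a count followed by the word.
local_wordcount() {
    tr -s ' ' '\n' < "$1" | sort | uniq -c
}
```

For example, local_wordcount wordcount.txt lists each word of the uploaded file with its count, which can be compared against what the Flink job prints.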

3.3.4 Viewing the Web-UI page of Hadoop ApplicationManager

3.3.5 Stopping the Session

yarn application -kill application_1631862788541_0001
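
The application ID in the command above differs on every run. One way to avoid copying it by hand is to pull it out of command output, as in the sketch below. It assumes the ID has the application_<timestamp>_<sequence> shape shown above; extract_app_id is a hypothetical helper name, and when several applications are listed it simply takes the first match.

```shell
# Print the first YARN application ID found on standard input.
extract_app_id() {
    grep -oE 'application_[0-9]+_[0-9]+' | head -n 1
}
```

For example, yarn application -list | extract_app_id prints an ID that can be passed to yarn application -kill, after checking that it really is the Flink session.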

3.4 Starting a Test (Per-Job Mode)

3.4.1 Directly Submitting a Job

bin/flink run \
-t yarn-per-job \
--detached examples/batch/WordCount.jar \
--input hdfs:///wordcount.txt