Hi, I’m Tuge.

Today, a Flink beginner asked me if I had a Flink installation tutorial.

The following tutorials all use Flink 1.13.2 and are aimed at general users:

1. Standalone deployment

Version requirements:

Version                            Node               Deployment mode
flink-1.13.2-bin-scala_2.11.tgz    192.168.244.129    standalone

1.1. Add the software installation package to the cluster

1.2. Decompress the software package

tar -zxvf flink-1.13.2-bin-scala_2.11.tgz

1.3. Configure system environment variables

# 1. Go to the directory
cd flink-1.13.2/

# 2. Print the full path and copy it
pwd

# 3. Edit the system profile
sudo vim /etc/profile

# 4. Configure the environment variables
export FLINK_HOME=/home/liyaozhou/lyz/flink-1.13.2
export PATH=$PATH:$FLINK_HOME/bin

# 5. Reload the system environment variables
source /etc/profile

# 6. Check whether the configuration is successful
echo $FLINK_HOME
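If you want a slightly stronger check than printing the variable, something like the sketch below could help. `check_flink_home` is a hypothetical helper, not something shipped with Flink, and it only assumes that the unpacked package contains `bin/start-cluster.sh` and `conf/flink-conf.yaml`, the two files used later in this tutorial.

```shell
# Hypothetical helper: check that a directory looks like an unpacked
# Flink distribution. Falls back to $FLINK_HOME when no argument is given.
check_flink_home() {
    dir="${1:-${FLINK_HOME:-}}"
    if [ -z "$dir" ]; then
        echo "FLINK_HOME is not set" >&2
        return 1
    fi
    # Files that the later steps of this tutorial rely on.
    for f in bin/start-cluster.sh conf/flink-conf.yaml; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            return 1
        fi
    done
    echo "ok: $dir"
}
```

After `source /etc/profile`, calling `check_flink_home` with no arguments checks the directory that `FLINK_HOME` points to.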

1.4. Configure Flink conf file

Go to the flink-1.13.2/conf directory

1.4.1 Configure flink-conf.yaml

#1. Configure the JobManager RPC address
jobmanager.rpc.address: 192.168.244.129

#2. Change the TaskManager memory size
taskmanager.memory.process.size: 2048m

#3. Change the number of task slots per TaskManager
taskmanager.numberOfTaskSlots: 4

#4. Change the default parallelism
parallelism.default: 4
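The slot count and the default parallelism are related: a job with parallelism 4 needs 4 free slots, which one TaskManager with 4 slots provides. The sketch below reads both values from a config file and checks that they fit. `get_conf` and `check_parallelism` are hypothetical helpers, and the parsing assumes simple top-level `key: value` lines like the ones above.

```shell
# Hypothetical helpers; not part of Flink.

# Print the value of a top-level "key: value" entry in a flink-conf.yaml file.
get_conf() {
    awk -v k="$1" 'index($0, k ":") == 1 { sub(/^[^:]*:[ \t]*/, ""); print; exit }' "$2"
}

# Succeed when parallelism.default fits into the slots offered by the given
# number of TaskManagers (default: 1, as in this standalone setup).
check_parallelism() {
    conf="$1"
    taskmanagers="${2:-1}"
    slots=$(get_conf taskmanager.numberOfTaskSlots "$conf")
    parallelism=$(get_conf parallelism.default "$conf")
    # Flink falls back to 1 for both settings when they are not configured.
    total=$(( ${slots:-1} * taskmanagers ))
    echo "slots=$total parallelism=${parallelism:-1}"
    [ "${parallelism:-1}" -le "$total" ]
}
```

For example, `check_parallelism conf/flink-conf.yaml 1` would be run from the flink-1.13.2 directory for the single-node setup in this section.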

1.4.2 Configure masters

# Address of the master node
192.168.244.129:8081

1.4.3 Configure workers

# In the standalone deployment the worker is the same node as the master
192.168.244.129

1.4.4 Configure zoo.cfg

# Create the snapshot directory under flink-1.13.2
mkdir tmp
cd tmp
mkdir zookeeper

# Directory where snapshots are stored
dataDir=/home/liyaozhou/lyz/flink-1.13.2/tmp/zookeeper

server.1=192.168.244.129:2888:3888

1.5. Start the Flink cluster

Go to the flink-1.13.2/bin directory

./start-cluster.sh
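Once the cluster is up, one way to confirm that the TaskManager registered is to query the JobManager's REST endpoint on the web port configured in the masters file (8081). The sketch below is an assumption-laden example: `overview_field` is a hypothetical helper, and it assumes the `/overview` response is a flat JSON object with numeric fields such as `taskmanagers` and `slots-total`.

```shell
# Hypothetical helper: extract a numeric field from the JSON text returned by
# the JobManager's /overview REST endpoint. A simple sed pattern is used
# instead of a JSON parser to keep the sketch dependency-free.
overview_field() {
    printf '%s\n' "$1" | sed -n "s/.*\"$2\":[[:space:]]*\([0-9][0-9]*\).*/\1/p"
}

# Usage against a running cluster (not executed here):
#   json=$(curl -s http://192.168.244.129:8081/overview)
#   overview_field "$json" taskmanagers
#   overview_field "$json" slots-total
```

With the configuration above, I would expect one TaskManager and four slots to be reported, but check the actual response on your own cluster.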

2. Standalone-HA cluster deployment

The cluster is deployed on two nodes:

Version                            Master node        Slave node         Deployment mode
flink-1.13.2-bin-scala_2.11.tgz    192.168.244.129    192.168.244.130    standalone-HA
hadoop 2.6.4                       192.168.244.129    192.168.244.130    distributed
zookeeper 3.4.14                   192.168.244.129    192.168.244.130    distributed

The ZooKeeper and Hadoop clusters have already been configured.

2.1. Add the software installation package to the cluster

2.2. Decompress software packages

tar -zxvf flink-1.13.2-bin-scala_2.11.tgz

2.3. Configure system environment variables

# 1. Go to the directory
cd flink-1.13.2/

# 2. Print the full path and copy it
pwd

# 3. Edit the system profile
sudo vim /etc/profile

# 4. Configure the environment variables
export FLINK_HOME=/home/liyaozhou/lyz/flink-1.13.2
export PATH=$PATH:$FLINK_HOME/bin

# 5. Add the Hadoop configuration directory
export HADOOP_CONF_DIR=/home/liyaozhou/lyz/hadoop-2.6.4/etc/hadoop

# 6. Reload the system environment variables
source /etc/profile

# 7. Check whether the configuration is successful
echo $FLINK_HOME

2.4. Configure the Flink conf files

Go to the flink-1.13.2/conf directory

2.4.1 Configure flink-conf.yaml

#1. Configure the JobManager RPC address
jobmanager.rpc.address: 192.168.244.129

#2. TaskManager memory size (can be left unchanged)
taskmanager.memory.process.size: 2048m

#3. Number of task slots per TaskManager (can be left unchanged)
taskmanager.numberOfTaskSlots: 4

#4. Default parallelism (can be left unchanged)
parallelism.default: 4

#5. State backend storage mode
state.backend: filesystem

#6. Enable checkpoints; snapshots can be saved to HDFS
state.backend.fs.checkpointdir: hdfs://192.168.244.129:9000/flink-checkpoints

#7. Configure savepoints; snapshots can be saved to HDFS
state.savepoints.dir: hdfs://192.168.244.129:9000/flink-savepoints

#8. Use ZooKeeper for high availability
high-availability: zookeeper

#9. Address of the ZooKeeper cluster
high-availability.zookeeper.quorum: 192.168.244.129:2181

#10. Store JobManager metadata in HDFS
high-availability.storageDir: hdfs://192.168.244.129:9000/flink/ha/

#11. ZooKeeper client ACL; the default is open, change it to creator if ZooKeeper security is enabled
high-availability.zookeeper.client.acl: open
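Because a missing HA entry is easy to overlook in a long config file, a small check like the sketch below can be handy before starting the cluster. `missing_ha_keys` is a hypothetical helper; the three keys it looks for are the ZooKeeper HA entries from the listing above, and it only matches uncommented `key:` entries at the start of a line.

```shell
# Hypothetical helper: list the HA keys from this section that are missing
# from a flink-conf.yaml file. Prints nothing and succeeds when all are set.
missing_ha_keys() {
    conf="$1"
    missing=0
    for key in high-availability \
               high-availability.zookeeper.quorum \
               high-availability.storageDir; do
        # Match "key:" at the start of a line, so commented-out entries
        # (lines starting with '#') do not count.
        if ! grep -q "^$key:" "$conf"; then
            echo "$key"
            missing=1
        fi
    done
    return "$missing"
}
```

Running `missing_ha_keys conf/flink-conf.yaml` from the flink-1.13.2 directory prints each key that still needs to be added.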

2.4.2 Configure masters

# Address of the master node
192.168.244.129:8081

2.4.3 Configure workers

# IP address of the slave node
192.168.244.130

2.4.4 Configure zoo.cfg

# Create the snapshot directory under flink-1.13.2
mkdir tmp
cd tmp
mkdir zookeeper

# Directory where snapshots are stored
dataDir=/home/liyaozhou/lyz/flink-1.13.2/tmp/zookeeper

server.1=192.168.244.129:2888:3888

2.5. Download the Hadoop dependency package

Download address: flink.apache.org/downloads.h…

Copy the package to the flink-1.13.2/lib directory

2.6. File transfer

# Copy the Flink directory from the master node to the slave node
scp -r flink-1.13.2 192.168.244.130:/home/liyaozhou/lyz/
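With more than one slave node, the same copy can be driven from the workers file instead of typing each address. The sketch below only prints the commands (a dry run). `print_scp_commands` is a hypothetical helper, and it assumes the workers file lists one host per line.

```shell
# Hypothetical helper: print one scp command per host listed in a workers
# file, skipping blank lines and comment lines. It only prints the commands;
# pipe the output to sh to actually run them.
print_scp_commands() {
    workers_file="$1"
    src="$2"
    dest="$3"
    # "|| [ -n "$host" ]" keeps a final line that has no trailing newline.
    while IFS= read -r host || [ -n "$host" ]; do
        case "$host" in
            ''|\#*) continue ;;
        esac
        echo "scp -r $src $host:$dest"
    done < "$workers_file"
}
```

For this cluster, `print_scp_commands flink-1.13.2/conf/workers flink-1.13.2 /home/liyaozhou/lyz/` should print the single scp command shown above.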

On the slave node, change the RPC IP address in flink-conf.yaml.

2.7. Start the Flink cluster

Go to the flink-1.13.2/bin directory

./start-cluster.sh

In the web UI, the TaskManager address is shown as 192.168.244.130.

3. Flink On Yarn cluster deployment

The cluster is deployed on two nodes:

Version                            Master node        Slave node         Deployment mode
flink-1.13.2-bin-scala_2.11.tgz    192.168.244.129    192.168.244.130    yarn
hadoop 2.6.4                       192.168.244.129    192.168.244.130    distributed
zookeeper 3.4.14                   192.168.244.129    192.168.244.130    distributed

The ZooKeeper and Hadoop clusters have already been configured.

3.1 Modifying the yarn-site.xml file in the Hadoop cluster

Note: for HA in YARN mode, yarn-site.xml must be modified as follows:

<!-- Maximum number of failed restart attempts for the master (JobManager) -->
<property>
  <name>yarn.resourcemanager.am.max-attempts</name>
  <value>4</value>
  <description>
    The maximum number of application master execution attempts.
  </description>
</property>

<!-- Disable the YARN memory checks -->
<!-- Whether to start a thread that checks the amount of virtual memory each task is using; a task that exceeds its allocated value is killed. The default is true. -->
<!-- In YARN mode Flink easily exceeds the memory limit, and YARN then kills the job automatically. -->

<property>
   <name>yarn.nodemanager.pmem-check-enabled</name>
   <value>false</value>
</property>

<property>
   <name>yarn.nodemanager.vmem-check-enabled</name>
   <value>false</value>
</property>

3.2 Modifying the Flink conf configuration

Add the following two items to flink-conf.yaml:

yarn.application-attempts: 4

# Spread slots evenly across all nodes in the cluster
cluster.evenly-spread-out-slots: true

3.3 Starting a Test (Session Mode)

3.3.1 Starting a Flink Session (Tested on 192.168.244.129)

# Run on the master node
bin/yarn-session.sh -d -jm 1024 -tm 1024 -s 1

# -tm: memory size of each TaskManager
# -s:  number of slots per TaskManager

3.3.2 Logging In to the YARN Cluster page

Login URL: 192.168.244.129:8088/cluster

3.3.3 Submitting tasks on YARN In Session mode

Note: all submitted jobs run inside this session, so no additional YARN resources are requested for them.

(1) Create a wordcount.txt file, place it under flink-1.13.2, and then upload it to HDFS

hadoop fs -copyFromLocal wordcount.txt /

(2) Submit tasks

# Run on 192.168.244.129
bin/flink run examples/batch/WordCount.jar --input hdfs://192.168.244.129:9000/wordcount.txt

3.3.4 Viewing the Web-UI page of Hadoop ApplicationManager

3.3.5 Stopping the session

yarn application -kill application_1631862788541_0001
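The application ID above is specific to my cluster; yours will differ. One way to look it up is to filter the output of `yarn application -list`. The sketch below is only an illustration: `find_app_ids` is a hypothetical helper, and it assumes the listing is tab-separated with the application ID in the first column and the application name in the second. The session name "Flink session" used in the usage note is also an assumption, so check the name shown on the YARN cluster page.

```shell
# Hypothetical helper: read the text printed by "yarn application -list" on
# standard input and print the IDs of applications whose name contains $1.
find_app_ids() {
    awk -F'\t' -v name="$1" '
        {
            id = $1
            gsub(/[ \t]/, "", id)    # strip padding around the ID column
        }
        # Keep only data rows (IDs start with "application_") with a matching name.
        id ~ /^application_/ && index($2, name) > 0 { print id }
    '
}
```

For example, `yarn application -list | find_app_ids "Flink session"` prints the matching IDs, which can then be passed to `yarn application -kill`.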

3.4 Starting a Test (Per-Job Mode)

3.4.1 Directly Submitting a Job


bin/flink run \
-t yarn-per-job   \
--detached  examples/batch/WordCount.jar  \
--input hdfs://192.168.244.129:9000/wordcount.txt