Welcome to my GitHub

Github.com/zq2599/blog…

Content: category summaries of all my original articles plus the supporting source code, covering Java, Docker, Kubernetes, DevOps, and more.

Why it is so easy

Using Ansible to deploy CDH6 simplifies the process and reduces the chance of manual error. In this article, we run Ansible playbooks from a computer with Ansible installed (macOS or Linux), remotely operate a CentOS server to deploy CDH6 on it, and then verify that the deployment succeeded.

Ansible learning

If you want to learn more about Ansible, see Ansible 2.4 Installation and Experience.

Why deploy standalone CDH6

It is mainly intended as a learning and development environment for big data technology; it is not suitable for production.

Overview of the walkthrough

What we will do: deployment, startup, and verification. The whole process is shown in the figure below:

Article outline

This article consists of the following sections:

  1. Environment information
  2. Downloading files
  3. Placing files
  4. CDH server setup
  5. Ansible parameter settings
  6. Deployment
  7. Restarting the CDH server
  8. Startup
  9. Settings
  10. Fixing problems
  11. Experience

Environmental information

The overall flow is as follows: a MacBook Pro with Ansible 2.9 installed acts as the Ansible server; it runs the playbook scripts and remotely operates a CentOS server to complete the deployment and startup of CDH6. The computer on the blue background above can run either macOS or Linux; the computer on the yellow background must run CentOS 7.7 for CDH6 (sorry, I have not tried other systems).

The versions used throughout this walkthrough are as follows:

  1. Ansible server: macOS Catalina 10.15
  2. CDH server: CentOS Linux release 7.7.1908
  3. CM version: 6.1.0
  4. Parcel version: 6.1.1
  5. JDK version: 8u191

Downloading files (on the Ansible server)

All files used in this walkthrough are listed in the following table:

| Number | File name | Description |
| --- | --- | --- |
| 1 | jdk-8u191-linux-x64.tar.gz | Linux JDK installation package |
| 2 | mysql-connector-java-5.1.34.jar | MySQL JDBC driver |
| 3 | cloudera-manager-server-6.1.0-769885.el7.x86_64.rpm | CM server installation package |
| 4 | cloudera-manager-daemons-6.1.0-769885.el7.x86_64.rpm | CM daemons installation package |
| 5 | cloudera-manager-agent-6.1.0-769885.el7.x86_64.rpm | CM agent installation package |
| 6 | CDH-6.1.1-1.cdh6.1.1.p0.875250-el7.parcel | CDH offline installation package (parcel) |
| 7 | CDH-6.1.1-1.cdh6.1.1.p0.875250-el7.parcel.sha | SHA checksum for the CDH parcel |
| 8 | hosts | Ansible remote host configuration; records the CDH6 server's information |
| 9 | ansible.cfg | Ansible configuration |
| 10 | cdh-single-install.yml | Ansible playbook used to deploy CDH |
| 11 | cdh-single-start.yml | Ansible playbook used to start CDH for the first time |

Download addresses for the above 11 files:

  1. jdk-8u191-linux-x64.tar.gz: I packaged it together with mysql-connector-java-5.1.34.jar and uploaded the bundle to CSDN. You can download it at: Download.csdn.net/download/bo…
  2. mysql-connector-java-5.1.34.jar: can be downloaded from the Maven central repository, or from the same CSDN bundle as above: Download.csdn.net/download/bo…
  3. cloudera-manager-server-6.1.0-769885.el7.x86_64.rpm: archive.cloudera.com/cm6/6.1.0/r…
  4. cloudera-manager-daemons-6.1.0-769885.el7.x86_64.rpm: archive.cloudera.com/cm6/6.1.0/r…
  5. cloudera-manager-agent-6.1.0-769885.el7.x86_64.rpm: archive.cloudera.com/cm6/6.1.0/r…
  6. CDH-6.1.1-1.cdh6.1.1.p0.875250-el7.parcel: archive.cloudera.com/cdh6/6.1.1/…
  7. CDH-6.1.1-1.cdh6.1.1.p0.875250-el7.parcel.sha: archive.cloudera.com/cdh6/6.1.1/… (after downloading, change the extension from .sha256 to .sha)
  8. hosts, ansible.cfg, cdh-single-install.yml, cdh-single-start.yml: these four files are saved in my GitHub repository at github.com/zq2599/blog… , in the folder named ansible-cdh6-single, as shown in the red box below:

Placing files (on the Ansible server)

Once you have downloaded all 11 files, place them in the following locations:

  1. Create a folder named playbooks in your home directory: mkdir ~/playbooks
  2. Put these four files into the playbooks folder: hosts, ansible.cfg, cdh-single-install.yml, cdh-single-start.yml
  3. Create a subfolder named cdh6 inside playbooks.
  4. Put the remaining seven files into the cdh6 folder: jdk-8u191-linux-x64.tar.gz, mysql-connector-java-5.1.34.jar, cloudera-manager-server-6.1.0-769885.el7.x86_64.rpm, cloudera-manager-daemons-6.1.0-769885.el7.x86_64.rpm, cloudera-manager-agent-6.1.0-769885.el7.x86_64.rpm, CDH-6.1.1-1.cdh6.1.1.p0.875250-el7.parcel, CDH-6.1.1-1.cdh6.1.1.p0.875250-el7.parcel.sha
  5. The playbooks folder must be placed in your home directory (i.e. ~/):
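If you prefer to script it, the layout above can be sketched with a few shell commands; this is a sketch, assuming the 11 downloaded files are sitting in the current directory:

```shell
# Create the expected layout under the home directory
mkdir -p ~/playbooks/cdh6

# Playbook and inventory files go directly into ~/playbooks
for f in hosts ansible.cfg cdh-single-install.yml cdh-single-start.yml; do
  if [ -f "$f" ]; then mv "$f" ~/playbooks/; fi
done

# Installation packages go into ~/playbooks/cdh6
for f in jdk-8u191-linux-x64.tar.gz mysql-connector-java-5.1.34.jar \
         cloudera-manager-*.rpm CDH-*.parcel CDH-*.parcel.sha; do
  if [ -f "$f" ]; then mv "$f" ~/playbooks/cdh6/; fi
done
```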

CDH server setup

In this walkthrough, the CDH server's hostname is deskmini and its IP address is 192.168.50.134. Do the following:

  1. Ensure that the CDH server allows SSH login (user name + password).
  2. Log in over SSH to the machine where CDH will be deployed.
  3. Check that the /etc/hostname file is correct, as shown below:

4. Modify the /etc/hosts file to map your IP address to your hostname, as shown in the red box below.
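For reference, with the hostname and IP address used in this walkthrough, the line added to /etc/hosts would look like this (substitute your own values):

```
192.168.50.134 deskmini
```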

Ansible Parameter Settings (Ansible Server)

The ~/playbooks/hosts file records the remote host's information: its IP address, SSH port, login account, and password. Modify deskmini, ansible_host, ansible_port, ansible_user, and ansible_password as needed:

[cdh_group]
deskmini ansible_host=192.168.50.134 ansible_port=22 ansible_user=root ansible_password=888888
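The repository also ships an ansible.cfg. As a rough idea of what such a file contains, here is an illustrative sketch (not necessarily the repository's exact contents; host_key_checking is disabled on the assumption that the target is a disposable lab machine):

```
[defaults]
inventory = hosts
host_key_checking = False
```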

Deployment (Ansible Server)

  1. Go to ~/playbooks directory;
  2. Run ansible deskmini -a "free -m" to check that Ansible can operate the CDH server remotely. If everything is normal, the CDH server's memory information is displayed, as shown in the following figure:

  3. Run the following command to start the deployment: ansible-playbook cdh-single-install.yml
  4. The deployment involves time-consuming operations such as online installation and file transfers, so please be patient (it takes about half an hour). My own deployment once failed because of a network problem; after the network returned to normal, I simply ran the command again, since the Ansible operations are idempotent.
  5. A successful deployment looks like the following figure:

Restart the CDH server

The SELinux and swap settings take effect only after the operating system is restarted, so restart the CDH server now.

Start (Ansible Server)

  1. Wait until the CDH server restarts successfully.
  2. Log in to the Ansible server and go to the ~/playbooks directory.
  3. Run this command to initialize the database and start the CDH: ansible-playbook cdh-single-start.yml
  4. The following information is displayed:

Settings (Web page)

CDH is now started, and the CDH server exposes a web console that you can operate from a browser:

  1. Open http://192.168.50.134:7180 in your browser, as shown below; the password is admin:

  2. Next, select the 60-day trial edition on the version selection page.
  3. On the host selection page you can see deskmini.
  4. Select the CDH version in the red box below; the corresponding offline package has already been copied to CM's local repository, so nothing needs to be downloaded.
  5. The download therefore completes instantly; wait for distribution, decompression, and activation.
  6. On the service selection page, I chose Data Engineering, since Spark is needed.
  7. On the machine selection page, choose deskmini.
  8. On the database configuration page, make sure the database hosts are all localhost, and keep each service's database name, user name, and password identical: hive, amon, rman, oozie, and hue.
  9. On the parameter settings page, adjust the storage paths according to your disks. For example, my /home directory has plenty of space, so I changed the storage paths to the /home directory.
  10. Wait for startup to complete.
  11. When startup is complete it looks like the figure below.

At this point all services are started, but there are two minor issues left to fix.

Fixing the HDFS issue

  1. The overall service status is shown below. The HDFS service has a problem; click the icon in the red box:

  2. Click the position in the red box below:
  3. The problem details are shown in the following figure; it is the common issue of under-replicated blocks:
  4. Change the HDFS configuration dfs.replication from 3 to 1 and save the change, as shown in the following figure:
  5. Restart the HDFS service.
  6. After the settings above, the replication factor is now 1, but the replication factor of existing files has not been updated yet.
  7. Run vi /etc/passwd and find the hdfs account's entry, shown in the red box below. A shell such as /sbin/nologin prevents switching to the hdfs account:
  8. Change the content in the red box above to /bin/bash, as shown in the red box below:
  9. Run su - hdfs to switch to the hdfs account, then run the following command to set the replication factor:

hadoop fs -setrep -R 1 /
  10. Now all services are healthy:

Adjust YARN parameters to avoid spark-shell startup failure

  1. By default, YARN allocates too little memory to containers, which causes spark-shell to fail to start, so the YARN memory parameters need to be adjusted:

  2. On the YARN configuration page, adjust the values of yarn.scheduler.maximum-allocation-mb and yarn.nodemanager.resource.memory-mb. I set both to 8 GB (adjust according to your machine's actual hardware), as shown below:
  3. Restart YARN.
  4. Before running spark-shell, run su - hdfs to switch to the hdfs account.
  5. This time spark-shell enters interactive mode successfully:

At this point, the deployment, startup, and setup of CDH6 are complete. Next, let's try out the big data services.
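For reference, the two parameters adjusted above correspond to the following yarn-site.xml entries; in CDH they are set through the CM web console rather than by editing files, and 8192 MB mirrors the 8 GB chosen here:

```xml
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>8192</value>
</property>
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>8192</value>
</property>
```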

Experience HDFS and Spark

Next, run a classic Spark task: WordCount.

  1. Prepare a text file with English content; you can download this one: raw.githubusercontent.com/zq2599/blog…
  2. Log in over SSH and switch to the hdfs account.
  3. Create an HDFS folder:

hdfs dfs -mkdir /input

  4. Upload the text file to the /input directory:

hdfs dfs -put ./GoneWiththeWind.txt /input

  5. Run the spark-shell command to start an interactive session.
  6. To run the WordCount job, enter the following command; 192.168.50.134 is deskmini's IP address:

sc.textFile("hdfs://192.168.50.134:8020/input/GoneWiththeWind.txt").flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).saveAsTextFile("hdfs://192.168.50.134:8020/output")

  7. After the job finishes, download the result files:

hdfs dfs -get /output/*

  8. The command above downloads the Spark job's result files part-00000 and part-00001 to the local machine; use vi to view them, as shown in the following figure.
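As an aside, the logic of that Spark pipeline (split on spaces, count each word) can be sanity-checked locally with standard shell tools. A minimal sketch, using a made-up sample file:

```shell
# Tiny local equivalent of the WordCount logic, for illustration only
printf 'to be or not to be\n' > /tmp/wordcount-sample.txt

# Split on spaces, then count occurrences of each word
# (mirrors the flatMap/map/reduceByKey steps of the Spark job)
tr ' ' '\n' < /tmp/wordcount-sample.txt | sort | uniq -c | sort -rn
```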

  9. View historical tasks in the browser at http://192.168.50.134:18088 to see the details of the job.

At this point, the deployment, setup, and experience of CDH6 are complete. If you are setting up your own learning or development environment, I hope this article gives you a useful reference.

About deeper customization

While this approach eliminates a lot of the manual work of a traditional deployment, its drawback is obvious: all paths, file names, and service versions are fixed and cannot be configured. Ansible does support variables, but too many variables cause problems of their own. If you need to change versions or paths, I recommend editing cdh-single-install.yml and cdh-single-start.yml directly; all file names and version information are contained in those two files.

Welcome to follow my WeChat official account: programmer Xin Chen

Search WeChat for "programmer Xin Chen". I am Xin Chen, looking forward to exploring the Java world with you…

Github.com/zq2599/blog…