Over the past few years, two big data pioneers, Cloudera and Hortonworks, provided us with two enterprise-class Hadoop distributions: CDH (Cloudera Distribution including Apache Hadoop) and HDP (Hortonworks Data Platform). Both provide the ability to deploy, manage, monitor, and operate big data service components and nodes, greatly improving the efficiency of big data operations engineers. But the merger of Cloudera and Hortonworks brought some strategic changes: as of January 31, 2021, all Cloudera software requires a valid subscription to access. This undoubtedly has an impact on us big data engineers.

In this context, UCloud, drawing on years of experience developing big data platforms, recently released the free edition of USDP, its one-stop intelligent big data platform for private deployment scenarios, making it one of the first vendors to offer a free big data suite after CDH became paid. The USDP releases support the whole ecosystem around HDFS, Kudu, and ES, and support for other services and components will continue to be expanded, helping enterprises improve the efficiency of big data development and operations and quickly build analytical and processing capabilities for their big data business.

This article walks you through the installation and deployment of USDP Free Edition, and I hope it is of some help.

Environment preparation

As we can see from the materials provided by UCloud, the USDP platform consists of Manager Nodes and Worker Nodes. The most important service on the Manager Node is the Manager Server, the USDP management service, which requires a MySQL instance to store cluster metadata. The key component on each Worker Node is the Agent, which acts as the USDP slave-node control endpoint and is used to manage and operate the node and the big data services running on it. Big Data Service here refers to the big data services themselves (such as HDFS, YARN, and so on). The deployment architecture for a typical production environment is as follows:



As the figure above shows, the USDP platform requires a cluster of at least three nodes, and the operating system must be CentOS between 7.2 and 7.6, because USDP needs to obtain certain information from the operating system to run properly. Here I used three nodes, each with 8 cores, 32 GB of memory, and a 500 GB data disk. The services deployed on each node are as follows:



Download and set up USDP

Once the size of the cluster has been determined, we can download the installation package for the free version of USDP from the link below: https://s3-cn-bj.ufileos.com/… The file is quite large, around 43 GB, so it may take several hours to download.

After downloading, we extract it. The extracted files are as follows:

[root@node1 usdp-1.0.0]# ll
total 44686388
-rw-r--r-- 1 root root 20491532904 Feb  1 18:57 epel.tgz
-rw-r--r-- 1 root root     3077630 Feb  1 18:56 httpd-rpms.tar.gz
-rw-r--r-- 1 root root 16897158731 Feb  1 18:56 mirror.tgz
-rw-r--r-- 1 root root  8367086414 May 15 13:19 usdp-01-master-privatization-free-1.0.0.tar.gz

• usdp-01-master-privatization-free-1.0.0.tar.gz: the USDP main application and big data service resource package
• httpd-rpms.tar.gz, mirror.tgz: the httpd RPM packages and package mirror used to build the offline YUM source
• epel.tgz: the USDP offline YUM base source resource pack

For the convenience of later deployment, we create the /opt/usdp-srv/ and /data directories, move epel.tgz, httpd-rpms.tar.gz, and mirror.tgz into /data, move usdp-01-master-privatization-free-1.0.0.tar.gz into /opt/usdp-srv/, and then distribute the usdp-01-master-privatization-free-1.0.0.tar.gz file to all USDP nodes.
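For reference, a minimal shell sketch of these steps (assuming root SSH access to the other nodes and the node2/node3 hostnames or IPs used in this article) might look like this:

mkdir -p /opt/usdp-srv /data
mv epel.tgz httpd-rpms.tar.gz mirror.tgz /data/
mv usdp-01-master-privatization-free-1.0.0.tar.gz /opt/usdp-srv/

# Distribute the main package to the other nodes
for host in node2 node3; do
    ssh root@$host "mkdir -p /opt/usdp-srv"
    scp /opt/usdp-srv/usdp-01-master-privatization-free-1.0.0.tar.gz root@$host:/opt/usdp-srv/
done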

If we go to /opt/usdp-srv/ and extract usdp-01-master-privatization-free-1.0.0.tar.gz (extract it on the other nodes as well), we get the following directory structure:

[root@node1 usdp-srv]# tar -zxf usdp-01-master-privatization-free-1.0.0.tar.gz
[root@node1 usdp-srv]# cd usdp/
[root@node1 usdp]# ls -l
total 4
drwxr-xr-x 2 root root   33 May 14 12:06 agent
drwxr-xr-x 2 root root  136 May 14 12:07 bin
drwxr-xr-x 2 root root   65 May 14 12:06 config
drwxr-xr-x 2 root root  137 Dec 16  2020 jmx_exporter
drwxr-xr-x 2 root root   35 May 14 12:06 recommend
drwxr-xr-x 6 root root   59 May 14 12:06 repair
drwxr-xr-x 3 root root   21 Apr 20 16:21 repository
drwxr-xr-x 2 root root 4096 Dec 16  2020 scripts
drwxr-xr-x 2 root root   34 May 14 12:07 server
drwxr-xr-x 2 root root   29 May 14 15:03 sql
drwxr-xr-x 3 root root   21 May 14 12:06 templated
drwxr-xr-x 2 root root    6 Dec 16  2020 verify
drwxr-xr-x 2 root root   29 May 14 12:06 versions

The above directories are explained as follows:

• agent: the USDP distributed client program
• bin: start and stop scripts for the USDP programs
• config: USDP program configuration files
• jmx_exporter: the process monitoring metric collection program
• recommend: preset templates for big data service deployment
• repair: the one-click initialization (repair) module used for the first deployment
• repository: the big data service resource packages
• scripts: USDP utility scripts
• server: the USDP distributed manager program
• sql: USDP metadata initialization SQL
• templated: configuration file templates
• verify: the certificate store path
• versions: USDP big data resource bundle version information

USDP Platform Deployment

With the USDP Free Edition package ready, we can start deploying. We take the first full deployment as an example.

The first full deployment uses the /opt/usdp-srv/usdp/repair module, which has the following directory structure:

[root@node1 usdp]# cd repair/
[root@node1 repair]# ll
total 8
drwxr-x--- 2 root root 4096 May 14 12:06 bin
drwxr-x--- 2 root root  105 May 14 12:06 config
drwxr-x--- 4 root root   38 May 14 12:06 packages
drwxr-x--- 2 root root   23 May 14 12:06 sbin

These directories serve the following purposes:

• bin: scripts for the individual repair steps; no manual intervention needed
• config: configuration files required by the one-click repair script; these must be modified by the user
• packages: dependency packages required by the repair to install USDP
• sbin: the main one-click repair scripts; no manual intervention needed

There are three configuration files under the config directory:

• repair.properties: mainly configures the private YUM source node, the Nmap installation node, the MySQL installation node, the total number of machines to repair, and the location of the repair module logs; modify the relevant items as needed
• repair-host-info.properties: required for a full repair of all nodes; configures the internal IP, password, SSH port, and hostname of every node
• repair-host-info-add.properties: required when new nodes are added to the cluster; configures the internal IP, password, SSH port, and hostname of each new node

Since we are doing a full repair, we need two configuration files: repair.properties and repair-host-info.properties. Based on our node information above, we modify them as follows:

repair.properties

# Set the YUM source host IP
yum.repo.host.ip=10.23.110.136

# The host information used for installing the Nmap service
nmap.server.ip=10.23.110.136
nmap.server.port=22
nmap.server.password=abcd123456

# Install NTP service (Master)
ntp.master.ip=10.23.110.136

# The host information for installing MySQL
mysql.ip=10.23.110.136
mysql.host.ssh.port=22
mysql.host.ssh.password=abcd123456

# Set the MySQL database login password
mysql.password=abc123456

# The total number of machines needed to be repaired.
repair.host.num=3

# The total number of added machines needed to be repaired.
repair.add.host.num=0

# Common settings.
repair.log.dir=./logs

The detailed meaning of each parameter above is as follows:

repair-host-info.properties

# 1. Please provide the information of the hosts to be repaired in the format specified below
# 2. usdp.ip.i (e.g. i = 1, 2, 3 ...): the internal IP of node i
# 3. usdp.password.i: the SSH login password of node i
# 4. usdp.ssh.port.i: the SSH port of node i
# 5. usdp.ssh.port.hostname.i: the hostname of node i

usdp.ip.1=10.23.110.136
usdp.password.1=abcd123456
usdp.ssh.port.1=22
usdp.ssh.port.hostname.1=node1

usdp.ip.2=10.23.30.148
usdp.password.2=abcd123456
usdp.ssh.port.2=22
usdp.ssh.port.hostname.2=node2

usdp.ip.3=10.23.148.109
usdp.password.3=abcd123456
usdp.ssh.port.3=22
usdp.ssh.port.hostname.3=node3

The detailed meaning of each parameter above is as follows:

Note:

• For convenience, the login password of all three nodes is set to abcd123456; this is the SSH login password. Note that the password should not contain special characters, or the installation process will run into problems.
• In the repair-host-info.properties file we configured three USDP nodes. If you have more than three nodes, add a corresponding group of properties for each additional node, as in the hypothetical example below.
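For instance, a fourth node would just add another group of entries (the IP and hostname here are invented for illustration); remember to also raise repair.host.num in repair.properties accordingly:

# Hypothetical fourth node, for illustration only
usdp.ip.4=10.23.0.4
usdp.password.4=abcd123456
usdp.ssh.port.4=22
usdp.ssh.port.hostname.4=node4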

Execute the initialization script


After completing the above steps, execute the following commands to start the one-click initialization task.

cd /opt/usdp-srv/usdp/repair/sbin
bash repair.sh initAll
source /etc/profile
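The initialization takes a while. If you want to follow its progress, one option is to tail the repair logs from another terminal; assuming the default repair.log.dir=./logs from repair.properties, resolved relative to the sbin directory where the script runs, that would be:

tail -f /opt/usdp-srv/usdp/repair/sbin/logs/*.log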

After bash repair.sh initAll finishes, output like the following indicates that the environment of all USDP nodes has been initialized successfully.

[root@node1 sbin]# bash repair.sh initAll
BASH_PATH: /opt/usdp-srv/usdp/repair/sbin
REPAIR_PATH: /opt/usdp-srv/usdp/repair/sbin
UDP_PATH: /opt/usdp-srv/usdp
REPAIR_BIN_PATH: /opt/usdp-srv/usdp/repair/bin
REPAIR_SBIN_PATH: /opt/usdp-srv/usdp/repair/sbin
PACKAGES_PATH: /opt/usdp-srv/usdp/repair/packages
REPAIR_PATH: /opt/usdp-srv/usdp/repair
UDP_PATH: /opt/usdp-srv/usdp
REPAIR_BIN_PATH: /opt/usdp-srv/usdp/repair/bin
REPAIR_SBIN_PATH: /opt/usdp-srv/usdp/repair/sbin
PACKAGES_PATH: /opt/usdp-srv/usdp/repair/packages
......... (a large amount of output omitted) .........
SUCCESS: All encryption-free login links have been repaired successfully
SUCCESS: All closing firewall links have been repaired successfully
SUCCESS: All closing swap links have been repaired successfully
SUCCESS: set hostname links have been repaired successfully
SUCCESS: Set ntp have been repaired successfully
SUCCESS: Set libxslt devel have been repaired successfully
SUCCESS: Set psmisc have been repaired successfully
SUCCESS: Set mysql-client links have been repaired successfully
SUCCESS: Set mysql-python have been repaired successfully
SUCCESS: All transparent_hugepage links have been repaired successfully
SUCCESS: Set JDK links have been repaired successfully
SUCCESS: Set xdg-utils links have been repaired successfully
SUCCESS: Set redhat-lsb links have been repaired successfully
SUCCESS: Set python-devel links have been repaired successfully
SUCCESS: Set cyrus-sasl links have been repaired successfully
SUCCESS: Set python36-devel links have been repaired successfully
SUCCESS: Set gcc-c++ links have been repaired successfully
SUCCESS: Set Cython links have been repaired successfully
SUCCESS: Set Six links have been repaired successfully
SUCCESS: Set websocket-client links have been repaired successfully
SUCCESS: Set ecdsa links have been repaired successfully
SUCCESS: Set pytest-runner links have been repaired successfully
SUCCESS: Set krb5-devel links have been repaired successfully
The USDP deployment environment of all nodes has been repaired successfully. Please proceed to the next step

Description:

• We only need to download the USDP package on the node where the repair is performed and distribute the extracted usdp-01-master-privatization-free-1.0.0.tar.gz to /opt/usdp-srv/ on all nodes; no configuration is required on the other nodes.
• The above configuration files only need to be set up on the node that performs the repair; no additional setup is required elsewhere, because bash repair.sh initAll distributes the two configuration files to all nodes.
• USDP depends on the JDK, Python, MySQL, and so on, all of which are installed automatically on all nodes when bash repair.sh initAll is executed; no extra handling is needed.

Configure the MySQL database for USDP


Once initialization is complete, we need to configure the MySQL service node information for USDP. Simply open the /opt/usdp-srv/usdp/config/application-server.yml file, find the datasource configuration, and modify it as follows:

datasource:
  type: com.zaxxer.hikari.HikariDataSource
  # driver-class-name: org.gjt.mm.mysql.Driver
  driver-class-name: com.p6spy.engine.spy.P6SpyDriver
  url: jdbc:p6spy:mysql://node1:3306/db_udp?useUnicode=true&characterEncoding=utf-8&useSSL=false
  username: root
  password: abc123456

Note: the address and password of the MySQL service node must be filled in according to your actual situation. In my case, the repair.properties file installed MySQL on the 10.23.110.136 (node1) node and set the login password to abc123456.
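Before starting the server, it may be worth confirming that the MySQL instance set up by the repair step is reachable with these credentials. A quick check from the management node (the mysql client is among the dependencies the repair installs):

mysql -h node1 -P 3306 -uroot -pabc123456 -e "SHOW DATABASES;"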

Start the USDP server program


After the node repair is complete, log in to the node hosting the USDP management service, enter the USDP installation root directory, and run the following command as root to start the USDP management service:

[root@node1 sbin]# cd /opt/usdp-srv/usdp/
[root@node1 usdp]# bin/start-udp-server.sh
BASE_PATH: /opt/usdp-srv/usdp/bin
JMX_PATH: /opt/usdp-srv/usdp/jmx_exporter
ln -s /opt/usdp-srv /data/usdp-srv
ln -s /opt/usdp-srv/srv/udp /srv/
ln -s /data/var/log/udp /var/log/
REPAIR_PATH: /opt/usdp-srv/usdp/repair
UDP_PATH: /opt/usdp-srv/usdp
REPAIR_BIN_PATH: /opt/usdp-srv/usdp/repair/bin
REPAIR_SBIN_PATH: /opt/usdp-srv/usdp/repair/sbin
PACKAGES_PATH: /opt/usdp-srv/usdp/repair/packages
nmap-6.40-19.el7.x86_64
nmap exists
UDP Server is running with: 10691
Done.
[root@node1 usdp]#

The line UDP Server is running with: 10691 indicates that the USDP management node has started successfully.
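As a quick sanity check (assuming the 10691 in the output is the server's process ID, and that the web UI listens on the default HTTP port, as the URL in the next section suggests), you can verify both:

ps -p 10691 -o pid,cmd
curl -I http://10.23.110.136/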

USDP cluster configuration

After the USDP management node starts successfully, wait a short while. We can then open the USDP web page by visiting the following address in a browser: http://10.23.110.136. It looks as follows:

The first time you visit the USDP web page, you need to set the administrator password. After setting the password, you can proceed to the next step and activate USDP:

Click the import license button above, and the following page will pop up:

We copy the hardware identification number shown above (D2060300FFFB8B07), then go to http://117.50.84.208:8002/lic… to generate a free-edition license:

Click the download button above and upload the license file directly to the USDP page without decompressing it. You should then see the new-cluster page:

Click the New Cluster Wizard to enter the cluster configuration process.

As its service components show, the free version of USDP supports computing, storage, monitoring, visualization, scheduling, and security, covering HDFS, Hive, HBase, Spark, Flink, Presto, Atlas, Ranger, and many other open source big data components. The services currently supported by the UCloud one-stop intelligent big data platform USDP are shown in the table below, and support for more open source ecosystem components is continuously being expanded.



We click the Next button to reach the page for specifying cluster nodes. Since we configured the hostname of each node in the repair-host-info.properties file (for example, usdp.ssh.port.hostname.1=node1), the repair step already wrote the mappings into /etc/hosts for us. So we only need to configure this page as follows:
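For reference, given the IPs and hostnames we put in repair-host-info.properties, the entries written into /etc/hosts on every node should look roughly like this:

10.23.110.136 node1
10.23.30.148  node2
10.23.148.109 node3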


Click Next, and USDP will automatically identify each node. Since USDP requires at least three nodes for deployment, we select the three nodes above and proceed to the next step, in which USDP checks the environment of each node:



If you're interested, you can click on a check item above to see what is being checked:

If you used the USDP repair script, these environment checks will generally pass smoothly, as follows:

Click Next to enter the service selection page, as follows:

USDP offers three recommended deployment schemes, so we can choose different component combinations depending on our needs. Of course, if none of the recommended schemes meets our needs, we can customize the combination of services. Since this is a test, we select recommended scheme B and click Next.

This brings us to the component-placement options for each node. Here I selected Intelligent Recommendation, which produced the layout shown above, and then clicked Next.

This step requires configuring the HDFS DataNode directories and the Hive Metastore. I used the default configuration here and clicked Next.

This step previews the deployment information of all the components selected earlier. If everything looks right, click Start Deploying, which leads to the following page:

When the progress bar reaches 100%, the deployment is complete.



Click on the cluster details to see the deployment of the cluster.

At this point, the USDP cluster installation is complete. In addition, we can see that USDP provides us with a wealth of monitoring metrics and alert settings.

Monitoring metric collection mainly covers the following three aspects:

• full JMX metric collection
• common HTTP metric collection
• custom metric collection

The monitoring data from the above three sources is ultimately aggregated in USDP's Prometheus, and the most commonly used metrics are shown on the overview page of each service. Meanwhile, users can view the most detailed monitoring metrics in Grafana through USDP's officially preconfigured monitoring dashboards. If the preset monitoring charts do not meet business requirements, users can customize the charts they need.

In terms of alerting, USDP provides a rich set of preset alert templates. With only a little configuration, users can have cluster metric alerts sent to different targets (WeChat, DingTalk, email, webhook calls, etc.). As with the monitoring metrics, if the preset alert templates do not meet business needs, users can modify them or add new alert rules.

USDP cluster use

Now that the USDP cluster is installed, let's try out how it works, using Hive for a simple data query as an example.

On any node where the Hive client is installed, switch to the hadoop user and enter the Hive command line:

[root@node1 templated]# su hadoop
[hadoop@node1 templated]$ /srv/udp/1.0.0.0/hive/bin/hive
hive (default)>

Create a Hive table named iteblog_test_usdp_hive as follows:

hive (default)> create table iteblog_test_usdp_hive (id int, name string, age int);
OK
Time taken: 3.014 seconds
hive (default)> show create table iteblog_test_usdp_hive;
OK
CREATE TABLE `iteblog_test_usdp_hive`(
  `id` int,
  `name` string,
  `age` int)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://iteblog/user/hive/warehouse/iteblog_test_usdp_hive'
TBLPROPERTIES (
  'transient_lastDdlTime'='1625383436')
Time taken: 0.185 seconds, Fetched: 14 row(s)
hive (default)>

Then we insert a row of data and query it:

hive (default)> insert into iteblog_test_usdp_hive values (1, 'iteblog', 100);
OK
_col0   _col1   _col2
Time taken: 10.321 seconds
hive (default)> select * from iteblog_test_usdp_hive;
OK
iteblog_test_usdp_hive.id   iteblog_test_usdp_hive.name   iteblog_test_usdp_hive.age
1   iteblog   100
Time taken: 0.186 seconds, Fetched: 1 row(s)
hive (default)>

In USDP, the default execution engine for Hive is Tez. You can switch to another execution engine by modifying the hive.execution.engine parameter. Through the USDP platform, we can reach the Tez UI from several places, as follows:
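For example, to switch the current session from Tez to MapReduce (hive.execution.engine is a standard Hive parameter, and mr is one of its accepted values), you could run:

hive (default)> set hive.execution.engine=mr;
hive (default)> select count(*) from iteblog_test_usdp_hive;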





Conclusion

Anyone engaged in deploying and operating big data platforms knows well that deploying a big data cluster by hand is time-consuming, laborious work, and it is easy to make mistakes if you are not careful. The emergence of CDH greatly reduced the effort of deploying and operating big data clusters and improved operations efficiency, but with CDH now fully paid, I believe this has caused some of you a certain amount of trouble. The free version of USDP provided by the UCloud big data team can to a large extent stand in for the CDH platform, and the installation experience is quite smooth: the installation and deployment process is largely automated, which reduces manual work and the chance of mistakes. It is well worth recommending.

Scan the code for a detailed USDP installation manual and join the UCloud Big Data Technical Exchange Group