Abstract: By establishing a connection between GaussDB (DWS) and MRS, the data warehouse service supports SQL on Hadoop and can quickly import Hive data through foreign tables, meeting the application scenarios of big data fusion analysis.

This article is shared from the Huawei Cloud Community post "[Cloud Lessons] EI Lesson 17 Big Data Fusion Analysis: GaussDB (DWS) Easy Import of MRS-Hive Data Sources", originally by: Hi, EI.

In the era of big data fusion analysis, how does GaussDB (DWS) access an MRS data source? In this cloud lesson, you will open the door to MRS data sources and import data into DWS by remotely reading an ORC table on the MRS cluster's Hive.

Prepare the environment

A DWS cluster has already been created. Make sure the MRS and DWS clusters are in the same region and availability zone and on the same VPC subnet, so that the two clusters can reach each other over the network.

The basic flow

Expected duration of this practice: 1 hour. The basic process is as follows:

1. Create an MRS analysis cluster (select the Hive, Spark, and Tez components).

2. Upload the local TXT data file to an OBS bucket, import it into Hive through the OBS bucket, and then load the data from the TXT table into an ORC table.

3. Create an MRS data source connection.

4. Create an external server.

5. Create a foreign table.

6. Import data into a DWS local table through the foreign table.

1. Create an MRS analysis cluster

1. Log in to the Huawei Cloud console, choose "EI Enterprise Intelligence > MapReduce Service", click "Purchase Cluster", select "Custom Purchase", fill in the software configuration parameters, and click "Next".

2. Fill in the hardware configuration parameters and click “Next”.

3. Fill in the advanced configuration parameters in the table below, click “Buy Now”, wait for about 15 minutes, and the cluster is created successfully.

2. Prepare MRS ORC table data source

Create a new product_info.txt file, copy the following data into it, and save it to the local PC.

100,XHDK-A-1293-#fJ3,2017-09-01,A,2017 Autumn New Shirt Women,red,M,328,2017-09-04,715,good
205,KDKE-B-9947-#kL5,2017-09-01,A,2017 Autumn New Knitwear Women,pink,L,584,2017-09-05,406,Very good!
300,JODL-X-1937-#pV7,2017-09-01,A,2017 autumn new T-shirt men,red,XL,1245,2017-09-03,502,Bad.
310,QQPX-R-3956-#aD8,2017-09-02,B,2017 autumn new jacket women,red,L,411,2017-09-05,436,It's really super nice
150,ABEF-C-1820-#mC6,2017-09-03,B,2017 Autumn New Jeans Women,blue,M,1223,2017-09-03,1200,The seller's packaging is exquisite
200,BCQP-E-2365-#qE4,2017-09-04,B,2017 autumn new casual pants men,black,L,997,2017-09-10,301,The clothes are of good quality.
250,EABE-D-1476-#oB1,2017-09-10,A,2017 autumn new dress women,black,S,841,2017-09-15,299,Follow the store for a long time.
108,CDXK-F-1527-#pL2,2017-09-11,A,2017 autumn new dress women,red,M,85,2017-09-14,22,It's really amazing to buy
450,MMCE-H-4728-#nP9,2017-09-11,A,2017 autumn new jacket women,white,M,114,2017-09-14,22,Open the package and the clothes have no odor
260,OCDA-G-2817-#bD3,2017-09-12,B,2017 autumn new woolen coat women,red,L,2004,2017-09-15,826,Very favorite clothes
980,ZKDS-J-5490-#cW4,2017-09-13,B,2017 Autumn New Women's Cotton Clothing,red,M,112,2017-09-16,219,The clothes are small
98,FKQB-I-2564-#dA5,2017-09-15,B,2017 autumn new shoes men,green,M,4345,2017-09-18,5473,The clothes are thick and it's better this winter.
150,DMQY-K-6579-#eS6,2017-09-21,A,2017 autumn new underwear men,yellow,37,2840,2017-09-25,5831,This price is very cost effective
200,GKLW-l-2897-#wQ7,2017-09-22,A,2017 Autumn New Jeans Men,blue,39,5879,2017-09-25,7200,The clothes are very comfortable to wear
300,HWEC-L-2531-#xP8,2017-09-23,A,2017 autumn new shoes women,brown,M,403,2017-09-26,607,good
100,IQPD-M-3214-#yQ1,2017-09-24,B,2017 autumn new wide leg pants women,black,M,3045,2017-09-27,5021,very good.
350,LPEC-N-4572-#zX2,2017-09-25,B,2017 autumn new underwear women,red,M,239,2017-09-28,407,The seller's service is very good
110,NQAB-O-3768-#sM3,2017-09-26,B,2017 autumn new underwear women,red,S,6089,2017-09-29,7021,The color is very good
210,HWNB-P-7879-#tN4,2017-09-27,B,2017 autumn new underwear women,red,L,3201,2017-09-30,4059,I like it very much and the quality is good.
230,JKHU-Q-8865-#uO5,2017-09-29,C,2017 Autumn New Clothes with Chiffon Shirt,black,M,2056,2017-10-02,3842,very good

2. Log in to the OBS console, click "Create Bucket", fill in the following parameters, and click "Create Now".

3. After the bucket is created, click the bucket name and select "Object > Upload Object" to upload product_info.txt to the OBS bucket.

4. Switch back to the MRS console, click the name of the MRS cluster you created to enter the "Overview" page, click "Synchronize" in the "IAM User Synchronization" row, and wait about 5 minutes for the synchronization to complete.

5. Go back to the MRS cluster page, click "Node Management", click any master node to enter its page, switch to the "EIPs" tab, click "Bind EIP", select an existing EIP and click "OK" (create one if none exists). Record the public IP address.

6. Confirm the master node.

  • Use an SSH tool to log in to the node above as user root (the root password is Huawei_12345), then switch to user omm.
  • su - omm
  • Execute the following command to query the master node; in the command output, the node whose "HAActive" parameter value is "active" is the master node.
  • sh ${BIGDATA_HOME}/om-0.0.1/sbin/status-oms.sh

7. Log in to the master node as root, switch to user omm, and go to the directory where the Hive client is located.

  • su – omm
  • cd /opt/client

8. Create the table product_info in Hive.

  • Under the /opt/client path, import the environment variables.
  • source bigdata_env
  • Log in to the Hive client.
  • beeline
  • Execute the following SQL statements in sequence to create the demo database and the table product_info.
CREATE DATABASE demo;
USE demo;
DROP TABLE product_info;
CREATE TABLE product_info 
(    
    product_price                int            not null,
    product_id                   char(30)       not null,
    product_time                 date           ,
    product_level                char(10)       ,
    product_name                 varchar(200)   ,
    product_type1                varchar(20)    ,
    product_type2                char(10)       ,
    product_monthly_sales_cnt    int            ,
    product_comment_time         date           ,
    product_comment_num          int        ,
    product_comment_content      varchar(200)                   
) 
row format delimited fields terminated by ',' 
stored as TEXTFILE;

9. Import the product_info.txt data file into Hive.

  1. Switch back to the MRS cluster page, click "File Management", and click "Import Data".
  2. OBS path: select the OBS bucket created above, find the product_info.txt file, and click "Yes".
  3. HDFS path: choose /user/hive/warehouse/demo.db/product_info/, and click "Yes".
  4. Click "OK" and wait for the import to succeed; the data for table product_info has now been imported.
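
As an optional sanity check (not part of the original steps), you can confirm from the beeline client that the TXT data actually landed in the table:

-- Optional spot checks in beeline; the count should match the number of rows in product_info.txt.
USE demo;
SELECT COUNT(*) FROM product_info;
SELECT * FROM product_info LIMIT 5;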

10. Create an ORC table and import data into it.

  • Execute the following SQL statement to create the ORC table.
DROP TABLE product_info_orc;
CREATE TABLE product_info_orc
(    
    product_price                int            not null,
    product_id                   char(30)       not null,
    product_time                 date           ,
    product_level                char(10)       ,
    product_name                 varchar(200)   ,
    product_type1                varchar(20)    ,
    product_type2                char(10)       ,
    product_monthly_sales_cnt    int            ,
    product_comment_time         date           ,
    product_comment_num          int            ,
    product_comment_content      varchar(200)                   
) 
row format delimited fields terminated by ',' 
stored as orc;
  • Execute the following SQL statement to insert the data from table product_info into the ORC table product_info_orc.
insert into product_info_orc select * from product_info;
  • Execute the following SQL statement to verify that the data was successfully inserted into the ORC table.

select * from product_info_orc;
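
For an extra check beyond eyeballing the output (an addition to the original steps), compare row counts between the two tables; they should be equal.

SELECT COUNT(*) FROM product_info;       -- source TXT table
SELECT COUNT(*) FROM product_info_orc;   -- ORC copy; the two counts should match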

3. Create an MRS data source connection

  1. Log in to the DWS management console, click the created DWS cluster, and confirm that the DWS cluster is in the same region and availability zone and on the same VPC subnet as the MRS cluster.
  2. Switch to MRS Data Source and click Create MRS Data Source Connection.
  3. Select the MRS cluster created in the preceding steps, set the data source name to "MRS01", enter user name admin and password Huawei@12345, and click "OK". The connection is created successfully.

4. Create an external server

(1) Connect to the created DWS cluster using Data Studio.

(2) Create the user dbuser, with permission to create databases:

CREATE USER dbuser WITH CREATEDB PASSWORD "Bigdata@123";

(3) Switch to the new user:

SET ROLE dbuser PASSWORD "Bigdata@123";

(4) Create a new database:

CREATE DATABASE mydatabase;

(5) Perform the following steps to switch the connection to the newly created mydatabase database.

  1. In the "Object Browser" window of the Data Studio client, right-click the database connection name and click "Refresh" in the shortcut menu; the newly created database appears.
  2. Right-click the "mydatabase" database name and click "Open Connection" in the shortcut menu.
  3. Right-click the "mydatabase" database name and click "Open New Terminal" in the shortcut menu to open a SQL command window connected to the specified database. Execute all of the following steps in this command window.

(6) Grant dbuser permission to create an external server:

GRANT ALL ON FOREIGN DATA WRAPPER hdfs_fdw TO dbuser;

The FOREIGN DATA WRAPPER name can only be hdfs_fdw; dbuser is the user name used to create the SERVER.
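
To confirm the wrapper name on your own cluster, you can query the system catalog (an optional convenience query, not in the original tutorial):

SELECT fdwname FROM pg_foreign_data_wrapper;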

(7) Execute the following command to grant the user permission to use foreign tables.

ALTER USER dbuser USEFT;

(8) Switch back to the postgres system database and query the external server that the system automatically created when the MRS data source was created.

SELECT * FROM pg_foreign_server;

The returned result is similar to the following:

                     srvname                      | srvowner | srvfdw | srvtype | srvversion | srvacl | srvoptions
--------------------------------------------------+----------+--------+---------+------------+--------+--------------------------------------------------------------------------------------------------------------------
 gsmpp_server                                     |       10 |  13673 |         |            |        |
 gsmpp_errorinfo_server                           |       10 |  13678 |         |            |        |
 hdfs_server_8f79ada0_d998_4026_9020_80d6de2692ca |    16476 |  13685 |         |            |        | {"address=192.168.1.245:9820,192.168.1.218:9820",hdfscfgpath=/MRS/8f79ada0-d998-4026-9020-80d6de2692ca,type=hdfs}
(3 rows)
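
Because the srvoptions column is wide, it can be easier to pull out just the auto-created HDFS server (an optional convenience query, not in the original tutorial):

SELECT srvname, srvoptions FROM pg_foreign_server WHERE srvname LIKE 'hdfs_server%';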

(9) Switch to mydatabase and switch to the dbuser user.

SET ROLE dbuser PASSWORD "Bigdata@123";

(10) Create an external server. The SERVER name, address, and configuration path must be consistent with those queried in step (8).

CREATE SERVER hdfs_server_8f79ada0_d998_4026_9020_80d6de2692ca FOREIGN DATA WRAPPER HDFS_FDW
OPTIONS (
    address '192.168.1.245:9820,192.168.1.218:9820',  -- IP addresses of the MRS management-plane master nodes, used for communication with DWS
    hdfscfgpath '/MRS/8f79ada0-d998-4026-9020-80d6de2692ca',
    type 'hdfs'
);

(11) View external servers.

SELECT * FROM pg_foreign_server WHERE srvname='hdfs_server_8f79ada0_d998_4026_9020_80d6de2692ca';

The result is as follows, indicating that the creation was successful:

                     srvname                      | srvowner | srvfdw | srvtype | srvversion | srvacl | srvoptions
--------------------------------------------------+----------+--------+---------+------------+--------+--------------------------------------------------------------------------------------------------------------------
 hdfs_server_8f79ada0_d998_4026_9020_80d6de2692ca |    16476 |  13685 |         |            |        | {"address=192.168.1.245:9820,192.168.1.218:9820",hdfscfgpath=/MRS/8f79ada0-d998-4026-9020-80d6de2692ca,type=hdfs}
(1 row)

5. Create a foreign table

1. Obtain the HDFS file path of the Hive table product_info_orc.

  1. Log on to the MRS admin console.
  2. Select “Cluster List > Existing Clusters”, click the name of the cluster you want to view, and go to the basic information page of the cluster.
  3. Click File Management and select HDFS File List.
  4. Go to the storage directory of the data to be imported into the GaussDB (DWS) cluster, and record its path.

Figure 1 View the data store path on the MRS

2. Create the foreign table. Set the SERVER name to the external server created in step (10), and set foldername to the path recorded in step 1.

DROP FOREIGN TABLE IF EXISTS foreign_product_info;
CREATE FOREIGN TABLE foreign_product_info
(
    product_price                integer        not null,
    product_id                   char(30)       not null,
    product_time                 date           ,
    product_level                char(10)       ,
    product_name                 varchar(200)   ,
    product_type1                varchar(20)    ,
    product_type2                char(10)       ,
    product_monthly_sales_cnt    integer        ,
    product_comment_time         date           ,
    product_comment_num          integer        ,
    product_comment_content      varchar(200)                      
) SERVER hdfs_server_8f79ada0_d998_4026_9020_80d6de2692ca 
OPTIONS (
format 'orc', 
encoding 'utf8',
foldername '/user/hive/warehouse/demo.db/product_info_orc/'
) 
DISTRIBUTE BY ROUNDROBIN;
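
Before running the full import, a quick smoke test (an optional step, not in the original tutorial) confirms that the foreign table can actually read the ORC files on HDFS; if the server address, hdfscfgpath, or foldername is wrong, this query fails immediately instead of partway through a load.

SELECT COUNT(*) FROM foreign_product_info;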

6. Perform data import

1. Create the local target table.

DROP TABLE IF EXISTS product_info;
CREATE TABLE product_info
(
    product_price                integer        not null,
    product_id                   char(30)       not null,
    product_time                 date           ,
    product_level                char(10)       ,
    product_name                 varchar(200)   ,
    product_type1                varchar(20)    ,
    product_type2                char(10)       ,
    product_monthly_sales_cnt    integer        ,
    product_comment_time         date           ,
    product_comment_num          integer        ,
    product_comment_content      varchar(200)                   
) 
with (
orientation = column,
compression=middle
) 
DISTRIBUTE BY HASH (product_id);

2. Import data from the foreign table into the target table.

INSERT INTO product_info SELECT * FROM foreign_product_info;
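
As a final consistency check (an addition to the original steps), the local table and the foreign table should now report the same row count:

SELECT COUNT(*) FROM product_info;           -- local column-store DWS table
SELECT COUNT(*) FROM foreign_product_info;   -- foreign table reading HDFS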

3. Query the import results.

SELECT * FROM product_info;

Now, practice it yourself and quickly get started with the data warehouse service.
