Abstract: By connecting GaussDB(DWS) to MRS, the data warehouse service supports SQL on Hadoop and can quickly import Hive data through foreign tables, covering big data fusion analysis scenarios.

Lesson 17 Big Data Fusion Analysis: GaussDB(DWS) Easily Imports an MRS Hive Data Source

In the era of big data fusion analysis, how does GaussDB(DWS) access MRS data sources? This cloud tutorial opens the door to MRS data sources and shows you how to import data into DWS by remotely reading an ORC table from Hive on an MRS cluster.

Prepare the environment

A DWS cluster has been created. Ensure that the MRS cluster and the DWS cluster reside in the same region, availability zone, and VPC subnet so that the two clusters can communicate over the network.

The basic flow

The expected duration of this practice: 1 hour. The basic process is as follows:

1. Create an MRS analysis cluster (select Hive, Spark, and Tez components).

2. Upload the local TXT data file to an OBS bucket, import it from the OBS bucket into Hive, and then load the TEXTFILE table into an ORC table.

3. Create the MRS data source connection.

4. Create an external server.

5. Create a foreign table.

6. Import data into a DWS local table through the foreign table.

Create an MRS analysis cluster

1. Log in to the Huawei Cloud Console. Choose EI Enterprise Intelligence > MapReduce Service, click Purchase Cluster, select Custom Purchase, set software configuration parameters, and click Next.

2. Set hardware parameters and click Next.

3. Set the advanced parameters and click Buy Now. Wait about 15 minutes for the cluster to be created.

Prepare the ORC table data source for MRS

1. Create a file named product_info.txt on the local PC, copy the following data into it, and save the file.

100,XHDK-A-1293-#fJ3,2017-09-01,A,2017 Autumn New Shirt Women,Red,M,328,2017-09-04,715,Good
205,KDKE-B-9947-#kL5,2017-09-01,A,2017 Autumn New Knitwear Women,PINK,L,584,2017-09-05,406,Very good!
300,JODL-X-1937-#pV7,2017-09-01,A,2017 Autumn New T-shirt Men,Red,XL,1245,2017-09-03,502,Bad.
310,QQPX-R-3956-#aD8,2017-09-02,B,2017 Autumn New Jacket Women,Red,L,411,2017-09-05,436,It's Really Super Nice
150,ABEF-C-1802-#mC6,2017-09-03,B,2017 Autumn New Jeans Women,Blue,M,1223,2017-09-06,1200,The Buyer's Packaging is Exquisite
200,BCQP-E-2365-#qE4,2017-09-04,B,2017 Autumn New Casual Pants Men,Black,L,667,2017-09-09,10,The clothes are Quality.
250,EABE-D-1476-#oB1,2017-09-10,A,2017 Autumn New Dress Women,Black,S,841,2017-09-15,299,Follow the Store for a Long time.
108,CDXK-F-1527-#pL2,2017-09-11,A,2017 Autumn New Dress Women,Red,M,85,2017-09-14,22,It's really Amazing to Buy
450,MMCE-H-4728-#nP9,2017-09-11,A,2017 Autumn New Jacket Women,White,M,114,2017-09-14,22,Open the package And the clothes have no odor
260,OCDA-G-2817-#bD3,2017-09-12,B,2017 Autumn New Woolen Coat Women,Red,L,2004,2017-09-15,826,Very Favorite Clothes
980,ZKDS-J-5490-#cW4,2017-09-13,B,2017 Autumn New Women's Cotton Clothing,RED,M,112,2017-09-16,219,The clothes are small
98,FKQB-I-2564-#dA5,2017-09-15,B,2017 Autumn new shoes Men,Green,M,4145,2017-09-18,5473,The clothes are thick and it's better this winter.
150,DMQY-K-6579-#eS6,2017-09-21,A,2017 Autumn New Underwear Men,YELLOW,37,2840,2017-09-25,5831,This price is very cost
200,GKLW-L-2897-#wQ7,2017-09-22,A,2017 Autumn New Jeans Men,Blue,28,2879,2017-09-25,7200,The clothes are very Comfortable to Wear
300,HWEC-L-2531-#xP8,2017-09-23,A,2017 Autumn New Shoes Women,Brown,M,403,2017-09-26,607,Good
100,IQPD-M-3214-#yQ1,2017-09-24,B,2017 Autumn New Wide Leg Pants Women,Black,M,3045,2017-09-27,5021,very good.
350,LPEC-N-4572-#zX2,2017-09-25,B,2017 Autumn New Underwear Women,Red,M,239,2017-09-28,407,The Seller's Service is Very Good
110,NQAB-O-3768-#sM3,2017-09-26,B,2017 Autumn New Underwear Women,Red,S,6089,2017-09-29,7021,The color is Very good
210,HWNB-P-7879-#tN4,2017-09-27,B,2017 Autumn New Underwear Women,Red,L,201,2017-09-30,4059,I like it very much and Quality is Good.
230,JKHU-Q-8865-#uO5,2017-09-29,C,2017 Autumn New Clothes with Chiffon Shirt,black,M,2056,2017-10-02,3842,very good

2. Log in to the OBS console, click Create Bucket, set the bucket parameters, and click Create Now.

3. Wait until the bucket is created, click the bucket name, choose Object > Upload Object, and upload product_info.txt to the OBS bucket.

4. Switch to the MRS console, click the created MRS cluster name, go to Overview, and click Synchronize in the row where IAM User Synchronization is located. Wait about 5 minutes until the synchronization is complete.

5. Return to the MRS cluster page, click Node Management, click any Master node, switch to the Elastic Public IP Address tab, click Bind Elastic Public IP Address, select an existing elastic IP address, and click OK. If no elastic IP address exists, create one first. Record the public IP address.

6. Confirm the active master node.

  • Use SSH to log in to the node bound above as user root (password: Huawei_12345), and then switch to user omm:

su - omm

  • Run the following command to query the active master node. In the command output, the node whose HAActive value is active is the active master node.

sh ${BIGDATA_HOME}/om-0.0.1/sbin/status-oms.sh

7. Log in to the active master node as user root, switch to user omm, and go to the directory where the Hive client resides.

su - omm

cd /opt/client

8. In Hive, create the product_info table, whose storage type is TEXTFILE.

  • In /opt/client, import environment variables.

source bigdata_env

  • Log in to the Hive client.

beeline

  • Execute the following SQL statements in sequence to create the demo database and the product_info table.

    CREATE DATABASE demo;

    USE demo;

    DROP TABLE product_info;

    CREATE TABLE product_info
    (
        product_price                int            not null,
        product_id                   char(30)       not null,
        product_time                 date           ,
        product_level                char(10)       ,
        product_name                 varchar(200)   ,
        product_type1                varchar(20)    ,
        product_type2                char(10)       ,
        product_monthly_sales_cnt    int            ,
        product_comment_time         date           ,
        product_comment_num          int            ,
        product_comment_content      varchar(200)
    )
    row format delimited fields terminated by ','
    stored as TEXTFILE;

9. Import the product_info.txt file into Hive.

  • Switch back to the MRS cluster, click File Management, and click Import Data.

  • OBS path: Select the created OBS bucket name, locate the product_info.txt file, and click Yes.

  • HDFS path: Choose /user/hive/warehouse/demo.db/product_info/ and click Yes.

  • Click OK and wait until the import succeeds. The product_info table data is now imported; you can verify this in beeline, as shown below.
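Back in the beeline session, a quick count query (an optional sanity check) confirms that the rows landed in the Hive table:

    -- The count should equal the number of records in product_info.txt
    SELECT COUNT(*) FROM product_info;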

10. Create an ORC table and import data to the ORC table.

  • Execute the following SQL statement to create the ORC table.

    DROP TABLE product_info_orc;

    CREATE TABLE product_info_orc
    (
        product_price                int            not null,
        product_id                   char(30)       not null,
        product_time                 date           ,
        product_level                char(10)       ,
        product_name                 varchar(200)   ,
        product_type1                varchar(20)    ,
        product_type2                char(10)       ,
        product_monthly_sales_cnt    int            ,
        product_comment_time         date           ,
        product_comment_num          int            ,
        product_comment_content      varchar(200)
    )
    row format delimited fields terminated by ','
    stored as orc;

  • Insert the data from the product_info table into the product_info_orc table.

    insert into product_info_orc select * from product_info;

  • Query the ORC table to verify that the data has been imported successfully.

    select * from product_info_orc;
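A row-count comparison (optional) is a quick way to confirm that the TEXTFILE and ORC tables now hold the same data:

    -- Both counts should be identical
    SELECT COUNT(*) FROM product_info;
    SELECT COUNT(*) FROM product_info_orc;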

Create the MRS data source connection

1. Log in to the DWS management console and click the created DWS cluster. Ensure that the DWS cluster and MRS reside in the same region, availability zone, and VPC subnet.

2. Switch to MRS Data Source and click Create MRS Data Source Connection.

3. Select the data source MRS01, set the user name to admin and the password to Huawei@12345, and click OK. The data source is created successfully.

Create an external server

1. Connect to the created DWS cluster using Data Studio.

2. Create a user dbuser that has the permission to create databases.

CREATE USER dbuser WITH CREATEDB PASSWORD "Bigdata@123";
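To verify the new user (optional; pg_roles is a standard PostgreSQL catalog that GaussDB(DWS) inherits):

-- rolcreatedb should be t (true) for dbuser
SELECT rolname, rolcreatedb FROM pg_roles WHERE rolname = 'dbuser';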

3. Switch to the newly created dbuser user.

SET ROLE dbuser PASSWORD "Bigdata@123";

4. Create a new mydatabase database:

CREATE DATABASE mydatabase;

5. Perform the following steps to switch to the newly created mydatabase database.

  • In the Object Browser window of the Data Studio client, right-click the database connection name and click Refresh in the shortcut menu; the new database appears.

  • Right-click the database name mydatabase and click Open Connection in the shortcut menu.

  • Right-click the database name mydatabase and click Open New Terminal in the shortcut menu to open an SQL command window connected to that database. Perform the following steps in this window.

6. Grant dbuser the permission to create an external server:

GRANT ALL ON FOREIGN DATA WRAPPER hdfs_fdw TO dbuser;

The FOREIGN DATA WRAPPER name can only be hdfs_fdw, and dbuser is the user name used to create the SERVER.
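To confirm that the hdfs_fdw wrapper is present before granting on it, you can query the pg_foreign_data_wrapper catalog (an optional check; this catalog comes from PostgreSQL, on which GaussDB(DWS) is based):

-- hdfs_fdw should appear among the installed foreign data wrappers
SELECT fdwname FROM pg_foreign_data_wrapper;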

7. Run the following command to grant the user the permission to use foreign tables.

ALTER USER dbuser USEFT;
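To confirm that the grant took effect (optional; the roluseft column of pg_roles is assumed here to be available in your GaussDB(DWS) version):

-- roluseft should be t (true) after ALTER USER ... USEFT
SELECT rolname, roluseft FROM pg_roles WHERE rolname = 'dbuser';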

8. Switch to the postgres database and query the external server that was automatically created when the MRS data source connection was created.

SELECT * FROM pg_foreign_server;

The result is as follows:

 srvname                                          | srvowner | srvfdw | srvtype | srvversion | srvacl | srvoptions
--------------------------------------------------+----------+--------+---------+------------+--------+-------------------------------------------------------------------------------------------------------------------
 gsmpp_server                                     |       10 |  13673 |         |            |        |
 gsmpp_errorinfo_server                           |       10 |  13678 |         |            |        |
 hdfs_server_8f79ada0_d998_4026_9020_80d6de2692ca |    16476 |  13685 |         |            |        | {"address=192.168.1.245:9820,192.168.1.218:9820",hdfscfgpath=/MRS/8f79ada0-d998-4026-9020-80d6de2692ca,type=hdfs}
(3 rows)

9. Switch to the mydatabase database and switch to the dbuser user.

SET ROLE dbuser PASSWORD "Bigdata@123";

10. Create an external server.

The SERVER name, address, and configuration path must be the same as those in 8.

CREATE SERVER hdfs_server_8f79ada0_d998_4026_9020_80d6de2692ca
FOREIGN DATA WRAPPER HDFS_FDW
OPTIONS (
    -- Private IP addresses of the MRS master nodes (management plane), used for communication with DWS
    address '192.168.1.245:9820,192.168.1.218:9820',
    hdfscfgpath '/MRS/8f79ada0-d998-4026-9020-80d6de2692ca',
    type 'hdfs'
);

11. View the external server.

SELECT * FROM pg_foreign_server WHERE srvname='hdfs_server_8f79ada0_d998_4026_9020_80d6de2692ca';

If the following information is displayed, the external server is successfully created:

 srvname                                          | srvowner | srvfdw | srvtype | srvversion | srvacl | srvoptions
--------------------------------------------------+----------+--------+---------+------------+--------+-------------------------------------------------------------------------------------------------------------------
 hdfs_server_8f79ada0_d998_4026_9020_80d6de2692ca |    16476 |  13685 |         |            |        | {"address=192.168.1.245:9820,192.168.1.218:9820",hdfscfgpath=/MRS/8f79ada0-d998-4026-9020-80d6de2692ca,type=hdfs}
(1 row)

Create a foreign table

1. Obtain the HDFS path of the Hive table product_info_orc.

  • Log in to the MRS management console.

  • Choose Cluster List > Existing Cluster and click the name of the cluster to view the basic cluster information page.

  • Click File Management and select HDFS File List.

  • Go to the directory where the data you want to import into the GaussDB(DWS) cluster is stored and record its path.

Figure 1 Viewing the data store path on MRS

2. Create a foreign table. Set SERVER to the name of the external server created in 10, and set foldername to the path obtained in 1.

DROP FOREIGN TABLE IF EXISTS foreign_product_info;

CREATE FOREIGN TABLE foreign_product_info
(
    product_price                integer        not null,
    product_id                   char(30)       not null,
    product_time                 date           ,
    product_level                char(10)       ,
    product_name                 varchar(200)   ,
    product_type1                varchar(20)    ,
    product_type2                char(10)       ,
    product_monthly_sales_cnt    integer        ,
    product_comment_time         date           ,
    product_comment_num          integer        ,
    product_comment_content      varchar(200)                      
) SERVER hdfs_server_8f79ada0_d998_4026_9020_80d6de2692ca 
OPTIONS (
format 'orc', 
encoding 'utf8',
foldername '/user/hive/warehouse/demo.db/product_info_orc/'
) 
DISTRIBUTE BY ROUNDROBIN;
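Before running the full import, a small read through the foreign table (optional) verifies that DWS can actually reach the ORC data on MRS:

-- A remote read through the foreign table; LIMIT keeps the test cheap
SELECT * FROM foreign_product_info LIMIT 10;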

Perform the data import

1. Create a local target table.

DROP TABLE IF EXISTS product_info;
CREATE TABLE product_info
(
    product_price                integer        not null,
    product_id                   char(30)       not null,
    product_time                 date           ,
    product_level                char(10)       ,
    product_name                 varchar(200)   ,
    product_type1                varchar(20)    ,
    product_type2                char(10)       ,
    product_monthly_sales_cnt    integer        ,
    product_comment_time         date           ,
    product_comment_num          integer        ,
    product_comment_content      varchar(200)                   
) 
with (
orientation = column,
compression=middle
) 
DISTRIBUTE BY HASH (product_id);

2. Import data from the foreign table into the target table.

INSERT INTO product_info SELECT * FROM foreign_product_info;
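A count comparison (optional) confirms that all rows made it across:

-- source_rows and loaded_rows should match
SELECT (SELECT COUNT(*) FROM foreign_product_info) AS source_rows,
       (SELECT COUNT(*) FROM product_info)         AS loaded_rows;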

3. Query the import result.

SELECT * FROM product_info;
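Once the data is local to DWS, ordinary analytic SQL runs directly on the column-store table. For example, an illustrative aggregation (not part of the original practice):

-- Illustrative only: total monthly sales per product level
SELECT product_level, SUM(product_monthly_sales_cnt) AS total_sales
FROM product_info
GROUP BY product_level
ORDER BY total_sales DESC;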

This hands-on practice teaches you how to get started with the data warehouse service quickly~

Click here for details.

Click to follow and be the first to learn about fresh Huawei Cloud technologies~