Net the most detailed big data Hive article series, strongly suggest collection plus attention!

Each new article has a list of historical articles to help you review previous knowledge points.

Series of historical articles

Big Data Hive 2021 (12) : Hive Integrated Case!!

Big data Hive 2021 (XI) : Hive tuning

2021 Big Data Hive (X) : Hive data storage format

2021 Big Data Hive (9) : Hive data compression

2021 Big data Hive (8) : Hive custom functions

Big data Hive 2021 (7) : Hive window function

Big data Hive 2021 (VI) : Table generating functions for Hive

2021 Big Data Hive (5) : Hive built-in functions (math, String, date, condition, Transform, row to column)

2021 Big data Hive (4) : Hive query syntax

Hive (2021) : How to understand Hive database and table operations

2021 Big data Hive (2) : Three Hive installation modes are used together with MySQL

Big data Hive (1) : Basic concepts of Hive

preface

2021 Quality creation blog in the field of big data, take you from the entry to the master, the blog is updated every day, gradually improve the knowledge system of big data articles, to help you learn more efficiently.

The following is the basic ten HiveSQL questions, must be able to tear out, otherwise the interview is cool, from the luring interviewer you still need 100 HiveSQL questions, don’t think so much, give yourself a small goal, fix the ten questions!

HiveSQL ten questions

The first question

1, requirements,

We have the following user access data

userId visitDate visitCount
u01 2021/1/21 5
u02 2021/1/23 6
u03 2021/1/22 8
u04 2021/1/20 3
u01 2021/1/23 6
u01 2021/2/21 8
u02 2021/1/23 6
u01 2021/2/22 4

SQL is required to calculate the cumulative number of access times for each user, as shown in the following table:

The user id in subtotal The cumulative
u01 The 2021-01 11 11
u01 The 2021-02 12 23
u02 The 2021-01 12 12
u03 The 2021-01 8 8
u04 The 2021-01 3 3

2. Data preparation

CREATE TABLE test_sql.test1 ( userId string, visitDate string, visitCount INT ) ROW format delimited FIELDS TERMINATED BY "\t";
INSERT INTO TABLE test_sql.test1
VALUES
	( 'u01'.'2021/1/21'.5 ),
	( 'u02'.'2021/1/23'.6 ),
	( 'u03'.'2021/1/22'.8 ),
	( 'u04'.'2021/1/20'.3 ),
	( 'u01'.'2021/1/23'.6 ),
	( 'u01'.'2021/2/21'.8 ),
	( 'u02'.'2021/1/23'.6 ),
	( 'u01'.'2021/2/22'.4 );
Copy the code

3, query SQL

SELECT
	t2.userid,
	t2.visitmonth,
	subtotal_visit_cnt,
	sum( subtotal_visit_cnt ) over ( PARTITION BY userid ORDER BY visitmonth ) AS total_visit_cnt 
FROM
	(
	SELECT
		userid,
		visitmonth,
		sum( visitcount ) AS subtotal_visit_cnt 
	FROM
		( SELECT userid, date_format( regexp_replace ( visitdate, '/'.The '-' ), 'yyyy-MM' ) AS visitmonth, visitcount FROM test_sql.test1 ) t1 
	GROUP BY
		userid,
		visitmonth 
	) t2 
ORDER BY
	t2.userid,
	t2.visitmonth;
Copy the code

4. Execution results

The second question

1, requirements,

There are 50W STORES in JINGdong. When each customer visitor visits any product in any store, an access log will be generated. The table of the access log is called Visit, the user ID of the visitor is user_id, and the shop name is shop.

user_id shop
u1 a
u2 b
u1 b
u1 a
u3 c
u4 b
u1 a
u2 c
u5 b
u4 b
u6 c
u2 c
u1 b
u2 a
u2 a
u3 a
u5 a
u5 a
u5 a

Please statistics:

(1) UV of each store (Number of visitors)

(2) information of top3 visitors of each store. Output store name, visitor ID and number of visits

2. Data preparation

CREATE TABLE test_sql.test2 ( user_id string, shop string ) ROW format delimited FIELDS TERMINATED BY '\t';
INSERT INTO TABLE test_sql.test2
VALUES
	( 'u1'.'a' ),
	( 'u2'.'b' ),
	( 'u1'.'b' ),
	( 'u1'.'a' ),
	( 'u3'.'c' ),
	( 'u4'.'b' ),
	( 'u1'.'a' ),
	( 'u2'.'c' ),
	( 'u5'.'b' ),
	( 'u4'.'b' ),
	( 'u6'.'c' ),
	( 'u2'.'c' ),
	( 'u1'.'b' ),
	( 'u2'.'a' ),
	( 'u2'.'a' ),
	( 'u3'.'a' ),
	( 'u5'.'a' ),
	( 'u5'.'a' ),
	( 'u5'.'a' );
Copy the code

3, query SQL implementation

(1)

Method 1:

# UV per store (number of visitors)SELECT shop,count(DISTINCT user_id) FROM test_sql.test2 GROUP BY shop
Copy the code

Method 2:

# UV per store (number of visitors)SELECT
	t.shop,
	count(*) 
FROM
	( SELECT user_id, shop FROM test_sql.test2 GROUP BY user_id, shop ) t 
GROUP BY
	t.shop;
Copy the code

(2)

# top3 visitor information of each store. Output store name, visitor ID and number of visitsSELECT
	t2.shop,
	t2.user_id,
	t2.cnt 
FROM
	(
	SELECT
		t1.*.row_number(a)over ( PARTITION BY t1.shop ORDER BY t1.cnt DESC ) rank 
	FROM
		( SELECT user_id, shop, count(*) AS cnt FROM test_sql.test2 GROUP BY user_id, shop ) t1 
	) t2 
WHERE
	rank < = 3;
Copy the code

4. Execution results

(1)

(2)

The third question

1, requirements,

Table stg. ORDER (Date, Order_id, User_id, amount);

Data sample: 2021-01-01, 10029028100003, 251,33.57.

Please provide SQL for statistics:

(1) Give the order number, users and total transaction amount of each month in 2021.

(2) Give the number of new customers in November 2021 (the first order will be placed in November)

2. Data preparation

CREATE TABLE test_sql.test3 ( dt string, order_id string, user_id string, amount DECIMAL ( 10.2))ROW format delimited FIELDS TERMINATED BY '\t';
INSERT INTO TABLE test_sql.test3
VALUES
	( '2021-01-01'.'10029028'.'1000003251'.33.57 );
INSERT INTO TABLE test_sql.test3
VALUES
	( '2021-01-01'.'10029029'.'1000003251'.33.57 );
INSERT INTO TABLE test_sql.test3
VALUES
	( '2021-01-01'.'100290288'.'1000003252'.33.57 );
INSERT INTO TABLE test_sql.test3
VALUES
	( '2021-02-02'.'10029088'.'1000003251'.33.57 );
INSERT INTO TABLE test_sql.test3
VALUES
	( '2021-02-02'.'100290281'.'1000003251'.33.57 );
INSERT INTO TABLE test_sql.test3
VALUES
	( '2021-02-02'.'100290282'.'1000003253'.33.57 );
INSERT INTO TABLE test_sql.test3
VALUES
	( '2021-11-02'.'10290282'.'100003253'.234 );
INSERT INTO TABLE test_sql.test3
VALUES
	( '2018-11-02'.'10290284'.'100003243'.234 );
Copy the code

3, query SQL

(1) Give the order number, users and total transaction amount of each month in 2021.

SELECT
	t1.mon,
	count( t1.order_id ) AS order_cnt,
	count( DISTINCT t1.user_id ) AS user_cnt,
	sum( amount ) AS total_amount 
FROM
	(
	SELECT
		order_id,
		user_id,
		amount,
		date_format( dt, 'yyyy-MM' ) mon 
	FROM
		test_sql.test3 
	WHERE
		date_format( dt, 'yyyy' ) = '2021' 
	) t1 
GROUP BY
	t1.mon;
Copy the code

(2) Give the number of new customers in November 2021 (the first order will be placed in November)

SELECT
	count( user_id ) 
FROM
	test_sql.test3 
GROUP BY
	user_id 
HAVING
	date_format( min( dt ), 'yyyy-MM' )= 'the 2021-11';
Copy the code

4. Execution results

(1)

(2)

The fourth question

1, requirements,

There is a 50 million user file (user_id, name, age), a 200 million recorded user movie-watching log file (user_id, URL), sorted by the number of movies watched by age group?

2. Data preparation

CREATE TABLE test_sql.test4user ( user_id string, NAME string, age INT );
CREATE TABLE test_sql.test4log ( user_id string, url string );
INSERT INTO TABLE test_sql.test4user
VALUES
	( '001'.'u1'.10 );
INSERT INTO TABLE test_sql.test4user
VALUES
	( '002'.'u2'.15 );
INSERT INTO TABLE test_sql.test4user
VALUES
	( '003'.'u3'.15 );
INSERT INTO TABLE test_sql.test4user
VALUES
	( '004'.'u4'.20 );
INSERT INTO TABLE test_sql.test4user
VALUES
	( '005'.'u5'.25 );
INSERT INTO TABLE test_sql.test4user
VALUES
	( '006'.'u6'.35 );
INSERT INTO TABLE test_sql.test4user
VALUES
	( '007'.'u7'.40 );
INSERT INTO TABLE test_sql.test4user
VALUES
	( '008'.'u8'.45 );
INSERT INTO TABLE test_sql.test4user
VALUES
	( '009'.'u9'.50 );
INSERT INTO TABLE test_sql.test4user
VALUES
	( '0010'.'u10'.65 );
INSERT INTO TABLE test_sql.test4log
VALUES
	( '001'.'url1' );
INSERT INTO TABLE test_sql.test4log
VALUES
	( '002'.'url1' );
INSERT INTO TABLE test_sql.test4log
VALUES
	( '003'.'url2' );
INSERT INTO TABLE test_sql.test4log
VALUES
	( '004'.'url3' );
INSERT INTO TABLE test_sql.test4log
VALUES
	( '005'.'url3' );
INSERT INTO TABLE test_sql.test4log
VALUES
	( '006'.'url1' );
INSERT INTO TABLE test_sql.test4log
VALUES
	( '007'.'url5' );
INSERT INTO TABLE test_sql.test4log
VALUES
	( '008'.'url7' );
INSERT INTO TABLE test_sql.test4log
VALUES
	( '009'.'url5' );
INSERT INTO TABLE test_sql.test4log
VALUES
	( '0010'.'url1' );
Copy the code

3, query SQL

Method 1

SELECT
	t2.age_phase,
	sum( t1.cnt ) AS view_cnt 
FROM
	( SELECT user_id, count(*) cnt FROM test_sql.test4log GROUP BY user_id ) t1
	JOIN (
	SELECT
		user_id,
	CASE
			
			WHEN age < = 10 AND age > 0 THEN
			'0 to 10' 
			WHEN age < = 20 AND age > 10 THEN
			'10-20' 
			WHEN age > 20 
			AND age < = 30 THEN '20-30' WHEN age > 30 
				AND age < = 40 THEN '30-40' WHEN age > 40 
					AND age < = 50 THEN '40-50' WHEN age > 50 
						AND age < = 60 THEN '50 to 60' WHEN age > 60 
							AND age < = 70 THEN
								'60-70' ELSE '70' 
								END AS age_phase 
						FROM
							test_sql.test4user 
						) t2 ON t1.user_id = t2.user_id 
					GROUP BY
					t2.age_phase;
Copy the code

Way 2

SELECT
	concat( phase - 10.The '-', phase ),
	sum( cnt ) sum_movies 
FROM
	(
	SELECT
		*.ceil( age / 10 ) * 10 phase 
	FROM
		test4user a
		JOIN ( SELECT user_id, count( url ) cnt FROM test4log GROUP BY user_id ) b ON a.user_id = b.user_id 
	) c 
GROUP BY
	c.phase;
Copy the code

4. Execution results

Method 1

Way 2

The fifth problem

1, requirements,

Please write the code to find the total number and average age of all users and active users. (Active users refer to users who have access records for two consecutive days.)

The date of the user Age 2019-02-11, test_1, 11, 232019-02 – test_2, 11, 192019-02 – test_3, 11, 392019-02 – test_1, 11, 232019-02 – test_3, 11, 392019-02 – test_1, 232019-02-12, test_2, 13, 192019-02 – test_1, 15, 232019-02 – test_2, 16, 192019-02 – test_2, 19

2. Data preparation

CREATE TABLE test5 ( dt string, user_id string, age INT ) ROW format delimited FIELDS TERMINATED BY ', ';
INSERT INTO TABLE test_sql.test5
VALUES
	( '2019-02-11'.'test_1'.23 );
INSERT INTO TABLE test_sql.test5
VALUES
	( '2019-02-11'.'test_2'.19 );
INSERT INTO TABLE test_sql.test5
VALUES
	( '2019-02-11'.'test_3'.39 );
INSERT INTO TABLE test_sql.test5
VALUES
	( '2019-02-11'.'test_1'.23 );
INSERT INTO TABLE test_sql.test5
VALUES
	( '2019-02-11'.'test_3'.39 );
INSERT INTO TABLE test_sql.test5
VALUES
	( '2019-02-11'.'test_1'.23 );
INSERT INTO TABLE test_sql.test5
VALUES
	( '2019-02-12'.'test_2'.19 );
INSERT INTO TABLE test_sql.test5
VALUES
	( '2019-02-13'.'test_1'.23 );
INSERT INTO TABLE test_sql.test5
VALUES
	( '2019-02-15'.'test_2'.19 );
INSERT INTO TABLE test_sql.test5
VALUES
	( '2019-02-16'.'test_2'.19 );
Copy the code

3, query SQL

Method 1

SELECT
	sum( total_user_cnt ) total_user_cnt,
	sum( total_user_avg_age ) total_user_avg_age,
	sum( two_days_cnt ) two_days_cnt,
	sum( avg_age ) avg_age 
FROM
	(
	SELECT
		0 total_user_cnt,
		0 total_user_avg_age,
		count(*) AS two_days_cnt,
		cast(
		sum( age ) / count(*) AS DECIMAL ( 5.2 )) AS avg_age 
	FROM
		(
		SELECT
			user_id,
			max( age ) age 
		FROM
			(
			SELECT
				user_id,
				max( age ) age 
			FROM
				(
				SELECT
					user_id,
					age,
					date_sub( dt, rank ) flag 
				FROM
					(
					SELECT
						dt,
						user_id,
						max( age ) age,
						row_number(a)over ( PARTITION BY user_id ORDER BY dt ) rank 
					FROM
						test_sql.test5 
					GROUP BY
						dt,
						user_id 
					) t1 
				) t2 
			GROUP BY
				user_id,
				flag 
			HAVING
				count(*) > = 2 
			) t3 
		GROUP BY
			user_id 
		) t4 UNION ALL
	SELECT
		count(*) total_user_cnt,
		cast(
		sum( age ) / count(*) AS DECIMAL ( 5.2 )) total_user_avg_age,
		0 two_days_cnt,
		0 avg_age 
	FROM
		( SELECT user_id, max( age ) age FROM test_sql.test5 GROUP BY user_id ) t5 
	) t6;
Copy the code

Way 2

SELECT
	* 
FROM
	(
	SELECT
		count( user_id ) total_cnt_users,
		avg( age ) total_avg_age 
	FROM
		( SELECT user_id, max( age ) age FROM test5 GROUP BY user_id ) g 
	) h
	CROSS JOIN (
	SELECT
		count( user_id ) hot_users_count,
		avg( age ) hot_age_avg 
	FROM
		(
		SELECT
			d.user_id,
			max( d.age ) age 
		FROM
			(
			SELECT
				user_id,
				max( age ) age,
				count( 1 ) cnt 
			FROM
				(
				SELECT
					*,
					date_sub( dt, rank ) dt2 
				FROM
					(
					SELECT
						user_id,
						dt,
						max( age ) age,
						ROW_NUMBER(a)over ( PARTITION BY a.user_id ORDER BY a.dt ) rank 
					FROM
						( SELECT DISTINCT dt, age, user_id FROM test5 ) a 
					GROUP BY
						a.user_id,
						a.dt 
					) b 
				) c 
			GROUP BY
				c.user_id,
				c.dt2 
			HAVING
				cnt > 1 
			) d 
		GROUP BY
			d.user_id 
		) e 
	) f ON 1 = 1;
Copy the code

4. Execution results

Method 1

Way 2

The sixth question

1, requirements,

Table ordertable :(purchase user: userid, amount: money, purchase time: paymenttime(format: 2021-10-01), orderid: orderid

2. Data preparation

CREATE TABLE test_sql.test6 ( userid string, money DECIMAL ( 10.2 ), paymenttime string, orderid string );
INSERT INTO TABLE test_sql.test6
VALUES
	( '001'.100.'2021-10-01'.'123' );
INSERT INTO TABLE test_sql.test6
VALUES
	( '001'.200.'2021-10-02'.'124' );
INSERT INTO TABLE test_sql.test6
VALUES
	( '002'.500.'2021-10-01'.'125' );
INSERT INTO TABLE test_sql.test6
VALUES
	( '001'.100.'2021-11-01'.'126' );
Copy the code

3, query SQL

SELECT
	userid,
	paymenttime,
	money,
	orderid 
FROM
	(
	SELECT
		userid,
		money,
		paymenttime,
		orderid,
		row_number(a)over ( PARTITION BY userid ORDER BY paymenttime ) rank 
	FROM
		test_sql.test6 
	WHERE
		date_format( paymenttime, 'yyyy-MM' ) = 'the 2021-10' 
	) t 
WHERE
	rank = 1;
Copy the code

4. Execution results

Number 7

1, requirements,

The three data models of the existing library management database are as follows:

BOOK (data table name: BOOK)

The serial number The field names The field The field type
1 BOOK_ID The total number The text
2 SORT Classification, The text
3 BOOK_NAME Title: The text
4 WRITER The author The text
5 OUTPUT The publisher The text
6 PRICE unit Value (keep 2 decimal places)

READER (data table name: READER)

The serial number The field names The field The field type
1 READER_ID Library card number The text
2 COMPANY unit The text
3 NAME The name The text
4 SEX gender The text
5 GRADE The title The text
6 ADDR address The text

Borrowing Records (Data sheet name: BORROW LOG)

The serial number The field names The field The field type
1 READER_ID Library card number The text
2 BOOK_ID The total number The text
3 BORROW_DATE Borrow date The date of

(1) to create the library library of books, readers and borrowing three basic table table structure. Please write a construction sentence.

(2) Find out the NAME and COMPANY of the reader surnamed Li.

(3) Find all book names (BOOK_NAME) and PRICE (PRICE) of “Higher Education Press”, the result is in descending order by unit PRICE.

(4) Search the SORT of books whose PRICE is between 10 yuan and 20 yuan, and the result is sorted in ascending order by OUTPUT and PRICE.

(5) Find the names and companies of all readers who have borrowed books.

(6) Seek the highest unit price, lowest unit price and average unit price of “Science Press” books.

(7) Find out the names and units of readers who have borrowed at least 2 books (no less than 2 books).

(8) In view of data security, the data in “Borrowing Records” needs to be backed up regularly. Please use an SQL statement to create a data table BORROW_LOG_BAK with the same table structure as “Borrowing Records” under the backup user BAK. Well and copy all the existing data in “Borrowing records” to BORROW_L0G_ BAK.

(9) now requires the original Oracle database data migration to Hive in the warehouse, please write a “book” built in the Hive table statements (Hive, tip: column delimiter |; Table data needs external import: partition name “month_part” and “day_part” respectively.)

Update user_dinner (user_id 20000) from monthly partition 202106 of table A to bonc8920. Update user_dinner (user_id 20000) from monthly partition 202106 to bonc8920. (Hive implementation: Hlive does not have update syntax, please use other methods to update data)

2. Data preparation

(1) create table book

CREATE TABLE test_sql.book (
	book_id string,
	`SORT` string,
	book_name string,
	writer string,
	OUTPUT string,
price DECIMAL ( 10.2 ));
INSERT INTO TABLE test_sql.book
VALUES
	( '001'.'TP391'.'Information processing'.'author1'.China Machine Press.'20' );
INSERT INTO TABLE test_sql.book
VALUES
	( '002'.'TP392'.'Database'.'author12'.'Science Press'.'15' );
INSERT INTO TABLE test_sql.book
VALUES
	( '003'.'TP393'.'Computer Network'.'author3'.China Machine Press.'and' );
INSERT INTO TABLE test_sql.book
VALUES
	( '004'.'TP399'.'Microcomputer Principles'.'author4'.'Science Press'.'and' );
INSERT INTO TABLE test_sql.book
VALUES
	( '005'.'C931'.'Management Information System'.'author5'.China Machine Press.'40' );
INSERT INTO TABLE test_sql.book
VALUES
	( '006'.'C932'.Operations Research.'author6'.'Science Press'.'55' );
INSERT INTO TABLE test_sql.book
VALUES
	( '007'.'C939'.'Big Data Platform Architecture and Prototype Implementation'.'author7'.'Higher Education Press'.'66' );
Copy the code

(2) create reader table

CREATE TABLE test_sql.reader ( reader_id string, company string, NAME string, sex string, grade string, addr string );
INSERT INTO TABLE test_sql.reader
VALUES
	( '0001'.Alibaba.'jack'.'male'.'vp'.'addr1' );
INSERT INTO TABLE test_sql.reader
VALUES
	( '0002'."Baidu".'robin'.'male'.'vp'.'addr2' );
INSERT INTO TABLE test_sql.reader
VALUES
	( '0003'.'company'.'tony'.'male'.'vp'.'addr3' );
INSERT INTO TABLE test_sql.reader
VALUES
	( '0004'.'jingdong'.'jasper'.'male'.'cfo'.'addr4' );
INSERT INTO TABLE test_sql.reader
VALUES
	( '0005'.'netease'.'zhangsan'.'woman'.'ceo'.'addr5' );
INSERT INTO TABLE test_sql.reader
VALUES
	( '0006'.'sohu'.'lisi'.'woman'.'ceo'.'addr6' );
INSERT INTO TABLE test_sql.reader
VALUES
	( '0007'.'Meituan'.'li3 ge'.'male'.'Big Data Development'.'addr7' );
Copy the code

(3) Create borrowing record form Borrow_log

CREATE TABLE test_sql.borrow_log ( reader_id string, book_id string, borrow_date string );
INSERT INTO TABLE test_sql.borrow_log
VALUES
	( '0001'.'002'.'2021-10-14' );
INSERT INTO TABLE test_sql.borrow_log
VALUES
	( '0002'.'001'.'2021-10-13' );
INSERT INTO TABLE test_sql.borrow_log
VALUES
	( '0003'.'005'.'2021-09-14' );
INSERT INTO TABLE test_sql.borrow_log
VALUES
	( '0004'.'006'.'2021-08-15' );
INSERT INTO TABLE test_sql.borrow_log
VALUES
	( '0005'.'003'.'2021-10-10' );
INSERT INTO TABLE test_sql.borrow_log
VALUES
	( '0006'.'004'.'2021-12-13' );
INSERT INTO TABLE test_sql.borrow_log
VALUES
	( '0007'.'003'.'2021-10-16' );
INSERT INTO TABLE test_sql.borrow_log
VALUES
	( '0007'.'008'.'2021-10-16' );
Copy the code

3, query SQL

(1) to create the library library of books, readers and borrowing three basic table table structure. Please write a construction sentence.

CREATE TABLE test_sql.book (
	book_id string,
	`SORT` string,
	book_name string,
	writer string,
	OUTPUT string,
price DECIMAL ( 10.2 ));
CREATE TABLE test_sql.reader ( reader_id string, company string, NAME string, sex string, grade string, addr string );
CREATE TABLE test_sql.borrow_log ( reader_id string, book_id string, borrow_date string );
Copy the code

(2) Find out the NAME and COMPANY of the reader surnamed Li.

SELECT name, company FROM test_sql.reader WHERE name LIKE 'l %';
Copy the code

(3) Find all book names (BOOK_NAME) and PRICE (PRICE) of “Higher Education Press”, the result is in descending order by unit PRICE.

SELECT
	book_name,
	price 
FROM
	test_sql.book 
WHERE
	OUTPUT ="Higher Education Press"ORDER BY
	price DESC;
Copy the code

(4) Search the SORT of books whose PRICE is between 10 yuan and 20 yuan, and the result is sorted by OUTPUT and PRICE ascending order.

Method 1

SELECT
	sort,
	output,
	price 
FROM
	test_sql.book 
WHERE
	price > = 10 
	AND price < = 20 
ORDER BY
	output,
	price;
Copy the code

Way 2

SELECT
	sort,
	output,
	price 
FROM
	book 
WHERE
	price BETWEEN 10 
	AND 20 
ORDER BY
	output ASC,
	price ASC;
Copy the code

(5) Find the names and companies of all readers who have borrowed books.

SELECT
	b.NAME,
	b.company 
FROM
	test_sql.borrow_log a
	JOIN test_sql.reader b ON a.reader_id = b.reader_id;
Copy the code

(6) Seek the highest unit price, lowest unit price and average unit price of “Science Press” books.

SELECT
	max( price ),
	min( price ),
	avg( price ) 
FROM
	test_sql.book 
WHERE
	OUTPUT = 'Science Press';
Copy the code

(7) Find out the names and units of readers who have borrowed at least 2 books (no less than 2 books).

SELECT
	b.NAME,
	b.company 
FROM
	( SELECT reader_id FROM test_sql.borrow_log GROUP BY reader_id HAVING count(*) > = 2 ) a
	JOIN test_sql.reader b ON a.reader_id = b.reader_id;
Copy the code

(8) In view of the need of data security, the data in “Borrowing Records” needs to be backed up regularly. Please use an SQL statement to create a data table BORROW_LOG_BAK under the backup user BAK whose table structure is completely consistent with that of “Borrowing Records”. And all the existing data in “Borrowing records” are copied to BORROW_L0G_ BAK.

CREATE TABLE test_sql.borrow_log_bak AS SELECT
* 
FROM
	test_sql.borrow_log;
Copy the code

(9) now requires the original Oracle database data migration to Hive in the warehouse, please write a “book” built in the Hive table statements (Hive, tip: column delimiter |; Table data needs external import: partition name “month_part” and “day_part” respectively.)

CREATE TABLE book_hive 
	( book_id string, SORT string, book_name string, writer string, OUTPUT string, price DECIMAL ( 10.2 ) ) 
	partitioned BY ( month_part string, day_part string ) 
	ROW format delimited FIELDS TERMINATED BY '\ |' stored AS textfile;
Copy the code

Update user_dinner (user_id 20000) from monthly partition 202106 of table A to bonc8920. Update user_dinner (user_id 20000) from monthly partition 202106 to bonc8920. (Hive implementation: Hlive does not have update syntax, please use other methods to update data)

Mode 1: Configure Hive to support transaction operations, bucket table, and ORC storage format

Method 2: The first step is to find the data to be updated, replace the field to be changed with the new value, the second step is to find the data that does not need to be updated, and the third step is to insert the data from the previous two steps into a new table.

4. Execution results

(2)

(3)

(4)

(5)

(6)

(7)

(8)

(9)

The eighth problem

1, requirements,

There is an online server access log in the following format (answer in SQL)

Select top10 IP addresses for accessing the/API /user/login interface at 14:00 on November 9

2. Data preparation

CREATE TABLE test_sql.test8 ( `date` string, interface string, ip string );
INSERT INTO TABLE test_sql.test8
VALUES
	( 'the 2016-11-09 11:22:05'.'/api/user/login'.'110.23.5.23' );
INSERT INTO TABLE test_sql.test8
VALUES
	( 'the 2016-11-09 11:23:10'.'/api/user/detail'.'57.3.2.16' );
INSERT INTO TABLE test_sql.test8
VALUES
	( 'the 2016-11-09 23:59:40'.'/api/user/login'.'200.6.5.166' );
INSERT INTO TABLE test_sql.test8
VALUES
	( 'the 2016-11-09 11:14:23'.'/api/user/login'.'136.79.47.70' );
INSERT INTO TABLE test_sql.test8
VALUES
	( 'the 2016-11-09 11:15:23'.'/api/user/detail'.'94.144.143.141' );
INSERT INTO TABLE test_sql.test8
VALUES
	( 'the 2016-11-09 11:16:23'.'/api/user/login'.'197.161.8.206' );
INSERT INTO TABLE test_sql.test8
VALUES
	( 'the 2016-11-09 12:14:23'.'/api/user/detail'.'240.227.107.145' );
INSERT INTO TABLE test_sql.test8
VALUES
	( 'the 2016-11-09 13:14:23'.'/api/user/login'.'79.130.122.205' );
INSERT INTO TABLE test_sql.test8
VALUES
	( 'the 2016-11-09 14:14:23'.'/api/user/detail'.'65.228.251.189' );
INSERT INTO TABLE test_sql.test8
VALUES
	( 'the 2016-11-09 14:15:23'.'/api/user/detail'.'245.23.122.44' );
INSERT INTO TABLE test_sql.test8
VALUES
	( 'the 2016-11-09 14:17:23'.'/api/user/detail'.'22.74.142.137' );
INSERT INTO TABLE test_sql.test8
VALUES
	( 'the 2016-11-09 14:19:23'.'/api/user/detail'.'54.93.212.87' );
INSERT INTO TABLE test_sql.test8
VALUES
	( 'the 2016-11-09 14:20:23'.'/api/user/detail'.'218.15.167.248' );
INSERT INTO TABLE test_sql.test8
VALUES
	( 'the 2016-11-09 14:24:23'.'/api/user/detail'.'20.117.19.75' );
INSERT INTO TABLE test_sql.test8
VALUES
	( 'the 2016-11-09 15:14:23'.'/api/user/login'.'183.162.66.97' );
INSERT INTO TABLE test_sql.test8
VALUES
	( 'the 2016-11-09 16:14:23'.'/api/user/login'.'108.181.245.147' );
INSERT INTO TABLE test_sql.test8
VALUES
	( 'the 2016-11-09 14:17:23'.'/api/user/login'.'22.74.142.137' );
INSERT INTO TABLE test_sql.test8
VALUES
	( 'the 2016-11-09 14:19:23'.'/api/user/login'.'22.74.142.137' );
Copy the code

3, query SQL

SELECT
	ip,
	count(*) AS cnt 
FROM
	test_sql.test8 
WHERE
	date_format( `date`, 'yyyy-MM-dd HH' ) > = 'the 2016-11-09 14' 
	AND date_format( `date`, 'yyyy-MM-dd HH' ) < 'the 2016-11-09 15' 
	AND interface = '/api/user/login' 
GROUP BY
	ip 
ORDER BY
	cnt DESC 
	LIMIT 10;
Copy the code

4. Execution results

Question 9

1, requirements,

There is a recharge log table credit_log with the following fields:

‘dist_id’ int ‘zone group ID ‘,

‘account’ string ‘account ‘,

‘money’ int ‘recharge amount ‘,’

Create_time ‘string’ order time ‘

Write an SQL statement to query the account with the largest amount of recharge in each area group on January 02, 2021

Results:

Area group ID, account number, amount, recharge time

2. Data preparation

CREATE TABLE test_sql.test9 ( dist_id string COMMENT 'area group id', account string COMMENT 'account', `money` DECIMAL ( 10.2 ) COMMENT 'Top up amount', create_time string COMMENT 'Order time' );
INSERT INTO TABLE test_sql.test9
VALUES
	( '1'.'11'.100006.'the 2021-01-02 13:00:01' );
INSERT INTO TABLE test_sql.test9
VALUES
	( '1'.'22'.110000.'the 2021-01-02 13:00:02' );
INSERT INTO TABLE test_sql.test9
VALUES
	( '1'.'33'.102000.'the 2021-01-02 13:00:03' );
INSERT INTO TABLE test_sql.test9
VALUES
	( '1'.'44'.100300.'the 2021-01-02 13:00:04' );
INSERT INTO TABLE test_sql.test9
VALUES
	( '1'.'55'.100040.'the 2021-01-02 13:00:05' );
INSERT INTO TABLE test_sql.test9
VALUES
	( '1'.'66'.100005.'the 2021-01-02 13:00:06' );
INSERT INTO TABLE test_sql.test9
VALUES
	( '1'.'77'.180000.'the 2021-01-03 13:00:07' );
INSERT INTO TABLE test_sql.test9
VALUES
	( '1'.'88'.106000.'the 2021-01-02 13:00:08' );
INSERT INTO TABLE test_sql.test9
VALUES
	( '1'.'99'.100400.'the 2021-01-02 13:00:09' );
INSERT INTO TABLE test_sql.test9
VALUES
	( '1'.'12'.100030.'the 2021-01-02 13:00:10' );
INSERT INTO TABLE test_sql.test9
VALUES
	( '1'.'13'.100003.'the 2021-01-02 13:00:20' );
INSERT INTO TABLE test_sql.test9
VALUES
	( '1'.'14'.100020.'the 2021-01-02 13:00:30' );
INSERT INTO TABLE test_sql.test9
VALUES
	( '1'.'15'.100500.'the 2021-01-02 13:00:40' );
INSERT INTO TABLE test_sql.test9
VALUES
	( '1'.'16'.106000.'the 2021-01-02 13:00:50' );
INSERT INTO TABLE test_sql.test9
VALUES
	( '1'.'17'.100800.'the 2021-01-02 13:00:59' );
INSERT INTO TABLE test_sql.test9
VALUES
	( '2'.'18'.100800.'the 2021-01-02 13:00:11' );
INSERT INTO TABLE test_sql.test9
VALUES
	( '2'.'the'.100030.'the 2021-01-02 13:00:12' );
INSERT INTO TABLE test_sql.test9
VALUES
	( '2'.'10'.100000.'the 2021-01-02 13:00:13' );
INSERT INTO TABLE test_sql.test9
VALUES
	( '2'.'45'.100010.'the 2021-01-02 13:00:14' );
INSERT INTO TABLE test_sql.test9
VALUES
	( '2'.'78'.100070.'the 2021-01-02 13:00:15' );
Copy the code

3, query SQL

WITH TEMP AS (
	SELECT
		dist_id,
		account,
		sum( `money` ) sum_money 
	FROM
		test_sql.test9 
	WHERE
		date_format( create_time, 'yyyy-MM-dd' ) = '2021-01-02' 
	GROUP BY
		dist_id,
		account 
	) SELECT
	t1.dist_id,
	t1.account,
	t1.sum_money 
FROM
	(
	SELECT
		temp.dist_id,
		temp.account,
		temp.sum_money,
		rank(a)over ( PARTITION BY temp.dist_id ORDER BY temp.sum_money DESC ) ranks 
	FROM
		TEMP 
	) t1 
WHERE
	ranks = 1;
Copy the code

4. Execution results

The first ten questions

1, requirements,

There is an account table as follows, please write SQL statement to query the top 10 gold accounts of each region group (select top 10 accounts for grouping).

Dist_id string ‘zone group ID ‘,

Account string ‘account ‘,

Gold int

2. Data preparation

CREATE TABLE test_sql.test10 ( `dist_id` string COMMENT 'area group id', `account` string COMMENT 'account', `gold` INT COMMENT 'gold' );
INSERT INTO TABLE test_sql.test10
VALUES
	( '1'.'77'.18 );
INSERT INTO TABLE test_sql.test10
VALUES
	( '1'.'88'.106 );
INSERT INTO TABLE test_sql.test10
VALUES
	( '1'.'99'.10 );
INSERT INTO TABLE test_sql.test10
VALUES
	( '1'.'12'.13 );
INSERT INTO TABLE test_sql.test10
VALUES
	( '1'.'13'.14 );
INSERT INTO TABLE test_sql.test10
VALUES
	( '1'.'14'.25 );
INSERT INTO TABLE test_sql.test10
VALUES
	( '1'.'15'.36 );
INSERT INTO TABLE test_sql.test10
VALUES
	( '1'.'16'.12 );
INSERT INTO TABLE test_sql.test10
VALUES
	( '1'.'17'.158 );
INSERT INTO TABLE test_sql.test10
VALUES
	( '2'.'18'.12 );
INSERT INTO TABLE test_sql.test10
VALUES
	( '2'.'the'.44 );
INSERT INTO TABLE test_sql.test10
VALUES
	( '2'.'10'.66 );
INSERT INTO TABLE test_sql.test10
VALUES
	( '2'.'45'.80 );
INSERT INTO TABLE test_sql.test10
VALUES
	( '2'.'78'.98 );
Copy the code

3, query SQL

SELECT
	dist_id,
	account,
	gold 
FROM
	( SELECT dist_id, account, gold, row_number(a)over ( PARTITION BY dist_id ORDER BY gold DESC ) rank FROM test_sql.test10 ) t 
WHERE
	rank < = 10;
Copy the code

4. Execution results

The last

The above ten HiveSQL proficiency can effectively prevent the interview by hanging, but from hanging the strength of the interviewer, also need to classic 100 Hive questions plus training, will continue to create the next classic 100 Hive questions hanging interviewer series, strongly suggest collection plus attention!


  • 📢 : lansonli.blog.csdn.net
  • 📢 welcome to like 👍 collect ⭐ message 📝 if there is an error please correct!
  • 📢 this article was originally written by Lansonli and originally appeared on CSDN blog 🙉
  • 📢 big data series of articles will be updated every day, stop to rest do not forget that others are still running, I hope that we seize the time to learn, strive for a better life ✨