Hive3.X will be released in 2020, but for many enterprises it is still 1.x, even in version 0.x. Even so, it does not stop students from learning. Scheduled Queries hive3.x has been Scheduled for release 4.0. All of the hive3.x features have been Scheduled. This paper explains the supported scenarios, principles, syntax usage and demo cases one by one.

introduce

The Scheduled Queries feature is actually the scheduling feature we’re using today. Of course, hive 4.0 applies the following scheduled execution plan scenarios: 1. Periodically pull data from external systems and store it to Hive. 2. Periodically update column statistics. Refactoring the materialized view

The principle of

1. Users define execution plans and store them in MetaStore. 2.

grammar

-- Create a scheduled schedule
CREATE SCHEDULED QUERY <scheduled_query_name>
     <scheduleSpecification> -- Specific scheduling policies
     [<executedAsSpec> ] Executor, executed by default as the current user
     [<enableSpecification>] -- Enabled or not
     <definedAsSpec> -- Specific query plan definition
     
-- Modify scheduled scheduling
ALTER SCHEDULED QUERY <scheduled_query_name>
     (<scheduleSpec>|<executedAsSpec>|<enableSpecification>|<definedAsSpec>|<executeSpec>);
     
-- Delete scheduled scheduling
DROP SCHEDULED QUERY <scheduled_query_name>;

Copy the code

ScheduleSpecification syntax: 1. Based on CRON. For example, CRON ‘0 */10 * * *? * ‘once EVERY 10 minutes to perform 2, based on EVERY EVERY (integer) (SECOND | MINUTE | HOUR) [(OFFSET BY | AT) timeOrDate] such as: EVERY 2 MINUTES EVERY HOUR AT ‘0:07:30′ EVERY DAY AT ’11:35:30’ EXECUTED AS < user_name > for enableSpecification grammar use: (ENABLE [D] | DISABLE [D]) for the Defined AS the syntax of use: [DEFINED] AS The syntax for executeSpec is AS follows: EXECUTE: Sets the next execution time to now. Usually used for debugging

Case 1: Scheduled query

create table t (a integer);

-- create a scheduled query; every 10 minute insert a new row
create scheduled query sc1 cron '0 */10 * * * ? *' as insert into t values (1);
-- depending on hive.scheduled.queries.create.as.enabled the query might get create in disabled mode
-- it can be enabled using:
alter scheduled query sc1 enabled;

-- inspect scheduled queries using the information_schema
select * from information_schema.scheduled_queries s where schedule_name='sc1';
+-----------------------+------------------+------------+----------------------+-------------------+---------+----------- +----------------------+
| s.scheduled_query_id  | s.schedule_name  | s.enabled  | s.cluster_namespace  |    s.schedule     | s.user  |  s.query  |   s.next_execution   |
+-----------------------+------------------+------------+----------------------+-------------------+---------+----------- +----------------------+
| 1                     | sc1              | true       | hive                 | 0 */10 * * * ? *  | dev     | select 1  | 2020.- 03 15:10:00  |
+-----------------------+------------------+------------+----------------------+-------------------+---------+----------- +----------------------+

-- wait 10 minutes or execute by issuing:
alter scheduled query sc1 execute;

select * from information_schema.scheduled_executions s where schedule_name='sc1' order by scheduled_execution_id desc limit 1;
+---------------------------+------------------+----------------------------------------------------+-----------+-------- --------------+----------------------+------------+------------------+---------------------+
| s.scheduled_execution_id  | s.schedule_name  |                s.executor_query_id                 |  s.state  |     s.start_time     |      s.end_time      | s.elapsed  | s.error_message  | s.last_update_time  |
+---------------------------+------------------+----------------------------------------------------+-----------+-------- --------------+----------------------+------------+------------------+---------------------+
| 496                       | sc1              | dev_20200203152025_bdf3deac0ca6- 407.f-b122-c637e50f99c8 | FINISHED  | 2020.- 03 15:20:23  | 2020.- 03 15:20:31  | 8          | NULL             | NULL                |
+---------------------------+------------------+----------------------------------------------------+-----------+-------- --------------+----------------------+------------+------------------+---------------------+

Copy the code

Case 2: Refactoring the materialized view

-- Parameter Settings
set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
set hive.strict.checks.cartesian.product=false;
set hive.stats.fetch.column.stats=true;
set hive.materializedview.rewriting=true;

- create a table
CREATE TABLE emps (
empid INT,
deptno INT,
name VARCHAR(256),
salary FLOAT,
hire_date TIMESTAMP)
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

CREATE TABLE depts (
deptno INT,
deptname VARCHAR(256),
locationid INT)
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

Load data
insert into emps values (100.10.'Bill'.10000.1000), (200.20.'Eric'.8000.500),
(150.10.'Sebastian'.7000.null), (110.10.'Theodore'.10000.250), (120.10.'Bill'.10000.250),
(1330.10.'Bill'.10000.'2020-01-02');
insert into depts values (10.'Sales'.10), (30.'Marketing'.null), (20.'HR'.20);

insert into emps values (1330.10.'Bill'.10000.'2020-01-02');

Create materialized attempts
CREATE MATERIALIZED VIEW mv1 AS
SELECT empid, deptname, hire_date FROM emps
  JOIN depts ON (emps.deptno = depts.deptno)
  WHERE hire_date > = '2016-01-01 00:00:00';

Create a scheduled execution plan
create scheduled query mv_rebuild cron '0 */10 * * * ? *' defined as
alter materialized view mv1 rebuild;

Insert a new record
insert into emps values (1330.10.'Bill'.10000.'2020-01-02');

-- Check the result
SELECT empid, deptname FROM emps
JOIN depts ON (emps.deptno = depts.deptno)
WHERE hire_date > = '2018-01-01';

-- Wait 10 minutes or execute now
alter scheduled query mv_rebuild execute;

-- Check the results again
SELECT empid, deptname FROM emps
JOIN depts ON (emps.deptno = depts.deptno)
WHERE hire_date > = '2018-01-01';
Copy the code

Schedule Enable Disable

- to enable
set hive.scheduled.queries.create.as.enabled=true;

-- Enable the execution plan
alter scheduled query sc1 enabled;

Disabled -
set hive.scheduled.queries.create.as.enabled=false;
Copy the code

Metadata storage

The retention policy to implement information through metastore. Scheduled. Queries. Execution. Max. The age parameter configuration. Information_schema. scheduled_queries: specifies the scheduling plan definition information

2,information_schema.scheduled_executions: Indicates the schedule execution information