Introduction: This article focuses on how to update data through insert Overwrite

background

For most storage formats in big data, supporting random updates is complex. Transactional tables can support UPDATE and DELETE statements. However, update and DELETE functions are not suitable for frequent updates, data deletion, or real-time writes to target tables. Update and DELETE cannot be performed for non-transactional tables. This article focuses on how to update data through insert Overwrite.

1. Create a table and insert data

create table update_table(ID int,
 tranValue string,
 last_update_user string) PARTITIONED by(dt STRING ) LIFECYCLE 1;
INSERT INTO update_table PARTITION (dt="20210510") VALUES
(1, 'value_01', 'creation'),
(2, 'value_02', 'creation'),
(3, 'value_03', 'creation'),
(4, 'value_04', 'creation'),
(5, 'value_05', 'creation'),
(6, 'value_06', 'creation'),
(7, 'value_07', 'creation'),
(8, 'value_08', 'creation'),
(9, 'value_09', 'creation'),
(10, 'value_10','creation');
Copy the code

2. Update a data

When the id is 1, it is updated to value_011

INSERT OVERWRITE TABLE update_table PARTITION(dt) SELECT ID,CASE WHEN ID =1 THEN "value_011" ELSE TranValue END TranValue ,last_update_user ,dt FROM update_table WHERE dt = "20210510" ;Copy the code

3. Update multiple pieces of data

According to the delta table update, the delta table insert data is first created

create table update_table_inc(ID int,
 TranValue string,
 last_update_user string) LIFECYCLE 1;
INSERT INTO update_table_inc VALUES 
(5, 'value_11', 'creation'),
(6, NULL, '20170410'),
(7, 'value22', '20170413');
Copy the code

Update TranValue with id 5 and id 7, TranValue 6 is null

INSERT OVERWRITE TABLE update_table PARTITION( dt)
SELECT  a.id
        ,CASE    WHEN a.id=b.id  and b.TranValue is not null THEN b.TranValue 
                 ELSE a.TranValue 
         END TranValue
        ,CASE    WHEN a.id=b.id and b.TranValue is not null THEN b.last_update_user 
                 ELSE a.last_update_user 
         END last_update_user
         ,dt
FROM    update_table a
LEFT JOIN update_table_inc b
ON      a.id = b.id
WHERE   a.dt = "20210510"
;
Copy the code

4. Delete data

INSERT OVERWRITE TABLE update_table PARTITION(dt) SELECT * FROM update_table WHERE dt = "20210510" and id! = 4;Copy the code

The original link

This article is the original content of Aliyun and shall not be reproduced without permission.