SQL Server: A Solution for Periodically Transferring Massive Data

【Key words】 1) partition switching 2) BCP 3) BULK INSERT 4) INSERT

【Background】 There is a huge table to which about 5 million records are added every day. Only the latest 7 days of data are kept: every day, the records older than that are moved out and merged into a history database. The INSERT-then-DELETE approach took longer and longer, eventually averaging 45 minutes, during which neither the table nor the history database could be accessed at all.

【Solution】 1. Use partition switching to clear old data quickly. Partition the table by day, one partition per day. For example, if today is August 8, 2010:

CREATE PARTITION FUNCTION [CrPFN](SMALLDATETIME)
AS RANGE LEFT FOR VALUES (
    N'2010-08-01T00:00:00.000', N'2010-08-02T00:00:00.000', N'2010-08-03T00:00:00.000',
    N'2010-08-04T00:00:00.000', N'2010-08-05T00:00:00.000', N'2010-08-06T00:00:00.000',
    N'2010-08-07T00:00:00.000', N'2010-08-08T00:00:00.000', N'2010-08-09T00:00:00.000'
);
GO

CREATE PARTITION SCHEME [CrSCM]
AS PARTITION [CrPFN]
TO ([cr0], [cr1], [cr2], [cr3], [cr4], [cr5], [cr6], [cr7], [cr8], [cr9]);
GO
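For reference, a minimal sketch of how the big table itself might be created on this scheme. The real column layout is not shown in this post, so the two columns here are assumptions; the partitioning column is taken to be the record's creation time:

CREATE TABLE [BigTable] (
    [CreateTime] SMALLDATETIME NOT NULL,  -- assumed partitioning column
    [Payload]    NVARCHAR(100) NULL       -- placeholder for the real columns
) ON [CrSCM]([CreateTime]);
GO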

 

Under this partition scheme, the data is divided by day and distributed across the 10 file groups [cr0] … [cr9]. On each of the 10 file groups, create a non-partitioned table with the same structure as the giant table: [Temp_0] … [Temp_9], one per file group.
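A sketch of one such staging table, reusing the assumed columns above. The only differences from [BigTable] are that it is not partitioned and sits on one fixed file group:

CREATE TABLE [Temp_0] (
    [CreateTime] SMALLDATETIME NOT NULL,
    [Payload]    NVARCHAR(100) NULL
) ON [cr0];
GO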

 

Partition switching steps: 1) Switch the oldest partition's data into the single table on the same file group. This is a metadata-only operation, so millions of records move in an instant. With @day7ago SMALLDATETIME holding the date 7 days before today, and @day2later SMALLDATETIME the date 2 days after today:

ALTER TABLE [BigTable] SWITCH PARTITION $PARTITION.CrPFN(@day7ago) TO [Temp_0];

2) Merge the oldest partition into the second-oldest one, that is, merge the 8-days-ago partition into the 7-days-ago partition. Since the 8-days-ago partition has just been switched out and is empty, the merge completes instantly:

ALTER PARTITION FUNCTION CrPFN() MERGE RANGE (CONVERT(VARCHAR(10), @day7ago, 120));

3) Modify the partition scheme so that the file group of the original oldest partition is appended to the end of the partition queue for reuse:

ALTER PARTITION SCHEME [CrSCM] NEXT USED [cr0];

4) Split off a new partition at the tail:

ALTER PARTITION FUNCTION CrPFN() SPLIT RANGE (CONVERT(VARCHAR(10), @day2later, 120));

Since the partition with the latest date holds no data yet (it is a future date), the split is also instantaneous. This is why I keep 10 partitions instead of 7: the newest boundary always stays two days in the future. (A combined sketch of steps 1) to 4) follows.)
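Putting the four steps together, a minimal sketch of the daily rotation job. The date arithmetic here is my assumption; in the real job, the file group and staging table of the oldest partition must be looked up first (see the query further below):

DECLARE @day7ago   SMALLDATETIME;
DECLARE @day2later SMALLDATETIME;
SET @day7ago   = CONVERT(VARCHAR(10), DATEADD(DAY, -7, GETDATE()), 120);
SET @day2later = CONVERT(VARCHAR(10), DATEADD(DAY,  2, GETDATE()), 120);

-- 1) metadata-only switch of the oldest partition into its staging table
ALTER TABLE [BigTable] SWITCH PARTITION $PARTITION.CrPFN(@day7ago) TO [Temp_0];

-- 2) merge away the now-empty oldest boundary
ALTER PARTITION FUNCTION CrPFN() MERGE RANGE (@day7ago);

-- 3) recycle the freed file group to the end of the queue
ALTER PARTITION SCHEME [CrSCM] NEXT USED [cr0];

-- 4) open an empty partition two days in the future
ALTER PARTITION FUNCTION CrPFN() SPLIT RANGE (@day2later);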

5) How to merge the data that was moved into the single table into the history database is the next topic.

Once the staging table's contents have been archived, empty it for reuse: TRUNCATE TABLE [Temp_0];

Summary: 1) Always keep 10 partitions and recycle them: as soon as the oldest partition is merged away, its file group is appended to the end of the partition queue. 2) The key is working out which file group a partition sits on, so the job knows which single table to switch into. For that method, see my blog post:

Check which file group the partition is in
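If that post is not to hand, a query along these lines (built only from the standard catalog views; the table name is a placeholder) maps each partition of a table to its file group:

SELECT p.partition_number, fg.name AS filegroup_name, p.rows
FROM sys.partitions p
JOIN sys.indexes i
    ON p.object_id = i.object_id AND p.index_id = i.index_id
JOIN sys.partition_schemes ps
    ON i.data_space_id = ps.data_space_id
JOIN sys.destination_data_spaces dds
    ON ps.data_space_id = dds.partition_scheme_id
   AND dds.destination_id = p.partition_number
JOIN sys.filegroups fg
    ON dds.data_space_id = fg.data_space_id
WHERE p.object_id = OBJECT_ID('dbo.BigTable')  -- placeholder table name
  AND i.index_id IN (0, 1);                    -- heap or clustered index only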

3) The partitioned table and the single table must be aligned.

First, the structures must be identical: column types, nullability, and so on.

Next, the indexes: the single table must have the same indexes as the partitioned table.

Note that if the method described in this article is to be used, every index on the partitioned table, clustered or non-clustered, must itself be partitioned (aligned). The partition function behind each index's partition scheme must be the same as the table's; the partition schemes themselves may differ.

Compression must also match: the partitioned table and the single table, including their indexes, must either both be compressed or both be uncompressed.
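As a sketch of what alignment looks like in DDL (the index name and key column are the assumed ones from above): the same index exists on both sides, partitioned on the big table and placed on the matching file group on the staging table:

-- on the partitioned table: the index is partitioned by the same function
CREATE CLUSTERED INDEX [IX_CreateTime]
    ON [BigTable]([CreateTime]) ON [CrSCM]([CreateTime]);

-- on the staging table: same key, on the file group of the partition it receives
CREATE CLUSTERED INDEX [IX_CreateTime]
    ON [Temp_0]([CreateTime]) ON [cr0];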

2. Export the staging table's data to a text file with BCP, then load it into the history database with BULK INSERT:

DECLARE @shell VARCHAR(500);
SET @shell = 'bcp [myDb].dbo.[' + @Table + '] out c:/data.txt -T -n';
EXEC xp_cmdshell @shell;
BULK INSERT [History].dbo.[Table1] FROM 'c:/data.txt';
EXEC xp_cmdshell 'del c:/data.txt';
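Since bcp -n writes native format, it may help to say so explicitly and to request a table lock and batching. A hedged variant (these WITH options do exist; I have not verified that they fix the timings below):

BULK INSERT [History].dbo.[Table1]
FROM 'c:/data.txt'
WITH (
    DATAFILETYPE = 'native',  -- matches the bcp -n export
    TABLOCK,                  -- one of the preconditions for minimal logging
    BATCHSIZE = 500000        -- commit in chunks instead of one giant transaction
);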

Summary: 1) BCP export is fast: four million records in 20 seconds. 2) BULK INSERT claims to import data quickly, but I have not seen it here. If the target table is empty, 4 million rows import successfully in a minute and a half. But my history table has more than 500 million records and two indexes; after more than 20 minutes there was still no response, slower than a direct INSERT (about 17 minutes), so I lost patience and cancelled. 3) In the 20 minutes BULK INSERT ran, just over 1 million rows were inserted, unacceptably slower than direct INSERTs. I suspect the indexes are to blame. If all the indexes were dropped it should be faster, right? But rebuilding them afterwards would take even longer. What is the mechanism of BULK INSERT that makes it slower than INSERT? One user said the data is staged in tempdb and then inserted into the target table, but I watched tempdb during the run and its files did not change. 4) A failure, then. Advice from the experts is welcome.
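For anyone who wants to test the drop-the-indexes idea without losing the index definitions, disabling and rebuilding is a lighter-weight sketch (the index names are placeholders; whether the rebuild cost pays off on a 500-million-row table is exactly the open question):

-- disable the non-clustered indexes before the bulk load
ALTER INDEX [IX_History_1] ON [History].dbo.[Table1] DISABLE;
ALTER INDEX [IX_History_2] ON [History].dbo.[Table1] DISABLE;

BULK INSERT [History].dbo.[Table1] FROM 'c:/data.txt' WITH (TABLOCK);

-- rebuild them afterwards (this re-reads the whole table)
ALTER INDEX [IX_History_1] ON [History].dbo.[Table1] REBUILD;
ALTER INDEX [IX_History_2] ON [History].dbo.[Table1] REBUILD;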