Recently, the project I'm working on has accumulated a lot of insertBatch code. It is commonly said that batch insertion is much more efficient than inserting in a loop, so in this article we run an experiment: is that actually true?

Test environment:

  • SpringBoot 2.5
  • MySQL 8
  • JDK 8
  • Docker

To insert multiple rows, the candidate approaches are:

  • For loop insert (one row at a time)
  • Concatenated SQL, executed as one statement
  • Insert using the batch feature

Build the test environment

SQL file:

drop database IF EXISTS test;
CREATE DATABASE test;
use test;
DROP TABLE IF EXISTS `user`;
CREATE TABLE `user` (
  `id` int(11) NOT NULL,
  `name` varchar(255) DEFAULT '',
  `age` int(11) DEFAULT 0,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Application configuration file:

server:
  port: 8081
spring:
  # Database connection configuration
  datasource:
    driver-class-name: com.mysql.cj.jdbc.Driver
    url: jdbc:mysql://127.0.0.1:3306/test?characterEncoding=utf8&useSSL=false&allowPublicKeyRetrieval=true&serverTimezone=UTC&useUnicode=true&nullCatalogMeansCurrent=true&autoReconnect=true&allowMultiQueries=true
    username: root
    password: 123456
# Mybatis configuration
mybatis:
  # Mapper file location
  mapper-locations: classpath:mapper/*.xml
  type-aliases-package: com.aphysia.springdemo.model
  # Enable camel-case mapping
  configuration:
    map-underscore-to-camel-case: true
logging:
  level:
    root: error


The startup class, which configures the Mapper scan path:

import org.mybatis.spring.annotation.MapperScan;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.EnableAutoConfiguration;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
@MapperScan("com.aphysia.springdemo.mapper")
public class SpringdemoApplication {

    public static void main(String[] args) {
        SpringApplication.run(SpringdemoApplication.class, args);
    }
}

The Mapper interface contains methods for inserting a single object, deleting all records, and inserting multiple objects with one concatenated statement:

import com.aphysia.springdemo.model.User;
import org.apache.ibatis.annotations.Param;

import java.util.List;

public interface UserMapper {

    int insertUser(User user);

    int deleteAllUsers();

    int insertBatch(@Param("users") List<User> users);
}
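The User model class isn't shown in the original post; here is a minimal sketch consistent with the `user` table and the package referenced in mapper.xml (the field names are taken from the schema, everything else is an assumption):

package com.aphysia.springdemo.model;

// Hypothetical sketch of the User model (not shown in the original post).
// Fields mirror the `user` table: id, name, age.
public class User {

    private Integer id;
    private String name;
    private Integer age;

    public Integer getId() { return id; }

    public void setId(Integer id) { this.id = id; }

    public String getName() { return name; }

    public void setName(String name) { this.name = name; }

    public Integer getAge() { return age; }

    public void setAge(Integer age) { this.age = age; }
}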

The mapper.xml file is as follows:


      
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE mapper PUBLIC "-//mybatis.org//DTD Mapper 3.0//EN" "http://mybatis.org/dtd/mybatis-3-mapper.dtd">
<mapper namespace="com.aphysia.springdemo.mapper.UserMapper">
    <insert id="insertUser" parameterType="com.aphysia.springdemo.model.User">
        insert  into user(id,age) values(#{id},#{age})
    </insert>

    <delete id="deleteAllUsers">
        delete from user where id>0;
    </delete>

    <insert id="insertBatch" parameterType="java.util.List">
        insert into user(id,age) VALUES
        <foreach collection="users" item="model" index="index" separator=",">
            (#{model.id}, #{model.age})
        </foreach>
    </insert>
</mapper>

During the tests, we delete all the data before each run so that the measurements are objective and not affected by data left over from the previous run.
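The cleanup step itself isn't shown in the original post. Assuming JUnit 5, a minimal sketch of how it could be wired into the test class shown below (it only needs the deleteAllUsers mapper method and an org.junit.jupiter.api.BeforeEach import):

    // Hypothetical sketch: wipe the table before each timing run so data from
    // an earlier run does not skew the next measurement.
    @BeforeEach
    void cleanUp() {
        userMapper.deleteAllUsers();
    }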

Different tests

1. For loop insert

Build the list first, then perform one database operation per item, inserting rows one at a time:

@SpringBootTest
@MapperScan("com.aphysia.springdemo.mapper")
class SpringdemoApplicationTests {

    @Autowired
    SqlSessionFactory sqlSessionFactory;

    @Resource
    UserMapper userMapper;

    static int num = 100000;

    static int id = 1;

    @Test
    void insertForEachTest() {
        List<User> users = getRandomUsers();
        long start = System.currentTimeMillis();
        for (int i = 0; i < users.size(); i++) {
            userMapper.insertUser(users.get(i));
        }
        long end = System.currentTimeMillis();
        System.out.println("time:" + (end - start));
    }
}
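The getRandomUsers() helper used by these tests isn't shown in the original post. A minimal sketch, as an extra method inside the same test class (random ages are an assumption; it needs java.util.ArrayList and java.util.concurrent.ThreadLocalRandom imports):

    // Hypothetical sketch of the helper referenced by the tests: builds `num`
    // users with sequential ids and random ages.
    private List<User> getRandomUsers() {
        List<User> users = new ArrayList<>(num);
        for (int i = 0; i < num; i++) {
            User user = new User();
            user.setId(id++);
            user.setAge(ThreadLocalRandom.current().nextInt(100));
            users.add(user);
        }
        return users;
    }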

2. Concatenate SQL inserts

Insert data as follows:

INSERT INTO `user` (`id`, `age`) 
VALUES (1, 11),
(2, 12),
(3, 13),
(4, 14),
(5, 15);
    @Test
    void insertSplicingTest() {
        List<User> users = getRandomUsers();
        long start = System.currentTimeMillis();
        userMapper.insertBatch(users);
        long end = System.currentTimeMillis();
        System.out.println("time:" + (end - start));
    }

3. Use Batch to insert data in batches

MyBatis sessions support a batch executor: open a session from sqlSessionFactory with ExecutorType.BATCH and auto-commit set to false, queue all the inserts, and commit once at the end:

    @Test
    public void insertBatch() {
        SqlSession sqlSession = sqlSessionFactory.openSession(ExecutorType.BATCH, false);
        UserMapper mapper = sqlSession.getMapper(UserMapper.class);
        List<User> users = getRandomUsers();
        long start = System.currentTimeMillis();
        for (int i = 0; i < users.size(); i++) {
            mapper.insertUser(users.get(i));
        }
        sqlSession.commit();
        sqlSession.close();
        long end = System.currentTimeMillis();
        System.out.println("time:" + (end - start));
    }

4. Batch processing + batch commit

Building on the batch executor, commit every 1,000 records instead of committing only once at the end:

    @Test
    public void insertBatchForEachTest() {
        SqlSession sqlSession = sqlSessionFactory.openSession(ExecutorType.BATCH, false);
        UserMapper mapper = sqlSession.getMapper(UserMapper.class);
        List<User> users = getRandomUsers();
        long start = System.currentTimeMillis();
        for (int i = 0; i < users.size(); i++) {
            mapper.insertUser(users.get(i));
            if (i % 1000 == 0 || i == num - 1) {
                sqlSession.commit();
                sqlSession.clearCache();
            }
        }
        sqlSession.close();
        long end = System.currentTimeMillis();
        System.out.println("time:" + (end - start));
    }



The first result was obviously wrong, right?

Running the code above gives the results below. For loop inserts are indeed inefficient, and concatenated SQL is relatively fast. I had read that concatenated SQL might be capped by a MySQL packet size limit, but I saw no error until a heap overflow at 1000w rows.

Note: the results below are incorrect!

(Times in milliseconds; 1w = 10,000 rows.)

| Insert method | 10 | 100 | 1000 | 1w | 10w | 100w | 1000w |
| --- | --- | --- | --- | --- | --- | --- | --- |
| For loop insert | 387 | 1150 | 7907 | 70026 | 635984 | too long… | too long… |
| Concatenated SQL insert | 308 | 320 | 392 | 838 | 3156 | 24948 | OutOfMemoryError: heap overflow |
| Batch | 392 | 917 | 5442 | 51647 | 470666 | too long… | too long… |
| Batch + batch commit | 359 | 893 | 5275 | 50270 | 472462 | too long… | too long… |

Why didn't the concatenated SQL exceed MySQL's limit?

Let's look at the relevant MySQL limit, max_allowed_packet:

mysql> show VARIABLES like '%max_allowed_packet%';
+---------------------------+------------+
| Variable_name             | Value      |
+---------------------------+------------+
| max_allowed_packet        | 67108864   |
| mysqlx_max_allowed_packet | 67108864   |
| slave_max_allowed_packet  | 1073741824 |
+---------------------------+------------+
3 rows in set (0.12 sec)

67108864 bytes is 64 MB, which is quite large, so no wonder there was no error. Let's lower it and test again:

  1. First, enter the running mysql container (in Docker Desktop you can also just click the Cli icon to open a shell):
docker exec -it mysql bash
  2. Go to the /etc/mysql directory to modify the my.cnf file:
cd /etc/mysql
  3. Install vim first; otherwise the file cannot be edited:
apt-get update
apt-get install vim
  4. Edit my.cnf:
vim my.cnf
  5. Add max_allowed_packet=2M on the last line (press i to edit; when done press esc and type :wq to save and exit):
 
[mysqld]
pid-file        = /var/run/mysqld/mysqld.pid
socket          = /var/run/mysqld/mysqld.sock
datadir         = /var/lib/mysql
secure-file-priv= NULL
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
 
# Custom config should go here
!includedir /etc/mysql/conf.d/
max_allowed_packet=2M
  6. Exit the container:
# exit
  7. Find the mysql container id:
docker ps -a

  8. Restart mysql:
docker restart c178e8998e68

Check max_allowed_packet again; it has been changed to 2M:

mysql> show VARIABLES like '%max_allowed_packet%';
+---------------------------+------------+
| Variable_name             | Value      |
+---------------------------+------------+
| max_allowed_packet        | 2097152    |
| mysqlx_max_allowed_packet | 67108864   |
| slave_max_allowed_packet  | 1073741824 |
+---------------------------+------------+

Running the concatenated SQL again, at 100w rows the statement is around 36 MB, far exceeding the 2 MB limit, and the insert fails:

org.springframework.dao.TransientDataAccessResourceException: 
### Cause: com.mysql.cj.jdbc.exceptions.PacketTooBigException: Packet for query is too large (36,788,583 > 2,097,152). You can change this value on the server by setting the 'max_allowed_packet' variable.
; Packet for query is too large (36,788,583 > 2,097,152). You can change this value on the server by setting the 'max_allowed_packet' variable.; nested exception is com.mysql.cj.jdbc.exceptions.PacketTooBigException: Packet for query is too large (36,788,583 > 2,097,152). You can change this value on the server by setting the 'max_allowed_packet' variable.


Why is batch processing so slow?

However, looking closely at the results above, batch processing shows no advantage at all; it is no different from the for loop. Can that be right?

It definitely should not be. According to the official documentation, the batch executor reuses prepared statements and sends the updates in batches instead of creating a new prepared statement for every row, so in theory it should be faster.

Then I found the crucial problem: the database connection URL was missing rewriteBatchedStatements=true.

Without this parameter, the MySQL JDBC driver ignores the intent of executeBatch() by default: the batch of SQL statements we expect to be sent together is instead sent to the MySQL server one statement at a time, so the inserts are effectively single inserts and performance is poor. That explains why batch processing performed about the same as inserting in a loop.

If rewriteBatchedStatements is set to true, the database driver will execute the SQL in batches.
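The same effect can be seen with plain JDBC. Below is a minimal sketch (not the code used in the tests above) that assumes the test database and credentials configured earlier; the only detail that matters here is the rewriteBatchedStatements=true flag in the URL:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class RewriteBatchDemo {
    public static void main(String[] args) throws Exception {
        // With rewriteBatchedStatements=true the driver rewrites the queued
        // inserts into multi-row INSERT packets; without it, each addBatch()
        // entry is sent to the server as a separate statement.
        String url = "jdbc:mysql://127.0.0.1:3306/test"
                + "?useSSL=false&serverTimezone=UTC&rewriteBatchedStatements=true";
        try (Connection conn = DriverManager.getConnection(url, "root", "123456");
             PreparedStatement ps = conn.prepareStatement(
                     "insert into user(id, age) values (?, ?)")) {
            conn.setAutoCommit(false);
            for (int i = 1; i <= 10000; i++) {
                ps.setInt(1, i);
                ps.setInt(2, i % 100);
                ps.addBatch();
            }
            ps.executeBatch();   // sent as rewritten multi-row packets, not row by row
            conn.commit();
        }
    }
}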

The corrected database connection URL:

jdbc:mysql://127.0.0.1:3306/test?characterEncoding=utf8&useSSL=false&allowPublicKeyRetrieval=true&serverTimezone=UTC&useUnicode=true&nullCatalogMeansCurrent=true&autoReconnect=true&allowMultiQueries=true&rewriteBatchedStatements=true

After finding the problem, we re-tested the batch test and the final result was as follows:

| Insert method | 10 | 100 | 1000 | 1w | 10w | 100w | 1000w |
| --- | --- | --- | --- | --- | --- | --- | --- |
| For loop insert | 387 | 1150 | 7907 | 70026 | 635984 | too long… | too long… |
| Concatenated SQL insert | 308 | 320 | 392 | 838 | 3156 | 24948 (probably over the SQL length limit) | OutOfMemoryError: heap overflow |
| Batch processing (the key one) | 333 | 323 | 362 | 636 | 1638 | 8978 | OutOfMemoryError: heap overflow |
| Batch + batch commit | 359 | 313 | 394 | 630 | 2907 | 18631 | OutOfMemoryError: heap overflow |

From these results, batch processing is indeed much faster, and only runs out of memory when the data volume becomes too large. Batch with periodic commits is not faster than plain batch; it is similar but slightly slower because it commits too many times. The concatenated SQL approach actually takes about the same time as batch processing at these sizes. The worst option is the for loop insert, which is extremely time-consuming: at 100 rows it already needs about 1 second, so it is not an option.

At the beginning, when I found batch processing to be slow, I really doubted myself. Later I discovered it came down to a single parameter; the more I learn, the more I realize how much I don't know.

[Author profile]: Qin Huai, author of the public account [Qin Huai Grocery Store]. The road of technology is not won in a moment; the mountains are high and the rivers are long, and even if progress is slow, keep going without stopping.

Offer all questions in PDF

What did I write about 2020?

Open Source Programming Notes