Recently, the project I'm working on has accumulated a lot of insertBatch code. It is commonly said that batch insertion is much more efficient than inserting in a loop, so in this article we run an experiment: is that actually true?

Test environment:

  • SpringBoot 2.5
  • MySQL 8
  • JDK 8
  • Docker

To insert multiple rows, the candidate approaches are:

  • For loop insert (one row at a time)
  • Concatenated SQL, executed as one statement
  • Insert using the batch feature

Build the test environment

SQL file:

drop database IF EXISTS test;
CREATE DATABASE test;
use test;
DROP TABLE IF EXISTS `user`;
CREATE TABLE `user` (
  `id` int(11) NOT NULL,
  `name` varchar(255) DEFAULT '',
  `age` int(11) DEFAULT 0,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Application configuration file:

server:
  port: 8081
spring:
  # Database connection configuration
  datasource:
    driver-class-name: com.mysql.cj.jdbc.Driver
    url: jdbc:mysql://127.0.0.1:3306/test?characterEncoding=utf8&useSSL=false&allowPublicKeyRetrieval=true&serverTimezone=UTC&useUnicode=true&nullCatalogMeansCurrent=true&autoReconnect=true&allowMultiQueries=true
    username: root
    password: 123456
# Mybatis configuration
mybatis:
  # Mapper file location
  mapper-locations: classpath:mapper/*.xml
  type-aliases-package: com.aphysia.springdemo.model
  # Enable camel-case mapping
  configuration:
    map-underscore-to-camel-case: true
logging:
  level:
    root: error


The startup class, which configures the Mapper scan path:

import org.mybatis.spring.annotation.MapperScan;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.EnableAutoConfiguration;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
@MapperScan("com.aphysia.springdemo.mapper")
public class SpringdemoApplication {

    public static void main(String[] args) {
        SpringApplication.run(SpringdemoApplication.class, args);
    }
}

The Mapper interface contains methods for inserting a single object, deleting all records, and inserting multiple objects with one concatenated statement:

import com.aphysia.springdemo.model.User;
import org.apache.ibatis.annotations.Param;

import java.util.List;

public interface UserMapper {

    int insertUser(User user);

    int deleteAllUsers();

    int insertBatch(@Param("users") List<User> users);
}
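The User model class isn't shown in the original post; here is a minimal sketch consistent with the `user` table and the package referenced in mapper.xml (the field names are taken from the schema, everything else is an assumption):

package com.aphysia.springdemo.model;

// Hypothetical sketch of the User model (not shown in the original post).
// Fields mirror the `user` table: id, name, age.
public class User {

    private Integer id;
    private String name;
    private Integer age;

    public Integer getId() { return id; }

    public void setId(Integer id) { this.id = id; }

    public String getName() { return name; }

    public void setName(String name) { this.name = name; }

    public Integer getAge() { return age; }

    public void setAge(Integer age) { this.age = age; }
}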

The mapper.xml file is as follows:


      
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE mapper PUBLIC "-//mybatis.org//DTD Mapper 3.0//EN" "http://mybatis.org/dtd/mybatis-3-mapper.dtd">
<mapper namespace="com.aphysia.springdemo.mapper.UserMapper">
    <insert id="insertUser" parameterType="com.aphysia.springdemo.model.User">
        insert  into user(id,age) values(#{id},#{age})
    </insert>

    <delete id="deleteAllUsers">
        delete from user where id>0;
    </delete>

    <insert id="insertBatch" parameterType="java.util.List">
        insert into user(id,age) VALUES
        <foreach collection="users" item="model" index="index" separator=",">
            (#{model.id}, #{model.age})
        </foreach>
    </insert>
</mapper>

During the tests, we delete all the data before each run so that the measurements are objective and not affected by data left over from the previous run.
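The cleanup step itself isn't shown in the original post. Assuming JUnit 5, a minimal sketch of how it could be wired into the test class shown below (it only needs the deleteAllUsers mapper method and an org.junit.jupiter.api.BeforeEach import):

    // Hypothetical sketch: wipe the table before each timing run so data from
    // an earlier run does not skew the next measurement.
    @BeforeEach
    void cleanUp() {
        userMapper.deleteAllUsers();
    }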

Different tests

1. For loop insert

Build the list first, then perform one database operation per item, inserting rows one at a time:

@SpringBootTest
@MapperScan("com.aphysia.springdemo.mapper")
class SpringdemoApplicationTests {

    @Autowired
    SqlSessionFactory sqlSessionFactory;

    @Resource
    UserMapper userMapper;

    static int num = 100000;

    static int id = 1;

    @Test
    void insertForEachTest() {
        List<User> users = getRandomUsers();
        long start = System.currentTimeMillis();
        for (int i = 0; i < users.size(); i++) {
            userMapper.insertUser(users.get(i));
        }
        long end = System.currentTimeMillis();
        System.out.println("time:" + (end - start));
    }
}
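The getRandomUsers() helper used by these tests isn't shown in the original post. A minimal sketch, as an extra method inside the same test class (random ages are an assumption; it needs java.util.ArrayList and java.util.concurrent.ThreadLocalRandom imports):

    // Hypothetical sketch of the helper referenced by the tests: builds `num`
    // users with sequential ids and random ages.
    private List<User> getRandomUsers() {
        List<User> users = new ArrayList<>(num);
        for (int i = 0; i < num; i++) {
            User user = new User();
            user.setId(id++);
            user.setAge(ThreadLocalRandom.current().nextInt(100));
            users.add(user);
        }
        return users;
    }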

2. Concatenate SQL inserts

Insert data as follows:

INSERT INTO `user` (`id`, `age`) 
VALUES (1, 11),
(2, 12),
(3, 13),
(4, 14),
(5, 15);
    @Test
    void insertSplicingTest() {
        List<User> users = getRandomUsers();
        long start = System.currentTimeMillis();
        userMapper.insertBatch(users);
        long end = System.currentTimeMillis();
        System.out.println("time:" + (end - start));
    }

3. Use Batch to insert data in batches

MyBatis sessions support a batch executor: open a session from sqlSessionFactory with ExecutorType.BATCH and auto-commit set to false, queue all the inserts, and commit once at the end:

    @Test
    public void insertBatch() {
        SqlSession sqlSession = sqlSessionFactory.openSession(ExecutorType.BATCH, false);
        UserMapper mapper = sqlSession.getMapper(UserMapper.class);
        List<User> users = getRandomUsers();
        long start = System.currentTimeMillis();
        for (int i = 0; i < users.size(); i++) {
            mapper.insertUser(users.get(i));
        }
        sqlSession.commit();
        sqlSession.close();
        long end = System.currentTimeMillis();
        System.out.println("time:" + (end - start));
    }

4. Batch processing + batch commit

Building on the batch executor, commit every 1,000 records instead of committing only once at the end:

    @Test
    public void insertBatchForEachTest() {
        SqlSession sqlSession = sqlSessionFactory.openSession(ExecutorType.BATCH, false);
        UserMapper mapper = sqlSession.getMapper(UserMapper.class);
        List<User> users = getRandomUsers();
        long start = System.currentTimeMillis();
        for (int i = 0; i < users.size(); i++) {
            mapper.insertUser(users.get(i));
            if (i % 1000 == 0 || i == num - 1) {
                sqlSession.commit();
                sqlSession.clearCache();
            }
        }
        sqlSession.close();
        long end = System.currentTimeMillis();
        System.out.println("time:" + (end - start));
    }



The first result was obviously wrong, right?

Running the code above gives the results below. For loop inserts are indeed inefficient, and concatenated SQL is relatively fast. I had read that concatenated SQL might be capped by a MySQL packet size limit, but I saw no error until a heap overflow at 1000w rows.

Note: the results below are incorrect!

(Times in milliseconds; 1w = 10,000 rows.)

| Insert method | 10 | 100 | 1000 | 1w | 10w | 100w | 1000w |
| --- | --- | --- | --- | --- | --- | --- | --- |
| For loop insert | 387 | 1150 | 7907 | 70026 | 635984 | too long… | too long… |
| Concatenated SQL insert | 308 | 320 | 392 | 838 | 3156 | 24948 | OutOfMemoryError: heap overflow |
| Batch | 392 | 917 | 5442 | 51647 | 470666 | too long… | too long… |
| Batch + batch commit | 359 | 893 | 5275 | 50270 | 472462 | too long… | too long… |

Why didn't the concatenated SQL exceed MySQL's limit?

Let's look at the relevant MySQL limit, max_allowed_packet:

mysql> show VARIABLES like '%max_allowed_packet%';
+---------------------------+------------+
| Variable_name             | Value      |
+---------------------------+------------+
| max_allowed_packet        | 67108864   |
| mysqlx_max_allowed_packet | 67108864   |
| slave_max_allowed_packet  | 1073741824 |
+---------------------------+------------+
3 rows in set (0.12 sec)

67108864 bytes is 64 MB, which is quite large, so no wonder there was no error. Let's lower it and test again:

  1. First, enter the running mysql container (in Docker Desktop you can also just click the Cli icon to open a shell):
docker exec -it mysql bash
  2. Go to the /etc/mysql directory to modify the my.cnf file:
cd /etc/mysql
  3. Install vim first; otherwise the file cannot be edited:
apt-get update
apt-get install vim
  4. Edit my.cnf:
vim my.cnf
  5. Add max_allowed_packet=2M on the last line (press i to edit; when done press esc and type :wq to save and exit):
 
[mysqld]
pid-file        = /var/run/mysqld/mysqld.pid
socket          = /var/run/mysqld/mysqld.sock
datadir         = /var/lib/mysql
secure-file-priv= NULL
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
 
# Custom config should go here
!includedir /etc/mysql/conf.d/
max_allowed_packet=2M
  6. Exit the container:
# exit
  7. Find the mysql container id:
docker ps -a

  8. Restart mysql:
docker restart c178e8998e68

Check max_allowed_packet again; it has been changed to 2M:

mysql> show VARIABLES like '%max_allowed_packet%';
+---------------------------+------------+
| Variable_name             | Value      |
+---------------------------+------------+
| max_allowed_packet        | 2097152    |
| mysqlx_max_allowed_packet | 67108864   |
| slave_max_allowed_packet  | 1073741824 |
+---------------------------+------------+

Running the concatenated SQL again, at 100w rows the statement is around 36 MB, far exceeding the 2 MB limit, and the insert fails:

org.springframework.dao.TransientDataAccessResourceException: 
### Cause: com.mysql.cj.jdbc.exceptions.PacketTooBigException: Packet for query is too large (36,788,583 > 2,097,152). You can change this value on the server by setting the 'max_allowed_packet' variable.
; Packet for query is too large (36,788,583 > 2,097,152). You can change this value on the server by setting the 'max_allowed_packet' variable.; nested exception is com.mysql.cj.jdbc.exceptions.PacketTooBigException: Packet for query is too large (36,788,583 > 2,097,152). You can change this value on the server by setting the 'max_allowed_packet' variable.


Why is batch processing so slow?

However, looking closely at the results above, batch processing shows no advantage at all; it is no different from the for loop. Can that be right?

It definitely should not be. According to the official documentation, the batch executor reuses prepared statements and sends the updates in batches instead of creating a new prepared statement for every row, so in theory it should be faster.

Then I found the crucial problem: the database connection URL was missing rewriteBatchedStatements=true.

Without this parameter, the MySQL JDBC driver ignores the intent of executeBatch() by default: the batch of SQL statements we expect to be sent together is instead sent to the MySQL server one statement at a time, so the inserts are effectively single inserts and performance is poor. That explains why batch processing performed about the same as inserting in a loop.

If rewriteBatchedStatements is set to true, the database driver will execute the SQL in batches.
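The same effect can be seen with plain JDBC. Below is a minimal sketch (not the code used in the tests above) that assumes the test database and credentials configured earlier; the only detail that matters here is the rewriteBatchedStatements=true flag in the URL:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class RewriteBatchDemo {
    public static void main(String[] args) throws Exception {
        // With rewriteBatchedStatements=true the driver rewrites the queued
        // inserts into multi-row INSERT packets; without it, each addBatch()
        // entry is sent to the server as a separate statement.
        String url = "jdbc:mysql://127.0.0.1:3306/test"
                + "?useSSL=false&serverTimezone=UTC&rewriteBatchedStatements=true";
        try (Connection conn = DriverManager.getConnection(url, "root", "123456");
             PreparedStatement ps = conn.prepareStatement(
                     "insert into user(id, age) values (?, ?)")) {
            conn.setAutoCommit(false);
            for (int i = 1; i <= 10000; i++) {
                ps.setInt(1, i);
                ps.setInt(2, i % 100);
                ps.addBatch();
            }
            ps.executeBatch();   // sent as rewritten multi-row packets, not row by row
            conn.commit();
        }
    }
}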

The corrected database connection URL:

jdbc:mysql://127.0.0.1:3306/test?characterEncoding=utf8&useSSL=false&allowPublicKeyRetrieval=true&serverTimezone=UTC&useUnicode=true&nullCatalogMeansCurrent=true&autoReconnect=true&allowMultiQueries=true&rewriteBatchedStatements=true

After finding the problem, we re-tested the batch test and the final result was as follows:

| Insert method | 10 | 100 | 1000 | 1w | 10w | 100w | 1000w |
| --- | --- | --- | --- | --- | --- | --- | --- |
| For loop insert | 387 | 1150 | 7907 | 70026 | 635984 | too long… | too long… |
| Concatenated SQL insert | 308 | 320 | 392 | 838 | 3156 | 24948 (probably over the SQL length limit) | OutOfMemoryError: heap overflow |
| Batch processing (the key one) | 333 | 323 | 362 | 636 | 1638 | 8978 | OutOfMemoryError: heap overflow |
| Batch + batch commit | 359 | 313 | 394 | 630 | 2907 | 18631 | OutOfMemoryError: heap overflow |

From these results, batch processing is indeed much faster, and only runs out of memory when the data volume becomes too large. Batch with periodic commits is not faster than plain batch; it is similar but slightly slower because it commits too many times. The concatenated SQL approach actually takes about the same time as batch processing at these sizes. The worst option is the for loop insert, which is extremely time-consuming: at 100 rows it already needs about 1 second, so it is not an option.

At the beginning, when I found batch processing to be slow, I really doubted myself. Later I discovered it came down to a single parameter; the more I learn, the more I realize how much I don't know.

[Author profile]: Qin Huai, author of the public account [Qin Huai Grocery Store]. The road of technology is not won in a moment; the mountains are high and the rivers are long, and even if progress is slow, keep going without stopping.

Offer all questions in PDF

What did I write about 2020?

Open Source Programming Notes