This article walks through MyBatis batch processing step by step, from a beginner's point of view, to slowly uncover its mystery. If you are writing MyBatis batch code for the first time, it should be a great help — consider bookmarking it ~

There are many ways to do batch processing, and I will not weigh the pros and cons of every one here; this article summarizes how to use ExecutorType.BATCH. If you spot any mistakes, corrections are welcome.

How the problem came up

While working on a project at my company, I had an automatic reconciliation requirement: read tens of thousands of rows from a file and insert them into the database, a number that may grow to hundreds of thousands as the business grows. The insert therefore has to be batched. Let's see how I got there, step by step.

A quick look at the secret behind batch processing: BatchExecutor

Batch processing is another optimization technique in JDBC programming. When JDBC executes a SQL statement, the statement and its arguments are sent to the database as a network request. Sending one statement per request makes each request's payload small and the share of time spent on network communication large. With batch processing, multiple SQL statements are cached on the JDBC client and then shipped to the database as one package when the cache is flushed or full, reducing network overhead and improving system performance.

There is a limit to how many SQL statements can be sent to the database at a time. If that limit is exceeded, the database throws an exception and refuses to execute the statements. We therefore need to control both the number and the frequency of the SQL statements sent in each batch.

Quoted from "In-depth Analysis of the Core Principles of MyBatis" by Yang Sizheng, section 18
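
The mechanism the quote describes can be sketched outside MyBatis with plain JDBC: statements are queued client-side with addBatch and shipped in chunks with executeBatch. This is a minimal sketch, not production code — the user(name) table is a made-up example, and the round-trip counter is pure arithmetic so it can be checked without a database.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.List;

class JdbcBatchSketch {

    private static final int BATCH = 1000;

    // Queue inserts on the JDBC client and flush them to the database
    // every BATCH rows. Assumes a hypothetical table: user(name VARCHAR).
    static void insertAll(Connection con, List<String> names) throws Exception {
        try (PreparedStatement ps = con.prepareStatement("insert into user(name) values (?)")) {
            int i = 0;
            for (String name : names) {
                ps.setString(1, name);
                ps.addBatch();             // cached client-side, not sent yet
                if (++i % BATCH == 0) {
                    ps.executeBatch();     // one round trip for up to BATCH statements
                }
            }
            ps.executeBatch();             // flush the final partial chunk
        }
    }

    // Pure arithmetic: number of executeBatch calls insertAll makes
    // (one per full chunk, plus the unconditional trailing flush).
    static int executeBatchCalls(int rows, int batchSize) {
        return rows / batchSize + 1;
    }
}
```

For 2,500 rows with a batch size of 1,000, the loop makes three executeBatch calls instead of 2,500 single-statement round trips — that is the entire win.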

Version 1 - The newborn

Without further ado: batch code already existed in the early project code, and the pseudocode looked something like this:


@Resource
private SomeMapper mapper; // a Mapper instance (the name is illustrative)

@Resource
private SqlSessionFactory sqlSessionFactory;

private static final int BATCH = 1000;

private void doUpdateBatch(Date accountDate, List<SomeEntity> data) {
    SqlSession batchSqlSession = null;
    try {
        if (data == null || data.isEmpty()) {
            return;
        }
        batchSqlSession = sqlSessionFactory.openSession(ExecutorType.BATCH, false);
        for (int index = 0; index < data.size(); index++) {
            mapper.update(accountDate, data.get(index).getOrderNo());
            if (index != 0 && index % BATCH == 0) {
                batchSqlSession.commit();
                batchSqlSession.clearCache();
            }
        }
        batchSqlSession.commit();
    } catch (Exception e) {
        if (batchSqlSession != null) {
            batchSqlSession.rollback();
        }
        log.error(e.getMessage(), e);
    } finally {
        if (batchSqlSession != null) {
            batchSqlSession.close();
        }
    }
}

Let's first look at some of the problems with this approach.

Do you really know what commit, clearCache, and flushStatements do?

Let's look at the explanation in the official documentation and then compare it with the code above: whenever the loop decides the batch has reached 1000 entries, it manually calls commit and then manually calls clearCache. First, let's see what commit actually does; the following is the call chain:

  // DefaultSqlSession
  @Override
  public void commit() {
    commit(false);
  }

  @Override
  public void commit(boolean force) {
    try {
      executor.commit(isCommitOrRollbackRequired(force));
      dirty = false;
    } catch (Exception e) {
      throw ExceptionFactory.wrapException("Error committing transaction. Cause: " + e, e);
    } finally {
      ErrorContext.instance().reset();
    }
  }

  private boolean isCommitOrRollbackRequired(boolean force) {
    // autoCommit defaults to false, and dirty is true after calls to insert, update, or delete
    return (!autoCommit && dirty) || force;
  }

  // BaseExecutor
  @Override
  public void commit(boolean required) throws SQLException {
    if (closed) {
      throw new ExecutorException("Cannot commit, transaction is already closed");
    }
    clearLocalCache();
    flushStatements();
    if (required) {
      transaction.commit();
    }
  }

Notice that commit already calls clearLocalCache internally, so there is no need to call clearCache again after commit. And do you even know what clearCache does? Mark that call! As for flushStatements, its role is explained in detail in the documentation: for a batch session it is what actually sends the accumulated statements to the database and executes them. Looking at this counter-example, you may well wonder what is going on — which is exactly why it pays to understand how MyBatis's ExecutorType.BATCH works.
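
Whether the underlying transaction actually commits boils down to one predicate. Here is a tiny mirror of isCommitOrRollbackRequired from the call chain above — the real logic lives in MyBatis's DefaultSqlSession; this copy exists only so its truth table is easy to verify in isolation.

```java
// A mirror of DefaultSqlSession.isCommitOrRollbackRequired, extracted
// here only so its behavior can be checked without a real session.
class CommitRule {

    // The transaction really commits only when the session is dirty
    // (a write ran with autoCommit off) or the commit was forced.
    static boolean commitRequired(boolean autoCommit, boolean dirty, boolean force) {
        return (!autoCommit && dirty) || force;
    }
}
```

In particular, with autoCommit off and pending writes, every commit call in the version-1 loop triggers a real transaction commit — one per 1000 rows — which is part of why that code behaves so differently from what its author expected.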

Not reusable

Batch processing will surely be needed in more than one place in the project, and copy-pasting this block every time would be far too clumsy. Could we write it once and for all? That is exactly what Java 8 functional interfaces are for.

Version 2 - Taking shape

With those two problems solved, we arrive at version 2 of the code. You think that's it? All done? Don't relax yet — keep reading!

import lombok.extern.slf4j.Slf4j;
import org.apache.ibatis.session.ExecutorType;
import org.apache.ibatis.session.SqlSession;
import org.apache.ibatis.session.SqlSessionFactory;
import org.springframework.stereotype.Component;

import javax.annotation.Resource;
import java.util.List;
import java.util.function.ToIntFunction;

@Slf4j
@Component
public class MybatisBatchUtils {

    /**
     * Process 1000 entries at a time.
     */
    private static final int BATCH = 1000;

    @Resource
    private SqlSessionFactory sqlSessionFactory;

    /**
     * Batch update or insert.
     *
     * @param data     the data to process
     * @param function custom processing logic
     * @return the total number of affected rows
     */
    public  <T> int batchUpdateOrInsert(List<T> data, ToIntFunction<T> function) {
        int count = 0;
        SqlSession batchSqlSession = sqlSessionFactory.openSession(ExecutorType.BATCH);
        try {
            for (int index = 0; index < data.size(); index++) {
                count += function.applyAsInt(data.get(index));
                if (index != 0 && index % BATCH == 0) {
                    batchSqlSession.flushStatements();
                }
            }
            batchSqlSession.commit();
        } catch (Exception e) {
            batchSqlSession.rollback();
            log.error(e.getMessage(), e);
        } finally {
            batchSqlSession.close();
        }
        return count;
    }
}

Usage example (pseudocode):

@Resource
private SomeMapper mapper; // a Mapper instance (the name is illustrative)

// ...
mybatisBatchUtils.batchUpdateOrInsert(dataList, item -> mapper.insert(item));

I left work that day in a great mood — until a day or two later, when my mentor asked whether I had considered performance for this business: the volume might later exceed a hundred thousand rows a day, so how long did the daily job currently take? Only then did I discover that inserting twenty or thirty thousand rows took 7 minutes (batching was not the only culprit — the Oracle insert statement was also to blame, as described below). I had to laugh at myself: what a bug maker. I started asking why it was so slow; the batching must not have taken effect. But why not?

Version 3 - The standard way

Recall the BatchExecutor mentioned earlier. Every SqlSession holds an Executor object, and the Executor is what actually executes the SQL statements behind the scenes. The default is ExecutorType.SIMPLE, so a Mapper object obtained through ordinary auto-injection does not have batching enabled:

  public Executor newExecutor(Transaction transaction, ExecutorType executorType) {
    executorType = executorType == null ? defaultExecutorType : executorType;
    executorType = executorType == null ? ExecutorType.SIMPLE : executorType;
    Executor executor;
    if (ExecutorType.BATCH == executorType) {
      executor = new BatchExecutor(this, transaction);
    } else if (ExecutorType.REUSE == executorType) {
      executor = new ReuseExecutor(this, transaction);
    } else {
      executor = new SimpleExecutor(this, transaction);
    }
    if (cacheEnabled) {
      executor = new CachingExecutor(executor);
    }
    executor = (Executor) interceptorChain.pluginAll(executor);
    return executor;
  }

In other words, we need to obtain a SqlSession through sqlSessionFactory.openSession(ExecutorType.BATCH) — its internal Executor is then a BatchExecutor — and get a new Mapper object from that session for batching to take effect!

So let's change the generic method to also take the Mapper class as a parameter:

import lombok.extern.slf4j.Slf4j;
import org.apache.ibatis.session.ExecutorType;
import org.apache.ibatis.session.SqlSession;
import org.apache.ibatis.session.SqlSessionFactory;
import org.springframework.stereotype.Component;
import org.springframework.transaction.support.TransactionSynchronizationManager;

import javax.annotation.Resource;
import java.util.List;
import java.util.function.BiFunction;

@Slf4j
@Component
public class MybatisBatchUtils {
    
    /**
     * Process 1000 entries at a time.
     */
    private static final int BATCH_SIZE = 1000;
    
    @Resource
    private SqlSessionFactory sqlSessionFactory;
    
    /**
     * Batch update or insert.
     *
     * @param data        the data to process
     * @param mapperClass the MyBatis Mapper class
     * @param function    custom processing logic
     * @return the total number of affected rows
     */
    public  <T,U,R> int batchUpdateOrInsert(List<T> data, Class<U> mapperClass, BiFunction<T,U,R> function) {
        int i = 1;
        SqlSession batchSqlSession = sqlSessionFactory.openSession(ExecutorType.BATCH);
        try {
            U mapper = batchSqlSession.getMapper(mapperClass);
            int size = data.size();
            for (T element : data) {
                function.apply(element,mapper);
                if ((i % BATCH_SIZE == 0) || i == size) {
                    batchSqlSession.flushStatements();
                }
                i++;
            }
            // Force a commit in a non-transactional environment.
            // Inside a Spring-managed transaction the forced commit has no effect.
            batchSqlSession.commit(!TransactionSynchronizationManager.isSynchronizationActive());
        } catch (Exception e) {
            batchSqlSession.rollback();
            log.error(e.getMessage(), e);
        } finally {
            batchSqlSession.close();
        }
        return i - 1;
    }
}

In a transactional environment, the forced commit value has no effect: the Spring-managed transaction detects that the connection belongs to an ongoing transaction and skips the actual commit.

Use cases:

mybatisBatchUtils.batchUpdateOrInsert(dataList, XxxMapper.class, (item, mapper) -> mapper.insert(item));
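
The flush condition in the utility, (i % BATCH_SIZE == 0) || i == size, is worth checking on its own: it flushes after every full chunk of 1000 rows, and once more for the final partial chunk, so the tail is never lost. A small sketch of just that logic, lifted out so it can be tested without a database:

```java
// Mirrors the flush schedule of batchUpdateOrInsert above
// (1-based index i over a list of the given size).
class FlushSchedule {

    private static final int BATCH_SIZE = 1000;

    // True when the utility calls flushStatements: after every full
    // chunk of BATCH_SIZE rows, and again on the very last row.
    static boolean shouldFlush(int i, int size) {
        return (i % BATCH_SIZE == 0) || i == size;
    }

    // Total flushStatements() calls for a list of the given size.
    static int flushCount(int size) {
        int count = 0;
        for (int i = 1; i <= size; i++) {
            if (shouldFlush(i, size)) {
                count++;
            }
        }
        return count;
    }
}
```

For 2,500 rows this flushes at rows 1000, 2000, and 2500 — three round trips; for exactly 1000 rows the two conditions coincide, so it flushes exactly once rather than twice.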

Bonus: Oracle batch insert optimization

In the template code generated by MyBatis Generator, the id for each insert is first fetched from a sequence:

<selectKey keyProperty="id" order="BEFORE" resultType="java.lang.Long">
  select XXX.nextval from dual
</selectKey>

This means inserting 10,000 rows issues an estimated 20,000 statements — one sequence query plus one insert per row — which took more than 10 seconds. Moving the sequence into the insert statement itself cut this to just over 500 milliseconds, i.e. about half a second:

<insert id="insert" parameterType="user">
        insert into table_name(id, username, password)
        values(SEQ_USER.NEXTVAL,#{username},#{password})
</insert>
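
The speedup is easy to account for with back-of-the-envelope arithmetic. A trivial sketch of the statement counts for the two styles (the function name is mine, not from any library):

```java
// Statement counts for the two Oracle insert styles discussed above.
class StatementCount {

    // selectKey style: one "select seq.nextval from dual" plus one insert per row.
    // Inline style: the sequence is read inside the insert's values clause.
    static int statements(int rows, boolean seqInlineInInsert) {
        return seqInlineInInsert ? rows : rows * 2;
    }
}
```

Halving the statement count matters doubly here, because each selectKey query is a separate round trip that cannot be batched, while the inline-sequence inserts batch cleanly.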

And so, after this whole round of work — batch processing plus statement optimization — the job went from 7 minutes down to a little over 10 seconds. Problem solved; throw flowers to celebrate ~