By the firstDruidDataSourceattack

the

On an ordinary working day, I completed the business code of the requirements put forward by the product as usual, and suddenly received an alarm message from the monitoring platform. Some business thought is caused by a bug on the error, a newspaper found wrong log write Java. Lang. OutOfMemoryError: Java heap space.

Then I went to that server remotely, but the card didn’t work. So I used the top command to check the CPU information, and the usage was almost 99%. If you look at the GC log, you can see that the program is always in Full GC. No wonder the CPU usage is so high.

Memory leak causes GC to fail to reclaim memory, resulting in OOM. In order not to affect the service, I asked o&M to restart the service. As expected, the service was normal after the restart.

Analysis of the log

First take a look at the error log details write some error information, although the general OOM problem log can not accurately locate the problem, but has opened the log platform, take a look as a reference is not lost.

It can be seen that the OOM incident scene written in the log appears when calculating the total amount of multiple users, and the pseudo-code is as follows:

/** * OrderService.java */

// 1. Obtain a list of qualified user ids based on certain parameters
List<Long> customerIds = orderService.queryCustomerIdByParam(param); 

// 2. Calculate the sum of these user ids
long principal = orderMapper.countPrincipal(customerIds);

Copy the code
<! -- OrderMapper.xml -->

<! SQL > create OrderMapper SQL > create OrderMapper
<select id="countPrincipal" resultType="java.lang.Long">
    select
    IFNULL(sum(remain_principal),0)
    from
    t_loan where
    <if test="null ! = customerIds and customerIds.size > 0">
        customer_id in
        <foreach collection="customerIds" item="item" index="index" open="("
                 close=")" separator=",">
            #{item}
        </foreach>
    </if>
</select>
Copy the code

The reason this part feels wrong is because there are too many query parameters customerIds when calculating the total amount. Due to the service change a while ago, the customerIds list queried changed from tens of hundreds of ids to tens of thousands of ids without changing the parameters. According to the error message printed in the log here, the size of the list is more than 30,000 customerIds. However, even if the query conditions are more than thirty thousand and lead to SQL execution is slower, but this method only the internal financial system will be called, the volume of business is not so large, and should not lead to the emergence of OOM. So take a look at the Dump file printed by the JVM to locate the specific problem.

Dump file

Thanks to the JVM parameter china-canada – XX: + HeapDumpOnOutOfMemoryError parameters, in the event of a OOM when system will automatically generate the Dump file, so that we can complete the analysis of the “crime scene”. Here we use the Eclipse Memory Analyzer tool to help parse Dump files.

As you can see from the pie chart in the Overview, the largest area in blue is this class, which takes up 245.6MB of memory. And the note on the left says DruidDataSource, okay, he’s the culprit.

Through Domainator_Tree interface can see is com. Alibaba. Druid. Pool. DruidDataSource under the class of com. Alibaba. Druid. Stat. A Linke JdbcDataSourceStat $1 object inside DHashMap, which holds more than 600 entries, about 100 of which are more than 2,000,000 bytes (about 2MB) in size. Select IFNULL(sum remain_principal),0) from T_loan where customer_id in (? ,? ,? ,? . This is exactly what the error log suggests the code does.

Problem analysis

This SQL statement is extremely long because there are more than 30,000 query criteria to calculate the amount of these users, and then all the SQL is held by a HashMap object in JdbcDataSourceStat and cannot be GC, resulting in OOM. Well, it was a textbook OOM event.

To deal with

LinkedHashMap

sqlStatMap Map There is also a static variable and a static code block:
,>

private static JdbcDataSourceStat global;

static {
	String dbType = null;
	{
	String property = System.getProperty("druid.globalDbType");
	if(property ! =null && property.length() > 0) {
		dbType = property;
	}
	}
	global = new JdbcDataSourceStat("Global"."Global", dbType);
}
Copy the code

This means that unless you manually release the global object in your code or remove the sqlStatMap object, the sqlStatMap is held and cannot be released by GC.

SqlStatMap can be used to release sqlStatMap from Github Issues. SqlStatMap can be used to release sqlStatMap from Github Issues. Each SQL statement holds a reference for a long time, speeding up the FullGC frequency.

The sqlStatMap object is used for Druid’s monitoring statistics function, so hold the SQL to display on the page. Since I don’t use this function at ordinary times and I don’t know why THIS function is enabled after asking other colleagues, I decided to turn this function off first.

Write according to documentation this function is disabled by default, but we have enabled it in the configuration file, now we can remove this configuration

<bean id="dataSource" class="com.alibaba.druid.pool.DruidDataSource"
      init-method="init" destroy-method="close">.<! - monitoring - >
    <! -- <property name="filters" value="state"/> -->
</bean>
Copy the code

Jmap -dump:format=b,file= file name [pid] command is used to generate the dump file. The memory usage is restored. And will not see com. Alibaba. Druid. Pool. Under DruidDataSource class com. Alibaba. The druid. Stat. JdbcDataSourceStat $1. Prove that the OOM issue has been successfully resolved.

A repeatMybatisPlot against

the

It was another ordinary working day. The online alarm reported an error again. I looked at the log and it was OOM again. I thought the DruidDataSource problem was not solved last time, but this time the phenomenon is a little different. First of all, it’s only one alarm, not all the time. Then, check that the CPU and memory usage of the server are normal and services are not affected. Therefore, you do not need to restart the service this time.

Analysis of the

The error log of this alarm still points to the location where DruidDataSource caused OOM failure last time. I was so impressed with it that I directly looked at the Dump file (because the Dump file was too large, the online one was cleared, and I forgot to back it up, so THIS file was generated when I restored the scene).

This time there is no obvious one particularly large occupation object, it seems that the problem is really different from the last time. Take a look at the detailed analysis of the Domainator_Tree interface.

Although there is not a single object occupying memory, you can see that there are more than a dozen threads occupying nearly 20 MB of memory, adding up to more than 300 MB of memory. Let’s take a look at the memory footprint in these threads.

In view of the high memory footprint of this thread, there are several strings, which are concatenated SQL statements, which are essential.

There are two high memory footprint object is org. Apache. Ibatis. Scripting. Xmltags. DynamicContext $ContextMap and org. Apache. Ibatis. Builder. SqlSourceBuilder $Par AmeterMappingTokenHandler.

The contents of these two objects seem to be placeholder and parameter objects generated when Mybatis splices SQL. Take this query for example

List<Long> customerIds = orderService.queryCustomerIdByParam(param); 

long principal = orderMapper.countPrincipal(customerIds);
Copy the code

So although the parameter used for the query isLongEven thisList30,000 plus of them won’t take up a lot of memory on their own, butMybatisWhen you concatenate SQL, you put theLongObjects of type are wrapped into a generic object type (similar to the AbstractItem feel), and each generic object type is given an alias (e.g__frch_item_1.__frch_item_2And then stored in the Map for use when stitching SQL. In addition, since the splicing of SQL strings consumes resources, it still takes a certain amount of time when the PARAMETERS are multiple and the SQL is long, so the Map will be held for a long time. Once there are more threads doing this operation at the same time, the memory will be high, which is easy to happenOOM.

To viewMybatisSource code analysis

First see org. Apache. Ibatis. Scripting. Xmltags. DynamicContext $ContextMap, he is a variable DynamicContext, variable called bindings, is the inner class DynamicContext, Inherited from HashMap. And the DynamicContext class wraps the put() method of HashMap with the bind() method.

Use the Lead point function of IDEA to see which methods call the bind() method.

Can see there are three classes call the bind () method, here only focus on org. Apache. Ibatis. Scripting. Xmltags. ForEachSqlNode this class, Because we use foreach keyword in Mybatis XML to realize SQL in query function. So let’s take a look at what’s special about ForEachSqlNode that might result in oom.

ForEachSqlNode implements the SqlNode interface and implements the Apply () method, which is the core of a concatenated SQL statement. Here is the code for the Apply () method. I have added Chinese comments for some key steps.

  @Override
  public boolean apply(DynamicContext context) {
      Bindings (bindings) : bindings (bindings) : Bindings (bindings) : Bindings (bindings
    Map<String, Object> bindings = context.getBindings();
    finalIterable<? > iterable = evaluator.evaluateIterable(collectionExpression, bindings);if(! iterable.iterator().hasNext()) {return true;
    }
    boolean first = true;
    // Start string of SQL
    applyOpen(context);
    int i = 0;
    // Iterate over the parameters
    for (Object o : iterable) {
      DynamicContext oldContext = context;
      if (first || separator == null) {
        context = new PrefixedContext(context, "");
      } else {
        context = new PrefixedContext(context, separator);
      }
      int uniqueNumber = context.getUniqueNumber();
      // Issue #709
      if (o instanceof Map.Entry) {
        // If the Map object is used, the key value is used
        @SuppressWarnings("unchecked")
        Map.Entry<Object, Object> mapEntry = (Map.Entry<Object, Object>) o;
        applyIndex(context, mapEntry.getKey(), uniqueNumber);
        applyItem(context, mapEntry.getValue(), uniqueNumber);
      } else {
        // Use the quantity I as key
        applyIndex(context, i, uniqueNumber);
        applyItem(context, o, uniqueNumber);
      }
      // FilteredDynamicContext generates SQL dynamically
      contents.apply(new FilteredDynamicContext(configuration, context, index, item, uniqueNumber));
      if(first) { first = ! ((PrefixedContext) context).isPrefixApplied(); } context = oldContext; i++; }// End of SQL string
    applyClose(context);
    context.getBindings().remove(item);
    context.getBindings().remove(index);
    return true;
  }
Copy the code

The applyIndex() and applyItem() methods save their parameters, their placeholders, and their SQL prefixes to call the bind() method in bindings for each iteration.

  private void applyIndex(DynamicContext context, Object o, int i) {
    if(index ! =null) { context.bind(index, o); context.bind(itemizeItem(index, i), o); }}Copy the code
  private void applyItem(DynamicContext context, Object o, int i) {
    if(item ! =null) { context.bind(item, o); context.bind(itemizeItem(item, i), o); }}Copy the code

The placeholder is then handled with FilteredDynamicContext, an inner class of ForEachSqlNode that inherits DynamicContext and primarily overwrites the appendSql() method.

  private static class FilteredDynamicContext extends DynamicContext {...@Override
    public void appendSql(String sql) {
      GenericTokenParser parser = new GenericTokenParser("# {"."}", content -> {
        String newContent = content.replaceFirst("^\\s*" + item + "(? ! [^.,:\\s])", itemizeItem(item, index));
        if(itemIndex ! =null && newContent.equals(content)) {
          newContent = content.replaceFirst("^\\s*" + itemIndex + "(? ! [^.,:\\s])", itemizeItem(itemIndex, index));
        }
        return "# {" + newContent + "}";
      });

      delegate.appendSql(parser.parse(sql));
    }

Copy the code

AppendSql () uses the re to replace #{} placeholders, but it’s not really a binding parameter, it’s just replacing placeholders that just existed in bindings, such as __frch_item_1, __frch_item_2(seen in Dump).

Problem analysis

So the question is,MybatistheforeachThe performance of concatenated SQL is poor, especially when matching placeholders through operations such as re takes a lot of time. It also holds query parameters and placeholders inContextMapCan’t be freed by THE GC, so it’s easy to overconsume memory once the concurrency goes upOOM.

Scene: the repetition

This problem is easily replicated locally, so let’s create the database table first

CREATE TABLE user
(
    id int(11) PRIMARY KEY NOT NULL.name varchar(50));Copy the code

Create a SpringBoot+Mybatis project. The JVM configuration and online simulation, in setting the project IDEA VM Option parameters – Xmx512m – XX: + PrintGCDetails – XX: + HeapDumpOnOutOfMemoryError

Write out the foreach statement corresponding to the mock line

<select id="countUser" resultType="long">
    select
    IFNULL(sum(1),0)
    from user where
    <if test="ids ! = null and ids.size() > 0">
        id in
        <foreach collection="ids" item="item" index="index" open="("
                 close=")" separator=",">
            #{item}
        </foreach>
    </if>
</select>
Copy the code

Write unit tests

@Test
public void count(a) {
    AtomicInteger count = new AtomicInteger(0);
    for (int threadId = 0; threadId < 50; threadId++) {
        Call countUser() for 50 threads concurrently
        int finalThreadId = threadId;
        new Thread(() -> {
            long userCount = userMapper.countUser(createIds(10000 + finalThreadId));
            log.info("thread:{}, userCount:{}", finalThreadId, userCount);
            count.getAndAdd(1);
        }).start();
    }

    // Wait for 50 query threads to run
    while (count.get() < 50) {
    }
    log.info("end!!!!");
}

private List<Long> createIds(int size) {
    List<Long> ids = new ArrayList<>();
    for (int i = 0; i < size; i++) {
        ids.add((long) i);
    }
    return ids;
}
Copy the code

Then run the unit tests. Because of the -xx :+PrintGCDetails parameter added to the JVM configuration, the GC log is displayed on the console, and after a while you will see a lot of Full GC and an OOM error.

To deal with

The problem is caused by the poor performance of Mybatis through foreach concatenation of long SQL strings, so there are two solutions

  1. Reduced by splitting the query criteria for INforeachThe length of the CONCatenated SQL at a time
@Test
public void count2(a) {
    AtomicInteger count = new AtomicInteger(0);
    for (int threadId = 0; threadId < 50; threadId++) {
        Call countUser() for 50 threads concurrently
        int finalThreadId = threadId;
        new Thread(() -> {
            List<Long> totalIds = createIds(100000 + finalThreadId);
            long totalUserCount = 0;
            // Use Guava to split the list in groups of 1000
            List<List<Long>> parts = Lists.partition(totalIds, 1000);
            for (List<Long> ids : parts) {
                totalUserCount += userMapper.countUser(ids);
            }
            log.info("thread:{}, userCount:{}", finalThreadId, totalUserCount);
            count.getAndAdd(1);
        }).start();
    }

    // Wait for 50 query threads to run
    while (count.get() < 50) {
    }
    log.info("end!!!!");
}
Copy the code

In this way, you only need to loop 1000 times each time to concatenate the SQL query, which can quickly release the resources and will not cause OOM. However, this method still generates a lot of unnecessary data, occupies the memory, triggers GC frequently, and wastes resources.

  1. Do not useMybatistheforeachTo concatenate SQL for in conditions

Since the foreach performance of Mybatis is not good, we can splice in conditions by ourselves at the Java level, especially for such query conditions which are relatively single, so it is more suitable to splice by ourselves.

@Test
public void count3(a) {
    AtomicInteger count = new AtomicInteger(0);
    for (int threadId = 0; threadId < 50; threadId++) {
        Call countUser() for 50 threads concurrently
        int finalThreadId = threadId;
        new Thread(() -> {
            List<Long> ids = createIds(100000 + finalThreadId);
            StringBuilder sb = new StringBuilder();
            for (long i : ids) {
                sb.append(i).append(",");
            }
			// The query criteria are String characters
            long userCount = userMapper.countUserString(sb.toString());
            log.info("thread:{}, userCount:{}", finalThreadId, userCount);
            count.getAndAdd(1);
        }).start();
    }

    // Wait for 50 query threads to run
    while (count.get() < 50) {
    }
    log.info("end!!!!");
}
Copy the code
<select id="countUserString" resultType="long">
    select
    IFNULL(sum(1),0)
    from user where
    <if test="null ! = ids and ids ! = "">
        id in (#{ids})
    </if>
</select>
Copy the code

This will greatly reduce the number of objects generated using foreach, reduce the time spent stitching SQL, and avoid OOM while optimizing performance.

Afterword.

These two times encountered OOM problems are relatively easy to solve, in addition to proper follow-up analysis, but also inseparable from the prior environment configuration. Increase in the service the JVM parameters – XX: + HeapDumpOnOutOfMemoryError and – XX: + PrintGCDetails parameters, can output in OOM dump file, and can have a GC logs to see GC, These are very helpful for OOM problems.


Suffered DruidDataSource and Mybatis sabotage, resulting in OOM