Background

The log parsing service parses the logs reported by servers and delivers them to the corresponding log store. Logs can be delivered in batches, but they can only be parsed one by one, so parsing performance is critical.

Original log:

[INFO][2021-10-21 17:47:15.074][xxl-job][a57260920d9140cd91ae4f3e5bab0922.293.16348096350130313][][Thread-29][xxx.server.service.PrometheusService][10.2.6.166][172.17.0.239][xxx-server-859bb57fd-lqgjh] prometheus spent: 54 ms, request:{query=[sum(increase(logback_events_total{level=~"error|warn" }[1d] @1634809560 )) by (k8s_app,level) > 0], step=[60]}

This needs to be converted to:

{" container_ip ":" 172.17.0.239 ", "date" : "the 2021-10-21 17:47:15. 074", "level" : "INFO", "logger" : "xxx.server.service.PrometheusService", "message": "prometheus spent: 54 ms, request:{query=[sum(increase(logback_events_total{level=~"error|warn" }[1d] @1634809560 )) by (k8s_app,level) > 0], Step = [60]} ", "node_ip 10.2.6.166" : ""," pod_name - server - 859 ":" XXX bb57fd - LQGJH rawLog ":"," "[INFO][2021-10-21 17:47:15. 074] [XXL - job] [a57260920d9140cd91ae4f3e5bab0922 293.16348096350130313] [] [Thread - 29] [cn. Jo-jo. Infra. Bs. Monitor. API . Server. Service. PrometheusService] [10.2.6.166] [172.17.0.239] [bs - monitor - API - server - 859 bb57fd - LQGJH] Prometheus spent: 54 ms, request:{query=[sum(increase(logback_events_total{level=~"error|warn" }[1d] @1634809560 )) by (k8s_app,level) > 0], step=[60]}", "threadName": "Thread-29", "traceId": "A57260920d9140cd91ae4f3e5bab0922. 293.16348096350130313" and "traceType" : "XXL - job"}Copy the code

Old function

The old function is implemented as follows:

    public void parseLog() {
        String rawLog = new String(contentByte);
        Map<String, Object> logMap = generateLogItem(rawLog);
        if (isReportRawLog) {
            logMap.put(LOG_FIELD_RAW_LOG, rawLog);
        }
        String date = String.valueOf(logMap.get(LOG_FIELD_DATE));

        if (date != null) {
            try {
                Date logDate = DateUtils.parseDate(date, DATE_PATTERN);
                time = logDate.getTime();
            } catch (Exception ignore) {
                time = System.currentTimeMillis();
            }
        } else {
            time = System.currentTimeMillis();
        }

        if (logMap.containsKey(LOG_FIELD_TRACE_EXT)) {
            logMap.put(LOG_FIELD_TRACE_EXT, JSON.parseObject(logMap.get(LOG_FIELD_TRACE_EXT).toString()));
        }
        logContent = JSON.toJSONString(logMap);
        contentByte = null;
    }


    private Map<String, Object> generateLogItem(String message) {
        Map<String, Object> logItem = new HashMap<>(20);
        try {
            int start = 0;
            int end = 0;
            int matchCount = 0;
            int maxCount = LOG_FIELD_PATTERNS.length;
            for (int i = 0; i < message.length() && matchCount < maxCount; i++) {
                if ('[' == message.charAt(i)) {
                    start = i;
                }
                if (']' == message.charAt(i)) {
                    end = i;
                    // subString
                    start = start + 1;
                    if (end > start) {
                        logItem.put(LOG_FIELD_PATTERNS[matchCount], message.substring(start, end));
                    }
                    start = end + 1;
                    matchCount++;
                }
            }
            // trim content
            int trimStart = end + 1;
            int trimEnd = message.length();
            for (int i = 0; i < message.length(); i++) {
                while ((trimStart < trimEnd) && (message.charAt(trimStart) <= ' ')) {
                    trimStart++;
                }
                while ((trimStart < trimEnd) && (message.charAt(trimEnd - 1) <= ' ')) {
                    trimEnd--;
                }
            }
            logItem.put(LOG_FIELD_MESSAGE, message.substring(trimStart, trimEnd));
        } catch (Exception e) {
            logItem = new HashMap<>();
            logItem.put(LOG_FIELD_MESSAGE, message);
            log.warn("Huawei Cloud Log Parsing error:{}", e.getMessage(), e);
        }
        return logItem;
    }

Running the method both with and without reporting the raw log gives the following results (throughput measured with JMH):

HuaWeiLogItemTest.parseLog           thrpt  424734.273 ops/s
HuaWeiLogItemTest.parseLogReportRaw  thrpt  350891.566 ops/s

How it works

The function scans the log text for brackets [], records their indices, puts the substring between each bracket pair into a HashMap, and finally serializes the map with JSONObject.toJSONString().

Possible performance overhead points include the following:

  1. A new HuaweiLogItem object has to be created for every log.
  2. A new HashMap object has to be created for every log.
  3. The final message trimming wraps its logic in an outer for loop that is not actually needed (a fixed version follows the snippet below):

int trimStart = end + 1;
int trimEnd = message.length();
// This trims whitespace from both ends of the string, equivalent to String.trim(),
// from which it was in fact copied; the outer for loop serves no purpose here.
for (int i = 0; i < message.length(); i++) {
    while ((trimStart < trimEnd) && (message.charAt(trimStart) <= ' ')) {
        trimStart++;
    }
    while ((trimStart < trimEnd) && (message.charAt(trimEnd - 1) <= ' ')) {
        trimEnd--;
    }
}
logItem.put(LOG_FIELD_MESSAGE, message.substring(trimStart, trimEnd));
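A minimal corrected version simply drops the outer loop; the two while loops alone converge on the trimmed bounds:

// Same trimming behavior without the redundant outer loop: trimStart advances
// past leading whitespace, trimEnd retreats past trailing whitespace.
int trimStart = end + 1;
int trimEnd = message.length();
while ((trimStart < trimEnd) && (message.charAt(trimStart) <= ' ')) {
    trimStart++;
}
while ((trimStart < trimEnd) && (message.charAt(trimEnd - 1) <= ' ')) {
    trimEnd--;
}
logItem.put(LOG_FIELD_MESSAGE, message.substring(trimStart, trimEnd));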

New implementation

Serialize an object instead of a HashMap

There are several reasons:

  1. A HashMap is computationally expensive to populate and wasteful in storage.
  2. A HashMap cannot be reused between logs, whereas an object's fields can simply be overwritten (see the sketch below).
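As a rough illustration of the idea (the field set here is assumed from the target JSON; the actual class is not shown in this article), the parse results land in plain fields that are overwritten for each log, and fastjson serializes the object directly:

import com.alibaba.fastjson.JSON;

// Hypothetical sketch of the new log item: plain fields instead of a HashMap.
public class HuaWeiLogItemNew {
    private String level;
    private String date;
    private String traceType;
    private String traceId;
    private String threadName;
    private String logger;
    private String nodeIp;
    private String containerIp;
    private String podName;
    private String message;
    private String rawLog;

    // getters and setters omitted for brevity; fastjson serializes via getters

    public String toJson() {
        // Serialize the object itself instead of building a Map first
        return JSON.toJSONString(this);
    }
}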

Benchmark

# parseLog refers to the old function
# newParseLog refers to the new function
# ReportRaw means the raw log field is included
HuaWeiLogItemTest.parseLog              thrpt   432612.603 ops/s
HuaWeiLogItemTest.parseLogReportRaw     thrpt   361318.931 ops/s
HuaWeiLogItemTest.newParseLog           thrpt  1059580.230 ops/s
HuaWeiLogItemTest.newParseLogReportRaw  thrpt  1059339.806 ops/s

That is an improvement of roughly 145% without the raw log field and 193% with it.

CPU cache optimization

When iterating over the string, we originally used String.charAt(i):

for (int i = 0; i < message.length() && matchCount < maxCount; i++) {
    // here
    if ('[' == message.charAt(i)) {
        // ...
    }
    // and here
    if (']' == message.charAt(i)) {
        // ...
    }
}

Calling charAt in a tight loop may keep flushing the CPU cache. Switching to toCharArray() costs one extra copy of the string's characters, but the loop then reads sequentially from a contiguous array:

char[] chars = rawLog.toCharArray();
int rawLogLength = rawLog.length();
for (int i = 0; i < rawLogLength && matchCount < maxCount; i++) {
    if ('[' == chars[i]) {
        // ...
    }
    if (']' == chars[i]) {
        // ...
    }
}

Benchmark

# Cache-line optimization: traverse with toCharArray
# before
HuaWeiLogItemTest.newParseLog           thrpt  1059580.230 ops/s
HuaWeiLogItemTest.newParseLogReportRaw  thrpt  1059339.806 ops/s
# after
HuaWeiLogItemTest.newParseLog           thrpt  1156832.681 ops/s
HuaWeiLogItemTest.newParseLogReportRaw  thrpt  1164270.861 ops/s

That is roughly a 9% to 15% improvement over the previous step.

Reuse objects

Previously, a new HuaweiLogItem object was created for every log. Now the object serves only as a medium for converting log text to JSON, and a single instance per thread is cached in a ThreadLocal:

// Fetch the per-thread instance, creating it on first use
HuaWeiLogItemNew huaWeiLogItemNew = HUAWEI_LOG_ITEM_THREAD_LOCAL.get();
if (huaWeiLogItemNew == null) {
    synchronized (HuaweiLogProducer.class) {
        huaWeiLogItemNew = new HuaWeiLogItemNew();
        HUAWEI_LOG_ITEM_THREAD_LOCAL.set(huaWeiLogItemNew);
    }
}
// Load the log data into the shared object before each use, then convert it
for (String rawLog : rawLogs) {
    huaWeiLogItemNew.setData(rawLog, clientConfig.isReportRawLog());
    huaWeiLogItemNew.parseLog();
    contents.add(ImmutableMap.of("log_time_ns", huaWeiLogItemNew.getTime() * 1000L * 1000L,
            "log", JSON.toJSONString(huaWeiLogItemNew)));
}
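As a side note, since each thread only ever sees its own ThreadLocal slot, the synchronized block is not strictly necessary; on Java 8+ the same lazy per-thread initialization can be written more concisely (a sketch, assuming the field declaration is under our control):

// Lazy per-thread initialization without explicit locking
private static final ThreadLocal<HuaWeiLogItemNew> HUAWEI_LOG_ITEM_THREAD_LOCAL =
        ThreadLocal.withInitial(HuaWeiLogItemNew::new);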

Benchmark

# Reuse-objects optimization
# before
HuaWeiLogItemTest.newParseLog                       thrpt  1156832.681 ops/s
HuaWeiLogItemTest.newParseLogReportRaw              thrpt  1164270.861 ops/s
# after
HuaWeiLogItemTest.newParseLogSharedObject           thrpt  1365381.574 ops/s
HuaWeiLogItemTest.newParseLogReportRawSharedObject  thrpt  1374768.509 ops/s

This is a 17% to 19% improvement over the previous step.

Efficient date-to-timestamp conversion

We had been calling DateUtils.parseDate() directly; it uses FastDateFormat under the hood and returns a Date object from which we take the timestamp. Objects are created along the way and the computation is fairly involved. Benchmarking this method on its own shows it tops out at around 1.3 million ops/s, so the bottleneck is most likely here.
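A minimal JMH sketch (class and method names are illustrative, not from the original code) of how the date parsing can be measured in isolation:

import java.text.ParseException;
import org.apache.commons.lang3.time.DateUtils;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;

@BenchmarkMode(Mode.Throughput)
public class DateParseBenchmark {

    private static final String DATE_PATTERN = "yyyy-MM-dd HH:mm:ss.SSS";

    // Measures DateUtils.parseDate alone, outside the log parsing pipeline
    @Benchmark
    public long parseDate() throws ParseException {
        return DateUtils.parseDate("2021-10-21 17:47:15.074", DATE_PATTERN).getTime();
    }
}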

All we actually need is something that converts 2021-10-22 17:47:15.074 into 1634896035074. Is there a simpler way? For example, read the time fields straight out of the string and add them up:

public static long fastStrToTimestamp(String date) {
    // Only the yyyy-MM-dd HH:mm:ss.SSS format is supported
    if (date.length() != 23) {
        return 0;
    }
    // Look up the timestamp of the day, e.g. 2021-10-20 -> 1634659200000
    long todayTimestamp = -1;
    for (Tuple2<String, Long> tuple2 : fastConvertCache) {
        if (date.startsWith(tuple2.getFirst())) {
            todayTimestamp = tuple2.getSecond();
            break;
        }
    }
    // Cache miss: fall back to the slow path
    if (todayTimestamp == -1) {
        return DateUtils.parseDate(date, YYYY_MM_DD_HH_MM_SS_SSS).getTime();
    }
    int hour = Integer.parseInt(date.substring(11, 13));
    int minute = Integer.parseInt(date.substring(14, 16));
    int second = Integer.parseInt(date.substring(17, 19));
    int mills = Integer.parseInt(date.substring(20, 23));
    return todayTimestamp + hour * DateUtils.MILLIS_PER_HOUR + minute * DateUtils.MILLIS_PER_MINUTE
            + second * DateUtils.MILLIS_PER_SECOND + mills;
}

You also need a cache that holds the timestamp for each day, generated at startup and refreshed once a day; a sketch of how it might be built follows. Benchmarked on its own, this function reaches about 20 million ops/s.
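A minimal sketch (assumed, not the author's code) of building that cache: it maps date prefixes such as "2021-10-20" to the epoch millis of that day's midnight, covering a few recent days so slightly delayed logs still hit the cache:

import java.time.LocalDate;
import java.time.ZoneId;
import java.util.ArrayList;
import java.util.List;

public final class DayTimestampCache {

    // Builds entries for today and the previous (days - 1) days
    public static List<Tuple2<String, Long>> build(int days) {
        List<Tuple2<String, Long>> cache = new ArrayList<>(days);
        LocalDate today = LocalDate.now();
        for (int i = 0; i < days; i++) {
            LocalDate day = today.minusDays(i);
            long midnightMillis = day.atStartOfDay(ZoneId.systemDefault()).toInstant().toEpochMilli();
            // LocalDate.toString() yields the "yyyy-MM-dd" prefix that fastStrToTimestamp matches on
            cache.add(new Tuple2<>(day.toString(), midnightMillis));
        }
        return cache;
    }
}

A scheduled task can call build() shortly after midnight to keep the entries current.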

Benchmark

# Fast timestamp conversion
# before
HuaWeiLogItemTest.newParseLog                       thrpt  1156832.681 ops/s
HuaWeiLogItemTest.newParseLogReportRaw              thrpt  1164270.861 ops/s
HuaWeiLogItemTest.newParseLogSharedObject           thrpt  1365381.574 ops/s
HuaWeiLogItemTest.newParseLogReportRawSharedObject  thrpt  1374768.509 ops/s
# after
HuaWeiLogItemTest.newParseLog                       thrpt  2326123.191 ops/s
HuaWeiLogItemTest.newParseLogReportRaw              thrpt  2314395.548 ops/s
HuaWeiLogItemTest.newParseLogSharedObject           thrpt  3275743.575 ops/s
HuaWeiLogItemTest.newParseLogReportRawSharedObject  thrpt  3326434.899 ops/s

Compared with the previous step, that is roughly a 100% to 142% improvement, a remarkable result. Throughput finally crosses the 3 million ops/s mark.

Benchmark summary

# Combined old-vs-new test
# parseLog refers to the old function, newParseLog to the new one
# ReportRaw means the raw log field is included
# SharedObject means the object is reused
# Original parsing
HuaWeiLogItemTest.parseLog                          thrpt   432612.603 ops/s
HuaWeiLogItemTest.parseLogReportRaw                 thrpt   361318.931 ops/s
HuaWeiLogItemTest.parseLogReportRawSharedObject     thrpt   362798.892 ops/s
HuaWeiLogItemTest.parseLogSharedObject              thrpt   434279.601 ops/s
# After switching to object fields
HuaWeiLogItemTest.newParseLog                       thrpt  1059580.230 ops/s
HuaWeiLogItemTest.newParseLogReportRaw              thrpt  1059339.806 ops/s
HuaWeiLogItemTest.newParseLogReportRawSharedObject  thrpt  1200556.455 ops/s
HuaWeiLogItemTest.newParseLogSharedObject           thrpt  1191434.370 ops/s
# Cache-line optimization: traverse with a char array
HuaWeiLogItemTest.newParseLog                       thrpt  1156832.681 ops/s
HuaWeiLogItemTest.newParseLogReportRaw              thrpt  1164270.861 ops/s
HuaWeiLogItemTest.newParseLogReportRawSharedObject  thrpt  1374768.509 ops/s
HuaWeiLogItemTest.newParseLogSharedObject           thrpt  1365381.574 ops/s
# Fast timestamp conversion
HuaWeiLogItemTest.newParseLog                       thrpt  2326123.191 ops/s
HuaWeiLogItemTest.newParseLogReportRaw              thrpt  2314395.548 ops/s
HuaWeiLogItemTest.newParseLogReportRawSharedObject  thrpt  3326434.899 ops/s
HuaWeiLogItemTest.newParseLogSharedObject           thrpt  3275743.575 ops/s

Compared with the old parsing method, the full set of changes yields up to an 817% improvement:

HuaWeiLogItemTest.parseLogReportRawSharedObject     thrpt   362798.892 ops/s
HuaWeiLogItemTest.newParseLogReportRawSharedObject  thrpt  3326434.899 ops/s

Incidentally, the tests above were run on two machines: a Mac mini with an Apple M1 and a MacBook with an Intel i9.

By a loose comparison, the i9 reaches only about 52% to 62% of the M1's throughput, probably because the i9 in the MacBook runs at much lower sustained performance.