This is my 11th day of the August Challenge

This series is my TM person silly series third [facepalm], past period wonderful review:

  • After upgrading to Spring 5.3.x, the number of GC increases dramatically, I am stupid
  • This big table walk index field query SQL how to complete scan, I TM person silly

Recently, the group used the SDK given by a third party to do some development. Recently, errors suddenly began to be reported online, and a particularly strange problem was found. The team members told me that the code didn’t go in the middle of running, and a section was skipped (was this code just participated in the long jump in the Eastern Olympic Games?? .

The code looks like this, and the logic is very simple:

try {"initiate client with conf: {}", conf); SDKClient client = new SDKClient(conf); client.init();"client initiated"); } catch (Exception e) { log.error("initiate client failed", e); }"start to manipulate..." );Copy the code

We found that the client was not actually initialized successfully, and the subsequent business processing kept reporting errors. Check the log and find:

initiate client with conf: xxxxx
start to manipulate...
Copy the code

This is what the team calls a code jump. Because neither Initiate Client failed nor Initiate Client Failed was printed… Just start to manipulate… .

As regular readers know, we are k8S + Docker online and have Arthas built into each image, and the Java version is Java 16 and JFR enabled. Logs contain link information and are pulled to the unified log server through the ELK Agent.

In this SDK, all the remote addresses to be accessed have IP whitelist. For local security, we cannot directly use the SDK to access the online environment of the other party. The local test is connected to the other party’s test environment, is no problem. So here again, we have to use Arthas to locate.

First of all, is the source code running online consistent with what we see locally? This can be done using the jad command:

The fully qualified name of the class that jad wants to seeCopy the code

After viewing, found that decompiled code, and our source consistent ah.

Then let’s look at the code in action:

Trace the fully qualified name method of the class you want to seeCopy the code

When we re-execute this method, we can see that we did throw an exception during initialization:

# omit we here don't care about + - [min = 0.010174 ms, Max = 0.01184 ms, total = 0.022014 ms, count = 2] org. Apache. Logging.. Log4j Logger: info () # 130 + - [min = 599.388978 ms, Max = 630.23967 ms, total = 1229.628648 ms, count = 2] com. Dasha13. SDK. SDKClient: < init > () # 131 + - [min = 203.617545 ms, Max = 221.785512 ms, total = 425.403057 ms, count = 2] com. Dasha13. SDK. SDKClient: init () # 132 [throws Exception, + 2] - [min = 0.034798 ms, Max = 0.084505 ms, total = 0.119303 ms, count = 2] org. Apache. Logging.. Log4j Logger: error # 136 () + - [min = 0.010174 ms, Max = 0.01184 ms, total = 0.022014 ms, count = 2] org. Apache. Logging.. Log4j Logger: info () # 138Copy the code

But why is this exception log not printed? Let’s go ahead and look at this exception, using the watch method and specifying a view depth of 2, so we expect to print out the stack and the Message:

watch com.dasha13.sdk.SDKClient init {throwExp} -x 2
Copy the code

However, only a Message that looks like “Message” is printed:

method=com.dasha13.sdk.SDKClient init location=AtExceptionExit ts=2021-08-10 02:58:15; [cost] = 131.20209 ms result = ERROR DATA!!!!!! object class: class java.util.ArrayList, exception class: class, exception message: java.lang.IllegalArgumentExceptionCopy the code

This is odd. Normally, if you specify a depth of 2, if an exception is thrown, then the output will contain the exception’s Message and the stack information. What’s going on? Let’s try getting the stack and the information separately:

First get the stack:

watch com.dasha13.sdk.SDKClient init {throwExp.getStackTrace()} -x 2
Copy the code

The stack problem is related to Google’s dependency on the Flip-bean Management framework (similar to Spring) Guice to load a Bean with an exception:

ts=2021-08-10 03:03:37; [cost] = 146.644563 ms result = @ ArrayList [@ StackTraceElement [] [ @StackTraceElement[$2.get(], @StackTraceElement[], @ StackTraceElement [com. Dasha13. SDK. SDKClient. Init (SDKClient. Java: 482)], after # omitCopy the code

Now look at the exception information:

watch com.dasha13.sdk.SDKClient init {throwExp.getMessage()} -x 2
Copy the code

Re-execute the faulty method, and find that the watch failed:

watch failed, condition is: null, express is: {throwExp.getMessage()}, java.lang.IllegalArgumentException, visit /app/arthas/arthas.log for more details.
Copy the code

We looked at the Arthas logs as prompted and found the stack of exceptions:

2021-08-10 03:07:11 [XNIO-2 task-3] ERROR -Error during evaluating the expression: java.lang.IllegalArgumentException                                
cause by: MethodNotFoundException: Method not found: class 
Copy the code

Instead, we find that getMessage() of ProvisionException, the getMessage() of the exception, has been excepted. We also locate the Cause of the exception that the Guava version is incompatible with the Guice version. The root Cause is that the three-party interface times out, which leads to initialization exceptions. If any exception is thrown, it is encapsulated as ProvisionException. The getMessage of ProvisionException exception depends on the Guava Cache for some exception information, but the Guava version in our project is not compatible with the Guice version, so some methods do not exist. So getMessage for the ProvisionException exception will also have an exception. The previous run is ok because the three parties have not yet initialized when the interface timeout throw exception…

The log4j2 asynchronous logging configuration we used, and passing the exception as the last parameter to the logging method, normally outputs a Message of the exception along with the exception stack. But we know from the above analysis that when we get the Message, we throw an exception. Log4j is designed to use the production and consumption of logging events architecture. Here is the Message where the consumer gets the exception and the exception stack, and while getting the Message, an exception is found. For Log4j2 asynchronous logging, when an exception is found, the original log event is discarded and the exception is output to the StatusLogger (the underlying standard exception output).


Private void tryCallAppender(final LogEvent Event) {try {// Call appender to output log appender.append(event); } catch (final RuntimeException error) {// Handle RuntimeException handleAppenderError(Event, error); } catch (final Exception error) {// Handle other exceptions handleAppenderError(Event, new AppenderLoggingException(error)); } } private void handleAppenderError(final LogEvent event, final RuntimeException ex) { appender.getHandler().error(createErrorMsg("An exception occurred processing Appender "), event, ex); if (! appender.ignoreExceptions()) { throw ex; }}Copy the code

ErrorHandler is usually implemented by default, which is DefaultErrorHandler; DefaultErrorHandler is output to a StatusLogger:


private static final Logger LOGGER = StatusLogger.getLogger(); public void error(final String msg, final LogEvent event, final Throwable t) { final long current = System.nanoTime(); if (current - lastException > EXCEPTION_INTERVAL || exceptionCount++ < MAX_EXCEPTIONS) { LOGGER.error(msg, t); } lastException = current; if (! appender.ignoreExceptions() && t ! = null && ! (t instanceof AppenderLoggingException)) { throw new AppenderLoggingException(msg, t); }}Copy the code

StatusLogger is a standard exception output System. Err:


this.logger = new SimpleLogger("StatusLogger", Level.ERROR, false, true, showDateTime, false, dateFormat, MessageFactory, PROPS, // standard exception output system.err);Copy the code

In our deployment architecture, we put the standard exception output in a remote location where almost no one sees it, so we don’t notice it… If you look at the standard exception output, you will see that there are indeed exceptions:

2021-08-10 03:30:29,810 log4J2-tF-10 -AsyncLoggerConfig-3 ERROR An exception occurred Processing Appender file java.lang.IllegalArgumentException at$Segment.get( at at at$LocalLoadingCache.get( at$LocalLoadingCache.getUnchecked( at at at at at at at at at org.apache.logging.log4j.core.impl.ThrowableProxy.<init>( at org.apache.logging.log4j.core.impl.ThrowableProxy.<init>( at org.apache.logging.log4j.core.impl.Log4jLogEvent.getThrownProxy( at org.apache.logging.log4j.core.pattern.ExtendedThrowablePatternConverter.format( ) at org.springframework.boot.logging.log4j2.ExtendedWhitespaceThrowablePatternConverter.format(ExtendedWhitespaceThrowablePa at org.apache.logging.log4j.core.pattern.PatternFormatter.format( at org.apache.logging.log4j.core.layout.PatternLayout$PatternSerializer.toSerializable( at org.apache.logging.log4j.core.layout.PatternLayout.toText( at org.apache.logging.log4j.core.layout.PatternLayout.encode( at org.apache.logging.log4j.core.layout.PatternLayout.encode( at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.directEncodeEvent( 197) at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.tryAppend( at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.append( at org.apache.logging.log4j.core.appender.RollingFileAppender.append( at org.apache.logging.log4j.core.config.AppenderControl.tryCallAppender( at org.apache.logging.log4j.core.config.AppenderControl.callAppender0( at org.apache.logging.log4j.core.config.AppenderControl.callAppenderPreventRecursion( at org.apache.logging.log4j.core.config.AppenderControl.callAppender( at org.apache.logging.log4j.core.config.LoggerConfig.callAppenders( at org.apache.logging.log4j.core.async.AsyncLoggerConfig.callAppenders( at org.apache.logging.log4j.core.config.LoggerConfig.processLogEvent( at org.apache.logging.log4j.core.config.LoggerConfig.log( at org.apache.logging.log4j.core.async.AsyncLoggerConfig.log( at org.apache.logging.log4j.core.async.AsyncLoggerConfig.logToAsyncLoggerConfigsOnCurrentThread(  at org.apache.logging.log4j.core.async.AsyncLoggerConfigDisruptor$Log4jEventWrapperHandler.onEvent(AsyncLoggerConfigDisrupt at org.apache.logging.log4j.core.async.AsyncLoggerConfigDisruptor$Log4jEventWrapperHandler.onEvent(AsyncLoggerConfigDisrupt at com.lmax.disruptor.BatchEventProcessor.processEvents( at at java.base/ Caused by: java.lang.IllegalArgumentException at$ClassReader.<init>(Unknown Source) at$ClassReader.<init>(Unknown Source) at$ClassReader.<init>(Unknown Source) at<init>( at$1.load( at$1.load( at$LoadingValueReference.loadFuture( at$Segment.loadSync( at$Segment.lockedGetOrLoad( at$Segment.get( ... 41 moreCopy the code

After that, the Appender’s ignoreExceptions configuration (both true by default) determines whether an exception will be thrown where the logging method is called, but this is for synchronous logs, asynchronous logs throw an exception to the Disruptor exception handler. Log4j2 Disruptor also outputs exceptions to system. err, which is the standard exception output. By default, this is not thrown. After all, no one wants to throw an exception to a synchronous log. Asynchronous logs are already output to the standard exception output, so there is no need to do this.

Wechat search “my programming cat” to follow the public account, every day, easy to improve technology, win a variety of offers