1. Introduction to xCrash

xCrash is a stable crash-log collection framework that iQiyi open-sourced on GitHub in April 2019. It collects Java crash, native crash, and ANR logs on Android, and requires neither root nor system permissions. It supports Android 4.0-10 (API level 14-29) and the armeabi, armeabi-v7a, arm64-v8a, x86, and x86_64 ABIs.

Project address: github.com/iqiyi/xCras…

2. xCrash architecture

3. xCrash class diagram

The XCrash class is the entry point and acts as a facade: the client initializes it by passing a configured InitParameters object. XCrash is associated with three handlers (for Java crashes, native crashes, and ANRs) that listen for the corresponding events and collect logs, and it manages the tombstone files that hold the crash logs through FileManager and TombstoneManager. The client calls TombstoneParser to parse a locally generated tombstone file and retrieve its data.

4. Capturing Java crashes

Java-layer crashes can be caught directly with the JVM's uncaught-exception mechanism, so this part needs little explanation.

Thread.setDefaultUncaughtExceptionHandler(this);

When a Java crash occurs, uncaughtException is called, and it runs handleException to collect the log information:

private void handleException(Thread thread, Throwable throwable) {
    ...
    // notify the native handler and the ANR handler of the Java crash
    NativeHandler.getInstance().notifyJavaCrashed();
    AnrHandler.getInstance().notifyJavaCrashed();

    // create the log file under /data/data/packageName/files/tombstones
    logFile = FileManager.getInstance().createLogFile(logPath);
    ...
    // write info to the log file
    if (logFile != null) {
        ...
        // write java stacktrace
        raf.write(emergency.getBytes("UTF-8"));

        // write logcat (logcat -b main; logcat -b system; logcat -b events)
        raf.write(Util.getLogcat(logcatMainLines, logcatSystemLines, logcatEventsLines).getBytes("UTF-8"));

        // write fds
        raf.write(Util.getFds().getBytes("UTF-8"));

        // write network info
        raf.write(Util.getNetworkInfo().getBytes("UTF-8"));

        // write memory info
        raf.write(Util.getMemoryInfo().getBytes("UTF-8"));

        // write background / foreground
        raf.write(("foreground:\n" + (ActivityMonitor.getInstance().isApplicationForeground() ? "yes" : "no") + "\n\n").getBytes("UTF-8"));

        // write other threads info
        if (dumpAllThreads) {
            raf.write(getOtherThreadsInfo(thread).getBytes("UTF-8"));
        }
    }

    // ICrashCallback onCrash
    if (callback != null) {
        try {
            callback.onCrash(logFile == null ? null : logFile.getAbsolutePath(), emergency);
        } catch (Exception ignored) {
        }
    }
}

5. Capturing native crashes

XCrash.java

public static synchronized int init(Context ctx, InitParameters params) {
    ...
    NativeHandler.getInstance().initialize(...);
    ...
}

NativeHandler.java

int initialize(...) {
    // load lib
    System.loadLibrary("xcrash");
    ...
    // init native lib
    try {
        int r = nativeInit(...);
    }
    ...
}

During XCrash.init, NativeHandler runs its initialize method. It first loads the native library with System.loadLibrary("xcrash"), which registers the native functions, and then calls nativeInit.

When System.loadLibrary("xcrash") runs, JNI_OnLoad is called back. This is where the native methods are registered dynamically.

xc_jni.c

JNIEXPORT jint JNICALL JNI_OnLoad(JavaVM *vm, void *reserved)
{
    ...
    if((*env)->RegisterNatives(env, cls, xc_jni_methods, sizeof(xc_jni_methods) / sizeof(xc_jni_methods[0]))) return -1;
    ...
    return XC_JNI_VERSION;
}

Element 0 of the xc_jni_methods array is:

static JNINativeMethod xc_jni_methods[] = {
    {
        "nativeInit",
        "("
        "I"
        "Ljava/lang/String;" "Ljava/lang/String;" "Ljava/lang/String;" "Ljava/lang/String;" "Ljava/lang/String;"
        "Ljava/lang/String;" "Ljava/lang/String;" "Ljava/lang/String;" "Ljava/lang/String;" "Ljava/lang/String;"
        "Z" "Z"
        "I" "I" "I"
        "Z" "Z" "Z" "Z" "Z"
        "I"
        "[Ljava/lang/String;"
        "Z" "Z"
        "I" "I" "I"
        "Z" "Z"
        ")"
        "I",
        (void *)xc_jni_init
    },
    ...
};

When the Java layer calls nativeInit, the native function xc_jni_init runs. Its logic is in xc_jni.c:

static jint xc_jni_init(...)
{
    ...
    //common init: general information, including system, application, and process information
    xc_common_init(...);
    ...
    //crash init
    r_crash = xc_crash_init(...);
    ...
    //trace init
    r_trace = xc_trace_init(...);
    ...
    return (0 == r_crash && 0 == r_trace) ? 0 : XCC_ERRNO_JNI;
}

Now look at xc_crash_init:

int xc_crash_init(...)
{
    ...
    //init for JNI callback
    xc_crash_init_callback(env); // 1) set up the JNI callback from the native signal handler to Java
    ...
    //register signal handler
    return xcc_signal_crash_register(xc_crash_signal_handler); // 2) register the handler that is called when a crash signal arrives
}

1) Setting the callback:

xc_crash_init_callback sets up the callback that ultimately invokes NativeHandler's crashCallback:

private static void crashCallback(String logPath, String emergency, boolean dumpJavaStacktrace, boolean isMainThread, String threadName) {
    if (!TextUtils.isEmpty(logPath)) {
        // append java stacktrace
        TombstoneManager.appendSection(logPath, "java stacktrace", stacktrace);
        ...
        // append memory info
        TombstoneManager.appendSection(logPath, "memory info", Util.getProcessMemoryInfo());

        // append background / foreground
        TombstoneManager.appendSection(logPath, "foreground", ActivityMonitor.getInstance().isApplicationForeground() ? "yes" : "no");
    }

    // finally, call back to the ICrashCallback.onCrash registered by the client
    ICrashCallback callback = NativeHandler.getInstance().crashCallback;
    if (callback != null) {
        callback.onCrash(logPath, emergency);
    }
    ...
}

2) Signal registration:

static xcc_signal_crash_info_t xcc_signal_crash_info[] =
{
    {.signum = SIGABRT},   // raised by abort()
    {.signum = SIGBUS},    // invalid address, including memory alignment errors
    {.signum = SIGFPE},    // fatal arithmetic error
    {.signum = SIGILL},    // illegal instruction
    {.signum = SIGSEGV},   // illegal memory access
    {.signum = SIGTRAP},   // generated at breakpoints, used by debuggers
    {.signum = SIGSYS},    // invalid system call
    {.signum = SIGSTKFLT}  // coprocessor stack error
};

int xcc_signal_crash_register(void (*handler)(int, siginfo_t *, void *))
{
    stack_t ss;
    if(NULL == (ss.ss_sp = calloc(1, XCC_SIGNAL_CRASH_STACK_SIZE))) return XCC_ERRNO_NOMEM;
    ss.ss_size  = XCC_SIGNAL_CRASH_STACK_SIZE;
    ss.ss_flags = 0;
    if(0 != sigaltstack(&ss, NULL)) return XCC_ERRNO_SYS;

    struct sigaction act;
    memset(&act, 0, sizeof(act));
    sigfillset(&act.sa_mask);
    act.sa_sigaction = handler; // set the signal callback handler
    act.sa_flags = SA_RESTART | SA_SIGINFO | SA_ONSTACK;

    size_t i;
    for(i = 0; i < sizeof(xcc_signal_crash_info) / sizeof(xcc_signal_crash_info[0]); i++)
        if(0 != sigaction(xcc_signal_crash_info[i].signum, &act, &(xcc_signal_crash_info[i].oldact)))
            return XCC_ERRNO_SYS;

    return 0;
}

When one of these signals arrives, xc_crash_signal_handler is called:

static void xc_crash_signal_handler(int sig, siginfo_t *si, void *uc)
{
    ...
    //create and open log file
    if((xc_crash_log_fd = xc_common_open_crash_log(xc_crash_log_pathname, sizeof(xc_crash_log_pathname), &xc_crash_log_from_placeholder)) < 0) goto end;
    ...
    //spawn crash dumper process
    pid_t dumper_pid = xc_crash_fork(xc_crash_exec_dumper);
    ...
    //JNI callback to Java
    xc_crash_callback();
    ...
}

The function pointer xc_crash_exec_dumper passed to xc_crash_fork runs in the child process and launches the dumper:

#define XCC_UTIL_XCRASH_DUMPER_FILENAME "libxcrash_dumper.so"

static int xc_crash_exec_dumper(void *arg)
{
    ...
    execl(xc_crash_dumper_pathname, XCC_UTIL_XCRASH_DUMPER_FILENAME, NULL);
    return 100 + errno;
}

The dumper program launched by execl performs the actual data dump. Its main function is:
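The spawn-and-exec step can be illustrated with plain fork, execl, and waitpid. This is a sketch only: demo_exec_child and demo_spawn_and_wait are hypothetical names, and plain fork() is an assumption made for simplicity; it is not a claim about how xc_crash_fork is implemented.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs in the child. On success execl() never returns; on failure the
 * errno-based code is returned, following the "100 + errno" convention above. */
static int demo_exec_child(const char *path, const char *argv0)
{
    execl(path, argv0, (char *)NULL);
    return 100 + errno;
}

/* Forks, executes `path` in the child, waits for it, and returns the
 * child's exit status, or -1 if the child could not be created or reaped. */
static int demo_spawn_and_wait(const char *path, const char *argv0)
{
    int status = 0;
    pid_t r;
    pid_t pid = fork();

    if (pid < 0) return -1;
    if (pid == 0) _exit(demo_exec_child(path, argv0)); /* child never returns to the caller */

    do {
        r = waitpid(pid, &status, 0);
    } while (r < 0 && errno == EINTR);                  /* retry if interrupted by a signal */

    if (r < 0 || !WIFEXITED(status)) return -1;
    return WEXITSTATUS(status);
}
```

With a path that does not exist, the child exits with 100 + ENOENT, so the parent can tell "the dumper could not be started" apart from the dumper's own exit codes. Doing the heavy work in a separate process keeps the crashed process's signal handler small.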

xcd_core.c

int main(int argc, char** argv)
{
    ...
    //read args from stdin
    if(0 != xcd_core_read_args()) exit(1);

    //open log file
    if(0 > (xcd_core_log_fd = XCC_UTIL_TEMP_FAILURE_RETRY(open(xcd_core_log_pathname, O_WRONLY | O_CLOEXEC)))) exit(2);

    //register signal handler for catching self-crashing
    xcc_unwind_init(xcd_core_spot.api_level);
    xcc_signal_crash_register(xcd_core_signal_handler);

    //create process object
    if(0 != xcd_process_create(...)) exit(3);

    //suspend all threads in the process
    xcd_process_suspend_threads(xcd_core_proc);

    //load process info
    if(0 != xcd_process_load_info(xcd_core_proc)) exit(4);

    //record system info
    if(0 != xcd_sys_record(...)) exit(5);

    //record process info
    if(0 != xcd_process_record(...)) exit(6);

    //resume all threads in the process
    xcd_process_resume_threads(xcd_core_proc);
    ...
}

The dumper suspends all threads of the crashed process, collects the relevant logs, and finally resumes the threads.

xc_trace_init is not analyzed at this point; it can be read in the same way as the functions above. This section only outlines the overall flow.

Summary of processing steps of Native crash:

  • Register signal handlers.
  • When a crash occurs, create a child process to collect information (this avoids the system restrictions on which functions may be called inside the crashed process).
  • Suspend all threads in the crashed process so that logcat output pauses, then collect logcat.
  • Collect information such as backtrace.
  • Collect memory data.
  • Resume the threads when finished.
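Before a dumper can suspend "all threads", it needs their IDs. On Linux these are listed under /proc/<pid>/task. The sketch below shows only that enumeration step; demo_list_threads is a hypothetical helper written for illustration, and it is not taken from xCrash.

```c
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Fills tids[] with up to `max` thread IDs of process `pid`.
 * Returns the number of IDs stored, or -1 if /proc/<pid>/task cannot be opened. */
static int demo_list_threads(pid_t pid, pid_t *tids, int max)
{
    char path[64];
    DIR *dir;
    struct dirent *ent;
    int n = 0;

    snprintf(path, sizeof(path), "/proc/%d/task", (int)pid);
    if (NULL == (dir = opendir(path))) return -1;

    while (n < max && NULL != (ent = readdir(dir)))
    {
        char *end;
        long tid = strtol(ent->d_name, &end, 10);
        if (end == ent->d_name || *end != '\0') continue; /* skips "." and ".." */
        tids[n++] = (pid_t)tid;
    }

    closedir(dir);
    return n;
}
```

A separate dumper process would typically go on to attach to each listed ID (for example with ptrace) to suspend it, and detach again to resume it. That part is omitted here.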

6. Capturing ANRs

ANR capture is also initialized in XCrash.init:

XCrash.java

public static synchronized int init(Context ctx, InitParameters params) {
    //init ANR handler (API level < 21)
    if (params.enableAnrHandler && Build.VERSION.SDK_INT < 21) {
        AnrHandler.getInstance().initialize(...);
    }
}

Note the restriction: this Java-level ANR handler is only enabled on API levels below 21.

AnrHandler.java

void initialize(Context ctx, int pid, String processName, String appId, String appVersion, String logDir,
                boolean checkProcessState, int logcatSystemLines, int logcatEventsLines, int logcatMainLines,
                boolean dumpFds, boolean dumpNetworkInfo, ICrashCallback callback) {
    // check API level
    if (Build.VERSION.SDK_INT >= 21) {
        return;
    }
    ...
    // FileObserver monitors the file system; here it watches /data/anr/ for writes to the trace file (such as /data/anr/traces.txt)
    fileObserver = new FileObserver("/data/anr/", CLOSE_WRITE) {
        public void onEvent(int event, String path) {
            try {
                if (path != null) {
                    String filepath = "/data/anr/" + path;
                    if (filepath.contains("trace")) {
                        // a trace file was written: handle the ANR
                        handleAnr(filepath);
                    }
                }
            } catch (Exception e) {
                XCrash.getLogger().e(Util.TAG, "AnrHandler fileObserver onEvent failed", e);
            }
        }
    };

    try {
        // start listening
        fileObserver.startWatching();
    } catch (Exception e) {
        fileObserver = null;
        XCrash.getLogger().e(Util.TAG, "AnrHandler fileObserver startWatching failed", e);
    }
}

On newer system versions, apps do not have permission to read /data/anr/. The FileObserver approach therefore only works on API levels below 21, and it cannot obtain ANR logs on API level 21 and above.
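Android's FileObserver is built on the Linux inotify facility, so the same "watch a directory for CLOSE_WRITE and filter by file name" idea can be sketched natively. The helpers demo_watch_start and demo_watch_next below are hypothetical names for illustration; nothing here is taken from xCrash.

```c
#include <string.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Starts watching `dir` for files that are closed after being written.
 * Returns an inotify file descriptor, or -1 on failure. */
static int demo_watch_start(const char *dir)
{
    int fd = inotify_init1(IN_CLOEXEC);
    if (fd < 0) return -1;
    if (inotify_add_watch(fd, dir, IN_CLOSE_WRITE) < 0)
    {
        close(fd);
        return -1;
    }
    return fd;
}

/* Blocks until the next batch of events arrives, then copies the first file
 * name containing `needle` into `out`. Returns 0 if such a name was found, -1 otherwise. */
static int demo_watch_next(int fd, const char *needle, char *out, size_t out_len)
{
    char buf[4096] __attribute__((aligned(__alignof__(struct inotify_event))));
    ssize_t len = read(fd, buf, sizeof(buf));
    char *p;

    if (len <= 0 || out_len == 0) return -1;

    for (p = buf; p < buf + len; )
    {
        struct inotify_event *ev = (struct inotify_event *)p;
        if (ev->len > 0 && (ev->mask & IN_CLOSE_WRITE) && NULL != strstr(ev->name, needle))
        {
            strncpy(out, ev->name, out_len - 1);
            out[out_len - 1] = '\0';
            return 0;
        }
        p += sizeof(struct inotify_event) + ev->len; /* advance to the next event in the batch */
    }
    return -1;
}
```

As with the FileObserver version, the watch only succeeds if the process is allowed to read the directory, which is exactly the permission that is missing for /data/anr/ on newer systems.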

Then take a look at what handleAnr has collected:

private void handleAnr(String filepath) {
    Date anrTime = new Date();

    // check ANR time interval
    if (anrTime.getTime() - lastTime < anrTimeoutMs) {
        return;
    }

    // check process error state
    if (this.checkProcessState) {
        if (!Util.checkProcessAnrState(this.ctx, anrTimeoutMs)) {
            return;
        }
    }

    // create log file
    logFile = FileManager.getInstance().createLogFile(logPath);

    // write info to log file
    // write emergency info
    raf.write(emergency.getBytes("UTF-8"));

    // write logcat
    raf.write(Util.getLogcat(logcatMainLines, logcatSystemLines, logcatEventsLines).getBytes("UTF-8"));

    // write fds
    raf.write(Util.getFds().getBytes("UTF-8"));

    // write network info
    raf.write(Util.getNetworkInfo().getBytes("UTF-8"));

    // write memory info
    raf.write(Util.getMemoryInfo().getBytes("UTF-8"));

    // callback
    if (callback != null) {
        try {
            callback.onCrash(logFile == null ? null : logFile.getAbsolutePath(), emergency);
        } catch (Exception ignored) {
        }
    }
}

The key step here is checkProcessAnrState. It calls ActivityManager.getProcessesInErrorState(), an API exposed by AMS that filters the crashed and ANR processes out of AMS's mLruProcesses and returns the corresponding error information, such as the reason and the "ANR in ..." message.

static boolean checkProcessAnrState(Context ctx, long timeoutMs) {
    ActivityManager am = (ActivityManager) ctx.getSystemService(Context.ACTIVITY_SERVICE);
    if (am == null) return false;

    int pid = android.os.Process.myPid();
    long poll = timeoutMs / 500;
    for (int i = 0; i < poll; i++) {
        List<ActivityManager.ProcessErrorStateInfo> processErrorList = am.getProcessesInErrorState();
        if (processErrorList != null) {
            for (ActivityManager.ProcessErrorStateInfo errorStateInfo : processErrorList) {
                if (errorStateInfo.pid == pid && errorStateInfo.condition == ActivityManager.ProcessErrorStateInfo.NOT_RESPONDING) {
                    return true;
                }
            }
        }

        try {
            Thread.sleep(500);
        } catch (Exception ignored) {
        }
    }

    return false;
}
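The loop in checkProcessAnrState is a bounded poll: check a condition, sleep for a fixed interval, and give up after timeout / interval attempts. A language-neutral sketch of that pattern in C follows; demo_check_fn, demo_poll_until, and demo_true_on_third_call are hypothetical names, and the predicate is only a stand-in for the query to the system.

```c
#include <stdbool.h>
#include <time.h>

typedef bool (*demo_check_fn)(void *ctx); /* hypothetical predicate type */

/* Calls check(ctx) every interval_ms until it returns true or the attempts run out.
 * The number of attempts is timeout_ms / interval_ms (integer division). */
static bool demo_poll_until(demo_check_fn check, void *ctx, long timeout_ms, long interval_ms)
{
    long polls = timeout_ms / interval_ms; /* same arithmetic as timeoutMs / 500 above */
    long i;

    for (i = 0; i < polls; i++)
    {
        struct timespec ts;
        if (check(ctx)) return true;
        ts.tv_sec = interval_ms / 1000;
        ts.tv_nsec = (interval_ms % 1000) * 1000000L;
        nanosleep(&ts, NULL);
    }
    return false;
}

/* Example predicate: becomes true on the third call. */
static bool demo_true_on_third_call(void *ctx)
{
    int *calls = ctx;
    return ++(*calls) >= 3;
}
```

One consequence of the integer division is worth noting: a timeout shorter than the interval yields zero attempts, so the condition is never checked at all.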

So how are ANRs captured on API level 21 and above?

//init native crash handler / ANR handler (API level >= 21)
int r = Errno.OK;
if (params.enableNativeCrashHandler || (params.enableAnrHandler && Build.VERSION.SDK_INT >= 21)) {
    r = NativeHandler.getInstance().initialize(...);
}

They are captured through NativeHandler, in the native call mentioned earlier:

r_trace = xc_trace_init(...);

The native layer registers a handler for SIGQUIT. When an ANR occurs, the process receives SIGQUIT, and the handler is called back so that ANR information can be collected.

int xc_trace_init(...)
{
    int       r;
    pthread_t thd;

    //capture SIGQUIT only for ART
    if(xc_common_api_level < 21) return 0;
    ...
    //init for JNI callback
    xc_trace_init_callback(env);

    //create event FD
    if(0 > (xc_trace_notifier = eventfd(0, EFD_CLOEXEC))) return XCC_ERRNO_SYS;

    //register signal handler
    if(0 != (r = xcc_signal_trace_register(xc_trace_handler))) goto err2;

    //create thread for dump trace
    if(0 != (r = pthread_create(&thd, NULL, xc_trace_dumper, NULL))) goto err1;
    ...
    return r;
}

Here xc_trace_notifier is an eventfd. The signal handler writes to it when the signal arrives:

static void xc_trace_handler(int sig, siginfo_t *si, void *uc)
{
    uint64_t data;

    (void)sig;
    (void)si;
    (void)uc;

    if(xc_trace_notifier >= 0)
    {
        data = 1;
        XCC_UTIL_TEMP_FAILURE_RETRY(write(xc_trace_notifier, &data, sizeof(data)));
    }
}

The xc_trace_dumper thread will then unblock and begin the dump task.

static void *xc_trace_dumper(void *arg)
{
    JNIEnv         *env = NULL;
    uint64_t        data;
    uint64_t        trace_time;
    int             fd;
    struct timeval  tv;
    char            pathname[1024];
    jstring         j_pathname;

    (void)arg;

    pthread_detach(pthread_self());

    JavaVMAttachArgs attach_args = {
        .version = XC_JNI_VERSION,
        .name    = "xcrash_trace_dp",
        .group   = NULL
    };
    if(JNI_OK != (*xc_common_vm)->AttachCurrentThread(xc_common_vm, &env, &attach_args)) goto exit;

    while(1)
    {
        //block here, waiting for sigquit
        XCC_UTIL_TEMP_FAILURE_RETRY(read(xc_trace_notifier, &data, sizeof(data)));

        //check if process already crashed
        if(xc_common_native_crashed || xc_common_java_crashed) break;

        //trace time
        if(0 != gettimeofday(&tv, NULL)) break;
        trace_time = (uint64_t)(tv.tv_sec) * 1000 * 1000 + (uint64_t)tv.tv_usec;

        //Keep only one current trace.
        if(0 != xc_trace_logs_clean()) continue;

        //create and open log file
        if((fd = xc_common_open_trace_log(pathname, sizeof(pathname), trace_time)) < 0) continue;

        //write header info
        if(0 != xc_trace_write_header(fd, trace_time)) goto end;

        //write trace info from ART runtime
        if(0 != xcc_util_write_format(fd, XCC_UTIL_THREAD_SEP"Cmd line: %s\n", xc_common_process_name)) goto end;
        if(0 != xcc_util_write_str(fd, "Mode: ART DumpForSigQuit\n")) goto end;
        if(0 != xc_trace_load_symbols())
        {
            if(0 != xcc_util_write_str(fd, "Failed to load symbols.\n")) goto end;
            goto skip;
        }
        if(0 != xc_trace_check_address_valid())
        {
            if(0 != xcc_util_write_str(fd, "Failed to check runtime address.\n")) goto end;
            goto skip;
        }
        if(dup2(fd, STDERR_FILENO) < 0)
        {
            if(0 != xcc_util_write_str(fd, "Failed to duplicate FD.\n")) goto end;
            goto skip;
        }

        xc_trace_dump_status = XC_TRACE_DUMP_ON_GOING;
        if(sigsetjmp(jmpenv, 1) == 0)
        {
            if(xc_trace_is_lollipop) xc_trace_libart_dbg_suspend();
            xc_trace_libart_runtime_dump(*xc_trace_libart_runtime_instance, xc_trace_libcpp_cerr);
            if(xc_trace_is_lollipop) xc_trace_libart_dbg_resume();
        }
        else
        {
            fflush(NULL);
            XCD_LOG_WARN("longjmp to skip dumping trace\n");
        }

        dup2(xc_common_fd_null, STDERR_FILENO);

    skip:
        if(0 != xcc_util_write_str(fd, "\n"XCC_UTIL_THREAD_END"\n")) goto end;

        //write other info
        if(0 != xcc_util_record_logcat(fd, xc_common_process_id, xc_common_api_level, xc_trace_logcat_system_lines, xc_trace_logcat_events_lines, xc_trace_logcat_main_lines)) goto end;
        if(xc_trace_dump_fds)
            if(0 != xcc_util_record_fds(fd, xc_common_process_id)) goto end;
        if(xc_trace_dump_network_info)
            if(0 != xcc_util_record_network_info(fd, xc_common_process_id, xc_common_api_level)) goto end;
        if(0 != xcc_meminfo_record(fd, xc_common_process_id)) goto end;

    end:
        //close log file
        xc_common_close_trace_log(fd);

        //rethrow SIGQUIT to ART Signal Catcher
        if(xc_trace_rethrow && (XC_TRACE_DUMP_ART_CRASH != xc_trace_dump_status)) xc_trace_send_sigquit();
        xc_trace_dump_status = XC_TRACE_DUMP_END;

        //JNI callback
        //Do we need to implement an emergency buffer for disk exhausted?
        if(NULL == xc_trace_cb_method) continue;
        if(NULL == (j_pathname = (*env)->NewStringUTF(env, pathname))) continue;
        (*env)->CallStaticVoidMethod(env, xc_common_cb_class, xc_trace_cb_method, j_pathname, NULL);
        XC_JNI_IGNORE_PENDING_EXCEPTION();
        (*env)->DeleteLocalRef(env, j_pathname);
    }

    (*xc_common_vm)->DetachCurrentThread(xc_common_vm);

 exit:
    xc_trace_notifier = -1;
    close(xc_trace_notifier);
    return NULL;
}
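One detail in the dumper is easy to miss: it points STDERR_FILENO at the log file with dup2 before calling the runtime dump, and points it at xc_common_fd_null afterwards, so that whatever the runtime prints to its error stream ends up in the trace file. The redirection trick can be sketched on its own as follows. demo_with_stderr_redirected and demo_dump are hypothetical names, and unlike the code above this sketch restores the original stderr rather than a null device.

```c
#include <stdio.h>
#include <unistd.h>

/* Runs dump() with stderr temporarily pointing at `fd`, then restores stderr.
 * Returns 0 on success, -1 on failure. */
static int demo_with_stderr_redirected(int fd, void (*dump)(void))
{
    int saved;

    fflush(stderr);
    if ((saved = dup(STDERR_FILENO)) < 0) return -1;   /* remember the original stderr */
    if (dup2(fd, STDERR_FILENO) < 0)
    {
        close(saved);
        return -1;
    }

    dump();          /* anything written to stderr now lands in fd */
    fflush(stderr);

    if (dup2(saved, STDERR_FILENO) < 0)                /* put the original stderr back */
    {
        close(saved);
        return -1;
    }
    close(saved);
    return 0;
}

/* Stand-in for a routine that only knows how to print to stderr. */
static void demo_dump(void)
{
    fputs("demo trace line\n", stderr);
}
```

After demo_with_stderr_redirected(fd, demo_dump) returns, the file behind fd contains the line that demo_dump printed, while later writes to stderr go to the original destination again.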