preface

As an Android developer, whether App or System developer, you will often encounter some scenarios for analyzing logs. This article will introduce the analysis ideas when Watdog restarts triggered by Android System through a scenario. The Watchdog is commonly known as the Watchdog. If some Android system services are abnormal, the Watchdog is triggered and the system restarts.

The scene is introduced

QA reported a system restart Bug, is the TV through Hdmi connection box, when entering the Hdmi page, the system crashed, since the system crash is generally the system service is down, the system service is down again will trigger the Watchdog. Check the Android Log to see if WatDog triggers a search for Watchdog, or the GOODBYE keyword.

07-26 06:34:10.100 2112 2135 W Watchdog: *** Watchdog KILLING SYSTEM PROCESS: Blocked in handler on main thread (main) 07-26 06:34:10.105 2112 2135 W Watchdog: main annotated stack trace: 07-26 06:34:10.106 2112 2135 W Watchdog: At com. Android. Server. TV. TvInputHardwareManager. GetHardwareList (TvInputHardwareManager. Java: 250) 07-26 06:34:10. 106, 2112 2135 W Watchdog: -waiting to lock <0x0c25fdf3> (a java.lang.object) 07-26 06:34:10.107 2112 2135 W Watchdog: -waiting to lock < 0x0c25Fdf3 > (a java.lang.object) 07-26 06:34:10.107 2112 2135 W Watchdog: at com.android.server.tv.TvInputManagerService$InputServiceConnection.onServiceConnected(TvInputManagerService.java:2419) 07-26 06:34:10.107 2112 2135 W Watchdog: -locked < 0x0C7F5ab0 > (a java.lang.object) 07-26 06:34:10.108 2112 2135 W Watchdog: . An android app. LoadedApk $ServiceDispatcher. DoConnected (LoadedApk. Java: 1954) 07-26 06:34:10. 108, 2112, 2135 W Watchdog: At android.app.loadedapk $ServiceDispatcher$runconnection.run (loadedapk.java :1986) 07-26 06:34:10.109 2112 2135 W Watchdog: at the android OS. Handler. HandleCallback (Handler. Java: 938) 07-26 06:34:10. 109, 2112, 2135 W Watchdog: . An android OS. Handler. DispatchMessage (Handler. Java: 99) 07-26 06:34:10. 109, 2112, 2135 W Watchdog: At android.os.looper.loop (looper.java :223) 07-26 06:34:10.110 2112 2135 W Watchdog: At com. Android. Server. SystemServer. Run (669) SystemServer. Java: 07-26 06:34:10. 110, 2112, 2135 W Watchdog: At com. Android. Server. SystemServer. Main (413) SystemServer. Java: 07-26 06:34:10. 111, 2112, 2135 W Watchdog: The at Java. Lang. Reflect. Method. Invoke (Native Method) 07-26 06:34:10. 111, 2112, 2135 W Watchdog: At com. Android. Internal. OS. RuntimeInit $MethodAndArgsCaller. Run (592) RuntimeInit. Java: 07-26 06:34:10. 111, 2112, 2135 W Watchdog: ats com. Android. Internal. OS. ZygoteInit. Main (ZygoteInit. Java: 1060) 07-26 06:34:10. 112, 2112, 2135 W Watchdog: *** GOODBYE! 07-26 06:34:10.112 2112 2135 I Process: Sending signal. PID: 2112 SIG: 9Copy the code

Above TvInputHardwareManager. GetHardwareList method is invoked when block, from the next line log learned he was the cause of the block is waiting for a lock.

07-26 06:34:10.106 2112 2135 W Watchdog: At com. Android. Server. TV. TvInputHardwareManager. GetHardwareList (TvInputHardwareManager. Java: 250) 07-26 06:34:10. 106, 2112  2135 W Watchdog: - waiting to lock <0x0c25fdf3> (a java.lang.Object)Copy the code
   public List<TvInputHardwareInfo> getHardwareList(a) {
        synchronized (mLock) {
            returnCollections.unmodifiableList(mHardwareList); }}Copy the code

So who owns the lock? If you have an anR, you need to analyze the trace file. This trace file is in the /data/anr/ directory. We know the following scenarios to trigger ANR, dump trace files:

  • Service Timeout: Indicates that the foreground Service is not executed within 20 seconds
  • BroadcastQueue Timeout: Indicates that the foreground broadcast is not executed within 10 seconds
  • ContentProvider Timeout: specifies the ContentProvider that publishes 10 seconds
  • InputDispatching Timeout: input events are dispatching 5 seconds, including key and touch events

There is another case where trace files are generated, and that is when the Framework Watchdog is triggered. Take a look at the run method of watchdog.java, which triggers an operation to dump the trace when the condition is met.

//frameworks/base/services/core/java/com/android/server/Watchdog.java
@Override
    public void run(a) {...if(! fdLimitTriggered) {final int waitState = evaluateCheckerCompletionLocked();
                    if (waitState == COMPLETED) {
                        // The monitors have returned; reset
                        waitedHalf = false;
                        continue;
                    } else if (waitState == WAITING) {
                        // still waiting but within their configured intervals; back off and recheck
                        continue;
                    } else if (waitState == WAITED_HALF) {
                        if(! waitedHalf) { Slog.i(TAG,"WAITED_HALF");
                            // We've waited half the deadlock-detection interval. Pull a stack
                            // trace and wait another half.
                            ArrayList<Integer> pids = new ArrayList<>(mInterestingJavaPids);
                          / / dump trace files
                            ActivityManagerService.dumpStackTraces(pids, null.null,
                                    getInterestingNativePids(), null);
                            waitedHalf = true;
                        }
                        continue; }...}Copy the code

When a dump trace is triggered, the following logs are displayed:

07-26 06:33:59.485 2112 2135 I ActivityManager: reloading to /data/ ANR/ANR_2021-07-26-06-33-59-483Copy the code

Analyzing trace files

"main" prio=5 tid=1 Blocked | group="main" sCount=1 dsCount=0 flags=1 obj=0x71684c28 self=0xe11c5410 | sysTid=2112 nice=-2 cgrp=default sched=0/0 handle=0xe9762470 | state=S schedstat=( 0 0 0 ) utm=377 stm=96 core=1 HZ=100 | stack=0xff11b000-0xff11d000 stackSize=8192KB | held mutexes= at com.android.server.tv.TvInputHardwareManager.getHardwareList(TvInputHardwareManager.java:250) - waiting to lock <0x0c25fdf3> (a java.lang.Object) held by thread 100 at com.android.server.tv.TvInputManagerService$InputServiceConnection.onServiceConnected(TvInputManagerService.java:2419) -  locked <0x0c7f5ab0> (a java.lang.Object) at android.app.LoadedApk$ServiceDispatcher.doConnected(LoadedApk.java:1954) at  android.app.LoadedApk$ServiceDispatcher$RunConnection.run(LoadedApk.java:1986) at android.os.Handler.handleCallback(Handler.java:938) at android.os.Handler.dispatchMessage(Handler.java:99) at android.os.Looper.loop(Looper.java:223) at com.android.server.SystemServer.run(SystemServer.java:669) at com.android.server.SystemServer.main(SystemServer.java:413) at java.lang.reflect.Method.invoke(Native method) at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:592) at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1060)Copy the code

This stack we have seen in the Android log, the reason for the block is to wait for the lock 0x0c25fdf3, so let’s search for the lock 0x0c25fdf3.

"Binder:2112_6" prio=5 tid=100 Blocked
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x135c0808 self=0xb9698810
  | sysTid=2491 nice=0 cgrp=default sched=0/0 handle=0xb478b1c0
  | state=S schedstat=( 0 0 0 ) utm=32 stm=10 core=1 HZ=100
  | stack=0xb4690000-0xb4692000 stackSize=1008KB
  | held mutexes=
  at com.android.server.tv.TvInputHardwareManager$TvInputHardwareImpl.release(TvInputHardwareManager.java:847)
  - waiting to lock <0x0234b2e5> (a java.lang.Object) held by thread 11
  at com.android.server.tv.TvInputHardwareManager$Connection.resetLocked(TvInputHardwareManager.java:652)
  at com.android.server.tv.TvInputHardwareManager$Connection.binderDied(TvInputHardwareManager.java:722)
  - locked <0x0c25fdf3> (a java.lang.Object)
  at android.os.IBinder$DeathRecipient.binderDied(IBinder.java:305)
  at android.os.BinderProxy.sendDeathNotice(BinderProxy.java:654)
Copy the code

From 0 x0c25fdf3 found trace above the Connection. The binderDied held this method, and this method is called block reason is that it is the TvInputHardwareImpl. The release method, The TvInputHardwareImpl. Another lock release method was 0 x0234b2e5block lived, then 0 x0234b2e5 who lock is held?

"Binder:2112_2" prio=5 tid=11 Native | group="main" sCount=1 dsCount=0 flags=1 obj=0x131c03c8 self=0xe11c6210 | sysTid=2127 nice=0 cgrp=default sched=0/0 handle=0xbd2401c0 | state=S schedstat=( 0 0 0 ) utm=73 stm=20 core=3 HZ=100 | stack=0xbd145000-0xbd147000 stackSize=1008KB | held mutexes= native: #00 pc 000715e4 /apex/com.android.runtime/lib/bionic/libc.so (__ioctl+8) native: #01 pc 0003f2a3 /apex/com.android.runtime/lib/bionic/libc.so (ioctl+30) native: #02 pc 0006d78f /system/lib/libhidlbase.so (android::hardware::IPCThreadState::transact(int, unsigned int, android::hardware::Parcel const&, android::hardware::Parcel*, unsigned int)+438) native: #03 pc 0006a265 /system/lib/libhidlbase.so (android::hardware::BpHwBinder::transact(unsigned int, android::hardware::Parcel const&, android::hardware::Parcel*, unsigned int, std::__1::function<void (android::hardware::Parcel&)>)+36) native: # 4 PCS 0000 c3f9 / system/lib/[email protected] (android::hardware::tv::input::V1_0::BpHwTvInput::_hidl_closeStream(android::hardware::IInterface*, android::hardware::details::HidlInstrumentor*, int, int)+228) native: # 5 PCS 0000 c7c5 / system/lib/[email protected] (android::hardware::tv::input::V1_0::BpHwTvInput::closeStream(int, int)+36) native: #06 pc 00069801 /system/lib/libandroid_servers.so (android::JTvInputHal::removeStream(int, int)+440) at com.android.server.tv.TvInputHal.nativeRemoveStream(Native method) at com.android.server.tv.TvInputHal.removeStream(TvInputHal.java:111) - locked <0x08f771dc> (a java.lang.Object) at com.android.server.tv.TvInputHardwareManager$TvInputHardwareImpl.setSurface(TvInputHardwareManager.java:872) - locked <0x0234b2e5> (a java.lang.Object) at android.media.tv.ITvInputHardware$Stub.onTransact(ITvInputHardware.java:136) at android.os.Binder.execTransactInternal(Binder.java:1154) at android.os.Binder.execTransact(Binder.java:1123)Copy the code

The 0 x0234b2e5 lock is TvInputHardwareImpl. SetSurface method to hold the, So he is the reason of the block to be: a _hidl_closeStream method in [email protected] got caught on a block.

native: # 4 PCS 0000 c3f9 / system/lib/[email protected] (android::hardware::tv::input::V1_0::BpHwTvInput::_hidl_closeStream(android::hardware::IInterface*, android::hardware::details::HidlInstrumentor*, int, int)+228)Copy the code

I won’t go any further because I’m going to get into company-owned code, but the idea is clear, and combining the stack and source location issues in Trace shouldn’t be a problem.

conclusion

Whether it is a common ANR problem or a restart that triggers the Watchdog, the analysis method is similar, that is, first look at the Android log, and then in the analysis of the trace file, step by step to dig down. WatDog, mentioned in the above introduction, is actually a very important mechanism in the Android Framework, and its implementation will be introduced in the following article.