This article presents an effective fix for the WebView multi-process crash.

The Yellow River climbs far into the white clouds; a lonely fortress stands among mountains ten thousand feet high. Why should the Qiang flute lament the willows? The spring breeze never passes Yumen Pass. – Wang Zhihuan

The problem

On Android 9.0 (API 28) and later, if an app uses WebView in more than one process, each child process must call the official API to set a suffix for its WebView data directory:

WebView.setDataDirectorySuffix(suffix);

Otherwise the following error will be reported:

Using WebView from more than one process at once with the same data directory is not supported. https://crbug.com/558377
    com.android.webview.chromium.WebViewChromiumAwInit.startChromiumLocked(WebViewChromiumAwInit.java:63)
    com.android.webview.chromium.WebViewChromiumAwInitForP.startChromiumLocked(WebViewChromiumAwInitForP.java:3)
    com.android.webview.chromium.WebViewChromiumAwInit$3.run(WebViewChromiumAwInit.java:3)
    android.os.Handler.handleCallback(Handler.java:873)
    android.os.Handler.dispatchMessage(Handler.java:99)
    android.os.Looper.loop(Looper.java:220)
    android.app.ActivityThread.main(ActivityThread.java:7437)
    java.lang.reflect.Method.invoke(Native Method)
    com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:500)
    com.android.internal.os.ZygoteInit.main(ZygoteInit.java:865)

After we adopted the official API, the crash rate dropped only partially. Bugly still received a large number of reports for this crash from production, enough to place it among our top three crashes.

Problem analysis

Tracing the call chain in the source shows that it ends in the lock method of the AwDataDirLock class.

public class WebViewChromiumAwInit {
    protected void startChromiumLocked() {
        ...
        AwBrowserProcess.start();
        ...
    }
}

public final class AwBrowserProcess {
    public static void start() {
        ...
        AwDataDirLock.lock(appContext);
    }
}

AwDataDirLock.java

abstract class AwDataDirLock {
    private static final String TAG = "AwDataDirLock";
    private static final String EXCLUSIVE_LOCK_FILE = "webview_data.lock";
    // This results in a maximum wait time of 1.5s
    private static final int LOCK_RETRIES = 16;
    private static final int LOCK_SLEEP_MS = 100;
    private static RandomAccessFile sLockFile;
    private static FileLock sExclusiveFileLock;

    static void lock(final Context appContext) {
        try (ScopedSysTraceEvent e1 = ScopedSysTraceEvent.scoped("AwDataDirLock.lock");
             StrictModeContext ignored = StrictModeContext.allowDiskWrites()) {
            if (sExclusiveFileLock != null) {
                // We have already called lock() and successfully acquired the lock in this process.
                // This shouldn't happen, but is likely to be the result of an app catching an
                // exception thrown during initialization and discarding it, causing us to later
                // attempt to initialize WebView again. There's no real advantage to failing the
                // locking code when this happens; we may as well count this as the lock being
                // acquired and let init continue (though the app may experience other problems
                // later).
                return;
            }
            // If we already called lock() but didn't succeed in getting the lock, it's possible the
            // app caught the exception and tried again later. As above, there's no real advantage
            // to failing here, so only open the lock file if we didn't already open it before.
            if (sLockFile == null) {
                String dataPath = PathUtils.getDataDirectory();
                File lockFile = new File(dataPath, EXCLUSIVE_LOCK_FILE);
                try {
                    // Note that the file is kept open intentionally.
                    sLockFile = new RandomAccessFile(lockFile, "rw");
                } catch (IOException e) {
                    // Failing to create the lock file is always fatal; even if multiple processes
                    // are using the same data directory we should always be able to access the file
                    // itself.
                    throw new RuntimeException("Failed to create lock file " + lockFile, e);
                }
            }
            // Android versions before 11 have edge cases where a new instance of an app process can
            // be started while an existing one is still in the process of being killed. This can
            // still happen on Android 11+ because the platform has a timeout for waiting, but it's
            // much less likely. Retry the lock a few times to give the old process time to fully go
            // away.
            for (int attempts = 1; attempts <= LOCK_RETRIES; ++attempts) {
                try {
                    sExclusiveFileLock = sLockFile.getChannel().tryLock();
                } catch (IOException e) {
                    // Older versions of Android incorrectly throw IOException when the flock()
                    // call fails with EAGAIN, instead of returning null. Just ignore it.
                }
                if (sExclusiveFileLock != null) {
                    // We got the lock; write out info for debugging.
                    writeCurrentProcessInfo(sLockFile);
                    return;
                }
                // If we're not out of retries, sleep and try again.
                if (attempts == LOCK_RETRIES) break;
                try {
                    Thread.sleep(LOCK_SLEEP_MS);
                } catch (InterruptedException e) {
                }
            }
            // We failed to get the lock even after retrying.
            // Many existing apps rely on this even though it's known to be unsafe.
            // Make it fatal when on P for apps that target P or higher
            String error = getLockFailureReason(sLockFile);
            boolean dieOnFailure = Build.VERSION.SDK_INT >= Build.VERSION_CODES.P
                    && appContext.getApplicationInfo().targetSdkVersion >= Build.VERSION_CODES.P;
            if (dieOnFailure) {
                throw new RuntimeException(error);
            } else {
                Log.w(TAG, error);
            }
        }
    }

    private static void writeCurrentProcessInfo(final RandomAccessFile file) {
        try {
            // Truncate the file first to get rid of old data.
            file.setLength(0);
            file.writeInt(Process.myPid());
            file.writeUTF(ContextUtils.getProcessName());
        } catch (IOException e) {
            // Don't crash just because something failed here, as it's only for debugging.
            Log.w(TAG, "Failed to write info to lock file", e);
        }
    }

    private static String getLockFailureReason(final RandomAccessFile file) {
        final StringBuilder error = new StringBuilder("Using WebView from more than one process at "
                + "once with the same data directory is not supported. https://crbug.com/558377 "
                + ": Current process ");
        error.append(ContextUtils.getProcessName());
        error.append(" (pid ").append(Process.myPid()).append("), lock owner ");
        try {
            int pid = file.readInt();
            String processName = file.readUTF();
            error.append(processName).append(" (pid ").append(pid).append(")");
            // Check the status of the pid holding the lock by sending it a null signal.
            // This doesn't actually send a signal, just runs the kernel access checks.
            try {
                Os.kill(pid, 0);
                // No exception means the process exists and has the same uid as us, so is
                // probably an instance of the same app. Leave the message alone.
            } catch (ErrnoException e) {
                if (e.errno == OsConstants.ESRCH) {
                    // pid did not exist - the lock should have been released by the kernel,
                    // so this process info is probably wrong.
                    error.append(" doesn't exist!");
                } else if (e.errno == OsConstants.EPERM) {
                    // pid existed but didn't have the same uid as us.
                    // Most likely the pid has just been recycled for a new process
                    error.append(" pid has been reused!");
                } else {
                    // EINVAL is the only other documented return value for kill(2) and should never
                    // happen for signal 0, so just complain generally.
                    error.append(" status unknown!");
                }
            }
        } catch (IOException e) {
            // We'll get IOException if we failed to read the pid and process name; e.g. if the
            // lockfile is from an old version of WebView or an IO error occurred somewhere.
            error.append(" unknown");
        }
        return error.toString();
    }
}
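
The listing shows that the lock file doubles as a small debug record: writeCurrentProcessInfo stores the pid with writeInt and the process name with writeUTF, and getLockFailureReason reads them back in the same order. The following is a minimal plain-Java sketch of that round trip, not part of WebView itself. The class name LockFileInfo, the temp file, and the sample pid and process name are assumptions made for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical helper for illustration; not part of the Chromium source.
class LockFileInfo {
    /** Writes a pid and process name with the same calls writeCurrentProcessInfo uses. */
    static void write(File file, int pid, String processName) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            raf.setLength(0);           // drop any old record first
            raf.writeInt(pid);          // 4-byte big-endian pid
            raf.writeUTF(processName);  // length-prefixed modified UTF-8 name
        }
    }

    /** Reads the record back in the order getLockFailureReason reads it. */
    static String describe(File file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            int pid = raf.readInt();
            String processName = raf.readUTF();
            return processName + " (pid " + pid + ")";
        }
    }

    public static void main(String[] args) throws IOException {
        // Sample values are made up; a real file lives in the WebView data directory.
        File file = File.createTempFile("webview_data", ".lock");
        file.deleteOnExit();
        write(file, 1234, "com.example.app:web");
        System.out.println(describe(file)); // prints "com.example.app:web (pid 1234)"
    }
}
```

Reading a real webview_data.lock this way is only useful for debugging, since the record describes whichever process last acquired the lock.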

The lock method tries to lock the webview_data.lock file in the WebView data directory, making up to 16 attempts in a for loop with a 100 ms sleep between attempts. The comment explains why: in an edge case, a new process can start while the old one is still being killed, which seems to be a headache for Google's engineers as well. If the lock is acquired, the process ID and process name are written to the file. If every attempt fails, an exception is thrown when the device runs Android 9.0 or later and the app targets API 28 or higher; otherwise only a warning is logged. So on Android 9.0 and later, WebView detects a shared data directory by checking whether another process already holds the lock on webview_data.lock in that directory. If a child process tries to lock the same file, the app crashes.
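
The retry behaviour can be reproduced outside Android with a small plain-Java sketch. It is an illustration under assumptions, not the Chromium implementation. The class RetryingFileLock and its method names are hypothetical. Because the demo runs in a single JVM, it catches OverlappingFileLockException to stand in for a lock held by another process, which is not something the Chromium code does.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;

// Hypothetical demo class; mirrors the shape of the retry loop in AwDataDirLock.lock.
class RetryingFileLock {
    /** Longest time spent sleeping: no sleep follows the final attempt. */
    static long maxWaitMs(int retries, long sleepMs) {
        return (retries - 1) * sleepMs;
    }

    /** Tries to lock the file up to {@code retries} times; returns null if all attempts fail. */
    static FileLock tryLockWithRetries(RandomAccessFile file, int retries, long sleepMs)
            throws InterruptedException {
        for (int attempt = 1; attempt <= retries; ++attempt) {
            try {
                FileLock lock = file.getChannel().tryLock();
                if (lock != null) {
                    return lock;
                }
            } catch (IOException | OverlappingFileLockException e) {
                // Assumption for this demo: treat both as "not acquired" and retry.
                // OverlappingFileLockException means this same JVM already holds the lock.
            }
            if (attempt < retries) {
                Thread.sleep(sleepMs);
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(maxWaitMs(16, 100)); // prints 1500, matching the 1.5s comment

        File f = File.createTempFile("webview_data", ".lock");
        f.deleteOnExit();
        try (RandomAccessFile holder = new RandomAccessFile(f, "rw");
             RandomAccessFile contender = new RandomAccessFile(f, "rw")) {
            FileLock first = tryLockWithRetries(holder, 16, 100);
            // While the first lock is held, a second channel cannot acquire it.
            FileLock second = tryLockWithRetries(contender, 2, 10);
            System.out.println("first acquired: " + (first != null));
            System.out.println("second acquired: " + (second != null));
            if (first != null) {
                first.release();
            }
        }
    }
}
```

With 16 attempts and a 100 ms sleep, only 15 sleeps occur, which is where the 1.5 s maximum wait in the source comment comes from.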

The solution

Most phones today restart an app automatically after it crashes. We suspect that when the operating system is slow, the new process starts while the old one is still being killed, as the comment describes, so it fails to acquire the file lock and crashes. The file is used only as a lock for detecting whether several processes share the WebView data directory, and the process information is rewritten every time the lock is acquired. So at application startup we can try to lock the file ourselves. If locking fails, we delete the file and recreate it; if it succeeds, we release the lock immediately. In theory, WebView can then acquire the lock when it tries, and the crash is avoided.

private static void handleWebviewDir(Context context) {
    if (Build.VERSION.SDK_INT < Build.VERSION_CODES.P) {
        return;
    }
    try {
        String suffix = "";
        // getProcessName(context) is the app's own helper; it is not shown here.
        String processName = getProcessName(context);
        if (!TextUtils.equals(context.getPackageName(), processName)) {
            // Child process: give it its own WebView data directory.
            suffix = TextUtils.isEmpty(processName) ? context.getPackageName() : processName;
            WebView.setDataDirectorySuffix(suffix);
            suffix = "_" + suffix;
        }
        tryLockOrRecreateFile(context, suffix);
    } catch (Exception e) {
        e.printStackTrace();
    }
}

@TargetApi(Build.VERSION_CODES.P)
private static void tryLockOrRecreateFile(Context context, String suffix) {
    String sb = context.getDataDir().getAbsolutePath()
            + "/app_webview" + suffix + "/webview_data.lock";
    File file = new File(sb);
    if (file.exists()) {
        try {
            FileLock tryLock = new RandomAccessFile(file, "rw").getChannel().tryLock();
            if (tryLock != null) {
                // Lock acquired: release it right away so WebView can take it later.
                tryLock.close();
            } else {
                // Lock is held elsewhere: delete the file and recreate it.
                createFile(file, file.delete());
            }
        } catch (Exception e) {
            e.printStackTrace();
            boolean deleted = false;
            if (file.exists()) {
                deleted = file.delete();
            }
            createFile(file, deleted);
        }
    }
}

private static void createFile(File file, boolean deleted) {
    try {
        // Recreate the file only if the old one was actually removed.
        if (deleted && !file.exists()) {
            file.createNewFile();
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}
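
The code above locates the lock file by assuming that the main process uses the app_webview directory and that a child process uses app_webview_ followed by the suffix passed to setDataDirectorySuffix. The sketch below isolates that path logic in plain Java so it can be checked without a device. The class WebViewLockPath, its method names, and the sample package and directory names are hypothetical.

```java
// Hypothetical helper for illustration; it mirrors the suffix logic in handleWebviewDir.
class WebViewLockPath {
    /** Returns "" for the main process, or "_" plus the data directory suffix for a child. */
    static String directorySuffix(String packageName, String processName) {
        if (packageName.equals(processName)) {
            return ""; // main process keeps the default directory
        }
        // Same fallback as the code above: an empty process name falls back to the package name.
        String suffix = (processName == null || processName.isEmpty()) ? packageName : processName;
        return "_" + suffix;
    }

    /** Builds the lock file path under the app's data directory. */
    static String lockFilePath(String dataDir, String packageName, String processName) {
        return dataDir + "/app_webview" + directorySuffix(packageName, processName)
                + "/webview_data.lock";
    }

    public static void main(String[] args) {
        String dataDir = "/data/user/0/com.example.app"; // sample value, not from the article
        System.out.println(lockFilePath(dataDir, "com.example.app", "com.example.app"));
        // prints /data/user/0/com.example.app/app_webview/webview_data.lock
        System.out.println(lockFilePath(dataDir, "com.example.app", "com.example.app:web"));
        // prints /data/user/0/com.example.app/app_webview_com.example.app:web/webview_data.lock
    }
}
```

Keeping the path logic in one pure function makes it easier to confirm that the file being probed at startup is the same one WebView will lock for that process.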

After this fix shipped, the number of crashes caused by this problem dropped by more than 90%. Perhaps Google's engineers should consider a different way to detect whether an app's WebView data directory is shared by multiple processes.