Introduction

The previous articles in this series introduced the design principles of ANR in the Android system and some of the approaches we developed while monitoring ANR in practice. With this conventional monitoring and governance in place, ANR problems were effectively suppressed. However, some system components are designed in ways that diverge from how developers actually use them, which creates problems that still have to be solved. This article looks at ANRs caused by the overuse of SP in real-world development, and at how to avoid them by bypassing, at the system level, a design flaw in Google's implementation.

Google originally implemented SharedPreference (SP) as a lightweight data persistence scheme for the convenience of developers. Because its API is simple and easy to use, developers have come to depend on it more and more. As applications iterate through versions, it becomes clear why Google calls it a lightweight data store: the heavier the application's business logic, the worse the ANR problem. This article analyzes the loading, parsing and writing of SP files at the source level, and examines the causes of the resulting ANRs and the related optimizations.

Analysis of ANRs caused by SP

Two types of ANR problems related to SharedPreference come up frequently. Their causes and optimizations are described below.

Problem 1: After an SP instance is created, a separate thread loads and parses the corresponding SP file. If the UI thread tries to access the SP before the file has been fully loaded and parsed into memory, the UI thread blocks until loading completes. The ANR thread stack is as follows:

The main cause is that the SP file has not yet been loaded and parsed into memory, so the interfaces provided by SP cannot be used yet. When an SP instance is created, startLoadFromDisk() is executed, which starts a thread to load the corresponding SP file.

In startLoadFromDisk(), the SP is marked as not loaded, and any thread that reads or writes it blocks until the SP file has been fully loaded and parsed.

When a thread reads or writes, it enters the awaitLoadedLocked() logic. In the figure above, mLoaded is false, meaning the SP file has not been parsed into memory, so the thread waits on mLock until loadFromDisk() completes.

Once the SP file has been fully loaded and parsed into memory, all reader and writer threads waiting on the current SP are woken up.
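
The load gate described above can be sketched in plain Java. This is a simplified model, not the framework source: the class name LoadGateSketch, the simulated parse delay and the helper measureFirstReadMillis are assumptions made for illustration. Only mLock, mLoaded, loadFromDisk() and awaitLoadedLocked() are names taken from the text.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified, hypothetical model of the load gate; not the framework source. */
class LoadGateSketch {
    private final Object mLock = new Object();
    private boolean mLoaded = false;
    private Map<String, String> mMap = new HashMap<>();

    /** Plays the role of startLoadFromDisk(): mark as not loaded, then parse on another thread. */
    LoadGateSketch(long simulatedParseMillis) {
        synchronized (mLock) {
            mLoaded = false;
        }
        new Thread(() -> loadFromDisk(simulatedParseMillis), "sp-load-sketch").start();
    }

    private void loadFromDisk(long simulatedParseMillis) {
        Map<String, String> parsed = new HashMap<>();
        try {
            Thread.sleep(simulatedParseMillis); // stands in for reading and parsing the file
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        parsed.put("key", "value");
        synchronized (mLock) {
            mMap = parsed;
            mLoaded = true;
            mLock.notifyAll(); // wake every reader and writer waiting on this SP
        }
    }

    /** Any read goes through the gate, so an early caller blocks until loading is done. */
    String getString(String key) {
        synchronized (mLock) {
            awaitLoadedLocked();
            return mMap.get(key);
        }
    }

    private void awaitLoadedLocked() {
        boolean interrupted = false;
        while (!mLoaded) {
            try {
                mLock.wait();
            } catch (InterruptedException e) {
                interrupted = true; // keep waiting, restore the flag afterwards
            }
        }
        if (interrupted) {
            Thread.currentThread().interrupt();
        }
    }

    /** Hypothetical helper: how long the first read blocks, in milliseconds. */
    static long measureFirstReadMillis(long simulatedParseMillis) {
        LoadGateSketch sp = new LoadGateSketch(simulatedParseMillis);
        long start = System.nanoTime();
        sp.getString("key");
        return (System.nanoTime() - start) / 1_000_000;
    }
}
```

In this model the first read blocks for roughly as long as the simulated parse takes. On Android the blocked caller would be the UI thread, which is why a large SP file read early in startup can end in an ANR.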

Problem 2: Early applications could use SP for cross-process communication, so Google's implementation tries to guarantee cross-process data integrity by ensuring that pending write tasks complete within the lifecycle of the current component. When a component is destroyed or stopped, or during certain other lifecycle events, the main thread waits for SP to finish writing to the corresponding file. As shown in the figure below, the UI thread is blocked at QueuedWork.waitToFinish(). Next, based on the source code, we trace the whole process from apply to the final file write to find the root cause of the problem.

The specific ActivityThread.H message types involved are as follows:

public static final int SERVICE_ARGS = 115;
public static final int STOP_SERVICE = 116;
public static final int PAUSE_ACTIVITY = 101;
public static final int STOP_ACTIVITY_SHOW = 103;
public static final int SLEEPING  = 137;

Because Google designed SP from the start as a lightweight data store, this waiting behavior causes no problems under the intended usage. In practice, however, SP is heavily overused, so the wait grows uncontrollably until an ANR finally occurs. The problem is more pronounced in applications with heavy business logic. The problem stack consists entirely of system frames, so we start from waitToFinish to uncover the root cause of this ANR. The specific ANR stack is as follows:

In the early days, SP offered only the commit interface, which writes the file synchronously. Because this directly affected how developers could use SP, Google provided the asynchronous apply interface. Developers assumed this interface was asynchronous in the true sense, so apply was adopted on a large scale. It is precisely the implementation of apply that leaves applications with heavy business logic badly affected by ANRs caused by its design flaw.

The overall design of the apply interface is shown below (analysis based on versions below Android 8.0):

The overall idea is as follows:

  1. sp.apply() writes to memory and obtains the MemoryCommitResult for the data set that needs to be synchronized to the file.

  2. The MemoryCommitResult is wrapped in a Runnable and handed to the child thread Queued-work-Looper.

  3. In the child thread, the key-value pairs in the MemoryCommitResult's mapToWriteToDisk are written to the file.

  4. When the file write completes, the MemoryCommitResult's setDiskWriteResult method is executed. The key step here is writtenToDiskLatch.countDown().

  5. When the main thread handles one of the messages listed above, QueuedWork.waitToFinish() is executed.

  6. What does the main thread do at this point? It takes out the Runnables that were registered through QueuedWork.add(Runnable finisher) (the specific Runnable is shown below). This Runnable does nothing except wait in mcr.writtenToDiskLatch.await(). Recall that this is the latch the child thread releases in step 4 right after writing the file.

Conclusion: although the API flow looks extremely complex, with Runnables wrapped layer upon layer and work handed from one thread to another, the overall idea is quite simple. The child thread releases the latch once it has written the file, and at certain points the main thread waits for the child thread's writes to complete. The root cause of the problem is that too many pending apply operations have not yet been written to the file, so the main thread has to wait when it handles the specified messages. If the wait is too long, an ANR occurs.
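
The apply flow summarized above can be modeled in a few lines of plain Java. This is a sketch under stated assumptions, not the framework code: the class ApplyFlowSketch, the single-thread executor standing in for the writer thread, and the simulated write delay are hypothetical. The names sPendingWorkFinishers, writtenToDiskLatch and waitToFinish follow the text.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical model of apply() plus waitToFinish(); not the framework source. */
class ApplyFlowSketch {
    // Stand-in for the queue of finishers that waitToFinish() drains.
    static final ConcurrentLinkedQueue<Runnable> sPendingWorkFinishers =
            new ConcurrentLinkedQueue<>();

    // Stand-in for the single writer thread; daemon so the JVM can exit.
    static final ExecutorService sWriter = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "queued-work-sketch");
        t.setDaemon(true);
        return t;
    });

    /** Models apply(): returns at once, but registers a finisher that waits on the latch. */
    static void apply(long simulatedWriteMillis) {
        final CountDownLatch writtenToDiskLatch = new CountDownLatch(1);
        final Runnable awaitCommit = () -> {
            try {
                writtenToDiskLatch.await(); // does nothing except wait for the write
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        sPendingWorkFinishers.add(awaitCommit);
        sWriter.execute(() -> {
            try {
                Thread.sleep(simulatedWriteMillis); // stands in for writing the file
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            writtenToDiskLatch.countDown();            // step 4: release the latch
            sPendingWorkFinishers.remove(awaitCommit); // finisher is no longer pending
        });
    }

    /** Models waitToFinish(): the caller runs every pending finisher, blocking on each. */
    static long waitToFinishMillis() {
        long start = System.nanoTime();
        Runnable toFinish;
        while ((toFinish = sPendingWorkFinishers.poll()) != null) {
            toFinish.run();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }
}
```

Calling apply(100) several times in a row and then waitToFinishMillis() shows the effect: each apply returns immediately, but the final wait grows with the number of writes still queued on the single writer thread.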

Google did optimize the SP write logic in Android 8.0 and later. The intent is that, in step 6 above, the UI thread no longer waits idly but helps the child thread with the writes. However, because the main thread only assists conservatively, this does not solve the problem well.

The solution

Problem 1: For slow loading, the most common approach is preloading: trigger the loading and parsing of the SP file ahead of time, so that by the time the SP is actually used it has most likely already been loaded and parsed. What really needs attention is that the SP files used in core scenarios must not be too large. In line with Google's guidance, SP should be used as a lightweight persistent store. Do not store too much data, or the file becomes so large that loading and parsing it up front is time-consuming.

Problem 2: As for why Google designed it this way, we offer several guesses:

  1. Google wants data to be written to the file as quickly as possible. This does not hold up, because waiting on the main thread does not make the write any faster.

  2. SP is expected to write files in real time so that other processes can read the SP file in real time. However, the asynchronous write mode itself cannot guarantee real-time behavior.

  3. Once no components remain in the process, the priority of the process may drop, and the process may be killed when system resources are tight. The probability of this is extremely low and can be ignored.

  4. The most likely reason is that Google wanted the switch from commit to apply to be seamless, still emulating the original commit behavior. The difference is that where commit waits for each write as it happens, apply defers the waiting, and the main thread finally waits once for all pending writes to complete.

Based on these guesses, we concluded that there is little value in having the main thread wait for the child thread's writes. We therefore wanted to skip this useless wait, and after studying all the logic related to SharedPreference we found the following entry point. The optimization below targets versions below Android 8.0; the approach for 8.0 and later is similar:

Making the main thread jump straight past the wait in waitToFinish by preventing toFinish.run() from executing is clearly not possible. However, if sPendingWorkFinishers.poll() returns null, the wait is skipped. sPendingWorkFinishers is a ConcurrentLinkedQueue, so it can be replaced with a proxy object that overrides the poll method to always return null. The UI thread then never waits for the child thread to finish writing the file. This has proven to be simple and effective.
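
A minimal sketch of this idea in plain Java follows. It does not touch the real framework class: FakeQueuedWork is a hypothetical stand-in whose field is deliberately non-final so the example runs on a desktop JVM, and NonBlockingFinisherQueue and FinisherQueueHook are illustrative names. On a device, the field would be the private static one inside the framework, and reflective access to it may be restricted on newer Android versions.

```java
import java.lang.reflect.Field;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical stand-in for the framework class that owns the finisher queue. */
class FakeQueuedWork {
    private static ConcurrentLinkedQueue<Runnable> sPendingWorkFinishers =
            new ConcurrentLinkedQueue<>();

    static void add(Runnable finisher) {
        sPendingWorkFinishers.add(finisher);
    }

    /** Runs every pending finisher and returns how many were run. */
    static int waitToFinish() {
        int ran = 0;
        Runnable toFinish;
        while ((toFinish = sPendingWorkFinishers.poll()) != null) {
            toFinish.run();
            ran++;
        }
        return ran;
    }
}

/** Queue whose poll() always reports "nothing pending", so the drain loop exits at once. */
class NonBlockingFinisherQueue extends ConcurrentLinkedQueue<Runnable> {
    @Override
    public Runnable poll() {
        return null;
    }
}

/** Swaps the queue via reflection; returns false instead of throwing if the hook fails. */
class FinisherQueueHook {
    @SuppressWarnings("unchecked")
    static boolean install() {
        try {
            Field field = FakeQueuedWork.class.getDeclaredField("sPendingWorkFinishers");
            field.setAccessible(true);
            ConcurrentLinkedQueue<Runnable> original =
                    (ConcurrentLinkedQueue<Runnable>) field.get(null);
            NonBlockingFinisherQueue proxy = new NonBlockingFinisherQueue();
            proxy.addAll(original); // keep already registered finishers removable
            field.set(null, proxy);
            return true;
        } catch (ReflectiveOperationException | RuntimeException e) {
            return false; // leave the original behavior untouched
        }
    }
}
```

Only poll() is overridden, so add() and remove() keep working and finishers can still be unregistered by the writer once their write completes. Returning false on failure lets the caller fall back to the original waiting behavior rather than crash.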

There is another way to solve this kind of write-wait ANR: replace the writes globally. All SP API implementations are replaced through instrumentation with another storage mechanism. This approach has higher repair cost and risk, but the storage mechanism can be swapped freely later on, which makes it more flexible.

Results

Verified across multiple ByteDance products, the scheme is stable and effective: ANRs with the corresponding stacks were eliminated, the ANR reduction is significant, and the smoothness of page transitions and similar scenarios improved noticeably.

Outlook

Google has added a new Jetpack member, DataStore, which may replace SharedPreferences in the future. DataStore should be the library developers have been waiting for. It is a new data storage solution implemented on top of Flow. Plenty of reference material is available online.

More optimization practices for reference

Toutiao ANR Optimization Practice Series (1)- Design principles and influencing factors

ANR Optimization Practice Series (2)- Monitoring tools and analysis ideas

Toutiao ANR Optimization Practice Series (3)- Case analysis collection

-barrier causes the main thread to fake death

Android Platform architecture team

We are ByteDance's Android platform architecture team. We mainly serve Toutiao, as well as GIP and other products of the company. We continually optimize and explore product performance, stability and other aspects of user experience, along with R&D process and architecture, to support rapid product iteration and the pursuit of the ultimate user experience.

If you are passionate about technology and want to take on bigger challenges on a bigger stage, you are welcome to join us. Positions are available in Beijing and Shenzhen. If you are interested, send an email to [email protected] with the subject: name – GIP-Android platform architecture.

