• Improving performance with background data prefetching
  • Posted by Instagram Engineering
  • The Nuggets Translation Project
  • Permanent link to this article: github.com/xitu/gold-m…
  • Translator: A zhe who no longer dances bravely
  • Proofread by: realYukiko, Foxxnuaa

Improving performance with background data prefetching

The Instagram community is bigger and more diverse than ever. 800 million people visit Instagram every month, and 80 percent of those visits come from outside the United States. As the community continues to grow, it becomes increasingly important that our apps keep performing well under diverse and complex network conditions, on a wide variety of devices, and with non-traditional usage patterns. The client performance optimization team in New York City is working to make Instagram smooth and responsive for every user, in every region.

Specifically, our team focuses on delivering content instantly without wasting network or disk resources. We recently decided to concentrate on efficient background prefetching, so that Instagram keeps working even when the network is unavailable or users are on limited data plans.

Problems encountered

Network availability

Internet quality in much of the world is poor. Our data scientist Michael Midling created the map above to show the average bandwidth available to Instagram users in different parts of the world. Dark green areas, such as Canada, average roughly 4 Mbps or more; by comparison, light green areas such as India average only about 1 Mbps.

When a user opens Instagram and starts viewing shared content or scrolling through a feed, we cannot assume the media is already available. To build a smooth media app for India, where networks are less developed and latency can exceed two seconds, loading data on demand is not enough; you need a different loading strategy. If we want everyone to browse videos and photos from the friends and interests they follow without interruption, we have to cope with a wide range of bandwidths. Building an app that adapts to all of these network conditions is a challenge.

Network type sensitivity

One part of our solution is to log the user’s network type. This lets us observe how different network types are used and adapt accordingly. We try to respect everyone’s data plan and shift as much data transfer as possible onto unmetered connections.
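As a rough illustration only (not Instagram’s actual logging code), detecting and recording the network type on Android could look something like this; the NetworkTypeLogger class name and the log format are hypothetical:

// Sketch: detect whether the active connection is wifi or cellular and whether
// it is metered, then log it. NetworkTypeLogger is a hypothetical helper.
import android.content.Context;
import android.net.ConnectivityManager;
import android.net.NetworkInfo;
import android.util.Log;

public class NetworkTypeLogger {
  private static final String TAG = "NetworkTypeLogger";

  public static void logNetworkType(Context context) {
    ConnectivityManager cm =
        (ConnectivityManager) context.getSystemService(Context.CONNECTIVITY_SERVICE);
    NetworkInfo info = cm.getActiveNetworkInfo();
    String type = (info == null) ? "none"
        : (info.getType() == ConnectivityManager.TYPE_WIFI) ? "wifi" : "cellular";
    boolean metered = cm.isActiveNetworkMetered();
    Log.i(TAG, "network_type=" + type + " metered=" + metered);
  }
}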

The chart above shows the network types through which people around the world access the app. In Indonesia, for example, people switch SIM cards when their data plans run low and reach us primarily over cellular networks, while in Brazil most access happens over wifi.

Network connection failure

What happens when the connection fails entirely? Previously, we displayed images and videos that failed to load as gray squares and hoped users would come back and try again when the network was better. That is not a good experience.

Spotty connectivity and cellular congestion are both concerns. When a user is in one of the low-bandwidth, light green areas on the map above, we need a way to reduce or eliminate their load-waiting time.

Our goal is for users not to notice when the network drops, but there is no one-size-fits-all solution. To deliver an offline experience across different network conditions and scenarios, we came up with the approach described below.

The solution

Our approach has several parts. First, we focused on defining the offline user experience. We then implemented a way to serve content from disk, so that cached data behaves as if it came from the network. Finally, on top of this cache architecture, we built a centralized background prefetch framework that fills the cache with content the user has not yet seen.

Principles of offline experience

Through data analysis and user research, we arrived at a set of principles that address the major pain points across regions:

  1. Offline is a state, not an error
  2. The offline experience should be seamless
  3. Build trust through clear communication

You can see how these principles play out in the video below:

App availability should not depend on the network

By caching request data, images, and videos on disk, we can still render content on the user’s screen when the network is unavailable, mimicking a successful network request.

There are three main components on the device: the screen, the network layer that builds HTTP requests, and the network engine that sends those requests to the server.
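As a minimal sketch of that idea (not the actual Instagram network layer), the disk-backed fallback could look roughly like this; every type here is a hypothetical stand-in for the components described above:

// Sketch: serve a cached response when the network fails, so the screen can
// render as if the request had succeeded. All types are hypothetical.
import java.io.IOException;

interface NetworkEngine { byte[] send(String url) throws IOException; }
interface DiskCache { byte[] get(String key); void put(String key, byte[] value); }

public class CachedRequestLayer {
  private final NetworkEngine network;
  private final DiskCache cache;

  public CachedRequestLayer(NetworkEngine network, DiskCache cache) {
    this.network = network;
    this.cache = cache;
  }

  // Try the network first and refresh the cache; on failure, fall back to
  // whatever was cached earlier (for example by a background prefetch).
  public byte[] execute(String url) throws IOException {
    try {
      byte[] fresh = network.send(url);
      cache.put(url, fresh);
      return fresh;
    } catch (IOException offline) {
      byte[] cached = cache.get(url);
      if (cached != null) {
        return cached;
      }
      throw offline;
    }
  }
}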

With the ability to pull data from disk, we saw the Instagram user experience improve in rapidly growing markets. We decided to cache network request data locally because showing previously seen content is a better experience than white screens and gray squares. Ideally, though, we would still show the latest content, and this is where background data prefetching comes in.

Architecture

At Instagram, one of our engineering slogans is “Start with the simplest,” so the first step wasn’t to build the perfect background data prefetch framework. Instead, the first version simply prefetches data while the app is in the background and only when the phone is connected to wifi. The BackgroundPrefetcher loops through its list of jobs and executes them in order.

This first prototype let us:

  1. Prefetch several kinds of content in a loop
  2. Evaluate, from a user experience perspective, the practical effect of showing users freshly cached media
  3. Establish a baseline against which to measure the final framework
public void registerJob(Runnable job) {
  mBackgroundJobs.add(job);
}

@Override
public void onAppBackgrounded() {
  // Only run the queued prefetch jobs when the app has just gone to the
  // background and the phone is connected to wifi.
  if (NetworkUtil.isConnectedWifi(AppContext.getContext())) {
    while (!mBackgroundJobs.isEmpty()) {
      mSerialExecutor.execute(mBackgroundJobs.poll());
    }
  }
}

The reality is that apps are complex and users are diverse! You have to analyze usage habits very carefully to determine what kinds of media to prefetch. For example, some users use a given feature far more often than others.

Our home page contains a wide variety of content, from popular posts to personal ones. We could also prefetch images and videos from feeds, pending direct messages, search results, and recent notifications. We decided to start simple and prefetch only search content, popular posts, and the main feed.

Building a centralized architecture that adapts flexibly to different usage scenarios keeps prefetching efficient and easy to extend.

In addition to automatically prefetching data through jobs scheduled by our framework while the app is in the background, we added extra logic on top. Centralizing the prefetch logic in one place lets us enforce rules and verify that certain conditions are met, such as the following (a rough sketch of these checks appears after the list):

  • Network connection type -> unmetered only
  • Cancellation -> we must be able to stop an in-flight background prefetch job if conditions change or the app returns to the foreground
  • Request coalescing -> find the best time to run a single prefetch between two sessions
  • Metrics collection -> how long do all the jobs take to complete? What is the success rate of scheduling and running background prefetch jobs?
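As a rough sketch of what those centralized checks might look like (a hypothetical helper, not Instagram’s framework):

// Sketch: gate background prefetching on an unmetered connection, power-saving
// mode being off, and the app being in the background. PrefetchConditions is a
// hypothetical helper class.
import android.content.Context;
import android.net.ConnectivityManager;
import android.os.PowerManager;

public class PrefetchConditions {
  public static boolean shouldPrefetch(Context context, boolean appInBackground) {
    ConnectivityManager cm =
        (ConnectivityManager) context.getSystemService(Context.CONNECTIVITY_SERVICE);
    PowerManager pm = (PowerManager) context.getSystemService(Context.POWER_SERVICE);
    boolean unmetered = cm != null && !cm.isActiveNetworkMetered();
    boolean powerSave = pm != null && pm.isPowerSaveMode();
    return appInBackground && unmetered && !powerSave;
  }
}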

Workflow

Let’s take a look at how the background data prefetch strategy works on Android:

  • When the main activity starts (that is, when the app comes to the foreground), we instantiate BackgroundWifiPrefetcherScheduler and register the jobs it should run.
  • That object registers itself as a BackgroundDetectorListener. We already have infrastructure that sends a notification whenever the app goes into the background, so that we can do work (such as sending analytics data to the server) before the process is killed.
  • When BackgroundWifiPrefetcherScheduler receives that notification, it calls our own AndroidJobScheduler to schedule the background prefetch job, passing a JobInfo parameter that describes which service to start and which conditions must be met before the job runs (a rough sketch of such a JobInfo follows).
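For illustration only (this is not our actual scheduling code), here is roughly what scheduling such a job looks like with the platform JobScheduler on Lollipop and later; our own compatibility layer, described below, wraps a similar API. BackgroundPrefetcherJobService is the service named in this article, while the job id and latency values are placeholders:

// Sketch: schedule the prefetch job with the platform JobScheduler.
// The job id is a placeholder value.
import android.app.job.JobInfo;
import android.app.job.JobScheduler;
import android.content.ComponentName;
import android.content.Context;

public class PrefetchJobScheduling {
  static final int PREFETCH_JOB_ID = 1001; // placeholder id

  public static void schedule(Context context, long minLatencyMs) {
    JobInfo jobInfo = new JobInfo.Builder(
            PREFETCH_JOB_ID,
            new ComponentName(context, BackgroundPrefetcherJobService.class))
        // Only run on an unmetered (wifi-like) connection.
        .setRequiredNetworkType(JobInfo.NETWORK_TYPE_UNMETERED)
        // Wait at least this long, aiming to land between two sessions.
        .setMinimumLatency(minLatencyMs)
        .build();

    JobScheduler scheduler =
        (JobScheduler) context.getSystemService(Context.JOB_SCHEDULER_SERVICE);
    scheduler.schedule(jobInfo);
  }
}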

Our main requirements are latency and an unmetered network connection. On Android we also have to consider other conditions, such as whether power-saving mode is enabled. We have tested varying amounts of latency and are still working toward a personalized experience. At this stage, we run only one background prefetch between two sessions. To find the best time to run the job while the app is in the background, we calculated the average time between sessions (how often do users come back to Instagram?) and used the standard deviation to remove outliers (for example, the hours a frequent Instagram user spends asleep should not count). Our goal is to start prefetching just before that average interval elapses.
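A rough sketch of that calculation (purely illustrative, not our production code): drop intervals more than two standard deviations from the mean, then schedule slightly before the trimmed average.

// Sketch: estimate the typical gap between sessions, excluding outliers, and
// pick a prefetch latency just below it. The 2-sigma cutoff and the 0.9 factor
// are illustrative choices.
import java.util.ArrayList;
import java.util.List;

public class SessionIntervalEstimator {
  public static long pickPrefetchLatencyMs(List<Long> intervalsMs) {
    if (intervalsMs.isEmpty()) {
      return 0;
    }

    double mean = 0;
    for (long v : intervalsMs) mean += v;
    mean /= intervalsMs.size();

    double variance = 0;
    for (long v : intervalsMs) variance += (v - mean) * (v - mean);
    double stdDev = Math.sqrt(variance / intervalsMs.size());

    // Keep only intervals within two standard deviations of the mean.
    List<Long> trimmed = new ArrayList<>();
    for (long v : intervalsMs) {
      if (Math.abs(v - mean) <= 2 * stdDev) trimmed.add(v);
    }

    double trimmedMean = 0;
    for (long v : trimmed) trimmedMean += v;
    trimmedMean /= Math.max(1, trimmed.size());

    // Aim to have fresh content cached just before the next expected session.
    return (long) (trimmedMean * 0.9);
  }
}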

  • After that, the system checks whether the network connection meets the criteria (unmetered/wifi) and whether the phone is out of power-saving mode. If so, BackgroundPrefetcherJobService is launched; if not, the launch of BackgroundPrefetcherJobService is deferred until the conditions are met.
  • BackgroundPrefetcherJobService creates a serial executor that runs each background job one at a time. The media itself is prefetched asynchronously once the HTTP response comes back. (A sketch of such a service appears after this list.)
  • When all the jobs are complete, we notify the operating system that our process can be killed, which helps memory and battery life. On Android, it is important to stop these running services so that resources that are no longer needed are freed.
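Again as a sketch only (not our actual service), a JobService along these lines drains a queue of prefetch jobs on a single-threaded executor and then tells the system it is finished; the job queue and its contents are hypothetical:

// Sketch: run queued prefetch jobs serially, then release the process.
import android.app.job.JobParameters;
import android.app.job.JobService;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class BackgroundPrefetcherJobService extends JobService {
  private final Queue<Runnable> mJobs = new ConcurrentLinkedQueue<>();
  private final ExecutorService mSerialExecutor = Executors.newSingleThreadExecutor();

  @Override
  public boolean onStartJob(final JobParameters params) {
    mSerialExecutor.execute(new Runnable() {
      @Override
      public void run() {
        Runnable job;
        while ((job = mJobs.poll()) != null) {
          job.run();
        }
        // Tell the system the work is done so the process can be reclaimed.
        jobFinished(params, /* wantsReschedule */ false);
      }
    });
    return true; // work continues on a background thread
  }

  @Override
  public boolean onStopJob(JobParameters params) {
    // Conditions changed (e.g. the app came to the foreground): stop early.
    return false;
  }
}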

All of this work is scoped to the user. The system must handle logout and account switching: if the user logs out, we cancel any scheduled jobs so we don’t start services that are no longer needed (sketched below).
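A small sketch of that cleanup with the platform JobScheduler (the job id is the placeholder used in the scheduling sketch above):

// Sketch: cancel the scheduled prefetch job on logout so no stale service starts.
import android.app.job.JobScheduler;
import android.content.Context;

public class PrefetchCancellation {
  public static void cancelOnLogout(Context context, int prefetchJobId) {
    JobScheduler scheduler =
        (JobScheduler) context.getSystemService(Context.JOB_SCHEDULER_SERVICE);
    scheduler.cancel(prefetchJobId);
  }
}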

IgJobScheduler

For Android, we did the following:

  1. Looked for an efficient way to schedule background jobs that could persist data across sessions and specify network requirements.
  2. Found that Android did not support the JobScheduler API before Lollipop, so we analyzed how many of our Android users run older versions. We could not ignore them, so we needed a compatible implementation for those users.
  3. Searched for an existing open source scheduler that works on older versions of Android. We found many excellent third-party libraries, but none fit our scenario because they depend on Google Play Services, and we treat APK size as a major constraint on keeping Instagram lean.
  4. Finally, built our own customizable, high-performance compatibility layer for the Android JobScheduler APIs (a rough, purely illustrative sketch of the general idea follows this list).
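This article doesn’t show IgJobScheduler itself. Purely as an illustration of the general approach, a compatibility layer might dispatch on the OS version along these lines; the AlarmManager fallback and LegacyPrefetchService are assumptions for the sketch, not necessarily what we ship:

// Illustrative version-dependent scheduling path; this is not IgJobScheduler.
import android.app.AlarmManager;
import android.app.PendingIntent;
import android.content.Context;
import android.content.Intent;
import android.os.Build;
import android.os.SystemClock;

public class CompatScheduler {
  public static void schedule(Context context, long minLatencyMs) {
    if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.LOLLIPOP) {
      // Lollipop and later: use the platform JobScheduler (see the earlier sketch).
      PrefetchJobScheduling.schedule(context, minLatencyMs);
    } else {
      // Pre-Lollipop fallback: an inexact alarm that starts a plain Service,
      // which must then check the network and battery conditions itself.
      AlarmManager alarmManager =
          (AlarmManager) context.getSystemService(Context.ALARM_SERVICE);
      Intent intent = new Intent(context, LegacyPrefetchService.class); // hypothetical service
      PendingIntent pendingIntent =
          PendingIntent.getService(context, 0, intent, PendingIntent.FLAG_UPDATE_CURRENT);
      alarmManager.set(AlarmManager.ELAPSED_REALTIME,
          SystemClock.elapsedRealtime() + minLatencyMs, pendingIntent);
    }
  }
}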

Review

At Instagram, we’re data-driven, and we rigorously evaluate the impact of every system we build. That’s why, while designing the background data prefetch framework, we also thought about which metrics to collect in order to get the right feedback.

A centralized architecture also makes it easier to collect high-level metrics. It matters that we can weigh the trade-offs accurately, know how many prefetched bytes go unused, and analyze what is driving fluctuations in global CPU usage.

It is useful to tag each network request with a request policy that identifies its behavior and type. The request policy already existed in the app, but we use it to slice our metrics. We attach the policy to each outgoing HTTP request and flag whether it is a prefetch. The policy also labels each request with a type: image, video, API, analytics data, and so on. This helps us:

  • Set the request priority
  • Break down global CPU usage, data usage, and cache utilization so we can better analyze and balance the system
/**
 * Marks whether the user is waiting on screen for the data returned by a request.
 */
public enum Behavior {
  Undefined(-1),
  OffScreen(0),
  OnScreen(1),
  ;

  int weight;

  Behavior(int weight) {
    this.weight = weight;
  }
}

The snippet above is from the requestPolicy class in our Android code. A request marked OnScreen means the user is waiting for its data to come back; OffScreen requests, roughly 1% or more of the total, return data the user does not directly interact with.

Cache hit logging

We wanted to know how many of the prefetched bytes were actually used, so we instrumented how data in the cache is used. We built a cache logging system that satisfies the following requirements (a minimal sketch of such an interface follows the list):

  • Extensible: new cache instances can be hooked in through the API.
  • Robust and fault tolerant: it tolerates cache failures (nothing is logged) or temporarily inconsistent data.
  • Reliable: data persists across sessions.
  • Minimal disk space and latency: cache reads and writes happen frequently, so logging overhead must stay as small as possible; heavyweight logging on the cache path would mean more crashes and higher latency.
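As an illustration only (a hypothetical interface, not our actual implementation), a cache logger meeting these requirements might expose an API like this, with implementations free to drop events when something goes wrong:

// Sketch of a best-effort cache logging API; names are hypothetical.
public interface CacheLogger {
  // Record a cache hit or miss for a given cache instance and request type.
  // Implementations must be cheap and may silently drop events on failure.
  void onCacheHit(String cacheName, String requestType, long bytes, boolean wasPrefetched);

  void onCacheMiss(String cacheName, String requestType);

  // Flush buffered counters to disk; called sparingly (e.g. on backgrounding)
  // so logging adds no latency to cache reads and writes.
  void persist();
}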

We also want to know how much data each new background prefetch request adds. Our mobile network engine is layered, and as mentioned earlier every network request carries a requestPolicy. This makes it easy to track data usage inside the app and see how much traffic is consumed downloading images, videos, JSON data, and so on.

At the same time, we also want to analyze and compare the distribution of data usage over wifi and cellular networks. This makes it possible to try different data prefetch modes for different network connections.

Other benefits

What other benefits does background prefetching bring, besides removing the dependency on network availability and reducing cellular data usage? Fewer requests means less overall network congestion, and by batching future requests we save per-request overhead and extend battery life.

Preventing CPU spikes

Before we built the background prefetcher, we considered its potential to increase global CPU utilization.

How could CPU utilization go up? Consider the following example. Suppose there is an endpoint that returns Instagram’s popular feed. Every time user A opens Instagram to fetch the latest first-page feed, their device calls that endpoint, which performs CPU-intensive work such as ranking and sorting content against user-specific criteria. Running a background prefetch every time a user opens Instagram would add to that CPU load, right?

To minimize CPU usage fluctuations on the server side, Fei Huang, an engineer on the content recommendation team, created a new endpoint for the first version of the background prefetch system. This endpoint returns only the first 20 new posts that have not yet been shown.

Conclusion

This is how we approached building the system. Our team will not open the API up to other engineers until we can ensure the quality of the framework and that users will benefit from it.

This work will only become more important as more people join Instagram. We look forward to continuously improving the efficiency and performance of Instagram so that people around the world can use Instagram without any problems.

Lola Priego is an engineer who works on Instagram’s Client performance optimization team in the New York area.


The Nuggets Translation Project is a community that translates high-quality technical articles from around the Internet and shares them on Juejin (Nuggets). It covers Android, iOS, front end, back end, blockchain, product, design, artificial intelligence, and more. For more high-quality translations, follow the Nuggets Translation Project on its official Weibo account and Zhihu column.