Reflection | Design and Implementation of Paging, Android's List Pagination Component: Architecture Design and Principle Analysis

This is the second article in the Android Jetpack Paging series. Readers who want to learn Paging are strongly encouraged to give this series top priority. If you do not yet have a systematic understanding of Paging, please refer to:

Reflection | Design and Implementation of Paging, Android's List Pagination Component: System Overview

Preface

Paging is an excellent pagination component. Unlike other popular paging libraries, Paging focuses on serving the business logic rather than the UI. As we all know, the quality of a business-oriented open source library depends heavily on the overall architectural design of its code (Retrofit and OkHttp are good examples). So how do you convince yourself or your colleagues to try Paging? Clearly, the good ideas embedded in its source code are the most persuasive argument.

On the other hand, you can benefit from learning from the source code that Google engineers design, develop, and maintain, even if you don’t actually use it on your project.

The chapter is as follows:

Architecture design and principle analysis

1. Dependency injection through the Builder pattern

The creation process is undoubtedly the most important part of architectural design.

As the gateway to the component, the exposed API should be as simple and friendly as possible for developers to call; at the same time, as API callers, we want the framework to be as flexible and configurable as possible.

This may sound contradictory: how do you keep the interface simple and clean enough to be easy to use while still offering enough configuration options to keep the framework flexible?

Paging's API design uses the classic Builder pattern, and uses dependency injection to pass dependencies down layer by layer, building the object instances of each layer in turn.

Developers only need to configure the parameters they care about. Parameters they do not care about (or do not even know about) are left to the Builder class, which fills in default values:

// You can configure it in detail like this
val pagedListLiveData =
    LivePagedListBuilder(
            dataSourceFactory,
            PagedList.Config.Builder()
                    .setPageSize(PAGE_SIZE)                         // Number of items per page
                    .setInitialLoadSizeHint(20)                     // Number of items in the initial load
                    .setPrefetchDistance(10)                        // Prefetch distance
                    .setEnablePlaceholders(ENABLE_PLACEHOLDERS)     // Whether to enable placeholders
                    .build()
    ).build()

// Or as simply as this
val pagedListLiveData =
    LivePagedListBuilder(dataSourceFactory, PAGE_SIZE).build()

It is worth asking whether building the paging configuration object and building the observable object are two different responsibilities. They clearly are, because:

LiveData<PagedList> = DataSource + PagedList.Config

This is why Paging's configuration uses two Builder classes. Even once the designer has decided to use the Builder pattern, the responsibility of each Builder class still needs to be clearly defined. This is a good demonstration of the single responsibility principle in the design process.
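To make the "configure only what you care about" idea concrete, here is a minimal, framework-free Java sketch of a builder that fills in defaults for anything the caller leaves unset. PagingConfigSketch, its fields, and the default rules (prefetch distance equal to the page size, initial load three times the page size) are assumptions chosen for illustration; this is not the actual PagedList.Config source.

```java
// Hypothetical stand-in for a paging configuration object; not the real PagedList.Config.
final class PagingConfigSketch {
    final int pageSize;
    final int prefetchDistance;
    final int initialLoadSizeHint;
    final boolean enablePlaceholders;

    private PagingConfigSketch(Builder b) {
        this.pageSize = b.pageSize;
        // Values the caller did not set fall back to defaults derived from pageSize (assumed rule).
        this.prefetchDistance = b.prefetchDistance >= 0 ? b.prefetchDistance : b.pageSize;
        this.initialLoadSizeHint = b.initialLoadSizeHint >= 0 ? b.initialLoadSizeHint : b.pageSize * 3;
        this.enablePlaceholders = b.enablePlaceholders;
    }

    static final class Builder {
        private int pageSize = -1;             // -1 means "not set"
        private int prefetchDistance = -1;
        private int initialLoadSizeHint = -1;
        private boolean enablePlaceholders = true;

        Builder setPageSize(int value) { pageSize = value; return this; }
        Builder setPrefetchDistance(int value) { prefetchDistance = value; return this; }
        Builder setInitialLoadSizeHint(int value) { initialLoadSizeHint = value; return this; }
        Builder setEnablePlaceholders(boolean value) { enablePlaceholders = value; return this; }

        PagingConfigSketch build() {
            if (pageSize < 1) {
                throw new IllegalArgumentException("Page size must be a positive number");
            }
            // All collected values are injected into the immutable config in one step.
            return new PagingConfigSketch(this);
        }
    }

    public static void main(String[] args) {
        PagingConfigSketch simple = new Builder().setPageSize(20).build();
        System.out.println(simple.prefetchDistance + " " + simple.initialLoadSizeHint);
    }
}
```

The caller who only sets the page size still gets a fully populated, immutable object, while the caller who needs fine control can override each value individually.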

Finally, all configuration held by the Builder is handed to the PagedList through dependency injection when it is instantiated:

// PagedList.Builder.build()
public PagedList<Value> build() {
    return PagedList.create(
            mDataSource,
            mNotifyExecutor,
            mFetchExecutor,
            mBoundaryCallback,
            mConfig,
            mInitialKey);
}

// PagedList.create()
static <K, T> PagedList<T> create(@NonNull DataSource<K, T> dataSource,
        @NonNull Executor notifyExecutor,
        @NonNull Executor fetchExecutor,
        @Nullable BoundaryCallback<T> boundaryCallback,
        @NonNull Config config,
        @Nullable K key) {
    // Only the ContiguousPagedList branch is shown here as an example
    // (the code that derives contigDataSource and lastLoad is omitted).
    // As you can see, every PagedList receives its dependencies through its constructor.
    return new ContiguousPagedList<>(contigDataSource,
          notifyExecutor,
          fetchExecutor,
          boundaryCallback,
          config,
          key,
          lastLoad);
}

Dependency injection is a very simple, unpretentious coding technique. In Paging there are almost no singletons and very few static members. Apart from an object's own state, all of the configuration items injected into it are final:

// PagedList.java
public abstract class PagedList<T> {
  final Executor mMainThreadExecutor;
  final Executor mBackgroundThreadExecutor;
  final BoundaryCallback<T> mBoundaryCallback;
  final Config mConfig;
  final PagedStorage<T> mStorage;
}

// ItemKeyedDataSource.LoadInitialParams.java
public static class LoadInitialParams<Key> {
  public final Key requestedInitialKey;
  public final int requestedLoadSize;
  public final boolean placeholdersEnabled;
}

In fact, the thread-switching design is one of the few exceptions, but even there the default executor lookup can still be overridden with dependency injection through the Builder.

Dependency injection ensures that the dependencies an object instance needs can be traced, and makes the dependencies between classes very clear. The immutable internal members of the instantiated object also go a long way toward guaranteeing the thread safety of the PagedList's paged data.

2. Build lazily loaded LiveData

For an observable, data updates are only meaningful once it has actually been subscribed to. In other words, when a developer builds a LiveData<PagedList>, it makes no sense to immediately start requesting paged data asynchronously on a background thread.

Put another way, if data is requested before anyone subscribes, then by the time someone does subscribe the data from the DataSource may already be out of date and the latest data has to be requested again, which makes the earlier work pointless.

The actual request should be executed when LiveData.observe() is called. I prefer to call this "lazy loading"; if you are familiar with RxJava, it is similar to the concept of Observable.defer().

So how do you build a "lazily loaded" LiveData<PagedList>? Google's designers use the ComputableLiveData class to wrap the data-emitting behavior of LiveData:

// @hide
public abstract class ComputableLiveData<T> {}

This is a hidden class that developers generally cannot use directly, but it is used in many places; you will often see it in the source code generated by the Room component.

If I had to describe ComputableLiveData in one sentence, I think "the data source of a LiveData" fits best. Interested readers can study its source code carefully; I may write a separate article about it when I have the chance, so I will not expand on it here.

In short, with the ComputableLiveData class, Paging performs asynchronous tasks only once there is a subscription, which greatly reduces wasted work.
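The "do nothing until someone subscribes" behavior can be modeled in a few lines of plain Java. The sketch below is a deliberately simplified, single-threaded illustration: LazyObservableSketch and its members are hypothetical names, and the real ComputableLiveData additionally deals with executors, invalidation, and active/inactive observers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical model of "compute only once someone observes"; not ComputableLiveData itself.
final class LazyObservableSketch<T> {
    private final Supplier<T> compute;
    private final List<Consumer<T>> observers = new ArrayList<>();
    private boolean computed = false;
    private T value;
    int computeCount = 0; // exposed only so the demo can show when work actually happens

    LazyObservableSketch(Supplier<T> compute) {
        this.compute = compute; // building the object performs no work
    }

    void observe(Consumer<T> observer) {
        observers.add(observer);
        if (!computed) {
            // The first subscription triggers the (potentially expensive) computation.
            value = compute.get();
            computed = true;
            computeCount++;
        }
        observer.accept(value);
    }

    public static void main(String[] args) {
        LazyObservableSketch<String> pages = new LazyObservableSketch<>(() -> "first page");
        System.out.println("computations before observe: " + pages.computeCount);
        pages.observe(v -> System.out.println("observer received: " + v));
        System.out.println("computations after observe: " + pages.computeCount);
    }
}
```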

3. Assign a life cycle to paging data

PagedList data should have its own life cycle.

During its normal life, a PagedList keeps trying to load paged data from the DataSource and display it. But the data in the data source will eventually expire, which means the PagedList's life cycle has come to an end.

Paging then needs to reactively create a new DataSource data snapshot and a new PagedList, and hand the new PagedList to the PagedListAdapter to update the UI.

To do this, the PagedList class has an mDetached field:

public abstract class PagedList<T> extends AbstractList<T> {
  // ...
  private final AtomicBoolean mDetached = new AtomicBoolean(false);

  public boolean isDetached() {
      return mDetached.get();
  }

  public void detach() {
      mDetached.set(true);
  }
}

This AtomicBoolean field matters: once mDetached.get() returns true, the PagedList no longer performs any paging load tasks:

class ContiguousPagedList<K, V> extends PagedList<V> {

  // ...
  public void onPagePlaceholderInserted(final int pageIndex) {
      mBackgroundThreadExecutor.execute(new Runnable() {
          @Override
          public void run() {
              // Once detached, no longer load paging data asynchronously
              if (isDetached()) {
                  return;
              }

              // When the data source is invalid, detach() calls mDetached.set(true)
              if (mDataSource.isInvalid()) {
                  detach();
              } else {
                  // ... Load the next page of data
              }
          }
      });
  }
}

The PagedList life cycle depends on the DataSource's isInvalid() function, which indicates whether the current DataSource is invalid:

public abstract class DataSource<Key, Value> {
  private AtomicBoolean mInvalid = new AtomicBoolean(false);
  private CopyOnWriteArrayList<InvalidatedCallback> mOnInvalidatedCallbacks =
          new CopyOnWriteArrayList<>();

  // Notify that the data source is invalid
  public void invalidate() {
      if (mInvalid.compareAndSet(false, true)) {
          for (InvalidatedCallback callback : mOnInvalidatedCallbacks) {
              // Invalidation callback: notifies the upper layer to create a new PagedList
              callback.onInvalidated();
          }
      }
  }

  // Whether the data source is invalid
  public boolean isInvalid() {
      return mInvalid.get();
  }
}

When the DataSource becomes invalid, a callback function notifies ComputableLiveData<PagedList> to create a new PagedList and to notify the LiveData observers so that they update the UI.

As a result, in Paging the PagedList, the DataSource, and the creation and distribution of PagedList by ComputableLiveData<PagedList> form a closed loop.
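The closed loop can be sketched without any Android classes. In the following illustration, Source and Holder are hypothetical names: Source models only the invalidation flag and its callbacks, and Holder plays the role of the LiveData side, rebuilding its data whenever the current source is invalidated. It is a sketch of the idea under those assumptions, not the library's implementation.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

final class InvalidationLoopSketch {
    // Hypothetical data source: models only the invalid flag and its callbacks.
    static final class Source {
        private final AtomicBoolean invalid = new AtomicBoolean(false);
        private final List<Runnable> callbacks = new CopyOnWriteArrayList<>();

        void addInvalidatedCallback(Runnable callback) { callbacks.add(callback); }

        boolean isInvalid() { return invalid.get(); }

        void invalidate() {
            // compareAndSet guarantees the callbacks fire at most once per source.
            if (invalid.compareAndSet(false, true)) {
                for (Runnable callback : callbacks) {
                    callback.run();
                }
            }
        }
    }

    // Hypothetical holder standing in for the LiveData side of the loop.
    static final class Holder {
        private final Supplier<Source> factory;
        Source current;
        int generation = 0; // how many sources have been created so far

        Holder(Supplier<Source> factory) {
            this.factory = factory;
            rebuild();
        }

        private void rebuild() {
            current = factory.get();
            generation++;
            // When this source is invalidated, create a fresh one: the loop closes here.
            current.addInvalidatedCallback(this::rebuild);
        }
    }

    public static void main(String[] args) {
        Holder holder = new Holder(Source::new);
        Source first = holder.current;
        first.invalidate();
        first.invalidate(); // no effect: the source is already invalid
        System.out.println("generation=" + holder.generation
                + ", replaced=" + (holder.current != first));
    }
}
```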

4. Provide reactive support for Room

As we know, Paging natively provides reactive support for the Room component. When the data in the database is updated, Paging responds by automatically building a new PagedList and delivering it to the UI.

This may look like magic, but the principle is simple. As we saw in the previous section, calling the DataSource's invalidate() function marks the DataSource as invalid, and the DataSource then uses its callback to trigger the building of a new PagedList.

The Room component builds on this feature and wraps a new DataSource:

public abstract class LimitOffsetDataSource<T> extends PositionalDataSource<T> {

  protected LimitOffsetDataSource(...) {
      // 1. Define a callback that tells the data source to invalidate itself
      mObserver = new InvalidationTracker.Observer(tables) {
          @Override
          public void onInvalidated(@NonNull Set<String> tables) {
              invalidate();
          }
      };
      // 2. Register the observer with the database's InvalidationTracker
      db.getInvalidationTracker().addWeakObserver(mObserver);
  }
}

After that, the DataSource.invalidate() function is executed automatically whenever the data in the database is invalidated.

Think back to when you first used Paging and defined the Dao class in Room yourself. What exactly is the DataSource.Factory object that it returns?

@Dao
interface RedditPostDao {
    @Query("SELECT * FROM posts WHERE subreddit = :subreddit ORDER BY indexInResponse ASC")
    fun postsBySubreddit(subreddit : String) : DataSource.Factory<Int, RedditPost>
}

The answer, of course, is the factory class of the LimitOffsetDataSource:

@Override
public DataSource.Factory<Integer, RedditPost> postsBySubreddit(final String subreddit) {
  return new DataSource.Factory<Integer, RedditPost>() {
    // Returns a LimitOffsetDataSource that reacts to database invalidation
    @Override
    public LimitOffsetDataSource<RedditPost> create() {
      return new LimitOffsetDataSource<RedditPost>(__db, _statement, false, "posts") {
        // ...
      };
    }
  };
}

In principle there is nothing remarkable about this code, but by encapsulating it behind annotations the designers have greatly reduced the amount of code developers need to write. Developers only need to declare an interface, without having to understand the implementation details inside.

Intermission: more confusion

The previous article gave only a brief introduction to the DataSource, and many readers reported that this part of the source code is too obscure and that they were at a loss when choosing a DataSource.

Solving a complex problem depends on breaking it down. This article divides it into the following two smaller questions and discusses them one by one:

1. Why are there so many DataSource classes and subclasses, and what are their usage scenarios?
2. Why are there so many PagedList classes and subclasses?

5. Data source continuity and paging loading strategy

Why are there so many DataSource classes and subclasses, and what are their usage scenarios?

In the design of the Paging component, the DataSource is a very important module. As the name implies, the Key in DataSource<Key, Value> corresponds to the condition used to load data, and the Value corresponds to the actual type of the data set. For different scenarios, the designers of Paging provide several different DataSource implementation classes, including PositionalDataSource, ItemKeyedDataSource, and PageKeyedDataSource.

Refer to the corresponding section of the previous article for an introduction to these DataSource classes; they will not be covered again here.

When I first read this part of the source code, what confused me most was the difference between ContiguousDataSource and PositionalDataSource.

Readers of the source code may have noticed that the DataSource has this abstract function:

public abstract class DataSource<Key, Value> {
  // Whether the data source is continuous
  abstract boolean isContiguous();
}

class ContiguousDataSource<Key, Value> extends DataSource<Key, Value> {
  // ContiguousDataSource is continuous
  boolean isContiguous() { return true; }
}

class PositionalDataSource<T> extends DataSource<Integer, T> {
  // PositionalDataSource is discontinuous
  boolean isContiguous() { return false; }
}

So, what exactly is the concept of continuity of data sources?

For a typical paged network request, where loading the next page always depends on the previous page having been loaded, we usually call the data source continuous. This seems to be the reason why ItemKeyedDataSource and PageKeyedDataSource are so widely used.

Interestingly, however, in a business model that uses a local cache as the paging data source, the common-sense assumption that a paging data source should be continuous breaks down.

Every phone has an address book, so this article takes an address book app as an example. All contact data is read from the local persistence layer. Considering that a phone may hold thousands of contacts, the app's list data should also be loaded page by page.

In this case, is the paging data source continuous?

On reflection, the reader will see that the paging data source is certainly not continuous here. Admittedly, requesting pages continuously works fine for scrolling, but the continuity of the paged requests is broken when the user taps the letter Z in the sidebar of the address book page and tries to jump straight to the contacts starting with Z.

This is the scenario PositionalDataSource is designed for: data is loaded at a specific position, the Key is position information of type Integer, and each page of data does not depend on the previous page but only on the data source itself.

Continuity of paging data is a very important concept, and once you understand it, you’ll understand what the subclasses of DataSource mean:

Whether it is PositionalDataSource, ItemKeyedDataSource, or PageKeyedDataSource, these classes are different paging loading strategies. Developers simply need to choose different paging loading strategies based on different business scenarios, such as data continuity.
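The difference between the two families of strategies can be shown with an in-memory list. In this sketch, loadRange and loadAfter are hypothetical helper functions named after the corresponding ideas; they operate on a sorted list of names and are not the signatures of the real DataSource classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LoadStrategySketch {
    // Positional strategy: any window can be fetched directly from an absolute index,
    // so jumping to "Z" does not require loading everything before it.
    static List<String> loadRange(List<String> all, int startPosition, int loadSize) {
        int from = Math.max(0, Math.min(startPosition, all.size()));
        int to = Math.min(all.size(), from + loadSize);
        return new ArrayList<>(all.subList(from, to));
    }

    // Item-keyed strategy: the next page is defined relative to the last item already
    // loaded, so pages can only be requested one after another (assumes a sorted list).
    static List<String> loadAfter(List<String> all, String lastKey, int loadSize) {
        List<String> page = new ArrayList<>();
        for (String item : all) {
            if (item.compareTo(lastKey) > 0 && page.size() < loadSize) {
                page.add(item);
            }
        }
        return page;
    }

    public static void main(String[] args) {
        List<String> contacts = Arrays.asList("Alice", "Bob", "Carol", "Yara", "Zack", "Zoe");
        System.out.println(loadRange(contacts, 4, 2));     // jump straight to position 4
        System.out.println(loadAfter(contacts, "Bob", 2)); // continue after the last loaded item
    }
}
```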

6. Paged data model and paged data copies

Why are there so many PagedLists and subclasses?

Like DataSource, PagedList also has an isContiguous() interface:

public abstract class PagedList<T> extends AbstractList<T> {
  abstract boolean isContiguous();
}

class ContiguousPagedList<K, V> extends PagedList<V> {
  // ContiguousPagedList internally holds a ContiguousDataSource
  final ContiguousDataSource<K, V> mDataSource;

  boolean isContiguous() { return true; }
}

class TiledPagedList<T> extends PagedList<T> {
  // TiledPagedList internally holds a PositionalDataSource
  final PositionalDataSource<T> mDataSource;

  boolean isContiguous() { return false; }
}

Readers should understand that a PagedList holds a DataSource internally, and that paged loading essentially means fetching data asynchronously from that DataSource. Different DataSource types take different parameters in their paging requests, so the behavior inside the PagedList differs as well. PagedList therefore derives the subclasses ContiguousPagedList and TiledPagedList, which handle paging requests in different business situations.

So what class is SnapshotPagedList?

PagedList has an additional snapshot() interface to return a snapshot of the current paginated data:

public abstract class PagedList<T> extends AbstractList<T> {
  public List<T> snapshot() {
      return new SnapshotPagedList<>(this);
  }
}

The snapshot() function is very important: it preserves the state of the previous paged data, and that data set is used by AsyncPagedListDiffer for the diff calculation. When a new PagedList arrives (through PagedListAdapter.submitList()), the differ does not overwrite the data and compute the difference directly; it first takes a copy of the data set in the previous PagedList.

For reasons of space the details are omitted here; interested readers can read the source code related to PagedListAdapter.submitList() for themselves.

SnapshotPagedList looks like this:

class SnapshotPagedList<T> extends PagedList<T> {
  SnapshotPagedList(@NonNull PagedList<T> pagedList) {
      // 1. None of the other objects change; they keep referring to the same heap addresses,
      //    except for pagedList.mStorage.snapshot(), which ends up executing -> 2
      super(pagedList.mStorage.snapshot(),
              pagedList.mMainThreadExecutor,
              pagedList.mBackgroundThreadExecutor,
              null,
              pagedList.mConfig);
      mDataSource = pagedList.getDataSource();
      mContiguous = pagedList.isContiguous();
      mLastLoad = pagedList.mLastLoad;
      mLastKey = pagedList.getLastKey();
  }
}

final class PagedStorage<T> extends AbstractList<T> {
  PagedStorage(PagedStorage<T> other) {
      // 2. The current paging data is copied
      mPages = new ArrayList<>(other.mPages);
  }
}

In addition, mSnapshot is used for state saving. If the developer calls getCurrentList() when the difference calculation is not complete, it will attempt to return mSnapshot, which is a copy of the previous dataset.
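Why the copy matters can be seen in a small standalone model. Storage below is a hypothetical class that mimics only the "list of pages" shape and its copy constructor; it is a sketch of the copying idea, not the real PagedStorage.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SnapshotSketch {
    // Hypothetical page storage: a list of pages, each page being a list of items.
    static final class Storage {
        final List<List<String>> pages;

        Storage() {
            pages = new ArrayList<>();
        }

        // Copy constructor: copies the list of pages (the page objects themselves are shared).
        Storage(Storage other) {
            pages = new ArrayList<>(other.pages);
        }

        Storage snapshot() {
            return new Storage(this);
        }

        int itemCount() {
            int count = 0;
            for (List<String> page : pages) {
                count += page.size();
            }
            return count;
        }
    }

    public static void main(String[] args) {
        Storage live = new Storage();
        live.pages.add(Arrays.asList("a", "b"));

        Storage frozen = live.snapshot();        // state used for the diff calculation
        live.pages.add(Arrays.asList("c", "d")); // the live list keeps loading pages

        System.out.println("snapshot items: " + frozen.itemCount()
                + ", live items: " + live.itemCount());
    }
}
```

Because the snapshot owns its own list of pages, pages loaded into the live storage afterwards do not disturb a diff that is still being computed against the snapshot.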

7. Thread switching and “bugs” in Paging's design

Google’s engineers originally designed Paging to allow developers to switch threads without any awareness, so most of the code for switching threads is encapsulated internally:

public class ArchTaskExecutor extends TaskExecutor {
  // The main thread Executor
  private static final Executor sMainThreadExecutor = new Executor() {
      @Override
      public void execute(Runnable command) {
          getInstance().postToMainThread(command);
      }
  };

  // The IO thread Executor
  private static final Executor sIOThreadExecutor = new Executor() {
      @Override
      public void execute(Runnable command) {
          getInstance().executeOnDiskIO(command);
      }
  };
}

sMainThreadExecutor works by creating a Handler from Looper.getMainLooper() and using it to post messages to the main thread; this is not discussed further in this article.

The designers of the source code hoped that, for developers using Paging, the library would switch to the IO thread internally when performing paging load tasks, and switch back to the main thread internally to update the UI once the paging data had been loaded successfully.

As a design this is very good, but in actual use developers can easily overlook the fact that the DataSource's data loading callbacks are themselves executed on the IO thread:

public abstract class PositionalDataSource<T> extends DataSource<Integer, T> {
  // Annotations remind developers that these callbacks run on a worker thread
  @WorkerThread
  public abstract void loadInitial(...);

  @WorkerThread
  public abstract void loadRange(...);
}

The callbacks themselves run on a worker thread, which means it is best not to use asynchronous methods when loading paging data; otherwise problems may occur.

OkHttp users should use the synchronous execute() method:

override fun loadInitial(..., callback: LoadInitialCallback<RedditPost>) {
  // Use the synchronous method
  val response = request.execute()
  callback.onResult(...)
}

RxJava users should likewise use the blocking operators so that the call completes synchronously.
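The same idea applies to any asynchronous API: inside a load method that already runs on a worker thread, wait for the result and then invoke the callback. The sketch below uses only the JDK's CompletableFuture; fetchAsync and loadInitial are hypothetical stand-ins for a network client and a DataSource load method, not real Paging or OkHttp APIs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

final class BlockingLoadSketch {
    // Hypothetical asynchronous API standing in for a network client.
    static CompletableFuture<List<String>> fetchAsync(int page) {
        return CompletableFuture.supplyAsync(
                () -> Arrays.asList("item-" + page + "-0", "item-" + page + "-1"));
    }

    // Stand-in for a load method that the framework already calls on a worker thread.
    static void loadInitial(Consumer<List<String>> callback) {
        // join() blocks the current (worker) thread until the data is available,
        // so the callback is invoked before this method returns.
        List<String> result = fetchAsync(0).join();
        callback.accept(result);
    }

    public static void main(String[] args) {
        loadInitial(items -> System.out.println("loaded: " + items));
    }
}
```

Blocking like this is only appropriate because the method is assumed to run off the main thread; the same call on the UI thread would freeze the interface.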

Note that while PositionalDataSource carries the @WorkerThread annotation, ItemKeyedDataSource and PageKeyedDataSource do not:

public abstract class ItemKeyedDataSource<Key, Value> extends ContiguousDataSource<Key, Value> {
  public abstract void loadInitial(...);

  public abstract void loadAfter(...);
}

// PageKeyedDataSource likewise has no @WorkerThread annotation

So if you do not pay attention to these details, you can easily be led astray and run into puzzling problems. For reference, you can look at this sample code from Google.

Curiously, even in Google's official sample, only loadInitial uses a synchronous request, while loadAfter still uses enqueue() to make an asynchronous request. Although the comments in the sample state this clearly, I still do not understand the choice, because it does have the potential to mislead some developers.

In short, the original intention of hiding the details of thread switching in Paging's design was good, but the result falls short; on the contrary, it can lead to misunderstanding and misuse (I fell into this trap myself).

Perhaps it would be better if thread switching were not implemented through internal default parameters (and especially not configured through the Builder pattern, where it is too easily overlooked), but instead had to be specified explicitly by the developer?

Readers with ideas are welcome to leave a comment below this article; exchanging ideas helps everyone improve.

Conclusion

This article has systematically explained the principles behind Paging's implementation. So, in terms of architectural design, which of Paging's strengths are worth learning from?

First, dependency injection. Most of the dependencies of the objects in Paging, including configuration parameters, internal callbacks, and thread switching, are supplied through dependency injection, which is simple and unpretentious and makes the dependencies between classes traceable.

Second, the abstraction of classes and the pushing down of business-specific logic. DataSource and PagedList have a clear division of labor: each is abstracted upward into an abstract class, while the paging logic for different business situations is pushed down into their respective subclasses.

Finally, clear object boundaries. Paging designs a life cycle for paging data to avoid running pointless asynchronous paging tasks once the data source is invalid, and uses lazily loaded LiveData to ensure that no paging logic runs while there are no subscribers.

Reference & More

If you’re interested in Paging, you’re welcome to read more of my articles and discuss Paging with me:

Reflection | Design and Implementation of Paging, Android's List Pagination Component: System Overview
Paging: The design aesthetics of the Paging library, an official Android architecture component
Official Android architecture component Paging-Ex: Adds a Header and Footer to the Paging list
Official Architecture component of Android Paging-Ex: reactive management of list states

About me

Hello! If you think this article is of value to you, feel free to follow me ❤️, or visit my blog or GitHub.

If you think the writing still falls a bit short, please follow me anyway and push me to write better articles. Who knows, someday I might.
