SparseArray for Android data containers

SparseArray is a data container from Google. It is recommended to replace a HashMap whose key is int on Android. The reasons given in the official documentation are memory savings compared to HashMap: 1. Avoiding automatic boxing; 2. Data structures do not depend on external object mappings. Because for mobile, memory resources are more precious.

The source code parsing

Take android.util.SparseArray as an example, SDK=28

Member variables

private static final Object DELETED = new Object();     // flag whether value is deleted
private boolean mGarbage = false;   // Marks whether garbage collection is currently taking place

private int[] mKeys;    // An array of keys, in ascending order
private Object[] mValues;   // Store an array of values
private int mSize;   // The actual number of elements
Copy the code

Initialize the

The default SparseArray no-argument constructor has an initial capacity of 10, but is internally processed to 11. After initialization, both mKeys and mValues are unassigned arrays of length 11


public SparseArray(int initialCapacity) {
    if (initialCapacity == 0) {
        mKeys = EmptyArray.INT;
        mValues = EmptyArray.OBJECT;
    } else {
        mValues = ArrayUtils.newUnpaddedObjectArray(initialCapacity);
        mKeys = new int[mValues.length];
    }
    mSize = 0;
}
Copy the code

put

The most important method, the key and value stored procedure

public void put(int key, E value) {
    // A binary search is performed for incoming keys, so mkeys must be in order
    int i = ContainerHelpers.binarySearch(mKeys, mSize, key);
    // If yes, the value of the original key already exists
    if (i >= 0) {
        mValues[i] = value;
    } else {
        // If I is not found, perform a reverse operation on I to get the index to be inserted
        i = ~i;
        // If the subscript to be inserted happens to be DELETED, update it directly
        if (i < mSize && mValues[i] == DELETED) {
            mKeys[i] = key;
            mValues[i] = value;
            return;
        }

        
        if (mGarbage && mSize >= mKeys.length) {
            // If the capacity is insufficient and garbage collection is needed, the collection operation is performed first
            gc();
    
            // After the space is compressed, the array element changes and the insertion position needs to be recalculated
            i = ~ContainerHelpers.binarySearch(mKeys, mSize, key);
        }
        // Insert elementsmKeys = GrowingArrayUtils.insert(mKeys, mSize, i, key); mValues = GrowingArrayUtils.insert(mValues, mSize, i, value); mSize++; }}Copy the code

Binary lookup and insert elements together ensure the order of mKeys. After determining the corresponding position of key in the array, the position of value in the array is also determined. Mkeys. length=11,mSize=0. When mSize increases to >= mkeys. length, gc will be performed first if there are DELETED elements in the array. Improves space utilization and avoids array expansion.

binarySearch


static int binarySearch(int[] array, int size, int value) {
    int lo = 0;
    int hi = size - 1;

    while (lo <= hi) {
        final int mid = (lo + hi) >>> 1;
        final int midVal = array[mid];

        if (midVal < value) {
            lo = mid + 1;
        } else if (midVal > value) {
            hi = mid - 1;
        } else {
            return mid;  // value found}}return ~lo;  // value not present
}
Copy the code

BinarySearch, unlike Arrays#binarySearch, returns the inverse of the bottom boundary if the key is not found. If I invert it, I return a negative number, put goes down, and when I invert it again, I get where I want to insert it, and it’s still pretty efficient. Such as:

MKeys ={3,4,6,7,8}, lo=0,hi=4, lo=2,hi=1, lo=2,hi=1Copy the code

delete

public void delete(int key) {
    int i = ContainerHelpers.binarySearch(mKeys, mSize, key);

    if (i >= 0) {
        if(mValues[i] ! = DELETED) { mValues[i] = DELETED; mGarbage =true; }}}Copy the code

The delete method is simple. If the key already exists and the corresponding value has not been marked for deletion, the value is marked for deletion and mGarbage is set to true. The method of marking is to point value to a static constant Object. Remove (int key) is also implemented by delete. If the value of the subscript is removed by the tag, it is updated directly, eliminating the need to delete and insert elements. It’s in the GC method that SparseArray actually handles mKeys and mValues

gc

private void gc(a) {

    int n = mSize;  // The number of elements before compression
    int o = 0;      // The number of compressed elements
    int[] keys = mKeys;
    Object[] values = mValues;

    for (int i = 0; i < n; i++) {
        Object val = values[i];

        if(val ! = DELETED) {// If the value of the position is not marked for deletion
            if(i ! = o) { keys[o] = keys[i];// Move the element at I forward to o so that all non-deleted elements are consecutively placed at the front of the array
                values[o] = val;
                values[i] = null;   // Free up space
            }

            o++;    // if I >o, the element is marked for deletion
        }
    }

    mGarbage = false;   // Compression (garbage collection) is complete
    mSize = o;      // Update the number of compressed elements
}
Copy the code

The DELETED nodes in the value array are reclaimed and the number of valid elements in the array is updated through a very clever algorithm. Looking back again put in operation, assuming that when mKeys =,3,4,5,6,7,8,9,10,11,12 {0}, mValues = {” a “, “b”, “c”, “d”, “e”, “f”, “g”, “h” and “I”, “j”, “k”}

SpA. Remove (4); //spA represents the SparseArray object mValues={"a"."b", *,"d"."e"."f"."g"."h"."i"."j"."k"} //* represents a DELETED marker. spA.put(15,"s") I = ~ ContainerHelpers. BinarySearch (mKeys, mSize, key) 11 and mGarbage mSize = = = 12 this timetrueHitting condition, call the gc method for the compression of array / / gc after mKeys =,3,5,6,7,8,9,10,11,12,12 {0} mValues = {"a"."b"."d"."e"."f"."g"."h"."i"."j"."k",null} mSize=10 i=~ContainerHelpers.binarySearch(mKeys, mSize, After the key) = 11 / / insert mKeys =,3,5,6,7,8,9,10,11,12,15 {0} mValues = {"a"."b"."d"."e"."f"."g"."h"."i"."j"."k"."s"}
mSize=11
Copy the code

Since the mKeys change after GC, it needs to be re-indexed, as shown above, I is equal to 12 before GC, and the final index after GC is 11. To note here that the gc process mKeys fill after a forward does not like mValues release space, such as the remove (4) and then remove (6), gc again after mKeys =,3,5,7,8,9,10,11,12,11,12 {0} will appear such circumstance, But that does not violate mKeys is orderly rules, because I = ~ ContainerHelpers. BinarySearch (mKeys, mSize, key); Lo =0,hi=size-1,hi is counted by the number of valid elements, like 11 and 12 are not valid key values, so binary search is still valid.

get

public E get(int key, E valueIfKeyNotFound) {
    int i = ContainerHelpers.binarySearch(mKeys, mSize, key);
    // Return the default value if the key is found in the mKeys array
    if (i < 0 || mValues[i] == DELETED) {
        return valueIfKeyNotFound;
    } else {
        return(E) mValues[i]; }}Copy the code

IndexOfValue, indexOfValueByValue

Retrieves the index of the key and value to determine if any element has been previously deleted by the tag. The difference is that key is a binary search, and value, because it is unordered, iterates through the array, and returns -1 if it is not found. So there’s a pit in there where indexOfValue internally uses == to compare addresses, and indexOfValueByValue compares two objects using equals, which is marked as private API. If you do want to determine the subscript by value, you can get it by traversing SparseArray and valueAt.

append

public void append(int key, E value) {
    if(mSize ! =0 && key <= mKeys[mSize - 1]) {
        put(key, value);
        return;
    }

    if (mGarbage && mSize >= mKeys.length) {
        gc();
    }

    mKeys = GrowingArrayUtils.append(mKeys, mSize, key);
    mValues = GrowingArrayUtils.append(mValues, mSize, value);
    mSize++;
}
Copy the code

If the array is empty and the key is larger than all the elements in the current mKeys, the search process is omitted. If there is no need to expand the array, append the value directly to the tail, and copy it once if necessary. This is the optimization made by Append for the special case of storage.

GrowingArrayUtils

Both insert and append are evaluated first to see if they need to be expanded, but inserts may need to be copied twice and end up calling System#arraycopy.

public static int growSize(int currentSize) {
    return currentSize <= 4 ? 8 : currentSize * 2;
}
Copy the code

If the current size is greater than 4, it will be doubled.

Other containers for extension

SparseIntArray, SparseBooleanArray, and SparseLongArray are all data containers used to store specific values. There are no generics and no GC, and array types are all basic data types, avoiding boxing these value types and further optimizing. LongSparseArray is used to ensure that the key is of type long and can store a wider range of key data.

Performance analysis

Speed is

First use HashMap to do a 100000 forward insert:

public void testHashMap(View view) {
    HashMap<Integer, String> map = new HashMap<>();
    long start = System.currentTimeMillis();
    for (int i = 0; i <= 100000; i++) {
        map.put(i, String.valueOf(i));
    }
    long end = System.currentTimeMillis();
    System.out.println("HashMap elapsed time:" + (end - start) );
}
Copy the code

Run 5 times to take the average and replace the HashMap with SparseArray. The result is as follows (time /ms)

The serial number	HashMap	SparseArray
1	75	66
2	72	73
3	64	80
4	46	63
5	74	68
On average,	66.2	70

Let’s do another reverse insert

for (int i = 100000; i >= 0; i--) {
    map.put(i, String.valueOf(i));
}
Copy the code

The result is no better. HashMap is about the same as forward insert, but SparseArray is more than 10 times slower. The reason for this is that SparseArray has to do a binary search every time, and in the worst case, the new key will be at the top of the array, so it will be very inefficient to copy the array every time.

Memory is

AndroidStudio Profiler generates 100,000 HashMap/SparseArray objects through a loop to see the difference in the Java Heap size before and after the generated objects

private HashMap<Integer, String> mMap = new HashMap<>();
private SparseArray<String> mSpa = new SparseArray<>();
    
public void test(a){
    for (int i = 0; i < 100000; i++) {
        mMap.put(i, String.valueOf(i));
        //mSpa.put(i, String.valueOf(i));}}Copy the code

Adb shell Dumpsys meminfo

Applications Memory Usage (in Kilobytes):
Uptime: 854810329 Realtime: 2161500060

** MEMINFO in pid 16304 [com.example.dataset] **
                   Pss  Private  Private  SwapPss     Heap     Heap     Heap
                 Total    Dirty    Clean    Dirty     Size    Alloc     Free
                ------   ------   ------   ------   ------   ------   ------
  Native Heap     5870     5836        0      215    22528    18214     4313
  Dalvik Heap        0        0        0        0     4192     2096     2096
        Stack       80       80        0        0                           
       Ashmem       13        0       12        0                           
      Gfx dev      392      392        0        0                           
    Other dev        2        0        0        0                           
     .so mmap    10058      416     5740       22                           
    .apk mmap      289        0        0        0                           
    .ttf mmap       54        0        0        0                           
    .dex mmap     3656       12     1756        0                           
    .oat mmap      387        0       28        0                           
    .art mmap     7397     7000       20      102                           
   Other mmap       19        4        0        0                           
   EGL mtrack    20144    20144        0        0                           
    GL mtrack     5896     5896        0        0                           
      Unknown     6131     6088        0       54                           
        TOTAL    60781    45868     7556      393    26720    20310     6409
 
 App Summary
                       Pss(KB)
                        ------
           Java Heap:     7020
         Native Heap:     5836
                Code:     7952
               Stack:       80
            Graphics:    26432
       Private Other:     6104
              System:     7357
 
               TOTAL:    60781       TOTAL SWAP PSS:      393
Copy the code

Memory usage after creating data using HashMap:

** MEMINFO inpid 16304 [com.example.dataset] ** Pss Private Private SwapPss Heap Heap Heap Total Dirty Clean Dirty Size Alloc Free ------ ------ ------ ------ ------ ------ ------ Native Heap 6414 6380 0 215 22528 18977 3550 Dalvik Heap 0 0 0 0 19243 9622 9621 Stack 88 88 0 0 Ashmem 13 0 12 0 Gfx dev 752 752 0 0 Other dev 2 0 0 0 .so mmap 10105 424 5740 22 .apk mmap 289 0 0 0 .ttf mmap 54 0 0 0 .dex mmap 3800 12 1828 0 .oat mmap 398 0 28 0 .art mmap 7592 7112 52 99 Other mmap 19 4 0 0  EGL mtrack 30216 30216 0 0 GL mtrack 6032 6032 0 0 Unknown 17420 17296 0 32 TOTAL 83562 68316 7660 368 41771 28599 13171 App Summary Pss(KB) ------ Java Heap: 7164 Native Heap: 6380 Code: 8032 Stack: 88 Graphics: 37000 Private Other: 17312 System: 7586 TOTAL: 83562 TOTAL SWAP PSS: 368Copy the code

Memory usage after creating data with SparseArray:

** MEMINFO in pid 16467 [com.example.dataset] **
                   Pss  Private  Private  SwapPss     Heap     Heap     Heap
                 Total    Dirty    Clean    Dirty     Size    Alloc     Free
                ------   ------   ------   ------   ------   ------   ------
  Native Heap     6326     6288        0      214    22528    19060     3467
  Dalvik Heap        0        0        0        0    10923     5462     5461
        Stack       88       88        0        0                           
       Ashmem       13        0       12        0                           
      Gfx dev      756      756        0        0                           
    Other dev        2        0        0        0                           
     .so mmap    10108      424     5740       22                           
    .apk mmap      299        0       20        0                           
    .ttf mmap       54        0        0        0                           
    .dex mmap     3737       12     1768        0                           
    .oat mmap      398        0       28        0                           
    .art mmap     7592     7112       52       99                           
   Other mmap       19        4        0        0                           
   EGL mtrack    30216    30216        0        0                           
    GL mtrack     6044     6044        0        0                           
      Unknown    10306    10180        0       32                           
        TOTAL    76325    61124     7620      367    33451    24522     8928
 
 App Summary
                       Pss(KB)
                        ------
           Java Heap:     7164
         Native Heap:     6288
                Code:     7992
               Stack:       88
            Graphics:    37016
       Private Other:    10196
              System:     7581
 
               TOTAL:    76325       TOTAL SWAP PSS:      367

Copy the code

Using Dalvik heap-heap Alloc as a reference standard and using memory before and after multiple comparisons, SparseArray was calculated to reduce memory by 30-40% compared to HashMap.

conclusion

SparseArray has less automatic boxing (int->Integer) than a HashMap, because a HashMap’s key stores the wrapper type
Compared to HashMap, it is more memory efficient and simpler. Because HashMap uses arrays + linked lists or trees to store data, there is an additional Entry for storing hash, key, value, and the next Entry Node (Node), SparseArray maintains only two one-dimensional arrays internally for storing keys and values
The core of SparseArray is binary search, and the time complexity is O(logN). If it is not found, it returns the left margin, and when it is inverted again, it is the index of the array that should be inserted.
As the name suggests, the concept of a sparse array is used here to implement delete, not really delete, but to make a mark that reuses the element if it happens to be in that position when it is inserted again, otherwise it is compressed again if it runs out of space, optimising both efficiency and space.
SparseArray is not designed for large data stores and is less efficient than HashMap when there are many data stores. Because it’s based on binary lookup, whereas HashMap hashes the subscripts directly without conflict, in order O(1) time.
SparseArray can be used if memory requirements are high and the amount of data is less than a thousand, otherwise HashMap is more efficient