Explore the principle of Android GC

The idea of writing an Article about Android GC comes from tracing a meizu mobile phone picture sliding lag problem. The problem of frame loss lag caused by continuous GC makes us think of many solutions to solve it, so we plan to have a detailed look at the principle of memory allocation and GC, and why the continuous GC occurs. What is the difference between GC ALLOC and GC COCURRENT? Can we expand the heap to reduce the frequency of GC?

1. JVM memory reclamation mechanism

1.1 Recovery Algorithm

Mark and Sweep GC

Starting with the “GC Roots” collection, the entire memory is traversed once, preserving all objects that can be directly or indirectly referenced by GC Roots, while the rest of the objects are treated as garbage and recycled. This algorithm interrupts the execution of other components in the process and may generate memory fragmentation

Copying algorithms

The existing memory space is divided into two parts and only one part is used at a time. During garbage collection, the surviving objects in the used memory are copied to the unused memory block. After that, all objects in the used memory block are cleared and the roles of the two memory blocks are swapped to complete garbage collection.

Mark-compact algorithm

All reachable objects need to be marked once, starting at the root node, but after that, instead of simply cleaning up the unmarked objects, it compresses all living objects to one end of memory. After that, clean up all the space outside the boundary. This method not only avoids fragmentation, but also does not require two identical memory Spaces, so it is cost-effective.

generational

All newly created objects are put into an area of memory called the young generation. The young generation is characterized by fast collection of objects, so a more efficient replication algorithm is selected in the young generation. When an object survives several collections, it is put into a memory space called the old generation. For the new generation, it is suitable to copy algorithm, but for the old age, it is marker-compression algorithm.

1.2 Differences between copy and mark-compression algorithms

At first glance, there seems to be little difference between the two algorithms. Both algorithms are marked and then moved to another memory address for collection. Why do different generations use different collection algorithms?

In fact, the biggest difference between the two is that the former is to use space for time and the latter is to use time for space.

The former work not without the separate mark and copy stages, but rather together in an action called scavenge. Set forwarding pointer (); / / Copy a forwarding pointer to the new location whenever a live object is found that has not been accessed in the collection. This way of working requires extra space.

The latter requires separate MARK and Compact phases to work. The MARK phase is used to discover and mark all live objects before the Compact phase moves objects for compact purposes. If the compact mode is a sliding compaction, objects can “slide” sequentially to one side of space after Mark. Because you’ve already traversed the object map of the entire space, you know all the live objects, so you can move in the same space without needing another space.

So the new generation of recycling will be a little faster, the old generation of recycling will take longer, and the compression phase will suspend the application, so we should try to avoid objects in the old age.

2. Dalvik VM

2.1 Java heap

The Java heap is actually made up of an Active heap, which manages various objects that the Zygote process preloads and creates during startup, and a Zygote heap, which is created before the Zygote process forks the first child process. All subsequent application processes are forked out by the Zygote process and each holds its own Dalvik virtual machine. During application creation, Dalvik vm uses COW policy to copy the address space of Zygote process.

COW policy: At the beginning (when the address space of the Zygote process is not copied), the application process and the Zygote process share the same heap to allocate objects. When a Zygote process or an application process writes to the heap, the kernel performs a true copy operation, allowing the Zygote process and the application process to have their own copies. This is called COW. Because copy is time consuming, you must avoid copy as little as possible.

To achieve this, when the first application process is created, the heap memory that is already used is divided into one part and the heap memory that is not yet used is divided into another. The former is called the Zygote heap and the latter is called the Active heap. This simply copies the contents of the Zygote heap to the application process. When Zygote processes and application processes need to allocate objects, they do so on the Active heap. This allows the Zygote heap to perform as few writes as possible, thus reducing copy-on-write operations. The objects allocated in the Zygote heap are basically classes, resources, and objects that the Zygote process preloads during startup. This means that these preloaded classes, resources, and objects can be long-term shared between Zygote processes and application processes. This reduces both copying operations and memory requirements.

2.2 Some metrics related to GC

I remember that when we were optimizing the GC lag problem of a meizu mobile phone, we found that it was easy to trigger GC_FOR_MALLOC. This GC category will be mentioned later, which is caused by insufficient memory of allocated objects. Starting Size, Maximum Size, and Growth Limit of the Java heap are the following concepts:

When starting Dalvik VIRTUAL machine, we can specify the above three values with the options -xms, -xmx, and -xx :HeapGrowthLimit. The above three values are represented separately

Starting Size: Dalvik vm starts with an initial chunk of heap memory allocated to the VM.

Growth Limit: is the maximum number of applications that the system gives the maximum number of applications, above which the application will be OOM

Maximum Size: the Maximum heap Size that can be retrieved from the system when using the largeheap attribute

Meanwhile, in addition to the above three indicators, there are several other indicators that deserve our attention, namely, Min Free, Max Free, and Target Utilization of the heap. Given that the memory footprint of the live objects after a certain GC is LiveSize, the ideal size of the heap should be (LiveSize/U). However, (LiveSize/U) must be greater than or equal to (LiveSize + MinFree) and less than or equal to (LiveSize + MaxFree), and the garbage collector tries to bring the heap utilization closer to the target utilization after each GC. So when we try to generate some manual hundreds of K objects, trying to enlarge the available heap size, it will lead to frequent GC, because the distribution of these objects can lead to GC, and GC will make after heap memory back to the right proportion, and we use local variables will soon be recycling theoretically live objects are so much. Our heap size will shrink back and not be able to scale. At the same time, this is a factor in the generation of a CONCURRENT GC, which we will discuss in more detail later.

2.3 Types of GC

GC_FOR_MALLOC: Represents the GC triggered by insufficient memory when allocating objects on the heap.
GC_CONCURRENT: When the heap memory of our application reaches a certain amount, or is almost full, the system automatically triggers a GC operation to free the memory.
GC_EXPLICIT: Indicates the GC triggered when the application calls the system. gc or vMRuntime. gc interface or receives SIGUSR1 signal.
GC_BEFORE_OOM: Indicates the GC that is triggered as a last-ditch effort before preparing to throw the OOM exception.

In fact, the GC_FOR_MALLOC, GC_CONCURRENT, and GC_BEFORE_OOM types of GCS are all triggered during the process of allocating objects. The main difference between concurrent and non-concurrent GC is that the former conditionally suspends and wakes up non-GC threads during GC, while the latter suspends non-GC threads all the time during GC. Parallel GC makes applications more responsive by conditionally suspending and waking up non-GC threads. But parallel GC also takes more CPU resources because it requires one more operation to mark the root set objects and recursively mark the objects that are accessed during GC. We will also focus on the difference between concurrent and non-concurrent GC in Art later.

2.4 Object allocation and GC trigger timing

1. Call dvmHeapSourceAlloc to allocate the specified size of memory on the Java heap. If the assignment succeeds, the assigned address is returned directly to the caller. The dvmHeapSourceAlloc function allocates memory without changing the current size of the Java heap, which is a lightweight memory allocation action.

2. If the previous allocation fails, then a GC is required. But if GC thread already in operation, namely gDvm. GcHeap – > gcRunning value equal to true, then the direct function call dvmWaitForConcurrentGcToComplete wait for GC is executed. Otherwise, you need to call gcForMalloc to perform a GC, with the parameter false indicating that the object referenced by the soft reference object is not to be reclaimed.

3. After GC is completed, call dvmHeapSourceAlloc again to try lightweight memory allocation operation. If the assignment succeeds, the assigned address is returned directly to the caller.

4. If the previous step fails to allocate memory, it is necessary to consider setting the current size of the Java heap to the maximum size specified when Dalvik vm starts before allocating memory. This is by calling the function dvmHeapSourceAllocAndGrow.

5. If the calling function dvmHeapSourceAllocAndGrow allocates memory, are allocated directly to get the address of the direct returned to the caller.

6. If the last memory allocation fails again, this is the time to make a bold move. Call the function gcForMalloc again to perform GC. The true argument indicates the object referenced by the soft reference object to be reclaimed.

7., GC has been completed dvmHeapSourceAllocAndGrow again the function of memory allocation. This is the last effort, the end of everything.

Here’s an example:

It can be seen from this process that GC will occur in the allocation of objects. The first time the allocation of objects fails, GC will be triggered but the reference of Soft will not be recycled. If the allocation fails again, GC will also be recycled. The latter is a GC_BEFORE_OOM type GC. When the memory allocation is successful, we determine whether the current memory usage has reached the GC_CONCURRENT threshold, and if so, GC_CONCURRENT is triggered again.

So where does this threshold come from? The target utilization we talked about above, after GC we record a target value, which theoretically needs to be within the range above, if not we select the boundary value as the target value. The virtual machine records this target value as the total amount of memory currently allowed to be allocated. The fixed value (200~500K) is subtracted from the target value as the threshold for triggering GC_CONCURRENT events.

2.5 Reclaiming Algorithms and Memory Fragments

Most of the mainstream Davik uses Mark and Sweep recovery algorithm, and some implement copy GC, which is different from HotSpot. The specific algorithm used is determined at compile time and cannot be dynamically changed at run time. If during the compilation of dalvik virtual machine command dictates “WITH_COPYING_GC” option, the compiler “dalvik/vm/alloc/Copying. CPP” source – this is copy GC algorithm implementation of Android, Compile “dalvik/vm/alloc/HeapSource CPP” – its annotations and clean up the GC algorithm.

Due to the shortcomings of Mark and Sweep algorithm, it is easy to cause memory fragmentation. Therefore, under this algorithm, when we have a large amount of discontinuous small memory, it is very easy to cause GC when we reallocate a large object. For example, we decode pictures on this mobile phone.

Therefore, for the mobile phone of Dalvik VIRTUAL machine, we should first try to avoid generating many temporary small variables frequently (such as getView, onDraw and other functions), and also try to avoid generating many large objects with long life cycle.

3. ART memory reclamation mechanism

3.1 Java heap

The Java heap used internally by ART runtime mainly consists of Image Space, Zygote Space, Allocation Space and Large Object Space. Image Space is used to store some preloaded classes. Zygote Space and Allocation Space work in the same way as Zygote heap and Active heap in Dalvik VM garbage collection.

Large Object Space is a collection of discrete addresses that are used to allocate some Large objects to improve the management efficiency and overall performance of GC, similar to the following figure:

In the following GC Log, we can also see that LOS information is included in the ART GC Log, so that we can see the situation of large memory.

3.2 Types of GC

KGcCauseForAlloc, a GC that stops the world when insufficient memory is found to be allocated
KGcCauseBackground will start GC when memory reaches a certain threshold. This is a background GC and does not cause stop world
KGcCauseExplicit displays the GC that will be made when the art is called. If the art has this option turned on, the GC will be made when the Art is system.gc
Other more

3.3 Object allocation and GC trigger timing

Since the memory allocation under Art is basically the same as that under Dalvik, I directly mapped it.

3.4 Concurrent and non-concurrent GC

Unlike Dalvik, Art has only one collection algorithm in GC. Art will choose different collection algorithms in different situations. For example, when Alloc memory is insufficient, Art will adopt non-concurrent GC, and when Alloc memory reaches a certain threshold, concurrent GC will be triggered. There are also different GC strategies for both front and back scenarios, which we will explain later.

The concurrent GC

Step 1. Invoke the member function InitializePhase implemented by the subclass to perform the GC initialization phase.

Step 2. Suspend all ART runtime threads.

Step 3. Call the member function MarkingPhase implemented by the subclass to perform the GC MarkingPhase.

Step 4. Invoke the member function ReclaimPhase implemented by the subclass to perform the GC collection phase.

Step 5. Restore the ART runtime thread suspended in step 2.

Step 6. Invoke the member function FinishPhase implemented by the subclass to perform the GC end phase.

Concurrent GC

Step 1. Invoke the member function InitializePhase implemented by the subclass to perform the GC initialization phase.

Step 2. Obtain the lock used to access the Java heap.

Step 3. Call the member function MarkingPhase implemented by the subclass to perform the GC parallel MarkingPhase.

Step 4. Release the lock used to access the Java heap.

Step 5. Suspend all ART runtime threads.

Step 6. Call the member function HandleDirtyObjectsPhase implemented by the subclass to handle the objects modified during the GC parallel marking phase.

Step 7. Restore the ART runtime thread suspended in step 4.

Step 8. Repeat steps 5 through 7 until all objects modified during the parallel GC phase are processed.

Step 9. Obtain the lock used to access the Java heap.

Step 10. Invoke the member function ReclaimPhase implemented by the subclass to perform the GC collection phase.

Step 11. Release the lock used to access the Java heap.

Step 12. Call FinishPhase, the member function implemented by the subclass, to perform the GC end phase.

Therefore, both concurrent and non-concurrent will cause the occurrence of StopWorld. In the case of concurrent, the time of single stopWorld will be shorter, which is basically different from.

3.5 Differences between Art concurrent GC and Dalvik concurrent GC

First of all, we can compare it with the following 2 pictures

Dalvik GC:

Art GC

What is the difference between Art’s concurrent GC and Dalvik’s concurrent GC? At first glance, it looks like the two are similar. Although there is no suspended thread, there is also a suspended thread to execute the process of marking objects. By reading relevant documents, we can know that Art concurrent GC has three advantages for Dalvik:

1. Mark yourself

Art reduces the scope of object traversal by pushing the newly allocated objects into the Allocation Stack described by allocation_stack_, a member variable of the Heap class.

2. Prefetch

If the Allocation Stack is marked with memory, the system prereads the next object to be traversed, and pushes other objects referenced by the object onto the Stack until the traversal is complete.

3. Reduce Pause times

The Mark phase does not Block other threads. There is dirty data in the Mark phase, such as data that Mark finds it will not use but is being used by other threads at the same time. The Mark phase also processes dirty data rather than leaving it to the last Block. This also reduces the processing time for dirty data in the later Block phases.

3.6 Foreground and Background GC

Foreground Foreground Foreground Foreground Foreground Foreground Foreground Foreground Foreground Foreground Foreground Foreground Foreground Foreground Foreground Foreground Foreground Foreground Foreground Foreground Foreground Foreground Foreground Foreground Foreground Foreground Foreground Foreground Foreground Foreground Foreground Foreground Foreground Foreground Foreground Foreground Foreground Foreground Foreground Foreground Foreground Foreground Foreground Foreground Foreground Foreground Foreground Foreground Foreground Therefore, Foreground GC is the GC that the application executes in Foreground, and Background is the GC that the application executes in Background.

When the application is running in the foreground, responsiveness is Paramount and therefore requires efficient GC to be performed. Conversely, when the application is running in the background, where responsiveness is not Paramount, it is appropriate to address memory fragmentation in the heap. Therefore, the Mark-sweep GC is suitable as Foreground GC, while the Mark-Compact GC is suitable as Background GC.

Fragmentation can be avoided in Art due to the existence of Compact capability, which is also a good capability of Art.

3.7 Art Good

In general, ART does much better than Dalvik in GC, not only in the efficiency of GC and the reduction of pause time, but also in the memory allocation for large memory has a separate allocation area, and at the same time, it can have algorithms to do memory consolidation in the background to reduce memory fragmentation. For developers, we can basically avoid a lot of issues like gc lag under ART. In addition, according to Google’s own data, the efficiency of Art memory allocation is 10 times higher than Dalvik, and the efficiency of GC is 2-3 times higher.

4, the GC Log

When we want to trace some possible GC crashes based on the GC log, we need to understand the composition of the GC log and what the different information means.

4.1 Dalvik GC logs

Dalvik’s log format is basically as follows:

D/dalvikvm: <GC_Reason> <Amount_freed>, <Heap_stats>, <Pause_time>, <Total_time>

Gc_reason: is it gc_alloc or Gc_concurrent? Knowing the different reasons, it is convenient for us to do different processing.

Amount_freed: indicates how much memory is freed by the system during this GC operation

Heap_stats: displays the current free memory ratio and usage (active object memory/total current program memory).

Pause_time: Indicates how long the GC operation caused the application to pause. Regarding this pause time, GC operations cannot be performed concurrently prior to 2.3, i.e. the system is GC in progress, so the application can only block and wait for THE GC to finish. Since 2.3, the GC operation has been changed to a concurrent one, meaning that the application is not affected by the GC process, but there is a short block at the beginning and end of the GC operation, so there is a total_time later.

Total_time: Pause_time (Pause_time); Total_time: Pause_time (Pause_time);

4.2 Art GC Logs

I/art: <GC_Reason> <Amount_freed>, <LOS_Space_Status>, <Heap_stats>, <Pause_time>, <Total_time>

The basic situation is no different from Dalvik, GC has more reasons, and OS_Space_Status

LOS_Space_Status: Large Object Space. This part of memory is not allocated to the heap, but still belongs to the application memory. It is mainly used to manage objects that occupy Large memory, such as bitmaps, to avoid frequent heap GC due to Large memory allocation.

Write at the end: pictures from the Internet, special thanks to Luo.