How Animation Hitches Work

Background

In Xcode 12, Instruments added the Animation Hitches template to detect hitches and removed the Core Animation FPS template. On devices with ProMotion displays, the frame rate can reach 120 Hz and adjusts dynamically based on the current user gesture and device state, so continuing to judge performance and fluency by frame rate alone would be a mistake. Animation Hitches is therefore intended to replace frame-rate measurement, and the hitch time ratio is proposed as a replacement for FPS. There is currently little material about hitches, and because the maximum screen refresh rate of iPhones before the iPhone 13 Pro is still 60 Hz, many developers have not paid attention to this capability. This article therefore introduces the concept of a hitch, the overall flow of the RenderLoop, the types of hitches, and how to avoid them.

What is a hitch?

Concept

Any time a frame appears on screen later than expected, that is a hitch.

Example

Smooth scrolling, animations, and transitions build a visual connection between the user and the content on screen. If an animation hitches, it jumps, that connection breaks, and the user experience suffers. A common case is scrolling up and down in a scroll view: a delay in frame 4 causes frame 3 to stay on screen for two frame intervals, which the user perceives as a dropped frame.

RenderLoop

Concept

The RenderLoop is the continuous process in which an event (such as a user gesture) is passed to the App, the App hands its response to the operating system, and the result is finally presented back to the user. The duration of one pass of the RenderLoop depends on the refresh rate of the device: iPhones before the iPhone 13 Pro (Max) support at most 60 frames per second, while the iPhone 13 Pro (Max) and iPad Pro support up to 120 frames per second, which means a new frame can be displayed as often as every 8.33 milliseconds.

Frame preparation stage

The preparation of each frame can be divided into three parts: the App, the RenderServer, and the Display. The App handles user events, while the RenderServer does the actual drawing of the user interface; both must finish before the next VSYNC arrives. Finally, the Display stage shows the buffered frame. Because two buffers are used for this hand-off, it is called double buffering; since the display scans line by line, double buffering together with vertical synchronization avoids screen tearing. The system also provides an alternative triple-buffering mechanism that gives the RenderServer one extra frame to render, but it is normally not enabled.

Stage details

The whole RenderLoop can be divided into five stages: Event and Commit happen in our App, followed by Render Prepare, Render Execute, and Display. The Commit stage can be further divided into four phases: Layout, Display, Prepare, and Commit.

  1. In the Event stage, touches, timers, and other events determine whether the user interface needs to change.
  2. In the Commit stage, the App submits the rendering commands to the RenderServer.
  3. In the RenderServer's Prepare stage, the layer tree is compiled into commands the GPU can execute; in the Execute stage, the GPU draws the user interface image.
  4. The final Display stage swaps the buffered frame onto the screen.

Let's take a rendering that includes a shadow as an example and look at what happens at each stage of the RenderLoop.

App

Event

This phase indicates that the App has received events, such as touches, network request callbacks, keyboard input, or timers. The App can respond to these events by changing its view hierarchy or in any other way, for example by changing a layer's background color or even its size and position. When the App changes a layer's bounds, Core Animation calls setNeedsLayout to mark which layers need their layout recalculated; the system then coalesces these layout requests and executes them together during the Commit phase to avoid duplicate work.
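As a rough sketch of what the Event phase looks like in code (the view controller, card view, and tap handler below are hypothetical, used only for illustration): properties are changed and layout is merely marked dirty; the actual work is deferred to the Commit phase.

```swift
import UIKit

final class CardViewController: UIViewController {
    // Hypothetical subview, used only to illustrate the Event phase.
    private let cardView = UIView()

    override func viewDidLoad() {
        super.viewDidLoad()
        cardView.frame = CGRect(x: 20, y: 100, width: 200, height: 120)
        view.addSubview(cardView)
        cardView.addGestureRecognizer(
            UITapGestureRecognizer(target: self, action: #selector(handleTap)))
    }

    @objc private func handleTap() {
        // Event phase: a touch arrives and the app decides the UI must change.
        cardView.layer.backgroundColor = UIColor.systemBlue.cgColor

        // Changing geometry only marks layout as dirty here; Core Animation
        // coalesces these requests and runs layout during the Commit phase.
        cardView.frame.size.height += 20
        view.setNeedsLayout()
    }
}
```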

Commit

There are four distinct phases involved in a transaction commit: the layout phase, the display phase, the preparation phase, and the final commit phase.

The layout phase

During the layout phase, layoutSubviews is called on every view that needs layout. Layout is triggered by changing a view's geometry (frame, bounds, transform), adding or removing views, or simply calling setNeedsLayout. Note that these layout operations are not performed immediately; the system consolidates the layout requests and performs them together before the run loop goes to sleep.
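A minimal sketch (the BadgeView below is hypothetical) of how repeated layout requests are coalesced into a single layoutSubviews pass:

```swift
import UIKit

final class BadgeView: UIView {
    private let iconLayer = CALayer()
    private let titleLabel = UILabel()

    override init(frame: CGRect) {
        super.init(frame: frame)
        layer.addSublayer(iconLayer)
        addSubview(titleLabel)
    }
    required init?(coder: NSCoder) { fatalError("init(coder:) has not been implemented") }

    override func layoutSubviews() {
        super.layoutSubviews()
        // Runs once per commit for this view, no matter how many times
        // setNeedsLayout was called during the Event phase.
        iconLayer.frame = CGRect(x: 8, y: 8, width: 24, height: 24)
        titleLabel.frame = bounds.insetBy(dx: 40, dy: 8)
    }

    func update(title: String) {
        titleLabel.text = title
        // Only mark the view dirty; the actual layout is deferred and coalesced.
        setNeedsLayout()
        setNeedsLayout() // Repeated requests still produce a single layout pass.
    }
}
```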

The display phase

During the display phase, drawRect is called for every view that needs display, such as UILabel, UIImageView, or any class that overrides drawRect. Views must call setNeedsDisplay to request an update. Each custom-drawn view receives a Core Graphics backing store at draw time; the drawing is done on the CPU and the result becomes the layer's image. Therefore, do not override drawRect unless necessary: it allocates an extra chunk of memory to store the bitmap and draws on the CPU, increasing the overall main-thread time. When there are many custom drawRect views, the overall memory pressure is also high.
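To make the trade-off concrete, here is a hedged sketch (both views are hypothetical) contrasting a custom draw(_:) with the same result expressed as layer properties the GPU can composite directly:

```swift
import UIKit

// Costly: overriding draw(_:) gives this view a CPU-drawn bitmap backing store
// whose size scales with the view's bounds.
final class HandDrawnSeparator: UIView {
    override func draw(_ rect: CGRect) {
        UIColor.separator.setFill()
        UIRectFill(CGRect(x: 0, y: rect.midY, width: rect.width, height: 1 / UIScreen.main.scale))
    }
}

// Cheaper: the same visual result expressed as a plain layer property,
// which can be composited without a custom bitmap.
final class LayerBackedSeparator: UIView {
    override init(frame: CGRect) {
        super.init(frame: frame)
        layer.backgroundColor = UIColor.separator.cgColor
    }
    required init?(coder: NSCoder) { fatalError("init(coder:) has not been implemented") }
}
```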

The prepare phase

Images that have not yet been decoded are decoded in the prepare phase; this is the main-thread image decoding we commonly need to optimize. The App may hold a large memory allocation for each decoded image, and this allocation is proportional to the size of the input image, not to the size of the image view actually rendered into the frame buffer. As the App's memory footprint grows, the operating system starts compressing physical memory. This requires the CPU, so in addition to our own App's CPU usage we may also drive up global CPU usage that we cannot control. Eventually the App may consume so much physical memory that the operating system starts terminating processes, beginning with low-priority background processes; if our App consumes enough memory it may be terminated too, which is why large images often lead to OOM. In addition, if an image's color format cannot be used directly by the GPU, the format conversion also happens in this step; it requires copying the image instead of just passing a pointer, which takes longer and consumes more memory.

The commit phase

During the commit phase, the layer tree is recursively encoded and sent to the RenderServer. This takes longer when the view hierarchy is more complex, so keep the hierarchy as flat as possible.

RenderServer

The RenderServer is responsible for turning our layer tree into an image that can actually be displayed. It has two stages: Prepare and Execute. In the Prepare stage the layer tree is compiled into a series of simple instructions for the GPU to execute, and animations are also processed here. In the Execute stage, the GPU draws the App's layers into the final image. Here is a rendering example: in the illustration below, both the circle and the bar have shadows around them.

Prepare

In the Prepare stage, the RenderServer traverses the App's layer tree breadth-first and builds a linear pipeline of drawing commands that the GPU can execute in order, starting from the root layer and walking the tree layer by layer until the entire pipeline for the next Execute stage is ready.

Execute

In the Execute stage, the GPU runs the pipeline prepared in the Prepare stage: vertex shading, shape assembly, geometry shading, rasterization, fragment shading, and blending. Once the GPU finishes, it puts the rendered image into the frame buffer, waits for the next VSYNC, and swaps it onto the screen. In this example, the GPU's job is to use the pipeline to draw each step into a texture that is eventually composited and displayed on screen during the Display stage.

Starting with the first blue layer, the GPU draws its color within the specified bounds, then draws the dark blue bar within its bounds. But the circle and the bar both have shadows, so the GPU has to draw the shadows first. The shape of each shadow is defined by the two layers that have not yet been drawn, so the circle and the bar must be rendered first; and to avoid having the shadow cover those layers, the GPU switches to a different texture and draws the shadow there. This is what we call off-screen rendering: an extra block of memory is needed to draw the circle and the bar, turn that result black, and blur it to produce the shadow. The GPU can then copy the shadow's off-screen texture into the final texture, and finally draw the circle and the bar again. Note that not only did we allocate extra storage to render the shadow, the circle and the bar were also rendered twice, which is very detrimental to performance.

Finally, the text is drawn on the CPU, and the GPU copies the resulting text image into the final texture. Once all of this is done, the frame is ready to be displayed.

Note that in this process the shadow had to be rendered with an off-screen pass, which takes extra time.

Off-screen rendering

An off-screen rendering pass means the GPU must first allocate a block of memory somewhere else, render a layer into it, and then copy the result back. In the case of a shadow, it must draw the layers first in order to determine the shadow's final shape. The occasional off-screen pass does not have much impact on performance, but off-screen passes can add up and lead to render hitches, so they should be monitored in the App and avoided as much as possible. There are four main sources of off-screen passes that can be optimized: shadows, masks, rounded corners, and visual-effect (frosted glass) blurs.

  1. Shadow: the GPU does not have enough information to draw a shadow without first drawing the layer the shadow is attached to (see the sketch after this list).
  2. Mask: when a layer or layer subtree is masked, the GPU has to render the masked subtree while avoiding writing pixels outside the mask's shape, so it renders into an intermediate texture and copies back only the pixels that will eventually be displayed. Because the final result may be the composite of several layers, the intermediate result must be cached in extra memory, so the system triggers off-screen rendering by default; this can mean rendering many pixels the user will never see.
  3. CornerRadius: since the GPU starts drawing from the root node, setting a corner radius together with masksToBounds clipping requires an extra off-screen buffer to cache the intermediate clipped result and eventually copy back the pixels inside the rounded corners. Properties such as group opacity can trigger off-screen rendering in the same way.
  4. Blur effects: blur and vibrancy effects applied through UIVisualEffectView have been supported since iOS 8. To apply them, the GPU must copy the content into another texture using an off-screen pass, then blur it, scale it, overlay it, and copy the final result back.
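To make these triggers concrete, here is a minimal, hedged sketch (the avatar image view and asset name are hypothetical) of layer configurations that commonly lead to off-screen passes; whether a given combination actually goes off screen can vary by OS version and layer contents:

```swift
import UIKit

let avatar = UIImageView(image: UIImage(named: "avatar")) // hypothetical asset

// Shadow without a shadowPath: the GPU may need an off-screen pass to
// discover the shadow's shape from the layer's contents.
avatar.layer.shadowColor = UIColor.black.cgColor
avatar.layer.shadowOpacity = 0.3
avatar.layer.shadowRadius = 8

// Rounded corners combined with clipping can force the intermediate result
// to be cached off screen when the layer has non-trivial contents.
avatar.layer.cornerRadius = 24
avatar.layer.masksToBounds = true

// A custom mask layer: the masked subtree is rendered to a separate texture,
// then only the pixels inside the mask shape are copied back.
let maskLayer = CAShapeLayer()
maskLayer.path = UIBezierPath(ovalIn: avatar.bounds).cgPath
avatar.layer.mask = maskLayer
```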

Display

The Display stage simply swaps the contents of the frame buffer onto the display for final presentation; we are not involved in this process.

Conclusion

To hit the target frame rate and keep input latency low, the RenderLoop is actually pipelined: the CPU can prepare a new frame while the system is still rendering the previous one, which is why the deadline for every frame matters.

Hitch types

The entire workflow of the RenderLoop has been described above. It takes place mainly in the App and the RenderServer, so there are two main types of hitches: commit hitches (which occur in the App) and render hitches (which occur in the RenderServer).

Commit hitches

Concept

A commit hitch occurs when the App takes too long to process or commit events.

If the commit takes too long and misses its deadline, the RenderServer has nothing to render at the next VSYNC and must wait for the one after that before rendering can begin. The frame is now delivered one frame late, which on an iPhone or iPad running at 60 Hz is 16.67 milliseconds; this delay is called hitch time. If the commit takes even longer and runs past the start of the next VSYNC, the frame is two frames late, or 33.34 milliseconds, and during that time the user's scrolling is not smooth.
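The arithmetic is simple; a purely illustrative sketch (the numbers are assumptions mirroring the text above):

```swift
// Hitch time = number of missed VSYNCs × frame duration for the refresh rate.
let frameDuration60Hz = 1000.0 / 60.0    // ≈ 16.67 ms
let frameDuration120Hz = 1000.0 / 120.0  // ≈ 8.33 ms

let framesLate = 2.0
let hitchTimeMs = framesLate * frameDuration60Hz // ≈ 33.34 ms of hitch time
```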

How to avoid commit hitches

Keep views lightweight

  1. Keep views lightweight and take advantage of the GPU-accelerated properties available on CALayer; avoid custom CPU drawing unless necessary.
  2. Do not override drawRect unless necessary, because it allocates extra memory for CPU drawing and drawing on the CPU takes more time. For text, images, and other controls that are drawn on the CPU anyway, we can use lower-level, thread-safe drawing capabilities such as TextKit and CoreText, combined with multi-threaded asynchronous drawing, to reduce main-thread pressure.
  3. Reuse views rather than constantly adding and removing them.
  4. If a view needs to disappear during an animation, prefer the hidden property over removing it.
  5. For the prepare phase, when a UIImage's container view is smaller than the image itself, downsampling can usually be used to create a thumbnail and save memory (see the sketch below).
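A minimal downsampling sketch using the standard ImageIO thumbnail approach: decode directly to the size that will be displayed instead of decoding the full image. The function name and call site are illustrative; run it on a background queue and deliver the result back to the main thread.

```swift
import UIKit
import ImageIO

func downsampledImage(at url: URL, to pointSize: CGSize, scale: CGFloat) -> UIImage? {
    // Don't decode the full image when creating the source.
    let sourceOptions = [kCGImageSourceShouldCache: false] as CFDictionary
    guard let source = CGImageSourceCreateWithURL(url as CFURL, sourceOptions) else { return nil }

    let maxDimensionInPixels = max(pointSize.width, pointSize.height) * scale
    let downsampleOptions = [
        kCGImageSourceCreateThumbnailFromImageAlways: true,
        kCGImageSourceShouldCacheImmediately: true,      // decode now, on the calling (background) thread
        kCGImageSourceCreateThumbnailWithTransform: true,
        kCGImageSourceThumbnailMaxPixelSize: maxDimensionInPixels
    ] as CFDictionary

    guard let cgImage = CGImageSourceCreateThumbnailAtIndex(source, 0, downsampleOptions) else { return nil }
    return UIImage(cgImage: cgImage)
}
```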

Avoid complex layouts

  1. Reduce costly, repeated layouts and call setNeedsLayout only when the layout actually needs updating. layoutIfNeeded forces a synchronous layout pass within the current transaction and can cause congestion; most of the time it is fine to wait for the next run loop to update the layout (see the sketch after this list).
  2. Try to express the layout with the minimum number of constraints.
  3. Views should only invalidate themselves or their children, not their siblings or superviews, to avoid recursive layouts.
  4. Avoid creating unnecessary view hierarchy, which increases the overall time of the Commit phase.

Use multithreading appropriately

  1. Learn to use GCD and take full advantage of multiple CPU cores: perform layout computation and other UI-independent work on background threads in advance to avoid blocking the main thread.
  2. Avoid disk-related operations, such as I/O, on the main thread.
  3. For the common problem of main-thread image decoding: before iOS 15 we usually wrapped decoding ourselves or used a third-party library such as SDWebImage to decode on background threads. In iOS 15, Apple finally provides an official solution, the new UIImage prepareThumbnailOfSize:completionHandler: API (see the sketch below).
  4. For components that must be drawn on the CPU, combine asynchronous drawing with multithreading to reduce main-thread pressure.
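A minimal sketch of the iOS 15 API mentioned above (the function and asset name are illustrative): decoding and thumbnailing happen off the main thread, and the completion handler is not guaranteed to run on the main queue, so hop back before touching UIKit.

```swift
import UIKit

func loadAvatar(named name: String, into imageView: UIImageView) {
    guard let original = UIImage(named: name) else { return }
    // iOS 15+: prepares a decoded thumbnail off the main thread.
    original.prepareThumbnail(of: CGSize(width: 80, height: 80)) { thumbnail in
        DispatchQueue.main.async {
            imageView.image = thumbnail ?? original
        }
    }
}
```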

Render hitches

Concept

A render hitch occurs when the RenderServer cannot prepare or execute the layer tree on time. For example, if the Execute stage runs past the VSYNC boundary, that frame will not be ready in time; in the illustration, the green frame appears one frame later than expected, a roughly 16-millisecond hitch.

How to avoid render hitches

  1. We have little influence over the Prepare stage; most of our impact is on off-screen rendering in the Execute stage.
  2. For shadows, always set shadowPath so the GPU knows the shadow's shape and can skip off-screen passes (see the sketch after this list).
  3. For rounded rectangles, use the cornerRadius and cornerCurve properties instead of masks or corner images.
  4. Optimize masks across the App: clipping to a rectangle, rounded rectangle, or ellipse with masksToBounds performs much better than a custom mask layer. Use Instruments to analyze the App and inspect the layer tree for opportunities to reduce the overall off-screen count.
  5. Use the shouldRasterize attribute sparingly and carefully to rasterize and cache a layer; applying it to layers that refresh frequently hurts performance.
  6. Use opaque layers to minimize layer blending.
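A hedged sketch pulling these recommendations together (the card view and its dimensions are hypothetical):

```swift
import UIKit

let card = UIView(frame: CGRect(x: 0, y: 0, width: 320, height: 120))

// Shadow: providing a shadowPath tells the GPU the shadow's shape up front,
// so it does not need an off-screen pass to discover it.
card.layer.shadowColor = UIColor.black.cgColor
card.layer.shadowOpacity = 0.2
card.layer.shadowRadius = 10
card.layer.shadowPath = UIBezierPath(roundedRect: card.bounds, cornerRadius: 12).cgPath

// Rounded corners: prefer cornerRadius/cornerCurve over a custom mask layer.
card.layer.cornerRadius = 12
card.layer.cornerCurve = .continuous

// Blending: fully opaque layers are cheaper to composite.
card.backgroundColor = .white
card.isOpaque = true

// shouldRasterize caches the rendered layer; only use it for content that
// rarely changes, otherwise the cache is rebuilt every frame.
// card.layer.shouldRasterize = true
// card.layer.rasterizationScale = UIScreen.main.scale
```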

Detecting hitches

Looking at individual hitch times is useful when there are only one or a few hitches, but it becomes hard to interpret for longer events such as scrolling, animations, or transitions: unless every scroll or animation takes exactly the same amount of time, they will not contain the same number of frames. Also, iOS devices do not always update the screen; if no transaction is sent to the RenderServer, no new frame is committed, which makes it even harder to compare hitch times across test runs. So Apple offers a metric called the hitch time ratio to measure hitching over time.

The hitch time ratio is the total hitch time in an interval divided by the interval's duration. Because it is normalized by total time, it can be compared across different runs. It is measured in milliseconds of hitch per second, i.e. how many milliseconds per second the device spends hitching.

As a first example, on an iPhone (60 Hz) consider half a second of work in which every frame is ready before its VSYNC arrives: the user sees no hitch, the hitch time is 0, and the hitch time ratio is 0. In a second example, hitches are sometimes caused in the Commit phase and sometimes in the RenderServer; adding up the hitch times gives 100.02 ms over half a second, for a hitch time ratio of 200.04 ms per second.

Apple's recommended hitch time ratio targets are: below 5 ms/s, which users can barely detect; between 5 and 10 ms/s, where users will notice some interruption; and above 10 ms/s, which seriously affects the user experience.
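To make the arithmetic explicit, a trivial illustrative computation (the numbers simply mirror the example above):

```swift
// Hitch time ratio = total hitch time in an interval / interval duration,
// expressed in milliseconds of hitch per second.
let totalHitchTimeMs = 100.02   // hitch time accumulated over the interval
let intervalSeconds = 0.5       // half a second of scrolling
let hitchTimeRatio = totalHitchTimeMs / intervalSeconds // ≈ 200.04 ms per second
```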

This article has focused on the RenderLoop and the whole process of presenting a new frame to the user, explained what a hitch is and its two types, commit hitches and render hitches, and defined the hitch time ratio to measure the degree of hitching in an App. Hopefully you now have a better understanding of the rendering cycle and hitch types and can avoid these problems in everyday coding.

This article mainly covered the underlying concepts; how do we measure the hitching of a real workload? The next article will use Instruments' Animation Hitches capability to analyze, in practice, some performance problems of DXSDK as a card layer in day-to-day feed scenarios, along with the performance optimizations made to DXSDK in the first half of the year.

References

WWDC 2020, 2021