iOS Underlying Principles + Reverse Engineering: Article Summary

This article mainly introduces the principles behind interface lag and how to optimize for it.

Interface lag

In general, the computer display process works as follows: the CPU, GPU, and display cooperate to bring a picture to the screen.

  • 1. The CPU calculates the displayed content and submits it to the GPU

  • 2. After rendering, the GPU puts the result into the FrameBuffer (frame buffer)

  • 3. The video controller then reads the FrameBuffer line by line according to the VSync signal

  • 4. The result passes through a possible digital-to-analog conversion and is delivered to the display

In the beginning there was only one FrameBuffer, and reading from and refreshing this single FrameBuffer caused serious efficiency problems. To solve this, double buffering was introduced: the GPU pre-renders one frame and puts it into a FrameBuffer for the video controller to read; once the next frame is rendered, the GPU points the video controller directly at the second FrameBuffer.

Double buffering solves the efficiency problem but introduces a new one. If the GPU submits a new frame and swaps the two FrameBuffers while the video controller is still reading, for example when only half the screen content has been displayed, the lower half of the screen shows the new frame, causing screen tearing.

To solve this problem, the VSync signal mechanism is adopted. When VSync is enabled, the GPU waits for a VSync signal from the display before rendering a new frame and updating the FrameBuffer. Double buffering + VSync is what iOS devices currently use.

For more on the rendering process behind screen lag, see Section 2: Screen Lag and Parsing the Rendering Process in iOS

Causes of screen lag

Let's talk about why the screen lags.

After a VSync signal arrives, the system graphics service notifies the App through mechanisms such as CADisplayLink, and the App's main thread starts computing the display content on the CPU. The CPU then submits the computed content to the GPU, which transforms, composites, and renders it. The GPU submits the render result to the frame buffer and waits for the next VSync signal to display it on screen. Because of the VSync mechanism, if the CPU or GPU does not finish its submission within one VSync interval, that frame is discarded until the next display opportunity, while the screen keeps showing the previous content. Put simply, the frame has expired.

The figure below shows this display process. The first frame is processed and displayed normally before VSync arrives; the second frame is still being processed when VSync arrives, so the frame is dropped and a visible stutter appears.

As the figure shows, whether it is the CPU or the GPU that blocks the display process, frames will be dropped. Therefore, to give users a better experience, we need to detect lag during development and optimize accordingly.

Lag monitoring

There are two kinds of lag-monitoring schemes:

  • FPS monitoring: to keep UI interaction smooth, the App's frame rate should stay at around 60fps. The reason is that the default refresh rate of iOS devices is 60 times per second, so the interval between two refreshes (i.e. between VSync signals) is 1000ms / 60 ≈ 16.67ms. If the next frame is not ready within 16.67ms, a stutter occurs

  • Main-thread lag monitoring: a child thread monitors the main thread's RunLoop and checks whether the time spent between two states (kCFRunLoopBeforeSources and kCFRunLoopAfterWaiting) exceeds a certain threshold

FPS monitoring

For FPS monitoring, refer to YYFPSLabel in YYKit, which is implemented mainly with CADisplayLink. Using the link's time difference, the time taken per refresh is computed; the refresh rate is then obtained as refresh count / time difference, and its range is checked, with different text colors indicating the severity of the lag. A code implementation is as follows:

class CJLFPSLabel: UILabel {
    fileprivate var link: CADisplayLink!
    fileprivate var count: Int = 0
    fileprivate var lastTime: TimeInterval = 0.0
    fileprivate var fpsColor: UIColor = .green
    fileprivate var fps: Double = 0.0

    override init(frame: CGRect) {
        var f = frame
        if f.size == .zero {
            f.size = CGSize(width: 80.0, height: 22.0)
        }
        super.init(frame: f)
        self.textColor = UIColor.white
        self.textAlignment = .center
        self.font = UIFont(name: "Menlo", size: 12)
        self.backgroundColor = UIColor.lightGray
        // Use a weak proxy so CADisplayLink does not retain the label and create a cycle
        link = CADisplayLink(target: CJLWeakProxy(target: self), selector: #selector(tick(_:)))
        link.add(to: RunLoop.current, forMode: RunLoop.Mode.common)
    }

    required init?(coder: NSCoder) {
        fatalError("init(coder:) has not been implemented")
    }

    deinit {
        link.invalidate()
    }

    @objc func tick(_ link: CADisplayLink) {
        guard lastTime != 0 else {
            lastTime = link.timestamp
            return
        }
        count += 1
        let delta = link.timestamp - lastTime
        guard delta >= 1.0 else { return }
        lastTime = link.timestamp
        // refresh count / time difference = refresh rate
        fps = Double(count) / delta
        count = 0
        let fpsText = "\(String(format: "%.2f", fps)) FPS"
        if fps > 55.0 {
            fpsColor = UIColor.green        // smooth
        } else if fps >= 50.0 {
            fpsColor = UIColor.yellow       // slight lag
        } else {
            fpsColor = UIColor.red          // obvious lag
        }
        let attrMStr = NSMutableAttributedString(string: fpsText)
        attrMStr.setAttributes([.foregroundColor: fpsColor], range: NSMakeRange(0, attrMStr.length - 3))
        attrMStr.setAttributes([.foregroundColor: UIColor.white], range: NSMakeRange(attrMStr.length - 3, 3))
        DispatchQueue.main.async {
            self.attributedText = attrMStr
        }
    }
}
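The CJLWeakProxy used above is not shown in the original snippet; its job is to hold the target weakly so CADisplayLink does not retain the label. A minimal sketch of such a proxy, assuming an NSObject-based message-forwarding implementation, could look like this:

class CJLWeakProxy: NSObject {
    weak var target: NSObjectProtocol?

    init(target: NSObjectProtocol) {
        self.target = target
        super.init()
    }

    // Forward any message the proxy receives to the weakly held target
    override func forwardingTarget(for aSelector: Selector!) -> Any? {
        return target
    }

    override func responds(to aSelector: Selector!) -> Bool {
        return (target?.responds(to: aSelector) ?? false) || super.responds(to: aSelector)
    }
}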

For simple monitoring, FPS is sufficient.

Main thread stalling monitoring

In addition to FPS, you can also monitor with the RunLoop, because the transactions that cause lag are handled by the RunLoop of the main thread.

Implementation idea: measure how long the main thread takes to execute each pass of its message loop; when the time exceeds a specified threshold, record it as a lag. This is also the principle behind WeChat's third-party library Matrix.

Here is a simple implementation of RunLoop monitoring

//
//  CJLBlockMonitor.swift
//  UIOptimizationDemo
//
//  Created by Chen Jialin on 2020/12/2.
//

import UIKit

class CJLBlockMonitor: NSObject {
    static let share = CJLBlockMonitor()

    fileprivate var semaphore: DispatchSemaphore!
    fileprivate var timeoutCount = 0
    fileprivate var activity: CFRunLoopActivity!

    private override init() {
        super.init()
    }

    public func start() {
        // Observe the main RunLoop's state changes
        registerObserver()
        // Start the watchdog on a background thread
        startMonitor()
    }
}

fileprivate extension CJLBlockMonitor {
    func registerObserver() {
        let controllerPointer = Unmanaged<CJLBlockMonitor>.passUnretained(self).toOpaque()
        var context = CFRunLoopObserverContext(version: 0,
                                               info: controllerPointer,
                                               retain: nil,
                                               release: nil,
                                               copyDescription: nil)
        let observer: CFRunLoopObserver = CFRunLoopObserverCreate(nil, CFRunLoopActivity.allActivities.rawValue, true, 0, { (observer, activity, info) in
            guard info != nil else { return }
            let monitor = Unmanaged<CJLBlockMonitor>.fromOpaque(info!).takeUnretainedValue()
            // Record the current state and wake up the monitoring thread
            monitor.activity = activity
            monitor.semaphore.signal()
        }, &context)
        CFRunLoopAddObserver(CFRunLoopGetMain(), observer, CFRunLoopMode.commonModes)
    }

    func startMonitor() {
        // Create the semaphore that the observer callback signals
        semaphore = DispatchSemaphore(value: 0)
        DispatchQueue.global().async {
            while true {
                // Timeout of 1 second: if wait() times out, the main RunLoop
                // has stayed in the same state for over a second
                let st = self.semaphore.wait(timeout: DispatchTime.now() + 1.0)
                if st != DispatchTimeoutResult.success {
                    // Only the kCFRunLoopBeforeSources and kCFRunLoopAfterWaiting
                    // states indicate that the main thread is processing work
                    if self.activity == CFRunLoopActivity.beforeSources
                        || self.activity == CFRunLoopActivity.afterWaiting {
                        self.timeoutCount += 1
                        // Require consecutive timeouts to avoid mass printing
                        if self.timeoutCount < 2 {
                            print("timeoutCount = \(self.timeoutCount)")
                            continue
                        }
                        print("The main thread is very likely stuck!")
                    }
                }
                self.timeoutCount = 0
            }
        }
    }
}

When using it, just call:

CJLBlockMonitor.share.start()

You can also use a third-party library directly:

  • The main idea of the third-party Swift lag-detection library ANREye is as follows: create a child thread that monitors in a loop. On each pass it sets a flag to true, dispatches a task to the main thread that sets the flag back to false, and then sleeps for the threshold duration; if the flag is still true when the child thread wakes up, the main thread is lagging (a sketch of this idea follows this list)

  • For OC, you can use WeChat's Matrix or Didi's DoraemonKit
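This is not ANREye's actual implementation, just a minimal sketch of the flag-based idea described above; the class name and default threshold are assumptions:

import Foundation

class CJLPingMonitor {
    private let threshold: TimeInterval
    private var flag = false    // true = main thread has not responded yet

    init(threshold: TimeInterval = 0.4) {
        self.threshold = threshold
    }

    func start() {
        DispatchQueue.global().async {
            while true {
                // Raise the flag, then ask the main thread to lower it
                self.flag = true
                DispatchQueue.main.async { self.flag = false }
                // Sleep past the threshold, then check whether it was lowered
                Thread.sleep(forTimeInterval: self.threshold)
                if self.flag {
                    print("Main thread blocked for more than \(self.threshold)s")
                }
            }
        }
    }
}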

Interface optimization

CPU level optimization

  • 1. Use lightweight objects instead of heavyweight ones to optimize performance. For example, for controls that do not need to respond to touch events, use CALayer instead of UIView

  • 2. Minimize changes to UIView and CALayer properties

    • CALayer has no member variables backing its properties internally. When a property method is called, the CALayer temporarily adds the accessor to the object at runtime via resolveInstanceMethod and stores the value in an internal Dictionary; it also notifies its delegate, creates animations, and so on, all of which costs time

    • UIView's display-related properties, such as frame, bounds, and transform, are actually mapped from its CALayer, so adjusting them consumes more resources than ordinary properties

  • 3. Releasing a large number of objects is also very time-consuming; try to move the release work to a background thread

  • 4. Try to calculate view layout in advance (pre-layout), for example cell row heights

  • 5. Autolayout is a great way to improve development efficiency for simple pages, but it can cause serious performance problems for complex views: as the number of views grows, Autolayout's CPU consumption rises exponentially. So use frame-based code layout as much as possible. If you do not want to adjust frames manually, you can use third-party libraries such as Masonry (OC), SnapKit (Swift), ComponentKit, and AsyncDisplayKit

  • 6. Text processing optimization: when an interface contains a large amount of text, calculating line heights and drawing the text are both time-consuming (see the first sketch after this list)

    • 1) If there are no special requirements for the text, you can use the methods UILabel uses internally, placed on a child thread to avoid blocking the main thread:
      • Calculating text width and height: [NSAttributedString boundingRectWithSize:options:context:]

      • Drawing text: [NSAttributedString drawWithRect:options:context:]

    • 2) For custom text controls, use TextKit or the lowest-level CoreText to draw text asynchronously. With CoreText, once the layout object is created, the text's width and height can be read directly, avoiding repeated calculation (otherwise it is calculated once for layout adjustment and once for drawing). CoreText uses CoreGraphics directly, occupies little memory, and is highly efficient
  • 7. Image processing (decoding + drawing)

    • 1) When an image is created with UIImage or CGImageSource methods, the image data is not decoded immediately; it is decoded only when set for display (i.e. the image is assigned to UIImageView or CALayer.contents, and the CGImage is decoded just before the CALayer is submitted to the GPU for rendering). This step is unavoidable and happens on the main thread. The common way around this mechanism is to draw the image into a CGBitmapContext on a child thread and then create the image directly from that bitmap, as the image codec in the third-party SDWebImage framework does. This is image pre-decoding (see the second sketch after this list)

    • 2) When drawing an image onto a canvas with CG-prefixed (CoreGraphics) methods and then creating the image from the canvas, the drawing can be done on a child thread

  • 8. Image optimization

    • 1) Try to use PNG images instead of JPEG images

    • 2) Pre-decode on a child thread, render on the main thread: create the image from a Bitmap on the child thread and then assign it

    • 3) Optimize the image size and try to avoid dynamic scaling

    • 4) Try to combine multiple images into one for display

  • 9. Avoid transparent views as much as possible, because a transparent view forces the GPU to compute the pixels of the layers beneath it, i.e. the color-blending process. See OpenGL rendering techniques: depth testing, polygon offset, and blending

  • 10. Load on demand. For example, while a TableView is scrolling, do not load images; show a default placeholder and load the images when scrolling stops

  • 11. Do not dynamically add subviews (addSubview) to a cell
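For point 6 above, here is a minimal sketch of measuring text on a child thread with boundingRect; the function name, font, and completion-callback shape are assumptions:

import UIKit

func calculateTextHeight(_ text: String, width: CGFloat, completion: @escaping (CGFloat) -> Void) {
    DispatchQueue.global(qos: .userInitiated).async {
        let attrStr = NSAttributedString(string: text,
                                         attributes: [.font: UIFont.systemFont(ofSize: 15)])
        // String-drawing and measuring APIs have been thread-safe since iOS 4,
        // so the expensive measurement can run off the main thread
        let rect = attrStr.boundingRect(with: CGSize(width: width, height: .greatestFiniteMagnitude),
                                        options: [.usesLineFragmentOrigin, .usesFontLeading],
                                        context: nil)
        DispatchQueue.main.async {
            completion(ceil(rect.height))
        }
    }
}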
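And for points 7 and 8, a sketch of forcing decode by drawing into a bitmap context on a background queue; this follows the idea behind SDWebImage's decoder but is not its actual code:

import UIKit

func decodedImage(_ image: UIImage) -> UIImage? {
    guard let cgImage = image.cgImage else { return nil }
    let bitmapInfo = CGImageAlphaInfo.premultipliedFirst.rawValue
        | CGBitmapInfo.byteOrder32Little.rawValue
    guard let context = CGContext(data: nil,
                                  width: cgImage.width,
                                  height: cgImage.height,
                                  bitsPerComponent: 8,
                                  bytesPerRow: 0,
                                  space: CGColorSpaceCreateDeviceRGB(),
                                  bitmapInfo: bitmapInfo) else { return nil }
    // Drawing into the bitmap context forces decompression here, off the main thread
    context.draw(cgImage, in: CGRect(x: 0, y: 0, width: cgImage.width, height: cgImage.height))
    guard let decoded = context.makeImage() else { return nil }
    return UIImage(cgImage: decoded, scale: image.scale, orientation: image.imageOrientation)
}

// Usage sketch: decode on a child thread, assign on the main thread
// DispatchQueue.global().async {
//     let img = decodedImage(someImage)
//     DispatchQueue.main.async { imageView.image = img }
// }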

GPU optimization

Compared with the CPU, the GPU mainly receives the textures + vertices submitted by the CPU, applies a series of transformations, finally blends and renders, and outputs to the screen.

  • 1. Minimize displaying a large number of images in a short time, and combine multiple images into one where possible, mainly because when many images are displayed, both CPU calculation and GPU rendering are time-consuming and frames are likely to be dropped

  • 2. Try to avoid image sizes exceeding 4096×4096, because beyond this size an image is first preprocessed by the CPU and then submitted to the GPU, consuming extra CPU resources

  • 3. Reduce the number and nesting level of views as much as possible, mainly because when there are too many overlapping views, the GPU has to blend them, which is also time-consuming

  • 4. Try to avoid off-screen rendering; see Section 4: In-depth Analysis of the Off-Screen Rendering Principle (a sketch of common fixes follows this list)

  • 5. Asynchronous rendering: for example, all the controls and views in a cell can be composited into a single image for display. Consider the third-party framework Graver
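For point 4, two common off-screen-rendering fixes sketched in code (the view name and sizes are assumptions): provide a shadowPath so Core Animation does not need an extra pass to work out the shadow's shape, and pre-draw rounded corners into the image instead of using cornerRadius + masksToBounds:

import UIKit

// 1) Shadow: setting shadowPath avoids the off-screen pass used to infer the shape
let cardView = UIView(frame: CGRect(x: 0, y: 0, width: 200, height: 100))
cardView.layer.shadowColor = UIColor.black.cgColor
cardView.layer.shadowOpacity = 0.3
cardView.layer.shadowPath = UIBezierPath(rect: cardView.bounds).cgPath

// 2) Rounded corners: clip the image itself
// (UIGraphicsImageRenderer is thread-safe, so this can run on a child thread)
func roundedImage(_ image: UIImage, size: CGSize, radius: CGFloat) -> UIImage {
    let renderer = UIGraphicsImageRenderer(size: size)
    return renderer.image { _ in
        let rect = CGRect(origin: .zero, size: size)
        UIBezierPath(roundedRect: rect, cornerRadius: radius).addClip()
        image.draw(in: rect)
    }
}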

Note: the optimization methods above need to be evaluated against your own project; apply them reasonably and where appropriate.