In app development, jank is the number-one optimization target. To help developers locate problems, Google provides many tools — Systrace, the Profile GPU Rendering tool, the Android Studio CPU Profiler, and so on — whose main job is to identify which code and which piece of logic is slow enough to affect UI rendering and cause stuttering. Take the Profile GPU Rendering tool: it presents the stages that may be running over budget in a very intuitive way. That tool and its underlying principles are the focus of this article:

The CPU Profiler provides similar charts. This article focuses mainly on the Profile GPU Rendering tool: it briefly analyzes how the time spent in each stage is measured and summarizes some problems encountered while using and interpreting the tool. Some of these may be bugs in the tools themselves, which can cause a lot of confusion during analysis. For example:

  • The colors shown by the Profile GPU Rendering tool do not seem to match the official Google documentation (which color represents which stage).
  • The CPU Profiler appears to merge some calls instead of showing separate call stacks (which affects how you attribute blocks of time).
  • Skipped-frame statistics may differ from what we expect and may not be correct (they are based mainly on the VSYNC delay: a time-consuming operation causes a delay, yet the dropped frames may not be counted).

Introduction to the Profile GPU Rendering tool

The green line is the 16 ms threshold; bars that cross it can mean dropped frames. This is related to the VSYNC signal. Of course, the graph is not exact (for reasons explained later). Each color within a bar represents a different processing stage. First, take a look at the mapping table given in the official documentation:

To fully understand the phases, you need some understanding of hardware acceleration and GPU rendering, but keep one thing in mind: despite the name Profile GPU Rendering, all phases in the diagram take place on the CPU, not the GPU. At the end, the CPU submits commands to the GPU to trigger asynchronous rendering; the CPU then moves on to the next frame while the GPU renders in parallel, so in hardware terms the two run concurrently. However, the GPU can sometimes be too busy to keep up with the CPU, and the CPU must block and wait, mainly on the final swapBuffers — the red and yellow segments. While waiting, you will see spikes in the orange and red bars, and command submission stays blocked until the GPU's command queue frees up space.
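The back-pressure just described — a producer (the CPU) blocking on a bounded command queue drained by a slower consumer (the GPU) — can be sketched with a plain Java producer/consumer. This is only an illustration of the principle, not framework code; the method name `submitFrames`, the queue capacity, and the timings are all invented:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class CommandQueueDemo {
    // Submit `frames` commands to a queue of `capacity` drained by a consumer
    // that needs `gpuMillisPerFrame` per command; returns how long the
    // producer (the "CPU") spent stalled in put().
    static long submitFrames(int frames, int capacity, long gpuMillisPerFrame)
            throws InterruptedException {
        BlockingQueue<String> gpuQueue = new ArrayBlockingQueue<>(capacity);
        Thread gpu = new Thread(() -> {
            try {
                while (true) {
                    gpuQueue.take();                 // consume a command...
                    Thread.sleep(gpuMillisPerFrame); // ...and "render" slowly
                }
            } catch (InterruptedException ignored) { }
        });
        gpu.setDaemon(true);
        gpu.start();

        long start = System.currentTimeMillis();
        for (int i = 0; i < frames; i++) {
            gpuQueue.put("frame-" + i); // blocks once the queue backs up
        }
        long elapsed = System.currentTimeMillis() - start;
        gpu.interrupt();
        return elapsed;
    }

    public static void main(String[] args) throws InterruptedException {
        // With a slow consumer, the producer visibly stalls on put() —
        // the analogue of the spikes in the swapBuffers-stage bars
        System.out.println("CPU stalled for ~" + submitFrames(6, 2, 20) + " ms");
    }
}
```

The stall time here plays the role of the orange/red spikes: the submitting thread is healthy, it is simply waiting for queue space.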

The first problem I ran into with Profile GPU Rendering is that the official documentation's description of it doesn't seem to be right.

Profile GPU Rendering tool color problem

When you actually use the tool, the colors of the bars don't seem to match the documentation. To test this, I'll simulate the scenario with a small piece of code, identify the stages, and then analyze the source code. Working from the bottom up, ignore the VSYNC part for now and look at input events first: in a custom layout, add a delay to touch-event handling and trigger a redraw.

    @Override
    public boolean dispatchTouchEvent(MotionEvent ev) {
        try {
            Thread.sleep(20); // simulate a slow input handler
        } catch (InterruptedException e) {
            // ignore
        }
        mTextView.setText("" + System.currentTimeMillis());
        requestLayout();
        super.dispatchTouchEvent(ev);
        return true;
    }

Now the over-budget portion is caused mainly by the input event, which lets us determine the input event's color:

With the input event delayed by 20 ms, the red segment in the figure above maps exactly to the input-event time. Clearly, the color used for input events does not match the official documentation, as shown below.

Likewise, the measure/layout time does not map to the documented color. Add a delay to the layout measurement to verify:

@Override
protected void onMeasure(int widthMeasureSpec, int heightMeasureSpec) {
    try {
        Thread.sleep(20); // simulate a slow measure pass
    } catch (InterruptedException e) {
        // ignore
    }
    super.onMeasure(widthMeasureSpec, heightMeasureSpec);
}

As you can see, the measure/layout time in the figure above does not match the official documentation's color either. In addition, there seems to be a third segment of time, which is actually VSYNC synchronization time. Where does this time come from, and does it really exist? The official explanation seems to be "time between consecutive frames", but the source-code analysis later will show that this explanation may not match the implementation.

Miscellaneous

In addition to the time it takes the rendering system to perform its work, there’s an additional set of work that occurs on the main thread and has nothing to do with rendering. Time that this work consumes is reported as misc time. Misc time generally represents work that might be occurring on the UI thread between two consecutive frames of rendering.

Second, why does almost every bar contain both a measure/layout time and an input-event time — and why exactly one of each, rather than several? Is measure/layout performed immediately after the touch event, or does it wait until the next VSYNC signal arrives? The main topics involved are the VSYNC signal, ViewRootImpl, Choreographer, and touch-event handling, which will be explained step by step. Let's look at how the time for these three stages is computed.

Miscellaneous – VSYNC latency

In the Choreographer class, the moment in question is when the VSYNC Message is executed — note, executed, not received, because arrival does not mean immediate execution. The VSYNC signal is requested asynchronously: after requesting it, the thread continues executing the current message, and when the next VSYNC fires, SurfaceFlinger inserts a VSYNC message directly into the MessageQueue of the app's UI thread. That message is not executed immediately; it waits for the preceding messages to finish, and the VSYNC delay is the gap between the signal's timestamp and the time the VSYNC message is actually executed.

void doFrame(long frameTimeNanos, int frame) {
    final long startNanos;
    synchronized (mLock) {
        if (!mFrameScheduled) {
            ...
        }
        long intendedFrameTimeNanos = frameTimeNanos;
        // Record the intended vsync time and the actual start time
        mFrameInfo.setVsync(intendedFrameTimeNanos, frameTimeNanos);
        mFrameScheduled = false;
        mLastFrameTimeNanos = frameTimeNanos;
    }
    try {
        // Handle input events, recording the start time
        mFrameInfo.markInputHandlingStart();
        doCallbacks(Choreographer.CALLBACK_INPUT, frameTimeNanos);
        // Handle animations, recording the start time
        mFrameInfo.markAnimationsStart();
        doCallbacks(Choreographer.CALLBACK_ANIMATION, frameTimeNanos);
        // Handle measure/layout traversals, recording the start time
        mFrameInfo.markPerformTraversalsStart();
        doCallbacks(Choreographer.CALLBACK_TRAVERSAL, frameTimeNanos);
    } finally {
        ...
    }
}

The VSYNC delay here is actually the timestamp recorded by mFrameInfo.markInputHandlingStart() minus frameTimeNanos, where frameTimeNanos is the time the VSYNC signal arrived, as follows:

private final class FrameDisplayEventReceiver extends DisplayEventReceiver
        implements Runnable {
    private boolean mHavePendingVsync;
    private long mTimestampNanos;
    private int mFrame;

    public FrameDisplayEventReceiver(Looper looper) {
        super(looper);
    }

    @Override
    public void onVsync(long timestampNanos, int builtInDisplayId, int frame) {
        ...
        // Save the timestamp and post an asynchronous message to the UI MessageQueue
        mTimestampNanos = timestampNanos;
        mFrame = frame;
        Message msg = Message.obtain(mHandler, this);
        msg.setAsynchronous(true);
        mHandler.sendMessageAtTime(msg, timestampNanos / TimeUtils.NANOS_PER_MS);
    }

    @Override
    public void run() {
        // Pass the saved timestamp on to doFrame
        mHavePendingVsync = false;
        doFrame(mTimestampNanos, mFrame);
    }
}

onVsync is the method the native layer calls back into the Java layer when the VSYNC signal arrives. MessageQueue actually wraps a set of native message queues, and the next VSYNC message does not take effect until the previous one has been executed; otherwise it waits in the queue. So the third segment is the VSYNC delay — but arguably this should not be counted as rendering time, and depending on how the code is written, the VSYNC delay can vary greatly. Next, look at the part of doFrame that counts dropped frames; personally, I don't think this part is particularly reliable.

Relationship between skipped frames and VSYNC delay

Some APM tools implement dropped-frame detection by lowering Choreographer's SKIPPED_FRAME_WARNING_LIMIT via reflection so that every skipped frame is logged, namely:

    try {
        Field field = Choreographer.class.getDeclaredField("SKIPPED_FRAME_WARNING_LIMIT");
        field.setAccessible(true);
        field.set(Choreographer.class, 0); // log every skipped frame
    } catch (Throwable e) {
        // reflection may fail on some OS versions; ignore
    }

If a delay occurs, a log like the following is printed.

This doesn't feel very rigorous; look at the source:

void doFrame(long frameTimeNanos, int frame) {
    final long startNanos;
    synchronized (mLock) {
        if (!mFrameScheduled) {
            return; // no work to do
        }
        long intendedFrameTimeNanos = frameTimeNanos;
        // Skip-frame detection
        startNanos = System.nanoTime();
        final long jitterNanos = startNanos - frameTimeNanos;
        if (jitterNanos >= mFrameIntervalNanos) {
            final long skippedFrames = jitterNanos / mFrameIntervalNanos;
            if (skippedFrames >= SKIPPED_FRAME_WARNING_LIMIT) {
                Log.i(TAG, "Skipped " + skippedFrames + " frames!  "
                        + "The application may be doing too much work on its main thread.");
            }
        }
        ...
    }

As you can see, the skipped-frame detection algorithm is simply VSYNC delay / 16 ms, and the quotient is the number of skipped frames. When the VSYNC signal arrives, the redraw does not necessarily begin immediately, because the UI thread may be blocked somewhere. For example, if a touch handler triggers a redraw and then continues with a time-consuming operation, the VSYNC message is inevitably delayed and the skipped-frame log is printed, as below:

    @Override
    public boolean dispatchTouchEvent(MotionEvent ev) {
        super.dispatchTouchEvent(ev);
        scrollTo(0, new Random().nextInt(15));
        try {
            Thread.sleep(40); // block *after* the redraw has been scheduled
        } catch (InterruptedException e) {
            // ignore
        }
        return true;
    }

As you can see, segment 2 is the VSYNC signal delay, and this frame produces a dropped-frame log.
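The detection arithmetic seen above boils down to integer division of the jitter by the frame interval. A minimal standalone sketch of that arithmetic (the class name and the hard-coded 60 Hz interval are assumptions for illustration; this is not the framework class):

```java
public class SkippedFrames {
    static final long FRAME_INTERVAL_NANOS = 16_666_666L; // ~60 Hz

    // Mirrors Choreographer's check: jitter between the VSYNC timestamp and
    // the moment doFrame actually runs, divided by the frame interval.
    static long skippedFrames(long frameTimeNanos, long startNanos) {
        long jitterNanos = startNanos - frameTimeNanos;
        if (jitterNanos < FRAME_INTERVAL_NANOS) {
            return 0; // doFrame ran within one interval: nothing counted
        }
        return jitterNanos / FRAME_INTERVAL_NANOS;
    }

    public static void main(String[] args) {
        // A 40 ms stall *before* doFrame runs counts as 2 skipped frames
        System.out.println(skippedFrames(0L, 40_000_000L)); // prints 2
        // A stall that happens *after* doFrame has started is invisible here
        System.out.println(skippedFrames(0L, 1_000_000L));  // prints 0
    }
}
```

The second case is exactly the blind spot discussed next: only delay accumulated before doFrame starts is counted.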

But what if you move the call that triggers the UI redraw to after the delayed operation? The jank is of course still there, but an interesting thing happens: the system considers that no frames were dropped. The code is as follows:

    @Override
    public boolean dispatchTouchEvent(MotionEvent ev) {
        super.dispatchTouchEvent(ev);
        try {
            Thread.sleep(40); // block *before* scheduling the redraw
        } catch (InterruptedException e) {
            // ignore
        }
        scrollTo(0, new Random().nextInt(15));
        return true;
    }

As you can see, there is almost no VSYNC delay in the figure. Why? Because the request for the next VSYNC signal is triggered by scrollTo, and there is no further delay after it is issued: when the VSYNC signal arrives, doFrame executes immediately. The gap is tiny, so the system thinks no frame was dropped — but the jank is still there. Since the frame rate is the same over this period, the overall picture looks like this:

The figures above show the difference between calling scrollTo before and after the delay. Both cases drop frames, but the log-based skipped-frame statistics catch only one of them, and the real cost of each frame is not what the bars suggest. Personally, I suspect this is a bug in the tool: it does not reflect jank very accurately, and FPS-based detection suffers from the same problem. For example, during scrolling, doing the time-consuming work first and updating the UI afterwards produces skipped frames that cannot be detected this way — though better solutions may exist. Another question: why does each frame's bar show the input time of only one touch event? Are all touch events counted, and how do touch events affect the GPU profiling tool?

Input event time analysis

The input-handling mechanism: InputManagerService captures user input and passes events to the app over a socket (inserting messages into the UI thread's message queue). Different touch events are handled differently. DOWN and UP events are processed by the app immediately, while MOVE events are batched together with redraw work — in effect they wait for the next VSYNC and are processed in one go. It is reasonable to assume that only MOVE events are counted in the profiler's input bar; UP and DOWN events execute immediately without waiting for VSYNC. Now look at the basis for the per-stage statistics: the bars of the GPU rendering tool are drawn in the native layer, and the per-stage mapping is as follows:

FrameInfoVisualizer.cpp

The VSYNC delay analyzed above is actually FrameInfoIndex::HandleInputStart - FrameInfoIndex::IntendedVsync, with color 0x00796b, and the input-event time is FrameInfoIndex::PerformTraversalsStart - FrameInfoIndex::HandleInputStart — but there are only seven segments here, not the eight the documentation describes. This can be verified in doFrame:

void doFrame(long frameTimeNanos, int frame) {
    final long startNanos;
    synchronized (mLock) {
        if (!mFrameScheduled) {
            ...
        }
        // Record the intended vsync time and the actual start time
        mFrameInfo.setVsync(intendedFrameTimeNanos, frameTimeNanos);
        mFrameScheduled = false;
        mLastFrameTimeNanos = frameTimeNanos;
    }
    try {
        // Handle input events, recording the start time
        mFrameInfo.markInputHandlingStart();
        doCallbacks(Choreographer.CALLBACK_INPUT, frameTimeNanos);
        // Handle animations, recording the start time
        mFrameInfo.markAnimationsStart();
        doCallbacks(Choreographer.CALLBACK_ANIMATION, frameTimeNanos);
        // Handle measure/layout traversals, recording the start time
        mFrameInfo.markPerformTraversalsStart();
        doCallbacks(Choreographer.CALLBACK_TRAVERSAL, frameTimeNanos);
    } finally {
        ...
    }
}

The code above is very simple, but when you look at the function call stack in CPU Profiler, you find problems. After adding a delay to touch-event processing, CPU Profiler shows the following call stack:

What's going on with this stack? One VSYNC signal triggers one doFrame call, and one doFrame call executes each type of callback in turn. It turns out this really is a CPU Profiler bug: the number of actual doFrame calls does not correspond to the count shown in CPU Profiler, which is obviously much higher.

This means CPU Profiler apparently merges similar function calls together, so it looks as if one VSYNC runs one doFrame but executes many callbacks. In fact, by default, each type of callback is executed at most once during a single VSYNC. **Under the VSYNC mechanism, before the next signal arrives, the Android system processes at most one MOVE batch, one redraw request, and one animation update.** So how does dispatchTouchEvent get executed?

    @Override
    public boolean dispatchTouchEvent(MotionEvent ev) {
        try {
            Thread.sleep(20); // simulate a slow input handler
        } catch (InterruptedException e) {
            // ignore
        }
        mTextView.setText("" + System.currentTimeMillis());
        requestLayout();
        super.dispatchTouchEvent(ev);
        return true;
    }

InputManagerService receives the touch event and sends it to the app through a socket. The app's UI Looper reads the event, and after native-layer preprocessing it is dispatched to the Java layer:

public abstract class InputEventReceiver {
    ...
    public final boolean consumeBatchedInputEvents(long frameTimeNanos) {
        if (mReceiverPtr == 0) {
            Log.w(TAG, "Attempted to consume batched input events but the input event "
                    + "receiver has already been disposed.");
        } else {
            return nativeConsumeBatchedInputEvents(mReceiverPtr, frameTimeNanos);
        }
        return false;
    }

    // Called from native code.
    @SuppressWarnings("unused")
    private void dispatchInputEvent(int seq, InputEvent event) {
        mSeqMap.put(event.getSequenceNumber(), seq);
        onInputEvent(event);
    }

    // Called from native code (NativeInputEventReceiver).
    @SuppressWarnings("unused")
    private void dispatchBatchedInputEventPending() {
        onBatchedInputEventPending();
    }
    ...
}

DOWN and UP events go through dispatchInputEvent, while MOVE events are wrapped into a Batch and go through dispatchBatchedInputEventPending. For DOWN and UP events, the subclass's enqueueInputEvent is called immediately:

final class WindowInputEventReceiver extends InputEventReceiver {
    public WindowInputEventReceiver(InputChannel inputChannel, Looper looper) {
        super(inputChannel, looper);
    }

    @Override
    public void onInputEvent(InputEvent event) {
        // Enqueue the event and process it immediately
        enqueueInputEvent(event, this, 0, true);
    }
}

void enqueueInputEvent(InputEvent event, InputEventReceiver receiver,
        int flags, boolean processImmediately) {
    adjustInputEventForCompatibility(event);
    QueuedInputEvent q = obtainQueuedInputEvent(event, receiver, flags);
    ...
    if (processImmediately) {
        doProcessInputEvents();
    } else {
        scheduleProcessInputEvents();
    }
}

For DOWN and UP events, doProcessInputEvents is called immediately, whereas dispatchBatchedInputEventPending calls WindowInputEventReceiver's onBatchedInputEventPending, deferring the work to the next VSYNC:

final class WindowInputEventReceiver extends InputEventReceiver {
    public WindowInputEventReceiver(InputChannel inputChannel, Looper looper) {
        super(inputChannel, looper);
    }
    ...
    @Override
    public void onBatchedInputEventPending() {
        if (mUnbufferedInputDispatch) {
            super.onBatchedInputEventPending();
        } else {
            scheduleConsumeBatchedInput();
        }
    }
}

mUnbufferedInputDispatch defaults to false — the source leaves it false to improve efficiency — so scheduleConsumeBatchedInput is normally executed:

void scheduleConsumeBatchedInput() {
    // mConsumeBatchedInputScheduled ensures that, until the current batch is
    // consumed, no second batch callback is inserted
    if (!mConsumeBatchedInputScheduled) {
        mConsumeBatchedInputScheduled = true;
        // Post the callback to Choreographer, which also requests a VSYNC signal
        mChoreographer.postCallback(Choreographer.CALLBACK_INPUT,
                mConsumedBatchedInputRunnable, null);
    }
}

The logic of scheduleConsumeBatchedInput ensures that at most one Batch is processed per VSYNC. Choreographer.CALLBACK_INPUT callbacks are what the input-event timing statistics measure, and only batched touch events (MOVE events) are involved here. **A MOVE scroll or slide event is usually accompanied by a UI update, and this continuous stream is what frame rate is about; without continuous updates, FPS (frame rate) is meaningless.** Now look at Choreographer.postCallback:

private void postCallbackDelayedInternal(int callbackType, Object action,
        Object token, long delayMillis) {
    synchronized (mLock) {
        final long now = SystemClock.uptimeMillis();
        final long dueTime = now + delayMillis;
        // Add the callback to the queue for its type
        mCallbackQueues[callbackType].addCallbackLocked(dueTime, action, token);
        // If due now, schedule a frame (request VSYNC)
        if (dueTime <= now) {
            scheduleFrameLocked(now);
        }
        ...
    }
}

Choreographer adds a callback for the touch event to its cache queue while asynchronously requesting VSYNC, then waits for the signal before processing the callback. When the VSYNC signal arrives, doFrame first executes doCallbacks(Choreographer.CALLBACK_INPUT, frameTimeNanos); this runs ConsumeBatchedInputRunnable's run method, which finally calls doConsumeBatchedInput to process the Batch events:

void doConsumeBatchedInput(long frameTimeNanos) {
    // Mark the batch as consumed so that new events can be scheduled again
    if (mConsumeBatchedInputScheduled) {
        mConsumeBatchedInputScheduled = false;
        if (mInputEventReceiver != null) {
            if (mInputEventReceiver.consumeBatchedInputEvents(frameTimeNanos)
                    && frameTimeNanos != -1) {
                ...
            }
        }
        // Dispatch the events
        doProcessInputEvents();
    }
}
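Stepping back, the overall dispatch policy traced so far — DOWN/UP delivered immediately, MOVE events coalesced into a single batch that is flushed only when the VSYNC arrives — can be modeled with a toy class. All names here are hypothetical; this is a sketch of the policy, not the framework's code:

```java
import java.util.ArrayList;
import java.util.List;

public class InputBatcher {
    final List<String> delivered = new ArrayList<>();
    final List<String> pendingBatch = new ArrayList<>();

    void onInputEvent(String event) {
        if (event.startsWith("MOVE")) {
            pendingBatch.add(event);  // batched: wait for the next "VSYNC"
        } else {
            delivered.add(event);     // DOWN/UP: process immediately
        }
    }

    void onVsync() {
        if (!pendingBatch.isEmpty()) {
            // The whole batch is consumed in one doFrame — which is why the
            // profiler bar shows at most one input segment per frame
            delivered.add("BATCH[" + pendingBatch.size() + " MOVEs]");
            pendingBatch.clear();
        }
    }

    public static void main(String[] args) {
        InputBatcher b = new InputBatcher();
        b.onInputEvent("DOWN");
        b.onInputEvent("MOVE-1");
        b.onInputEvent("MOVE-2");
        b.onInputEvent("MOVE-3");
        b.onVsync();
        b.onInputEvent("UP");
        System.out.println(b.delivered); // prints [DOWN, BATCH[3 MOVEs], UP]
    }
}
```

Three MOVEs arriving between two VSYNCs collapse into one batch, matching the "one input segment per bar" behavior observed earlier.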

doProcessInputEvents calls back into the corresponding dispatchTouchEvent to handle the touch event. There is an important point here: if a UI redraw is triggered while handling Batch events (which is quite common — MOVE events usually accompany list scrolling), the redraw callback is immediately added to the Choreographer.CALLBACK_TRAVERSAL queue and executed right after the current Choreographer.CALLBACK_INPUT callback finishes. This is why in CPU Profiler you always see a touch event followed immediately by a UI redraw. In the example above, requestLayout() eventually calls into ViewRootImpl:

@Override
public void requestLayout() {
    if (!mHandlingLayoutInLayoutRequest) {
        checkThread();
        mLayoutRequested = true;
        scheduleTraversals();
    }
}

which invokes scheduleTraversals; you can see that mTraversalScheduled likewise guarantees at most one redraw per VSYNC:

void scheduleTraversals() {
    // mTraversalScheduled guarantees at most one traversal per VSYNC
    if (!mTraversalScheduled) {
        mTraversalScheduled = true;
        // Post a sync barrier so the traversal message runs first
        mTraversalBarrier = mHandler.getLooper().getQueue().postSyncBarrier();
        mChoreographer.postCallback(
                Choreographer.CALLBACK_TRAVERSAL, mTraversalRunnable, null);
        // When updating the UI (usually along with a MOVE event), request the
        // next VSYNC in advance instead of waiting, to improve throughput.
        // mUnbufferedInputDispatch is false by default, so
        // scheduleConsumeBatchedInput() is normally executed.
        if (!mUnbufferedInputDispatch) {
            scheduleConsumeBatchedInput();
        }
        notifyRendererOfFramePending();
        pokeDrawLockIfNeeded();
    }
}

For the redraw event, mChoreographer.postCallback directly adds a callback while requesting a VSYNC signal. In general, the VSYNC request made by scheduleConsumeBatchedInput inside scheduleTraversals is a no-op, because of two consecutive VSYNC requests only one takes effect; scheduleConsumeBatchedInput merely reserves a slot in advance for subsequent touch events. When the touch event is first executed, the mCallbackQueues information looks like this:

As you can see, there is initially no callback of type Choreographer.CALLBACK_TRAVERSAL. When the touch event is processed and triggers a redraw, a Choreographer.CALLBACK_TRAVERSAL callback is added dynamically, as follows:

So, after the current MOVE event is processed, doCallbacks(Choreographer.CALLBACK_TRAVERSAL, frameTimeNanos) executes, and the redraw callback that was just added runs immediately instead of waiting for the next VSYNC. This is how MOVE and redraw correspond one to one, and why the redraw always executes right after a MOVE event. Choreographer uses a number of flags to ensure that during a single VSYNC, at most one MOVE event and one redraw are executed, in that order (ignoring animations for now). These two are the confusing parts of the GPU profiling bars; the remaining stages are actually quite clear.
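The flag pattern described above can be reduced to a few lines: a boolean guard makes any number of requestLayout() calls between two VSYNCs collapse into a single traversal. This is a toy model with invented names, not ViewRootImpl's code:

```java
public class TraversalScheduler {
    private boolean mTraversalScheduled;
    int traversalsRun;

    void requestLayout() {
        if (!mTraversalScheduled) {
            // Further requests are no-ops until the next doFrame
            mTraversalScheduled = true;
        }
    }

    void doFrame() { // called once per "VSYNC"
        if (mTraversalScheduled) {
            mTraversalScheduled = false; // clear the guard first...
            traversalsRun++;             // ...then run the single traversal
        }
    }

    public static void main(String[] args) {
        TraversalScheduler s = new TraversalScheduler();
        for (int i = 0; i < 5; i++) s.requestLayout(); // five requests...
        s.doFrame();
        System.out.println(s.traversalsRun); // prints 1: ...one traversal
    }
}
```

The same guard idiom appears in scheduleConsumeBatchedInput (mConsumeBatchedInputScheduled) for input batches.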

CALLBACK_ANIMATION callback time (seemingly included in the touch-event time)

Normally, a MOVE event accompanied by a scroll, such as a list scroll, may trigger what counts as an animation:

public void scrollTo(int x, int y) {
    if (mScrollX != x || mScrollY != y) {
        int oldX = mScrollX;
        int oldY = mScrollY;
        mScrollX = x;
        mScrollY = y;
        invalidateParentCaches();
        onScrollChanged(mScrollX, mScrollY, oldX, oldY);
        if (!awakenScrollBars()) {
            postInvalidateOnAnimation();
        }
    }
}

This eventually goes through postInvalidateOnAnimation, which creates a Choreographer.CALLBACK_ANIMATION callback:

public void postInvalidateOnAnimation() {
    final AttachInfo attachInfo = mAttachInfo;
    if (attachInfo != null) {
        attachInfo.mViewRootImpl.dispatchInvalidateOnAnimation(this);
    }
}

final class InvalidateOnAnimationRunnable implements Runnable {
    ...
    private void postIfNeededLocked() {
        if (!mPosted) {
            mChoreographer.postCallback(Choreographer.CALLBACK_ANIMATION, this, null);
            mPosted = true;
        }
    }
}

The callback just calls the View's invalidate, which doesn't take much time:

final class InvalidateOnAnimationRunnable implements Runnable {
    @Override
    public void run() {
        final int viewCount;
        final int viewRectCount;
        synchronized (this) {
            ...
        }
        for (int i = 0; i < viewCount; i++) {
            mTempViews[i].invalidate();
            mTempViews[i] = null;
        }
    }
}

Of course, things would be different with custom animations. But as far as the GPU rendering bars are concerned, there does not seem to be a separate animation segment as the official documentation claims — the source defines only seven segments, as shown below:

If you override invalidate and add a delay, you will find the time is attributed to the input-event segment:

@Override
public void invalidate() {
    super.invalidate();
    try {
        Thread.sleep(10); // simulate slow invalidation
    } catch (InterruptedException e) {
        // ignore
    }
}

That is to say, the official statement below may be wrong: this segment does not show up on a real phone, or rather its time is attributed to the touch-event segment — which is also what the source code suggests.

Measurement, layout, drawing time

When it comes to measure, layout, and draw, the whole process is clear. The time measured on the UI thread is intuitive and faithful: it costs exactly what it costs. There is no awkward VSYNC-style problem and little to analyze, but note that "Draw" here only builds the DisplayList tree. Up to this point, everything happens on the UI thread; the remaining three stages — Sync/Upload, Issue Commands, and Swap Buffers — are all done on the RenderThread.

Sync/Upload time

The Sync & Upload metric represents the time it takes to transfer bitmap objects from CPU memory to GPU memory during the current frame.

As different processors, the CPU and the GPU have different RAM areas dedicated to processing. When you draw a bitmap on Android, the system transfers the bitmap to GPU memory before the GPU can render it to the screen. Then, the GPU caches the bitmap so that the system doesn’t need to transfer the data again unless the texture gets evicted from the GPU texture cache.

At this point, the data used by the CPU and the GPU are independent of each other, so there is no synchronization problem while the two work in parallel. If this stage takes too long, it means too much bitmap data is being uploaded — mainly textures, as far as I can tell.
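The eviction behavior the documentation describes can be illustrated with a tiny LRU map standing in for the GPU texture cache. This is a sketch with an artificially small capacity and invented names; the point is only that uploads (Sync/Upload cost) happen on cache misses, including re-uploads after eviction:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TextureCacheDemo {
    static int uploads = 0;

    // Access-ordered LinkedHashMap as a stand-in for the GPU texture cache
    static final Map<String, byte[]> cache =
            new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, byte[]> e) {
                    return size() > 2; // tiny capacity to force eviction
                }
            };

    static void draw(String bitmapId) {
        if (!cache.containsKey(bitmapId)) {
            uploads++;                       // miss: pay the Sync/Upload cost
            cache.put(bitmapId, new byte[0]);
        } else {
            cache.get(bitmapId);             // hit: touch for LRU order, free
        }
    }

    public static void main(String[] args) {
        draw("a"); draw("b"); // two uploads
        draw("a");            // cache hit: no upload
        draw("c");            // evicts "b"
        draw("b");            // evicted, so it must be uploaded again
        System.out.println(uploads); // prints 4
    }
}
```

Five draw calls cost only four uploads; with a larger cache (fewer evictions) the Sync/Upload bar stays short.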

Issue Commands time

The Issue Commands segment represents the time it takes to issue all of the commands necessary for drawing display lists to the screen.

This stage is mainly the CPU sending drawing commands to the GPU, after which the GPU can render according to these OpenGL commands. It is implemented mostly by the CPU calling the OpenGL ES API.

SwapBuffers time

Once Android finishes submitting all its display list to the GPU, the system issues one final command to tell the graphics driver that it’s done with the current frame. At this point, the driver can finally present the updated image to the screen.

After the drawing commands have been issued, the CPU usually sends one final command to the GPU, telling it that the current frame's commands are complete and can be processed. Generally, the GPU sends back an acknowledgment. This does not mean the GPU has finished rendering — it merely notifies the CPU that the GPU is free enough to start — but the app does not need to care beyond that, and the CPU can continue with the next frame's work. If the GPU is too busy to reply, the CPU blocks and waits, and only after receiving the notification does the blocked RenderThread move on to its next message. All of this happens in swapBuffers. For further information, see Android Hardware Acceleration (2) - RenderThread and OpenGL GPU rendering.

GPU Profiler source code (on an emulator, using the software-rendered OpenGL library libagl)

The GPU Profiler bars are drawn by FrameInfoVisualizer's draw function:

void FrameInfoVisualizer::draw(OpenGLRenderer* canvas) {
    RETURN_IF_DISABLED();
    ...
    if (mType == ProfileType::Bars) {
        // Patch up the current frame to pretend we ended here. CanvasContext
        // will overwrite these values with the real ones after we return.
        // This is a bit nicer looking than the vague green bar, as we have
        // valid data for almost all the stages and a very good idea of what
        // the issue stage will look like, too
        FrameInfo& info = mFrameSource.back();
        info.markSwapBuffers();
        info.markFrameCompleted();
        // Compute the bar geometry
        initializeRects(canvas->getViewportHeight(), canvas->getViewportWidth());
        drawGraph(canvas);
        drawThreshold(canvas);
    }
}

This code calls markSwapBuffers and markFrameCompleted up front as placeholders; CanvasContext later overwrites them with the real times:

void FrameInfoVisualizer::drawGraph(OpenGLRenderer* canvas) {
    SkPaint paint;
    for (size_t i = 0; i < Bar.size(); i++) {
        nextBarSegment(Bar[i].start, Bar[i].end);
        paint.setColor(Bar[i].color | BAR_FAST_ALPHA);
        canvas->drawRects(mFastRects.get(), mNumFastRects * 4, &paint);
        paint.setColor(Bar[i].color | BAR_JANKY_ALPHA);
        canvas->drawRects(mJankyRects.get(), mNumJankyRects * 4, &paint);
    }
}

We have briefly analyzed the four Java-layer stages; now let's look at how the last three are measured:

// Sync starts here
void CanvasContext::prepareTree(TreeInfo& info, int64_t* uiFrameInfo,
        int64_t syncQueued) {
    mRenderThread.removeFrameCallback(this);
    // Import the frame timing info recorded on the UI thread
    mCurrentFrameInfo->importUiThreadInfo(uiFrameInfo);
    mCurrentFrameInfo->set(FrameInfoIndex::SyncQueued) = syncQueued;
    // Mark the start of the sync/upload stage
    mCurrentFrameInfo->markSyncStart();
    ...
    mRootRenderNode->prepareTree(info);
    ...
}

markSyncStart marks the start of the upload stage. After prepareTree copies the texture-related bitmaps into GPU-accessible memory, CanvasContext::draw goes on to issue the GPU commands:

void CanvasContext::draw() {
    ...
    // Mark the start of the issue-commands stage
    mCurrentFrameInfo->markIssueDrawCommandsStart();
    ...
    // Draw the profiler overlay itself
    profiler().draw(mCanvas);
    // Issue the real drawing commands
    mCanvas->drawRenderNode(mRootRenderNode.get(), outBounds);
    // Mark the start of swapBuffers
    mCurrentFrameInfo->markSwapBuffers();
    if (drew) {
        swapBuffers(dirty, width, height);
    }
    // TODO: Use a fence for real completion?
    mCurrentFrameInfo->markFrameCompleted();
    mJankTracker.addFrame(*mCurrentFrameInfo);
    mRenderThread.jankTracker().addFrame(*mCurrentFrameInfo);
}

markIssueDrawCommandsStart marks the start of the issue-commands stage, and mCanvas->drawRenderNode does the real issuing of commands into the buffer. After issuing, the GPU is notified to render and the layer is handed to SurfaceFlinger, which is achieved through swapBuffers. On a real device, the Fence mechanism is needed to synchronize the GPU and CPU; see Android Hardware Acceleration (2) - RenderThread and OpenGL GPU rendering. Since the last three stages offer little controllability, they will not be analyzed further; if you are interested, look into OpenGL and GPU internals.

Conclusion

  • The Profile GPU Rendering colors do not match the official documentation.
  • Animation time does not get its own color segment; it is merged into the touch-event time.
  • Android Studio's built-in CPU Profiler has a bug where it merges similar calls.
  • The skipped-frame statistics in the source may be inaccurate: they count VSYNC delay, not actual dropped frames.
  • Choreographer uses various flags to ensure that one VSYNC signal handles at most one touch (MOVE) batch, one redraw, and one animation update.
  • The bars in GPU rendering mode are for reference only and are not completely accurate.

Original author: Little Snail — "Android GPU rendering mode: principles and dropped-frame analysis"

For reference only; corrections are welcome.