Intuitively, hardware acceleration means relying on the GPU to speed up graphics drawing. The difference between hardware and software drawing is whether the graphics are processed by the GPU or the CPU: if the GPU does the work, it is hardware-accelerated drawing; otherwise it is software drawing. The same holds on Android, but hardware acceleration brings more optimizations than just the drawing itself; it also changes how the drawing work is organized before anything is drawn. Hardware acceleration can therefore be analyzed in two parts:

  • 1. Build phase: how the set of drawing operations is assembled beforehand
  • 2. Render phase: a separate render thread relies on the GPU for rendering

The allocation of drawing memory is similar for software drawing and hardware acceleration: both require the SurfaceFlinger service to allocate a block of memory, except that with hardware acceleration the memory may be allocated directly from the FrameBuffer hardware buffer. There is no essential difference in this step. The real difference lies in how the UI data is rendered on the APP side. This article aims for an intuitive understanding of the difference between the two; it touches parts of the source code, but not in great depth.

Bifurcation point of hardware/software acceleration

Since Android 4.x, hardware acceleration has been supported and enabled by default. Even on phones that support hardware acceleration, some APIs do not, such as certain Canvas clipPath operations. Whether a View is rendered with software or hardware acceleration is generally invisible during development, so where exactly does graphics drawing fork into the hardware and software paths? Suppose a View needs to be redrawn: you call the View's invalidate to trigger the redraw, and following this call chain reveals the branch point.
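For reference, hardware acceleration can be toggled at several levels through standard Android configuration (this is general Android knowledge, a reminder rather than part of the source walkthrough below; the activity name is hypothetical):

```xml
<!-- AndroidManifest.xml: application-wide switch (default true on 4.x+) -->
<application android:hardwareAccelerated="true">
    <!-- Per-activity override, e.g. for an activity hitting unsupported Canvas APIs -->
    <activity android:name=".LegacyCanvasActivity"
              android:hardwareAccelerated="false" />
</application>
```

At the View level, view.setLayerType(View.LAYER_TYPE_SOFTWARE, null) forces software drawing for a single View whose onDraw relies on an unsupported operation.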

View to redraw

As can be seen from the call flow above, a View redraw eventually enters ViewRootImpl's draw, where one judgment is the branch point between software and hardware acceleration. Simplified:

ViewRootImpl.java

```java
private void draw(boolean fullRedrawNeeded) {
    ...
    if (!dirty.isEmpty() || mIsAnimating || accessibilityFocusDirty) {
        // Key point 1: is hardware acceleration enabled?
        if (mAttachInfo.mHardwareRenderer != null && mAttachInfo.mHardwareRenderer.isEnabled()) {
            ...
            dirty.setEmpty();
            mBlockResizeBuffer = false;
            // Key point 2: hardware-accelerated drawing
            mAttachInfo.mHardwareRenderer.draw(mView, mAttachInfo, this);
        } else {
            ...
            // Key point 3: software drawing
            if (!drawSoftware(surface, mAttachInfo, xOffset, yOffset, scalingRequired, dirty)) {
                return;
            }
            ...
```

Key point 1 is the condition for taking the hardware-accelerated path: hardware acceleration must be supported and enabled. If so, HardwareRenderer.draw is used; otherwise drawSoftware (software drawing) is used. A quick look at this condition: by default it is true, because phones since 4.x support hardware acceleration, and when the window is added, ViewRootImpl generally starts hardware acceleration via enableHardwareAcceleration, creating a new HardwareRenderer and initializing the hardware-acceleration environment.

```java
private void enableHardwareAcceleration(WindowManager.LayoutParams attrs) {
    // Key point 1: read the hardware-acceleration switch from the window attributes
    // Try to enable hardware acceleration if requested
    final boolean hardwareAccelerated =
            (attrs.flags & WindowManager.LayoutParams.FLAG_HARDWARE_ACCELERATED) != 0;
    if (hardwareAccelerated) {
        ...
        // Key point 2: create the hardware renderer
        mAttachInfo.mHardwareRenderer = HardwareRenderer.create(mContext, translucent);
        if (mAttachInfo.mHardwareRenderer != null) {
            mAttachInfo.mHardwareRenderer.setName(attrs.getTitle().toString());
            mAttachInfo.mHardwareAccelerated =
                    mAttachInfo.mHardwareAccelerationRequested = true;
        }
        ...
```

In fact, the branch point between software drawing and hardware acceleration has now been found: in ViewRootImpl's draw, if hardware acceleration is needed, HardwareRenderer does the drawing; otherwise the software path is taken. drawSoftware is actually quite simple: through Surface's lockCanvas it asks SurfaceFlinger to allocate a block of anonymous shared memory, obtains an ordinary Skia Canvas, and calls into the Skia library to draw.

```java
private boolean drawSoftware(Surface surface, AttachInfo attachInfo, int xoff, int yoff,
        boolean scalingRequired, Rect dirty) {
    final Canvas canvas;
    try {
        // Key point 1: lock the canvas (allocates the drawing memory)
        canvas = mSurface.lockCanvas(dirty);
        ...
        // Key point 2: draw the view hierarchy
        mView.draw(canvas);
        ...
        // Key point 3: notify SurfaceFlinger to composite the layer
        surface.unlockCanvasAndPost(canvas);
    }
    ...
    return true;
}
```

The drawSoftware work is done entirely by the CPU and does not involve the GPU. Let's focus on hardware-accelerated drawing by HardwareRenderer.

HardwareRenderer's hardware-accelerated drawing model

As mentioned above, hardware-accelerated drawing consists of two stages: a build phase plus a draw phase. The build phase recursively traverses all Views, caching the operations they need, and then hands them over to a separate Render thread that renders with OpenGL. In the Android hardware-acceleration framework, a View is abstracted into a RenderNode, and every drawing operation in a View is abstracted into a DrawOp (DisplayListOp). For example, a drawLine in a View is abstracted into a DrawLineOp during the build, a drawBitmap into a DrawBitmapOp, and the drawing of each child View into a DrawRenderNodeOp. Each DrawOp has corresponding OpenGL drawing commands and also holds the data needed for drawing, as follows:

Drawing Op abstraction

So each View not only holds its own DrawOp list but also the drawing entries of its child Views; recursing through them collects all the drawing Ops. Most analyses call this the Display List, and that is indeed how the classes are named in the source code, but it is really more like a tree than a simple list, looking something like this:
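As a rough mental model, the tree can be sketched in plain Java (names like Node and countOps are illustrative, not the framework's; real RenderNodes live in native memory):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model: a node holds its own draw ops plus child nodes,
// mirroring a View's RenderNode referencing child RenderNodes via DrawRenderNodeOps.
class Node {
    final List<String> ops = new ArrayList<>();    // e.g. "DrawLineOp", "DrawBitmapOp"
    final List<Node> children = new ArrayList<>(); // one entry per child View

    // Recursively count every DrawOp reachable from this node,
    // the way the build phase walks the whole View tree.
    static int countOps(Node n) {
        int total = n.ops.size();
        for (Node child : n.children) {
            total += countOps(child);
        }
        return total;
    }
}
```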

Hardware acceleration.jpg

Once the build is complete, the DrawOp tree can be handed to the Render thread for drawing. This is where hardware acceleration differs greatly from software drawing: a View is normally drawn on the main thread, while with hardware acceleration, barring special requirements, the drawing is done on a separate thread. This takes a lot of pressure off the main thread and improves the responsiveness of the UI thread.

Hardware acceleration model.jpg

With the whole model in mind, let's skim the code to see the implementation, starting from recursively building the RenderNode tree and the DrawOp set.

Building the DrawOp set with HardwareRenderer

HardwareRenderer is the entry point for hardware-accelerated drawing. Its implementation is a ThreadedRenderer object which, as the name suggests, should be associated with a Render thread; yet the ThreadedRenderer is created on the UI thread, so it is necessarily tied to the UI thread as well. Its two roles:

  • 1. Build the DrawOp set on the UI thread
  • 2. Communicate with the render thread

Seeing the ThreadedRenderer’s role is important, a quick look at the implementation:

```java
ThreadedRenderer(Context context, boolean translucent) {
    ...
    // Key point 1: create the native root RenderNode
    long rootNodePtr = nCreateRootRenderNode();
    mRootNode = RenderNode.adopt(rootNodePtr);
    mRootNode.setClipToBounds(false);
    // Key point 2: create the native RenderProxy
    mNativeProxy = nCreateProxy(translucent, rootNodePtr);
    ProcessInitializer.sInstance.init(context, mNativeProxy);
    loadSystemProperties();
}
```

The ThreadedRenderer holds a RootNode that identifies the root of the DrawOp tree, through which all DrawOps can be reached, plus a RenderProxy object, which is the handle used to communicate with the render thread. Take a look at its constructor:

```cpp
RenderProxy::RenderProxy(bool translucent, RenderNode* rootRenderNode, IContextFactory* contextFactory)
        : mRenderThread(RenderThread::getInstance())
        , mContext(nullptr) {
    SETUP_TASK(createContext);
    args->translucent = translucent;
    args->rootRenderNode = rootRenderNode;
    args->thread = &mRenderThread;
    args->contextFactory = contextFactory;
    mContext = (CanvasContext*) postAndWait(task);
    mDrawFrameTask.setContext(&mRenderThread, mContext);
}
```

As can be seen from RenderThread::getInstance(), RenderThread is a singleton thread: each process has at most one hardware render thread, which avoids conflicts from concurrent multithreaded access. At this point the hardware rendering environment is already set up. Now look at the ThreadedRenderer's draw function and how it builds the DrawOp tree:

```java
@Override
void draw(View view, AttachInfo attachInfo, HardwareDrawCallbacks callbacks) {
    attachInfo.mIgnoreDirtyState = true;
    final Choreographer choreographer = attachInfo.mViewRootImpl.mChoreographer;
    choreographer.mFrameInfo.markDrawStart();
    // Key point 1: build the View's DrawOp tree
    updateRootDisplayList(view, callbacks);
    // Key point 2: notify the RenderThread to draw the frame
    int syncResult = nSyncAndDrawFrame(mNativeProxy, frameInfo, frameInfo.length);
    ...
}
```

Focus on key point 1, updateRootDisplayList, which builds the root display list, essentially the View's DrawOp tree. updateRootDisplayList then calls the root View's updateDisplayListIfDirty, which recurses into each child View's updateDisplayListIfDirty, thereby creating the DrawOp tree.

```java
private void updateRootDisplayList(View view, HardwareDrawCallbacks callbacks) {
    // Key point 1: update the view tree's display list
    updateViewTreeDisplayList(view);
    if (mRootNodeNeedsUpdate || !mRootNode.isValid()) {
        // Key point 2: obtain a DisplayListCanvas
        DisplayListCanvas canvas = mRootNode.start(mSurfaceWidth, mSurfaceHeight);
        try {
            // Key point 3: the canvas caches the DrawOps
            final int saveCount = canvas.save();
            canvas.translate(mInsetLeft, mInsetTop);
            callbacks.onHardwarePreDraw(canvas);
            canvas.insertReorderBarrier();
            canvas.drawRenderNode(view.updateDisplayListIfDirty());
            canvas.insertInorderBarrier();
            callbacks.onHardwarePostDraw(canvas);
            canvas.restoreToCount(saveCount);
            mRootNodeNeedsUpdate = false;
        } finally {
            // Key point 4: fill all Ops into the root RenderNode
            mRootNode.end(canvas);
        }
    }
}
```
  • Get a DisplayListCanvas from the View's RenderNode
  • Build and cache all DrawOps with the DisplayListCanvas
  • Fill the RenderNode with the DrawOps cached by the DisplayListCanvas
  • Set the root View's cached DrawOps onto the RootRenderNode, completing the build
Drawing process

A quick look at how a View recursively builds its DrawOps and fills them into its RenderNode:

```java
@NonNull
public RenderNode updateDisplayListIfDirty() {
    final RenderNode renderNode = mRenderNode;
    ...
    // start() obtains a DisplayListCanvas for hardware-accelerated drawing
    final DisplayListCanvas canvas = renderNode.start(width, height);
    try {
        // TextureView case
        final HardwareLayer layer = getHardwareLayer();
        if (layer != null && layer.isValid()) {
            canvas.drawHardwareLayer(layer, 0, 0, mLayerPaint);
        } else if (layerType == LAYER_TYPE_SOFTWARE) {
            // Forced software drawing: draw into a Bitmap cache first
            buildDrawingCache(true);
            Bitmap cache = getDrawingCache(true);
            if (cache != null) {
                canvas.drawBitmap(cache, 0, 0, mLayerPaint);
            }
        } else {
            // A ViewGroup that does not draw itself recurses directly into its children
            if ((mPrivateFlags & PFLAG_SKIP_DRAW) == PFLAG_SKIP_DRAW) {
                dispatchDraw(canvas);
            } else {
                // Key point: draw itself (and recurse into children)
                draw(canvas);
            }
        }
    } finally {
        // Cache the built Ops
        renderNode.end(canvas);
        setDisplayListProperties(renderNode);
    }
    return renderNode;
}
```

TextureView and Views whose layer type forces software drawing get extra handling, which we ignore here; just look at an ordinary draw. If a View's onDraw contains a drawLine, it calls the DisplayListCanvas's drawLine function. The DisplayListCanvas and RenderNode class diagram looks roughly like this:

Hardware acceleration class diagram

The DisplayListCanvas drawLine function eventually lands in DisplayListCanvas.cpp's drawLines:

```cpp
void DisplayListCanvas::drawLines(const float* points, int count, const SkPaint& paint) {
    points = refBuffer<float>(points, count);

    addDrawOp(new (alloc()) DrawLinesOp(points, count, refPaint(&paint)));
}
```

As you can see, a DrawLinesOp is built and added to the DisplayListCanvas's cache list; the recursion thus completes the construction of the DrawOp tree. Afterwards, RenderNode's end function caches the data from the DisplayListCanvas into the RenderNode:

```java
public void end(DisplayListCanvas canvas) {
    canvas.onPostDraw();
    long renderNodeData = canvas.finishRecording();
    // Key point: fill the Ops into the RenderNode
    nSetDisplayListData(mNativeRenderNode, renderNodeData);
    canvas.recycle();
    mValid = true;
}
```

At this point the DrawOp tree is built, and RenderProxy is used to send a message to the RenderThread requesting OpenGL rendering.

RenderThread renders the UI to the Graphic Buffer

After the DrawOp tree is built, the UI thread uses RenderProxy to send a DrawFrameTask to the RenderThread, which wakes up and starts rendering. The general flow is:

  • First, merge the DrawOps
  • Then draw the special Layers
  • Then draw the remaining DrawOp list
  • Finally, call swapBuffers to submit the drawn graphics buffer to SurfaceFlinger for composition and display
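The UI-thread-to-render-thread handoff itself can be modeled with a single-thread executor (an illustrative sketch of the postAndWait pattern, not the actual RenderProxy/RenderThread implementation):

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch of the UI-thread -> render-thread handoff, modeled on postAndWait.
class RenderThreadModel {
    // One render "thread" per process, mirroring RenderThread::getInstance().
    private static final ExecutorService RENDER_THREAD =
            Executors.newSingleThreadExecutor(r -> {
                Thread t = new Thread(r, "RenderThread");
                t.setDaemon(true); // don't keep the JVM alive for this sketch
                return t;
            });

    // Block the caller (the UI thread) until the render thread runs the task.
    static <T> T postAndWait(Callable<T> task) {
        try {
            return RENDER_THREAD.submit(task).get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```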

After all, the DrawOp tree is built only in ordinary user-space memory, which is invisible to SurfaceFlinger; only data drawn into shared graphics memory can be composited by SurfaceFlinger. Software drawing renders the UI into anonymous shared memory, so where does the memory for hardware acceleration come from? At this point it is worth going back to ViewRootImpl:

```java
private void performTraversals() {
    ...
    if (mAttachInfo.mHardwareRenderer != null) {
        try {
            hwInitialized = mAttachInfo.mHardwareRenderer.initialize(mSurface);
            if (hwInitialized && (host.mPrivateFlags
                    & View.PFLAG_REQUEST_TRANSPARENT_REGIONS) == 0) {
                mSurface.allocateBuffers();
            }
        } catch (OutOfResourcesException e) {
            handleOutOfResourcesException(e);
            return;
        }
    }
    ...
}

/**
 * Allocate buffers ahead of time to avoid allocation delays during rendering
 * @hide
 */
public void allocateBuffers() {
    synchronized (mLock) {
        checkNotReleasedLocked();
        nativeAllocateBuffers(mNativeObject);
    }
}
```

As can be seen, in the hardware-accelerated scenario the drawing memory is allocated slightly earlier, rather than being requested by Surface's lockCanvas as in software drawing. The main purposes: first, to avoid requesting memory again at render time, where an allocation failure would waste the CPU's earlier preparation work; second, to simplify the render thread's work. As analyzed in "Android window management analysis (4)", after a successful allocation, UI data is copied over if necessary; this is the foundation of partial drawing and what ensures a subset of DrawOps can be replayed. At this point the memory is allocated too. But there is another problem: an APP process may have several Surfaces to draw at the same time, yet there is only one render thread, so the Surface must be bound to the render thread (context):

```cpp
static jboolean android_view_ThreadedRenderer_initialize(JNIEnv* env, jobject clazz,
        jlong proxyPtr, jobject jsurface) {
    RenderProxy* proxy = reinterpret_cast<RenderProxy*>(proxyPtr);
    sp<ANativeWindow> window = android_view_Surface_getNativeWindow(env, jsurface);
    return proxy->initialize(window);
}
```

First, android_view_Surface_getNativeWindow obtains the Surface; at the native layer a Surface corresponds to an ANativeWindow. Then the obtained ANativeWindow is bound to the RenderThread through RenderProxy's member function initialize:

```cpp
bool RenderProxy::initialize(const sp<ANativeWindow>& window) {
    SETUP_TASK(initialize);
    args->context = mContext;
    args->window = window.get();
    return (bool) postAndWait(task);
}
```

This in turn goes through CanvasContext's initialize to bind the drawing context to the drawing memory:

```cpp
bool CanvasContext::initialize(ANativeWindow* window) {
    setSurface(window);
    if (mCanvas) return false;
    mCanvas = new OpenGLRenderer(mRenderThread.renderState());
    mCanvas->initProperties();
    return true;
}
```

CanvasContext binds the Surface currently being rendered to the RenderThread via setSurface. The process essentially obtains an EGLSurface through the eglApi (an EGLSurface encapsulates a drawing surface), then makes that EGLSurface the current rendering window through the eglApi and synchronizes the drawing memory and related information; afterwards the RenderThread knows which window to render into. The main point here is hooking into the OpenGL library: all operations ultimately converge on the eglApi abstraction. Even outside Android, on an ordinary Java platform, similar wrapping would be needed to bind the current EGLSurface for rendering, because OpenGL is a specification, and anyone using it must follow it. Finally an OpenGLRenderer object is created; all subsequent OpenGL-related operations go through this OpenGLRenderer.

The binding process

The DrawOp tree has been created, the memory has been allocated, and the environment and surface have been bound; all that is left is drawing. However, before calling OpenGL to draw, there are still merge operations. The entry point is OpenGLRenderer::drawRenderNode:

```cpp
void OpenGLRenderer::drawRenderNode(RenderNode* renderNode, Rect& dirty, int32_t replayFlags) {
    ...
    // Key point 1: build the DeferredDisplayList (avoidOverdraw cuts redundant drawing)
    DeferredDisplayList deferredList(mState.currentClipRect(), avoidOverdraw);
    DeferStateStruct deferStruct(deferredList, *this, replayFlags);
    // Key point 2: merge and group the Ops
    renderNode->defer(deferStruct, 0);
    // Key point 3: draw layers
    flushLayers();
    startFrame();
    // Key point 4: draw the remaining Ops
    deferredList.flush(*this, dirty);
    ...
}
```

The key point is renderNode->defer(deferStruct, 0): the DrawOp tree is not drawn directly but is first merged and optimized through a DeferredDisplayList. This is an optimization used in Android hardware acceleration that both reduces unnecessary drawing and groups similar operations together to speed up drawing.

```cpp
void RenderNode::defer(DeferStateStruct& deferStruct, const int level) {
    DeferOperationHandler handler(deferStruct, level);
    issueOperations<DeferOperationHandler>(deferStruct.mRenderer, handler);
}
```

RenderNode::defer actually contains recursive operations; for example, if the current RenderNode represents a DecorView, it recursively merges and optimizes all of its child Views. Briefly, the merge-and-optimize process builds a DeferredDisplayList from the DrawOp tree ("defer" here means postpone). Merging two DrawOps has two requirements:

  • 1: The two DrawOps must be of the same type. During merging this type is abstracted as a Batch ID, whose value can be one of:

```cpp
enum OpBatchId {
    kOpBatch_None = 0, // Don't batch
    kOpBatch_Bitmap,
    kOpBatch_Patch,
    kOpBatch_AlphaVertices,
    kOpBatch_Vertices,
    kOpBatch_AlphaMaskTexture,
    kOpBatch_Text,
    kOpBatch_ColorText,
    kOpBatch_Count, // Add other batch ids before this
};
```
  • 2: The two DrawOps must have the same Merge ID. Only a few operations define a meaningful merge ID, namely DrawPatchOp, DrawBitmapOp, and DrawTextOp; the conditions for merging are therefore quite strict.

During the merge, DrawOps are divided into two kinds, those that can be merged and those that cannot, cached in different lists: ops that cannot be merged are stored by type in Batch* mBatchLookup[kOpBatch_Count], while ops that can be merged are stored by type and Merge ID in TinyHashMap<mergeid_t, DrawBatch*> mMergingBatches[kOpBatch_Count], as the following diagram shows:
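The grouping rule can be sketched as follows (an illustrative simplification: ops are plain strings and the key is batchId plus mergeId, whereas the real code keys native op types and resource pointers):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of the batching rule: ops sharing both a batch id (type) and a
// merge id fall into the same mergeable batch; different keys stay apart.
class BatchingSketch {
    // Each op is {batchId, mergeId, name}; returns batches keyed "batchId|mergeId".
    static Map<String, List<String>> batch(List<String[]> ops) {
        Map<String, List<String>> batches = new LinkedHashMap<>();
        for (String[] op : ops) {
            String key = op[0] + "|" + op[1];
            batches.computeIfAbsent(key, k -> new ArrayList<>()).add(op[2]);
        }
        return batches;
    }
}
```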

DrawOp Merge operate.jpg

DeferredDisplayList's Vector<Batch*> mBatches contains all of the merged drawing commands, which are then replayed for rendering. Note that the merge is not simply batched drawing; it mainly makes it easier to reuse resources such as textures. When drawing text, for example, glyphs are rendered from a font texture and their texture coordinates must be looked up; managing them together allows drawing in one pass and reduces wasted resource loading. For understanding the overall hardware-acceleration flow, the merge step can be ignored entirely; you can simply think of it as build, then draw. The key characteristic remains that drawing happens on a separate Render thread using OpenGL. In the end, all the DrawOps in mBatches are drawn into the GraphicBuffer through OpenGL, and swapBuffers notifies SurfaceFlinger to composite.

Conclusion

Software drawing and hardware-accelerated drawing share largely the same overall flow of memory allocation and composition; the difference lies in the drawing itself, where hardware acceleration uses a more efficient model than software drawing (DrawOp building and merging, GPU rendering on a separate thread) while also lightening the main thread's load.

Understanding Android hardware acceleration

For reference only; corrections welcome.