Preface

Why serve old wine in a new bottle?

The messaging mechanism is one of the most important mechanisms in Android. Articles analyzing it have been written for over a decade, and the topic has been mined thoroughly. So:

  • Readers who are already familiar with this mechanism will find little new here.

  • But for those of you who are not yet familiar with the messaging mechanism, you can keep digging.

In general, a quick search shows that articles on Android's messaging mechanism mostly revolve around:

  • Usage of Handler, Looper, and MQ

  • Source code analysis of Handler, Looper, and MQ

Learning from these perspectives alone does not yield a full understanding of the messaging mechanism.

This article is essentially a brainstorm: it avoids rehashing those angles while helping the reader sort out the full context. Here is the mind map:

Brainstorm: how the OS solves interprocess communication

There are many communication scenarios in the world of programs. Searching our existing knowledge, there are several ways to solve interprocess communication:

This section can be skimmed to get the general idea; skipping it does not affect the rest of the article.

Pipes

  • Anonymous pipe: a half-duplex communication mode in which data flows in only one direction; it can be used only between related processes.

  • Stream pipe (s_pipe): full duplex; data can be transmitted in both directions at the same time.

  • Named pipe (FIFO): a half-duplex communication mode that also allows communication between unrelated processes.

MessageQueue:

A linked list of messages, stored in the kernel and identified by a message queue identifier. Message queues overcome the drawbacks of signals (which carry little information), pipes (which carry only plain byte streams), and limited buffer sizes.

SharedMemory:

Maps a segment of memory so that it can be accessed by other processes. The shared memory is created by one process but can be accessed by multiple processes. It is the fastest IPC method and was designed specifically to address the low efficiency of the other IPC methods. It is often used together with other mechanisms, such as semaphores, to achieve synchronization as well as communication between processes.

Semaphore:

A counter that can be used to control access to a shared resource by multiple processes. It usually acts as a locking mechanism, preventing other processes from accessing a shared resource while one process is using it. It is therefore mainly used as a means of synchronization between processes, and between threads within a process.
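The counter semantics above can be sketched with java.util.concurrent.Semaphore (a minimal sketch; the class name SemaphoreDemo and its method are invented for this illustration):

```java
import java.util.concurrent.Semaphore;

/** Minimal sketch of semaphore-as-counter semantics via java.util.concurrent.Semaphore. */
class SemaphoreDemo {
    /** Returns whether an extra acquire succeeds once all permits are taken. */
    static boolean overAcquireSucceeds(int permits) {
        Semaphore sem = new Semaphore(permits); // the counter starts at `permits`
        for (int i = 0; i < permits; i++) {
            sem.acquireUninterruptibly();       // take every permit
        }
        boolean extra = sem.tryAcquire();       // fails: the shared resource is exhausted
        for (int i = 0; i < permits; i++) {
            sem.release();                      // releasing makes permits available again
        }
        return extra;
    }
}
```

Once all permits are held, further acquirers are locked out until someone releases, which is exactly the lock-like behavior described above.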

Socket:

Unlike the other mechanisms, sockets can be used to communicate between processes on different machines over the network.

Signal:

Used to notify the receiving process that an event has occurred. The mechanism itself is relatively complex.

As you can imagine, Android also has a large number of interprocess communication scenarios, and the OS must adopt at least one of these mechanisms to enable IPC.

A closer look reveals that Android takes more than one approach. Moreover, Binder, developed from OpenBinder, handles interprocess communication in user space.

Here’s a question we’ll explore later:

Does Android use the Linux kernel's message queue mechanism to get things done?

A message mechanism designed around a message queue has many advantages, and Android adopts this design in many communication scenarios.

Three elements of a messaging mechanism

Wherever we talk about messaging mechanisms, there are three elements:

  • The message queue

  • Message loop (distribution)

  • Message processing

The message queue is a queue of message objects; the basic rule is FIFO.

The message loop (dispatch) is a basically universal mechanism: an infinite loop continuously retrieves messages from the head of the message queue and dispatches them for execution.

For message processing, it must be mentioned that messages come in two forms:

  • Enriched: the message carries complete information

  • Query-back: the message's information is incomplete and the receiver has to query back for it

The trade-off between the two mainly depends on the balance between the cost of producing the information and the cost of retrieving it.

When the information is complete, the receiver can process the message.
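The three elements above can be sketched in a few lines of Java (a toy model; ToyMessageLoop and its names are invented, and real implementations add timing, priorities, and wake-up machinery):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Toy illustration of the three elements of a messaging mechanism; all names are invented. */
class ToyMessageLoop {
    static final Runnable QUIT = () -> { };  // sentinel "quit" message

    // 1. The message queue: FIFO
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();

    /** Enqueue an "enriched" message: the Runnable carries everything it needs. */
    void send(Runnable msg) {
        queue.add(msg);
    }

    /** 2. The message loop: endlessly take from the head and dispatch. */
    void loop() {
        for (;;) {
            Runnable msg;
            try {
                msg = queue.take();          // blocks while the queue is empty
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            if (msg == QUIT) {
                return;                      // leave the loop when told to quit
            }
            msg.run();                       // 3. message processing
        }
    }
}
```

Typical usage is to send a few messages followed by the QUIT sentinel, then run loop() on the consuming thread; messages are processed strictly in FIFO order.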

Android Framework

The Android Framework has two message queues:

Java layer: frameworks/base/core/java/android/os/MessageQueue.java

Native layer: frameworks/base/core/jni/android_os_MessageQueue.cpp

The Java layer MQ is not implemented with a JDK data structure such as List or Queue.

I downloaded the Android 10 source code (github.com/leobert-lan… It's not long; you can read it in its entirety.

It is not difficult to understand that user space receives messages from kernel space. As the following figure shows, these messages are first seen by the Native layer, so:

  • The message queue is established in the Native layer, which provides the basic capabilities of a message queue

  • JNI is used to break through the runtime barrier between the Java and Native layers, mapping the message queue into the Java layer

  • Applications are built on the Java layer, which implements message dispatch and processing

PS: In the Android 2.3 era, the message queue was implemented in the Java layer. As for why it was moved to a Native implementation some ten years ago, I speculate it is related to CPU idling, but I have not explored further. If any reader knows, please leave a comment and resolve my doubts.

PS: There is also a classic system startup architecture diagram that I could not find again; that diagram is more intuitive.

Code parsing

Let's briefly read and analyze the MQ source code, starting in the Native layer.

Native layer message queue creation:

static jlong android_os_MessageQueue_nativeInit(JNIEnv* env, jclass clazz) {
    NativeMessageQueue* nativeMessageQueue = new NativeMessageQueue();
    if (!nativeMessageQueue) {
        jniThrowRuntimeException(env, "Unable to allocate native queue");
        return 0;
    }
    nativeMessageQueue->incStrong(env);
    return reinterpret_cast<jlong>(nativeMessageQueue);
}

This creates the Native layer message queue. If creation fails, it throws an exception and returns 0; otherwise it converts the pointer to a Java long and returns it, which is then held by the Java layer MQ.

The NativeMessageQueue constructor:

NativeMessageQueue::NativeMessageQueue()
        : mPollEnv(NULL), mPollObj(NULL), mExceptionObj(NULL) {
    mLooper = Looper::getForThread();
    if (mLooper == NULL) {
        mLooper = new Looper(false);
        Looper::setForThread(mLooper);
    }
}

The static Looper::getForThread() is used to obtain the Looper instance for the current thread; if there is none, one is created and registered statically.

Now take a look at the native methods used by the Java layer MQ:



class MessageQueue {
    private long mPtr; // used by native code

    private native static long nativeInit();

    private native static void nativeDestroy(long ptr);

    private native void nativePollOnce(long ptr, int timeoutMillis); /* non-static for callbacks */

    private native static void nativeWake(long ptr);

    private native static boolean nativeIsPolling(long ptr);

    private native static void nativeSetFileDescriptorEvents(long ptr, int fd, int events);
}

The corresponding JNI signatures:



static const JNINativeMethod gMessageQueueMethods[] = {
    /* name, signature, funcPtr */
    { "nativeInit", "()J", (void*)android_os_MessageQueue_nativeInit },
    { "nativeDestroy", "(J)V", (void*)android_os_MessageQueue_nativeDestroy },
    { "nativePollOnce", "(JI)V", (void*)android_os_MessageQueue_nativePollOnce },
    { "nativeWake", "(J)V", (void*)android_os_MessageQueue_nativeWake },
    { "nativeIsPolling", "(J)Z", (void*)android_os_MessageQueue_nativeIsPolling },
    { "nativeSetFileDescriptorEvents", "(JII)V",
            (void*)android_os_MessageQueue_nativeSetFileDescriptorEvents },
};

mPtr is the Java layer's mapping of the Native layer MQ's memory address.

  • The Java layer determines whether MQ is still working:

private boolean isPollingLocked() {
    // If the loop is quitting then it must not be idling.
    // We can assume mPtr != 0 when mQuitting is false.
    return !mQuitting && nativeIsPolling(mPtr);
}


static jboolean android_os_MessageQueue_nativeIsPolling(JNIEnv* env, jclass clazz, jlong ptr) {
    NativeMessageQueue* nativeMessageQueue = reinterpret_cast<NativeMessageQueue*>(ptr);
    return nativeMessageQueue->getLooper()->isPolling();
}


/**
 * Returns whether this looper's thread is currently polling for more work to do.
 * This is a good signal that the loop is still alive rather than being stuck
 * handling a callback.  Note that this method is intrinsically racy, since the
 * state of the loop can change before you get the result back.
 */
bool isPolling() const;
  • Wake up Native layer MQ:



static void android_os_MessageQueue_nativeWake(JNIEnv* env, jclass clazz, jlong ptr) {
    NativeMessageQueue* nativeMessageQueue = reinterpret_cast<NativeMessageQueue*>(ptr);
    nativeMessageQueue->wake();
}

void NativeMessageQueue::wake() {
    mLooper->wake();
}
  • Native layer Poll:

static void android_os_MessageQueue_nativePollOnce(JNIEnv* env, jobject obj,
        jlong ptr, jint timeoutMillis) {
    NativeMessageQueue* nativeMessageQueue = reinterpret_cast<NativeMessageQueue*>(ptr);
    nativeMessageQueue->pollOnce(env, obj, timeoutMillis);
}

void NativeMessageQueue::pollOnce(JNIEnv* env, jobject pollObj, int timeoutMillis) {
    mPollEnv = env;
    mPollObj = pollObj;
    mLooper->pollOnce(timeoutMillis);
    mPollObj = NULL;
    mPollEnv = NULL;
    if (mExceptionObj) {
        env->Throw(mExceptionObj);
        env->DeleteLocalRef(mExceptionObj);
        mExceptionObj = NULL;
    }
}

This is important, so let's take a look at how the Native layer Looper dispatches messages.

// Looper.h
int pollOnce(int timeoutMillis, int* outFd, int* outEvents, void** outData);
inline int pollOnce(int timeoutMillis) {
    return pollOnce(timeoutMillis, NULL, NULL, NULL);
}

// Looper.cpp
int Looper::pollOnce(int timeoutMillis, int* outFd, int* outEvents, void** outData) {
    int result = 0;
    for (;;) {
        while (mResponseIndex < mResponses.size()) {
            const Response& response = mResponses.itemAt(mResponseIndex++);
            int ident = response.request.ident;
            if (ident >= 0) {
                int fd = response.request.fd;
                int events = response.events;
                void* data = response.request.data;
#if DEBUG_POLL_AND_WAKE
                ALOGD("%p ~ pollOnce - returning signalled identifier %d: "
                        "fd=%d, events=0x%x, data=%p", this, ident, fd, events, data);
#endif
                if (outFd != NULL) *outFd = fd;
                if (outEvents != NULL) *outEvents = events;
                if (outData != NULL) *outData = data;
                return ident;
            }
        }

        if (result != 0) {
#if DEBUG_POLL_AND_WAKE
            ALOGD("%p ~ pollOnce - returning result %d", this, result);
#endif
            if (outFd != NULL) *outFd = 0;
            if (outEvents != NULL) *outEvents = 0;
            if (outData != NULL) *outData = NULL;
            return result;
        }

        result = pollInner(timeoutMillis);
    }
}

It first processes the pending Responses of the Native layer, then calls pollInner. The details here are somewhat complex; we will brainstorm them later in the Native Looper analysis.

Before going into detail, recall that calling a blocking method means, in plain English, waiting for the method to return.

The native method void nativePollOnce(long ptr, int timeoutMillis) may block, and with it the calling thread.

Now let's look at message retrieval in the Java layer MQ. The code is long, so the key points are commented directly in the code.

Before reading it, let's think about the main scenarios purely from a TDD perspective (of course, not all of them fit Android's actual design):

Is the message queue working?

  • Working: a message is expected to be returned

  • Not working: null is expected

Does a working message queue currently have messages?

  • There are messages:

  • Special internal functional messages are expected to be handled internally by MQ

  • A message whose processing time has arrived is returned

  • If no message has reached its processing time and all are sorted, should the queue idle and stay blocked, or return silently and schedule a wake-up? As discussed earlier, the expectation is to keep idling

  • There are no messages: block, or return null? If null were returned, the caller would have to maintain its own idle/wake mechanism to keep things running; from an encapsulation point of view, MQ should keep idling and solve the problem itself

class MessageQueue {
    Message next() {
        // Return here if the message loop has already quit and been disposed.
        // This can happen if the application tries to restart a looper after quit
        // which is not supported.
        // 1. If the native message queue pointer maps to 0, the queue has quit
        //    and there are no messages: return null
        final long ptr = mPtr;
        if (ptr == 0) {
            return null;
        }

        int pendingIdleHandlerCount = -1; // -1 only during first iteration
        int nextPollTimeoutMillis = 0;
        // 2. An infinite loop that keeps idling until there is a message to dispatch
        for (;;) {
            if (nextPollTimeoutMillis != 0) {
                Binder.flushPendingCommands();
            }

            // 3. Poll the native layer message queue; may block
            nativePollOnce(ptr, nextPollTimeoutMillis);

            synchronized (this) {
                // Try to retrieve the next message.  Return if found.
                final long now = SystemClock.uptimeMillis();
                Message prevMsg = null;
                Message msg = mMessages;
                // 4. If a barrier is found, look for the next asynchronous message
                if (msg != null && msg.target == null) {
                    // Stalled by a barrier.  Find the next asynchronous message in the queue.
                    do {
                        prevMsg = msg;
                        msg = msg.next;
                    } while (msg != null && !msg.isAsynchronous());
                }
                if (msg != null) {
                    // 5. Found a message
                    if (now < msg.when) {
                        // Next message is not ready.  Set a timeout to wake up when it is ready.
                        nextPollTimeoutMillis = (int) Math.min(msg.when - now, Integer.MAX_VALUE);
                    } else {
                        // Found a message that has reached its processing time.
                        // Got a message.
                        mBlocked = false;
                        if (prevMsg != null) {
                            prevMsg.next = msg.next;
                        } else {
                            mMessages = msg.next;
                        }
                        msg.next = null;
                        if (DEBUG) Log.v(TAG, "Returning message: " + msg);
                        msg.markInUse();
                        return msg;
                    }
                } else {
                    // No more messages.
                    nextPollTimeoutMillis = -1;
                }

                // Process the quit message now that all pending messages have been handled.
                if (mQuitting) {
                    dispose();
                    return null;
                }

                // Maintain the IdleHandler bookkeeping below.
                // If first time idle, then get the number of idlers to run.
                // Idle handles only run if the queue is empty or if the first message
                // in the queue (possibly a barrier) is due to be handled in the future.
                if (pendingIdleHandlerCount < 0
                        && (mMessages == null || now < mMessages.when)) {
                    pendingIdleHandlerCount = mIdleHandlers.size();
                }
                if (pendingIdleHandlerCount <= 0) {
                    // No idle handlers to run.  Loop and wait some more.
                    mBlocked = true;
                    continue;
                }

                if (mPendingIdleHandlers == null) {
                    mPendingIdleHandlers = new IdleHandler[Math.max(pendingIdleHandlerCount, 4)];
                }
                mPendingIdleHandlers = mIdleHandlers.toArray(mPendingIdleHandlers);
            }

            // Run the idle handlers.
            // We only ever reach this code block during the first iteration.
            for (int i = 0; i < pendingIdleHandlerCount; i++) {
                final IdleHandler idler = mPendingIdleHandlers[i];
                mPendingIdleHandlers[i] = null; // release the reference to the handler

                boolean keep = false;
                try {
                    keep = idler.queueIdle();
                } catch (Throwable t) {
                    Log.wtf(TAG, "IdleHandler threw exception", t);
                }

                if (!keep) {
                    synchronized (this) {
                        mIdleHandlers.remove(idler);
                    }
                }
            }

            // Reset the idle handler count to 0 so we do not run them again.
            pendingIdleHandlerCount = 0;

            // While calling an idle handler, a new message could have been delivered
            // so go back and look again for a pending message without waiting.
            nextPollTimeoutMillis = 0;
        }
    }
}
  • Java layer message enqueuing

This is easier: the message itself is valid and the message queue is still working. From a TDD perspective:

If the message queue has no head, the new message is expected to become the head directly.

If there is a head:

  • A message that is due earlier than the head message, or that requires immediate processing, becomes the new head

  • Otherwise it is inserted into the appropriate position according to its processing time

boolean enqueueMessage(Message msg, long when) {
    if (msg.target == null) {
        throw new IllegalArgumentException("Message must have a target.");
    }

    synchronized (this) {
        if (msg.isInUse()) {
            throw new IllegalStateException(msg + " This message is already in use.");
        }

        if (mQuitting) {
            IllegalStateException e = new IllegalStateException(
                    msg.target + " sending message to a Handler on a dead thread");
            Log.w(TAG, e.getMessage(), e);
            msg.recycle();
            return false;
        }

        msg.markInUse();
        msg.when = when;
        Message p = mMessages;
        boolean needWake;
        if (p == null || when == 0 || when < p.when) {
            // New head, wake up the event queue if blocked.
            msg.next = p;
            mMessages = msg;
            needWake = mBlocked;
        } else {
            // Inserted within the middle of the queue.  Usually we don't have to wake
            // up the event queue unless there is a barrier at the head of the queue
            // and the message is the earliest asynchronous message in the queue.
            needWake = mBlocked && p.target == null && msg.isAsynchronous();
            Message prev;
            for (;;) {
                prev = p;
                p = p.next;
                if (p == null || when < p.when) {
                    break;
                }
                if (needWake && p.isAsynchronous()) {
                    needWake = false;
                }
            }
            msg.next = p; // invariant: p == prev.next
            prev.next = msg;
        }

        // We can assume mPtr != 0 because mQuitting is false.
        if (needWake) {
            nativeWake(mPtr);
        }
    }
    return true;
}

Only the synchronization barrier remains; we will brainstorm it later.

Java layer message distribution

In this section, we brainstorm message dispatch. We have already looked at MessageQueue; message dispatch is the process of taking messages out of the MessageQueue and handing them to Handlers. Looper does this job.

We already know that the Native layer also has a Looper, but this is easy to understand:

  • The message queue needs a bridge between the Java layer and the Native layer

  • Each Looper only needs to handle the dispatch of its own layer's message queue

So when we look at Java layer message dispatch, we only need to look at the Java layer Looper, focusing on three main methods:

  • Going to work

  • Working

  • Getting off work

  • Going to work: prepare



class Looper {  
  
    public static void prepare() {  
        prepare(true);  
    }  
  
    private static void prepare(boolean quitAllowed) {  
        if (sThreadLocal.get() != null) {  
            throw new RuntimeException("Only one Looper may be created per thread");  
        }  
        sThreadLocal.set(new Looper(quitAllowed));  
    }  
}  




There are two caveats:

  • Once you have gone out the door, you cannot go out again until you have come back in. Similarly, one Looper is enough for one thread; as long as it is alive, there is no need to create another.

  • A Looper serves one Thread, and this requires registration to indicate which Thread it serves. ThreadLocal is used for this, because multi-threaded access to shared collections always has to be considered.

ThreadLocal isolates threads from each other in an elegant way, so that each Thread can operate on its own copy without interfering with the others.
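That isolation can be demonstrated in a few lines (a minimal sketch; ThreadLocalDemo is an invented name, standing in for how Looper.sThreadLocal keeps one Looper per thread):

```java
/** Sketch: ThreadLocal gives each thread its own slot, as Looper.sThreadLocal does. */
class ThreadLocalDemo {
    private static final ThreadLocal<String> slot = new ThreadLocal<>();

    /** Writes to the slot from two threads; returns the current thread's view. */
    static String isolatedValue() {
        slot.set("main");
        Thread worker = new Thread(() -> {
            // The worker starts with an empty slot; its write is invisible to other threads.
            slot.set("worker");
        });
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return slot.get(); // still "main" despite the worker's write
    }
}
```

Even though both threads touch the same ThreadLocal field, each sees only its own value, which is exactly why prepare() can register "this thread's Looper" without any extra locking.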

  • Working: loop

Note that the work is dispatching in nature; Looper does not need to process messages itself.

  • If you haven't registered, no one can be found to take responsibility for the job, so loop refuses to run.

  • Don't start working again when you are already working; that would cause mistakes and ordering problems.

  • The job is to continually fetch the boss's (MQ's) instructions (Messages) and hand them to the responsible person (Handler) to process, recording information along the way.

  • It's a 007 job with no sleep: when MQ stops producing messages, there is no more work to do, and everyone goes home.

class Looper {
    public static void loop() {
        final Looper me = myLooper();
        if (me == null) {
            throw new RuntimeException("No Looper; Looper.prepare() wasn't called on this thread.");
        }
        if (me.mInLoop) {
            Slog.w(TAG, "Loop again would have the queued messages be executed"
                    + " before this one completed.");
        }
        me.mInLoop = true;
        final MessageQueue queue = me.mQueue;

        // Make sure the identity of this thread is that of the local process,
        // and keep track of what that identity token actually is.
        Binder.clearCallingIdentity();
        final long ident = Binder.clearCallingIdentity();

        // Allow overriding a threshold with a system prop. e.g.
        // adb shell 'setprop log.looper.1000.main.slow 1 && stop && start'
        final int thresholdOverride =
                SystemProperties.getInt("log.looper."
                        + Process.myUid() + "."
                        + Thread.currentThread().getName()
                        + ".slow", 0);

        boolean slowDeliveryDetected = false;

        for (;;) {
            Message msg = queue.next(); // might block
            if (msg == null) {
                // No message indicates that the message queue is quitting.
                return;
            }

            // This must be in a local variable, in case a UI event sets the logger
            final Printer logging = me.mLogging;
            if (logging != null) {
                logging.println(">>>>> Dispatching to " + msg.target + " "
                        + msg.callback + ": " + msg.what);
            }
            // Make sure the observer won't change while processing a transaction.
            final Observer observer = sObserver;

            final long traceTag = me.mTraceTag;
            long slowDispatchThresholdMs = me.mSlowDispatchThresholdMs;
            long slowDeliveryThresholdMs = me.mSlowDeliveryThresholdMs;
            if (thresholdOverride > 0) {
                slowDispatchThresholdMs = thresholdOverride;
                slowDeliveryThresholdMs = thresholdOverride;
            }
            final boolean logSlowDelivery = (slowDeliveryThresholdMs > 0) && (msg.when > 0);
            final boolean logSlowDispatch = (slowDispatchThresholdMs > 0);

            final boolean needStartTime = logSlowDelivery || logSlowDispatch;
            final boolean needEndTime = logSlowDispatch;

            if (traceTag != 0 && Trace.isTagEnabled(traceTag)) {
                Trace.traceBegin(traceTag, msg.target.getTraceName(msg));
            }

            final long dispatchStart = needStartTime ? SystemClock.uptimeMillis() : 0;
            final long dispatchEnd;
            Object token = null;
            if (observer != null) {
                token = observer.messageDispatchStarting();
            }
            long origWorkSource = ThreadLocalWorkSource.setUid(msg.workSourceUid);
            try {
                // Note this line: dispatch to the message's target Handler
                msg.target.dispatchMessage(msg);
                if (observer != null) {
                    observer.messageDispatched(token, msg);
                }
                dispatchEnd = needEndTime ? SystemClock.uptimeMillis() : 0;
            } catch (Exception exception) {
                if (observer != null) {
                    observer.dispatchingThrewException(token, msg, exception);
                }
                throw exception;
            } finally {
                ThreadLocalWorkSource.restore(origWorkSource);
                if (traceTag != 0) {
                    Trace.traceEnd(traceTag);
                }
            }
            if (logSlowDelivery) {
                if (slowDeliveryDetected) {
                    if ((dispatchStart - msg.when) <= 10) {
                        Slog.w(TAG, "Drained");
                        slowDeliveryDetected = false;
                    }
                } else {
                    if (showSlowLog(slowDeliveryThresholdMs, msg.when, dispatchStart,
                            "delivery", msg)) {
                        // Once we write a slow delivery log, suppress until the queue drains.
                        slowDeliveryDetected = true;
                    }
                }
            }
            if (logSlowDispatch) {
                showSlowLog(slowDispatchThresholdMs, dispatchStart, dispatchEnd, "dispatch", msg);
            }

            if (logging != null) {
                logging.println("<<<<< Finished to " + msg.target + " " + msg.callback);
            }

            // Make sure that during the course of dispatching the
            // identity of the thread wasn't corrupted.
            final long newIdent = Binder.clearCallingIdentity();
            if (ident != newIdent) {
                Log.wtf(TAG, "Thread identity changed from 0x"
                        + Long.toHexString(ident) + " to 0x"
                        + Long.toHexString(newIdent) + " while dispatching to "
                        + msg.target.getClass().getName() + " "
                        + msg.callback + " what=" + msg.what);
            }

            msg.recycleUnchecked();
        }
    }
}
  • Getting off work: quit/quitSafely

This is rude behavior: MQ cannot function without Looper, so getting off work means quitting.

class Looper {
    public void quit() {
        mQueue.quit(false);
    }

    public void quitSafely() {
        mQueue.quit(true);
    }
}

Handler

Things are a little clearer here. The APIs fall into the following categories:

  • User oriented:

  • Creating Messages, using Message's flyweight pattern

  • Sending messages; note that post(Runnable) also sends a message

  • Removing messages

  • Quitting, etc.

Message-oriented processing:

class Handler {
    /**
     * Subclasses must implement this to receive messages.
     */
    public void handleMessage(@NonNull Message msg) {
    }

    /**
     * Handle system messages here.
     * Called by the Looper.
     */
    public void dispatchMessage(@NonNull Message msg) {
        if (msg.callback != null) {
            handleCallback(msg);
        } else {
            if (mCallback != null) {
                if (mCallback.handleMessage(msg)) {
                    return;
                }
            }
            handleMessage(msg);
        }
    }
}

If no callback consumes the message and handleMessage is not overridden, the message is simply dropped.
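The three-level dispatch priority can be modeled with plain Java (a simplified sketch; MiniHandler and its names are invented and omit Looper/MessageQueue entirely):

```java
/** Simplified model of Handler.dispatchMessage priority; all names are invented. */
class MiniHandler {
    interface Callback {
        boolean handleMessage(String msg);
    }

    private final Callback callback;    // optional handler-level callback, like Handler.mCallback
    final StringBuilder log = new StringBuilder();

    MiniHandler(Callback callback) {
        this.callback = callback;
    }

    /** Mirrors the priority: message callback > handler callback > handleMessage. */
    void dispatch(String msg, Runnable msgCallback) {
        if (msgCallback != null) {
            msgCallback.run();          // 1. the message's own Runnable wins
        } else {
            if (callback != null && callback.handleMessage(msg)) {
                return;                 // 2. the handler-level callback consumed it
            }
            handleMessage(msg);         // 3. fall back to the overridable method
        }
    }

    protected void handleMessage(String msg) {
        log.append("handleMessage:").append(msg);
    }
}
```

With no callbacks at all, everything funnels into handleMessage; a consuming handler-level callback short-circuits it, matching the dispatchMessage source above.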

The message sending part can be summarized with the following figure:

Step by step, we now have a fairly complete understanding of the messaging mechanism at the Framework level. So far we have covered:

  • Both the Native layer and the Java layer have message queues, and there is a corresponding relationship through JNI and pointer mapping

  • Overview of the process for retrieving MQ messages at the Native layer and Java layer

  • How does the Java layer Looper work

  • Overview of Java layer Handlers

From the perspective of the Java runtime:

  • The message queue mechanism works at the thread level: a thread may or may not have a working message queue.

  • That is, a Thread has at most one working Looper.

  • Looper corresponds to Java layer MQ one-to-one

  • Handler is the entry point to MQ and the Handler of the message

  • Message applies the flyweight pattern and carries enough information to be self-consistent. Creating Message objects is expensive, so the flyweight pattern is used to reuse them.
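The reuse idea behind Message.obtain()/recycle() can be sketched as a free-list pool (a toy model; PooledMsg is invented, and the real Message pool differs in details such as flag handling and pool limits):

```java
/** Toy object pool in the spirit of Message.obtain()/recycle(); names are invented. */
class PooledMsg {
    int what;
    PooledMsg next;                     // the pool is a singly linked free-list, like Message.sPool

    private static PooledMsg pool;      // head of the free-list
    private static int poolSize;
    private static final int MAX_POOL_SIZE = 50;
    private static final Object lock = new Object();

    static PooledMsg obtain() {
        synchronized (lock) {
            if (pool != null) {
                PooledMsg m = pool;     // reuse a recycled instance instead of allocating
                pool = m.next;
                m.next = null;
                poolSize--;
                return m;
            }
        }
        return new PooledMsg();         // pool empty: allocate a fresh one
    }

    void recycle() {
        what = 0;                       // clear state before returning to the pool
        synchronized (lock) {
            if (poolSize < MAX_POOL_SIZE) {
                next = pool;
                pool = this;
                poolSize++;
            }
        }
    }
}
```

A recycled instance is handed back by the next obtain() call, so steady-state message traffic allocates (almost) no new objects.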

Let’s dig into the details and resolve some of the questions left unanswered:

  • The type and nature of the message

  • Native layer Looper’s pollInner

Types and nature of messages

Several important member fields of Message:



class Message {
    public int what;
    public int arg1;
    public int arg2;
    public Object obj;
    public Messenger replyTo;
    /*package*/ int flags;
    public long when;
    /*package*/ Bundle data;
    /*package*/ Handler target;
    /*package*/ Runnable callback;
}

Here target is the target Handler; a message with no target is a special message: a synchronization barrier.

what is the message identifier; arg1 and arg2 are inexpensive data slots, and if they are insufficient to express the information, a Bundle can be put into data.

replyTo and obj are used for passing messages across processes; we will set them aside for now.

flags holds the state of a message, such as whether it is in use and whether it is asynchronous.

The synchronization barrier mentioned above is a barrier that prevents subsequent synchronous messages from being retrieved; we read about it earlier in the Java layer MQ's next method.

Recall that in the next method, an infinite loop tries to read a message that meets the processing conditions; if no such message is obtained, the caller (Looper) remains blocked in the loop.

At this point we can confirm a conclusion: by function, messages fall into three types:

  • Ordinary (synchronous) messages

  • Synchronization barrier messages

  • Asynchronous messages

The synchronization barrier is an internal mechanism. After a barrier is set, it must be removed at an appropriate time, otherwise ordinary messages will never be processed; removing it requires the token that was returned when the barrier was posted.
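Barrier semantics can be modeled with a toy queue (BarrierQueue and its names are invented; in the real framework this logic lives in MessageQueue.next() and the barrier is posted/removed via hidden MessageQueue APIs):

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of synchronization-barrier semantics in MessageQueue.next(); names are invented. */
class BarrierQueue {
    static final class Msg {
        final String payload;
        final boolean async;
        final boolean barrier;      // in the real Message, a barrier is target == null

        Msg(String payload, boolean async, boolean barrier) {
            this.payload = payload;
            this.async = async;
            this.barrier = barrier;
        }
    }

    final List<Msg> queue = new ArrayList<>();

    /** With a barrier at the head, only asynchronous messages may be returned. */
    Msg next() {
        if (queue.isEmpty()) return null;
        Msg head = queue.get(0);
        if (!head.barrier) {
            return queue.remove(0); // no barrier: plain FIFO
        }
        // Stalled by a barrier: skip forward to the first asynchronous message.
        for (int i = 1; i < queue.size(); i++) {
            if (queue.get(i).async) return queue.remove(i);
        }
        return null;                // only synchronous messages remain; they stay blocked
    }

    /** Stands in for removeSyncBarrier(token): ordinary messages flow again. */
    void removeBarrier() {
        if (!queue.isEmpty() && queue.get(0).barrier) queue.remove(0);
    }
}
```

While the barrier sits at the head, synchronous messages behind it are invisible to next(); once it is removed, they flow again, which is why forgetting to remove a barrier starves ordinary messages.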

The Native layer Looper

I'm sure everyone is interested in seeing what Looper does at the Native layer.

Those interested in the full source code can find it here (github.com/leobert-lan… Below we read some excerpts.

Looper's pollOnce was mentioned above: after processing the pending Responses, it calls pollInner to get the message.

int Looper::pollInner(int timeoutMillis) {
#if DEBUG_POLL_AND_WAKE
    ALOGD("%p ~ pollOnce - waiting: timeoutMillis=%d", this, timeoutMillis);
#endif

    // Adjust the timeout based on when the next message is due.
    if (timeoutMillis != 0 && mNextMessageUptime != LLONG_MAX) {
        nsecs_t now = systemTime(SYSTEM_TIME_MONOTONIC);
        int messageTimeoutMillis = toMillisecondTimeoutDelay(now, mNextMessageUptime);
        if (messageTimeoutMillis >= 0
                && (timeoutMillis < 0 || messageTimeoutMillis < timeoutMillis)) {
            timeoutMillis = messageTimeoutMillis;
        }
#if DEBUG_POLL_AND_WAKE
        ALOGD("%p ~ pollOnce - next message in %lldns, adjusted timeout: timeoutMillis=%d",
                this, mNextMessageUptime - now, timeoutMillis);
#endif
    }

    // Poll.
    int result = ALOOPER_POLL_WAKE;
    mResponses.clear();
    mResponseIndex = 0;

    struct epoll_event eventItems[EPOLL_MAX_EVENTS];
    // 1. Wait via epoll for events on mEpollFd, with a timeout
    int eventCount = epoll_wait(mEpollFd, eventItems, EPOLL_MAX_EVENTS, timeoutMillis);

    // Acquire lock.
    mLock.lock();

    // 2. Check for poll error; if so, go to Done
    if (eventCount < 0) {
        if (errno == EINTR) {
            goto Done;
        }
        ALOGW("Poll failed with an unexpected error, errno=%d", errno);
        result = ALOOPER_POLL_ERROR;
        goto Done;
    }

    // 3. Check for poll timeout; if so, go to Done
    if (eventCount == 0) {
#if DEBUG_POLL_AND_WAKE
        ALOGD("%p ~ pollOnce - timeout", this);
#endif
        result = ALOOPER_POLL_TIMEOUT;
        goto Done;
    }

    // 4. Handle all events returned by epoll
#if DEBUG_POLL_AND_WAKE
    ALOGD("%p ~ pollOnce - handling events from %d fds", this, eventCount);
#endif

    for (int i = 0; i < eventCount; i++) {
        int fd = eventItems[i].data.fd;
        uint32_t epollEvents = eventItems[i].events;
        if (fd == mWakeReadPipeFd) {
            if (epollEvents & EPOLLIN) {
                awoken();
            } else {
                ALOGW("Ignoring unexpected epoll events 0x%x on wake read pipe.", epollEvents);
            }
        } else {
            ssize_t requestIndex = mRequests.indexOfKey(fd);
            if (requestIndex >= 0) {
                int events = 0;
                if (epollEvents & EPOLLIN) events |= ALOOPER_EVENT_INPUT;
                if (epollEvents & EPOLLOUT) events |= ALOOPER_EVENT_OUTPUT;
                if (epollEvents & EPOLLERR) events |= ALOOPER_EVENT_ERROR;
                if (epollEvents & EPOLLHUP) events |= ALOOPER_EVENT_HANGUP;
                pushResponse(events, mRequests.valueAt(requestIndex));
            } else {
                ALOGW("Ignoring unexpected epoll events 0x%x on fd %d that is "
                        "no longer registered.", epollEvents, fd);
            }
        }
    }
Done: ;

    // 5. Invoke pending message callbacks.
    mNextMessageUptime = LLONG_MAX;
    while (mMessageEnvelopes.size() != 0) {
        nsecs_t now = systemTime(SYSTEM_TIME_MONOTONIC);
        const MessageEnvelope& messageEnvelope = mMessageEnvelopes.itemAt(0);
        if (messageEnvelope.uptime <= now) {
            // Remove the envelope from the list.
            // We keep a strong reference to the handler until the call to handleMessage
            // finishes.  Then we drop it so that the handler can be deleted *before*
            // we reacquire our lock.
            { // obtain handler
                sp<MessageHandler> handler = messageEnvelope.handler;
                Message message = messageEnvelope.message;
                mMessageEnvelopes.removeAt(0);
                mSendingMessage = true;
                mLock.unlock();

#if DEBUG_POLL_AND_WAKE || DEBUG_CALLBACKS
                ALOGD("%p ~ pollOnce - sending message: handler=%p, what=%d",
                        this, handler.get(), message.what);
#endif
                handler->handleMessage(message);
            } // release handler

            mLock.lock();
            mSendingMessage = false;
            result = ALOOPER_POLL_CALLBACK;
        } else {
            // The last message left at the head of the queue determines the next wakeup time.
            mNextMessageUptime = messageEnvelope.uptime;
            break;
        }
    }

    // Release lock.
    mLock.unlock();

    // 6. Invoke all response callbacks.
    for (size_t i = 0; i < mResponses.size(); i++) {
        Response& response = mResponses.editItemAt(i);
        if (response.request.ident == ALOOPER_POLL_CALLBACK) {
            int fd = response.request.fd;
            int events = response.events;
            void* data = response.request.data;
#if DEBUG_POLL_AND_WAKE || DEBUG_CALLBACKS
            ALOGD("%p ~ pollOnce - invoking fd event callback %p: fd=%d, events=0x%x, data=%p",
                    this, response.request.callback.get(), fd, events, data);
#endif
            int callbackResult = response.request.callback->handleEvent(fd, events, data);
            if (callbackResult == 0) {
                removeFd(fd);
            }
            // Clear the callback reference in the response structure promptly because we
            // will not clear the response vector itself until the next poll.
            response.request.callback.clear();
            result = ALOOPER_POLL_CALLBACK;
        }
    }
    return result;
}

Several points in pollInner deserve attention:

  • 1: the epoll mechanism waits on mEpollFd for events, with a timeout.

  • 2, 3, and 4 are the three possible outcomes of that wait; the goto statement jumps straight to the Done label.

  • 2: check whether the poll failed; if so, go to Done.

  • 3: check whether the poll timed out; if so, go to Done.

  • 4: process all the events epoll reported.

  • 5: process callbacks for pending messages.

  • 6: process all Response callbacks.

We can also see the possible return values:

  • ALOOPER_POLL_CALLBACK

Returned when any pending message, or any Response whose request.ident is ALOOPER_POLL_CALLBACK, was processed. Otherwise one of:

  • ALOOPER_POLL_WAKE Normally wakes up

  • ALOOPER_POLL_ERROR epoll error

  • ALOOPER_POLL_TIMEOUT epoll timeout

Looking up the enumeration values:



ALOOPER_POLL_WAKE = -1,
ALOOPER_POLL_CALLBACK = -2,
ALOOPER_POLL_TIMEOUT = -3,
ALOOPER_POLL_ERROR = -4

To summarize this stage: we brainstormed how pollInner handles messages in the Native layer, which led us to the epoll mechanism.

In fact, the Native layer's Looper dispatch still has many points worth exploring, but first we can't wait to dig into the epoll mechanism itself.

Brain burst: the I/O model in Linux

PS: Some diagrams in this section are quoted directly from other articles; I was too lazy to track down the originals and credit the sources.

Blocking I/O model diagram: when recv() is called, both waiting for the data and copying it happen in the kernel while the caller blocks.

The implementation is very simple, but there is a problem: while blocked, the thread cannot do any other work. In network programming, you therefore need multiple threads to handle concurrency.

Note: hardware-triggered events, such as Android screen taps, are a different matter from network concurrency; don't conflate the two here.

If multiple processes or threads are used to implement concurrent replies, the model is as follows:

So far, we have looked at the I/O blocking model.

Brainstorm: blocking means calling a method and then waiting for its return value; the thread's execution appears stuck at that point.

If you want to eliminate this stall, don't call a method and wait for the I/O result; have it return immediately instead! Here's an example:

  • You go to a tailor shop to have a suit made; you pick the style and size, then sit in the shop and wait until the suit is finished and handed to you. This is blocking, and the wait is painful;

  • You go to the tailor shop, pick the style and size, and the clerk tells you not to wait: it will take a few days, so drop by when you have time. This is non-blocking.

After changing to a non-blocking model, the response model is as follows:

Understandably, this approach requires the customer to poll. It's not customer-friendly, but it costs the store nothing, and it keeps the waiting area from getting crowded.

Some suit shops have reformed to become more customer-friendly:

You go to the tailor shop, pick the style and size, and leave your contact information; when the suit is ready, the shop contacts you to come pick it up.

This becomes the Select or poll model:

Note: The reformed suit shop needs to add an employee, identified in the picture as the user thread, whose job is:

  • Record customer orders and contact information at the front desk

  • Take the order book to the production room and check whether each order is finished; when one is done, pick it up and contact the customer.

Note that while he is in the production room checking orders, he cannot record customer information at the front desk; in other words, he is blocked there and other work has to wait.

For the production side, this approach is not that different from the non-blocking model; it merely adds a clerk. But that one clerk replaces a crowd of customers each going to the production room to ask, "Is my order ready yet?"

It is worth mentioning that, to improve service quality, this clerk records some information each time he checks an order in the production room:

  • whether his question about the order was answered;

  • whether the answer was truthful; and so on.

Some stores keep a separate record book for each item to check, which resembles the select model.

Some stores use a single record book whose tables can record all the different items, which resembles the poll model.

The select and poll models are thus very similar.

Before long, the boss noticed that the clerk's efficiency was a bit low: he kept walking the whole order book around, asking about every order one by one. It wasn't that the clerk was lazy; the model itself was flawed.

So the boss made another reform:

  • Add a mail channel between the front desk and the production room.

  • The production room has progress to report, send a letter to the front desk, the letter on the order number.

  • The front desk staff went straight to ask for the order.

This is the epoll model, which solves the traversal inefficiency of the select/poll models.

With this change, the front desk clerk no longer has to walk through the order book from top to bottom. Efficiency improves, and as long as nothing happens, the clerk can relax.

Let’s look at the NativeLooper constructor:

Looper::Looper(bool allowNonCallbacks) :
        mAllowNonCallbacks(allowNonCallbacks), mSendingMessage(false),
        mResponseIndex(0), mNextMessageUptime(LLONG_MAX) {
    int wakeFds[2];
    int result = pipe(wakeFds);
    LOG_ALWAYS_FATAL_IF(result != 0, "Could not create wake pipe.  errno=%d", errno);

    mWakeReadPipeFd = wakeFds[0];
    mWakeWritePipeFd = wakeFds[1];

    result = fcntl(mWakeReadPipeFd, F_SETFL, O_NONBLOCK);
    LOG_ALWAYS_FATAL_IF(result != 0, "Could not make wake read pipe non-blocking.  errno=%d",
            errno);

    result = fcntl(mWakeWritePipeFd, F_SETFL, O_NONBLOCK);
    LOG_ALWAYS_FATAL_IF(result != 0, "Could not make wake write pipe non-blocking.  errno=%d",
            errno);

    // Allocate the epoll instance and register the wake pipe.
    mEpollFd = epoll_create(EPOLL_SIZE_HINT);
    LOG_ALWAYS_FATAL_IF(mEpollFd < 0, "Could not create epoll instance.  errno=%d", errno);

    struct epoll_event eventItem;
    memset(&eventItem, 0, sizeof(epoll_event)); // zero out unused members of data field union
    eventItem.events = EPOLLIN;
    eventItem.data.fd = mWakeReadPipeFd;
    result = epoll_ctl(mEpollFd, EPOLL_CTL_ADD, mWakeReadPipeFd, &eventItem);
    LOG_ALWAYS_FATAL_IF(result != 0, "Could not add wake read pipe to epoll instance.  errno=%d",
            errno);
}

conclusion

I believe you have already worked out the answers to the various questions yourself. As usual, let's still sum up. Since this is a brainstorm, my train of thought jumps around, and the thread of the content is not obvious.

Let’s combine a question to clarify the context of the content.

Why can the Java layer's Looper and MQ use an infinite loop without "blocking" the UI thread, without causing ANR, and while still responding to click events?

  • Android is event-driven and has a well-established messaging mechanism

  • The Java layer's messaging mechanism is only one part of the picture; it is responsible for managing the message queue, dispatching messages, and processing messages.

  • Looper's infinite loop ensures that message dispatch from the queue keeps running; without the loop, dispatch would stop.

  • MessageQueue's infinite loop inside next() ensures that Looper obtains valid messages; it keeps looping as long as necessary, and once a valid message is found it breaks out and returns the message.

  • Moreover, within next()'s loop, the Java-layer MessageQueue calls the Native-layer MQ's pollOnce through JNI, driving the Native layer to process its own messages.

  • It's worth noting that everything the UI thread does is also message-based, whether updating the UI or responding to click events.

Therefore, it is precisely Looper's endless loop inside loop() that keeps the UI thread working properly.

ANR is essentially Android's way of checking that the main thread's messaging mechanism is healthy and running.

Since the main-thread Looper must use the message mechanism to drive UI rendering and interaction handling, if one message, or the business logic it triggers, occupies the main thread for a long time, the main thread is blocked for that long, which hurts the user experience.

Therefore, ANR detection works by planting "time bombs" and relying on Looper running efficiently to defuse the bombs planted earlier. What makes this bomb interesting is that it only "explodes" if it is still armed when the check runs.

As for responding to click events: such events always originate in hardware, pass through the kernel, and reach user space via interprocess communication. They exist in the Native layer as messages, and after processing, the result is:

The ViewRootImpl receives input from InputManager and performs event handling

Here’s a diagram to summarize the flow of the messaging mechanism:

Finally, if you enjoyed this article, please follow and give it a like; your support is my biggest motivation to keep writing.