Handler: one who handles things. In Android, the Handler is the worker that carries the burden of delivering messages. Many people who first try to read the framework source code start with the Handler messaging mechanism, and Handler shows up in many parts of the source. It is not as simple as it looks, though: dig deep enough and the trail leads all the way down to the Linux kernel.

Preparation

This article analyzes source code from API 29, taken from the official AOSP code search: cs.android.com

The question mark

The Handler API is relatively simple to use. We often use the following calls to post a Runnable to the Handler's thread:

new Handler().post(() -> { /* do something */ });
new Handler().postDelayed(() -> { /* do something after 3s */ }, 3000);

Anonymous classes are used here for simplicity, but you can also subclass Handler and override the handleMessage method to handle messages. post actually calls Handler's sendMessageDelayed method, which wraps the Runnable in a Message and enqueues it in the MessageQueue; Looper's loop retrieves it and hands it back to the Handler for processing. So when a child thread sends a message through the main thread's Handler, we achieve our goal: execute a task asynchronously, then tell the main thread the result.

If you are curious, a question mark pops up: how is the delay above implemented (today's focus)? Is it a timer? Periodic polling? Event-driven? With these questions in mind, let's go deep into the source code.

Java layer

Handler source code:

public final boolean sendMessageDelayed(@NonNull Message msg, long delayMillis) {
    if (delayMillis < 0) {
        delayMillis = 0;
    }
    return sendMessageAtTime(msg, SystemClock.uptimeMillis() + delayMillis);
}

public boolean sendMessageAtTime(@NonNull Message msg, long uptimeMillis) {
    MessageQueue queue = mQueue;
    ...
    return enqueueMessage(queue, msg, uptimeMillis);
}

One detail here: sendMessageAtTime is not passed the delay directly; it is passed an absolute time, the current uptime plus the delay. The reason: whenever the remaining delay is needed later in the lower layers, it is recomputed as the stored uptime minus the new current time, which keeps the timing accurate and absorbs the cost of the call path from the application layer down to the bottom layer. For example, postDelayed(r, 3000) at uptime 100000 stores when = 103000; if next() runs at uptime 100005, the remaining timeout is 103000 - 100005 = 2995 ms.

Handler's enqueueMessage method does a little processing and then calls the MessageQueue method of the same name. Keep one idea in mind before reading the source: when the MessageQueue is empty, or the delay of the message at the head has not expired, the relevant code blocks (the thread releases the CPU and sleeps), and it can only be woken when a new message arrives. We will explain this later. Back to MessageQueue's enqueueMessage:

boolean enqueueMessage(Message msg, long when) { // "when" is the uptimeMillis computed above
    synchronized (this) {
        ...
        msg.when = when; // The uptimeMillis is stored on the msg object
        Message p = mMessages; // Why the plural name mMessages? Because it is a linked list storing the message queue
        boolean needWake; // Do we need to wake up, and why? We'll get to that later
        if (p == null || when == 0 || when < p.when) {
            // Insert the message at the queue head; wake up if blocked
            msg.next = p;
            mMessages = msg;
            needWake = mBlocked;
        } else {
            // The details below (such as isAsynchronous) can be ignored for now; just know a new message is being inserted into the list
            needWake = mBlocked && p.target == null && msg.isAsynchronous();
            Message prev;
            for (;;) {
                prev = p;
                p = p.next;
                // The for loop, combined with this if condition, keeps messages sorted by when
                if (p == null || when < p.when) {
                    break;
                }
                if (needWake && p.isAsynchronous()) {
                    needWake = false;
                }
            }
            msg.next = p; // p is the original next node of prev
            prev.next = msg; // Update prev's next node; msg is now inserted into the queue
        }
        if (needWake) {
            // mPtr is actually a reference to the native-layer MessageQueue, discussed later
            nativeWake(mPtr);
        }
    }
    return true;
}

This code is relatively simple: it queues new messages in order of their delivery time and, when necessary, performs a wake-up. Two questions arise:

  • What is this wake-up waking, and why does it require a native method call?
  • Where does the when field that was just assigned to msg actually get used?

With these questions in mind, note that Looper is the engine of the entire message pipeline, driven by its loop method:

public static void loop() {
    final Looper me = myLooper(); // One Looper per thread
    final MessageQueue queue = me.mQueue; // Each Looper owns its message queue
    ...
    for (;;) {
        Message msg = queue.next(); // might block
        ...
    }
    ...
}

Looper is lazy here: rather than dealing with message delay itself, it just calls queue.next(), leaving MessageQueue to do the tiring work. Of course, that's by design; Looper's job is only to dispatch messages.

Take a look at MQ’s next method, and we are about to enter Wonderland:

Message next() {
    final long ptr = mPtr; // Get the native-layer reference of MQ
    ...
    int nextPollTimeoutMillis = 0; // Timeout for the next poll; this is the first time we see "poll"
    for (;;) {
        ...
        // [key] Another native method that blocks; this is the primary reason messages can be delayed
        nativePollOnce(ptr, nextPollTimeoutMillis);

        synchronized (this) {
            // Blocking is over; start fetching a msg object to return to Looper
            final long now = SystemClock.uptimeMillis();
            Message prevMsg = null;
            Message msg = mMessages;
            if (msg != null && msg.target == null) {
                ...
            }
            if (msg != null) {
                if (now < msg.when) {
                    // Update the blocking timeout until the next message; the for loop uses it on the next pass
                    // Here is the when - now computation mentioned above
                    nextPollTimeoutMillis = (int) Math.min(msg.when - now, Integer.MAX_VALUE);
                } else {
                    // Got a message.
                    mBlocked = false;
                    if (prevMsg != null) {
                        prevMsg.next = msg.next;
                    } else {
                        mMessages = msg.next;
                    }
                    msg.next = null;
                    ...
                    return msg;
                }
            } else {
                // No more messages.
                nextPollTimeoutMillis = -1;
            }
            ...
        }
        ...
    }
}

nativePollOnce(ptr, nextPollTimeoutMillis) is a native method whose call blocks. Combined with the nativeWake(ptr) method we saw earlier, we can already get an idea of how message delay works:

  • When a delayed message is sent via postDelayed, the computed time eventually reaches the nativePollOnce method, which blocks with a timeout to realize the delay; when the time is up, the blocking ends and the next method returns the message object to Looper.
  • MessageQueue's enqueueMessage method checks whether the newly inserted message's time is earlier than the queue head's time to decide whether to wake up immediately, i.e. to break a not-yet-expired block via the nativeWake method.

That covers the Java layer, but the curious among you may wonder: doesn't this blocking consume CPU resources the whole time? Could this be the real cause of Android's battery drain? If you think so, you are still underestimating Linux. To understand this native blocking, we need to dig into the Android system's native source code.

Native layer

Following the layout of the source tree, we can go straight to MessageQueue's C++ code, the nativePollOnce function in frameworks/base/core/jni/android_os_MessageQueue.cpp:

static void android_os_MessageQueue_nativePollOnce(JNIEnv* env, jobject obj, jlong ptr, jint timeoutMillis) {
    NativeMessageQueue* nativeMessageQueue = reinterpret_cast<NativeMessageQueue*>(ptr);
    // pollOnce is actually called
    nativeMessageQueue->pollOnce(env, obj, timeoutMillis);
}

void NativeMessageQueue::pollOnce(JNIEnv* env, jobject pollObj, int timeoutMillis) {
    mPollEnv = env;
    mPollObj = pollObj;
    mLooper->pollOnce(timeoutMillis); // It turns out Looper is still working at native
    mPollObj = NULL;
    mPollEnv = NULL;
    ...
}

Digging further, we find pollOnce in system/core/libutils/Looper.cpp:

int Looper::pollOnce(int timeoutMillis, int* outFd, int* outEvents, void** outData) {
    int result = 0;
    for (;;) {
        ...
        if (result != 0) {
            ...
            return result;
        }
        // The timeout is passed through here, into the pollInner function
        result = pollInner(timeoutMillis);
    }
}

int Looper::pollInner(int timeoutMillis) {
    ...
    // If the next message is due earlier than the passed-in timeout, use the earlier deadline to avoid missing a msg
    if (timeoutMillis != 0 && mNextMessageUptime != LLONG_MAX) {
        nsecs_t now = systemTime(SYSTEM_TIME_MONOTONIC);
        int messageTimeoutMillis = toMillisecondTimeoutDelay(now, mNextMessageUptime);
        if (messageTimeoutMillis >= 0
                && (timeoutMillis < 0 || messageTimeoutMillis < timeoutMillis)) {
            timeoutMillis = messageTimeoutMillis;
        }
    }
    // The initial return value is "wake up"
    int result = POLL_WAKE;
    ...
    // epoll_wait will suspend the thread and release CPU resources
    mPolling = true;
    struct epoll_event eventItems[EPOLL_MAX_EVENTS];
    // The epoll_wait system call is where the entire message mechanism actually blocks,
    // waiting to read the pipe's notification, as described below
    int eventCount = epoll_wait(mEpollFd.get(), eventItems, EPOLL_MAX_EVENTS, timeoutMillis);
    // epoll_wait has returned; leave the idle state and regain the CPU
    mPolling = false;
    ...
    // Return value -1: error, goto Done
    if (eventCount < 0) {
        ...
        result = POLL_ERROR;
        goto Done;
    }
    // Return value 0: the timeout expired
    if (eventCount == 0) {
        result = POLL_TIMEOUT;
        goto Done;
    }
    // Return value > 0: a new event was written to the pipe before the timeout
    for (int i = 0; i < eventCount; i++) {
        int fd = eventItems[i].data.fd;
        uint32_t epollEvents = eventItems[i].events;
        if (fd == mWakeEventFd.get()) {
            if (epollEvents & EPOLLIN) {
                awoken(); // [key] we were woken up
            } else {
                ...
            }
        } else {
            ...
        }
    }
Done: ;
    ...
    return result;
}

pollInner is long, but its core is the epoll_wait system call. Take a look at its definition (see the epoll_wait(2) Linux manual page and the epoll Wikipedia entry):

int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);

epoll_wait is where the entire Android messaging mechanism actually blocks. The blocking wait puts the thread to sleep, consuming no CPU resources while still listening for the registered events.

And when the blocking ends, the events that occurred are traversed in the for (int i = 0; i < eventCount; i++) loop we saw. So epoll looks like an event-driven mechanism.

Let's not rush into the awoken() function above: before we get there we need a basic understanding of Linux file descriptors (fd), pipes, and the epoll mechanism. With those in hand, everything falls into place.

Introduction to Kernel Knowledge

File Descriptor

Referred to as fd below. An excerpt from the Wikipedia definition:

In computer science, a file descriptor is an abstract concept used to describe a reference to a file. Formally it is a non-negative integer: in fact, an index into the table of open files that the kernel maintains for each process. When a program opens an existing file or creates a new one, the kernel returns a file descriptor to the process.

The concept seems abstract, but to understand fd we first need Linux's design philosophy: everything is a file. Regular files, links, sockets, device drivers, and so on can all be treated as files in Linux, and operating on them may create a corresponding file descriptor. File descriptors are indexes the kernel creates to manage opened files efficiently; they are used to refer to opened files, and all system calls related to file I/O (such as read and write) go through a file descriptor.

As you can see, fd is a valuable system resource in Linux, like oil in the industrial age: without it our file systems would not function. In essence, when a Linux process starts, a file descriptor table (fd table) is created for it in kernel space, recording all fds currently held by the process; in other words, it maps every file the process has open.

An fd is essentially an index into that table (hence a non-negative integer). Colloquially, it is the key the system uses to operate on I/O resources. For more details, see the links at the end of the article.

From this we can already form a preliminary idea: with nothing more than file operations, we can achieve cross-process communication.
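
To make "an fd is just a small non-negative integer handed out by the kernel" concrete, here is a minimal illustrative sketch of my own (not from the article's sources; the file path is made up):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    // The kernel hands back the lowest free index in this process's fd table;
    // since 0, 1, 2 are taken by stdin/stdout/stderr, this is usually 3
    int fd = open("/tmp/fd_demo.txt", O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    printf("got fd = %d\n", fd);
    write(fd, "hello\n", 6); // All file I/O goes through the descriptor
    close(fd);               // Release the slot in the fd table
    return 0;
}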

Pipe

So what is a pipe? Again, the encyclopedia definition:

In Unix-like operating systems (and some others that borrowed the design, such as Windows), a pipeline is a chain of processes connected by their standard streams, so that the output of each process feeds directly into the input of the next. The concept was invented by Douglas McIlroy for the Unix command line and is named for its resemblance to a physical pipe.

The concept is fairly vivid: a pipe is a mechanism commonly used for inter-process communication. As the name suggests, it is like a pipe that carries water from one end to the other. The inventor of the pipe observed that when composing commands, we often need to pass the output of one program to another program for processing. That can be done with intermediate files, for example:

ls > abc.txt       # Dump the file listing of the current directory into the abc text file
grep xxx abc.txt   # Use the abc file as input and let grep search it for xxx

That is cumbersome, but the advent of pipes simplified the operation: we can use the pipe character (the vertical bar, as it is commonly drawn) to connect the two commands:

ls | grep xxx

achieving the same effect without explicitly producing a file: the shell uses a pipe to connect the output of one process to the input of the other, enabling cross-process communication. We can therefore think of a pipe as, in essence, a file that the upstream process opens for writing and the downstream process opens for reading. Accordingly, the system call for creating a pipe looks like this:

int pipe(int pipefd[2]);

After the call, two file descriptors are created to fill the pipefd array: pipefd[0] is opened for reading and pipefd[1] for writing, serving as the pipe's read and write ends. Although a pipe is a file, it occupies no disk space; it is a buffer in memory, operated on with the same interface as a file (so don't understand "file" in Linux too narrowly: a thing need not live on disk to be a file). Data written into the pipe is buffered until the other end reads it, which is why, in the command above, grep's read blocks until ls produces output.

In practice, we usually have one process close the read end and the other close the write end, giving simplex communication between, say, a parent and a child process. In code it takes roughly this form (see the links at the end of the article for details):

// The parent process forks to create the child process
// In the parent process:
read(pipefd[0], ...);
// In the child process:
write(pipefd[1], ...);

Furthermore, note that pipes are not limited to cross-process communication; they can certainly be used within a single process as well.
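
For completeness, here is a runnable version of the fragment above (a minimal sketch of my own, not from the article's sources): the parent writes a string into the pipe and the child blocks on read until it arrives.

#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    int pipefd[2];
    char buf[32];
    if (pipe(pipefd) < 0) { perror("pipe"); return 1; }

    if (fork() == 0) {                 // Child: the reading end
        close(pipefd[1]);              // Close the unused write end
        ssize_t n = read(pipefd[0], buf, sizeof(buf) - 1); // Blocks until data arrives
        if (n > 0) {
            buf[n] = '\0';
            printf("child read: %s\n", buf);
        }
        close(pipefd[0]);
        _exit(0);
    }
    close(pipefd[0]);                  // Parent: close the unused read end
    write(pipefd[1], "hello pipe", 10);
    close(pipefd[1]);                  // Signals EOF to the reader
    wait(NULL);                        // Reap the child
    return 0;
}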

epoll

Now that we know about file descriptors and pipes, we can finally talk about the epoll mechanism. Again, let’s look at the definition:

Epoll is a scalable I/O event notification mechanism of the Linux kernel. Debuting in Linux 2.5.44, it was designed to replace the older POSIX select(2) and poll(2) system calls, achieving better performance for programs that operate on large numbers of file descriptors. epoll serves a similar purpose to poll: monitoring events on multiple file descriptors. epoll tracks the monitored file descriptors with a red-black tree (RB-tree); when an event is registered on an epoll instance, epoll adds it to the instance's red-black tree and registers a callback that puts the event on a ready list when it occurs.

So epoll is an I/O event notification mechanism (event-driven, working like a watcher). The pipe mechanism above needs one end to write and the other to read, but in practice we rarely want the reader to wait forever; we want a listener that says: tell me when something is written, and then I'll read.

Before epoll appeared there were already monitoring mechanisms such as select and poll, but they were relatively inefficient: they scanned the fd set indiscriminately, polled the I/O after every wakeup even when nothing was ready, or imposed an upper limit on the number of fds monitored. We won't detail them here.

In summary, epoll addresses these issues and enables high-performance I/O multiplexing. (It is often claimed that epoll also uses mmap to accelerate messaging between kernel and user space; that claim deserves skepticism, as modern implementations do not rely on a shared mapping.) epoll's system-call surface is also relatively simple, with just three functions:

// Create an epoll instance in the kernel and return an epoll file descriptor
int epoll_create(int size);
int epoll_create1(int flags);
// Add, modify, or delete the listening of events on fd (the third argument) in the epoll instance behind epfd (created above)
int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
// The function we met above: wait for events registered on epfd, delivered through the events argument
int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);

epoll's internal implementation is relatively complex: a red-black tree manages the fds and a doubly linked list manages the ready events; see the links at the end of the article for details. The epoll_ctl function stores the fd and events passed in and wires them up to the corresponding device driver; when the event occurs, an internal callback adds it to the ready list, the waiting thread is notified and woken, and epoll_wait can return. When no event occurs, epoll_wait stays suspended.

Don't confuse the two descriptors here: epfd is the descriptor (index) of the epoll instance itself, while fd is the descriptor of the event source you want to listen on; the actual pipe reads and writes ultimately go through that fd.
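
Tying the three calls together, here is a minimal self-contained sketch (my own illustration, not AOSP code): create an epoll instance, register a pipe's read end for EPOLLIN, and wait with a timeout, which is exactly the pattern Looper uses.

#include <stdio.h>
#include <sys/epoll.h>
#include <unistd.h>

int main(void) {
    int pipefd[2];
    if (pipe(pipefd) < 0) { perror("pipe"); return 1; }

    int epfd = epoll_create1(0);                    // 1. Create the epoll instance
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = pipefd[0] };
    epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev); // 2. Watch the read end for readability

    write(pipefd[1], "x", 1);                       // Make the read end readable

    struct epoll_event out[4];
    // 3. Sleep up to 3000 ms; returns immediately here because data is already pending,
    //    and would return 0 (timeout) if nothing had been written
    int n = epoll_wait(epfd, out, 4, 3000);
    printf("epoll_wait returned %d event(s)\n", n);

    close(epfd);
    close(pipefd[0]);
    close(pipefd[1]);
    return 0;
}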

Back in the Native layer

Now that epoll is clear, let’s go back to MessageQueue and Looper’s Native source code, and it becomes very clear.

Remember that the mPtr variable in MQ is actually initialized in the constructor of MQ:

MessageQueue(boolean quitAllowed) {
    mQuitAllowed = quitAllowed;
    mPtr = nativeInit(); // Local call
}

So there is also an MQ object in the native layer; mPtr is the reference that maps the native MQ, making it easy for the upper layer to reach it:

static jlong android_os_MessageQueue_nativeInit(JNIEnv* env, jclass clazz) {
    NativeMessageQueue* nativeMessageQueue = new NativeMessageQueue();
    ...
    return reinterpret_cast<jlong>(nativeMessageQueue); // A jlong that is actually an address
}

NativeMessageQueue::NativeMessageQueue() :
        mPollEnv(NULL), mPollObj(NULL), mExceptionObj(NULL) {
    mLooper = Looper::getForThread();
    if (mLooper == NULL) {
        mLooper = new Looper(false); // The native MQ also creates a Looper instance at initialization
        Looper::setForThread(mLooper);
    }
}

Take a look at Looper’s initialization:

Looper::Looper(bool allowNonCallbacks) {
    ...
    // The eventfd system call creates a file descriptor and assigns it to mWakeEventFd;
    // all subsequent pipe-style reads and writes happen on this fd
    mWakeEventFd.reset(eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC));
    ...
    // Create the epoll instance
    rebuildEpollLocked();
}

void Looper::rebuildEpollLocked() {
    // If a descriptor for an old epoll instance exists, reset it first
    if (mEpollFd >= 0) {
        ...
        mEpollFd.reset();
    }
    // Create the new epoll instance and register the wake pipe
    // epoll_create1 creates the instance; its return value becomes the mEpollFd descriptor
    mEpollFd.reset(epoll_create1(EPOLL_CLOEXEC));

    struct epoll_event eventItem;
    memset(&eventItem, 0, sizeof(epoll_event));
    eventItem.events = EPOLLIN; // IN: listen for input to the pipe (i.e. writes)
    // Carry the wake event fd in the event data, to be matched later
    eventItem.data.fd = mWakeEventFd.get();
    // EPOLL_CTL_ADD: add the wake event fd to the epoll instance's watch set
    int result = epoll_ctl(mEpollFd.get(), EPOLL_CTL_ADD, mWakeEventFd.get(), &eventItem);
    ...
}

With epoll_create1 and epoll_ctl in place, the epoll_wait flow connects back as follows:

int Looper::pollInner(int timeoutMillis) {
    ...
    // The epoll_wait system call is where the entire message mechanism actually blocks,
    // waiting to read the pipe's notification
    int eventCount = epoll_wait(mEpollFd.get(), eventItems, EPOLL_MAX_EVENTS, timeoutMillis);
    ...
    // eventCount > 0: a new event was written to the pipe before the timeout
    for (int i = 0; i < eventCount; i++) {
        int fd = eventItems[i].data.fd;
        uint32_t epollEvents = eventItems[i].events;
        if (fd == mWakeEventFd.get()) { // Match the wake event fd
            if (epollEvents & EPOLLIN) {
                awoken(); // [key] we were woken up
            } else {
                ...
            }
        }
    }
Done: ;
    ...
    return result;
}

void Looper::awoken() {
    uint64_t counter;
    // Perform the read, draining the eventfd counter
    TEMP_FAILURE_RETRY(read(mWakeEventFd.get(), &counter, sizeof(uint64_t)));
}

So the awoken function simply performs a read on the wake fd; by the time it runs, the epoll event machinery has already done the waking. The matching write sits in Looper's wake function:

void Looper::wake() {
    uint64_t inc = 1;
    // Write a 1 into the wake fd to trigger the wakeup
    ssize_t nWrite = TEMP_FAILURE_RETRY(write(mWakeEventFd.get(), &inc, sizeof(uint64_t)));
    ...
}

As an additional note, older versions of the source created the wake event descriptors with an actual pipe:

int wakeFds[2];
int result = pipe(wakeFds);
mWakeReadPipeFd = wakeFds[0];
mWakeWritePipeFd = wakeFds[1];

That is, read and write were two separate descriptors, whereas the latest system source uses only the single mWakeEventFd descriptor. This is probably because the Handler message mechanism never needs to cross processes; the exact motivation remains to be explored.
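
To see the eventfd-based wake/awoken pair in isolation, here is a minimal sketch of my own (assumptions: a single process, one wake fd, the same EPOLLIN registration Looper performs): one thread sleeps in epoll_wait, another writes 1 to the eventfd, and the sleeper reads the counter back, just like wake() and awoken().

#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

static int wake_fd;

static void* waker(void* arg) {
    (void)arg;
    sleep(1);                               // Pretend some thread enqueues a message later
    uint64_t inc = 1;
    write(wake_fd, &inc, sizeof(inc));      // Looper::wake(): write a 1 into the eventfd
    return NULL;
}

int main(void) {
    wake_fd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC); // Same flags Looper uses
    int epfd = epoll_create1(EPOLL_CLOEXEC);
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = wake_fd };
    epoll_ctl(epfd, EPOLL_CTL_ADD, wake_fd, &ev);

    pthread_t t;
    pthread_create(&t, NULL, waker, NULL);

    struct epoll_event out;
    // Sleeps without burning CPU until the other thread writes: the nativePollOnce analogue
    int n = epoll_wait(epfd, &out, 1, -1);

    uint64_t counter;
    read(wake_fd, &counter, sizeof(counter)); // Looper::awoken(): drain the counter
    printf("woken by %d event(s), counter=%llu\n", n, (unsigned long long)counter);

    pthread_join(t, NULL);
    close(epfd);
    close(wake_fd);
    return 0;
}

Built with -pthread, this prints after about a second; the read both returns and resets the eventfd counter, which is exactly what awoken relies on.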

The Looper::wake() function is called from MQ’s nativeWake function:

static void android_os_MessageQueue_nativeWake(JNIEnv* env, jclass clazz, jlong ptr) {
    NativeMessageQueue* nativeMessageQueue = reinterpret_cast<NativeMessageQueue*>(ptr);
    nativeMessageQueue->wake(); // Internally makes the further call to Looper::wake()
}

Recall from earlier that nativeWake(mPtr) is called in the Java layer by the enqueueMessage method when a message is enqueued and a wake-up is needed. At this point we have finally walked the entire path, from the Java layer all the way down to Linux kernel system calls.

To summarize, and to return to the headline, why "no epoll, no Handler": Handler + Looper + MessageQueue can process delayed messages and achieve an event-driven effect without occupying CPU resources because, in essence, the native layer relies on the Linux kernel's epoll I/O event notification mechanism. Two scenarios are satisfied:

  • When a delayed message is sent via postDelayed, the computed time eventually becomes a timed block inside the nativePollOnce method; the delay is realized, in essence, by the suspension of the epoll_wait call.

    When epoll_wait returns, the thread is woken and regains the CPU. If the epoll event count is 0 (the timeout expired), nativePollOnce returns directly, and MQ's next method in turn returns the message object to the upper-layer Looper.

  • The enqueueMessage method of MessageQueue checks whether the newly inserted message's time is earlier than that of the queue head to decide whether to wake up immediately, that is, to break a not-yet-expired block through the nativeWake method.

    Since epoll listens on the mWakeEventFd wake event descriptor, epoll_wait leaves its suspended state and returns an event count greater than 0; awoken is then called. Finally, nativePollOnce returns with result POLL_WAKE, and upper-layer message processing continues. Remember that Looper's loop is always calling next, so if the bottom layer never woke up, the top layer would stay blocked.

So a wake-up is either automatic (the timeout expired) or active (a new message was inserted). In outline, the active wake-up path is: Handler.sendMessageAtTime → MessageQueue.enqueueMessage (new head message, needWake) → nativeWake → Looper::wake() writes to mWakeEventFd → epoll_wait returns → awoken() reads the counter → nativePollOnce returns → next() re-examines the queue and hands the message to Looper.
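
To make the two paths tangible, here is a toy model in C (entirely my own sketch under simplifying assumptions: one message, one thread, no real queue), showing how an absolute due time plus a timed epoll_wait yields the delay, and where an active wake would cut the sleep short:

#include <stdint.h>
#include <stdio.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <time.h>
#include <unistd.h>

static int64_t uptime_ms(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);    // Same clock family as SystemClock.uptimeMillis
    return ts.tv_sec * 1000LL + ts.tv_nsec / 1000000LL;
}

int main(void) {
    int wake_fd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
    int epfd = epoll_create1(EPOLL_CLOEXEC);
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = wake_fd };
    epoll_ctl(epfd, EPOLL_CTL_ADD, wake_fd, &ev);

    // "postDelayed(msg, 1500)": store an absolute due time, not the raw delay
    int64_t when = uptime_ms() + 1500;

    for (;;) {
        int64_t now = uptime_ms();
        if (now >= when) {                          // Due: "deliver" the message
            printf("message delivered after ~%lld ms\n", (long long)(now - when + 1500));
            break;
        }
        int timeout = (int)(when - now);            // nextPollTimeoutMillis = when - now
        struct epoll_event out;
        int n = epoll_wait(epfd, &out, 1, timeout); // Sleep; no CPU consumed
        if (n > 0) {
            // POLL_WAKE path: an earlier message arrived; drain and re-evaluate the queue
            uint64_t c;
            read(wake_fd, &c, sizeof(c));
        }
        // n == 0 is the POLL_TIMEOUT path: loop back, find the message due, deliver it
    }
    close(epfd);
    close(wake_fd);
    return 0;
}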

Postscript

The main thread blocks in the nativePollOnce method inside queue.next(), releasing the CPU and sleeping, so it does not consume significant CPU resources. Even in the foreground, as long as your UI has no animation or touch interaction, the thread sits idle in much the same way. This also answers why Looper's infinite loop does not cause abnormal power consumption.

How do Linux system calls suspend a thread without consuming CPU time slices, and how are CPU timers and interrupts implemented? That gets into hardware territory, so it's time to review computer organization, haha. I will add more on this when I have time.

In fact, it was only at the end of this study that I understood why epoll is called epoll: its biggest difference from the original poll mechanism is that it was reworked to be event-driven. If I guess correctly, the "e" stands for event.

References

  • Research on Native layer of Handler in Android
  • What exactly is a Linux file descriptor?
  • Understand Linux file descriptors FD and Inode
  • Description of File Descriptor FD (File Descriptor)
  • Interprocess communication for Linux: Pipes
  • Linux pipe command (PIPE)
  • Linux I/O multiplexing and epoll details
  • Asynchronous blocking IO — epoll
  • In the Handler epoll