This series introduces memory monitoring schemes covering the following aspects:

  • Number of FDs

  • Number of threads

  • Virtual memory

  • The Java heap

  • Native memory

FD monitoring

FDs are file descriptors. On Android, the number of FDs a process can hold is limited: up to 1024 before Android 9, and up to about 30,000 (3W) from Android 9 on. When FD exhaustion ends in an OOM crash, the cause is hard to locate, because the crash stack may not point to the code that actually leaked the FDs. That is why FD leaks need to be monitored.

Which operations consume FDs? Common ones include file reads and writes, socket communication, creating Java threads, starting a HandlerThread, creating windows, and database operations.

For example, creating a Java thread first creates a JNIEnv in the native layer. This step involves:

  1. Allocating 4 KB of kernel memory through anonymous shared memory (ashmem).
  2. Mapping that memory into the user-space virtual address space with mmap.

Creating anonymous shared memory opens the /dev/ashmem file, so creating a thread consumes an FD.

FD information

The process's FD information can be obtained from /proc. For reference code, see Matrix:

  • Read /proc/pid/limits and parse the rlim_max value of the "Max open files" limit. (In my own tests I read both rlim_cur and rlim_max.)

  • List the /proc/pid/fd directory and count its entries. Each entry is one open FD.

  • Traverse /proc/pid/fd and resolve each entry's link target with readlink. (On API 21 and above, i.e. @RequiresApi(21), you can call the system method Os.readlink(file.absolutePath) directly.)
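The three /proc reads above can be sketched in C++ as follows. This is a minimal Linux/Android sketch, not Matrix's code, and the helper names ReadFdLimit and ListFdTargets are hypothetical.

```cpp
#include <dirent.h>
#include <unistd.h>

#include <climits>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Soft and hard limits parsed from the "Max open files" row of
// /proc/self/limits. -1 means "unlimited" or "row not found".
struct FdLimit {
  long soft = -1;  // rlim_cur
  long hard = -1;  // rlim_max
};

FdLimit ReadFdLimit() {
  FdLimit limit;
  std::ifstream in("/proc/self/limits");
  std::string line;
  while (std::getline(in, line)) {
    if (line.rfind("Max open files", 0) != 0) continue;
    // The row looks like: "Max open files   1024   4096   files"
    std::istringstream fields(line.substr(sizeof("Max open files") - 1));
    std::string soft, hard;
    fields >> soft >> hard;
    if (soft != "unlimited") limit.soft = std::stol(soft);
    if (hard != "unlimited") limit.hard = std::stol(hard);
    break;
  }
  return limit;
}

// Returns the link target of every entry in /proc/self/fd (a file path,
// "socket:[inode]", "anon_inode:...", and so on). The vector size is the
// open FD count, including the FD that opendir() itself uses here.
std::vector<std::string> ListFdTargets() {
  std::vector<std::string> targets;
  DIR *dir = opendir("/proc/self/fd");
  if (dir == nullptr) return targets;
  while (dirent *entry = readdir(dir)) {
    std::string name = entry->d_name;
    if (name == "." || name == "..") continue;
    std::string path = "/proc/self/fd/" + name;
    char buf[PATH_MAX];
    ssize_t len = readlink(path.c_str(), buf, sizeof(buf) - 1);
    if (len < 0) continue;  // the FD may have been closed in the meantime
    targets.emplace_back(buf, static_cast<size_t>(len));
  }
  closedir(dir);
  return targets;
}
```

On a device, you would compare the size of the target list against the limit. Grouping the targets by kind (paths, sockets, anon inodes) shows which type of FD is piling up.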

If you are not sure how to call C++ code, see my earlier post in the Android training series (19), which covers compiling your own .so library. Part of the data from my Samsung S8 test device:

[Figure: partial FD data from a Samsung S8 test device]

Plan

Solution: start a background thread that checks the process's FD count every 10 seconds. When the count reaches a threshold (for example, 90% of the limit), capture the process's FD information, thread information, and a memory snapshot.
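This polling scheme can be sketched as a small watchdog. The class name ThresholdMonitor, the sampling callback, and the 90% ratio are illustrative assumptions. On Android, the sampler would count the entries in /proc/self/fd, and the callback would trigger the FD, thread, and memory dumps.

```cpp
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical periodic checker: samples a count every `interval` and calls
// `on_exceeded` whenever the count reaches `ratio` of `limit`.
class ThresholdMonitor {
 public:
  ThresholdMonitor(std::function<long()> sample, long limit, double ratio,
                   std::chrono::milliseconds interval,
                   std::function<void(long)> on_exceeded)
      : sample_(std::move(sample)),
        on_exceeded_(std::move(on_exceeded)),
        limit_(limit),
        ratio_(ratio),
        interval_(interval) {}

  ~ThresholdMonitor() { Stop(); }

  // True when `count` has reached `ratio` (e.g. 0.9) of a positive `limit`.
  static bool Exceeds(long count, long limit, double ratio) {
    return limit > 0 && count >= static_cast<long>(limit * ratio);
  }

  void Start() {
    if (worker_.joinable()) return;  // already running
    {
      std::lock_guard<std::mutex> lock(mu_);
      stopped_ = false;
    }
    worker_ = std::thread([this] {
      while (true) {
        long count = sample_();
        if (Exceeds(count, limit_, ratio_)) on_exceeded_(count);  // capture dumps here
        std::unique_lock<std::mutex> lock(mu_);
        // Sleep for one interval, but wake up early if Stop() is called.
        if (cv_.wait_for(lock, interval_, [this] { return stopped_; })) return;
      }
    });
  }

  void Stop() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      stopped_ = true;
    }
    cv_.notify_all();
    if (worker_.joinable()) worker_.join();
  }

 private:
  std::function<long()> sample_;
  std::function<void(long)> on_exceeded_;
  long limit_;
  double ratio_;
  std::chrono::milliseconds interval_;
  std::mutex mu_;
  std::condition_variable cv_;
  bool stopped_ = false;
  std::thread worker_;
};
```

The sketch waits on a condition variable instead of calling a plain sleep, so Stop() does not have to wait out a full 10-second interval.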

The paths in the FD information help locate I/O problems. Thread names identify the Java threads and HandlerThreads involved. The memory snapshot helps troubleshoot socket and window problems.

How to capture a memory snapshot with Debug.dumpHprofData will be covered in a separate section later.

Thread monitoring

On Android, a Java thread occupies about 1 MB of stack memory. For a native thread, you can set the stack size through pthread_attr_t. Creating threads without limit eventually causes an OOM crash.
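For a native thread, the stack size is set on a pthread_attr_t before pthread_create is called. Below is a minimal sketch; StartWithStackSize is a hypothetical helper.

```cpp
#include <pthread.h>

#include <cstddef>

// Starts a native thread with an explicit stack size instead of the default.
// Returns 0 on success or the pthread error code (e.g. EINVAL when
// stack_bytes is below PTHREAD_STACK_MIN).
int StartWithStackSize(pthread_t *tid, size_t stack_bytes,
                       void *(*routine)(void *), void *arg) {
  pthread_attr_t attr;
  int rc = pthread_attr_init(&attr);
  if (rc != 0) return rc;
  rc = pthread_attr_setstacksize(&attr, stack_bytes);
  if (rc == 0) rc = pthread_create(tid, &attr, routine, arg);
  pthread_attr_destroy(&attr);  // the attr object is no longer needed once the thread exists
  return rc;
}
```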

The system limits the number of threads in three ways:

  • The /proc/sys/kernel/threads-max configuration file specifies the maximum number of system-wide threads.

  • The Linux RLIMIT_NPROC resource limit corresponds to the maximum number of threads an application can create.

  • Memory constraints, such as running out of virtual address space or the kernel failing to allocate VMAs (virtual memory areas), also limit how many threads can be created.
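The first two limits can be queried directly. Below is a sketch for a Linux/Android environment with hypothetical helper names. An app process may be denied read access to threads-max, so that helper reports -1 when the read fails.

```cpp
#include <sys/resource.h>

#include <fstream>

// Soft RLIMIT_NPROC limit; -1 if getrlimit fails or the limit is unlimited.
long ReadNprocLimit() {
  rlimit lim{};
  if (getrlimit(RLIMIT_NPROC, &lim) != 0 || lim.rlim_cur == RLIM_INFINITY) return -1;
  return static_cast<long>(lim.rlim_cur);
}

// System-wide maximum from /proc/sys/kernel/threads-max; -1 if unreadable.
long ReadThreadsMax() {
  std::ifstream in("/proc/sys/kernel/threads-max");
  long value = -1;
  if (!(in >> value)) return -1;  // missing file, no permission, or bad content
  return value;
}
```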

I tried reading the threads-max file directly from the app, but had no permission.

Thread information

We can get the Java threads from the main thread's ThreadGroup:

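val threadGroup: ThreadGroup = Looper.getMainLooper().thread.threadGroup
// activeCount() is only an estimate and enumerate() silently drops threads that don't fit, so over-allocate
val threadList = arrayOfNulls<Thread>(threadGroup.activeCount() * 2)
// copies the active threads of this group and its subgroups; size is the number actually copied
val size = threadGroup.enumerate(threadList)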

The thread count is the Threads field of /proc/[pid]/status, and /proc/[pid]/task records the tid and thread name of every thread:

File(String.format("/proc/%s/status", Process.myPid())).forEachLine { line ->
    when {
        line.startsWith("Threads") -> {
            Log.d("mjzuo", line)
        }
    }
}
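The same two sources can be read from native code. Below is a minimal C++ sketch with hypothetical helper names. Each tid directory under /proc/self/task has a comm file holding that thread's name, which the kernel truncates to 15 characters.

```cpp
#include <dirent.h>

#include <fstream>
#include <string>
#include <utility>
#include <vector>

// Value of the "Threads:" field in /proc/self/status, or -1 if missing.
int ReadThreadCountFromStatus() {
  std::ifstream in("/proc/self/status");
  std::string line;
  while (std::getline(in, line)) {
    // The line looks like "Threads:\t12"; stoi skips the leading tab.
    if (line.rfind("Threads:", 0) == 0) return std::stoi(line.substr(8));
  }
  return -1;
}

// (tid, thread name) for every entry under /proc/self/task.
std::vector<std::pair<std::string, std::string>> ListThreads() {
  std::vector<std::pair<std::string, std::string>> threads;
  DIR *dir = opendir("/proc/self/task");
  if (dir == nullptr) return threads;
  while (dirent *entry = readdir(dir)) {
    if (entry->d_name[0] == '.') continue;  // skip "." and ".."
    std::string tid = entry->d_name;
    std::ifstream comm("/proc/self/task/" + tid + "/comm");
    std::string name;
    std::getline(comm, name);
    threads.emplace_back(tid, name);
  }
  closedir(dir);
  return threads;
}
```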

Solution: the idea is the same as for FDs. Start a child thread that periodically checks the app's current thread count. When the count exceeds a threshold, capture the thread information and report it.

Thread leakage

Both Java and native threads are ultimately created with pthread_create. The other APIs involved are pthread_detach, pthread_join, and pthread_exit. A thread created with pthread_create is joinable by default. For its stack memory to be freed automatically when it finishes, the thread must be detached. Otherwise, the memory is freed only when another thread calls pthread_join on it.

If a joinable thread exits and neither detach nor join is ever called on it, its stack memory is never released and the thread leaks.
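These cases can be illustrated with plain pthreads in a minimal sketch (function names are hypothetical). RunAndJoin and RunDetached release the thread's stack correctly. The commented-out RunAndLeak shows the leaking pattern.

```cpp
#include <pthread.h>

// The default attributes create joinable threads.
bool DefaultIsJoinable() {
  pthread_attr_t attr;
  if (pthread_attr_init(&attr) != 0) return false;
  int state = PTHREAD_CREATE_DETACHED;
  pthread_attr_getdetachstate(&attr, &state);
  pthread_attr_destroy(&attr);
  return state == PTHREAD_CREATE_JOINABLE;
}

static void *DoWork(void *) { return nullptr; }

// OK: the creator joins, which reclaims the finished thread's stack.
int RunAndJoin() {
  pthread_t t;
  int rc = pthread_create(&t, nullptr, DoWork, nullptr);
  if (rc != 0) return rc;
  return pthread_join(t, nullptr);
}

// OK: the thread is detached right after creation, so its stack is
// reclaimed automatically when DoWork returns.
int RunDetached() {
  pthread_t t;
  int rc = pthread_create(&t, nullptr, DoWork, nullptr);
  if (rc != 0) return rc;
  return pthread_detach(t);
}

// Leak (do not do this): a joinable thread whose pthread_t is dropped.
// After DoWork returns, its stack stays allocated because nobody joins it.
// int RunAndLeak() { pthread_t t; return pthread_create(&t, nullptr, DoWork, nullptr); }
```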

With the principle in hand, the monitoring approach follows: hook the interfaces above and record the threads that exit while still joinable. Let's take the KOOM source code as an example:

I'll skip the Java-layer code and go straight to the C++ logic. This is the JNI bridge:

JNIEXPORT void JNICALL
Java_com_kwai_performance_overhead_thread_monitor_NativeHandler_start(JNIEnv *env, jclass obj) {
  koom::Log::info("koom-thread", "start");
  koom::Start();
}

JNIEXPORT void JNICALL
Java_com_kwai_performance_overhead_thread_monitor_NativeHandler_stop(JNIEnv *env, jclass obj) {
  koom::Stop();
}

Now look at Start() in koom.cpp:

void Start() {
  if (isRunning) {
    return;
  }
  // Initialize data
  delete sHookLooper;
  sHookLooper = new HookLooper(); // create the HookLooper used to forward messages
  koom::ThreadHooker::Start();    // start hooking
  isRunning = true;
}

Next is Start() in thread_hook.cpp. The dlopencb.h logic is not shown here; it lives under koom-common/third-party/xhook/src/main/cpp/xhook/src/.

void ThreadHooker::Start() { ThreadHooker::InitHook(); }

void ThreadHooker::InitHook() {
  koom::Log::info(thread_tag, "HookSo init hook");
  std::set<std::string> libs;
  DlopenCb::GetInstance().GetLoadedLibs(libs); // get the already-loaded libraries to hook
  HookLibs(libs, Constant::kDlopenSourceInit); // hook them
  DlopenCb::GetInstance().AddCallback(DlopenCallback); // listen for later dlopen calls; the callback is where GetLoadedLibs(libs, true) is used
}
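The post does not show how DlopenCb::GetLoadedLibs finds the libraries that are already loaded. One standard way to enumerate them is dl_iterate_phdr. It is shown here as an assumption, not as KOOM's actual implementation.

```cpp
#include <link.h>

#include <cstddef>
#include <set>
#include <string>

// Collects the path of every shared object currently mapped into the process.
std::set<std::string> LoadedLibs() {
  std::set<std::string> libs;
  dl_iterate_phdr(
      [](dl_phdr_info *info, size_t, void *data) -> int {
        // The main executable is usually reported with an empty name; skip it.
        if (info->dlpi_name != nullptr && info->dlpi_name[0] != '\0') {
          static_cast<std::set<std::string> *>(data)->insert(info->dlpi_name);
        }
        return 0;  // keep iterating
      },
      &libs);
  return libs;
}
```

Libraries loaded later through dlopen are not in this snapshot. That is why InitHook also registers a callback for new loads.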

Here is HookLibs() in thread_hook.cpp:

void ThreadHooker::HookLibs(std::set<std::string> &libs, int source) {
  koom::Log::info(thread_tag, "HookSo lib size %d", libs.size());
  if (libs.empty()) {
    return;
  }
  bool hooked = false;
  pthread_mutex_lock(&DlopenCb::hook_mutex);
  xhook_clear(); // Clear the xhook cache and reset all global flags
  for (const auto &lib : libs) {
    hooked |= ThreadHooker::RegisterSo(lib, source); // start hook so method
  }
  if (hooked) {
    int result = xhook_refresh(0); // 0: synchronous hook operation. 1: asynchronous hook operation
    koom::Log::info(thread_tag, "HookSo lib Refresh result %d", result);
  }
  pthread_mutex_unlock(&DlopenCb::hook_mutex);
}

RegisterSo() in thread_hook.cpp registers the functions we want to hook:

bool ThreadHooker::RegisterSo(const std::string &lib, int source) {
  if (IsLibIgnored(lib)) { // Filter libraries that do not hook
    return false;
  }
  auto lib_ctr = lib.c_str();
  koom::Log::info(thread_tag, "HookSo %d %s", source, lib_ctr);
  xhook_register(lib_ctr, "pthread_create", reinterpret_cast<void *>(HookThreadCreate), nullptr);
  xhook_register(lib_ctr, "pthread_detach", reinterpret_cast<void *>(HookThreadDetach), nullptr);
  xhook_register(lib_ctr, "pthread_join", reinterpret_cast<void *>(HookThreadJoin), nullptr);
  xhook_register(lib_ctr, "pthread_exit", reinterpret_cast<void *>(HookThreadExit), nullptr);

  return true;
}

When a hooked library calls pthread_create, the call is redirected to our hook function:

int ThreadHooker::HookThreadCreate(pthread_t *tidp, const pthread_attr_t *attr,
                                   void *(*start_rtn)(void *), void *arg) {
  if (hookEnabled() && start_rtn != nullptr) {
    ... // the information collected by the hook (omitted)
    if (thread != nullptr) {
      koom::CallStack::JavaStackTrace(thread, hook_arg->thread_create_arg->java_stack); // Java stack
    }
    koom::CallStack::FastUnwind(thread_create_arg->pc, koom::Constant::kMaxCallStackDepth); // native stack unwinding
    thread_create_arg->stack_time = Util::CurrentTimeNs() - time;
    return pthread_create(tidp, attr,
                          reinterpret_cast<void *(*)(void *)>(HookThreadStart),
                          reinterpret_cast<void *>(hook_arg));
  }
  return pthread_create(tidp, attr, start_rtn, arg);
}

The new thread then runs HookThreadStart() in thread_hook.cpp:

ALWAYS_INLINE void ThreadHooker::HookThreadStart(void *arg) {
  ... // assemble the hook information into a HookAddInfo (omitted)
  auto info = new HookAddInfo(tid, Util::CurrentTimeNs(), self,
                              state == PTHREAD_CREATE_DETACHED,
                              hookArg->thread_create_arg);

  sHookLooper->post(ACTION_ADD_THREAD, info); // forwarded to HookLooper.cpp#handle
  void *(*start_rtn)(void *) = hookArg->start_rtn;
  void *routine_arg = hookArg->arg;
  delete hookArg;
  start_rtn(routine_arg); // run the original thread routine
}
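The core trick in HookThreadCreate and HookThreadStart is a trampoline. The real start routine and its argument are packed into a heap object, and pthread_create is given a wrapper routine instead. The wrapper records information on the new thread before running the original code. The sketch below reproduces only that idea. It is called directly rather than installed with xhook, and TrackedThreadCreate, TrampolineArg, and g_started are hypothetical names.

```cpp
#include <pthread.h>
#include <sys/syscall.h>
#include <unistd.h>

#include <mutex>
#include <vector>

// Hypothetical record of a thread started through the wrapper.
struct StartRecord {
  pid_t tid;
  bool detached;
};

std::mutex g_mu;
std::vector<StartRecord> g_started;

// Heap-allocated bundle that carries the real routine into the new thread.
struct TrampolineArg {
  void *(*start_rtn)(void *);
  void *arg;
};

static void *TrampolineStart(void *raw) {
  auto *hook_arg = static_cast<TrampolineArg *>(raw);
  // Read this thread's own attributes to learn its detach state.
  int state = PTHREAD_CREATE_JOINABLE;
  pthread_attr_t attr;
  if (pthread_getattr_np(pthread_self(), &attr) == 0) {
    pthread_attr_getdetachstate(&attr, &state);
    pthread_attr_destroy(&attr);
  }
  {
    std::lock_guard<std::mutex> lock(g_mu);
    g_started.push_back({static_cast<pid_t>(syscall(SYS_gettid)),
                         state == PTHREAD_CREATE_DETACHED});
  }
  void *(*start_rtn)(void *) = hook_arg->start_rtn;
  void *routine_arg = hook_arg->arg;
  delete hook_arg;  // free the bundle before running the user's routine
  return start_rtn(routine_arg);
}

// Same signature as pthread_create; stands in for a PLT hook target.
int TrackedThreadCreate(pthread_t *tidp, const pthread_attr_t *attr,
                        void *(*start_rtn)(void *), void *arg) {
  auto *hook_arg = new TrampolineArg{start_rtn, arg};
  int rc = pthread_create(tidp, attr, TrampolineStart, hook_arg);
  if (rc != 0) delete hook_arg;  // the thread never started, so we still own it
  return rc;
}
```

The quoted HookThreadStart returns void and is cast to the start-routine type. This wrapper instead returns the original routine's result, so pthread_join still receives the thread's real return value.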

The message is forwarded to HookLooper.cpp#handle:

case ACTION_ADD_THREAD: {
  koom::Log::info(looper_tag, "AddThread");
  auto info = static_cast<HookAddInfo *>(data);
  holder->AddThread(info->tid, info->pthread, info->is_thread_detached,
                    info->time, info->create_arg); // forward again
  delete info;
  break;
}
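HookLooper itself is not shown in the post. Presumably it moves the bookkeeping off the hooked threads onto a single worker, which keeps the hooks cheap and means ThreadHolder's maps are only touched from one thread. Below is a minimal looper with that shape, under those assumptions (MiniLooper is a hypothetical name).

```cpp
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Minimal single-consumer looper: post() enqueues (action, data) from any
// thread, and one worker thread calls the handler for each message in order.
class MiniLooper {
 public:
  using Handler = std::function<void(int action, void *data)>;

  explicit MiniLooper(Handler handle)
      : handle_(std::move(handle)), worker_([this] { Loop(); }) {}

  ~MiniLooper() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      quit_ = true;
    }
    cv_.notify_one();
    worker_.join();  // messages already queued are still handled before the worker exits
  }

  void post(int action, void *data) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      queue_.emplace_back(action, data);
    }
    cv_.notify_one();
  }

 private:
  void Loop() {
    std::unique_lock<std::mutex> lock(mu_);
    while (true) {
      cv_.wait(lock, [this] { return quit_ || !queue_.empty(); });
      if (queue_.empty()) return;  // quit requested and nothing left to handle
      auto msg = queue_.front();
      queue_.pop_front();
      lock.unlock();
      handle_(msg.first, msg.second);  // run the handler without holding the lock
      lock.lock();
    }
  }

  Handler handle_;
  std::mutex mu_;
  std::condition_variable cv_;
  std::deque<std::pair<int, void *>> queue_;
  bool quit_ = false;
  std::thread worker_;  // declared last so every other member is ready before Loop() starts
};
```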

The message is then forwarded to thread_holder.cpp#AddThread, which records the thread and marks its state:

void ThreadHolder::AddThread(int tid, pthread_t threadId, bool isThreadDetached,
                             int64_t start_time, ThreadCreateArg *create_arg) {
  bool valid = threadMap.count(threadId) > 0;
  if (valid) return;

  koom::Log::info(holder_tag, "AddThread tid:%d pthread_t:%p", tid, threadId);
  auto &item = threadMap[threadId]; // List of threads
  item.Clear();
  item.thread_internal_id = threadId;
  item.thread_detached = isThreadDetached; // the thread state discussed above; false for a joinable thread
  item.startTime = start_time;
  item.create_time = create_arg->time;
  item.id = tid;
  ... // stack contents are not shown
  delete create_arg;
  koom::Log::info(holder_tag, "AddThread finish");
}

Detach and join messages are forwarded the same way and share the same handling logic:

void ThreadHolder::DetachThread(pthread_t threadId) {
  bool valid = threadMap.count(threadId) > 0;
  koom::Log::info(holder_tag, "DetachThread tid:%p", threadId);
  if (valid) {
    threadMap[threadId].thread_detached = true; // Change the state
  } else {
    leakThreadMap.erase(threadId); // Remove from the map of leaked threads
  }
}

Finally, the exit logic. A thread that is still not detached when it exits is added to the leak collection. Note that it is removed again if join is called after it has exited (the else branch of DetachThread above):

void ThreadHolder::ExitThread(pthread_t threadId, std::string &threadName,
                              long long int time) {
  bool valid = threadMap.count(threadId) > 0;
  if (!valid) return;
  auto &item = threadMap[threadId];
  ...
  if (!item.thread_detached) { // leak: the thread exited while still joinable
    koom::Log::error(holder_tag,
                     "Exited thread Leak! Not joined or detached! \n tid:%p",
                     threadId);
    leakThreadMap[threadId] = item;
  }
  threadMap.erase(threadId); // Remove from thread collection
  koom::Log::info(holder_tag, "ExitThread finish");
}
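Put together, the create/detach/join/exit bookkeeping is a small state machine. Below is a simplified, hypothetical sketch. LeakTracker is not a KOOM class, stack capture and timestamps are left out, and thread keys are assumed to be pthread_t values cast to an integer.

```cpp
#include <cstdint>
#include <map>
#include <mutex>
#include <set>

class LeakTracker {
 public:
  using Key = std::uintptr_t;

  // pthread_create hook: remember whether the thread started detached.
  void OnCreate(Key id, bool detached) {
    std::lock_guard<std::mutex> lock(mu_);
    live_[id] = detached;
  }

  // pthread_detach / pthread_join hook.
  void OnDetachOrJoin(Key id) {
    std::lock_guard<std::mutex> lock(mu_);
    auto it = live_.find(id);
    if (it != live_.end()) {
      it->second = true;  // still running, but it can no longer leak
    } else {
      leaked_.erase(id);  // joined after it had already exited: not a leak any more
    }
  }

  // pthread_exit hook (assumed here to also fire when the start routine returns).
  void OnExit(Key id) {
    std::lock_guard<std::mutex> lock(mu_);
    auto it = live_.find(id);
    if (it == live_.end()) return;
    if (!it->second) leaked_.insert(id);  // exited while still joinable
    live_.erase(it);
  }

  std::set<Key> Leaked() const {
    std::lock_guard<std::mutex> lock(mu_);
    return leaked_;
  }

 private:
  mutable std::mutex mu_;
  std::map<Key, bool> live_;  // key -> detached?
  std::set<Key> leaked_;
};
```

For example, a thread created joinable and exiting without a join ends up in Leaked(). A thread detached before exit, or joined after exit, does not.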

Virtual memory, Java heap, and native memory monitoring will be covered in the next section.

That concludes this section.

Reference:

Cloud.tencent.com/developer/a…