When developing embedded Linux applications, GDB is often used to analyze core dumps, and in most cases it points directly at the location of the crash. Sometimes, however, the core file yields no meaningful crash stack, and the problem becomes harder to diagnose. This can happen for several reasons:

  1. The -g option was not specified when the application was compiled, so the executable has no debugging information.
  2. The dynamic and static libraries on which the application depends have no debugging information.
  3. The system libraries of the application's runtime environment, such as libc, have no debugging information.
  4. The runtime environment on the embedded Linux system and the runtime environment of the cross-compilation toolchain are not the same version. For example, the libc used on the embedded Linux system differs from the libc shipped with the cross-compiler.

The problems in items 1-3 are easy to spot from the messages GDB prints at startup. For example, if the test application is compiled without the -g option, GDB reports the following:

Reading symbols from /home/jetpack/test/test...(no debugging symbols found)...done.

GDB prints the same message for dynamic libraries that lack debugging information.
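Debugging information can also be checked before GDB is started, by looking for a .debug_info section in the ELF file. The following is a minimal sketch, assuming that readelf from binutils is installed on the development host. has_debug_info is a hypothetical helper name, and the files to check are passed as script arguments.

```shell
# Hypothetical helper: succeeds if the ELF file contains a .debug_info section.
has_debug_info() {
    readelf -S "$1" 2>/dev/null | grep -q '\.debug_info'
}

# Report the result for every file given on the command line.
for f in "$@"; do
    if has_debug_info "$f"; then
        echo "$f: debugging information present"
    else
        echo "$f: no debugging information"
    fi
done
```

This only detects debugging information stored in the file itself. Information kept in a separate debug file is not detected by this check.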

Today, I’m going to focus on the fourth scenario.

Here is a specific scenario to illustrate the problem.

Problem scenario

Item 4 says that the runtime environment of the Linux system is inconsistent with the cross-toolchain environment, so let’s take strlen from libc as an example.

strlen calculates the length of a C string, excluding the terminating '\0'. Passing a NULL pointer to strlen, however, causes a segmentation fault. Let’s simulate this to see what goes wrong when the runtime environment and the compilation environment are inconsistent.

Here’s the code that will crash:

#include <string.h>

int main(int argc, char* argv[])
{
    int rlen = strlen(NULL); /* NULL is passed on purpose to trigger the crash */
    return 0;
}

Note: -g is specified at compile time; otherwise test has no debugging information.

libc in the embedded Linux runtime environment:

root@zpd /lib$ ll libc.so.6
lrwxrwxrwx 1 root root 12 Jan 1 00:00 libc.so.6 -> libc-2.13.so
root@zpd /lib$ ll libc-2.13.so
-rwxr-xr-x 1 root root 1409189 Jan 1 22:08 libc-2.13.so

libc in the cross-compiler toolchain:

lrwxrwxrwx 1 jetpack jetpack 12 Mar 9 12:07 libc.so.6 -> libc-2.13.so*
-rwxr-xr-x 1 jetpack jetpack 1496962 Mar 15 2012 libc-2.13.so*
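The two listings already show different file sizes (1409189 versus 1496962 bytes). A byte-for-byte comparison on the development host makes the check explicit. This is a minimal sketch: compare_libs is a hypothetical helper, and the two arguments are placeholders for a copy of the target's library and the toolchain's library. A difference is only a hint, because a stripped copy of the same build also differs from the unstripped file.

```shell
# Hypothetical helper: compare two copies of a library byte for byte.
# $1 = library copied from the target, $2 = library from the toolchain.
compare_libs() {
    if cmp -s "$1" "$2"; then
        echo "identical"
    else
        echo "different"
    fi
}
```

It would be called with the two library paths as arguments, for example the libc-2.13.so copied from the target's /lib and the libc-2.13.so in the toolchain's library directory.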

After compiling test, copy it to the embedded Linux environment and run it. Note that the shell's ulimit for core files must be raised first; otherwise no core file is produced. The commands are as follows:

ulimit -c unlimited
./test
Segmentation fault (core dumped)
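If no core file appears even after raising the limit, it can help to confirm the limit in the current shell and to check where the kernel writes core files. This is a minimal sketch, assuming a Linux target with /proc mounted. core_limit_status is a hypothetical helper name.

```shell
# Hypothetical helper: report whether this shell allows core files at all.
core_limit_status() {
    if [ "$(ulimit -c)" = "0" ]; then
        echo "disabled"
    else
        echo "enabled"
    fi
}

core_limit_status

# The kernel's core_pattern decides the name and location of the core file.
if [ -r /proc/sys/kernel/core_pattern ]; then
    cat /proc/sys/kernel/core_pattern
fi
```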

Copy the core file to the development environment and use GDB to view the core file information.

1. Reading symbols from /home/jetpack/test/test...done.
[New Thread 852]
2. warning: .dynamic section for "/home/jetpack/work/OKMX6UL-C2/fsl-linaro-toolchain/arm-fsl-linux-gnueabi/multi-libs/lib/libc.so.6" is not at the expected address (wrong library or version mismatch?)
Reading symbols from /home/jetpack/work/OKMX6UL-C2/fsl-linaro-toolchain/arm-fsl-linux-gnueabi/multi-libs/lib/libc.so.6...done.
Loaded symbols for /home/jetpack/work/OKMX6UL-C2/fsl-linaro-toolchain/arm-fsl-linux-gnueabi/multi-libs/lib/libc.so.6
Reading symbols from /home/jetpack/work/OKMX6UL-C2/fsl-linaro-toolchain/arm-fsl-linux-gnueabi/multi-libs/lib/ld-linux.so.3...done.
Loaded symbols for /home/jetpack/work/OKMX6UL-C2/fsl-linaro-toolchain/arm-fsl-linux-gnueabi/multi-libs/lib/ld-linux.so.3
Core was generated by `./test'.
Program terminated with signal 11, Segmentation fault.
#0  0x76e8d864 in tr_freehook () from /home/jetpack/work/OKMX6UL-C2/fsl-linaro-toolchain/arm-fsl-linux-gnueabi/multi-libs/lib/libc.so.6
(gdb) bt
3. #0  0x76e8d864 in tr_freehook () from /home/jetpack/work/OKMX6UL-C2/fsl-linaro-toolchain/arm-fsl-linux-gnueabi/multi-libs/lib/libc.so.6
#1  0x7ec97dd4 in ?? ()
#2  0x7ec97dd4 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

The output above is what GDB reports for the core file. Note the following points:

  1. test was loaded successfully and has debugging information.
  2. warning: .dynamic section for "/home/jetpack/work/OKMX6UL-C2/fsl-linaro-toolchain/arm-fsl-linux-gnueabi/multi-libs/lib/libc.so.6" is not at the expected address (wrong library or version mismatch?) GDB did not find the .dynamic section of the cross-compile environment's libc at the address the core file expects, and it names the likely cause itself: wrong library or version mismatch. In other words, the libc that GDB loaded is not the libc the program actually ran against. This is the key to solving the problem.
  3. If you look at the crash stack with bt at this point, you get puzzling information, such as tr_freehook here. If you overlook the warning in item 2, you may follow this wrong stack information in search of the cause, take a long detour, and end up with nothing.

Solution

The analysis of the core file above shows that the root of the problem is the mismatch between the libc of the runtime environment and the libc of the compilation environment, so there are two obvious ways to fix it:

  1. Change the cross-compilation environment so that it matches the target.
  2. Change the target's runtime environment so that it matches the toolchain.

Here the second approach is taken. Copy libc-2.13.so from the cross-compiler toolchain to the embedded Linux environment, replacing the existing libc-2.13.so; then run test again, collect the new core file, and debug it with GDB. The output is as follows:

1. Reading symbols from /home/jetpack/test/test...done.
[New Thread 858]
2. Reading symbols from /home/jetpack/work/OKMX6UL-C2/fsl-linaro-toolchain/arm-fsl-linux-gnueabi/multi-libs/lib/libc.so.6...done.
Loaded symbols for /home/jetpack/work/OKMX6UL-C2/fsl-linaro-toolchain/arm-fsl-linux-gnueabi/multi-libs/lib/libc.so.6
Reading symbols from /home/jetpack/work/OKMX6UL-C2/fsl-linaro-toolchain/arm-fsl-linux-gnueabi/multi-libs/lib/ld-linux.so.3...done.
Loaded symbols for /home/jetpack/work/OKMX6UL-C2/fsl-linaro-toolchain/arm-fsl-linux-gnueabi/multi-libs/lib/ld-linux.so.3
Core was generated by `./test'.
Program terminated with signal 11, Segmentation fault.
3. #0  0x76e5e864 in strlen () from /home/jetpack/work/OKMX6UL-C2/fsl-linaro-toolchain/arm-fsl-linux-gnueabi/multi-libs/lib/libc.so.6
(gdb) bt
#0  0x76e5e864 in strlen () from /home/jetpack/work/OKMX6UL-C2/fsl-linaro-toolchain/arm-fsl-linux-gnueabi/multi-libs/lib/libc.so.6
#1  0x00008390 in main (argc=1, argv=0x7ef85dc4) at test.c:5

Comparing items 1, 2 and 3 with the earlier output: at item 2, libc.so.6 now loads without the warning, and at item 3 the backtrace shows that the crash occurs in strlen, called from main at test.c:5.
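The first option, adapting the development side instead of the target, can be sketched as follows. The idea is to copy the target's libraries to the development host and have GDB resolve shared libraries from that copy with its set sysroot command. Everything named here is an assumption rather than something taken from the setup above: the GDB binary name is guessed from the toolchain directory name, TARGET_ROOT is a hypothetical host-side directory that mirrors the target's layout (for example lib/libc.so.6 and lib/ld-linux.so.3), and debug_core is a hypothetical helper.

```shell
# Assumed names: adjust both to the actual setup.
GDB=${GDB:-arm-fsl-linux-gnueabi-gdb}            # cross GDB, name guessed from the toolchain path
TARGET_ROOT=${TARGET_ROOT:-$HOME/target-rootfs}  # hypothetical host-side copy of the target's root

# Print a backtrace for a core file, resolving shared libraries from
# TARGET_ROOT instead of the toolchain's own library directory.
# $1 = executable built with -g, $2 = core file copied from the target.
debug_core() {
    "$GDB" -batch \
        -ex "set sysroot $TARGET_ROOT" \
        -ex "file $1" \
        -ex "core-file $2" \
        -ex "bt"
}
```

A call such as debug_core ./test ./core would then print the backtrace and exit. The sysroot is set before the core file is loaded so that GDB looks up libc.so.6 under TARGET_ROOT from the start. Paths containing spaces are not handled by this sketch.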

Conclusion

This article used a case study to show how core file analysis can fail in embedded Linux application development when the runtime environment and the cross-compilation environment are inconsistent, and how to solve the problem. One might ask how the environments become inconsistent in the first place. When an embedded Linux system is built, the root file system is part of the build, and the libraries of the cross-compilation environment are copied into it, so the two should match. That holds as long as this process is followed. In practice, however, the Linux system is often already built, and if the cross-compilation toolchain is then upgraded while the embedded runtime environment is not, the mismatch appears. At that point the root cause has to be worked out from whatever information is available, and the warnings GDB prints while analyzing the core file are part of that information.

So, again: every warning message that appears during development deserves careful analysis, because the clue to the solution often lies there.