In the previous two articles, we covered how to troubleshoot long-running queries and how to enable core generation on the server. Today, we will show you how to collect the coredump of a process for troubleshooting.

1. Introduction to Coredump

When a program misbehaves or crashes, the operating system stops the process and copies the contents of its memory to a specified directory on disk. The result is a core file (a memory image plus debugging information) that records the detailed state of the program at the moment it died, making it easier for programmers to debug.

When does Greenplum produce a coredump?

A coredump is produced when one of the following occurs:

  1. The program sends a termination signal to the process, forcing it to exit and restart, for example because a hardware failure makes some internal content inaccessible.
  2. The process accesses an invalid memory address due to a program error (simulated in the sketch after this list).
  3. The user actively triggers a core dump of a process. This is usually done for troubleshooting and will be discussed later.
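
For illustration, here is a minimal sketch of case 2: we force a core dump by sending SIGSEGV to a throwaway process. The PID shown is illustrative, and this assumes the core size limit and core path described in the next section are already configured.

$ ulimit -c unlimited
$ sleep 300 &
[1] 12345
$ kill -SIGSEGV %1
[1]+  Segmentation fault      (core dumped) sleep 300
$ ls core.*
core.12345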

Where is the Coredump for the Greenplum cluster?

If you want the system to be able to dump core files, you first need to adjust a few parameters, such as setting the core file size limit large enough (or to unlimited) via ulimit.

$ ulimit -a | grep core
core file size          (blocks, -c) unlimited
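If the limit shown is 0, a sketch like the following raises it. The limits.conf entries are an assumption based on a typical setup where Greenplum runs as the gpadmin user.

### Raise the limit for the current shell ###
$ ulimit -c unlimited

### Make it persistent for gpadmin by adding these lines to /etc/security/limits.conf ###
gpadmin soft core unlimited
gpadmin hard core unlimited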

In Linux, you can check the path where core files are written using the following kernel parameter:

# sysctl kernel.core_pattern
kernel.core_pattern = /<directory>/core-%e-%s-%u-%g-%p-%t
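The format specifiers name the core file: %e is the executable name, %s the signal number, %u the user ID, %g the group ID, %p the PID, and %t the timestamp of the dump. To change the pattern, a sketch like the following works (the /var/core path is just an example; make sure the directory exists and is writable):

# sysctl -w kernel.core_pattern=/var/core/core-%e-%s-%u-%g-%p-%t
### Persist the setting across reboots ###
# echo 'kernel.core_pattern = /var/core/core-%e-%s-%u-%g-%p-%t' >> /etc/sysctl.conf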

For more configuration details, see our earlier article on how to enable core generation on the server.

In addition, in some cases (such as troubleshooting the query slowdowns mentioned in our previous articles), you may need to actively collect a coredump from a running process. This is where the gcore command comes in. Note that gcore ships with GDB, so you need to install the GDB package on your operating system to use it. Using CentOS as an example:

### Install GDB ###
$ sudo yum -y install gdb

### Collect the coredump of the con8 process on seg0 using gcore; the process ID is 27705 ###
$ ps -ef | grep con[0-9] | grep con8 | grep seg0
gpadmin  27705  27589  0 16:19 ?  ...  con8 seg0 cmd6 MPPEXEC UTILITY
$ gcore 27705
...
Saved corefile core.27705

After that, a file named core.[pid] is generated in the current directory; this is the coredump for process 27705.

2. How to collect the Coredump

If you need to perform offline troubleshooting of a core file, the core file itself is not enough: we also need to collect all of the library files loaded by the corresponding process. If a library file is missing, GDB will report missing-library warnings when opening the core file, which makes troubleshooting difficult.
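
For reference, once the core file and its libraries have been collected (by either of the methods below) and copied to another machine, you can point GDB at the collected directory tree. This is a sketch; the paths are illustrative:

$ gdb -ex 'set sysroot /path/to/collected-root' /path/to/postgres /path/to/core.27705
(gdb) bt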

Finding and collecting these library files by hand is relatively tedious. Fortunately, the GPMT tool we previously used to collect logs already includes a packcore command that automatically packages the coredump together with its library files. See the previous article for more details.

For example:

$ ./gpmt packcore -cmd collect -core ~/core/core.27705
Creating temp directory ./packcore-core.27705
...
Packcore generated:
  packcore-core.27705.tar.gz

3. How to quickly collect all files manually

Sometimes the GPMT tool cannot be used to collect the files. For example, the core file itself may be problematic (so GPMT cannot parse the binary properly), or a security requirement in our environment may prevent us from uploading the executable to the server. In such cases, we can collect the files manually using the following method.

In a nutshell, the steps are as follows:

  1. Use the GDB command info sharedlibrary to obtain the list of library files loaded by the process (example output is shown after this list)
  2. Copy all library files into a matching directory structure. (Note: many library files exist as soft links, so we need to copy their link targets.)
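
For reference, the relevant lines of the info sharedlibrary output look roughly like this (the addresses are illustrative). Each loaded library line starts with its address range and ends with the library path, which is why the script below keeps only lines starting with 0x and takes the last field:

$ gdb -ex "info sharedlibrary" -ex "quit" $GPHOME/bin/postgres $coreFile | grep "^0x"
0x00007f...  0x00007f...  Yes  /lib64/libz.so.1
0x00007f...  0x00007f...  Yes  /opt/greenplum-db/./lib/libgpopt.so.3
...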

We have written the following script to help you quickly accomplish the above steps:

#!/bin/bash
### Path to the core file ###
coreFile='/home/gpadmin/core/core.27705'
### GPHOME path ###
GPHOME="/opt/greenplum-db"

unset PYTHONHOME
unset PYTHONPATH
unset LD_LIBRARY_PATH

### List all libraries loaded by the process and copy each one into a matching directory tree ###
for i in `gdb -ex "info sharedlibrary" -ex "quit" $GPHOME/bin/postgres $coreFile | grep "^0x" | awk '{print $NF}'`
do
    dir=`dirname $i`
    file=`basename $i`
    mkdir -pv .$dir
    # -L follows soft links so the real target files are copied
    cp -Lrpv $dir/$file .$dir/$file
done

### Also collect the core file and the postgres binary ###
cp -rp $coreFile ./
cp -rp $GPHOME/bin/postgres ./

The following is an example:

$ bash collect_core.sh
mkdir: created directory './lib64'
'/lib64/libz.so.1' -> './lib64/libz.so.1'
...
mkdir: created directory './opt'
mkdir: created directory './opt/greenplum-db'
mkdir: created directory './opt/greenplum-db/./lib'
'/opt/greenplum-db/./lib/libgpopt.so.3' -> ...
...

### Result ###
$ ll
total 405188
-rw-rw-r-- 1 gpadmin gpadmin 351986816 Mar 15 16:22 core.27705
drwxrwxr-x 2 gpadmin gpadmin       293 Mar 16 11:16 lib64
drwxrwxr-x 4 gpadmin gpadmin        50 Mar 16 11:16 opt
-rwxr-xr-x 1 gpadmin gpadmin  62921216 Sep 26 00:14 postgres
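
Finally, the collected files can be packaged for transfer. The archive name is arbitrary; the file names follow the example above:

$ tar -czf core.27705_collected.tar.gz core.27705 postgres lib64 opt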

4. Reference materials

  • community.pivotal.io/s/article/H…
  • community.pivotal.io/s/article/h…