Simplifying Pod troubleshooting: an introduction to kubectl-debug

Background

One of the best practices in container technology is to build container images that are as minimal as possible. However, this practice makes troubleshooting harder: slimmed-down containers generally lack common troubleshooting tools, and some do not even have a shell (for example, images built FROM scratch). In that situation we can only troubleshoot through logs, or through the docker CLI or nsenter on the host, which is very inefficient. The Kubernetes community has long been aware of this problem. In 2016 an issue titled "Support for troubleshooting distroless containers" was opened and a corresponding proposal was written. Unfortunately, because the changes are so extensive, the implementation has not yet been merged into upstream Kubernetes. By chance (PingCAP asked for a kubectl plugin that does something similar), I developed kubectl-debug, which helps diagnose a target container by launching a container with various troubleshooting tools installed.

How it works

Let’s not rush into the quick start. kubectl-debug itself is very simple, so once you understand how it works you can fully master the tool and use it for more than just debugging.

As we know, a container is essentially a set of processes constrained by cgroups and isolated by namespaces. So to “go inside the container” (note the quotation marks), we only need to start a process and add it to the namespaces of the target container. That process then “sees” the same root file system, virtual network interface, and process space as the processes in the container. This is how commands like docker exec and kubectl exec work.
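To make the namespace idea concrete, here is a minimal sketch (Linux only) that reads the namespace identifiers the kernel exposes under /proc/<pid>/ns and checks whether two processes share a namespace. The helper names ns_id and same_ns are made up for this illustration and are not part of kubectl-debug.

```shell
# Print the identifier of one namespace of a process, e.g. "net:[4026531992]".
# $1: PID, $2: namespace type (net, pid, ipc, mnt, uts, ...)
ns_id() {
  readlink "/proc/$1/ns/$2"
}

# Succeed only when both identifiers can be read and are equal.
# $1: first PID, $2: second PID, $3: namespace type
same_ns() {
  a=$(ns_id "$1" "$3") && b=$(ns_id "$2" "$3") && [ "$a" = "$b" ]
}

# Example (reading another user's process may require root):
#   same_ns "$$" 1 net && echo "this shell shares a network namespace with PID 1"
```

Two processes that print the same identifier for a given type are in the same namespace, which is exactly the state a debug process needs to reach for each namespace of the target container.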

Now, we want not only to “get inside the container” but also to bring a set of tools with us to help troubleshoot. The best way to manage a toolset efficiently and across platforms is to package the tools themselves in a container image. Then we only need to start a container from this “tool image” and tell it to join the namespaces of the target container, and we have naturally “carried a set of tools into the container”. In fact, you can already do this with the docker CLI:

export TARGET_ID=666666666
# Join the network, PID, and IPC namespaces of the target container
docker run -it --network=container:$TARGET_ID --pid=container:$TARGET_ID --ipc=container:$TARGET_ID busybox

This is the starting point of kubectl-debug: use a tool container to diagnose the business container. The design idea is consistent with patterns such as the sidecar: each container does one thing.

[Figure: kubectl debug workflow]

The steps are as follows:

1. The plugin asks the API server whether demo-pod exists and which node it runs on.
2. The API server returns the node where demo-pod resides.
3. The plugin requests the creation of a Debug Agent Pod on the target node.
4. Kubelet creates the Debug Agent Pod.
5. The plugin finds that the Debug Agent is ready and initiates a debug request (a long-lived connection).
6. After receiving the debug request, the Debug Agent creates a debug container and adds it to each namespace of the target container. Once the container is created, the agent connects to the TTY of the debug container.
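The first two steps correspond to an ordinary Pod lookup. As a rough sketch of the same query done by hand (not the plugin's actual code), the function below asks the API server for the node name through kubectl's jsonpath output. The function name pod_node and the KUBECTL override variable are assumptions introduced here.

```shell
# KUBECTL can point at another binary or wrapper; it defaults to kubectl.
KUBECTL="${KUBECTL:-kubectl}"

# Print the name of the node a Pod is scheduled on.
# Fails if the Pod does not exist, which covers the existence check as well.
# $1: Pod name, $2: namespace (defaults to "default")
pod_node() {
  "$KUBECTL" get pod "$1" --namespace "${2:-default}" \
    --output jsonpath='{.spec.nodeName}'
}

# Example:
#   pod_node demo-pod
```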

Next, the client can start debugging over connections 5 and 6. When the session ends, the Debug Agent cleans up the debug container and the plugin cleans up the Debug Agent. The effect is shown below (image omitted):

Getting started

On macOS, you can install it directly with Homebrew:

brew install aylei/tap/kubectl-debug

On any platform, you can install it by downloading the binary:

export PLUGIN_VERSION=0.1.1  # latest release at the time of writing
# linux x86_64
curl -Lo kubectl-debug.tar.gz https://github.com/aylei/kubectl-debug/releases/download/v${PLUGIN_VERSION}/kubectl-debug_${PLUGIN_VERSION}_linux_amd64.tar.gz
# macos
curl -Lo kubectl-debug.tar.gz https://github.com/aylei/kubectl-debug/releases/download/v${PLUGIN_VERSION}/kubectl-debug_${PLUGIN_VERSION}_darwin_amd64.tar.gz

tar -zxvf kubectl-debug.tar.gz kubectl-debug
sudo mv kubectl-debug /usr/local/bin/

Windows users can download it from the Releases page.

Once it is installed, you can start using the debug plugin:

kubectl debug target-pod --agentless --port-forward

kubectl has supported automatic discovery of plugins on PATH since version 1.12. Versions before 1.12 do not support this plugin mechanism, but the tool can still be invoked directly as kubectl-debug.
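The discovery mechanism is simple enough to try with a throwaway plugin: any executable on PATH whose name starts with kubectl- becomes a subcommand. The sketch below creates a hypothetical plugin named kubectl-hello in a temporary directory; the name and message are invented for the example.

```shell
# Create a throwaway plugin in a temporary directory.
plugin_dir="$(mktemp -d)"
cat > "$plugin_dir/kubectl-hello" <<'EOF'
#!/bin/sh
echo "hello from a kubectl plugin"
EOF
chmod +x "$plugin_dir/kubectl-hello"

# Put the directory on PATH so that kubectl 1.12+ can discover the plugin.
PATH="$plugin_dir:$PATH"
export PATH

# A plugin is an ordinary executable, so it can also be run directly,
# which is the fallback for kubectl versions older than 1.12.
kubectl-hello
```

With kubectl 1.12 or later installed, running kubectl hello in the same shell should invoke the same script, and kubectl plugin list should show it.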

Refer to the project’s Chinese README for more documentation and help.

Typical cases

Basic debugging

kubectl-debug uses nicolaka/netshoot as the default image, which ships with quite a few troubleshooting tools. For example:

Use iftop to view container network traffic:

➜ ~ kubectl debug demo-pod
root@ / [2] 🐳 → iftop -i eth0
interface: eth0
IP address is: 10.233.111.78
MAC address is: 86:c3:ae:9d:46:2b
# (image omitted)

Use drill to diagnose DNS resolution:

root@ / [3] 🐳 → drill -V 5 demo-service
;; ->>HEADER<<- opcode: QUERY, rcode: NOERROR, id: 0
;; flags: rd ; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;; demo-service. IN A
;; ANSWER SECTION:
;; AUTHORITY SECTION:
;; ADDITIONAL SECTION:
;; Query time: 0 msec
;; WHEN: Sat Jun 1 05:05:39 2019
;; MSG SIZE rcvd: 0
;; ->>HEADER<<- opcode: QUERY, rcode: NXDOMAIN, id: 62711
;; flags: qr rd ra ; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0
;; QUESTION SECTION:
;; demo-service. IN A
;; ANSWER SECTION:
;; AUTHORITY SECTION:
. 30 IN SOA a.root-servers.net. nstld.verisign-grs.com. 2019053101 1800 900 604800 86400
;; ADDITIONAL SECTION:
;; Query time: 58 msec
;; SERVER: 10.233.0.10
;; WHEN: Sat Jun 1 05:05:39 2019
;; MSG SIZE rcvd: 121

Capture packets with tcpdump:

root@ / [4] 🐳 → tcpdump -i eth0 -c 1 -Xvv
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
12:41:49.707470 IP (tos 0x0, ttl 64, id 55201, offset 0, flags [DF], proto TCP (6), length 80)
    demo-pod.default.svc.cluster.local.35054 > 10-233-111-117.demo-service.default.svc.cluster.local.8080: Flags [P.], cksum 0xf4d7 (incorrect -> 0x9307), seq 1374029960:1374029988, ack 1354056341, win 1424, options [nop,nop,TS val 2871874271 ecr 2871873473], length 28
        0x0000:  4500 0050 d7a1 4000 4006 6e71 0ae9 6f4e  E..P..@.@.nq..oN
        0x0010:  0ae9 6f75 88ee 094b 51e6 0888 50b5 4295  ..ou...KQ...P.B.
        0x0020:  8018 0590 f4d7 0000 0101 080a ab2d 52df  .............-R.
        0x0030:  ab2d 4fc1 0000 1300 0000 0000 0100 0000  .-O.............
        0x0040:  000e 0a0a 08a1 86b2 ebe2 ced1 f85c 1001  .............\..
1 packet captured
11 packets received by filter
0 packets dropped by kernel

Access the root file system of the target container:

Container runtimes such as Docker give isolated container processes their own root file system (in effect, a chroot), and the /proc filesystem exposes that root at /proc/{pid}/root/. When we want to access the root file system of the target container, we can read this directory directly:

root@ / [5] 🐳 → tail -f /proc/1/root/log_
Hello, world!
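The same path pattern works for any file in the target. As a small illustration, the hypothetical helper target_path below maps a path inside the target container to the path the debug container should open, assuming the two share a PID namespace so that the target's PID is visible.

```shell
# Map a path inside the target's root file system to the path that is
# reachable from the debug container.
# $1: PID of a process in the target container, $2: absolute path in the target
target_path() {
  printf '/proc/%s/root%s\n' "$1" "$2"
}

# Example: list the target's /etc, assuming PID 1 is the target's main process.
#   ls "$(target_path 1 /etc)"
```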

A common pitfall is that commands that rely on the /proc filesystem, such as free and top, display information about the host. Developers need to get used to this when containerizing applications, and so do language runtimes: the infamous Java 8u121 and earlier versions, which did not recognize cgroup limits, are one example.

Diagnosing CrashLoopBackOff

Troubleshooting CrashLoopBackOff is painful: the Pod may restart constantly, so neither kubectl exec nor kubectl debug can attach reliably, and we can basically only hope that the Pod logs contain something useful. To make this easier, kubectl-debug adds a --fork flag. When --fork is specified, the plugin copies the current Pod spec, makes a few small changes, and creates a new Pod:
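The mismatch is easy to see by putting the two numbers side by side. The sketch below (Linux only) prints the total memory from /proc/meminfo, which is what free reads, next to the cgroup memory limit. The cgroup file locations are the usual mount points for cgroup v2 and v1 inside a container and are an assumption here; they may differ on your system.

```shell
# Total memory according to /proc/meminfo, converted from kB to bytes.
proc_mem_total() {
  mem_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo) || return 1
  echo $((mem_kb * 1024))
}

# Memory limit imposed by cgroups, in bytes. Prints "max" (cgroup v2) or a
# very large number (cgroup v1) when unlimited, and nothing when neither
# file is readable.
cgroup_mem_limit() {
  if [ -r /sys/fs/cgroup/memory.max ]; then
    cat /sys/fs/cgroup/memory.max                      # cgroup v2
  elif [ -r /sys/fs/cgroup/memory/memory.limit_in_bytes ]; then
    cat /sys/fs/cgroup/memory/memory.limit_in_bytes    # cgroup v1
  fi
}

echo "meminfo total: $(proc_mem_total) bytes"
echo "cgroup limit:  $(cgroup_mem_limit)"
```

Inside a container with a memory limit, the first number typically reflects the host while the second reflects the limit that actually applies to the container.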

All labels of the new Pod are removed, to prevent Services from sending traffic to the forked Pod
The ReadinessProbe and LivenessProbe of the new Pod are also removed, to prevent kubelet from killing the Pod
The startup command of the target container (the container to be debugged) in the new Pod is overridden, to prevent the new Pod from continuing to crash
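The three changes above can be approximated by hand on a Pod's JSON. The following is only a sketch of the idea, not the plugin's implementation: it assumes jq is installed, the function name fork_spec is invented, and the replacement command sleep 3600 is an arbitrary placeholder (the command the plugin really uses is not stated here).

```shell
# Read a Pod as JSON on stdin and print a "forked" variant on stdout.
# $1: name of the container whose startup command should be replaced
fork_spec() {
  jq --arg c "$1" '
    # 1. Drop all labels so that no Service selects the fork.
    del(.metadata.labels)
    # 2. Drop the probes so that kubelet does not restart the fork.
    | del(.spec.containers[].readinessProbe, .spec.containers[].livenessProbe)
    # 3. Replace the startup command of the target container (placeholder).
    | (.spec.containers[] | select(.name == $c) | .command) = ["sleep", "3600"]
    | del(.spec.containers[] | select(.name == $c) | .args)
  '
}

# Example:
#   kubectl get pod demo-pod -o json | fork_spec demo-container
```

Creating a Pod from this output would additionally require a new name and the removal of server-populated fields such as status, which the sketch leaves out.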

Next, we can try to reproduce in the new Pod the problem that made the old Pod crash. To keep the environment consistent, first chroot into the root file system of the target container:

➜ ~ kubectl debug demo-pod --fork
root@ / [4] 🐳 → chroot /proc/1/root
root@ / [#] 🐳 → ls
 bin            entrypoint.sh  home           lib64          mnt            root           sbin           sys            tmp            var
 dev            etc            lib            media          proc           run            srv            usr
root@ / [#] 🐳 → ./entrypoint.sh
# Observe the output of the startup script and use it for further troubleshooting

Closing ramblings

kubectl-debug started out as a take-home assignment from my PingCAP interview, and the first version was finished at the end of last year. At that time the project was very crude: documentation was missing, and so were many features:

Diagnosing Pods in CrashLoopBackOff was not supported
A Debug Agent DaemonSet had to be installed in advance
Public clouds were not supported (debugging was impossible when a node had no public IP address, or when the public IP address was unreachable because of a firewall)
There was no permission control, so the security risk was high

To my great excitement, even when I had no time to look after the project, I received a pull request notification email every week or two. By now, most of the problems that affected basic usability have been solved, and kubectl-debug has had four releases (0.0.1, 0.0.2, 0.1.0, 0.1.1). Special thanks go to @tkanng: in his first PR he said he had never written Go before, yet by version 0.1.1 he had contributed most of that release’s features and resolved several long-standing issues. Thanks!

Finally, the project address: github.com/aylei/kubec…

If you have any questions about using the tool or about the project itself, please open an issue. You can also leave a comment below the article or email me to discuss.
