Podman uses the traditional fork/exec model (as opposed to the client/server model) to run containers.

Before I dive into the main topic of this article, Podman and containers, I need to explain a little about Linux auditing.

What is auditing?

The Linux kernel has an interesting security feature called auditing. It allows administrators to watch for security events on the system and log them to audit.log, which can be stored locally or on a remote machine so that an attacker cannot cover their tracks.

The /etc/shadow file is a security file that is often monitored because adding records to it may allow an attacker to gain access to the system. The administrator wants to know if any process has modified the file. You can do this by executing the following command:

# auditctl -w /etc/shadow

Now let’s see what happens when I modify the /etc/shadow file:

# touch /etc/shadow 
# ausearch -f /etc/shadow -i -ts recent

type=PROCTITLE msg=audit(10/10/2018 09:46:03.042:4108) : proctitle=touch /etc/shadow
type=SYSCALL msg=audit(10/10/2018 09:46:03.042:4108) : arch=x86_64 syscall=openat success=yes exit=3 a0=0xffffff9c a1=0x7ffdb17f6704 a2=O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK a3=0x1b6 items=2 ppid=2712 pid=3727 auid=dwalsh uid=root gid=root euid=root suid=root fsuid=root egid=root sgid=root fsgid=root tty=pts1 ses=3 comm=touch exe=/usr/bin/touch subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key=(null)

There is a lot of information in the audit record, but note that it recorded that root modified the /etc/shadow file and that the audit UID (auid) of the process belongs to dwalsh.

How did the kernel know that?

Tracking the login UID

There is a field called loginuid, stored in /proc/self/loginuid, that is part of the proc structure of every process on the system. This field can be set only once; after it is set, the kernel will not allow any process to reset it.

When I log in to the system, the login program sets the loginuid field for my login process.

My UID (dwalsh) is 3267.

$ cat /proc/self/loginuid
3267

Now, even if I become root, my login UID will remain the same.

$ sudo cat /proc/self/loginuid
3267

Note that every process that is forked and exec'd from the initial login process automatically inherits the loginuid. This is how the kernel knew that the person who logged in was dwalsh.
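
The inheritance described above can be checked directly. The following is a minimal sketch, assuming a Linux system whose kernel exposes /proc/PID/loginuid; the helper names loginuid_of and inherits_loginuid are made up for illustration and are not part of any tool.

```shell
#!/bin/sh
# Sketch: check that a process started from this shell reports the same
# login UID as the shell itself. Helper names are hypothetical.

# loginuid_of PID: print the login UID the kernel records for PID.
loginuid_of() {
    cat "/proc/$1/loginuid"
}

# inherits_loginuid: succeed if a freshly started child shell reports the
# same login UID as this shell ($$ is this shell's PID).
inherits_loginuid() {
    parent_uid=$(loginuid_of "$$")
    # Inside the single quotes, $$ is expanded by the child shell,
    # so this reads the child's own loginuid file.
    child_uid=$(sh -c 'cat /proc/$$/loginuid')
    [ "$parent_uid" = "$child_uid" ]
}

if [ -r "/proc/$$/loginuid" ]; then
    if inherits_loginuid; then
        echo "child inherited login UID $(loginuid_of "$$")"
    else
        echo "child reports a different login UID"
    fi
else
    echo "no loginuid file for this process (audit support may be missing)"
fi
```

The value printed depends on how the shell was started, so no fixed output is given here.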

Containers

Now let’s look at the container.

$ sudo podman run fedora cat /proc/self/loginuid
3267

Even the container process retains my loginuid. Now let’s try Docker.

$ sudo docker run fedora cat /proc/self/loginuid
4294967295

Why is it different?

Podman uses the traditional fork/exec model for containers, so the container process is a descendant of the Podman process. Docker uses a client/server model. The docker command I executed is the Docker client tool, and it communicates with the Docker daemon through a client/server operation. The Docker daemon then creates the container and relays stdin/stdout back to the Docker client tool.

The default loginuid of a process (before its loginuid is set) is 4294967295. Since the container is a descendant of the Docker daemon, and the Docker daemon is a child of the init system, systemd, the Docker daemon, and the container processes all have the same loginuid, 4294967295, which the audit system treats as unset.

$ cat /proc/1/loginuid
4294967295
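
The same check can be made for any process and its ancestors. Below is a minimal sketch, assuming the Linux /proc layout (a PPid: line in /proc/PID/status, plus /proc/PID/loginuid). The function names are hypothetical helpers, not part of Podman, Docker, or the audit tools.

```shell
#!/bin/sh
# Sketch: print the login UID of a process and of each of its ancestors.
# Assumes the Linux /proc layout; helper names are hypothetical.

UNSET_LOGINUID=4294967295   # 2^32 - 1, the value reported when no login UID is set

# describe_loginuid VALUE: print "unset" for the sentinel, else the UID.
describe_loginuid() {
    if [ "$1" = "$UNSET_LOGINUID" ]; then
        echo "unset"
    else
        echo "$1"
    fi
}

# parent_of PID: print the parent PID from the "PPid:" line of /proc/PID/status.
parent_of() {
    awk '/^PPid:/ { print $2 }' "/proc/$1/status"
}

# loginuid_chain PID: print one "PID LOGINUID" line per process, walking
# up the parent chain until it runs out of readable entries.
loginuid_chain() {
    pid=$1
    while [ -n "$pid" ] && [ "$pid" -gt 0 ] && [ -r "/proc/$pid/loginuid" ]; do
        echo "$pid $(describe_loginuid "$(cat "/proc/$pid/loginuid")")"
        pid=$(parent_of "$pid")
    done
}

loginuid_chain "$$"
```

In a login session, the chain would be expected to show the user's UID for the shell and its nearer ancestors, then unset further up toward PID 1. For a process started by a daemon, every entry would be expected to read unset.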

How can it be abused?

Let’s look at what happens if a container process launched by Docker modifies the /etc/shadow file.

$ sudo docker run --privileged -v /:/host fedora touch /host/etc/shadow 
$ sudo ausearch -f /etc/shadow -i
type=PROCTITLE msg=audit(10/10/2018 10:27:20.055:4569) : proctitle=/usr/bin/coreutils --coreutils-prog-shebang=touch /usr/bin/touch /host/etc/shadow
type=SYSCALL msg=audit(10/10/2018 10:27:20.055:4569) : arch=x86_64 syscall=openat success=yes exit=3 a0=0xffffff9c a1=0x7ffdb6973f50 a2=O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK a3=0x1b6 items=2 ppid=11863 pid=11882 auid=unset uid=root gid=root euid=root suid=root fsuid=root egid=root sgid=root fsgid=root tty=(none) ses=unset comm=touch exe=/usr/bin/coreutils subj=system_u:system_r:spc_t:s0 key=(null)

In the Docker case, the auid is unset (4294967295). This means security staff may know that a process modified the /etc/shadow file, but the identity of whoever did it is lost.

If the attacker then deletes the Docker container, there will be no trace information on the system about who modified the /etc/shadow file.

Now let’s look at the same scenario under Podman.

$ sudo podman run --privileged -v /:/host fedora touch /host/etc/shadow 
$ sudo ausearch -f /etc/shadow -i
type=PROCTITLE msg=audit(10/10/2018 10:23:41.659:4530) : proctitle=/usr/bin/coreutils --coreutils-prog-shebang=touch /usr/bin/touch /host/etc/shadow
type=SYSCALL msg=audit(10/10/2018 10:23:41.659:4530) : arch=x86_64 syscall=openat success=yes exit=3 a0=0xffffff9c a1=0x7fffdffd0f34 a2=O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK a3=0x1b6 items=2 ppid=11671 pid=11683 auid=dwalsh uid=root gid=root euid=root suid=root fsuid=root egid=root sgid=root fsgid=root tty=(none) ses=3 comm=touch exe=/usr/bin/coreutils subj=unconfined_u:system_r:spc_t:s0 key=(null)

Because it uses the traditional fork/exec approach, Podman records everything correctly.
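
The difference between the Docker and Podman records comes down to the auid field. Here is a minimal sketch of pulling that field out of a record. It assumes records in the interpreted format printed by ausearch -i; extract_auid is a hypothetical helper, and the two sample strings are short excerpts of the SYSCALL records shown earlier.

```shell
#!/bin/sh
# Sketch: extract the auid= field from an audit record line.
# extract_auid is a hypothetical helper, not part of the audit tools.

# extract_auid RECORD: print the value of the first auid= field in RECORD,
# or nothing if the record has no such field.
# (grep -o is a widely available extension rather than strict POSIX.)
extract_auid() {
    printf '%s\n' "$1" | grep -o 'auid=[^ ]*' | head -n 1 | cut -d= -f2
}

# Excerpts of the SYSCALL records from the Docker and Podman runs.
docker_record='ppid=11863 pid=11882 auid=unset uid=root gid=root'
podman_record='ppid=11671 pid=11683 auid=dwalsh uid=root gid=root'

echo "docker: $(extract_auid "$docker_record")"   # prints "docker: unset"
echo "podman: $(extract_auid "$podman_record")"   # prints "podman: dwalsh"
```

A filter like this could be applied to ausearch output to flag any record whose auid is unset, which are the events that cannot be traced back to a login.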

This is just a simple example of watching the /etc/shadow file, but the auditing system is very powerful for watching what processes do on a system. Launching containers with a fork/exec container runtime (rather than a client/server container runtime) lets you maintain better security through audit logging.

Final thoughts

The fork/exec model has many other nice features for launching containers compared to the client/server model. For example, systemd features include:

  • SD_NOTIFY: If you put a Podman command into a systemd unit file, the container process can send a notification up through Podman that the service is ready to receive tasks. This cannot be done in the client/server model.
  • Socket activation: You can pass connected sockets from systemd to Podman and on to the container process to use them. This is not possible in the client/server model.

In my opinion, the best feature is running Podman and containers as a non-root user. This means you never have to give a user root privileges on the host, whereas in the client/server model (as Docker uses) you must open a socket to a privileged daemon running as root to launch containers. There, you are at the mercy of the security mechanisms implemented in the daemon rather than those implemented in the host operating system, which is a dangerous proposition.


Via: opensource.com/article/18/…

By Daniel J Walsh, Lujun9972

This article was originally compiled by LCTT and published on Linux China.