When building Docker containers, you should aim for small images, because smaller images are faster to transfer and deploy.

But how do you keep an image small when every RUN statement creates a new layer, and the build needs lots of intermediate files before the image is ready?

You may have noticed that many Dockerfiles use a strange trick:

FROM ubuntu
RUN apt-get update && apt-get install -y vim

Why chain the commands with && instead of using two RUN statements, like this?

FROM ubuntu
RUN apt-get update
RUN apt-get install -y vim

Starting with Docker 1.10, the COPY, ADD, and RUN statements add new layers to the image. The previous example therefore creates two layers instead of one.

An image layer is like a Git commit.

Docker layers store the differences between the previous version of an image and the current one. Just like Git commits, they come in handy when you share them with other repositories or images.

In fact, when you pull an image from a registry, you only download the layers you don’t already have. This is a very efficient way to share images.
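Here is a minimal JavaScript sketch of that idea. It assumes an image is described by an ordered list of layer digests. The digest strings and the `layersToPull` helper are hypothetical placeholders. A real pull also involves authentication, manifest resolution, and content verification, and those steps are left out.

```javascript
// Model of a pull: an image manifest lists layer digests in order, and the
// client fetches only the digests it does not already have locally.
// The digests below are hypothetical placeholders, not real image layers.
function layersToPull(manifestLayers, localLayers) {
  const have = new Set(localLayers);
  return manifestLayers.filter((digest) => !have.has(digest));
}

const nodeBase = ['sha256:aaa', 'sha256:bbb', 'sha256:ccc']; // shared base layers
const myApp = [...nodeBase, 'sha256:app1', 'sha256:app2'];   // base + 2 app layers

// A host that already pulled the base image only needs the two app layers.
console.log(layersToPull(myApp, nodeBase)); // [ 'sha256:app1', 'sha256:app2' ]
```

Layers are content-addressed, so two images built on the same base share those layers on disk and over the network.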

But the extra layers don’t come without a cost.

Layers still take up space. A file added in one layer stays in the image even if a later layer deletes it, so the more layers you stack, the bigger the final image tends to get. Git repositories behave the same way: they grow with every commit, because Git must keep all changes between commits.

In the past, combining multiple RUN statements into a single command, as in the first example above, was a good way to save layers. Today it is no longer the best approach.
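To see why chaining helped, here is a simplified JavaScript model of stacked layers. It is a sketch and not how Docker's storage drivers are implemented. Each layer lists the files it adds, with made-up sizes, and the paths it deletes. A deletion only hides a file from the final filesystem. The apt index path and the cleanup step are assumptions added for illustration, since the example Dockerfile above does not delete anything.

```javascript
// Simplified model of an image built from stacked layers.
// Each layer records the bytes of files it adds and the paths it deletes.
// A deletion only hides the file from the final filesystem (a "whiteout");
// the bytes added by an earlier layer are still shipped with the image.
// Paths and sizes are hypothetical example values.
function imageStats(layers) {
  const visible = new Map(); // path -> size, as seen inside the container
  let shippedBytes = 0;      // sum of everything every layer adds
  for (const layer of layers) {
    for (const [file, size] of Object.entries(layer.adds || {})) {
      visible.set(file, size);
      shippedBytes += size;
    }
    for (const file of layer.deletes || []) visible.delete(file);
  }
  let visibleBytes = 0;
  for (const size of visible.values()) visibleBytes += size;
  return { shippedBytes, visibleBytes };
}

// Two RUN statements: download in one layer, clean up in the next.
const separate = [
  { adds: { '/var/lib/apt/lists/index': 40, '/usr/bin/vim': 30 } },
  { deletes: ['/var/lib/apt/lists/index'] },
];

// One RUN statement chained with &&: the cleanup runs before the layer is
// saved, so the index never makes it into any layer.
const chained = [{ adds: { '/usr/bin/vim': 30 } }];

console.log(imageStats(separate)); // { shippedBytes: 70, visibleBytes: 30 }
console.log(imageStats(chained));  // { shippedBytes: 30, visibleBytes: 30 }
```

In both cases the container sees the same files. Only the separate-layers version ships the deleted bytes, because a later layer cannot take bytes back out of a committed one.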

1. Squash multiple layers into one with Docker multi-stage builds

As a Git repository grows, you can choose to squash its history into a single commit.

As it turns out, you can use multistage builds for a similar purpose in Docker.

In this example, you will build a Node.js container.

Let’s start with index.js:

const express = require('express')
const app = express()

app.get('/', (req, res) => res.send('Hello World!'))

app.listen(3000, () => {
  console.log(`Example app listening on port 3000!`)
})

And package.json:

{
  "name": "hello-world",
  "version": "1.0.0",
  "main": "index.js",
  "dependencies": {
    "express": "^4.16.2"
  },
  "scripts": {
    "start": "node index.js"
  }
}

You can use the following Dockerfile to package the application:

FROM node:8
EXPOSE 3000
WORKDIR /app
COPY package.json index.js ./
RUN npm install
CMD ["npm", "start"]

Then start building the image:

$ docker build -t node-vanilla .

Then verify that it works by running:

$ docker run -p 3000:3000 -ti --rm --init node-vanilla

You should be able to visit http://localhost:3000 and see “Hello World!”.

The Dockerfile uses one COPY statement and one RUN statement, so you would expect the new image to have at least two more layers than the base image:

$ docker history node-vanilla
IMAGE          CREATED BY                                        SIZE
075d229d3f48   /bin/sh -c #(nop) CMD ["npm" "start"]             0B
bc8c3cc813ae   /bin/sh -c npm install                            2.91MB
bac31afb6f42   /bin/sh -c #(nop) COPY multi:3071ddd474429e1...   364B
500a9fbef90e   /bin/sh -c #(nop) WORKDIR /app                    0B
78b28027dfbf   /bin/sh -c #(nop) EXPOSE 3000                     0B
b87c2ad8344d   /bin/sh -c #(nop) CMD ["node"]                    0B
<missing>      /bin/sh -c set -ex && for key in 6A010...         4.17MB
<missing>      /bin/sh -c #(nop) ENV YARN_VERSION=1.3.2          0B
<missing>      /bin/sh -c ARCH= && dpkgArch="$(dpkg --print...   56.9MB
<missing>      /bin/sh -c #(nop) ENV NODE_VERSION=8.9.4          0B
<missing>      /bin/sh -c set -ex && for key in 94AE3...         129kB
<missing>      /bin/sh -c groupadd --gid 1000 node && use...     335kB
<missing>      /bin/sh -c set -ex; apt-get update; apt-ge...     324MB
<missing>      /bin/sh -c apt-get update && apt-get install...   123MB
<missing>      /bin/sh -c set -ex; if ! command -v gpg > /...    0B
<missing>      /bin/sh -c apt-get update && apt-get install...   44.6MB
<missing>      /bin/sh -c #(nop) CMD ["bash"]                    0B
<missing>      /bin/sh -c #(nop) ADD file:1dd78a123212328bd...   123MB

In fact, the resulting image has five new layers, one for each instruction after FROM in the Dockerfile. Three of them (EXPOSE, WORKDIR and CMD) are empty 0B entries.

Now, let’s try Docker’s multi-stage builds.

You can keep the same Dockerfile as above, but split it into two stages, each starting with its own FROM node:8:

FROM node:8 as build
WORKDIR /app
COPY package.json index.js ./
RUN npm install
FROM node:8
COPY --from=build /app /
EXPOSE 3000
CMD ["index.js"]

The first stage of the Dockerfile creates three layers (WORKDIR, COPY and RUN). Its resulting /app directory is then copied into the second stage in a single step. The second stage adds two more layers (EXPOSE and CMD) on top of that COPY, for a total of three new layers.
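To check the layer arithmetic, here is a rough JavaScript model of what `docker history` shows for the final image. It assumes, as in the history output in this article, that every instruction after the last FROM adds one entry and that only RUN, COPY, and ADD add file content. The `finalStageEntries` helper is hypothetical. It ignores line continuations (`\`) and other Dockerfile syntax.

```javascript
// Rough model of `docker history` for the Dockerfiles in this article:
// every instruction after the final FROM adds one history entry, but only
// RUN, COPY and ADD write files; EXPOSE, WORKDIR, CMD, ... show up as 0B
// metadata entries. Earlier stages never reach the final image.
const FILESYSTEM_INSTRUCTIONS = new Set(['RUN', 'COPY', 'ADD']);

function finalStageEntries(dockerfile) {
  const instructions = dockerfile
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line && !line.startsWith('#'))
    .map((line) => line.split(/\s+/)[0].toUpperCase());
  const lastFrom = instructions.lastIndexOf('FROM');
  const entries = instructions.slice(lastFrom + 1);
  return {
    historyEntries: entries.length,
    filesystemLayers: entries.filter((i) => FILESYSTEM_INSTRUCTIONS.has(i)).length,
  };
}

const vanilla = `FROM node:8
EXPOSE 3000
WORKDIR /app
COPY package.json index.js ./
RUN npm install
CMD ["npm", "start"]`;

const multiStage = `FROM node:8 as build
WORKDIR /app
COPY package.json index.js ./
RUN npm install
FROM node:8
COPY --from=build /app /
EXPOSE 3000
CMD ["index.js"]`;

console.log(finalStageEntries(vanilla));    // { historyEntries: 5, filesystemLayers: 2 }
console.log(finalStageEntries(multiStage)); // { historyEntries: 3, filesystemLayers: 1 }
```

The numbers match the two `docker history` listings: five new entries for the single-stage build and three for the multi-stage one.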

Now let’s verify that. First, build the container:

$ docker build -t node-multi-stage .

Check the image history:

$ docker history node-multi-stage
IMAGE          CREATED BY                                        SIZE
331b81a245b1   /bin/sh -c #(nop) CMD ["index.js"]                0B
bdfc932314af   /bin/sh -c #(nop) EXPOSE 3000                     0B
f8992f6c62a6   /bin/sh -c #(nop) COPY dir:e2b57dff89be62f77...   1.62MB
b87c2ad8344d   /bin/sh -c #(nop) CMD ["node"]                    0B
<missing>      /bin/sh -c set -ex && for key in 6A010...         4.17MB
<missing>      /bin/sh -c #(nop) ENV YARN_VERSION=1.3.2          0B
<missing>      /bin/sh -c ARCH= && dpkgArch="$(dpkg --print...   56.9MB
<missing>      /bin/sh -c #(nop) ENV NODE_VERSION=8.9.4          0B
<missing>      /bin/sh -c set -ex && for key in 94AE3...         129kB
<missing>      /bin/sh -c groupadd --gid 1000 node && use...     335kB
<missing>      /bin/sh -c set -ex; apt-get update; apt-ge...     324MB
<missing>      /bin/sh -c apt-get update && apt-get install...   123MB
<missing>      /bin/sh -c set -ex; if ! command -v gpg > /...    0B
<missing>      /bin/sh -c apt-get update && apt-get install...   44.6MB
<missing>      /bin/sh -c #(nop) CMD ["bash"]                    0B
<missing>      /bin/sh -c #(nop) ADD file:1dd78a123212328bd...   123MB

Has the image size changed?

$ docker images | grep node-
node-multi-stage   331b81a245b1   678MB
node-vanilla       075d229d3f48   679MB

The final image (node-multi-stage) is slightly smaller.

You’ve reduced the image size, even for a very small application.

But the whole image is still huge!

Is there any way to make it smaller?

2. Use distroless to remove all unnecessary stuff from the container

This image contains Node.js as well as yarn, npm, bash, and other binaries. It is also based on Debian, so you get a complete operating system with all its little binaries and utilities.

You don’t need any of that to run the container. You only need Node.js.

A Docker container should contain only one process and the minimum set of files needed to run it. You don’t need an entire operating system.

In fact, you can delete everything except Node.js.

But how?

Fortunately, Google provides distroless base images for exactly this.

Here’s how the distroless repository describes them:

The distroless image contains only the application and its runtime dependencies, not the package manager, shell, or any other programs found in standard Linux distributions.

Just what you need!

You can adjust the Dockerfile to take advantage of the new base image, as shown below:

FROM node:8 as build
WORKDIR /app
COPY package.json index.js ./
RUN npm install
FROM gcr.io/distroless/nodejs
COPY --from=build /app /
EXPOSE 3000
CMD ["index.js"]

You can build the image as usual:

$ docker build -t node-distroless .

This image should work fine. To verify it, run the container like this:

$ docker run -p 3000:3000 -ti --rm --init node-distroless

You can now visit the http://localhost:3000 page.

So is an image without all those extra binaries much smaller?

$ docker images | grep node-distroless
node-distroless   7b4db3b7f1e5   76.7MB

Only 76.7 MB.

That’s about 600MB smaller than the previous image!
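Once the human-readable sizes are turned into numbers, the comparison is simple arithmetic. This is a small sketch. It assumes Docker prints these sizes in decimal (1000-based) units such as kB and MB. `parseDockerSize` is a hypothetical helper, not part of the Docker CLI.

```javascript
// Convert the human-readable sizes printed by `docker images` / `docker history`
// (for example "76.7MB" or "129kB") into bytes, assuming decimal (1000-based) units.
const UNITS = { B: 1, kB: 1e3, KB: 1e3, MB: 1e6, GB: 1e9, TB: 1e12 };

function parseDockerSize(text) {
  const match = /^(\d+(?:\.\d+)?)\s*(B|kB|KB|MB|GB|TB)$/.exec(text.trim());
  if (!match) throw new Error(`Unrecognized size: ${text}`);
  // Round to whole bytes to avoid floating-point noise such as 76700000.00000001.
  return Math.round(Number(match[1]) * UNITS[match[2]]);
}

// Savings of the distroless image over the vanilla one, using the sizes above.
const savedBytes = parseDockerSize('679MB') - parseDockerSize('76.7MB');
console.log(`${(savedBytes / 1e6).toFixed(1)}MB smaller`); // 602.3MB smaller
```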

But there are a few caveats when using distroless.

If you want to inspect a container while it is running, you can attach to it with the following command:

$ docker exec -ti <insert_docker_id> bash

Attaching to a running container and starting bash is like opening an SSH session.

But distroless images are stripped of the operating system’s extra binaries, so there’s no shell in the container!

How do you attach to a running container without a shell?

The answer is, you can’t. This is both bad news and good news.

The bad news is that you can only run the binaries present in the container, and here the only one is Node.js:

$ docker exec -ti <insert_docker_id> node

The good news is that an attacker who exploits your application to get into the container can do much less damage without a shell. In other words, fewer binaries mean a smaller image and better security, at the cost of more painful debugging.
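Without a shell, the Node.js REPL opened by `docker exec -ti <insert_docker_id> node` is the only interactive tool left. Node’s built-in `fs` and `path` modules can still answer simple questions, such as which of the binaries mentioned earlier are present. The `findOnPath` helper below is hypothetical and is not part of distroless or Docker. It only checks whether a file with each name exists in the PATH directories. The result depends on the image’s PATH, and the `node` binary itself may live outside it.

```javascript
// Hypothetical helper to paste into the Node.js REPL inside a shell-less container.
// For each name, return the first PATH directory entry that contains it, or null.
const fs = require('fs');
const path = require('path');

function findOnPath(names, pathVar = process.env.PATH || '') {
  const dirs = pathVar.split(path.delimiter).filter(Boolean);
  const result = {};
  for (const name of names) {
    result[name] = null;
    for (const dir of dirs) {
      const candidate = path.join(dir, name);
      if (fs.existsSync(candidate)) {
        result[name] = candidate;
        break;
      }
    }
  }
  return result;
}

// Check for the binaries discussed above; output depends on the image.
console.log(findOnPath(['sh', 'bash', 'npm', 'yarn', 'node']));
```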

Perhaps, instead of attaching to and debugging containers in production, you should rely on logging and monitoring.
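If you rely on logging, writing one JSON object per line to stdout lets `docker logs` (and any log collector reading container output) pick up the entries without needing a shell in the container. This is a minimal sketch. The `formatLogLine` and `log` helpers and their field names are hypothetical, not a standard format.

```javascript
// Minimal structured logger: one JSON object per line on stdout,
// which Docker captures and shows with `docker logs <container>`.
function formatLogLine(level, message, fields = {}) {
  return JSON.stringify({
    time: new Date().toISOString(),
    level,
    message,
    ...fields,
  });
}

function log(level, message, fields) {
  process.stdout.write(formatLogLine(level, message, fields) + '\n');
}

// Example: what index.js could log instead of a plain console.log.
log('info', 'Example app listening', { port: 3000 });
```

JSON.stringify escapes newlines inside messages, so every entry stays on a single line and is easy to parse downstream.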

But what if you do need to debug, and you want to keep it small?

3. Small Alpine base images

You can replace the distroless base image with an Alpine-based one.

Alpine Linux is:

A security-oriented, lightweight Linux distribution based on musl libc and BusyBox.

In other words, it’s a smaller and more secure Linux distribution.

But you shouldn’t take their word for it, so let’s check whether the image really is smaller.

Modify the Dockerfile to use node:8-alpine:

FROM node:8 as build
WORKDIR /app
COPY package.json index.js ./
RUN npm install
FROM node:8-alpine
COPY --from=build /app /
EXPOSE 3000
CMD ["npm", "start"]

Build the image using the following command:

$ docker build -t node-alpine .

Now check the image size:

$ docker images | grep node-alpine
node-alpine   aa1f85f8e724   69.7MB

69.7MB!

That’s even smaller than the distroless image!

Can we attach to the running container now? Let’s try it.

Let’s start the container first:

$ docker run -p 3000:3000 -ti --rm --init node-alpine
Example app listening on port 3000!

You can attach to a running container using the following command:

$ docker exec -ti 9d8e97e307d7 bash
OCI runtime exec failed: exec failed: container_linux.go:296: starting container process caused "exec: \"bash\": executable file not found in $PATH": unknown

No bash, but maybe a plain shell?

$ docker exec -ti 9d8e97e307d7 sh
/ #

Success! You can now attach to the running container.

It looks promising, but there’s just one problem.

Alpine base images are built on musl libc, an alternative C standard library, while most Linux distributions such as Ubuntu, Debian, and CentOS use glibc. Both libraries are supposed to implement the same interface to the kernel.

But they have different goals:

glibc is more widespread and faster;

musl libc uses less space and focuses on security.

Most applications are compiled against a specific libc. If you want to use them with a different libc, you have to recompile them.

In other words, building containers on Alpine base images can lead to unexpected behavior, because the C standard library is different.

You are most likely to notice the difference with precompiled binaries, such as Node.js C++ addons.

For example, PhantomJS’s prebuilt package doesn’t run on Alpine.
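When an application has to load prebuilt native binaries, it can help to detect at runtime which C library the Node.js process runs on. This is a minimal sketch of one heuristic. It assumes a Node.js version that provides `process.report`, which the node:8 images used in this article do not. On glibc-based Linux, the diagnostic report header includes a `glibcVersionRuntime` field, and on musl-based systems such as Alpine it is absent. `detectLibc` is a hypothetical helper name.

```javascript
// Heuristic libc detection based on the Node.js diagnostic report:
// glibc-based Linux reports `header.glibcVersionRuntime`; musl (e.g. Alpine) does not.
function detectLibc() {
  if (process.platform !== 'linux') return 'not-linux';
  if (!process.report || typeof process.report.getReport !== 'function') {
    return 'unknown'; // older Node.js without diagnostic reports
  }
  let report = process.report.getReport();
  // Early implementations returned the report as a JSON string.
  if (typeof report === 'string') report = JSON.parse(report);
  return report.header && report.header.glibcVersionRuntime ? 'glibc' : 'musl-or-other';
}

console.log(detectLibc());
```

The result is deliberately conservative: a missing glibc field only says "not glibc", so the sketch reports `musl-or-other` rather than claiming musl.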

Which base image should you choose?

Should you use Alpine, distroless, or a vanilla base image?

If you run containers in production and security is your main concern, distroless images may be the better fit.

Every binary you add to a Docker image adds a certain amount of risk to the entire application.

Installing only one binary in the container reduces the overall risk.

For example, if an attacker were able to exploit a vulnerability in an application running on distroless, they would not be able to use a shell in the container because there is no shell there!

Note that OWASP itself recommends minimizing the attack surface.

If image size is all you care about, consider Alpine-based images.

They are very small, but at the cost of compatibility. Alpine uses a slightly different C standard library, musl libc, so you may run into compatibility issues from time to time.

Vanilla base images are great for testing and development.

They’re big, but they offer the same experience as a regular Linux workstation, and you have access to all of the operating system’s binaries.

Let’s review the size of each image:

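Image                                         Size
node-vanilla (node:8)                         679MB
node-multi-stage (node:8, multi-stage)        678MB
node-distroless (gcr.io/distroless/nodejs)    76.7MB
node-alpine (node:8-alpine)                   69.7MB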

