When we design a highly scalable microservice architecture, a key question is: how can the cluster rapidly scale out enough containers to cope with a surge of front-end traffic? Expanding a large number of back-end service resources in a short time touches every aspect of architecture and operations. One lever is the Docker image itself: a smaller image pulls faster over the same network bandwidth, which shortens the time until a new container is ready to serve.

Drawing on Best practices for writing Dockerfiles and Best Practices – Running your application with Amazon ECS, this article reviews techniques for slimming Docker images.

To sum up, there are several ways to reduce the size of an image:

  • Use a suitably small base image
  • Reduce the number of layers in the Docker image
  • Include only the minimal resources needed to run the program
    • Eliminate unnecessary resources
    • Trim resources according to the runtime characteristics of the language
  • Use multi-stage builds

Use a suitably small base image

One of the easiest and most common strategies is to keep the base image in the FROM statement as small as possible while preserving performance and stability.

For example, suppose I have a Java project to run and need an OpenJDK image as the base. Which OpenJDK image is the most appropriate?

Here are a few options retrieved from the openjdk repository on Docker Hub:

TAG                  Compressed Size
openjdk:11-jre-slim  76.39 MB
openjdk:11-jre       117.76 MB
openjdk:11-slim      225.52 MB
openjdk:11           318.71 MB
openjdk:17           231.66 MB
openjdk:17-slim      210.62 MB
openjdk:17-alpine    181.71 MB

Clearly, openjdk:11-jre-slim is the best choice when our project requires JDK 11, and openjdk:17-slim is the best choice if our project targets JDK 17.
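As a minimal sketch of how such a base image is used (the jar name and path are hypothetical placeholders for your own build output):

FROM openjdk:11-jre-slim
WORKDIR /app
# app.jar is a hypothetical fat jar produced by your build (e.g. Maven/Gradle)
COPY target/app.jar app.jar
ENTRYPOINT ["java", "-jar", "app.jar"]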

Why don’t I recommend the openjdk:xxx-alpine option?

This image is based on the popular Alpine Linux project, available in the alpine official image. Alpine Linux is much smaller than most distribution base images (~5MB), and thus leads to much slimmer images in general.

OpenJDK images based on Alpine Linux are indeed the smallest most of the time, but not always. A Java program that only needs a JRE does not need a full JDK environment, so when a jre variant exists, the Alpine-based image is not necessarily the smallest: in the table above, openjdk:11-jre-slim is the smallest base image.

The OpenJDK port for Alpine is not in a supported release by OpenJDK, since it is not in the mainline code base. It is only available as early access builds of OpenJDK Project Portola. See also this comment. So this image follows what is available from the OpenJDK project’s maintainers.

What this means is that Alpine based images are only released for early access release versions of OpenJDK. Once a particular release becomes a “General-Availability” release, the Alpine version is dropped from the “Supported Tags”; they are still available to pull, but will no longer be updated.

Alpine is not preferred for the two reasons explained above:

  • The OpenJDK port for Alpine is not officially supported and is not in the OpenJDK mainline code base. It can be used for early testing and verification before an OpenJDK release reaches GA, but after an OpenJDK LTS version goes GA, subsequent maintenance is minimal, so the reliability and performance of Java on Alpine are not necessarily rigorously tested and guaranteed upstream
  • The Alpine OpenJDK images are only built and released during the early-access phase before GA. After GA, the tag is retained and can still be pulled, but it is no longer updated

What do the slim and jre variants mean?

First, the jre variant represents a trimmed Java runtime. The JRE is the basic execution environment for Java; it removes the development tooling that the full JDK carries, thereby saving space.

Second, the slim variant represents a pruned Linux compared to the full distribution. It removes components that are rarely used inside containers; for example, the curl command may not be preinstalled because of this pruning.

In conclusion, it is best to choose a base image with both the jre and slim suffixes: it is trimmed on both the Linux side and the JDK side, minimizing image size without affecting the performance of Java programs.

What is the plain image good for?

The plain openjdk image, without jre, slim, or alpine in the tag, ships a full JDK and a full Linux environment, and in most cases it is the no-brainer choice: if you don't know what to pick, it is a safe default. The image is much larger, but its advantage is completeness. It includes full Linux package management and shell tools that let you perform detailed operations; for example, you can use curl for a health check, whereas on a slim image the same health check may fail simply because curl is missing. Weigh the slim variant's small size against the completeness you actually need.
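For example, the plain image lets you declare a curl-based container health check directly in the Dockerfile; the port and endpoint below are hypothetical:

HEALTHCHECK --interval=30s --timeout=3s --retries=3 \
  CMD curl -f http://localhost:8080/health || exit 1

On a slim base without curl installed, this same check would report the container as unhealthy.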

Reduce the number of layers in the Docker image

A Docker image is made up of layers (up to 127 of them). Each instruction in a Dockerfile creates an image layer, but only RUN, COPY, and ADD increase the size of the image.

Only the instructions RUN, COPY, ADD create layers. Other instructions create temporary intermediate images, and do not increase the size of the build.

So when we use these three instructions, we should take special care to combine shell commands into a single RUN statement, rather than writing one RUN per command.

For example, the following usage:

RUN apt update -y
RUN apt install curl -y

can be merged with &&:

RUN apt update -y && apt install curl -y

This saves one layer. In addition, merging these two particular commands also avoids an unexpected problem.

The possible problem is described in Best practices for writing Dockerfiles (the RUN section):

Using apt-get update alone in a RUN statement causes caching issues, and subsequent apt-get install instructions can fail:

Docker sees the initial and modified instructions as identical and reuses the cache from previous steps. As a result the apt-get update is not executed because the build uses the cached version. Because the apt-get update is not run, your build can potentially get an outdated version of the curl and nginx packages.

So merging the commands is the right approach.
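You can verify the layer count and per-layer sizes with docker history, which lists every layer of a local image (my-image is a placeholder tag):

docker history my-image:latest

After the merge, the two apt commands show up as a single layer instead of two.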

Include only the minimal resources needed to run the program

When building an image from a Dockerfile, intermediate files may be generated along the way, or irrelevant files may be accidentally copied in. This also needs attention.

The first step is to eliminate unnecessary resources.

The image needs only the resources the program uses; all irrelevant files, such as images and documents, can be excluded with a .dockerignore file. It works on the same principle as .gitignore.
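A sketch of a typical .dockerignore; the entries are illustrative and should be adapted to your project:

# VCS metadata
.git
.gitignore
# docs and images not needed at runtime
docs/
*.md
# local dependency and build caches
node_modules/
*.log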

Trim resources according to the runtime characteristics of the language

Sometimes we need to install Linux tools during the build, for example curl on a slim image so an internal health check can run. When we run apt/dnf package managers, they can generate a large amount of temporary cache that is easily, and unintentionally, baked into the image. So I suggest combining the merged-instruction technique above with rm -rf /var/lib/apt/lists/* to clear the cache immediately after installation:

RUN apt update -y && apt install -y curl && rm -rf /var/lib/apt/lists/*
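The same idea applies to dnf-based images, where dnf clean all removes the package caches (a sketch, assuming a dnf-based base image):

RUN dnf install -y curl && dnf clean all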

For example, Golang is a purely compiled language: once the program is compiled, it only needs a basic Linux environment to run.

Some people might write FROM golang:1.17 and think it is the perfect environment for running Go, but it is not. The binary that Go builds only needs Linux: you can compile in the golang image and run the result on debian:bullseye-slim, which is much smaller and does not affect performance at all.
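A minimal sketch of that idea, with hypothetical project layout and binary name; a fuller official example appears in the multi-stage section below:

# build stage: the full Go toolchain is only needed to compile
FROM golang:1.17 AS build
WORKDIR /src
COPY . .
RUN go build -o /bin/app .

# run stage: the compiled binary only needs a basic Linux environment
FROM debian:bullseye-slim
COPY --from=build /bin/app /usr/local/bin/app
ENTRYPOINT ["/usr/local/bin/app"]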

Then there are interpreted languages like Node.js and Python 3. We need to run a package manager to download dependencies before the program can run, but at runtime the program only needs itself and its dependency packages; the package manager, and the caches and temporary files it generates, are not needed. We can use the multi-stage build technique (described later) to drop these run-irrelevant files in a later stage, even though they were essential early in the build.

In addition, an interpreted language ecosystem may ship debugging and auxiliary tools that are useful during development but unnecessary in production. Be careful to install only production-level dependencies and exclude ancillary development components.

When building a Docker image for production, we want to install only production dependencies, and in a deterministic way. This leads to the recommended best practice for installing npm dependencies in a container image:

RUN npm ci --only=production

10 Best practices to Containerize Node.js Web Applications with Docker
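In a Dockerfile, that might look like the following sketch; npm ci requires a package-lock.json, and the file names here are the npm defaults:

FROM node:14-slim
WORKDIR /app
# copy only the manifests first so the dependency layer is cached
COPY package.json package-lock.json ./
# deterministic install of production dependencies only
RUN npm ci --only=production
COPY . .
CMD ["node", "index.js"]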

Use multi-stage builds

Multi-stage builds are the most involved technique in our build process, but also an extremely practical one. Interpreted languages such as Node.js, mentioned above, can take full advantage of it to slim down, and compiled languages such as Java and Go benefit as well.

Its core idea: the image build is split into multiple stages, where each stage takes the output of the previous stage as its input. This lets later stages exclude the discarded files that could not be excluded while the earlier stages ran.

There are usually two rough stages: compile/download dependencies, then build the actual image.

Take the official examples:

For the Go build, the first stage uses the golang base image to download dependencies and compile, and the second stage copies only the binary produced by the first stage, using the minimal scratch image as the actual runtime environment.

# syntax=docker/dockerfile:1
FROM golang:1.16-alpine AS build

# Install tools required for project
# Run `docker build --no-cache .` to update dependencies
RUN apk add --no-cache git
RUN go get github.com/golang/dep/cmd/dep

# List project dependencies with Gopkg.toml and Gopkg.lock
# These layers are only re-built when Gopkg files are updated
COPY Gopkg.lock Gopkg.toml /go/src/project/
WORKDIR /go/src/project/
# Install library dependencies
RUN dep ensure -vendor-only

# Copy the entire project and build it
# This layer is rebuilt when a file changes in the project directory
COPY . /go/src/project/
RUN go build -o /bin/project

# This results in a single layer image
FROM scratch
COPY --from=build /bin/project /bin/project
ENTRYPOINT ["/bin/project"]
CMD ["--help"]

For the Node.js build, the first stage downloads the dependencies, and the second stage copies only the installed dependencies from the previous stage, leaving behind npm's caches and intermediate files, while using the slimmer node:14-slim image as the actual runtime environment.

FROM node:14 AS build
WORKDIR /srv
ADD package.json .
RUN npm install

FROM node:14-slim
COPY --from=build /srv .
ADD . .
EXPOSE 3000
CMD ["node", "index.js"]

From these two examples, it is clear that multi-stage builds can slim down both compiled and interpreted languages effectively.
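Finally, you can compare sizes before and after applying these techniques with docker images, which shows the size of each local image (my-image is a placeholder repository name):

docker images my-image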