Introduction: Container technology is indispensable for building and deploying software in the cloud native era. When it comes to containers, almost everyone subconsciously thinks of Docker. Docker has two very important concepts: Image and Container. The former is a static view that packages the application's directory structure, runtime environment, and so on; the latter is a dynamic view (a process) that reflects the running state of the program (CPU, memory, storage) and other information. This article focuses on Dockerfile writing tips that make the build process faster and the resulting images smaller.

Hello, everyone. My name is Chen Zefeng. I am responsible for the Flow pipeline orchestration and task scheduling engine in Cloud Efficiency. Within the Cloud Efficiency product family we serve enterprise users of every R&D scale and technical depth, and we have received a great deal of user feedback. For users of Flow cloud builds, build speed is consistently a key concern. While analyzing user cases in depth, we found many common problems that can be fixed simply by adjusting the project or build configuration, greatly improving build performance and further accelerating CI/CD. Today, starting from container image builds, we summarize some optimization techniques that are very useful in real-world engineering.


Image definition

First, let's take a look at the Docker image, which consists of multiple read-only layers stacked on top of each other, each layer being an incremental change to the previous one. When a new container is created from an image, a new writable layer is added on top of the read-only layers. This layer is often referred to as the "container layer." The following figure shows the container view of an application image based on the docker.io/centos image.

From the figure, we can see the process of image building and container starting.

  • First, pull the base image docker.io/centos;
  • Start a container from docker.io/centos, run yum update, and docker commit the result as a new read-only layer v1, which produces temporary image A;
  • Start a new container based on the temporary image A, install and configure software such as an HTTP server, and commit a new read-only layer v2, producing image B, the version ultimately referenced by developers;
  • For containers running from image B, an additional read/write layer is added on top (creating, modifying, and deleting files in the container take effect in this layer). A command-level sketch of this flow follows below.
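
As a minimal shell sketch of this commit-based flow (the container names and the image tags myapp:A / myapp:B are hypothetical; in practice a Dockerfile build usually does this for you):

# pull the base image
docker pull docker.io/centos
# run yum update in a container and commit it as temporary image A (read-only layer v1)
docker run --name update-c docker.io/centos yum update -y
docker commit update-c myapp:A
# install an HTTP server on top of A and commit it as image B (read-only layer v2)
docker run --name httpd-c myapp:A yum install -y httpd
docker commit httpd-c myapp:B
# containers started from B get their own writable container layer on top
docker run -d myapp:B httpd -DFOREGROUND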

Image source

An image is mainly generated by Docker reading and executing the instructions in a Dockerfile, for example:

FROM ubuntu:18.04
COPY . /app
RUN make /app
CMD python /app/app

Its core logic is to define the base image to build on, COPY files from the build context into the image, RUN user-defined build scripts, and finally define the CMD or ENTRYPOINT that starts the container. Building more efficient images revolves around optimizing these concepts.
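
As a usage sketch (the tag myapp:latest is just a placeholder), the image is then built and run like this:

# build an image from the Dockerfile in the current directory
docker build -t myapp:latest .
# start a container from the image; CMD/ENTRYPOINT defines what runs
docker run --rm myapp:latest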

Dockerfile optimization techniques

Use a base image hosted in a domestic registry

As a cloud-based product, Flow provides a fresh build environment for every build to avoid the operation and maintenance costs caused by environment pollution. Because of this, every Flow build re-downloads the base image specified in the Dockerfile.

If the base image specified in the Dockerfile comes from Docker Hub, the download may be slow due to network latency, for example:

  • FROM nginx
  • FROM java:8
  • FROM openjdk:8-jdk-alpine

The typical symptom is that the image pull step at the start of the build takes a long time.

You can push the base image to a domestic image registry yourself and modify your Dockerfile by following these steps (a consolidated command sketch follows the list):

  1. Pull the foreign image to your local machine: docker pull openjdk:8-jdk-alpine;

  2. Push the base image to a domestic region (such as Beijing or Shanghai) of the Alibaba Cloud image registry (cr.console.aliyun.com): docker tag openjdk:8-jdk-alpine registry.cn-beijing.aliyuncs.com/yournamespace/openjdk:8-jdk-alpine and docker push registry.cn-beijing.aliyuncs.com/yournamespace/openjdk:8-jdk-alpine;

  3. Change the FROM line in your Dockerfile to pull the image from your own repository: FROM registry.cn-beijing.aliyuncs.com/yournamespace/openjdk:8-jdk-alpine.
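
Put together, a minimal sketch of the three steps (yournamespace is a placeholder, and docker login is assumed to be required before pushing):

# log in to your Alibaba Cloud registry first
docker login registry.cn-beijing.aliyuncs.com
# 1. pull the foreign base image
docker pull openjdk:8-jdk-alpine
# 2. re-tag it and push it to your own domestic repository
docker tag openjdk:8-jdk-alpine registry.cn-beijing.aliyuncs.com/yournamespace/openjdk:8-jdk-alpine
docker push registry.cn-beijing.aliyuncs.com/yournamespace/openjdk:8-jdk-alpine
# 3. then reference it in your Dockerfile:
#    FROM registry.cn-beijing.aliyuncs.com/yournamespace/openjdk:8-jdk-alpine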

Use the smallest base image that is sufficient

In addition to occupying more disk space, a large image consumes more network resources during application deployment and takes longer to start the service. Use a smaller base image, such as Alpine. Here we compare the size of an image that packages the mysql-client binary, built on Alpine versus Ubuntu.

FROM alpine:3.14
RUN apk add --no-cache mysql-client
ENTRYPOINT ["mysql"]

FROM ubuntu:20.04
RUN apt-get update \
    && apt-get install -y --no-install-recommends mysql-client \
    && rm -rf /var/lib/apt/lists/*
ENTRYPOINT ["mysql"]

It can be seen that using a base image as small as possible can greatly reduce the size of the image.
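
To reproduce the comparison yourself, a quick sketch (the file names Dockerfile.alpine and Dockerfile.ubuntu and the image tags are hypothetical):

# build both variants from their respective Dockerfiles
docker build -t mysql-client:alpine -f Dockerfile.alpine .
docker build -t mysql-client:ubuntu -f Dockerfile.ubuntu .
# compare the SIZE column of the two images
docker images | grep mysql-client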

Reduce the directories and files pulled into the build context

Docker uses a client/server architecture. When a user runs docker build, the image is not built on the client; instead, the directory specified for the build is sent to the server as the build context, and the image build process described above runs there. If the build context would include a large number of unnecessary files, you can use a .dockerignore file to exclude them (similar to .gitignore: files it matches are not tracked or transferred).

Let's use a small example to see how the build context is transferred.

mkdir myproject && cd myproject
echo "hello" > hello
echo -e "FROM busybox\nCOPY / /\nRUN cat /hello" > Dockerfile
docker build -t helloapp:v1 --progress=plain .
#7 [internal] load build context
#7 sha256:6b998f8faef17a6686d03380d6b9a60a4b5abca988ea7ea8341adfae112ebaec
#7 transferring context: 26B done
#7 DONE 0.0s

When we place a large file under myproject that is unrelated to the application (or a large number of small files that are not build dependencies) and rebuild as helloapp:v3, we find that 70 MB of content has to be transferred to the server and the image size grows to 71 MB.

#5 [internal] load build context
#5 sha256:746b8f3c5fdd5aa11b2e2dad6636627b0ed8d710fe07470735ae58682825811f
#5 transferring context: 70.20MB 1.0s done
#5 DONE 1.1s
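
To keep such files out of the context, a .dockerignore at the root of the build context might look like the sketch below (the file and directory names are hypothetical examples):

# .dockerignore: matching paths are neither tracked nor sent to the Docker daemon
bigfile.bin
*.log
*.tmp
.git
node_modules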

Reduce the number of layers and control the size of layers

If you treat writing a Dockerfile as no different from running a sequence of bash commands, you tend to produce too many image layers, with useless files buried inside them. Let's look at three ways of writing the same Dockerfile and the sizes of the resulting images.

  • The centos_git_nginx:normal image installs each package in its own RUN instruction:

    FROM centos
    RUN yum install -y git
    RUN yum install -y nginx

  • Merging the two RUN instructions into one reduces the image size to 384 MB, showing that reducing the number of layers can reduce the image size:

    FROM centos
    RUN yum install -y git && yum install -y nginx

  • Since yum install produces cache data that is not needed when the application runs, removing it in the same RUN instruction right after installation shrinks the image again, to 357 MB:

    FROM centos
    RUN yum install -y git && \
        yum install -y nginx && \
        yum clean all && rm -rf /var/cache/yum/*

TIPS: Each layer produced during an image build is read-only and cannot be modified afterwards. The following approach therefore does not reduce the image size; it merely adds one more useless layer.

FROM centos
RUN yum install -y git && \
    yum install -y nginx
RUN yum clean all && rm -rf /var/cache/yum/*

It is important to note that pursuing as few layers as possible is not always a good idea, as it reduces the probability that layers can be served from cache when building or pulling images.
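
To see how many layers an image has and how much each one contributes, you can inspect it with docker history (centos_git_nginx:normal is the example tag used above):

# each line of the output corresponds to one layer, showing the instruction that created it and its size
docker history centos_git_nginx:normal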

Put immutable layers in front and mutable layers behind

When we execute docker build several times on the same project, we can see that after the first build Docker directly reuses image data from its build cache.

Docker steps through the instructions in the Dockerfile, executing each one in the specified order. As each instruction is examined, Docker looks for an existing image in its cache that can be reused: starting from the parent image already in the cache, it compares the next instruction against all child images derived from that base image to see whether any of them was generated by exactly the same instruction. If not, the cache is invalidated from that point on.

Therefore, put instructions that rarely change, such as installing commonly used basic software like git and make, near the front of the Dockerfile. That way the image build can reuse previously generated cache layers instead of repeatedly downloading software, which wastes bandwidth and time.

Here we compare the two methods. First we initialize the related directories and files:

 mkdir myproject && cd myproject
 echo "hello" > hello
  • The first way of writing the Dockerfile copies the file first and then runs the software installation:

    FROM ubuntu:18.04
    COPY /hello /
    RUN apt-get update --fix-missing && apt-get install -y aufs-tools automake build-essential curl dpkg-sig libcap-dev libsqlite3-dev mercurial reprepro ruby1.9.1 && rm -rf /var/lib/apt/lists/*

Run docker build -t cache_test -f Dockerfile . to build the image. After the first successful build, running the same command several more times shows that subsequent builds hit the cache directly and produce the image almost instantly.

time docker build -t cache_test -f Dockerfile .
[+] Building 59.8s (8/8) FINISHED
 => [internal] load build definition from Dockerfile                         2.0s
 => => transferring dockerfile: 35B                                          0.9s
 => [internal] load .dockerignore                                            0.9s
 => => transferring context: 2B                                              0.0s
 => [internal] load metadata for docker.io/library/ubuntu:18.04              0.0s
 => [internal] load build context                                            0.0s
 => => transferring context: 26B                                             0.0s
 => [1/3] FROM docker.io/library/ubuntu:18.04                                0.0s
 => CACHED [2/3] COPY /hello /                                               0.0s
 => [3/3] RUN apt-get update --fix-missing && apt-get install -y aufs-tools automake build-essential curl dpkg-sig ... && rm -rf /var/lib/apt/lists/*  58.3s
 => exporting to image                                                       1.3s
 => => exporting layers                                                      1.3s
 => => writing image sha256:5922b062e65455c75a74c94273ab6cb855f3730c6e458ef911b8ba2ddd1ede18  0.0s
 => => naming to docker.io/library/cache_test                                0.0s
docker build -t cache_test -f Dockerfile .  0.33s user 0.31s system 1% cpu 1:00.37 total

time docker build -t cache_test -f Dockerfile .
docker build -t cache_test -f Dockerfile .  0.12s user 0.08s system 34% cpu 0.558 total

Now modify the hello file (echo "world" >> hello) and run time docker build -t cache_test -f Dockerfile . again. The build takes about 1 minute, because changing the copied file invalidates the cache for every layer after the COPY.

  • The second version of the Dockerfile puts the essentially unchanging software installation on top and the frequently changing hello file at the bottom:

    FROM ubuntu:18.04
    RUN apt-get update && apt-get install -y aufs-tools automake build-essential curl dpkg-sig && rm -rf /var/lib/apt/lists/*
    COPY /hello /

Run time docker build -t cache_test -f Dockerfile . — the first build takes about 1 minute. (Once the image has been built successfully, repeated builds hit the cache and finish almost immediately.)

Modify the hello file again and run time docker build -t cache_test -f Dockerfile .. This time the build takes less than 1s: the expensive RUN layer built at layer 2 is reused from the cache, and only the COPY layer is rebuilt.

Use multi-stage builds to separate build and runtime

Take Go as an example: clone the example repository github.com/golang/example to your local machine and add a Dockerfile that builds the application image.

FROM golang:1.17.6
ADD . /go/src/github.com/golang/example
WORKDIR /go/src/github.com/golang/example
RUN go build -o /go/src/github.com/golang/example/hello /go/src/github.com/golang/example/hello/hello.go
ENTRYPOINT ["/go/src/github.com/golang/example/hello"]

We can see that the resulting image is 943 MB, and the program prints "Hello, Go examples!" as expected.

Next, let's optimize the above process with a multi-stage build and a runtime image that is as small as possible.

FROM golang:1.17.6 AS BUILDER
ADD . /go/src/github.com/golang/example
RUN go build -o /go/src/github.com/golang/example/hello /go/src/github.com/golang/example/hello/hello.go

FROM golang:1.17.6-alpine
WORKDIR /go/src/github.com/golang/example
COPY --from=BUILDER /go/src/github.com/golang/example/hello /go/src/github.com/golang/example/hello
ENTRYPOINT ["/go/src/github.com/golang/example/hello"]

You can see that the resulting image is now only 317 MB. A multi-stage build decouples the application's build-time and runtime dependencies, so only the software needed at runtime ends up in the final application image.
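
A usage sketch to verify the result (the tag hello-multistage is a placeholder):

# build the multi-stage image and check its size
docker build -t hello-multistage .
docker images hello-multistage
# run it; the program should print the same greeting as before
docker run --rm hello-multistage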


This article is the original content of Aliyun and shall not be reproduced without permission.