Containerizing an application always involves building images, and a smaller image saves disk space and transfers faster over the network. This article describes the measures I took to reduce image size. Not every measure has a noticeable effect, and each project needs its own analysis. I use the image build of a Rails project as the example.

Why is the image so large?

Before optimizing the image size, we need to know why the image is so large in the first place. Here is the Dockerfile used to build the image in my project:

FROM ruby:2.5.3
RUN apt-get update -y && apt-get install -y \
        build-essential \
        imagemagick \
        default-libmysqlclient-dev
RUN apt-get install -y \
        nodejs \
        yarn
RUN rm -rf /var/lib/apt/lists/*
WORKDIR /beansmile-web
COPY . /beansmile-web
RUN bundle install

Building this Dockerfile, which I wrote without much thought, produces the following image:

web1                latest               1a8a32d5253a        9 hours ago          1.26GB

Building an image is much like setting up the infrastructure for a project on an operating system. An everyday operating system usually hosts more than one project, so it ships with a broad set of tools. An image, by contrast, is meant to serve one specific project, so it should contain only what that project depends on and nothing more.

For the Dockerfile above, I see several possible optimization directions:

  1. The base ruby:2.5.3 image is built on buildpack-deps, which bundles many extra packages. A lighter base image might save further space.
  2. The file contains several RUN commands, and each one adds a layer, which may increase the image size.
  3. Copying the entire project directory into the image is not a good idea. As the project grows, the public directory of a Rails project may fill up with static resources such as pictures, and it makes no sense to pack them into the image.
  4. Could a multi-stage build discard some of the less critical resources and make the final image smaller?

Let’s do it one by one.

A concrete analysis

Direction 1: Based on smaller operating systems

The previous example ended up with a very large image, largely because the base image itself is large.

REPOSITORY          TAG                  IMAGE ID            CREATED              SIZE
ruby                2.5.3                60c3a1518797        3 weeks ago          871MB
web1                latest               1a8a32d5253a        9 hours ago          1.26GB

As you can see, the base Ruby image alone is over 800MB, and installing dependencies during the build brings the final web image to 1.26GB. That size is unfriendly to network transfer. Many variants of the Ruby base image are available: besides the Ruby version itself, you can choose among base images built on different operating systems, and their sizes differ greatly:

REPOSITORY          TAG                  IMAGE ID            CREATED              SIZE
ruby                2.5.3-slim-stretch   20132a4ab93d        2 weeks ago          129MB
ruby                2.5.3                60c3a1518797        3 weeks ago          871MB
ruby                2.5.3-alpine         b3361f13ff1f        3 weeks ago          49.1MB

The Alpine-based Ruby image is by far the smallest, at under 50MB. slim-stretch is also a good option. Switching to a more lightweight base image looks like a real opportunity for optimization.
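To reproduce this comparison locally, something like the following should work. It is only a sketch: it assumes Docker is installed and that the three tags from the listing above are still published on the registry.

```shell
# Tags taken from the listing above.
tags="2.5.3 2.5.3-slim-stretch 2.5.3-alpine"

# Pull each variant of the official ruby image.
for tag in $tags; do
  docker pull "ruby:$tag"
done

# List every local image in the "ruby" repository, with its size.
docker images ruby
```

The sizes you see may differ slightly from the ones quoted here, since the images behind a tag can be rebuilt over time.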

Tip: In my experience, slim-stretch may be the friendlier choice. It is Debian-based and uses apt-get as its package manager, the same as Ubuntu, so anyone used to Ubuntu will feel at home. Alpine uses apk as its package manager, and some commonly used packages go by different names there, which take time to track down.

Either way, switching takes time. There are few ready-made recipes online, so you have to work out for yourself which packages your project depends on and install them during the build.

Direction 2: Reduce the number of image layers

According to Docker's official documentation, an image is made up of a stack of read-only layers, and the fewer the layers, the better the image performs. This is also why we are advised to build project images on a purpose-specific base image rather than a bare operating system image such as Ubuntu.

In the example above, package installation and cleanup were spread over three RUN commands, which created two more layers than necessary. They can be combined into a single RUN command:

RUN apt-get update -y && apt-get install -y \
        build-essential \
        imagemagick \
        default-libmysqlclient-dev \
        nodejs \
        yarn \
        && rm -rf /var/lib/apt/lists/*

Building a new image, web2, with this change gives:

REPOSITORY          TAG                  IMAGE ID            CREATED              SIZE
web2                latest               221a316a6903        14 minutes ago       1.25GB
web1                latest               1a8a32d5253a        9 hours ago          1.26GB

As you can see, this change has no obvious effect on the image size.

The official documentation puts it this way:

In older versions of Docker, it was important that you minimized the number of layers in your images to ensure they were performant.

We can conclude that the main reason to reduce the number of layers is probably to make the image perform better. It does little to reduce the image size, but it is still worth doing.
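One way to see where the space actually goes is docker history, which lists the layers of an image along with the size each one adds. A minimal sketch, assuming the web1 and web2 images built above still exist locally:

```shell
# Print the layer-by-layer breakdown of both images for comparison.
for image in web1 web2; do
  echo "== $image =="
  docker history "$image"
done
```

A file deleted in a later RUN still occupies space in the layer that created it, so rm -rf /var/lib/apt/lists/* only saves space when it runs in the same RUN command as apt-get update. That is likely where the small difference between web1 and web2 comes from.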

Direction 3: Ignore some files

As the Dockerfile above shows, I copied the entire project into the image (the COPY command) to keep the build simple. However, not every file matters to the image; what it mainly needs is the source code. So I want the following directories left out of the image:

  • public/: stores static files, and can inflate the image if it holds many resources such as pictures.
  • tmp/: stores caches, process files, and so on, none of which are useful in the image.
  • log/: stores log files.

PS: Of course, what to exclude depends on the project. This list reflects my own project and is not universal.

To ignore these files, we use a file called .dockerignore, placed in the current directory (the root of the build context). Its syntax is similar to that of .gitignore, and it looks something like this:

/public/**
/tmp/**
/log/**

Then rebuild the image

web3                latest               fb13cc1301b2        About a minute ago   1.2GB
web2                latest               221a316a6903        23 hours ago         1.25GB
web1                latest               1a8a32d5253a        33 hours ago         1.26GB

This doesn't have much of an impact either, because these "junk" resources currently make up only a small share of my local project directory.
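A quick way to estimate what .dockerignore will save before rebuilding is to measure the excluded directories. A small sketch, assuming it is run from the project root and using the three directories listed above:

```shell
# Report the on-disk size of each directory that .dockerignore excludes.
# Directories that do not exist are skipped.
for dir in public tmp log; do
  if [ -d "$dir" ]; then
    du -sh "$dir"
  fi
done
```

If the totals are only a few megabytes, as in my case, the image will shrink by roughly that much and no more.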

Direction 4: Multi-stage scheme

This is the officially recommended approach, available since Docker 17.05:

In Docker 17.05 and higher, you can do multi-stage builds and only copy the artifacts you need into the final image. This allows you to include tools and debug information in your intermediate build stages without increasing the size of the final image.

It sounds a bit complicated, but the principle is simple: use a larger image with a complete set of dependencies to build the required resources, copy those resources into a lightweight base image, and continue the build from there, leaving the large base image behind. This keeps a pile of unneeded dependencies out of the final image and reduces its size to some extent.

This seemed like a good strategy, so I tried it on my project. The plan was to install the bundle dependencies and compile the static files in a full-featured base image, then copy the required resources into a lightweight base image (such as an Alpine-based one) and continue the build there.

However, I encountered the following problems during the build process

  • Installing dependencies with Bundler does more than fetch Ruby code. Gems such as mysql2 and nokogiri compile native extensions at install time and produce shared libraries. Copying the dependencies from one image to another therefore means copying not only the Ruby code in the bundle directory but also the shared libraries those gems rely on, which is harder than expected.
  • I wanted static files to be built in a separate image, so that the final image would not need nodejs, yarn, and the other tools used only for building static resources. Later I felt this was not a good fit. For one thing, the image with nodejs installed is only about 30MB larger than the one without. For another, running bin/rails c depends on a JS runtime, and that is an important operation in both development and production, so dropping the JS runtime from the final image is not a good idea.

The final build

Four optimization directions were mentioned earlier, but in the end only two of them have a large effect on the final image size:

  • Build with a more lightweight operating system base image.
  • Multi-stage builds.

Considering that the multi-stage approach would probably cost more trouble than it saves, I abandoned it and went straight to the bare-bones ruby:2.5.3-alpine as the base image for my project. The biggest problem with a lean operating system is that every basic dependency has to be resolved one by one while building the project image, which takes a lot of time and effort. Below is the Dockerfile I arrived at after repeated testing (for reference only; your project may depend on different things).

FROM ruby:2.5.3-alpine
RUN apk --update --upgrade add \
        # needed by bundle install
        git \
        curl \
        # required by mysql2
        mysql-dev \
        # build toolchain (gcc and related tools)
        build-base \
        # nokogiri dependencies
        libxslt-dev \
        libxml2-dev \
        # image processing
        imagemagick \
        # time zone data; bundle reports an error without it
        tzdata \
        nodejs \
        yarn \
        && rm -rf /var/cache/apk/*

WORKDIR /beansmile-web
COPY . /beansmile-web/
RUN bundle install

The constructed image is shown below

web4                latest               71b75128d0d9        14 hours ago         586MB

That is a huge reduction compared with the previous images. Given the time further optimization would cost, this is a size I can accept, so I stopped compressing here.

Conclusion

This article briefly summarizes my own exploration of reducing the image size of a Rails project. It covers four optimization directions, of which building on a minimal operating system proved very effective. Different kinds of projects in different languages will have different priorities, so the results cannot be generalized. In some projects, a multi-stage build may pay off more.