First posted on the Jenkins Chinese community

It’s a story about Dailymotion’s journey from Jenkins to Jenkins X, the problems we had, and how we solved them.

Our context

At Dailymotion, we believe strongly in DevOps best practices and have invested heavily in Kubernetes. Some of our products are already deployed on Kubernetes, but not all of them. So when it came time to migrate our ad tech platform, we wanted to be fully “Kubernetes” — or cloud native — and follow the technology trend! This meant redefining the entire CI/CD pipeline, moving from static/permanent environments to dynamic, on-demand environments. Our goal was to empower our developers, reduce our time to market, and reduce our operating costs.

Our initial requirements for the new CI/CD platform were:

  • Avoid starting from scratch wherever possible: our developers are already used to Jenkins and declarative pipelines, and both fit our current needs well.
  • Target a public cloud infrastructure — Google Cloud Platform — and Kubernetes clusters
  • Be compatible with the GitOps methodology — because we love version control, peer review, and automation

There are quite a few players in the CI/CD ecosystem, but only one fit our needs: Jenkins X, which is based on Jenkins and Kubernetes, with native support for preview environments and GitOps.

Jenkins on Kubernetes

Jenkins X is fairly simple to set up and is well documented on its official website. Since we already used Google Kubernetes Engine (GKE), the jx command-line tool was able to create everything for us, including the Kubernetes cluster. There’s a bit of a wow effect here: getting a complete working system in just a few minutes is pretty impressive.

Jenkins X provides a lot of quickstarts and templates to add to the wow effect. However, at Dailymotion we already had repositories with Jenkins pipelines, and we wanted to reuse them. We decided to do things the “hard way” and refactor our declarative pipelines to make them compatible with Jenkins X.

In fact, this part is not specific to Jenkins X; it applies to running Jenkins on Kubernetes with the Kubernetes plugin. If you are used to a “classic” Jenkins, running static agents on bare metal or virtual machines, the major change here is that each build is performed in its own ephemeral pod. Each step of the pipeline can specify in which container of the pod it should be executed. There are some pipeline examples in the plugin’s source code. Our “challenge” here was to define the granularity of the containers and which tools they should contain: enough containers that we can reuse their images across pipelines, but not so many that maintenance gets out of control — we don’t want to spend our time rebuilding container images.
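
Here is a minimal sketch of what such a pipeline looks like with the Kubernetes plugin (the image, label, and build step are illustrative, not our actual pipeline): the pod template declares the containers, and each step specifies which container of the ephemeral pod it runs in:

pipeline {
    agent {
        kubernetes {
            label 'my-app-builder'
            yaml """
apiVersion: v1
kind: Pod
spec:
  containers:
  - name: golang
    image: golang:1.11
    command: ['cat']
    tty: true
"""
        }
    }
    stages {
        stage('Build') {
            steps {
                // run this step inside the "golang" container of the ephemeral pod
                container('golang') {
                    sh 'go build ./...'
                }
            }
        }
    }
}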

Previously, we used to run most pipeline steps in a Docker container, and when we needed to customize a step, we built the image on the fly in the pipeline, just before running it. This is slow but easy to maintain, because everything is defined in source code: upgrading the Go runtime version, for example, can be done in a pull request. So pre-building container images sounded like it would add complexity to an existing setup, but it also has several advantages: less duplication between repositories, faster builds, and fewer build failures caused by outages of third-party hosting platforms.

Building images on Kubernetes

This brings us to an interesting topic: building container images inside a Kubernetes cluster.

Jenkins X comes with a set of “build packs” that use “Docker in Docker” to build images from inside containers. But with new container runtimes appearing and Kubernetes introducing its Container Runtime Interface (CRI), we wanted to explore other options. Kaniko was the most mature solution that fit our needs and technology stack. We were excited…

… until we ran into two problems:

  • The first problem was a blocker for us: multi-stage builds didn’t work. Thanks to Google, we quickly found out that we weren’t the only ones affected, and there was no fix available yet. However, Kaniko is written in Go, and we are Go developers, so… why not look at the source code? It turned out that once we understood the root cause of the problem, the fix was easy. The Kaniko maintainers were happy to help and quickly merged the fix, so a day later a fixed Kaniko image was available.
  • The second problem was that we couldn’t use the same Kaniko container to build two different images. This is because of the way Jenkins uses Kaniko — not the way Kaniko expects — since we need to start the container first and only then run the build. This time we found a workaround on Google: declare as many Kaniko containers as there are images to build. But we didn’t like it. So back to the source code again: once we understood the root cause, the fix was easy.

We tested several solutions for building custom “tools” images for our CI pipelines, and in the end we chose a single repository with one Dockerfile and image per branch. Because we host the source code on GitHub and use the Jenkins GitHub plugin to build the repository, it builds all branches and creates new jobs for new branches on webhook events, which makes it easy to manage. Each branch has its own declarative pipeline in a Jenkinsfile, which uses Kaniko to build the image and push it to our container registry. This is great for quickly adding new images or editing existing ones, because we know Jenkins will take care of everything.
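
For illustration, here is a minimal sketch of what such a branch pipeline can look like with Kaniko (the registry, image name, and label are hypothetical, and the registry credentials setup is omitted): the Kaniko debug image is started first and kept alive with cat, and the build step then invokes the executor inside that container — the same “start the container first, then run the build” pattern mentioned above:

pipeline {
    agent {
        kubernetes {
            label 'tools-image-builder'
            yaml """
apiVersion: v1
kind: Pod
spec:
  containers:
  - name: kaniko
    image: gcr.io/kaniko-project/executor:debug
    command: ['/busybox/cat']
    tty: true
"""
        }
    }
    stages {
        stage('Build and push image') {
            steps {
                // the container is already running; the build is executed inside it
                container(name: 'kaniko', shell: '/busybox/sh') {
                    sh '''#!/busybox/sh
                    /kaniko/executor --context `pwd` --dockerfile Dockerfile --destination eu.gcr.io/my-project/tools:${BRANCH_NAME}
                    '''
                }
            }
        }
    }
}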

The importance of declaring the required resources

One of the major problems we had with our previous Jenkins platform came from the static agents/executors and the sometimes long build queues at peak times. Jenkins on top of Kubernetes makes this problem easy to solve, mainly because when running on a Kubernetes cluster it benefits from cluster autoscaling: the cluster simply adds or removes nodes based on the current load. But this is based on the resources requested, not on the resources observed to be in use. This means that it is our job, as developers, to define the required resources (CPU and memory) in the pod templates of our builds. The Kubernetes scheduler then uses this information to find a matching node to run the pod on — or it may decide to create a new one. This is great, because we no longer have long build queues. But in exchange, we need to be careful to define the right amount of resources, and to update them as we update our pipelines. Because resources are defined at the container level, not at the pod level, it is a little more complicated than that. But we don’t care about limits, only about requests, and the pod’s request is simply the sum of the requests of all its containers. So we simply write the resource request for the whole pod on the first container — the JNLP container — which is the default one. Here is an example of a Jenkinsfile we use, and how we declare the requested resources:

pipeline {
    agent {
        kubernetes {
            label 'xxx-builder'
            yaml """
kind: Pod
metadata:
  name: xxx-builder
spec:
  containers:
  - name: jnlp
    resources:
      requests:
        # illustrative values — the whole pod's request is declared on the jnlp container
        cpu: "2"
        memory: "2Gi"
"""
        }
    }
    // ...
}

Preview environment on Jenkins X

Now that we have all the tools and are able to build an image for our application, we are ready to take the next step: deploy it to a “preview environment”!

Jenkins X makes it easy to deploy preview environments by reusing existing tools — mainly Helm — as long as you follow the conventions, such as the name of the values used for the image tag. The best approach is to copy/paste from the Helm charts provided in the “build packs”. If you’re not familiar with Helm, it is basically a package manager for Kubernetes applications. Each application is packaged as a “chart”, which can then be deployed as a “release” using the helm command-line tool. The preview environment is deployed with the jx command-line tool, which is responsible for deploying the Helm chart and commenting on the GitHub pull request with the URL of the exposed service. This is all very nice, and it worked well for our first POCs, which were using plain HTTP. But it’s 2018, nobody uses HTTP anymore. Let’s encrypt it! Thanks to cert-manager, we can automatically get an SSL certificate for our new domain name when the ingress resource is created in Kubernetes. We tried to enable the tls-acme flag — which wires things up with cert-manager — in our setup, but it didn’t work. This gave us an opportunity to look at the source code of Jenkins X — which is also written in Go. After a few fixes, we could enjoy secure preview environments with automatic certificates provided by Let’s Encrypt.
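
For reference, this is roughly the kind of ingress resource involved — a minimal sketch with hypothetical host and service names: cert-manager watches for the kubernetes.io/tls-acme annotation and provisions a Let’s Encrypt certificate into the referenced secret:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: preview-myapp
  annotations:
    kubernetes.io/tls-acme: "true"  # picked up by cert-manager to request a certificate
spec:
  tls:
  - hosts:
    - myapp.pr-42.preview.example.com
    secretName: tls-preview-myapp
  rules:
  - host: myapp.pr-42.preview.example.com
    http:
      paths:
      - backend:
          serviceName: myapp
          servicePort: 80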

Another problem we had with the preview environments was about cleaning them up. A preview environment is created for each open pull request, so it should be deleted when the pull request is merged or closed. This is handled by a Kubernetes job set up by Jenkins X, which deletes the namespace used by the preview environment. The problem is that this job does not delete the Helm release — so if you run helm list, for example, you will still see a big list of old preview environments. In response, we decided to change the way we use Helm to deploy preview environments. The Jenkins X team had already written about these issues with Helm and Tiller (Helm’s server-side component), so we decided to use the helmTemplate feature flag and use Helm only as a template rendering engine, letting kubectl handle the rendered resources. This way, we don’t “pollute” the list of Helm releases with temporary preview environments.
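
A minimal sketch of what this template-only deployment can look like in a pipeline step (the chart path, release name, and namespace are hypothetical): Helm renders the manifests locally and kubectl applies them, so no release is recorded by Tiller:

stage('Deploy preview') {
    steps {
        container('helm') {
            // Render the chart locally (Helm 2 syntax) and apply the manifests directly,
            // so no Helm release is created for the temporary preview environment.
            sh "helm template ./charts/preview --name preview-${env.BRANCH_NAME} | kubectl apply -n preview-${env.BRANCH_NAME} -f -"
        }
    }
}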

GitOps applied to Jenkins X

At some point during our initial POC, we were happy with our setup and pipelines and wanted to turn the POC platform into something closer to production. The first step was to install the SAML plugin to set up our Okta integration — to allow our internal users to log in. It worked fine, but a few days later I noticed that our Okta integration was gone. I was busy with other things, so I just asked my colleague whether he had changed anything, and moved on. But when it happened again a few days later, I started investigating. The first thing I noticed was that the Jenkins pod had been restarted recently. But we have persistent storage, and our jobs were still there, so it was time to take a closer look! It turns out that the Helm chart used to install Jenkins has a startup script that resets the Jenkins configuration from a Kubernetes ConfigMap. Of course, we can’t manage a Jenkins running in Kubernetes the same way we manage a Jenkins running on a virtual machine!

So instead of manually editing the ConfigMap, we took a step back and looked at the big picture. The ConfigMap itself is managed by the jenkins-x-platform, so upgrading the platform would reset our custom changes. We needed to store our “customizations” in a safe place and keep track of our changes. We could have installed/configured everything the Jenkins X way, using an umbrella chart, but this method has a few drawbacks: it doesn’t support “secrets” — and we have sensitive values that we want to keep in our Git repository — and it “hides” all the sub-charts, so if we list all the installed Helm releases we only see one. But there are other Helm-based tools that are more GitOps-friendly. One of them is helmfile, which provides native support for secrets through the helm-secrets plugin and sops. I won’t go into the details of our setup right now — don’t worry, it will be the subject of my next blog post!
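
To give a rough idea of the shape of such a setup, here is a minimal, hypothetical helmfile.yaml (the repository URL, chart, and file names are illustrative): each component becomes its own release, and the sops-encrypted values file is decrypted on the fly by the helm-secrets plugin:

repositories:
  - name: jenkins-x
    url: http://chartmuseum.jenkins-x.io

releases:
  - name: jenkins
    namespace: jx
    chart: jenkins-x/jenkins
    values:
      - values/jenkins.yaml
    secrets:
      - secrets/jenkins.yaml   # sops-encrypted, decrypted via the helm-secrets plugin

Running helmfile apply then installs or upgrades each release, so the whole platform configuration can live in Git.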

The migration

Another interesting part of our story is the actual migration from Jenkins to Jenkins X — and how we handled repositories built by both systems. First, we configured the new Jenkins to build only the “jenkinsx” branches, and updated the old Jenkins configuration to build every branch except the “jenkinsx” branches. The plan was to prepare the new pipelines in the “jenkinsx” branch and merge it to migrate. This worked fine for our initial POC, but once we started using preview environments we had to open new pull requests, and those were not built by the new Jenkins because of the branch restriction. So instead we chose to build everything on both Jenkins instances, using a file named Jenkinsfile for the old Jenkins and a file named Jenkinsxfile for the new Jenkins. After the migration we will update this configuration and rename the files, but it was worth it: it gave us a smooth transition between the two systems, and each project can migrate on its own without affecting the others.

Our destination

So, is Jenkins X ready for you? To be honest, I don’t think so. Not all features and supported platforms — Git hosting or Kubernetes hosting — are stable enough yet. However, if you are ready to invest enough time to dig in and pick the stable features and platforms that fit your use cases, you will be able to improve your pipelines with everything required for proper CI/CD. This will shorten your time to market, reduce your costs, and — if you are also serious about testing — let you be confident in the quality of your software.

At the beginning, we said this is the story of our journey from Jenkins to Jenkins X. But our journey is not over; we are still traveling. Partly because our target is still moving: Jenkins X is still under heavy development, and it is itself heading toward serverless, currently by way of Knative build. Its destination is cloud-native Jenkins. It’s not there yet, but you can already preview what it will look like.

Our journey will continue because we don’t want it to end. Our current destination is not our final destination, but a step in our evolution. That’s why we like Jenkins X: because it follows the same pattern. So what are you waiting for to begin your own journey?

The translator has contributed to the Chinese localization of the Jenkins X documentation, and we hope more people will join the Jenkins Chinese community on their own Jenkins X journey to help improve the Chinese Jenkins X documentation.

Translator: Wang Donghui