GitOps is a recent hot trend in software delivery, following and extending trends such as DevOps, Infrastructure as Code, and CI/CD. We have published a number of introductory articles about GitOps.

The advantages of GitOps can be summarized as follows:

  • An audit trail of changes, essentially for free
  • Continuous integration and delivery
  • Better control over change management

However, the reality is that building a GitOps pipeline is not easy. It involves many decisions, large and small, that can cause a lot of trouble during implementation. We refer to the sum of these decisions as the “GitOps architecture,” and getting it wrong can lead to a number of implementation challenges.

The good news is that with some planning and experience, the transition to the GitOps delivery model can be much less painful.

In this article, I’ll explain some of these challenges through the story of one company as it grows from a small start-up adopting GitOps into a well-regulated multinational. While this kind of accelerated growth is rare, it does reflect the experience of many teams in larger organizations as they move from proof of concept, to minimum viable product, to mature system.

Simple start

If you’re just getting started, the easiest thing to do is to create a single Git repo and put all the necessary code in it. These may include:

  • Application code
  • Dockerfile, used to build application images
  • Some CI/CD pipeline code (e.g. GitLab CI/CD or GitHub Actions)
  • Terraform code to provision the resources needed to run the application

In addition, all changes are made directly on the master branch, so every change takes effect immediately.
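As a concrete illustration, here is a minimal sketch of what such a single-repo pipeline might look like, assuming GitHub Actions as the CI/CD tool. The image name, registry, and terraform/ directory are hypothetical, and registry authentication is omitted:

```yaml
# Hypothetical single-repo pipeline: build the image, then apply Terraform.
name: build-and-deploy
on:
  push:
    branches: [master]          # every change to master deploys immediately

jobs:
  deliver:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # Build and push the application image from the Dockerfile in this repo
      # (registry login omitted for brevity)
      - name: Build and push image
        run: |
          docker build -t registry.example.com/super-app:${{ github.sha }} .
          docker push registry.example.com/super-app:${{ github.sha }}

      # Provision and deploy with the Terraform code in the same repo
      - uses: hashicorp/setup-terraform@v3
      - name: Apply Terraform
        run: |
          cd terraform
          terraform init
          terraform apply -auto-approve -var "image_tag=${{ github.sha }}"
```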

The main advantages of this approach are that you have a single reference point and that all the code is tightly integrated. If all your developers are fully trusted and speed is everything, this approach will continue to work.

Unfortunately, the downsides of this approach quickly become apparent as your business grows.

First, as more code is added, the ballooning codebase becomes confusing for engineers, who run into more and more conflicts between changes that must be resolved. And if the team grows substantially, the reorganizations and redistributions of work that follow cause further disruption.

Second, you run into difficulties when you want to run parts of the pipeline separately. Sometimes you just want to quickly test a change to the code rather than run a full end-to-end delivery. In a single repo, this creates a growing number of consistency issues to manage, because such changes can affect the work of others.

Third, as you grow, you may want the boundaries of responsibility between engineers and teams to become more explicit. This can be achieved within a single repo, but a separate repo usually makes for a clearer and cleaner boundary.

Separating the repos

As the business grows, the pipeline becomes more and more crowded, and merges become painful. In addition, your team needs to specialize, dividing areas of responsibility among different members.

So you need to split up the repo. Now you face a number of decisions, starting with: how far should the separation go? Should the application code get its own repo? That seems reasonable. Should the Docker build code be split out as well? That separation doesn’t make much sense, since it is tightly coupled to the application code.

What about the Terraform code for all the teams? Should it go in a new repo? That sounds reasonable, but there’s a wrinkle: the newly created central “platform” team wants to control access to the core IAM (Identity and Access Management) rule definitions in AWS, yet the RDS configuration code, which the development teams need to adjust periodically, sits in the same codebase.

So you decide to split the Terraform into two repos: a “platform” repo and an “application-specific” repo. This brings a fresh challenge, as you now need to separate the Terraform state files as well. It’s not an insurmountable problem, but it isn’t the fast feature delivery you’re used to either, so the product manager will have to explain why feature requests are taking longer than before.

Unfortunately, there are no established best practices or patterns for these GitOps decisions.

The problems of separation don’t stop there. Where coordination between components built in the pipeline was previously trivial (because everything lived in one place), you now have to orchestrate the flow of information between repos. For example, when a new Docker image is built, it might need to trigger a deployment in the centralized platform repo, passing the new image name as part of the trigger.
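There is no single standard mechanism for this. As one hedged sketch, with GitHub Actions you could use a repository_dispatch event; the my-org/platform repo, the PLATFORM_REPO_TOKEN secret, the new-image event type, and the deploy.sh script are all hypothetical:

```yaml
# Fragment 1: a step in the application repo's workflow (the sender).
# After pushing a new image, notify the platform repo and pass the image name.
- name: Trigger platform deployment
  run: |
    curl -sS -X POST \
      -H "Authorization: token ${{ secrets.PLATFORM_REPO_TOKEN }}" \
      -H "Accept: application/vnd.github+json" \
      https://api.github.com/repos/my-org/platform/dispatches \
      -d '{"event_type": "new-image", "client_payload": {"image": "registry.example.com/super-app:${{ github.sha }}"}}'
---
# Fragment 2: a workflow in the platform repo (the receiver).
name: deploy-on-new-image
on:
  repository_dispatch:
    types: [new-image]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # The new image name arrives in the event payload
      - run: ./deploy.sh "${{ github.event.client_payload.image }}"
```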

Again, these are not insurmountable problems, but they are easier to solve in the early stages of building a GitOps pipeline. Later on, once the steps, policies, and processes have bedded in, making changes takes far more time.

Distributed vs centralized

Your business keeps growing, and you’re building more and more applications and services. It becomes increasingly clear that you need some structural consistency in how applications are built and deployed. The central platform team tries to enforce these standards, but you may run into opposition from development teams, who feel that centralization takes back the autonomy and control over their own work that they gained when DevOps and GitOps displaced centralized IT.

If this sounds familiar, it’s probably because GitOps mirrors the monolith-versus-microservices debate in the application architecture space. As in that debate, the tension between distributed and centralized approaches surfaces more and more often as systems mature and grow in size and scope.

In a sense, your GitOps process is just like any other distributed system, and if you don’t design it well, a failure in one part can cause unexpected problems.

Environments

While separating the repos, you realize that you need a consistent way to manage different deployment environments. Deploying straight to live is no longer an option, because the QA team now needs to test changes before they go live.

Now you need to specify different Docker tags for your application images in the test and QA environments. You may also want different instance sizes or replica counts per environment. How do you manage the configuration of these different environments in source code? The most straightforward approach is to create a separate Git repository for each environment (e.g., super-app-dev, super-app-qa, super-app-live).

Separate repos give you the benefits of separation, as we saw when dividing the Terraform code above. In practice, however, few people end up liking this solution, because most teams don’t have the Git knowledge and discipline needed to port changes between repos. To complicate matters further, a lot of code is inevitably duplicated across the repos, and it can drift badly over time.

If you want to keep everything in a single repo, you have at least three options:

  • A directory per environment
  • A branch per environment
  • A tag per environment

Choosing an approach

Which of these you choose often depends on your tooling. If you rely heavily on YAML generation or templating tools, one approach may suit you better than another: Kustomize, for example, strongly encourages directory-based environment separation. If you are working with raw YAML, a branch- or tag-based approach may work better.
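For instance, a directory-per-environment layout with Kustomize might look like the following sketch; the directory names, image, and tag are hypothetical:

```yaml
# Hypothetical layout:
#   base/                            # manifests shared by all environments
#     deployment.yaml
#     kustomization.yaml
#   overlays/qa/kustomization.yaml   # one overlay directory per environment
#   overlays/live/kustomization.yaml
#
# overlays/qa/kustomization.yaml:
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
images:
  - name: registry.example.com/super-app
    newTag: qa-2f4a9c1      # the tag currently deployed to QA
replicas:
  - name: super-app
    count: 1                # smaller footprint outside production
```

Each environment is then applied with kubectl apply -k overlays/qa (or by pointing a GitOps agent at that directory), which keeps the per-environment differences small and explicit.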

Granularity of the runtime environment

Your runtime environment presents its own choices about the level of separation you want. If you are using Kubernetes, at the cluster level you can choose between the following:

  • One cluster for everything
  • One cluster per environment
  • One cluster per team

In extreme cases, you can put all your environments in one cluster. Typically, however, most organizations have at least one separate cluster for production.

Once you have your clustering strategy figured out, you still have options at the namespace level:

  • Each environment has a namespace
  • Each application/service has a namespace
  • Each engineer has a namespace
  • Each build has a namespace

Platform teams often start with namespaces for “dev”, “test”, and “prod”, and then realize that they want to divide the work up more finely.

You can also mix and match these options, for example giving each engineer a personal namespace for desk testing, plus a namespace for each team.
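As a sketch of what this mixing might look like in manifests (all names and labels here are hypothetical):

```yaml
# Shared environment namespaces
apiVersion: v1
kind: Namespace
metadata:
  name: dev
---
apiVersion: v1
kind: Namespace
metadata:
  name: test
---
apiVersion: v1
kind: Namespace
metadata:
  name: prod
---
# A personal namespace for one engineer's desk testing
apiVersion: v1
kind: Namespace
metadata:
  name: dev-alice
  labels:
    owner: alice
    team: super-app
```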

Summary

We’ve only outlined some of the decision areas a mature GitOps process has to cover. If your business really does grow into that multinational, you may also have to tackle requirements such as RBAC (role-based access control) and IAM.

More often than not, adopting GitOps feels like an investment that may never pay off in visible satisfaction. But consider what came before GitOps: chaos and delays, because no one was sure what state anything should be in. That chaos carried heavy secondary costs, too, as auditors performed spot checks and outages caused by unexpected, unrecorded changes consumed a great deal of staff attention.

As your GitOps process matures and scales, it solves many of the problems that existed before. More often than not, though, the pressure is on to demonstrate the benefits of GitOps sooner.

The biggest challenge with GitOps right now is that there are no established patterns or best practices to guide your choices. Consultants often simply help each team find the solution that works best for it, nudging it in particular directions based on experience.

But what I have observed is that options rejected early on as “too complicated” are often regretted later. That doesn’t mean you should jump straight to creating a namespace per build and a Kubernetes cluster per team, for two reasons:

  • Every piece of complexity you add to a GitOps architecture increases the cost and time of delivering a working GitOps solution
  • You may never actually need that flexibility

Until we accept a viable standard in this area, the right GitOps architecture will always be an art, not a science.

Original link:

Blog.container-solutions.com/gitops-deci…
