Some practical experience with automated operations in a small team

Note: This article requires some knowledge of Ansible and Jenkins.

Happy families are all alike; every unhappy family is unhappy in its own way.

The automated operations architectures of the industry's big players come with all kinds of impressive capabilities, as shown in the figure below, and can feel unattainable. We all know what the end state looks like; the question is how to evolve toward that goal from your team's current situation.

My team of three and a half developers maintains dozens of cloud machines and a dozen or so deployed applications, 90% of which are legacy systems. Compilation and packaging are basically done on each programmer's own computer. Branch management is equally ad hoc: develop on a dev branch, then merge into master once testing passes. The only way to learn an application's production configuration is to log in to the specific machine; there is no configuration center, let alone configuration versioning.

Oh, and there’s not even basic machine-level monitoring.

My normal workload is 50% business development and 50% operations. Faced with so many problems, I wondered how to achieve automated operations at low cost. This article sums up my experience and practice in this area; I hope readers find it helpful.

No more talk: monitoring and alerting first

Everything has its priority, and monitoring and alerting is something I felt had to be done first, even at the cost of slowing down business development. Only when you know the current situation can you plan the next step.

There are many monitoring systems out there: Zabbix, Open-Falcon, Prometheus, and so on. I chose Prometheus, because:

  1. It uses a pull model
  2. Its configuration is plain text, which makes versioning the configuration easy
  3. It has exporters for almost everything; whatever you want to monitor, there is probably already one for it
  4. I would basically have to learn any of the three from scratch, so why not learn the one recommended by the Google SRE book?
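The first two points show up directly in Prometheus's own configuration: scrape targets are declared in a plain YAML file that can simply be committed to Git. A minimal sketch (the job name and target hostnames are hypothetical):

```yaml
# prometheus.yml -- a minimal, version-controllable scrape configuration
global:
  scrape_interval: 15s        # Prometheus pulls metrics every 15 seconds

scrape_configs:
  - job_name: 'node'          # machine-level metrics from node_exporter
    static_configs:
      - targets:
          - 'web-1.internal:9100'   # hypothetical hostnames
          - 'db-1.internal:9100'
```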

As noted above, with few people and many machines, installing Prometheus had to be automated and versioned. I implemented this with Ansible + Git. The end result looks like this:

Here is a brief introduction:

  1. Prometheus Server handles the collection and storage of monitoring data
  2. Prometheus Alertmanager generates alerts based on alerting rules and integrates with multiple notification channels
  3. node_exporter reads metrics from the machine and exposes them over an HTTP endpoint, from which Prometheus scrapes its monitoring metrics. The Prometheus project also officially maintains a variety of other exporters.

One advantage of using Ansible as a deployment tool is the abundance of off-the-shelf roles. I used the prometheus-ansible role to install Prometheus.
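With a role in place, the installation reduces to a short playbook that lives in Git next to the inventory. A sketch, assuming the role was installed under the name shown (the role and group names are illustrative):

```yaml
# install-prometheus.yml -- apply an off-the-shelf Prometheus role
- hosts: monitoring           # illustrative group name from the inventory
  become: true
  roles:
    - role: prometheus        # whichever name the downloaded role was installed under
```

Running `ansible-playbook -i hosts install-prometheus.yml` then makes a Prometheus install reproducible on any new machine.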

With monitoring data in hand, we could visualize it. Grafana integrates very well with Prometheus, so we deployed Grafana next:
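In the same version-everything spirit, the Prometheus datasource does not have to be clicked together in the Grafana UI: since Grafana 5 it can be provisioned from a YAML file. A sketch (the URL is hypothetical):

```yaml
# /etc/grafana/provisioning/datasources/prometheus.yml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus.internal:9090   # hypothetical Prometheus address
    isDefault: true
```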

A Grafana rendering of the data collected by node_exporter looks something like this:

But you can't stare at a screen 24 hours a day watching for CPU overload, can you? Out of the box, Prometheus integrates with a number of alerting channels. Unfortunately, DingTalk is not one of them. No matter: a kind soul has open-sourced a component that forwards Prometheus alerts to DingTalk, prometheus-webhook-dingtalk. With that, we added alerting:
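Hooking the bridge in amounts to pointing an Alertmanager receiver at the HTTP endpoint that prometheus-webhook-dingtalk exposes. A sketch, assuming the bridge runs on the same host on its default port (the receiver name and address are illustrative):

```yaml
# alertmanager.yml -- route all alerts to the DingTalk bridge
route:
  receiver: dingtalk

receivers:
  - name: dingtalk
    webhook_configs:
      # prometheus-webhook-dingtalk forwards this to the DingTalk robot API
      - url: 'http://127.0.0.1:8060/dingtalk/webhook1/send'
```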

With that done, our basic monitoring framework is complete, and we are ready for higher-level monitoring such as Redis and JVM metrics.
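"Basic" here means a handful of machine-level alerting rules over the metrics node_exporter already exposes, for example (the rule below is illustrative):

```yaml
# node-alerts.yml -- example Prometheus alerting rule
groups:
  - name: node
    rules:
      - alert: InstanceDown
        expr: up == 0               # the scrape target stopped responding
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "{{ $labels.instance }} unreachable for 5 minutes"
```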

Version your configuration from day one

While setting up the monitoring system, we already pulled the configuration out into a separate code repository. For all future deployments, we will keep configuration and deployment logic separate.

For details on how to do configuration management with Ansible, see How to Manage Multistage Environments with Ansible. This is how we organize our environment variables:

environments/             # Parent directory for our environment-specific directories
├── dev/                  # Contains all files specific to the dev environment
│   ├── group_vars/       # dev specific group_vars files
│   │   ├── all
│   │   ├── db
│   │   └── web
│   └── hosts             # Contains only the hosts in the dev environment
├── prod/                 # Contains all files specific to the prod environment
│   ├── group_vars/       # prod specific group_vars files
│   │   ├── all
│   │   ├── db
│   │   └── web
│   └── hosts             # Contains only the hosts in the prod environment
└── stage/                # Contains all files specific to the stage environment
    ├── group_vars/       # stage specific group_vars files
    │   ├── all
    │   ├── db
    │   └── web
    └── hosts             # Contains only the hosts in the stage environment

Currently, all of our configuration is stored as text. Switching to Consul as a configuration center later will be straightforward, since Ansible 2.0 and above ships with native Consul integration: the consul_kv module.
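As a taste of what that migration could look like, the consul_kv module lets a playbook task push a value into Consul's KV store. A sketch (the key and variable names are hypothetical):

```yaml
# publish-config.yml -- push one config value into Consul's KV store
- hosts: localhost
  tasks:
    - name: Publish the database host for myapp
      consul_kv:
        key: config/myapp/db_host     # hypothetical key
        value: "{{ db_host }}"
```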

Ansible's configuration variables are layered, which gives us great flexibility in configuration management.
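Concretely, the layering means a value in an environment's group_vars/all applies to every host there, while another environment can carry a different value for the same variable, so the playbooks themselves stay environment-agnostic. A sketch (variable names and values are illustrative):

```yaml
# environments/prod/group_vars/all -- applies to every host in prod
log_level: warn               # illustrative variables
db_host: db-1.internal

# The same variables in environments/dev/group_vars/all can carry
# different values (e.g. log_level: debug), and a playbook referencing
# "{{ log_level }}" picks up the right one per environment automatically.
```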

Jenkinsification: hand packaging over to Jenkins

We handed the packaging of all projects over to Jenkins. In practice, of course, we moved a few projects onto Jenkins first and then migrated the rest gradually.

First we need Jenkins itself. Here, too, there is an off-the-shelf script: ansible-role-jenkins, which can even install plugins automatically; you only need to list them in the jenkins_plugins variable:

---
- hosts: all
  vars:
    jenkins_plugins:
      - blueocean
      - ghprb
      - greenballs
      - workflow-aggregator
    jenkins_plugin_timeout: 120

  pre_tasks:
    - include_tasks: java-8.yml

  roles:
    - geerlingguy.java
    - ansible-role-jenkins

Once Jenkins is up, it is time to integrate it with Gitlab. We already had Gitlab, so there was nothing to rebuild. I won't detail the integration here; there are plenty of articles about it on the web.

Jenkins ended up looking something like this. As for how the Jenkins Master connects to Jenkins Agents: network environments differ and there are many approaches described online, so pick the one that suits you.

Ok, now we need to show Jenkins how to compile and package our business code. There are two ways:

  1. Configure it through the Jenkins web interface
  2. Using a Jenkinsfile: a text file similar to a Dockerfile

I chose the second option without hesitation: first, it lends itself to versioning; second, it is more flexible.

Jenkinsfile looks something like this:

pipeline {
    agent any
    stages {
        stage('Build') {
            steps {
                sh './gradlew clean build'
                archiveArtifacts artifacts: '**/build/libs/*.jar', fingerprint: true
            }
        }
    }
}

So where does the Jenkinsfile live? Alongside the business code: each project manages its own Jenkinsfile, like this:

Now we can create a Pipeline job on Jenkins:

As for branch management: since we are so few, we recommend that all projects develop and release from the master branch.

Have Jenkins help us with Ansible

Until now we had been running Ansible from programmers' own computers, so the next step was to hand that over to Jenkins as well. Concretely:

  1. Install the Ansible plugin in Jenkins
  2. Invoke it from the Jenkinsfile:
    withCredentials([sshUserPrivateKey(keyFileVariable:"deploy_private",credentialsId:"deploy"),file(credentialsId: 'vault_password', variable: 'vault_password')]) {
                 ansiblePlaybook vaultCredentialsId: 'vault_password', inventory: "environments/prod", playbook: "playbook.yaml",
                 extraVars:[
                   ansible_ssh_private_key_file: [value: "${deploy_private}", hidden: true],
                   build_number: [value: "${params.build_number}", hidden: false]
                 ]
    }

    Here are some explanations:

  • ansiblePlaybook is pipeline syntax provided by the Jenkins Ansible plugin; it is the equivalent of running ansible-playbook by hand
  • withCredentials is Credentials Binding syntax, used to reference sensitive information such as SSH keys and the Ansible Vault password
  • Some sensitive configuration variables are encrypted with Ansible Vault
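For the Vault part, an encrypted value produced by `ansible-vault encrypt_string` sits right next to plain variables in group_vars. A sketch (the variable names are illustrative and the ciphertext is a truncated placeholder, not real output):

```yaml
# environments/prod/group_vars/all
db_user: app                  # illustrative variable names
db_password: !vault |
          $ANSIBLE_VAULT;1.1;AES256
          3862386334...       # placeholder; paste the real encrypt_string output here
```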

Where should I put Ansible scripts?

We have already established that each project is responsible for its own automated build, so the Jenkinsfile lives in the project itself. What about deployment? The same principle applies: each project should own its own deployment, so every project we deploy has an ansible directory holding its Ansible scripts. Something like this:

But how does it get executed? During the packaging phase we zip up the ansible directory; at deploy time, we unpack it and run the playbook inside.

Quickly generate Ansible scripts and Jenkinsfiles for all projects

Above, we set up Jenkins and Ansible for one project, but many more projects need the same treatment. Since this is repetitive manual work that we will keep doing in the future, I decided to use cookiecutter to generate the Jenkinsfile and Ansible scripts automatically when creating a project, like this:

Summary

To sum up, our small team implemented automated operations in roughly this order:

  1. Set up basic monitoring
  2. Set up Gitlab
  3. Set up Jenkins and integrate it with Gitlab
  4. Use Jenkins for automated compilation and packaging
  5. Use Jenkins to run Ansible

The above is just a skeleton, but starting from this skeleton you can evolve toward the lofty architectures of the big players. For example:

  • CMDB construction: we use ansible-cmdb to automatically generate an overview of all current machines from the inventory
  • Release management: each stage of a release can be customized on Jenkins; release strategies such as blue-green deployment can be implemented by modifying the Ansible scripts and inventory
  • Automatic scaling: this can be implemented by configuring Prometheus alerting rules that call the corresponding webhook
  • ChatOps: see ChatOps in practice

These are some of my practices in automated operations. They are still evolving, and I look forward to exchanging ideas with you.
