Introduction

The “Operation and Maintenance Thinking” series has so far covered operation and maintenance (O&M) standards, O&M management, and automation, mainly sharing ideas on how to build O&M automation. From the reader’s point of view, though, perhaps only I can see how the pieces fit together: where does each of them sit in the overall O&M automation effort, and what role does it play?

Here is a fairly preliminary, down-to-earth picture:

The O&M tools in the figure should be familiar and in common use. Viewed as an O&M architecture, they fall into four layers:

  • Infrastructure layer: vSphere virtualization, physical machines, etc.
  • Data layer: databases, ELK, caches, etc.
  • Application layer: basic components and business applications, such as Java, Python, PHP, Nginx, middleware, etc.
  • Platform layer: monitoring and management tools, such as vSphere, JumpServer, Jenkins, etc.

Our O&M work is spread across these four layers, so delivering efficiently and with high quality becomes the main problem we face.

Technology stack

Faced with complex O&M work, our first thought should be automation. But does automation demand a lot from the technology stack? Does it require development skills? Keep those questions in mind as we go on.

As you can see from the figure above, I mainly used the following technical components:

  • Cobbler
  • Ansible
  • Jenkins
  • Blue Whale (Tencent BlueKing)
  • Python
  • Zabbix
  • vSphere
  • JumpServer

Everyone knows these tools, but how you position each one determines how much it can do for you. This is especially true of Ansible, Jenkins, and Blue Whale.

1. Ansible

If you treat Ansible only as a way to run commands in batches, it is little more than a scripting tool. But what if we use it as the configuration center for our O&M work?

It can help us with many configuration management requirements:

  • Operating system initialization
  • Security baseline adjustment
  • Environment initialization, such as Java and Python
  • Installation of basic components such as Nginx and Python
  • Batch command execution
  • Centralized backup control
  • MySQL synchronization

As a configuration center, Ansible is idempotent and agentless, which lets us implement more customized requirements once the operating system has been delivered.
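
Idempotence means that applying the same change twice leaves the system exactly as it was after the first run. The following is a minimal Python sketch of that idea, not Ansible’s own implementation; it is similar in spirit to what Ansible’s lineinfile module does. The helper names `ensure_line` and `demo`, the file name, and the sysctl setting are all assumptions chosen for illustration.

```python
from pathlib import Path
import tempfile


def ensure_line(path: Path, line: str) -> bool:
    """Make sure `line` is present in the file; return True only if the file changed."""
    existing = path.read_text().splitlines() if path.exists() else []
    if line in existing:
        return False  # desired state already holds, so nothing is written
    existing.append(line)
    path.write_text("\n".join(existing) + "\n")
    return True


def demo() -> tuple:
    """Apply the same setting twice to a throwaway file and report what happened."""
    with tempfile.TemporaryDirectory() as tmp:
        conf = Path(tmp) / "sysctl.conf"  # hypothetical stand-in for a real config file
        first = ensure_line(conf, "vm.swappiness = 10")
        second = ensure_line(conf, "vm.swappiness = 10")
        return first, second, conf.read_text()
```

The first call reports a change and the second reports none, which is the property that makes it safe to re-run the same configuration against hosts that were already delivered.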

2. Jenkins

As a continuous integration/delivery tool, Jenkins plays an important role in DevOps, but its pipelines can also be combined with Ansible to provide more parameterized process control:

  • Parameterized operating system initialization
  • Parameterized application releases

In theory, anything Ansible does from the terminal can be run as a parameterized build from the Jenkins interface. Jenkins + Ansible can therefore form a simple O&M platform for a range of scenario-based operations.
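
To make the parameterization idea concrete, here is a rough Python sketch of how a set of job parameters could be turned into an `ansible-playbook` command line. It only builds the command and does not run it. The playbook name, inventory path, host name, and variable names are assumptions for illustration, not values from the setup described here; `-i`, `--limit`, and `-e` are standard `ansible-playbook` options.

```python
import shlex


def build_playbook_command(playbook: str, inventory: str, limit: str, extra_vars: dict) -> list:
    """Turn job parameters into an ansible-playbook argument list."""
    cmd = ["ansible-playbook", "-i", inventory, playbook, "--limit", limit]
    # Sort the variables so the same parameters always yield the same command.
    for key, value in sorted(extra_vars.items()):
        cmd += ["-e", f"{key}={value}"]
    return cmd


# Hypothetical parameters, as a parameterized Jenkins build might supply them.
command = build_playbook_command(
    playbook="os_init.yml",
    inventory="inventories/test/hosts",
    limit="web01",
    extra_vars={"timezone": "Asia/Shanghai", "ntp_server": "ntp.example.com"},
)
print(shlex.join(command))
```

Keeping the command as a list until the last moment avoids quoting problems when a parameter value contains spaces.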

3. Blue Whale

Blue Whale, Tencent’s open-source integrated R&D and O&M platform, provides a CMDB, Standard O&M, a job platform, and fault self-healing, all of which enrich our O&M toolkit. Most importantly, CMDB configuration management gives upper-layer services reliable data to build on, and combining fault self-healing with the job platform, connected to different alarm sources, lets services heal themselves. Standing on the shoulders of giants, we get twice the result with half the effort.

If Jenkins + Ansible cannot be customized far enough, Blue Whale is our fallback. We use Blue Whale for the following:

  • Multi-environment (test, pre-production, production) virtual machine onboarding, linked with the CMDB, Zabbix, JumpServer, and other platforms;
  • CMDB event push which, together with an event push gateway we wrote ourselves in Python, keeps Zabbix assets synchronized with the CMDB.
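
As a rough illustration of what such an event push gateway does, the sketch below translates a CMDB "host created" event into the JSON-RPC body of a Zabbix `host.create` call. This is not the gateway described above: the event shape (`action`, `host.name`, `host.ip`) and the function name are assumptions, authentication is left out, and the request is only built, not sent.

```python
def cmdb_event_to_zabbix_request(event: dict, group_id: str, request_id: int = 1) -> dict:
    """Translate a hypothetical CMDB 'create' event into a Zabbix host.create request body."""
    if event.get("action") != "create":
        raise ValueError(f"unsupported action: {event.get('action')!r}")
    host = event["host"]
    return {
        "jsonrpc": "2.0",
        "method": "host.create",
        "params": {
            "host": host["name"],
            # One Zabbix agent interface (type 1) on the default agent port.
            "interfaces": [
                {"type": 1, "main": 1, "useip": 1, "ip": host["ip"], "dns": "", "port": "10050"}
            ],
            "groups": [{"groupid": group_id}],
        },
        "id": request_id,
    }
```

A real gateway would also handle update and delete events and map CMDB business or module information to Zabbix host groups and templates; those mappings depend on local conventions and are not shown.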

Blue Whale is a fairly heavyweight platform, so we did not adopt all of it. Instead, using Blue Whale’s Standard O&M development framework, we developed our own custom atoms:

  • vSphere atoms
  • JumpServer atoms
  • CMDB auto-registration atoms

Developing Standard O&M atoms requires familiarity with the Django framework. By following its development specifications, we can connect our otherwise isolated O&M platforms.

As far as the technology stack goes, nothing here except Blue Whale Standard O&M development demands much development skill; the rest only requires us to keep summarizing what we learn in everyday use. If you don’t believe me, read more of my columns (the official account has the most complete set):

  • The road to Ansible
  • The road to Blue Whale
  • The road to CI/CD
  • The road to Jenkins

These columns are distilled from continuous day-to-day improvement. If you hold yourself accountable for your own work, this really is not difficult.

O&M specifications

Having a technology stack doesn’t mean you’re going to have an easy road ahead. Instead, you’ll find that your automation path is almost impossible due to the variety of systems or configurations.

The reason is that there is no standard for operation and maintenance. Team members have their own habits. Without a unified standard, it will only become more and more chaotic.

We therefore define O&M specifications for the infrastructure layer, the application layer, and the platform layer in turn.

1. Infrastructure layer

  • Operating system installation specifications

    Cobbler’s unattended operating system installs follow this installation specification, and vSphere VM clones deliver the same uniform operating system.

  • Operating system configuration specifications

    Ansible initializes the operating system according to this specification: kernel parameters, time synchronization, security baseline, and package sources.

  • Directory management specifications

    Define standard directories for basic components, application components, logs, and so on. All subsequent operations build on these standard directories.
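
A directory specification becomes more useful once tools can derive paths from it and check against it. The sketch below shows one way to express that in Python. The base paths in `STANDARD_DIRS` and both helper names are assumptions for illustration; the real layout comes from your own specification.

```python
from pathlib import PurePosixPath

# Hypothetical standard layout, used only for this example.
STANDARD_DIRS = {
    "components": PurePosixPath("/opt/components"),
    "apps": PurePosixPath("/opt/apps"),
    "logs": PurePosixPath("/var/log/apps"),
}


def app_paths(app_name: str) -> dict:
    """Derive an application's home and log directories from the standard layout."""
    return {
        "home": STANDARD_DIRS["apps"] / app_name,
        "logs": STANDARD_DIRS["logs"] / app_name,
    }


def follows_standard(path: str) -> bool:
    """Return True if `path` is one of the standard base directories or lies beneath one."""
    candidate = PurePosixPath(path)
    return any(
        base == candidate or base in candidate.parents
        for base in STANDARD_DIRS.values()
    )
```

Playbooks and pipelines that take their paths from a single definition like this cannot drift apart, and a check such as `follows_standard` can flag hosts where something was installed outside the agreed layout.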

2. Application layer

Application configuration management specifications standardize the dependent components and parameters involved when an application system is delivered through the pipeline. “Application management specification” is only a general term here; in practice it extends to each development stack:

  • Java
  • Python
  • PHP

The data directory, log directory, startup parameters, and configuration of each application are specified.
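
One lightweight way to enforce such a specification is to describe each application as data and check it before the pipeline runs. The following is only a sketch under assumed names: the field names in `REQUIRED_FIELDS` and the sample application are hypothetical, not taken from the specification described here.

```python
# Hypothetical field names for the four items the specification covers.
REQUIRED_FIELDS = ("data_dir", "log_dir", "start_args", "config_file")


def missing_fields(app_spec: dict) -> list:
    """List the required fields that an application spec omits or leaves empty."""
    return [field for field in REQUIRED_FIELDS if not app_spec.get(field)]


# Example spec for an imaginary Java service; the log directory was forgotten.
order_service = {
    "data_dir": "/opt/apps/order-service/data",
    "start_args": "-Xms512m -Xmx512m",
    "config_file": "/opt/apps/order-service/conf/application.yml",
}
print(missing_fields(order_service))
```

A pipeline step could refuse to deploy any application for which this list is not empty.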

3. Platform layer

Platform specifications are our last step, because in the end all of our operations are carried out and managed through the various O&M platforms. Without specifications to constrain how those platforms are used, we would undoubtedly be leaving hidden risks for future operations.

Here we create the following specifications:

  • vSphere management specifications

    The basis for vSphere lifecycle management: creating, allocating, and reclaiming VMs

  • JumpServer management specifications

    Control of grouping and permissions of JumpServer assets

  • Zabbix management specifications

    Zabbix manages host groups, hosts, templates, monitoring intervals, and alarms

  • CI/CD specification

    How Jenkins jobs, pipelines, agent (slave) nodes, etc. are managed

Conclusion

Building O&M automation is mainly a matter of combining O&M specifications, O&M tools, process control, and other aspects, rather than treating each in isolation. None of this demands strong development skills from O&M staff, which is why I call it down-to-earth automation. I hope you can use this preliminary model as a reference and bring some real change to your own work.

This is just what I have learned and observed in the early stages of building O&M automation. I hope it gives you some ideas and inspiration, so that you do not feel lost. As automation work deepens, it will demand more development skill and a stronger ability to organize and integrate platforms; by then, I believe your abilities will have grown to match. Let’s work at it together!