The author's technical team is responsible for developing and maintaining dozens of projects, each of which has at least four environments (Dev, QA, Hidden, and Production) spread across hundreds of machines. We were bogged down solving all kinds of trivial problems across these systems. How could we free ourselves from these chores? DevOps became the natural choice.

This article is a summary of our DevOps practice, based on our current environment and team size. The solution is easy to understand, easy to implement, and effective.

Implementation method

Let’s take a look at the flow chart:

Engineers develop locally and commit the code to the repository when development is complete, which [automatically] triggers Jenkins to run continuous integration and deployment; a result email is sent once deployment finishes. While the project is running, program logs can be checked through the log system, and if anything abnormal occurs the monitoring system sends an alarm. Engineers can independently complete the whole process from coding to post-release feedback, forming a complete closed loop. Operations is responsible for providing the tool chain for the entire process and assisting with abnormal situations; the workload is reduced and efficiency is higher.

  • Automatic triggering of Jenkins deployments is implemented with SVN and Git hooks. Whether to trigger automatically is decided within each project; we do not trigger automatically at present, because QA does not want to be interrupted by automatic deployments during testing, but a deployment can still easily be triggered manually in Jenkins
  • Jenkins pulls the code from SVN –> compiles –> merges and compresses JS/CSS –> runs other initialization steps –> generates the final runnable code package, builds an image from the Dockerfile and pushes it to the image registry, then triggers a Kubernetes rolling update (see the sketch after this list)
  • Each image consists of a base image plus the project code. A base image is a minimal runtime environment packaged according to how the project runs (it does not contain project code). Depending on the technology stack a project depends on, we have packaged many different types of base images, for example a base image containing the Nginx service, or one containing JDK + Tomcat
  • If the program turns out to have an error or a bug that cannot be fixed in a short time, Jenkins can be used to quickly roll back to the previous image version, which is very convenient
  • If traffic suddenly increases, the container replica count can be quickly adjusted through Kubernetes
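
Below is a minimal sketch of what the image build and rolling-update step can look like. The registry address (harbor.example.com), project name (myapp), and Deployment name are placeholders, not our actual configuration; the commands themselves are standard Docker and kubectl usage.

    #!/bin/bash
    # Sketch of the Jenkins build-and-deploy step; names are placeholders.
    set -e

    VERSION=$(date +%Y%m%d%H%M)                       # build timestamp used as the image tag
    IMAGE="harbor.example.com/devops/myapp:${VERSION}"

    # Build the final runtime image from the project Dockerfile and push it to Harbor
    docker build -t "${IMAGE}" .
    docker push "${IMAGE}"

    # Point the Deployment at the new image to trigger a Kubernetes rolling update
    kubectl set image deployment/myapp myapp="${IMAGE}"
    kubectl rollout status deployment/myapp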

Software and Tools

  • Code management: SVN, Git
  • Continuous integration: Jenkins, shell, Python
  • Containerization: Docker, Harbor, Kubernetes
  • Monitoring and alerting: Zabbix, Prometheus
  • Log system: Filebeat, Kafka, Logstash, Elasticsearch, Kibana

Code management

Most projects are managed with SVN. Using SVN as an example, each project has three code lines: Dev, Trunk, and Releases.

  • Dev: local development. Once a feature or task is finished, it can be committed to the Dev branch and deployed to the Dev environment for self-testing
  • Trunk: when development of a major feature is complete and it is planned to go live, the code is merged into the Trunk branch and deployed to the QA environment, where QA performs detailed testing
  • Releases: if QA testing passes and the project is about to go live, the code is merged into the Releases branch and deployed to the Hidden environment (the simulation environment, with all configuration, code, etc. in place) for regression testing. If regression passes, it is released to the Production environment

Some projects are released by version, so after the code is merged into Releases, a branch/tag is created and deployed to Hidden for testing. A sketch of the corresponding SVN operations follows.
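
As an illustration only, the SVN operations behind this flow might look like the following; the repository URL, working-copy paths, and tag name are hypothetical.

    # Merge finished work from Dev into Trunk for QA testing
    REPO=https://svn.example.com/myapp
    svn checkout ${REPO}/trunk myapp-trunk
    cd myapp-trunk
    svn merge ${REPO}/dev
    svn commit -m "Merge Dev into Trunk for QA"
    cd ..

    # After QA passes, merge Trunk into Releases and create a tag for the Hidden regression
    svn checkout ${REPO}/releases myapp-releases
    cd myapp-releases
    svn merge ${REPO}/trunk
    svn commit -m "Merge Trunk into Releases"
    svn copy ${REPO}/releases ${REPO}/tags/release-1.0 -m "Tag release for Hidden/Production"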

Continuous integration

The main job of this step is to package the source code into the final online deliverable according to the project's requirements. Most of the work is done by shell and Python scripts, such as pulling code from SVN, compiling the source, and merging and compressing static resource files. Jenkins strings all of these scattered steps together into a complete pipeline. Operations should already be familiar with this part, so I will not go into more detail; a rough outline is sketched below.
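
The outline below is illustrative only: the repository URL, build tool (Maven), and static-resource compressor (YUI Compressor) are assumptions, not a description of our exact scripts.

    #!/bin/bash
    # Rough outline of a packaging script called by Jenkins; paths and tools are assumptions.
    set -e
    WORKSPACE=${WORKSPACE:-$(pwd)}                    # Jenkins sets WORKSPACE for each job

    # 1. Pull the source code
    svn checkout https://svn.example.com/myapp/trunk "${WORKSPACE}/src"
    cd "${WORKSPACE}/src"

    # 2. Compile the source (a Maven-built Java project, as an example)
    mvn -q clean package -DskipTests

    # 3. Merge and compress static resources (YUI Compressor, as an example)
    cat static/js/*.js > static/js/app.all.js
    java -jar /opt/tools/yuicompressor.jar static/js/app.all.js -o static/js/app.min.js

    # 4. Other initialization, then hand the artifact over to the Docker build step
    cp target/myapp.war "${WORKSPACE}/deploy/"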

Docker

Docker is a very important part of our whole solution and makes deployment easy. Using the same Docker image in every environment also keeps the environments consistent, greatly reducing cases where code runs normally in the development environment but fails in production. Meanwhile, resource usage can be adjusted in real time according to the project's load, which saves costs.

  • Dockerfile: images are built by writing Dockerfiles
  • Harbor: serves as our Docker image registry, with a Web interface and an API for easy integration
  • Kubernetes: Kubernetes (K8s) organizes Docker instances into a cluster, which makes it convenient to distribute images, upgrade, roll back, and increase or decrease the number of replicas (see the sketch after this list), and it also provides Ingress for external access. This part is relatively heavyweight, but we do not use any advanced features, only the basic functions mentioned above. There is no need for secondary development or customization of K8s; it is simply deployed and used, which is not technically difficult for operations
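
For reference, the rollback and scaling operations mentioned above come down to standard kubectl commands such as the following; the Deployment name myapp and the app=myapp label are placeholders.

    # Roll back to the previous image version when a bug cannot be fixed quickly
    kubectl rollout undo deployment/myapp
    kubectl rollout status deployment/myapp

    # Adjust the replica count when traffic suddenly increases, then scale back down later
    kubectl scale deployment/myapp --replicas=6
    kubectl get pods -l app=myapp                     # assumes the pods carry an app=myapp label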

Monitoring and alerting

Monitoring and alerting is very important in the whole operations process. It helps prevent problems before they happen, reduces the occurrence of faults, and speeds up their resolution. It is also the foundation of operations work, so I will not say too much about it here.

  • Zabbix: hosts are monitored and alerted on through Zabbix
  • Prometheus: Docker containers are monitored and alerted on through Prometheus (not yet completed); a couple of quick checks are sketched after this list
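
Two quick checks we find handy when verifying the monitoring chain; the host address, item key, and Prometheus address below are placeholders.

    # Ask the Zabbix agent on a host for an item value directly, bypassing the web UI
    zabbix_get -s 10.0.0.21 -p 10050 -k 'system.cpu.load[all,avg1]'

    # Query the Prometheus HTTP API to confirm the container targets are up
    curl -s 'http://prometheus.example.com:9090/api/v1/query?query=up'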

Logging system

The ELK log system is a godsend for operations; everyone who uses it praises it, and you never again have to hear a developer say, "XX, can you help me pull the logs from the server?" The architecture we use is Filebeat/rsyslog –> Kafka –> Logstash –> Elasticsearch –> Kibana (a quick end-to-end check is sketched after the list below).

  • Filebeat/rsyslog: the client collects logs through Filebeat or rsyslog. Filebeat is written in Go, is very easy to deploy, and pairs well with Docker; our Docker base images already include a Filebeat service and an initialization configuration file, so no extra configuration is needed when project code is added later. The advantage of rsyslog is that most systems ship with it, so no extra log-collection program has to be installed; however, sending data to Kafka requires the omkafka module, which has requirements on the rsyslog version, and upgrading rsyslog on most systems is cumbersome, so that approach was abandoned in those cases
  • Kafka: Kafka was designed for processing log data. We use three machines as a Kafka cluster, and one topic corresponds to multiple consumer groups, avoiding a single point
  • Logstash: reads data from Kafka, filters it, and writes it to Elasticsearch. Since a Kafka topic corresponds to multiple consumer groups, each group has its own Logstash, which basically removes the Logstash single point as well
  • Elasticsearch: stores the filtered data; a 3-node cluster avoids a single point
  • Kibana: a visualization tool that makes it easy to search for the data you want; colleagues have also built all kinds of reports with it, clear at a glance
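
When the pipeline misbehaves, each stage can be checked from the shell; the broker address, topic name, and Elasticsearch address below are placeholders for illustration.

    # 1. Confirm log lines are reaching the Kafka topic
    /opt/kafka/bin/kafka-console-consumer.sh \
        --bootstrap-server kafka1:9092 --topic app-log --max-messages 5

    # 2. Confirm Logstash has written the filtered data into Elasticsearch
    curl -s 'http://es1:9200/_cat/indices?v' | grep logstash

    # 3. Pull the latest few documents, roughly what Kibana would show
    curl -s 'http://es1:9200/logstash-*/_search?size=3&sort=@timestamp:desc'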

Conclusion

  1. Support: getting everyone's support means the project is already half done; there is nothing a barbecue can't solve, and if there is, have two
  2. Standardization: with many projects and huge systems, there must be standards; standards are the basis of automation
  3. Documentation: detailed documentation should be written on how the implementation was done, how to use it, and how to maintain it
  4. Training: for tools such as Jenkins and ELK that are used by people outside of operations, relevant training and sharing should be provided to the users; of course, the various details of the project should also be shared within the operations team
