The company recently held a skills training program covering multiple technical tracks, and I was fortunate to be invited to design and review the problems for the Java track. Below is a review of the process, covering the following aspects:

  • Problem design
  • Program requirements
  • Test method

Problem design

The problem design mainly considered the following points:

  • Technology evolution requirements: Against the strategic backdrop of the company's cloud migration, our applications will be moved on a large scale from traditional virtual-machine deployment to a PaaS cloud environment. This requires developers to master cloud-environment development skills, and the application development framework needs to be upgraded from Spring MVC + Dubbo to the Spring Cloud service governance framework.

  • Balancing new and veteran employees: More than half of the company's developers are new hires who have been with the company for only one or two years. The problem design has to account for their skill level: it should not be so difficult that it discourages them from participating, yet it must still reflect real technical strength, rather than being so simple that scores fail to differentiate contestants. The problem was therefore designed so that everyone can complete the basic functionality, while a high score is much harder to achieve.

  • Solving production problems: The original intention of the training is to strengthen developers' technical skills in enterprise application development, so that daily tasks are completed better and, ultimately, production problems are solved and online system stability is assured. Algorithm puzzles of the kind found online are therefore not suitable problems for this training.

Based on the above considerations, the final problem design is as follows:

Build a transaction system consisting of a product subsystem, an order subsystem, and a points (Experience) subsystem, and complete the distributed deployment and invocation of the three subsystems. Business scenario: during a transaction, the product subsystem deducts inventory, the order subsystem creates a new order, and the points subsystem credits the corresponding points. The points rule awards 1 point for every 1 yuan spent.

Functional requirements:

Implement five interfaces: place order, inventory query, order query, points query, and personal transaction overview.

While completing the basic functionality, how do you ensure data consistency across the three subsystems?
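To make the consistency challenge concrete, here is a minimal, self-contained Java sketch in which in-memory counters stand in for the three subsystems (all names are hypothetical, not part of the problem statement): a place-order flow that compensates earlier steps when a later step fails.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: a place-order flow across three subsystems, with
// compensation (manual rollback) when a later step fails.
public class OrderFlowSketch {
    static AtomicInteger stock  = new AtomicInteger(100); // product subsystem
    static AtomicInteger orders = new AtomicInteger(0);   // order subsystem
    static AtomicInteger points = new AtomicInteger(0);   // points subsystem

    /** Orders qty items at price yuan each; returns true on success. */
    public static boolean placeOrder(int qty, int price, boolean pointsServiceUp) {
        if (stock.addAndGet(-qty) < 0) {  // step 1: deduct inventory
            stock.addAndGet(qty);         // not enough stock: roll back
            return false;
        }
        orders.incrementAndGet();         // step 2: create the order
        if (!pointsServiceUp) {           // step 3: credit points (may fail)
            orders.decrementAndGet();     // compensate: cancel the order
            stock.addAndGet(qty);         // compensate: restore inventory
            return false;
        }
        points.addAndGet(qty * price);    // 1 point per yuan spent
        return true;
    }

    public static void main(String[] args) {
        System.out.println(placeOrder(2, 10, true));  // succeeds
        System.out.println(placeOrder(1, 10, false)); // fails and rolls back
        System.out.println(stock.get() + " " + orders.get() + " " + points.get());
    }
}
```

In the real distributed setup the three counters are remote services, so the compensating calls can themselves time out or fail; that is precisely the failure mode the problem asks participants to handle.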

Program requirements

According to preliminary statistics, about 50 people were expected to submit deliverables. Scoring that many deliverables through offline demonstration reviews would take a long cycle and require manual deployment and debugging, a huge workload; scoring therefore had to be automated. At the same time, to control the implementation scope of the program and prevent unbounded scope creep, the following restrictions were placed on the program:

  • Middleware restrictions: Since this training is meant to build technical reserves for the upcoming cloud migration, middleware choices had to be confined to cloud middleware. Therefore the Spring Cloud + Consul service governance framework and Nginx soft load balancing had to be used, instead of Apache HTTP Server, Dubbo, and other middleware not used on the cloud.

  • Data format: The problem provides the core database table structures of the three subsystems, such as the product information table, the order information table, and the user points table, as well as the request and response fields of the five interfaces. Everyone must provide their services in exactly this format.
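The actual field lists are not reproduced in this write-up. Purely as an illustration of such a fixed format, a hypothetical request/response pair for the ordering interface (every field name here is invented) might look like:

```json
{
  "request": {
    "userId": "U1001",
    "productId": "P2001",
    "quantity": 2,
    "amount": 100
  },
  "response": {
    "code": "0000",
    "message": "success",
    "orderId": "O3001"
  }
}
```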

  • Package format: Since scoring was to be automated, the package format had to be unified, including the directory hierarchy of the compressed package, the SQL script format, the Nginx configuration files, the start and stop scripts, and so on. The problem requires three modules (Product, Order, and Experience) to be implemented, and provides an optional fourth module (Other) that participants may implement as needed; functions such as a gateway or an aggregation service can live in the Other module. The Other module's package format must be identical to that of the other three modules.
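The required directory hierarchy is not reproduced in this write-up; a hypothetical layout in this spirit (all names invented) could be:

```
deliverable.zip
├── product/
│   ├── product.jar
│   ├── sql/
│   │   └── init.sql
│   └── bin/
│       ├── start.sh
│       └── stop.sh
├── order/          # same internal structure as product/
├── experience/     # same internal structure as product/
├── other/          # optional; same internal structure as product/
└── nginx.conf      # optional; replaces the default configuration
```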

The main difficulty of this problem is ensuring data consistency. However, because the test machines sit on the same network segment and the intranet test environment is very stable, data inconsistency rarely occurs while a program runs normally. How to generate events that can cause data inconsistency is therefore the crux of this problem.

So in addition to the above restrictions, this training also introduced a secret weapon: Chaos Monkey.

Chaos Monkey: software developed by Netflix that randomly generates abnormal events in production systems, including timeouts, program exceptions, and outages. The idea behind Chaos Monkey comes from chaos engineering.

Chaos engineering is the discipline of experimenting on a system in order to understand its ability to withstand the chaotic conditions of production and to build confidence in it. By running chaos engineering experiments, we can probe the system for defects and learn how it behaves under turbulent production conditions.

The principle behind Chaos Monkey is that the best way to avoid major failures is to fail often: by frequently injecting failures into the production environment, the system is forced to become resilient.

A Chaos Monkey JAR and its configuration were provided, and participants had to include them in order to simulate the timeouts and exceptions that cause data inconsistency and thereby test the robustness of their programs.
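The article does not say which Chaos Monkey build was packaged into the JAR. Assuming something like the codecentric Chaos Monkey for Spring Boot, a minimal configuration enabling latency and exception assaults might look like this (property names follow that project; treat this as a sketch, not the configuration that was actually distributed):

```yaml
spring:
  profiles:
    active: chaos-monkey        # activates the Chaos Monkey profile
chaos:
  monkey:
    enabled: true
    watcher:
      rest-controller: true     # attack @RestController methods
      service: true             # attack @Service methods
    assaults:
      level: 3                  # roughly 1 in 3 eligible calls is attacked
      latency-active: true
      latency-range-start: 2000 # injected delay, in milliseconds
      latency-range-end: 5000
      exceptions-active: true   # randomly throw runtime exceptions
```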

Test method

Testing is divided into two steps: automatic deployment and automatic testing.

The service topology is as follows:

Automatic deployment

All participants' deliverables were submitted to a unified FTP directory, so deployment is simply a matter of traversing the files on FTP and triggering the deployment for each one.

Pipeline steps are as follows:

```
for each participant:
    delete the Redis cache and run the SQL statements that initialize the database
    fetch the participant's compressed package from FTP and unzip it
    if the package contains an nginx configuration file:
        replace the nginx configuration file and reload
    else:
        distribute the default nginx configuration file and reload
    execute the base data script
    for each service (traversing the number of services):
        distribute the service's package to its service node according to the rules
        run the SQL scripts, then the startup script start.sh
    run the test script, collect the test results, calculate scores, and output them to a CSV file
    stop each service
```

Jenkins is typically used for automated deployment. However, Jenkins does not support loop traversal, and one job can only complete the deployment and testing of a single contestant. With one job per participant, the correctness of every job's configuration cannot be guaranteed; moreover, since there is only one evaluation environment, serial execution of the jobs cannot be guaranteed either, and jobs running simultaneously would interfere with each other. I therefore decided to write my own deployment script, which triggers a test job only once a deployment is complete. The test job runs the tests and produces the score; the deployment script listens for the test job's status and, when the test finishes, begins the next participant's deployment.

Automated testing

The company already has a complete Robot Framework automated testing infrastructure, so it could be used directly for this test. Because the deployment script has already initialized the database and cache before the program starts, the test program simply reads the test data and issues calls through Nginx.

To test data consistency when the program misbehaves, Chaos Monkey has to be enabled before the test and then, after the data-writing logic (i.e. the ordering interface) has executed, disabled again before the queries begin. By querying each interface and comparing the results with the data that was written, data consistency can be verified and scored. The steps are as follows:

1. Enable the Chaos Monkey function of each service;

2. Execute the test script that writes data, i.e. call the ordering interface several hundred times;

3. Disable the Chaos Monkey function of each service;

4. Execute the single-service data query interfaces, verify the availability of each interface, and record the scores;

5. Execute the combined verification logic to verify data consistency across the services, and record the scores.
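The combined verification in step 5 might, under the simplifying assumption that each order buys exactly one unit (an assumption of this sketch, not stated in the problem), be expressed as:

```java
import java.util.List;

// Hypothetical sketch of the combined consistency check: after the ordering
// phase, the inventory deducted, the orders recorded, and the points credited
// must all agree with each other.
public class ConsistencyCheck {
    /** Returns true if the three subsystems' data are mutually consistent. */
    public static boolean consistent(int initialStock, int finalStock,
                                     List<Integer> orderAmounts, int totalPoints) {
        int sold = initialStock - finalStock;            // units deducted
        int orderedUnits = orderAmounts.size();          // assumes 1 unit per order
        int spentYuan = orderAmounts.stream().mapToInt(Integer::intValue).sum();
        // Every deducted unit must have an order, and points = yuan spent (1:1 rule).
        return sold == orderedUnits && totalPoints == spentYuan;
    }

    public static void main(String[] args) {
        System.out.println(consistent(100, 97, List.of(10, 20, 30), 60)); // consistent
        System.out.println(consistent(100, 96, List.of(10, 20, 30), 60)); // a deduction lost its order
    }
}
```

Any run in which Chaos Monkey interrupted one step but not its compensations shows up here as a mismatch between the three totals.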

After each participant's test run completes, the calculated scores are appended to a summary CSV file. With the final CSV file in hand, Excel can be used to sort the results by score.

Reflections

This training left me with the following reflections:

  • Keep investing in tool building: tools yield the greatest benefit at the lowest cost. In this training, one person developed the deployment pipeline and one person wrote the automated test scripts; development and testing were finished in only two weeks of spare time, far cheaper than manual deployment, demonstration, and testing. Moreover, the tools can be run repeatedly and triggered automatically after every modification. Daily development and testing should likewise use tools for regression testing as much as possible, to avoid gaps in analysis and regression coverage caused by human oversight.
  • Environmental issues matter: everyone's program runs in their own environment, but some programs would not run in the review environment. Therefore, in daily testing, besides the correctness of the program itself, we also need to watch for problems caused by environmental differences. Incidentally, if the company could adopt container technology such as Docker, it would improve release quality and reduce the probability of environment-dependent errors.
  • Awareness of standardization needs to improve: although the problem statement set requirements for the package format and program ports, and everyone was told the points to watch, many people still failed to comply, resulting in low scores from the test program. This reflects a tendency among many developers to focus on writing code while ignoring specification requirements, which leads to rework and even online failures with serious consequences. It is recommended to become familiar with, and comply with, the specifications and requirements before daily coding, to avoid detours.

Finally, thank you for reading patiently. Here's to bug-free programs in 2021!