Preface

First, some background. Our company's product is deployed directly on each customer's (Party A's) site. We have many customers and each has its own customizations, so each customer's deployment can be thought of as a Git branch of our product. Because each customer's machine and network environments differ, problems often occur in operation, so I designed this simple intelligent monitoring system to monitor each customer's interfaces in real time.

Scope of application

The scheme applies to the following scenarios:

  1. A project runs in multiple Docker containers and currently has no interface health checks. This solution can check each running instance.
  2. A company runs the same code base in multiple environments and needs to monitor the status of each one.
  3. A project is branched and then deployed at each customer's site, including customizations. This is our case.

Result

Click a node to replay that node's request and view the response in the browser.

Double-click a node to copy its output parameters.

Main analysis

Because our product already separates the front end from the back end (described in earlier articles), we focus on monitoring interfaces.

My plan is as follows:

  1. Send a GET (or POST) request to an HTTP interface (including user information) every three minutes and capture the interface's output parameters, that is, its response.
  2. Compare the response with the previous response from the same interface. If they differ, mark the change.
  3. The front end shows, for each interface, all the change nodes of each project, arranged as a node difference graph.
  4. We don't care about the content of the response and don't validate it, because we are watching all interfaces and only want to know when something changes.
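
Steps 1 and 2 of the plan come down to remembering the last response for each interface in each environment and flagging any difference. The following is a minimal sketch of that comparison only, not the project's actual code: the class name ChangeDetector, the method record, and the key format are assumptions made for illustration, and the HTTP call itself is left out.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: remembers the last response per (environment, interface) key. */
final class ChangeDetector {
    private final Map<String, String> lastResponse = new HashMap<>();

    /**
     * Stores the latest response and reports whether it differs from the
     * previous one stored under the same key. The first response for a key
     * counts as a change, because there is nothing to compare it with.
     */
    boolean record(String key, String response) {
        String previous = lastResponse.put(key, response);
        return previous == null || !previous.equals(response);
    }
}
```

A scheduled job could call record("a.com|v1/news", body) after each request and write a change node only when it returns true. Treating the first response as a change is a design choice in this sketch; it gives every interface a starting node.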

Analyze an interface

Every company encrypts its request and response parameters differently and handles identity authentication differently, so we implement encryption and decryption in a separate, independent method. For that reason I describe only the unencrypted (decrypted) parameters here. The examples below use a response of the form {"code":"200","data":null,"message":"success"}, and for identity authentication I use a simple token.

An interface to get a news list:

http://*.com/v1/news?token=2c789e34dc81d79feba6a005ad63902b

Decrypted response:

{"code":"200","data":[{"id":"1","title":"This is fake news.","url":"http://"},{"id":"2","title":"This is fake news.","url":"http://"}],"message":"success"}

From this interface we can see:

  1. The request method is GET.
  2. The input parameter is token=2c789e34dc81d79feba6a005ad63902b.
  3. The output has the form {"code":"200","data":[],"message":"success"}.
  4. *.com may differ between environments, for example several container addresses such as 10.0.0.1 and 10.0.0.2.
  5. The token is different for each customer.

When adding the news list interface:

Relative URL:

v1/news

Request type:

get

Body:

token={token}

Cron expression:

0 0/3 * * * ? (run every 3 minutes)

When adding environment A:

Host (required)

http://a.com

Parameters (list)

token 2c789e34dc81d79feba6a005ad63902b

When adding environment B:

Host (required)

http://b.com

Parameters (list)

token 4297f44b13955235245b2497399d7a93

Bind interfaces and environments

  1. An interface can be bound to multiple environments
  2. Multiple interfaces can be bound to an environment
  3. Only a one-way binding is recorded in the database; logically, the relationship is bidirectional.
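
One way to read point 3 is that each binding is stored once, as a single (interface, environment) row, and both directions are answered by querying that same set of rows. The sketch below illustrates the idea in memory; BindingStore and its method names are hypothetical, and the real project keeps these rows in a database table rather than a list.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory stand-in for the binding table. */
final class BindingStore {
    // Each row is {interfaceId, environmentId}, like one row of a join table.
    private final List<String[]> rows = new ArrayList<>();

    /** Records a binding once; no reverse row is stored. */
    void bind(String interfaceId, String environmentId) {
        rows.add(new String[] {interfaceId, environmentId});
    }

    /** All environments bound to the given interface. */
    List<String> environmentsOf(String interfaceId) {
        List<String> result = new ArrayList<>();
        for (String[] row : rows) {
            if (row[0].equals(interfaceId)) {
                result.add(row[1]);
            }
        }
        return result;
    }

    /** All interfaces bound to the given environment, read from the same rows. */
    List<String> interfacesOf(String environmentId) {
        List<String> result = new ArrayList<>();
        for (String[] row : rows) {
            if (row[1].equals(environmentId)) {
                result.add(row[0]);
            }
        }
        return result;
    }
}
```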

For example, when the news list interface is bound to environments A and B, it is requested in each of them once every 3 minutes.

A:

Environment A host + relative URL =

http://a.com/v1/news

Request type:

get

Body:

token=2c789e34dc81d79feba6a005ad63902b

B:

Environment B host + relative URL =

http://b.com/v1/news

Request type:

get

Body:

token=4297f44b13955235245b2497399d7a93

The framework itself doesn't care how many parameters there are. It simply iterates over the placeholders in the interface and replaces each one with the matching variable from the environment's parameter list. For example, suppose the body should be token=2c789e34dc81d79feba6a005ad63902b&type=1&mobile=2.

Then, when adding the interface, fill in the body as follows (placeholders can be named arbitrarily):

token={token}&type={type}&mobile={mobile}

The parameters (list) for configuring the environment are as follows:

key value
token 2c789e34dc81d79feba6a005ad63902b
type 1
mobile 2

Parameters and placeholders are independent of any other variables in the URL.
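
The substitution described above can be sketched in a few lines of Java. This is an illustration under assumptions, not the framework's real code: RequestTemplate, fill, and url are hypothetical names, placeholders are assumed to be word characters inside braces, and no URL encoding is applied to the values.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that builds a concrete request from an interface template. */
final class RequestTemplate {
    // Matches placeholders such as {token} or {mobile}.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{(\\w+)\\}");

    /** Replaces every {name} with its value from params; unknown names are left untouched. */
    static String fill(String template, Map<String, String> params) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = params.get(m.group(1));
            String replacement = (value != null) ? value : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Joins the environment host and the relative URL with exactly one slash. */
    static String url(String host, String relativeUrl) {
        String h = host.endsWith("/") ? host.substring(0, host.length() - 1) : host;
        String r = relativeUrl.startsWith("/") ? relativeUrl.substring(1) : relativeUrl;
        return h + "/" + r;
    }
}
```

With environment A's parameter list, fill("token={token}&type={type}&mobile={mobile}", params) produces the body shown earlier, and url("http://a.com", "v1/news") produces http://a.com/v1/news. Leaving unknown placeholders untouched makes a missing environment parameter easy to spot in the request.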

The front-end display

Each status code gets its own color. You can choose suitable colors for the error codes in your product; the more severe the error, the more conspicuous the color should be.

Different views

You can switch between the monitoring information for all interfaces in environment A and the monitoring information for interface 1.

Environment Editing interface:

Task editing interface:

Core technology

Since this is periodic monitoring, cron expressions are a natural fit, and because ours is a Java project we use the Quartz framework.

Quartz core code

import java.io.IOException;
import java.io.InputStream;
import java.util.List;

import org.apache.ibatis.io.Resources;
import org.apache.ibatis.session.SqlSession;
import org.apache.ibatis.session.SqlSessionFactory;
import org.apache.ibatis.session.SqlSessionFactoryBuilder;
import org.quartz.CronScheduleBuilder;
import org.quartz.CronTrigger;
import org.quartz.JobBuilder;
import org.quartz.JobDetail;
import org.quartz.JobKey;
import org.quartz.Scheduler;
import org.quartz.SchedulerException;
import org.quartz.SchedulerFactory;
import org.quartz.Trigger;
import org.quartz.TriggerBuilder;
import org.quartz.TriggerKey;
import org.quartz.impl.StdSchedulerFactory;

public class QuartzSchedule {

    private static SchedulerFactory sf = new StdSchedulerFactory();
    private static Scheduler sched;
    final static String groupName = "task";

    public static void init() throws IOException, SchedulerException {
        // Query all tasks and projects that need to be executed.
        // Load the MyBatis configuration file.
        InputStream inputStream = Resources.getResourceAsStream("mybatis-config.xml");
        SqlSessionFactory sqlSessionFactory = new SqlSessionFactoryBuilder().build(inputStream);
        SqlSession sqlSession = sqlSessionFactory.openSession();
        MprojectTaskDao mprojectTaskDao = sqlSession.getMapper(MprojectTaskDao.class);
        sched = sf.getScheduler();
        List<MprojectTask> mprojectTaskList = mprojectTaskDao.findList();
        for (MprojectTask mprojectTask : mprojectTaskList) {
            startJob(mprojectTask);
        }
    }

    public static void stopTask(MprojectTask mprojectTask) {
        TriggerKey triggerKey = TriggerKey.triggerKey(mprojectTask.getId(), groupName);
        try {
            sched.pauseTrigger(triggerKey);   // Stop the trigger
            sched.unscheduleJob(triggerKey);  // Remove the trigger
            sched.deleteJob(JobKey.jobKey(mprojectTask.getId(), groupName)); // Delete the job
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void startTask(MprojectTask mprojectTask) {
        // Stop the task before starting it, whether or not it is already stopped.
        stopTask(mprojectTask);
        startJob(mprojectTask);
    }

    private static void startJob(MprojectTask mprojectTask) {
        try {
            JobDetail jobDetail = JobBuilder.newJob(TaskQuzrtzJob.class)
                    .withIdentity(mprojectTask.getId(), groupName).build();
            // Trigger builder
            TriggerBuilder<Trigger> triggerBuilder = TriggerBuilder.newTrigger();
            // Trigger name and trigger group
            triggerBuilder.withIdentity(mprojectTask.getId(), groupName);
            triggerBuilder.startNow();
            // Trigger schedule, taken from the task's cron expression
            triggerBuilder.withSchedule(CronScheduleBuilder.cronSchedule(mprojectTask.getCron()));
            CronTrigger trigger = (CronTrigger) triggerBuilder.build();
            // Register the JobDetail and Trigger with the scheduler
            sched.scheduleJob(jobDetail, trigger);
            // Start the scheduler
            if (!sched.isShutdown()) {
                sched.start();
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Job core code

import java.io.IOException;

import org.quartz.Job;
import org.quartz.JobExecutionContext;

public class TaskQuzrtzJob implements Job {

    public TaskQuzrtzJob() throws IOException {
    }

    // The JobExecutionContext carries the parameters passed to the job.
    public void execute(JobExecutionContext context) {
        // Use the context to get all the environments for the task
        // Iterate over all the environments
        // Build the request address
        // Send the encrypted request using the encapsulated framework
        // Decrypt the response
        // Compare with the previous result; if different, mark it and insert into the node table
        // Insert into the record table
    }
}

Data volume

Each interface (task) we monitor runs in multiple environments. Assuming 100 interfaces and 100 environments are monitored every 3 minutes (20 runs per hour), the number of records per day is

100 * 100 * 20 * 24 = 4,800,000 records, which is a considerable volume, so we split the data into two tables. One table stores every record; the other stores only changes. When displaying nodes we query only the change table, which solves the performance problem.
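
The estimate above can be reproduced for other interface counts or polling intervals with a small helper. RecordVolume and recordsPerDay are hypothetical names used only for this illustration.

```java
/** Hypothetical helper for estimating how many monitoring records are written per day. */
final class RecordVolume {
    /**
     * Every interface is checked in every environment once per interval,
     * so the daily total is interfaces * environments * runs per day.
     */
    static long recordsPerDay(int interfaces, int environments, int intervalMinutes) {
        long runsPerDay = (24L * 60L) / intervalMinutes;
        return (long) interfaces * environments * runsPerDay;
    }
}
```

For 100 interfaces, 100 environments, and a 3-minute interval this gives 100 * 100 * 480 = 4,800,000, matching the figure above.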

The front end

The front end uses CSS to draw the circles and their colors:

.round {
    border-radius: 50%;
    text-align: center;
    width: 25px;
    height: 25px;
    line-height: 25px;
}

.on {
    border: 1px solid #7CBA23;
}

The connecting lines are drawn with CSS as well:

.line {
    border-bottom: 1px solid gainsboro;
    height: 2px;
    width: 20px;
}

AngularJS renders the list as a whole.

The database

We use a MySQL database, accessed through the MyBatis framework.

Exception alerts

We associate each interface with its status codes. When an exception occurs, the system sends an email; a tester verifies the problem and reports it to a developer. After the developer fixes it, we record the solution so it can be looked up next time.

Conclusion

The platform has been in trial use for more than a month and generally meets our needs. Previously we ran test scripts and often could not monitor server health in real time. Now we can spot a problem in any environment in real time, and we can measure from when to when the interface of a given environment was abnormal.

The platform has few features, but it greatly simplifies assessing the stability of our environments, and it collects a large amount of data for later statistical analysis. With slight modification, the scheme can also be used in a variety of other monitoring scenarios.

Looking ahead

In the future I plan to add some intelligent analysis. For example, an email alert might read as follows:

(Intelligent monitoring) The interface [Get News List] in Customer A's private environment is abnormal, with exception code 102: {"data":null,"message":"success","state":102}

Historical solutions include:

  1. Restart the container on the management side.
  2. Abnormal dirty data exists in the xx table.

According to historical statistics, the probability of solution 1 is 80% and of solution 2 is 10%.

{"data":null,"message":"no data yet","state":200}

Two other abnormal environments were found, XXX and YYY. However, their abnormal status is not similar to this customer's, so a bug in the interface WAR package is ruled out.

Because the other interfaces in the XXX environment were all abnormal during one 5-minute monitoring period, (intelligent monitoring) has independently invoked the container self-healing module, and service has now been restored. Data collection took 5 minutes, analysis took 100 milliseconds, and self-healing took 60 seconds, so this service interruption lasted 6 minutes. Service interruptions this month total 15 minutes, and availability was 98.33%.

over