In the previous two articles, we set up the basic SkyWalking (SW) environment and monitored our microservices, which gave us a view of how every service is running. However, when a service responds slowly or an interface's latency becomes severe, we need to locate the problem immediately. That calls for today's topic, alarm monitoring, which is also the final part of the SW series.

The UI parameters

First, let's take a look at some of the key parameters on the SW dashboard, as shown in the figure below.

The alarm configuration

The alarm process

SkyWalking raises alarms by polling, at intervals, the trace data collected by the SkyWalking collector. It evaluates the configured alarm rules, such as service response time or service response-time percentage, and sends the corresponding alarm when a rule's threshold is reached. Alarm messages are delivered by a thread pool that asynchronously calls a Webhook interface, which users can define themselves. Developers can therefore implement whatever alarm channels they need, such as DingTalk or email, behind that Webhook interface.

Rule configuration

The alarm core is driven by a set of rules defined in config/alarm-settings.yml. Opening the file shows the following:

The alarm definition is divided into two parts:

  • Alarm rules. They define which metrics should trigger an alarm and under what conditions.
  • Webhooks. The service endpoints to be notified when an alarm is triggered.

An alarm rule consists of the following fields:

  • Rule name. The unique name displayed in the alarm information. It must end with _rule.
  • Metrics name. The name of the metric, which is also the metric name in the OAL script.
  • Include names. The names of the entities this rule applies to, for example a service name or an endpoint name.
  • Threshold. The value the metric is compared against.
  • OP. The comparison operator; >, <, and = are supported.
  • Period. The length of the time window over which metric data is checked against the alarm rule. The window follows the clock of the back-end deployment environment.
  • Count. Within a Period window, if the number of values that cross the Threshold (according to OP) reaches Count, an alarm is sent.
  • Silence period. After an alarm is triggered at time N, no further alarm is sent during TN -> TN + period. By default it equals Period, which means the same alarm (same ID under the same Metrics name) is triggered only once per Period.
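How Threshold, OP, Period, Count, and Silence period interact can be easier to see in code. The following is a simplified, hypothetical model of a single rule, not SkyWalking's actual implementation: the class name `AlarmRuleSketch` is made up for illustration, the operator is fixed to `>`, and one value is assumed to arrive per minute.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Simplified, hypothetical model of one alarm rule; not SkyWalking's actual code. */
class AlarmRuleSketch {
    private final double threshold;   // value each data point is compared against
    private final int period;         // window length, in minutes
    private final int count;          // matching minutes needed to raise an alarm
    private final int silencePeriod;  // number of checks to stay quiet after an alarm
    private final Deque<Boolean> window = new ArrayDeque<>();
    private int silenceLeft = 0;

    AlarmRuleSketch(double threshold, int period, int count, int silencePeriod) {
        this.threshold = threshold;
        this.period = period;
        this.count = count;
        this.silencePeriod = silencePeriod;
    }

    /** Feeds one per-minute value (op fixed to ">") and reports whether to alarm now. */
    boolean check(double value) {
        window.addLast(value > threshold);
        if (window.size() > period) {
            window.removeFirst();     // keep only the last `period` minutes
        }
        if (silenceLeft > 0) {        // still inside the silence period
            silenceLeft--;
            return false;
        }
        long matched = window.stream().filter(Boolean::booleanValue).count();
        if (matched >= count) {
            silenceLeft = silencePeriod;
            return true;
        }
        return false;
    }
}
```

With `new AlarmRuleSketch(1000, 10, 2, 5)`, the second value above 1000 within the 10-minute window raises an alarm, and the next five checks stay silent even if the values remain high.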

Webhook

SkyWalking's alarm Webhook requires the peer to be a Web container. Alarm messages are sent as HTTP requests: the request method is POST, the Content-Type is application/json, and the body is a JSON list of alarm messages with the following fields:

  • scopeId. All available scopes are defined in org.apache.skywalking.oap.server.core.source.DefaultScopeDefine.
  • name. The entity name of the target scope.
  • id0. The ID of the scope entity.
  • id1. Not used.
  • alarmMessage. The content of the alarm message.
  • startTime. The alarm time, in milliseconds between the current time and 1970/1/1 UTC.
[{
    "scopeId": 1,
    "name": "serviceA",
    "id0": 12,
    "id1": 0,
    "alarmMessage": "alarmMessage xxxx",
    "startTime": 1560524171000
}, {
    "scopeId": 1,
    "name": "serviceB",
    "id0": 23,
    "id1": 0,
    "alarmMessage": "alarmMessage yyy",
    "startTime": 1560524171000
}]
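The startTime value in the payload is an epoch timestamp in milliseconds, so it can be turned into a readable UTC time with the standard java.time API. A minimal sketch follows; the class and method names are illustrative and not part of SkyWalking.

```java
import java.time.Instant;

/** Sketch: interpret the Webhook payload's startTime as epoch milliseconds. */
class StartTimeDemo {
    /** Returns the ISO-8601 UTC representation of an epoch-millisecond timestamp. */
    static String toUtcString(long startTimeMillis) {
        return Instant.ofEpochMilli(startTimeMillis).toString();
    }

    public static void main(String[] args) {
        // Value taken from the sample payload above
        System.out.println(toUtcString(1560524171000L)); // prints 2019-06-14T14:56:11Z
    }
}
```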

Code in practice

  • Write an entity class to receive SW alarm messages
import lombok.Data;

// Mirrors one element of the JSON list that SkyWalking posts to the Webhook
@Data
public class SwAlarmVO {
    private int scopeId;
    private String name;
    private int id0;
    private int id1;
    private String alarmMessage;
    private long startTime; // epoch milliseconds
}

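  • Write a Webhook interface
import java.util.List;

import lombok.extern.log4j.Log4j2;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("sw")
@Log4j2
public class AlarmController {
    // SkyWalking POSTs a JSON list of alarm messages to /sw/alarm
    @PostMapping("/alarm")
    public void alarm(@RequestBody List<SwAlarmVO> alarmList) {
        log.info("skywalking alarm message:{}", alarmList);
        // TODO: forward the alarm, e.g. to DingTalk or email
    }
}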
  • Modify the alarm configuration to enable the Webhook interface
  • To simulate a slow request, we add Thread.sleep(1000) to the code to increase the interface's response time, then wait for the Webhook interface to receive the alarm
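As a rough illustration of the configuration step, a rule plus Webhook entry in config/alarm-settings.yml might look like the fragment below. It is modeled on the rule layout shipped with SkyWalking 6.x, so key names may differ in other versions, and the Webhook address (host and port 127.0.0.1:8080) is an assumption that must be replaced with wherever the AlarmController is actually reachable.

```yaml
rules:
  # Rule name: must be unique and end with _rule
  service_instance_resp_time_rule:
    metrics-name: service_instance_resp_time
    op: ">"
    threshold: 1000      # milliseconds
    period: 10           # check the last 10 minutes
    count: 2             # alarm when 2 minutes in the window exceed the threshold
    silence-period: 5    # stay quiet for 5 minutes after an alarm
    message: Response time of service instance {name} is more than 1000ms in 2 minutes of last 10 minutes

webhooks:
  # Assumed address of the /sw/alarm endpoint defined above
  - http://127.0.0.1:8080/sw/alarm
```

With a threshold of 1000 ms, the one-second sleep added to the interface should be enough to cross the rule once two minutes in the window are slow.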

The Webhook interface logs the following alarm details:

[SwAlarmVO(scopeId=2, name=dubbo-consumer-pid:13812@jianzhang11, id0=28, id1=0, alarmMessage=Response time of service instance dubbo-consumer-pid:13812@jianzhang11 is more than 1000ms in 2 minutes of last 10 minutes, startTime=1573122018755), SwAlarmVO(scopeId=2, name=dubbo-provider2-pid:14108@jianzhang11, id0=25, id1=0, alarmMessage=Response time of service instance dubbo-provider2-pid:14108@jianzhang11 is more than 1000ms in 2 minutes of last 10 minutes, startTime=1573122018755)]

This shows that the Webhook receives alarm information from SW correctly. The follow-up message notification can then be custom-developed as needed.
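One possible shape for that custom notification step is sketched below: the alarm text is fanned out to several channels on a thread pool so that the Webhook request itself is not blocked. Everything here is hypothetical, including the `AlarmDispatcher` class and the `Channel` interface; real DingTalk or email senders would be supplied as `Channel` implementations.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical sketch: send alarm text to several channels asynchronously. */
class AlarmDispatcher {
    /** One notification channel, e.g. DingTalk or email (implementation is up to you). */
    interface Channel {
        void send(String text);
    }

    private final List<Channel> channels;
    private final ExecutorService pool = Executors.newFixedThreadPool(2);

    AlarmDispatcher(List<Channel> channels) {
        this.channels = channels;
    }

    /** Submits one task per channel; the returned future completes when all have run. */
    CompletableFuture<Void> dispatch(String text) {
        CompletableFuture<?>[] tasks = channels.stream()
                .map(c -> CompletableFuture.runAsync(() -> c.send(text), pool))
                .toArray(CompletableFuture[]::new);
        return CompletableFuture.allOf(tasks);
    }

    /** Stops accepting new tasks; already submitted tasks still finish. */
    void shutdown() {
        pool.shutdown();
    }
}
```

Inside the controller, the TODO could then become a call such as `dispatcher.dispatch(alarmList.toString())`, where `dispatcher` is an injected instance of this assumed class.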

Related articles:

Distributed tracking system based on SkyWalking – environment construction

Distributed tracking system based on SkyWalking – microservice monitoring
