Preface

Monitoring is an important part of ensuring system stability. In the Kubernetes open source ecosystem, resource monitoring tools and components abound. Besides the community-incubated Metrics Server and Prometheus, a CNCF graduated project, developers have plenty of other options. However, resource monitoring alone is not enough, because it has two major shortcomings:

  • Insufficient timeliness and accuracy

Most resource monitoring exports data on a push or pull basis, so samples are collected at fixed intervals. A spike or anomaly can occur between two collection points and recover before the next one arrives. In that case most collection systems miss it entirely. Even when a spike is captured, periodic collection tends to clip the peak, which reduces accuracy.
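The sampling problem can be made concrete with a minimal Python sketch. The CPU curve, the 15-second interval and the helper names are all hypothetical. The sketch shows two things: a short spike can vanish between two collection points, and averaging over a collection window flattens whatever remains.

```python
# Minimal sketch with made-up data: how interval-based collection loses spikes.
# SCRAPE_INTERVAL, cpu_usage, point_samples and window_averages are hypothetical.

SCRAPE_INTERVAL = 15  # seconds between two collection points (assumed)
DURATION = 60         # seconds of simulated time


def cpu_usage(t: int) -> float:
    """Hypothetical CPU usage (%) at second t, with a 5-second spike at t=20..24."""
    return 95.0 if 20 <= t < 25 else 10.0


def point_samples(signal, duration: int, interval: int) -> list[float]:
    """Values seen by a collector that reads the instantaneous value once per interval."""
    return [signal(t) for t in range(0, duration, interval)]


def window_averages(signal, duration: int, interval: int) -> list[float]:
    """Values seen by a collector that reports the mean of each interval."""
    return [
        sum(signal(t) for t in range(start, start + interval)) / interval
        for start in range(0, duration, interval)
    ]


true_peak = max(cpu_usage(t) for t in range(DURATION))
sampled_peak = max(point_samples(cpu_usage, DURATION, SCRAPE_INTERVAL))
averaged_peak = max(window_averages(cpu_usage, DURATION, SCRAPE_INTERVAL))

print(f"true peak:     {true_peak}")          # 95.0
print(f"sampled peak:  {sampled_peak}")       # 10.0 (spike fell between samples)
print(f"averaged peak: {averaged_peak:.1f}")  # 38.3 (spike clipped)
```

Point sampling at t = 0, 15, 30 and 45 never lands inside the spike. The window that does contain it reports only about 38%, not 95%.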

  • Insufficient scenario coverage

Some monitoring scenarios cannot be expressed in terms of resources. For example, the starting and stopping of a Pod cannot be measured by resource utilization alone. When utilization drops to 0, we cannot tell what actually caused that state.

How does Kubernetes solve these two problems?

Event Monitoring: A New Dimension of Monitoring

As a cloud native platform, Kubernetes fully decouples interfaces from implementations in its architecture and makes its components pluggable. Its overall design follows a state machine model. Kubernetes manages the lifecycle of resources in three steps:

  • it declares a desired state;
  • it performs state transitions;
  • it checks the actual state and compensates until the actual state matches the desired one.
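The desired-state pattern can be illustrated with a toy control loop in Python. This is not Kubernetes controller code. The `reconcile` function, the Pod names and the replica count are hypothetical, and real controllers watch the API server rather than a local list. Each pass compares the desired state with the observed state and compensates for any difference.

```python
import itertools
from typing import Iterator


def reconcile(desired_replicas: int, running: list[str], names: Iterator[str]) -> list[str]:
    """One pass of a toy (hypothetical) control loop.

    Compares the desired replica count with the observed list of running Pods
    and compensates by "creating" or "deleting" Pods in place.
    Returns the actions taken, so an empty list means the state has converged.
    """
    actions = []
    while len(running) < desired_replicas:  # too few: create
        name = next(names)
        running.append(name)
        actions.append(f"create {name}")
    while len(running) > desired_replicas:  # too many: delete the newest
        name = running.pop()
        actions.append(f"delete {name}")
    return actions


names = (f"pod-{i}" for i in itertools.count())
running: list[str] = []

print(reconcile(3, running, names))  # ['create pod-0', 'create pod-1', 'create pod-2']
running.remove("pod-1")              # simulate a Pod disappearing
print(reconcile(3, running, names))  # ['create pod-3']
print(reconcile(3, running, names))  # [] (actual state matches desired state)
```

The loop is level-triggered. It does not care why a Pod vanished; it only acts on the gap between desired and observed state.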



Kubernetes has two types of events:

  • A Warning event indicates that the state transition that produced it led to an unexpected state.
  • A Normal event indicates that the current state matches the desired state.

Take the lifecycle of a Pod as an example. When a Pod is created, it first enters the Pending state while its image is pulled. Once the image has been pulled and the health check passes, the Pod enters the Running state and a Normal event is generated. The Pod may later crash because of OOM or another cause and end up in the Failed state. That state is unexpected, so Kubernetes generates a Warning event. By monitoring these events, we can promptly detect problems that resource monitoring easily misses.
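The Normal/Warning distinction in the Pod example can be modeled as a lookup over state transitions. The Python sketch below is deliberately simplified and rests on one assumption: only the path from creation to Pending to Running is expected for a long-running Pod. The `EXPECTED_TRANSITIONS` table and the `event_type` helper are hypothetical. In a real cluster, components such as the kubelet and the scheduler emit events with their own reasons; the event type is not derived from a table like this.

```python
# Toy model (assumption): a long-running Pod is expected to go from creation
# to Pending and then to Running; any other transition is treated as unexpected.
EXPECTED_TRANSITIONS = {
    (None, "Pending"),       # Pod created, waiting for scheduling and image pull
    ("Pending", "Running"),  # image pulled, containers started
}


def event_type(old_phase, new_phase) -> str:
    """Hypothetical helper: 'Normal' for an expected transition, 'Warning' otherwise."""
    return "Normal" if (old_phase, new_phase) in EXPECTED_TRANSITIONS else "Warning"


# Phases observed over a Pod's lifetime; it later crashes (e.g. OOM) and fails.
lifecycle = [None, "Pending", "Running", "Failed"]
events = [(old, new, event_type(old, new)) for old, new in zip(lifecycle, lifecycle[1:])]

for old, new, etype in events:
    print(f"{old} -> {new}: {etype}")
# None -> Pending: Normal
# Pending -> Running: Normal
# Running -> Failed: Warning
```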

A standard Kubernetes event has several important properties that help diagnose problems and raise alerts:

  • Namespace: the namespace of the object that generated the event.
  • Kind: the type of object the event is bound to, such as Node, Pod, Namespace, or Component.
  • Timestamp: the time at which the event occurred.
  • Reason: why the event happened.
  • Message: a description of the event.
  • Other information
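These properties map naturally onto a small record type. The Python sketch below is a simplified stand-in, not the Kubernetes client's Event model. In the core/v1 Event object, these values live in fields such as `involvedObject.namespace`, `involvedObject.kind`, `lastTimestamp`, `reason`, `message` and `type`. The `Event` class, the `format_alert` and `alerts` helpers and the sample events are illustrative assumptions. They show how the properties can be combined into a readable alert for Warning events.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Event:
    """Simplified, hypothetical event record (not the full Kubernetes schema)."""
    namespace: str       # namespace of the involved object
    kind: str            # kind of the involved object, e.g. "Pod" or "Node"
    timestamp: datetime  # when the event occurred
    reason: str          # short machine-readable cause
    message: str         # human-readable description
    event_type: str      # "Normal" or "Warning"


def format_alert(e: Event) -> str:
    """Render an event as a one-line alert message."""
    return (f"[{e.event_type}] {e.kind} in namespace {e.namespace} "
            f"at {e.timestamp.isoformat()}: {e.reason}: {e.message}")


def alerts(events: list[Event]) -> list[str]:
    """Keep only Warning events, since Normal events describe expected states."""
    return [format_alert(e) for e in events if e.event_type == "Warning"]


# Illustrative sample data, not captured from a real cluster.
ts = datetime(2019, 1, 1, tzinfo=timezone.utc)
sample = [
    Event("default", "Pod", ts, "Started", "Started container app", "Normal"),
    Event("default", "Pod", ts, "BackOff", "Back-off restarting failed container", "Warning"),
]
for line in alerts(sample):
    print(line)
```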

The event mechanism adds a new dimension and greater accuracy to Kubernetes monitoring. It makes up for the shortcomings of other monitoring approaches.

Kube-eventer V1.0.0: Released and Open Sourced



For Kubernetes event monitoring, the Kubernetes community once offered a simple event sink capability in Heapster. That capability was archived when Heapster was deprecated. To fill the gap, Alibaba Cloud Container Service released and open-sourced Kube-eventer, a tool for sinking Kubernetes events. It can send Kubernetes events to the following destinations, among others:

  • DingTalk bots
  • the SLS log service
  • Kafka message queues
  • InfluxDB time series databases

The official V1.0.0 release includes the following enhancements:

  • The DingTalk plug-in now supports filtering by Namespace and Kind
  • Support for integrated deployment with the NPD (Node Problem Detector) plug-in
  • Improved data sink performance of the SLS plug-in
  • Fixed invalid startup parameters in the InfluxDB plug-in
  • Fixed security vulnerabilities in the Dockerfile
  • 11 other functional fixes
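Filtering by Namespace and Kind, as added to the DingTalk plug-in, can be sketched as a simple predicate. This Python example is a hypothetical illustration of the idea. It is not Kube-eventer's implementation or configuration syntax, and the `Event` class and `make_filter` helper are invented for the example. An empty criterion matches everything, so only the dimensions you specify narrow the alert stream.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Event:
    """Minimal hypothetical event: only the fields the filter needs."""
    namespace: str
    kind: str
    reason: str


def make_filter(namespaces: Iterable[str] = (),
                kinds: Iterable[str] = ()) -> Callable[[Event], bool]:
    """Build a predicate accepting events that match the given namespaces and kinds.

    An empty criterion means "no restriction" for that dimension.
    """
    ns_allowed = set(namespaces)
    kinds_allowed = set(kinds)

    def accept(e: Event) -> bool:
        if ns_allowed and e.namespace not in ns_allowed:
            return False
        if kinds_allowed and e.kind not in kinds_allowed:
            return False
        return True

    return accept


# Illustrative sample events.
events = [
    Event("default", "Pod", "BackOff"),
    Event("kube-system", "Pod", "Unhealthy"),
    Event("default", "Node", "NodeNotReady"),
]

only_default_pods = make_filter(namespaces=["default"], kinds=["Pod"])
print([e.reason for e in events if only_default_pods(e)])  # ['BackOff']

no_filter = make_filter()
print(len([e for e in events if no_filter(e)]))  # 3
```

Building the predicate once and reusing it keeps the per-event check to two set lookups. This matters when a busy cluster produces many events.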

Typical scenarios:

  • Generating event alerts with DingTalk
  • Generating event alerts with SLS
  • Combining Kube-eventer with NPD for anomaly detection and event alerts

Developers using Container Service can deploy Kube-eventer directly from its Helm chart in the App Catalog.

Project repository: github.com/AliyunConta…


Author: Mo Yuan


This article is original content from the Yunqi Community and may not be reproduced without permission.