Abstract:

Overview of APM

APM stands for Application Performance Management: the monitoring and management of application performance and availability. In the narrow sense, APM covers monitoring of the application itself, such as interface performance and error monitoring, distributed call tracing, and other diagnostic information (memory, threads, and so on). In the broad sense, APM goes beyond the application layer to include mobile app monitoring, web page monitoring, container monitoring, server monitoring, and monitoring of other platform components such as middleware and databases.

APM is a relatively new monitoring field that has developed alongside cloud technology and microservice architecture over the past five years. Both in China and internationally, cloud vendors (AWS, Azure, etc.) and independent companies (Dynatrace, AppDynamics, etc.) offer excellent APM products.

As the largest cloud vendor in China and one of the top three in the world, Alibaba Cloud (Aliyun) also provides many excellent products in the APM field, and its product family is relatively comprehensive. This article introduces Alibaba Cloud's products in this field.

Introduction to application architecture in the cloud era

A typical cloud-era application, such as one running on Alibaba Cloud, has the architecture shown in the figure below.



Among them:

  • Applications generally offer two client access modes: a mobile app or browser-based pages. In the APM field, client monitoring is also called User Experience Management (UEM).
  • In addition to real client traffic, users typically deploy business probes that test service performance directly or perform health checks through APIs.
  • The back-end application servers serve the clients directly. A microservices-based system generally consists of multiple applications running on multiple nodes, with complex call dependencies between them.
  • Back-end applications typically run in containers or directly on a (virtual) operating system, depending on whether the user has adopted container technology to streamline development and operations.
  • Applications also depend directly on various PaaS/SaaS cloud services, such as OSS, OTS, MQ, and RDS, which provide the corresponding platform services and reduce the application's operation and maintenance costs.

The ultimate goal of every APM product on Alibaba Cloud is to monitor the components above effectively. The following sections describe the APM product that Alibaba Cloud provides for each component.

Alibaba Cloud’s APM solution map

Based on the cloud application architecture described above, Alibaba Cloud’s APM solution map is shown below.



Among them:

  • PC and mobile page monitoring: handled by the front-end monitoring sub-product of real-time service monitoring. Front-end monitoring tracks the health of a page from three angles: the number of page views and the page response time, the call time and error responses of APIs, and JS errors on the page. It also supports breaking page status down by dimension, including region, network operator, and browser type and version.
  • Mobile app monitoring: handled by mobile data analysis. Mobile data analysis gives developers a one-stop digital operations service, including general-purpose multi-dimensional user behavior analysis, support for custom data analysis, and seamless integration with other data application products. It helps mobile developers use big data technology to run fine-grained operations, improve product quality and experience, and increase user stickiness.
  • Probing and stress testing: this area generally has two parts.

    • Probing: an external probe runs dial tests against a page to check its status. This can be done with the site monitoring feature of cloud monitoring. Site monitoring is positioned as an Internet network detection product. It sends detection requests that simulate real user access from Internet terminal nodes across the country, in order to monitor how end users on each operator’s network, in each province and city, reach the business site.
    • Stress testing: the online system is stress tested by simulating user access from a real external network environment. This can be done with performance testing. The product is derived from the single-link/full-link stress testing platform that has served the entire Alibaba ecosystem for more than 4 years. It simulates real user traffic by generating load from edge computing nodes.
  • Application server: monitored by the application monitoring feature of real-time service monitoring. Application monitoring grew out of Hawk-eye, Alibaba’s distributed tracing system. It works by instrumenting the application with a probe to monitor interface performance, trace call links, and diagnose errors. The same probe can also capture the performance and state of containers and operating systems, so application monitoring can be used to monitor those as well.
  • Operating system: mainly monitored by the host monitoring feature of cloud monitoring. Host monitoring provides system-level monitoring by installing a plug-in on the server. During troubleshooting, you can use it to query server resource usage and monitoring data. Host monitoring works on Alibaba Cloud ECS instances as well as on servers from other cloud vendors and on physical machines.
  • Other Alibaba Cloud PaaS and SaaS services: including RDS, OSS, MQ, cache, etc. These products have their own built-in monitoring, but users can also use real-time service monitoring and cloud monitoring as monitoring entry points. The two products have different focuses:

    • Real-time service monitoring obtains external service data mainly from instrumentation points in the client code on the application side. The resulting performance data reflects what the application actually experiences: the measured call time includes both the server response time of the external service and the network delay in between.
    • Cloud monitoring obtains external service data mainly from performance data on the Alibaba Cloud service side. The resulting performance data is the server response time of the external service, excluding the network delay in between. Although it does not reflect what the application actually experiences, it is an effective way to rule out problems on the server side.
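
The difference between the two vantage points can be made concrete with a little arithmetic. The following is a minimal Python sketch, not either product's API: the class, the function names, and the numbers are all hypothetical, and it assumes that both measurements are available for the same call.

```python
from dataclasses import dataclass


@dataclass
class CallSample:
    """One call from the application to an external service (hypothetical values)."""
    client_elapsed_ms: float  # measured by the probe inside the application
    server_elapsed_ms: float  # reported by the cloud service for the same call


def network_overhead_ms(sample: CallSample) -> float:
    """Time the application waited that the service side never saw."""
    return sample.client_elapsed_ms - sample.server_elapsed_ms


def likely_bottleneck(sample: CallSample, server_budget_ms: float) -> str:
    """Rule out the server first, then attribute the rest to the path in between."""
    if sample.server_elapsed_ms > server_budget_ms:
        return "server"
    return "network-or-client"


sample = CallSample(client_elapsed_ms=48.0, server_elapsed_ms=12.0)
print(network_overhead_ms(sample))      # 36.0
print(likely_bottleneck(sample, 20.0))  # network-or-client
```

In this example the service side looks healthy (12 ms against a 20 ms budget), yet the application waits 48 ms, which is why the client-side figure is the one that reflects the application's real status.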

About Business Monitoring

Some APM scenarios also call for monitoring at the business level, or business monitoring for short. Why is business monitoring needed?

  1. In many cases, a partial application failure does not translate directly into business impact. For example, when some critical interfaces go down, circuit breakers or caching in the application may keep the failure from directly affecting business revenue (transactions, orders, etc.). For this reason, IT system monitoring and fault severity classification often depend not on a single system failure but on business indicators.
  2. Business-level monitoring lets the IT system help optimize the business in turn. For example, in operations analysis for an e-commerce company, business monitoring can analyze the geographical distribution of sellers, their distribution across network operators, and their real-time inventory, and compute best-selling categories in real time, which in turn supports real-time, data-driven business decisions.
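
To illustrate the first point, here is a minimal Python sketch of severity classification driven by a business indicator rather than by the failure count alone. The function name, the orders-per-minute indicator, and the 20% threshold are assumptions chosen for the example, not part of any Alibaba Cloud product.

```python
def classify_incident(
    failed_interfaces: int,
    baseline_orders_per_min: float,
    current_orders_per_min: float,
    drop_threshold: float = 0.2,  # assumed: a 20% drop in orders counts as critical
) -> str:
    """Grade an incident by its business impact, not by how many interfaces failed."""
    if baseline_orders_per_min <= 0:
        raise ValueError("baseline_orders_per_min must be positive")

    # Relative drop of the business indicator against its normal level.
    drop = (baseline_orders_per_min - current_orders_per_min) / baseline_orders_per_min

    if drop >= drop_threshold:
        return "critical"
    if failed_interfaces > 0:
        # Something is broken, but circuit breakers or caches are absorbing it.
        return "warning"
    return "ok"


print(classify_incident(3, 1000.0, 990.0))  # warning
print(classify_incident(1, 1000.0, 700.0))  # critical
```

Three failing interfaces with almost no change in orders rate lower than one failing interface that removes 30% of orders, which matches the reasoning above.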

For these reasons, several of Alibaba Cloud's APM products support business-level monitoring to varying degrees. Among them:

  • The custom business monitoring feature of real-time service monitoring takes the user’s application log data, page data pushes, or other data sources such as message queues, pre-aggregates them through real-time computation into data along different dimensions, and stores the results in a time-series database. It then gives users interactive, dynamic visualizations and alerting policies. Known user scenarios include air travel, e-commerce, and various kinds of Internet businesses.
  • The log monitoring feature of cloud monitoring computes statistics over the log content in the user’s log service and presents them as business dashboards.
  • Mobile data analysis computes statistics on mobile service usage from logs reported by mobile devices.
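
The pre-aggregation step mentioned in the first item can be sketched as follows. This is a simplified, in-memory Python illustration under stated assumptions: the event fields (`ts`, `region`, `amount`), the one-minute bucket size, and the function name are hypothetical, and a real pipeline would consume a stream and write each bucket to a time-series database.

```python
from collections import defaultdict


def pre_aggregate(events, dimensions):
    """Roll raw events up into one-minute buckets keyed by the chosen dimensions."""
    buckets = defaultdict(lambda: {"count": 0, "amount": 0.0})
    for event in events:
        # Truncate the Unix timestamp (seconds) to the start of its minute.
        minute = event["ts"] - event["ts"] % 60
        key = (minute,) + tuple(event[d] for d in dimensions)
        buckets[key]["count"] += 1
        buckets[key]["amount"] += event["amount"]
    return dict(buckets)


events = [
    {"ts": 120, "region": "Hangzhou", "amount": 10.0},
    {"ts": 150, "region": "Hangzhou", "amount": 5.0},
    {"ts": 185, "region": "Beijing", "amount": 2.5},
]

for key, value in sorted(pre_aggregate(events, ["region"]).items()):
    print(key, value)
# (120, 'Hangzhou') {'count': 2, 'amount': 15.0}
# (180, 'Beijing') {'count': 1, 'amount': 2.5}
```

Because each bucket is already keyed by time and dimension, a dashboard or alert rule only has to read a handful of aggregated rows instead of rescanning the raw logs.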

Summary of Alibaba Cloud APM solutions

The following table summarizes Alibaba Cloud's APM solutions.


