
Micrometer provides a common API for collecting performance-monitoring data from JVM-based applications, supporting multiple meter types that can be used to observe, alert on, and react to the current state of an application.

Service metrics collected by Micrometer can be published to Prometheus by adding the following dependency.

<dependency>
  <groupId>io.micrometer</groupId>
  <artifactId>micrometer-registry-prometheus</artifactId>
  <version>${micrometer.version}</version>
</dependency>

Of course, if you are not yet sure which monitoring system you will use, you can also depend on micrometer-core directly and create a SimpleMeterRegistry.

Supported monitoring systems

Micrometer has a set of modules containing implementations for various monitoring systems; each implementation is called a Registry.

Before we dive into Micrometer, let’s take a look at three important features of the monitoring system:

  • Dimensionality: whether the system supports a multi-dimensional data model.

    Dimensional: AppOptics, Atlas, Azure Monitor, CloudWatch, Datadog, Datadog StatsD, Dynatrace, Elastic, Humio, Influx, KairosDB, New Relic, Prometheus, SignalFx, Sysdig StatsD, Telegraf StatsD, Wavefront
    Hierarchical: Graphite, Ganglia, JMX, Etsy StatsD
  • Rate aggregation: the aggregation of a set of samples over a prescribed time interval. Some systems expect the client to perform rate aggregation before sending metric data; others expect absolute values and aggregate on the server.

    Client-side: AppOptics, Atlas, Azure Monitor, Datadog, Elastic, Graphite, Ganglia, Humio, Influx, JMX, Kairos, New Relic, all StatsD flavors, SignalFx
    Server-side: Prometheus, Wavefront
  • Publishing: how metric data is delivered. Either the client periodically pushes data to the monitoring system, or the monitoring system polls an endpoint exposed by the client to pull data.

    Client pushes: AppOptics, Atlas, Azure Monitor, Datadog, Elastic, Graphite, Ganglia, Humio, Influx, JMX, Kairos, New Relic, SignalFx, Wavefront
    Server polls: Prometheus, all StatsD flavors

Registry

Meter is an interface for collecting application metric data. All Meters in Micrometer are created and managed through MeterRegistry, and every monitoring system supported by Micrometer has a corresponding MeterRegistry implementation.

The simplest registry is SimpleMeterRegistry (auto-configured in Spring-based applications), which holds the latest value of each meter in memory and does not publish the data anywhere.

MeterRegistry registry = new SimpleMeterRegistry();

Composite Registries

Micrometer provides a CompositeMeterRegistry that allows developers to simultaneously publish metrics data to multiple monitoring systems by adding multiple Registries.

CompositeMeterRegistry composite = new CompositeMeterRegistry();

Counter compositeCounter = composite.counter("counter");
// (1) Increments are no-ops until the composite contains at least one registry;
// the counter's count still yields 0 at this point
compositeCounter.increment();

SimpleMeterRegistry simple = new SimpleMeterRegistry();
// (2) The "counter" counter is registered with the simple registry
composite.add(simple);

// (3) The simple registry's counter is incremented, along with the counters of
// any other registries in the composite
compositeCounter.increment();

Global Registry

Micrometer provides a global registry, Metrics.globalRegistry, which is itself a CompositeMeterRegistry, along with a set of static convenience methods for building meters.

public class Metrics {
    public static final CompositeMeterRegistry globalRegistry = new CompositeMeterRegistry();
    private static final More more = new More();

    /**
     * Add a registry to the global composite. Meters built via Metrics.counter(...)
     * and similar methods are added to each registry in the composite.
     *
     * @param registry Registry to add.
     */
    public static void addRegistry(MeterRegistry registry) {
        globalRegistry.add(registry);
    }

    /**
     * Remove a registry from the global composite registry. Removing a registry does not remove any meters
     * that were added to it by previous participation in the global composite.
     *
     * @param registry Registry to remove.
     */
    public static void removeRegistry(MeterRegistry registry) {
        globalRegistry.remove(registry);
    }

    /**
     * Tracks a monotonically increasing value.
     *
     * @param name The base metric name
     * @param tags Sequence of dimensions for breaking down the name.
     * @return A new or existing counter.
     */
    public static Counter counter(String name, Iterable<Tag> tags) {
        return globalRegistry.counter(name, tags);
    }

    ...
}
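
As a quick usage sketch (the meter name and tag values here are illustrative, not from the original), a registry added to the global composite receives the meters built through the static API:

Metrics.addRegistry(new SimpleMeterRegistry());

// Meters built anywhere via the static helpers are added to every registry in the composite
Counter orders = Metrics.counter("orders.created", "region", "us-east-1");
orders.increment();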

Custom registries

Micrometer provides many Registry implementations out of the box that cover most scenarios, but you can also build a custom Registry for your own needs.

Typically, a custom Registry is created by extending MeterRegistry, PushMeterRegistry, or StepMeterRegistry.

// Customize registry config
public interface CustomRegistryConfig extends StepRegistryConfig {

  CustomRegistryConfig DEFAULT = k -> null;

  @Override
  default String prefix() {
    return "custom";
  }
}

// Customize registry
public class CustomMeterRegistry extends StepMeterRegistry {

  public CustomMeterRegistry(CustomRegistryConfig config, Clock clock) {
    super(config, clock);

    start(new NamedThreadFactory("custom-metrics-publisher"));
  }

  @Override
  protected void publish() {
    getMeters().stream().forEach(meter -> System.out.println("Publishing " + meter.getId()));
  }

  @Override
  protected TimeUnit getBaseTimeUnit() {
    return TimeUnit.MILLISECONDS;
  }
}

// Register the custom registry as Spring beans
@Configuration
public class MetricsConfig {

  @Bean
  public CustomRegistryConfig customRegistryConfig() {
    return CustomRegistryConfig.DEFAULT;
  }

  @Bean
  public CustomMeterRegistry customMeterRegistry(CustomRegistryConfig customRegistryConfig, Clock clock) {
    return new CustomMeterRegistry(customRegistryConfig, clock);
  }
}

Meters

Micrometer supports many meter types, including Timer, Counter, Gauge, DistributionSummary, LongTaskTimer, FunctionCounter, FunctionTimer, and TimeGauge.

In Micrometer, a meter is uniquely identified by its name and dimensions (also known as "tags", the Tag type in the API). Dimensions make it convenient to slice a given metric and examine it at finer granularity.
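
For example (meter and tag names are illustrative), two counters sharing a name but differing in a tag value are two distinct meters, i.e. two time series that can later be aggregated or filtered by that tag:

Counter userCalls = registry.counter("database.calls", "db", "users");
Counter orderCalls = registry.counter("database.calls", "db", "orders"); // a separate time series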

Naming Meters

Each monitoring system has its own naming style, and naming rules are not always compatible between systems. Micrometer's convention is lowercase words separated by dots (.). The Micrometer implementation for each monitoring system converts this dot-separated style into the convention recommended by that system and strips any special characters the system prohibits.

// Micrometer naming convention
registry.timer("http.server.requests");

// Prometheus naming convention
registry.timer("http_server_requests_duration_seconds");

// Atlas naming convention
registry.timer("httpServerRequests");

// Graphite naming convention
registry.timer("http.server.requests");

// InfluxDB naming convention
registry.timer("http_server_requests");


Of course, we can override the default NamingConvention rules by implementing the NamingConvention interface:

registry.config().namingConvention(myCustomNamingConvention);
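
As a minimal sketch of such an implementation (the delegate choice is an assumption for illustration), a custom convention can reuse a built-in convention such as NamingConvention.snakeCase for names while leaving tag keys and values untouched:

NamingConvention myCustomNamingConvention = new NamingConvention() {
    @Override
    public String name(String name, Meter.Type type, String baseUnit) {
        // delegate name formatting to the built-in snake_case convention
        return NamingConvention.snakeCase.name(name, type, baseUnit);
    }
};
registry.config().namingConvention(myCustomNamingConvention);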

Tag Naming

For tag names, the recommendation is the same as for meter names: lowercase words separated by dots. This also helps the naming style translate cleanly to each monitoring system's recommended form.

The recommended approach:

registry.counter("database.calls"."db"."users")
registry.counter("http.requests"."uri"."/api/users")
Copy the code

This naming gives us enough context to make sense of the data even when we aggregate by name alone. For example, selecting database.calls yields the total number of calls across all databases; the db tag can then be used to drill down into a specific database.

The wrong approach:

registry.counter("calls"."class"."database"."db"."users");

registry.counter("calls"."class"."http"."uri"."/api/users");
Copy the code

Here, aggregating by the name calls alone mixes database accesses and HTTP calls into one figure. That number is meaningless for analyzing a production problem; you would have to select on the class tag to recover a useful dimension.

Common Tags

Common tags are registry-level tags applied to all metrics reported to the monitoring system. They are typically system-level attributes such as host, instance, region, and stack.

registry.config().commonTags("stack"."prod"."region"."us-east-1");
registry.config().commonTags(Arrays.asList(Tag.of("stack"."prod"), Tag.of("region"."us-east-1"))); // equivalently
Copy the code

Common tags must be configured on the registry before any meters are registered.

Tag Values

First, tag values cannot be empty.

In addition, tag values should be normalized to bound their cardinality. For example, for HTTP 404 responses you can record the uri tag uniformly as NOT_FOUND; otherwise the cardinality of the metric grows with every distinct unmatched URI, scattering data that should be aggregated.
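
A sketch of this kind of normalization using MeterFilter.replaceTagValues (the URI pattern is illustrative): collapse every per-user URI into one templated value so the uri tag stays bounded.

registry.config().meterFilter(
    MeterFilter.replaceTagValues("uri",
        // collapse /api/users/1, /api/users/2, ... into a single tag value
        uri -> uri.startsWith("/api/users/") ? "/api/users/{id}" : uri));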


Meter Filters

Meter filters control whether and how meters are registered and which kinds of statistics they publish. Filters can be configured on each registry.

Filters provide the following three basic functions:

  • Deny or accept meter registration.
  • Transform meter IDs (io.micrometer.core.instrument.Meter.Id): name, tags, and so on.
  • Configure distribution statistics for certain meter types.
registry.config()
    // Multiple filter configurations take effect in sequence
    .meterFilter(MeterFilter.ignoreTags("too.much.information"))
    .meterFilter(MeterFilter.denyNameStartsWith("jvm"));

Deny/accept meters

Used to accept only meters of a certain form, or to block certain meters entirely.

new MeterFilter() {
    @Override
    public MeterFilterReply accept(Meter.Id id) {
        if (id.getName().contains("test")) {
            return MeterFilterReply.DENY;
        }
        return MeterFilterReply.NEUTRAL;
    }
}

public enum MeterFilterReply {
    // The registry returns a NOOP version of the meter (e.g. NoopCounter, NoopTimer)
    DENY,

    // When no filter returns DENY, the meter registration proceeds
    NEUTRAL,

    // The meter is registered immediately, without "asking" the accept(...) methods of any further filters
    ACCEPT
}

MeterFilter provides convenience methods for common deny/accept policies:

  • accept(): accepts all meter registrations; any filter placed after this one has no effect.
  • accept(Predicate<Meter.Id>): accepts registration of meters matching the predicate.
  • acceptNameStartsWith(String): accepts meters whose names start with the given prefix.
  • deny(): denies all meter registrations; any filter placed after this one has no effect.
  • denyNameStartsWith(String): denies registration of meters whose names start with the given prefix.
  • deny(Predicate<Meter.Id>): denies registration of meters matching the predicate.
  • maximumAllowableMetrics(int): once the number of registered meters reaches the limit, all subsequent registrations are denied.
  • maximumAllowableTags(String meterNamePrefix, String tagKey, int maximumTagValues, MeterFilter onMaxReached): caps the number of distinct values for a tag; once exceeded, the onMaxReached filter is applied.
  • denyUnless(Predicate<Meter.Id>): a whitelist; denies registration of all meters that do not match the predicate.

Transforming meter IDs

new MeterFilter() {
    @Override
    public Meter.Id map(Meter.Id id) {
        if (id.getName().startsWith("test")) {
            return id.withName("extra." + id.getName()).withTag("extra.tag", "value");
        }
        return id;
    }
}

Common methods:

  • commonTags(Iterable<Tag>): Adds a set of common tags for all metrics. It is generally recommended that developers add common tags to the application name, host, region, and other information.
  • IgnoreTags (String)...: used from allmeterTo remove the specified tag key. For example, when we find that a tag has a high base and has put pressure on the monitoring system, we can use this method to quickly reduce the pressure on the system without changing all the detection points at once.
  • ReplaceTagValues (String tagKey, Function<String, String> replacement, String... exceptions): Replaces all tag values that meet the specified conditions. This way you can specify the base size of a tag.
  • renameTag(String meterNamePrefix, String fromTagKey, String toTagKey): renames all those named with the given prefixmetricThe tag of the key.

Configure distribution statistics

new MeterFilter() {
    @Override
    public DistributionStatisticConfig configure(Meter.Id id, DistributionStatisticConfig config) {
        // Publish percentile statistics for meters whose names begin with the given prefix
        if (id.getName().startsWith(prefix)) {
            return DistributionStatisticConfig.builder()
                    .publishPercentiles(0.9, 0.95)
                    .build()
                    .merge(config);
        }
        return config;
    }
};

Rate aggregation

Rate aggregation can be performed on the client before metric data is published, or on the server as part of a query. Micrometer adapts to whichever style each monitoring system expects.

Not every metric should be published or viewed as a rate. For example, gauge values and the number of active tasks in a long task timer are not rates.

Server-side Aggregation

Monitoring systems that perform rate calculation on the server expect the client to report absolute counts at every publish interval: for example, the cumulative count and total of all increments to a counter since application startup.

When the service restarts, the counter drops back to zero. Once the new service instance has started, the rate-aggregated graph returns to a value around 55.

The figure below shows a counter without rate aggregation. Such a counter is of limited use, because it only reflects the cumulative growth of the count over time, not the rate of events.

By contrast, as the charts above show, if you run zero-downtime deployments in production (for example, red/black deployments), you can monitor for service failures by alerting on a minimum threshold of the rate-aggregated curve, without worrying about the counter resetting when a service restarts.

Client side aggregation

In practice, two kinds of monitoring systems expect the client to perform rate aggregation before publishing metric data:

  • Systems that expect pre-aggregated data. In production we often make decisions based on the rates of service metrics, and shipping rates means the server has less computation to do at query time.
  • Systems whose query language offers little or no math for computing rates. For these systems, publishing pre-aggregated data is the only practical option.

Micrometer's Timer records count and totalTime separately. Suppose we configure a publish interval of 10s and then receive 20 requests that each take 100ms. Then, for the first interval:

  1. count = 10 seconds * (20 requests / 10 seconds) = 20 requests;
  2. totalTime = 10 seconds * (20 * 100 ms / 10 seconds) = 2 seconds.

count represents the service's throughput, while totalTime is the total time of all requests across the interval.

totalTime / count = 2 seconds / 20 requests = 0.1 seconds/request = 100 ms/request, the average latency across all requests.


Meter types

Counters

A counter reports a single count. The Counter interface allows incrementing by a fixed positive amount.

When building charts and alarms using counter, we are usually most interested in the rate at which events occur over a given time interval. For example, given a queue, we can use counter to measure how quickly items are written to and removed from the queue.

Normal rand = ...; // a random number generator

MeterRegistry registry = ...;
Counter counter = registry.counter("counter"); // (1)

Flux.interval(Duration.ofMillis(10))
    .doOnEach(d -> {
        if (rand.nextDouble() + 0.1 > 0) { // (2)
            counter.increment(); // (3)
        }
    })
    .blockLast();

// Alternatively, build the counter fluently:
Counter counter = Counter.builder("counter")
    .baseUnit("beans") // optional
    .description("a description of what this counter does") // optional
    .tags("region", "test") // optional
    .register(registry);

Gauges

Gauge is used to obtain the current value. Common application scenarios include counting the number of running threads in real time.

Gauges are useful for monitoring values that have natural upper bounds. They are not suited to counting, say, the number of requests to an application, since that grows without bound for the life of the service.

Never use Gauge to measure data that could have been counted using counter.

List<String> list = registry.gauge("listGauge", Collections.emptyList(), new ArrayList<>(), List::size); (1)
List<String> list2 = registry.gaugeCollectionSize("listSize2", Tags.empty(), new ArrayList<>()); (2)
Map<String, Integer> map = registry.gaugeMapSize("mapGauge", Tags.empty(), new HashMap<>());

// maintain a reference to myGauge
AtomicInteger myGauge = registry.gauge("numberGauge", new AtomicInteger(0));

// ... elsewhere you can update the value it holds using the object reference
myGauge.set(27);
myGauge.set(11);

Copy the code
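
For the thread-counting scenario mentioned above, a gauge can also be built fluently from a supplier; a small sketch (the meter name is illustrative):

Gauge.builder("threads.active", Thread::activeCount)
    .description("the number of active threads") // optional
    .register(registry);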

There is also a special kind of gauge, MultiGauge, which publishes a set of metrics at once.

// SELECT count(*) from job group by status WHERE job = 'dirty'
MultiGauge statuses = MultiGauge.builder("statuses")
    .tag("job", "dirty")
    .description("The number of widgets in various statuses")
    .baseUnit("widgets")
    .register(registry);

// ...

// run this periodically whenever you re-run your query
statuses.register(
    resultSet.stream()
        .map(result -> Row.of(Tags.of("status", result.getAsString("status")), result.getAsInt("count"))));

Timers

Timers measure short-duration event latencies and response frequency. All Timer implementations record both the total event response time and the count of events. A timer does not accept negative values, and recording a high volume of long-running events can overflow the accumulated total (exceeding Long.MAX_VALUE).

public interface Timer extends Meter {
    ...
    void record(long amount, TimeUnit unit);
    void record(Duration duration);
    double totalTime(TimeUnit unit);
}
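
Recording with a timer is then a single call; a short sketch (the meter name is illustrative):

Timer timer = registry.timer("my.operation");

// time a block of code directly
timer.record(() -> {
    // do something time-consuming
});

// or record an externally measured duration
timer.record(Duration.ofMillis(150));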

The maximum statistic in Timer's basic implementations (such as CumulativeTimer and StepTimer) is the maximum within a time window (TimeWindowMax). If no new value is recorded within the window, the maximum resets to zero as a new time window begins.

The time window size defaults to the registry's step size, but it can be set explicitly via DistributionStatisticConfig.Builder#expiry(...):

/**
 * @return The step size to use in computing windowed statistics like max. The default is 1 minute.
 * To get the most out of these statistics, align the step interval to be close to your scrape interval.
 */
default Duration step() {
    // PrometheusMeterRegistry has a default step size of one minute
    return getDuration(this, "step").orElse(Duration.ofMinutes(1));
}


// The window can instead be customized through DistributionStatisticConfig
public class DistributionStatisticConfig implements Mergeable<DistributionStatisticConfig> {
    public static final DistributionStatisticConfig DEFAULT = builder()
            .percentilesHistogram(false)
            .percentilePrecision(1)
            .minimumExpectedValue(1.0)
            .maximumExpectedValue(Double.POSITIVE_INFINITY)
            .expiry(Duration.ofMinutes(2))
            .bufferLength(3)
            .build();

    ...

    public Builder expiry(@Nullable Duration expiry) {
        config.expiry = expiry;
        return this;
    }
}

Timer.Sample

Timer.Sample can be used to time a method: record the start timestamp with a sample before the method runs, then stop the sample when the method completes.

Timer.Sample sample = Timer.start(registry);

// do stuff
Response response = ...;

sample.stop(registry.timer("my.timer", "response", response.status()));

@Timed

@Timed can be added to any method, including web endpoints, to enable method timing.

On its own, Micrometer's Spring Boot configuration does not recognize @Timed on arbitrary methods.

micrometer-core provides an AspectJ aspect that makes the @Timed annotation usable on any method via Spring AOP.

@Configuration
public class TimedConfiguration {
   @Bean
   public TimedAspect timedAspect(MeterRegistry registry) {
      return new TimedAspect(registry);
   }
}

@Service
public class ExampleService {

  @Timed
  public void sync() {
    // @Timed will record the execution time of this method,
    // from the start and until it exits normally or exceptionally.
    ...
  }

  @Async
  @Timed
  public CompletableFuture<?> async() {
    // @Timed will record the execution time of this method,
    // from the start and until the returned CompletableFuture
    // completes normally or exceptionally.
    return CompletableFuture.supplyAsync(...);
  }

}

Distribution Summaries

Distribution summaries record the distribution of events. They are structurally similar to timers, but record values that are not units of time; for example, a distribution summary can record the sizes of request payloads hitting the server.

A distribution summary can be created in the following ways:

DistributionSummary summary = registry.summary("response.size");

DistributionSummary summary = DistributionSummary
    .builder("response.size")
    .description("a description of what this summary does") // optional
    .baseUnit("bytes") // optional (1)
    .tags("region", "test") // optional
    .scale(100) // optional (2)
    .register(registry);
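
Recording an observation is then a single call, for example (the value is illustrative):

summary.record(24310); // the size in bytes of one response payload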

Long Task Timers

A long task timer is a special type of timer that lets you measure time while the observed task is still running. An ordinary timer only records the duration of a task once it is complete.

The long task timer collects the following statistics:

  • Number of active tasks;
  • Total duration of all active tasks;
  • Maximum duration in active tasks.

Unlike Timers, long task timers do not publish statistics about completed tasks.

Imagine a background process that periodically refreshes metadata in a database and normally finishes within minutes. If a service problem makes the refresh run much longer, a long task timer can record the total active refresh time.

@Timed(value = "aws.scrape", longTask = true)
@Scheduled(fixedDelay = 360000)
void scrapeResources(a) {
    // find instances, volumes, auto-scaling groups, etc...
}
Copy the code

If your framework does not support @Timed, you can create a long task timer directly:

LongTaskTimer scrapeTimer = registry.more().longTaskTimer("scrape");

void scrapeResources() {
    scrapeTimer.record(() -> {
        // find instances, volumes, auto-scaling groups, etc...
    });
}

Note also the alerting behavior: if we want an alert when a process exceeds a given threshold, a long task timer delivers the alert within the first reporting interval after the task crosses the threshold. With a regular timer, the alert cannot fire until the first reporting interval after the task ends, which may be much later.

Histograms

Timers and distribution summaries can collect data for inspecting percentiles of the distribution. There are two ways to obtain percentiles:

  • Percentile histograms: Micrometer accumulates all values into an underlying histogram and ships a predetermined set of buckets to the monitoring system. The monitoring system's query language is then responsible for computing percentiles from that histogram.

    Currently only Prometheus, Atlas, and Wavefront support histogram-based percentile approximations (via histogram_quantile, :percentile, and hs() respectively). If you use one of these systems, this approach is recommended, because the histograms can be aggregated across dimensions and percentiles derived from the aggregated histogram.

  • Client-side percentiles: Micrometer computes a percentile approximation for each meter ID and ships it to the monitoring system. This is not as flexible as percentile histograms because the approximations cannot be aggregated across dimensions.

    It nevertheless offers some insight into percentile distributions for monitoring systems that do not support server-side percentile calculation from histograms.

Timer.builder("my.timer")
   .publishPercentiles(0.5.0.95) // Used to set percentage values calculated in the application, not aggregated across dimensions
   .publishPercentileHistogram() / / (2)
   .serviceLevelObjectives(Duration.ofMillis(100)) / / (3)
   .minimumExpectedValue(Duration.ofMillis(1)) / / (4)
   .maximumExpectedValue(Duration.ofSeconds(10))
Copy the code

Connecting to Prometheus

Prometheus periodically pulls metric data from application instances, discovering them via service discovery, and provides its own query language with mathematical operations.

  1. To connect to Prometheus, first add the following Maven dependency:

    <dependency>
      <groupId>io.micrometer</groupId>
      <artifactId>micrometer-registry-prometheus</artifactId>
      <version>${micrometer.version}</version>
    </dependency>
  2. Create the Prometheus Registry and expose an HTTP endpoint from which Prometheus's scraper can pull data.

    PrometheusMeterRegistry prometheusRegistry = new PrometheusMeterRegistry(PrometheusConfig.DEFAULT);

    try {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/prometheus", httpExchange -> {
            String response = prometheusRegistry.scrape(); // (1)
            httpExchange.sendResponseHeaders(200, response.getBytes().length);
            try (OutputStream os = httpExchange.getResponseBody()) {
                os.write(response.getBytes());
            }
        });
        new Thread(server::start).start();
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
  3. Set the scrape data format. By default, PrometheusMeterRegistry's scrape() method returns Prometheus's default text format. Starting with Micrometer 1.7.0, you can also request the OpenMetrics format:

    String openMetricsScrape = registry.scrape(TextFormat.CONTENT_TYPE_OPENMETRICS_100);
  4. Graph the data. The metric data scraped by Prometheus can be displayed on a Grafana panel. The figure below shows a publicly available Grafana dashboard template (JVM Dashboard).


Using with Spring Boot

  1. Spring Boot Actuator provides dependency management and auto-configuration for Micrometer. Add the following dependencies:

    <dependency>
      <groupId>org.springframework.boot</groupId>
      <artifactId>spring-boot-starter-actuator</artifactId>
    </dependency>
    <dependency>
      <groupId>io.micrometer</groupId>
      <artifactId>micrometer-registry-prometheus</artifactId>
    </dependency>
    <dependency>
      <groupId>org.springframework</groupId>
      <artifactId>spring-aop</artifactId>
    </dependency>
    <dependency>
      <groupId>org.aspectj</groupId>
      <artifactId>aspectjweaver</artifactId>
    </dependency>
    

    Next, configure the registry through a MeterRegistryCustomizer, for example to set registry-level common tags before any meters are registered.

    @Configuration
    public class MicroMeterConfig {

        @Bean
        public MeterRegistryCustomizer<MeterRegistry> meterRegistryCustomizer() {
            return meterRegistry -> meterRegistry.config()
                    .commonTags(Collections.singletonList(Tag.of("application", "mf-micrometer-example")));
        }

        // Spring Boot cannot use @Timed on arbitrary methods directly; the TimedAspect aspect is required.
        @Bean
        public TimedAspect timedAspect(MeterRegistry registry) {
            return new TimedAspect(registry);
        }
    }

    @RequestMapping("health")
    @RestController
    public class MetricController {

        @Timed(percentiles = {0.5, 0.80, 0.90, 0.99, 0.999})
        @GetMapping("v1")
        public ApiResp health(String message) {
            try {
                Thread.sleep(new Random().nextInt(1000));
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            return ApiResp.ok(new JSONObject().fluentPut("message", message));
        }

        @GetMapping("v2")
        @Timed(percentiles = {0.5, 0.80, 0.90, 0.99, 0.999})
        public ApiResp ping() {
            return ApiResp.ok(new JSONObject().fluentPut("message", "OK"));
        }
    }
  2. By default, Spring Boot exposes an /actuator/prometheus endpoint for pulling service metric data. Endpoint data may contain sensitive application information, so you can limit what is exposed with the following properties (exclude takes precedence over include); a minimal example follows the table.

    Property                                      Default
    management.endpoints.jmx.exposure.exclude
    management.endpoints.jmx.exposure.include     *
    management.endpoints.web.exposure.exclude
    management.endpoints.web.exposure.include     health
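
    For example, a minimal application.yml that exposes the Prometheus endpoint alongside health (values assumed):

    management:
      endpoints:
        web:
          exposure:
            include: health,prometheus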
  3. Start the service and visit http://localhost:8800/actuator/prometheus to see the service's metric data:

  4. Configure Prometheus by adding the following to prometheus.yml.

    # my global config
    global:
      scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
      evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
      # scrape_timeout is set to the global default (10s).
    
    # Alertmanager configuration
    alerting:
      alertmanagers:
      - static_configs:
        - targets:
          # - alertmanager:9093
    
    # Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
    rule_files:
      # - "first_rules.yml"
      # - "second_rules.yml"
    
    # A scrape configuration containing exactly one endpoint to scrape:
    # Here it's Prometheus itself.
    scrape_configs:
      # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
      - job_name: 'mf-micrometer-example'
        scrape_interval: 5s
        metrics_path: '/actuator/prometheus'
        static_configs:
          - targets: ['127.0.0.1:8800']
            labels:
               instance: 'mf-example'

    Open the Prometheus console (http://localhost:9090); the Targets page shows the status of all clients currently scraped by Prometheus. You can also query metric data with query expressions in the Graph view:

  5. At this point, we have finished measuring and capturing the service metrics data. Finally, we need to graphically display the data captured by Prometheus, and here we use Grafana.

    1. The data source is created first. Grafana supports multiple data source access, and we chose Prometheus here.

    2. Create a dashboard. You can build your own or use a published template such as the 4701 template. Import the template and select the data source we just created. As you can see, the graphed metric data gives a very intuitive view of the service's call volume.

Other notes

Scraping metrics from batch jobs

Short-lived or batch jobs may not run long enough for Prometheus to scrape their metrics. For such jobs you can push data to Prometheus through the Prometheus Pushgateway (Spring Boot's PrometheusPushGatewayManager manages pushing data to the gateway).

<dependency>
    <groupId>io.prometheus</groupId>
    <artifactId>simpleclient_pushgateway</artifactId>
</dependency>

To use the Pushgateway, you must also set management.metrics.export.prometheus.pushgateway.enabled=true.
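
A minimal application.yml sketch (the base URL, push rate, and job name are assumed values):

management:
  metrics:
    export:
      prometheus:
        pushgateway:
          enabled: true
          base-url: http://localhost:9091
          push-rate: 30s
          job: mf-batch-job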

About @Timed

As mentioned above, according to the official documentation @Timed cannot be used directly in Spring Boot without the TimedAspect aspect. In practice, however, testing shows that for Spring MVC requests the endpoint's invocation time is recorded even without TimedAspect.

Reading the source shows why: Spring Boot Actuator has a WebMvcMetricsFilter class that intercepts each request and checks whether @Timed is present on the handler method or its class.

public class WebMvcMetricsFilter extends OncePerRequestFilter {

    private void record(WebMvcMetricsFilter.TimingContext timingContext, HttpServletRequest request, HttpServletResponse response, Throwable exception) {
      Object handler = this.getHandler(request);
      // Find the @Timed annotations on the handler class or method
      Set<Timed> annotations = this.getTimedAnnotations(handler);
      Sample timerSample = timingContext.getTimerSample();
      if (annotations.isEmpty()) {
        if (this.autoTimer.isEnabled()) {
          // No @Timed: build a Timer from the default configuration; here metricName = "http.server.requests"
          Builder builder = this.autoTimer.builder(this.metricName);
          timerSample.stop(this.getTimer(builder, handler, request, response, exception));
        }
      } else {
        Iterator var11 = annotations.iterator();

        while (var11.hasNext()) {
          Timed annotation = (Timed) var11.next();
          // @Timed present: build the Timer from the annotation; metricName = "http.server.requests"
          Builder builder = Timer.builder(annotation, this.metricName);
          timerSample.stop(this.getTimer(builder, handler, request, response, exception));
        }
      }
    }

    /**
     * Check whether the @Timed annotation exists on the method; if not, look on the class.
     */
    private Set<Timed> getTimedAnnotations(HandlerMethod handler) {
        Set<Timed> methodAnnotations = this.findTimedAnnotations(handler.getMethod());
        return !methodAnnotations.isEmpty() ? methodAnnotations : this.findTimedAnnotations(handler.getBeanType());
    }
}

Based on this analysis, we removed the TimedAspect bean and checked the metrics again; then, with TimedAspect added back, both method_timed_seconds and http_server_requests_seconds series are recorded at the same time. The durations the two report for the same endpoint differ slightly; judging from the execution path, the TimedAspect measurement is closer to the execution time of the method's own logic.

