This article is adapted from the original post "Exploring Prometheus Go Client Metrics".

In this article, I’ll explore the Prometheus Go client metrics exposed by client_golang via promhttp.Handler(). These metrics will help you better understand how Go works.

Want to learn more about Prometheus? Take Monitoring Systems and Services with Prometheus, a great course to get you started.

Let’s start with a simple program that registers the Prometheus handler and listens on port 8080:


package main

import (
    "log"
    "net/http"

    "github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
    http.Handle("/metrics", promhttp.Handler())
    log.Fatal(http.ListenAndServe(":8080", nil))
}

When you request the /metrics endpoint, you will see something like the following:

# HELP go_gc_duration_seconds A summary of the GC invocation durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 3.5101e-05
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 6
...
process_open_fds 12
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 1.1272192e+07
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 4.74484736e+08

At initialization, client_golang registers two Prometheus collectors (a sketch of registering them explicitly follows the list below):

  • Process collector – Used to collect basic Linux process information, such as CPU, memory, file descriptor usage, and startup time.

  • Go collector – Used to collect information about the Go runtime, such as GC, goroutine, and OS thread counts.
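
If you want to make that registration explicit, here is a minimal sketch that registers the two collectors on a custom registry instead of the default one. It assumes the prometheus.NewProcessCollector and prometheus.NewGoCollector constructors from client_golang 0.9/1.x; newer releases expose equivalents through the collectors sub-package.

package main

import (
    "log"
    "net/http"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
    // A fresh registry, so only the collectors added below are exported.
    reg := prometheus.NewRegistry()
    reg.MustRegister(
        prometheus.NewProcessCollector(prometheus.ProcessCollectorOpts{}),
        prometheus.NewGoCollector(),
    )

    http.Handle("/metrics", promhttp.HandlerFor(reg, promhttp.HandlerOpts{}))
    log.Fatal(http.ListenAndServe(":8080", nil))
}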

Process collector

This collector reads the proc file system. The proc file system exposes the kernel’s internal data structures, which is how it obtains system information.

For example, the Prometheus client reads the /proc/pid/stat file and gets something like this:

1 (sh) S 0 1 1 34816 8 4194560 674 43 9 1 5 0 0 0 20 0 1 0 89724 1581056 209 18446744073709551615 94672542621696 94672543427732 140730737801568 0 0 0 2637828 65538 1 0 0 17 30 0 0 0 0 0 0 0 94672545527192 94672545542787 94672557428736 140730737807231 140730737807234 140730737807234 140730737807344 0

You can obtain a readable version of the above information by running cat /proc/pid/status.

process_cpu_seconds_total – This metric is computed from utime (the number of ticks the Go process has spent in user mode) and stime (the number of ticks it has spent in kernel mode, for example during system calls), both measured in jiffies; a jiffy is the time between two ticks of the system timer interrupt. process_cpu_seconds_total equals the sum of utime and stime divided by USER_HZ. This makes sense: dividing the program’s total number of ticks by the number of ticks per second gives the total time, in seconds, that the operating system has spent running the process.
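
As a rough illustration (not the actual client_golang code), here is a sketch that reads utime and stime from /proc/self/stat and turns them into CPU seconds. It assumes USER_HZ is 100, the usual value on Linux.

package main

import (
    "fmt"
    "os"
    "strconv"
    "strings"
)

func main() {
    const userHZ = 100 // assumed; the real collector determines the tick rate properly

    data, err := os.ReadFile("/proc/self/stat")
    if err != nil {
        panic(err)
    }

    // Field 2 (the command name) is wrapped in parentheses and may contain
    // spaces, so only split the part after the closing parenthesis.
    stat := string(data)
    rest := stat[strings.LastIndexByte(stat, ')')+2:]
    fields := strings.Fields(rest)

    // In this slice, index 0 corresponds to field 3 of the stat file,
    // so utime (field 14) and stime (field 15) sit at indexes 11 and 12.
    utime, _ := strconv.ParseFloat(fields[11], 64)
    stime, _ := strconv.ParseFloat(fields[12], 64)

    fmt.Printf("process_cpu_seconds_total %g\n", (utime+stime)/userHZ)
}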

process_virtual_memory_bytes – the Virtual Set Size (VSS). VSS is the total amount of virtual memory allocated to the process, including memory that is allocated but not used, shared memory, and memory that has been swapped out.

process_resident_memory_bytes – the Resident Set Size (RSS). RSS is the memory the process actually holds in RAM. It does not include memory that is allocated but unused or pages that have been swapped out, but it does include shared memory.

process_start_time_seconds – uses start_time, which describes when the process started, measured in jiffies and read from /proc/{PID}/stat. Dividing start_time by USER_HZ converts the value to seconds.

process_open_fds – obtained by counting the files in the /proc/{PID}/fd directory. It shows the total number of regular files, sockets, pseudo terminals, and so on currently open by the Go process.

process_max_fds – obtained by reading the “Max open files” line of the /proc/{PID}/limits file; the reported value is the soft limit. A soft limit is the value the kernel actually enforces for the resource, while the hard limit acts as a ceiling for the soft limit.

You can set the maximum number of open files in Go by calling syscall.Setrlimit(syscall.RLIMIT_NOFILE, &syscall.Rlimit{Cur: 9, Max: 10}).
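
A minimal sketch of reading and changing that limit from Go (Linux only); the numbers are arbitrary:

package main

import (
    "fmt"
    "syscall"
)

func main() {
    var limit syscall.Rlimit
    if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &limit); err != nil {
        panic(err)
    }
    fmt.Printf("soft (Cur) = %d, hard (Max) = %d\n", limit.Cur, limit.Max)

    // Lower the soft limit; the hard limit caps how far Cur may be raised.
    limit.Cur = 9
    if err := syscall.Setrlimit(syscall.RLIMIT_NOFILE, &limit); err != nil {
        panic(err)
    }
}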

Go collector

Most of the Go collector’s metrics come from the runtime and runtime/debug packages.

go_goroutines – obtained from the runtime.NumGoroutine() call, which computes the value from the scheduler structure sched and the global allglen variable. Because all fields of sched may change concurrently, the computed result is checked at the end; if it is less than 1, 1 is returned.

go_threads – obtained from the runtime.ThreadCreateProfile() call, which reads the global allm variable. If you don’t know what an M or a G is, you can read my blog post.
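
A small sketch of how those two counts can be obtained from the runtime package (roughly what the Go collector does, as I understand it):

package main

import (
    "fmt"
    "runtime"
)

func main() {
    fmt.Println("goroutines:", runtime.NumGoroutine())

    // Calling ThreadCreateProfile with a nil slice only returns the number
    // of records (threads created), without copying any stack records.
    n, _ := runtime.ThreadCreateProfile(nil)
    fmt.Println("threads created:", n)
}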

go_gc_duration_seconds – The data comes from a call to debug.ReadGCStats() with the PauseQuantiles field of the GCStats structure set to a slice of length 5, so the call returns the minimum, 25th, 50th, and 75th percentile, and maximum GC pause durations. The Prometheus Go client then builds a summary-type metric from the returned pause quantiles together with the NumGC and PauseTotal fields.
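
Here is a small sketch of reading those pause quantiles yourself with debug.ReadGCStats; the five-element PauseQuantiles slice mirrors what the client asks for:

package main

import (
    "fmt"
    "runtime/debug"
    "time"
)

func main() {
    stats := debug.GCStats{PauseQuantiles: make([]time.Duration, 5)}
    debug.ReadGCStats(&stats)

    // PauseQuantiles[0] is the minimum pause, [4] the maximum, and the
    // values in between are the 25th/50th/75th percentiles.
    fmt.Println("NumGC:", stats.NumGC)
    fmt.Println("PauseTotal:", stats.PauseTotal)
    for i, q := range stats.PauseQuantiles {
        fmt.Printf("quantile %d: %v\n", i, q)
    }
}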

go_info – This metric gives us the Go version. The data comes from runtime.Version().

Memory

The Go collector provides a number of metrics about memory and GC. All memory metrics come from runtime.ReadMemStats(), which fills in the fields of the MemStats structure for us.
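
For reference, a minimal sketch of pulling a few of those fields yourself; the comments map each field to the metric name it backs (my own mapping, based on the field names):

package main

import (
    "fmt"
    "runtime"
)

func main() {
    var m runtime.MemStats
    runtime.ReadMemStats(&m) // briefly stops the world while the stats are copied

    fmt.Println("HeapAlloc:", m.HeapAlloc) // go_memstats_heap_alloc_bytes
    fmt.Println("HeapInuse:", m.HeapInuse) // go_memstats_heap_inuse_bytes
    fmt.Println("HeapIdle: ", m.HeapIdle)  // go_memstats_heap_idle_bytes
    fmt.Println("Sys:      ", m.Sys)       // go_memstats_sys_bytes
    fmt.Println("NextGC:   ", m.NextGC)    // go_memstats_next_gc_bytes
}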

What worries me is that runtime.ReadMemStats() triggers a STW (stop-the-world) pause, so I wonder what that pause actually costs. During stop-the-world pauses, all goroutines are paused so that the GC can run. I may compare applications with and without the Prometheus Go client in a future article.

We have seen above that Linux provides RSS/VSS memory statistics, so it is natural to wonder whether we should use the metrics provided by MemStats or those based on RSS/VSS.

The advantage of using RSS and VSS is that they are based on Linux primitives and are programming-language independent. In theory you can instrument any program to learn how much memory it is consuming, and metric names such as process_virtual_memory_bytes and process_resident_memory_bytes in the Prometheus Go client stay consistent across languages.

But in practice, a Go process grabs a lot of virtual memory up front when it starts: a simple program like the one above takes about 544 MiB of VSS on my machine (x86_64, Ubuntu), which is a bit confusing, while its RSS is around 7 MiB, much closer to actual usage.

Using metrics based on the Go runtime can provide more fine-grained information about what is happening in the running application. This makes it easier to find out if your program has a memory leak, how long GC is taking, and how much memory has been reclaimed. In addition, it points you in the right direction when you optimize your program’s memory allocation.

I didn’t go into the details of how Go’s GC and memory model work; they are part of the concurrency model, and this part is still new to me. Let’s take a look at these metrics:

go_memstats_alloc_bytes – This metric shows how many bytes of memory are allocated for heap objects. The value is the same as go_memstats_heap_alloc_bytes. It includes the memory occupied by all reachable heap objects as well as unreachable objects that the GC has not yet freed.

go_memstats_alloc_bytes_total – This metric increases as objects are allocated on the heap, but does not decrease when they are freed. I find it useful because it only ever increases, like a Prometheus counter, so we can use rate() on it to get the memory allocation rate.

go_memstats_sys_bytes – This metric shows how many bytes of memory Go has obtained from the operating system.

go_memstats_lookups_total – A counter of how many pointer dereferences (lookups) have been performed. We can use the rate() function to compute the pointer dereference rate.

go_memstats_mallocs_total – A counter of how many heap objects have been allocated. We can use the rate() function to compute the heap object allocation rate.

go_memstats_frees_total – A counter of how many heap objects have been freed. We can use the rate() function to compute the heap object release rate. We can get the number of live heap objects with go_memstats_mallocs_total - go_memstats_frees_total.
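
A quick sketch of that arithmetic using the underlying MemStats fields (Mallocs, Frees, and HeapObjects, which should agree):

package main

import (
    "fmt"
    "runtime"
)

func main() {
    var m runtime.MemStats
    runtime.ReadMemStats(&m)

    live := m.Mallocs - m.Frees // mallocs_total - frees_total
    fmt.Println("mallocs:", m.Mallocs)
    fmt.Println("frees:", m.Frees)
    fmt.Println("live heap objects:", live)
    fmt.Println("HeapObjects:", m.HeapObjects) // reported directly by the runtime
}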

Go manages memory in units of spans, which are contiguous regions of memory 8 KiB or larger. There are three types of spans:

  1. Free span – a span that holds no objects; it can be released back to the operating system, reused for heap allocations, or reused for stack memory.

  2. In-use span – a span that holds at least one heap object and may have room for more.

  3. Stack span – a span used for a goroutine’s stack. A span can be used either for the heap or for stacks, but never for both at the same time.

Heap memory metrics

go_memstats_heap_alloc_bytes – The same as the go_memstats_alloc_bytes metric described above.

go_memstats_heap_sys_bytes – This metric shows how many bytes of heap memory have been obtained from the operating system. It includes virtual address space that has been reserved but not yet used.

go_memstats_heap_idle_bytes – Shows how many bytes of memory are occupied by idle (unused) spans.

go_memstats_heap_idle_bytes minus go_memstats_heap_released_bytes estimates how much memory could be returned to the OS but is still retained by the Go runtime, so that it can quickly be reused to allocate objects on the heap without asking the OS for more memory.

go_memstats_heap_inuse_bytes – Shows how many bytes are occupied by in-use spans.

You can estimate how much of the in-use heap memory is not yet occupied by objects by subtracting go_memstats_heap_alloc_bytes from go_memstats_heap_inuse_bytes.

go_memstats_heap_released_bytes – Shows how many bytes of free spans have been returned to the OS.

go_memstats_heap_objects – Shows how many objects are allocated on the heap. The value changes as the GC runs and as new objects are allocated.

Stack memory metrics

go_memstats_stack_inuse_bytes – Shows how many bytes are in stack spans that have at least one stack allocated in them.

go_memstats_stack_sys_bytes – Shows how many bytes of stack memory have been obtained from the OS. It is go_memstats_stack_inuse_bytes plus the memory used for OS thread stacks.

There is no go_memstats_stack_idle_bytes in the Prometheus Go client, because unused stack spans are counted as part of go_memstats_heap_idle_bytes.

Off-heap memory metrics

The off-heap memory metrics show how much memory is allocated for Go runtime internal structures. These structures are not allocated on the heap, because they are what implements the heap.

go_memstats_mspan_inuse_bytes – Shows how many bytes are in use by mspan structures.

go_memstats_mspan_sys_bytes – Shows how many bytes have been obtained from the operating system for mspan structures.

go_memstats_mcache_inuse_bytes – Shows how many bytes are in use by mcache structures.

go_memstats_mcache_sys_bytes – Shows how many bytes have been obtained from the operating system for mcache structures.

go_memstats_buck_hash_sys_bytes – Shows how much memory is used by the hash tables that back profiling.

go_memstats_gc_sys_bytes – Shows how much memory is used for garbage collection metadata.

go_memstats_other_sys_bytes – Shows how much memory is used for other runtime allocations.

go_memstats_next_gc_bytes – Shows the target heap size for the next GC cycle. The GC’s goal is to keep go_memstats_heap_alloc_bytes below this value.

go_memstats_last_gc_time_seconds – The timestamp at which the last GC finished.

go_memstats_gc_cpu_fraction – Shows the fraction of the program’s available CPU time that has been spent on GC since the program started. You can also see this value when running with the environment variable GODEBUG=gctrace=1.

Analysis based on data

The Prometheus Go client provides a lot of metrics, and I think the best way to get familiar with them is to use them. So let’s run the same program from the beginning of this article and look at the data from the /metrics endpoint; part of it is shown below:

process_resident_memory_bytes 1.09568e+07
process_virtual_memory_bytes 6.46668288e+08
go_memstats_heap_alloc_bytes 2.24344e+06
go_memstats_heap_idle_bytes 6.3643648e+07
go_memstats_heap_inuse_bytes 3.039232e+06
go_memstats_heap_objects 6498
go_memstats_heap_released_bytes 0
go_memstats_heap_sys_bytes 6.668288e+07
go_memstats_lookups_total 0
go_memstats_frees_total 12209
go_memstats_mallocs_total 18707
go_memstats_buck_hash_sys_bytes 1.443899e+06
go_memstats_mcache_inuse_bytes 6912
go_memstats_mcache_sys_bytes 16384
go_memstats_mspan_inuse_bytes 25840
go_memstats_mspan_sys_bytes 32768
go_memstats_other_sys_bytes 1.310909e+06
go_memstats_stack_inuse_bytes 425984
go_memstats_stack_sys_bytes 425984
go_memstats_sys_bytes 7.2284408e+07
go_memstats_next_gc_bytes 4.194304e+06
go_memstats_gc_cpu_fraction 1.421928536233557e-06
go_memstats_gc_sys_bytes 2.371584e+06
go_memstats_last_gc_time_seconds 1.5235057190167596e+09

Converting the numbers above into more readable units gives:

RSS = 1.09568e+07 = 10956800 bytes = 10700 KiB = 10.4 MiB
VSS = 6.46668288e+08 = 646668288 bytes = 631512 KiB = 616.7 MiB
heap_alloc_bytes = 2.24344e+06 = 2243440 = 2190 KiB = 2.1 MiB
heap_inuse_bytes = 3.039232e+06 = 3039232 = 2968 KiB = 2.9 MiB
heap_idle_bytes = 6.3643648e+07 = 63643648 = 62152 KiB = 60.6 MiB
heap_released_bytes = 0
heap_sys_bytes = 6.668288e+07 = 66682880 = 65120 KiB = 63.6 MiB
frees_total = 12209
mallocs_total = 18707
mspan_inuse_bytes = 25840 = 25.2 KiB
mspan_sys_bytes = 32768 = 32 KiB
mcache_inuse_bytes = 6912 = 6.8 KiB
mcache_sys_bytes = 16384 = 16 KiB
buck_hash_sys_bytes = 1.443899e+06 = 1443899 = 1410 KiB = 1.4 MiB
gc_sys_bytes = 2.371584e+06 = 2371584 = 2316 KiB = 2.3 MiB
other_sys_bytes = 1.310909e+06 = 1310909 = 1280.2 KiB = 1.3 MiB
stack_inuse_bytes = 425984 = 416 KiB
stack_sys_bytes = 425984 = 416 KiB
sys_bytes = 7.2284408e+07 = 72284408 = 70590.2 KiB = 68.9 MiB
next_gc_bytes = 4.194304e+06 = 4194304 = 4096 KiB = 4 MiB
gc_cpu_fraction = 1.421928536233557e-06 = 0.000001

The interesting thing is that heap_inuse_bytes is larger than heap_alloc_bytes. My understanding is that heap_alloc_bytes counts the bytes occupied by objects, while heap_inuse_bytes counts the bytes occupied by in-use spans. Dividing heap_inuse_bytes by the span size gives the number of in-use spans: 3039232 / 8192 = 371 spans.

heap_inuse_bytes minus heap_alloc_bytes shows how much space is still available in the in-use spans: 2.9 MiB - 2.1 MiB = 0.8 MiB. This means we can allocate roughly 0.8 MiB of objects on the heap without requesting new spans. But beware of memory fragmentation: if you create a 10 KB slice and no span has 10 KB of contiguous free space, the runtime has to create new spans rather than reuse the existing ones.

Subtracting heap_released_bytes from heap_idle_bytes shows that we hold approximately 60.6 MiB of unused spans that were obtained from the operating system and could be returned to it. That is 63643648 / 8192 = 7769 spans.

heap_sys_bytes is 63.6 MiB; this is the largest the heap has grown to, which corresponds to 66682880 / 8192 = 8140 spans.

mallocs_total shows that we allocated 18,707 objects and freed 12,209 (go_memstats_frees_total), so 18,707 - 12,209 = 6,498 objects are currently live. Dividing heap_alloc_bytes by 6,498 gives an average object size of 2243440 / 6498 = 345.3 bytes.

sys_bytes should be the sum of all the *sys metrics, that is:

sys_bytes == mspan_sys_bytes + mcache_sys_bytes + buck_hash_sys_bytes + gc_sys_bytes + other_sys_bytes + stack_sys_bytes + heap_sys_bytes

Verifying with the numbers above: 72284408 == 32768 + 16384 + 1443899 + 2371584 + 1310909 + 425984 + 66682880, which gives 72284408 == 72284408, a perfect match.
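
You can do the same check in code against the MemStats fields themselves; a small sketch (my own verification, not part of the client):

package main

import (
    "fmt"
    "runtime"
)

func main() {
    var m runtime.MemStats
    runtime.ReadMemStats(&m)

    // Sys is documented as the sum of the *Sys fields below.
    sum := m.HeapSys + m.StackSys + m.MSpanSys + m.MCacheSys +
        m.BuckHashSys + m.GCSys + m.OtherSys

    fmt.Println("Sys:        ", m.Sys)
    fmt.Println("sum of *Sys:", sum)
}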

One interesting detail about sys_bytes is that it is 68.9 MiB, while VSS is 616.7 MiB and RSS is 10.4 MiB, so the numbers don’t match. As I understand it, part of our memory may live in OS memory pages that are in swap or backed by files (not in RAM), which explains why RSS can be smaller than sys_bytes. And VSS contains many other things, such as the mapped libc and pthread libraries; you can see what is mapped in the /proc/{PID}/maps and /proc/{PID}/smaps files.

gc_cpu_fraction is very low: only 0.000001 of the CPU time has been spent on GC so far. That’s really cool.

next_gc_bytes shows that the GC’s target is to keep heap_alloc_bytes below 4 MiB; since heap_alloc_bytes is currently 2.1 MiB, the target is being met.

Further reading

  • tldp.org/LDP/Linux-F…
  • www.kernel.org/doc/Documen…
  • man7.org/linux/man-p…
  • linux.die.net/man/2/setrl…
  • golang.org/ref/mem