Not long after I started working at Splunk, a colleague reached out to me on Slack with a question about an earlier post of mine on Kubernetes metrics.

His question was about which "memory usage" metric the OOMKiller looks at to decide whether a container should be killed. The claim I made in that article was:

You might think it would be easy to track memory utilization with container_memory_usage_bytes, but this metric also includes cached data (think filesystem cache) that can be evicted under memory pressure. The better metric is container_memory_working_set_bytes, because that is what the OOMKiller watches.

Since this is the core claim of that article, I decided I needed to simulate the behavior and see for myself what the OOMKiller is actually watching.

I wrote a small program that keeps allocating memory until the OOMKiller steps in and kills the container in the pod.

package main

import (
 "net/http"
 "time"

 "github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
 memoryTicker := time.NewTicker(time.Millisecond * 5)
 leak := make(map[int][]byte)
 i := 0

 // Allocate 1 KiB every 5 ms; the map keeps every allocation reachable, so heap memory grows without bound.
 go func() {
  for range memoryTicker.C {
   leak[i] = make([]byte, 1024)
   i++
  }
 }()
 
 http.Handle("/metrics", promhttp.Handler())
 http.ListenAndServe(":8081", nil)
}

Deploying this in Minikube with both the memory request and the memory limit set to 128MB, we can see that container_memory_usage_bytes and container_memory_working_set_bytes track each other almost 1:1. When they reach the limit set on the container, the OOMKiller kills the container and the process restarts.
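For reference, here is a minimal sketch of the kind of Deployment used for this test. The name, labels, and image are assumptions on my part; the resource values match the 128MB request and limit described above.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: memory-leaker
spec:
  replicas: 1
  selector:
    matchLabels:
      app: memory-leaker
  template:
    metadata:
      labels:
        app: memory-leaker
    spec:
      containers:
        - name: memory-leaker
          image: example/memory-leaker:latest  # hypothetical image built from the program above
          ports:
            - containerPort: 8081  # the /metrics endpoint
          resources:
            requests:
              memory: 128Mi
            limits:
              memory: 128Mi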

Since container_memory_usage_bytes also includes the filesystem cache used by the process, I updated the program to also write data directly to a file on the filesystem.

package main

import (
 "net/http"
 "os"
 "time"

 "github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {

 memoryTicker := time.NewTicker(time.Millisecond * 5)
 leak := make(map[int][]byte)
 i := 0

 go func() {
  for range memoryTicker.C {
   leak[i] = make([]byte, 1024)
   i++
  }
 }()

 // Write 1 KiB to /tmp/file every 5 ms; the written pages accumulate in the filesystem (page) cache.
 fileTicker := time.NewTicker(time.Millisecond * 5)
 go func() {
  f, err := os.Create("/tmp/file")
  if err != nil {
   panic(err)
  }
  defer f.Close()
  buffer := make([]byte, 1024)

  for range fileTicker.C {
   f.Write(buffer)
   f.Sync()
  }
 }()

 http.Handle("/metrics", promhttp.Handler())
 http.ListenAndServe(":8081", nil)
}

With filesystem caching now in play, container_memory_usage_bytes and container_memory_working_set_bytes begin to diverge.

The interesting thing is that the container is still not allowed to use more memory than its limit, but the OOMKiller only kills the container once container_memory_working_set_bytes reaches that limit.

Another interesting aspect of this behavior is that container_memory_usage_bytes plateaus at the container's memory limit, even though data continues to be written to disk.

If we also look at container_memory_cache, we see that the amount of cache in use keeps growing until container_memory_usage_bytes reaches the limit, and then begins to fall.

From this experiment we can see that container_memory_usage_bytes does include filesystem pages that are being cached, and that the OOMKiller tracks container_memory_working_set_bytes. This makes sense: cached pages from a shared filesystem can be evicted from memory at any time, and it would make no sense to kill a process just for using disk I/O.
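To make that relationship concrete, here is a minimal sketch of how the working set can be derived inside a container. It assumes cgroup v1 paths are mounted at /sys/fs/cgroup/memory; to my understanding, this mirrors how cAdvisor computes container_memory_working_set_bytes, as usage minus the evictable inactive file cache, floored at zero.

package main

import (
 "bufio"
 "fmt"
 "os"
 "strconv"
 "strings"
)

// readUint reads a single integer value from a cgroup file.
func readUint(path string) uint64 {
 data, err := os.ReadFile(path)
 if err != nil {
  panic(err)
 }
 v, err := strconv.ParseUint(strings.TrimSpace(string(data)), 10, 64)
 if err != nil {
  panic(err)
 }
 return v
}

func main() {
 // memory.usage_in_bytes is what backs container_memory_usage_bytes.
 usage := readUint("/sys/fs/cgroup/memory/memory.usage_in_bytes")

 // total_inactive_file in memory.stat is the evictable page cache.
 f, err := os.Open("/sys/fs/cgroup/memory/memory.stat")
 if err != nil {
  panic(err)
 }
 defer f.Close()

 var inactiveFile uint64
 scanner := bufio.NewScanner(f)
 for scanner.Scan() {
  fields := strings.Fields(scanner.Text())
  if len(fields) == 2 && fields[0] == "total_inactive_file" {
   inactiveFile, _ = strconv.ParseUint(fields[1], 10, 64)
  }
 }

 // Working set = usage minus the cache that can be evicted, floored at zero.
 workingSet := uint64(0)
 if usage > inactiveFile {
  workingSet = usage - inactiveFile
 }

 fmt.Printf("usage=%d inactive_file=%d working_set=%d\n", usage, inactiveFile, workingSet)
}

Running something like this inside the leaking container should show the working set staying close to usage at first, and then falling below it once the file writes start filling the page cache.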

