Author | Yuan B, Alibaba Storage Service technical expert

This article looks at how logs output in Kubernetes are collected and analyzed in a unified way. In Kubernetes, log collection is very different from that on ordinary virtual machines: it is relatively harder to implement and more expensive to deploy, but used properly it is more automated and cheaper to operate and maintain than traditional approaches. This is the fourth article in the log series.

Article 1: 6 typical problems in building a K8s log system, how many have you encountered?

Article 2: Understanding the design and practice of a K8s log system

Article 3: 9 tips to solve logging problems in K8s

Difficulties of Kubernetes log collection

In Kubernetes, log collection is much more complex than on traditional virtual machines and physical machines. The most fundamental reason is that Kubernetes hides the underlying failures and provides finer-grained resource scheduling, delivering an environment that is stable but highly dynamic. Log collection therefore faces a richer and more changeable environment, with more points to consider.

Such as:

  • For Job applications that run for only a few seconds, how do you ensure that logs are collected in time and no data is lost?
  • K8s generally recommends large nodes, each running 10 to 100+ containers. How do you collect the logs of 100+ containers with the lowest possible resource consumption?
  • In K8s, applications are deployed declaratively with YAML, but log collection still mostly relies on manually written configuration files. How can log collection be deployed the K8s way?
|  | Kubernetes | Traditional (VM/physical) |
| --- | --- | --- |
| Log types | Files, stdout, host files, journal | Files, journal |
| Log sources | Business containers, system components, hosts | Business, hosts |
| Collection methods | Agent (Sidecar, DaemonSet), direct write (DockerEngine, business) | Agent, direct write |
| Applications per node | 10-100+ | 1-10 |
| Application dynamism | High | Low |
| Node dynamism | High | Low |
| Collection deployment | Manual, YAML | Manual, custom |

Collection mode: active or passive

Log collection methods are divided into passive collection and active push. In K8s, passive collection is generally divided into Sidecar and DaemonSet, while active push includes DockerEngine push and business direct write.

  • DockerEngine itself has a LogDriver feature: by configuring different LogDrivers, the container's stdout can be written to remote storage through DockerEngine, which achieves log collection. This approach has low customizability, flexibility, and resource isolation, and is generally not recommended for production environments.
  • In business direct write, the application integrates a log collection SDK and sends logs to the server through that SDK. This removes the logic of writing to disk and collecting from disk and requires no extra Agent, so its resource consumption on the system is the lowest; but because the business is strongly bound to the log SDK, overall flexibility is low.
  • DaemonSet mode runs only one logging agent on each node, which collects all logs on that node. A DaemonSet consumes far fewer resources, but its scalability and tenant isolation are limited; it is better suited to clusters with a single function or few businesses.
  • Sidecar mode deploys a separate log agent for each Pod, and this agent collects the logs of only one application. A Sidecar consumes more resources but offers strong flexibility and multi-tenant isolation; it is recommended for large K8s clusters or clusters that serve multiple business parties as a PaaS platform (a minimal Pod sketch follows this list).
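As a rough illustration of the Sidecar pattern, the sketch below shows an application container and a log-agent container sharing an emptyDir volume inside one Pod. The image names, paths, and volume layout are illustrative assumptions, not the exact Logtail sidecar deployment.

```yaml
# Hypothetical Sidecar layout: the app writes log files to a shared volume,
# and a per-Pod agent container reads and ships them.
apiVersion: v1
kind: Pod
metadata:
  name: app-with-log-sidecar
spec:
  containers:
    - name: app
      image: example.com/my-app:latest          # placeholder application image
      volumeMounts:
        - name: app-logs
          mountPath: /var/log/app               # the app writes its files here
    - name: log-agent
      image: example.com/log-agent:latest       # placeholder collection agent image
      volumeMounts:
        - name: app-logs
          mountPath: /var/log/app               # the agent reads the same files
          readOnly: true
  volumes:
    - name: app-logs
      emptyDir: {}                              # Pod-local volume shared by both containers
```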

To sum up:

  • DockerEngine direct write is not recommended;
  • Business direct write is recommended for scenarios with large log volumes;
  • DaemonSet is commonly used in small and medium clusters (a generic DaemonSet agent sketch follows this list);
  • Sidecar is recommended for very large clusters.
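For reference, a generic DaemonSet-style agent might look like the sketch below: one agent Pod per node reads container and host log files via hostPath mounts. This is a hedged, generic example, not the actual Logtail DaemonSet manifest; the image name, namespace, and paths are assumptions.

```yaml
# Generic node-level log agent: one Pod per node, reading container stdout
# JSON files and host logs from the node's filesystem.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-agent
  namespace: logging
spec:
  selector:
    matchLabels:
      app: log-agent
  template:
    metadata:
      labels:
        app: log-agent
    spec:
      containers:
        - name: agent
          image: example.com/log-agent:latest        # placeholder agent image
          volumeMounts:
            - name: docker-containers
              mountPath: /var/lib/docker/containers  # container stdout written by DockerEngine
              readOnly: true
            - name: varlog
              mountPath: /var/log                    # node-level logs
              readOnly: true
      volumes:
        - name: docker-containers
          hostPath:
            path: /var/lib/docker/containers
        - name: varlog
          hostPath:
            path: /var/log
```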

Detailed comparison of various collection methods is as follows:

|  | DockerEngine | Business direct write | DaemonSet | Sidecar |
| --- | --- | --- | --- | --- |
| Collected log types | Stdout | Business logs | Stdout + some files | Files |
| Deployment and operation | Low, natively supported | Low, only configuration files need maintaining | Generally low, a DaemonSet must be maintained | High, a sidecar container must be deployed for every Pod whose logs are collected |
| Log classification and storage | Not possible | Configured independently by each business | Generally possible, e.g. mapped by container/path | Each Pod can be configured individually, highly flexible |
| Multi-tenant isolation | Weak | Weak, direct-write logging competes with business logic for resources | General, isolation only between configurations | Strong, isolated by container, resources can be allocated separately |
| Supported cluster scale | No limit with local storage; single-point limit with syslog/Fluentd | Unlimited | Depends on the number of configurations | Unlimited |
| Resource usage | Low, provided by DockerEngine | Lowest overall, no collection overhead | Low, one agent container per node | High, one agent container per Pod |
| Query convenience | Low, only raw logs can be grepped | High, customizable per business characteristics | Higher, query and statistics can be customized | High, customizable per business characteristics |
| Customizability | Low | High, freely extensible | Low | High, each Pod configured separately |
| Coupling | High, strongly bound to DockerEngine; changes require a DockerEngine restart | High, modifying or upgrading the collection module requires redeploying the business | Low, the Agent can be upgraded independently | Generally low; by default upgrading the Agent restarts the Sidecar (some extension packages support hot upgrade) |
| Applicable scenarios | Test, POC, and other non-production scenarios | Scenarios with high performance requirements | Single-function clusters with clear log classification | Large, hybrid, PaaS clusters |

Log output: Stdout or file

Unlike virtual machines and physical machines, K8s containers offer both standard output and files for log output. With standard output, the container prints logs directly to stdout or stderr; DockerEngine takes over the stdout and stderr file descriptors and processes the logs according to the LogDriver rules configured for DockerEngine. Printing logs to a file is much the same as on a VM or physical server, except that the logs can be stored in different ways, for example the default storage, EmptyDir, HostVolume, or NFS.
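As a small example of the file-based approach, the sketch below mounts a volume at the application's log directory so the files live outside the container's writable layer; a hostPath volume is shown here, but EmptyDir or NFS can be substituted. Paths and the image name are placeholders.

```yaml
# File-based logging: the app writes e.g. access.log / error.log into a
# mounted volume instead of printing to stdout.
apiVersion: v1
kind: Pod
metadata:
  name: file-logging-app
spec:
  containers:
    - name: app
      image: example.com/my-app:latest    # placeholder image
      volumeMounts:
        - name: logs
          mountPath: /var/log/app         # application log directory
  volumes:
    - name: logs
      hostPath:
        path: /data/logs/file-logging-app # placeholder directory on the node
        type: DirectoryOrCreate
```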

Although printing logs to stdout is what Docker officially recommends, note that this recommendation assumes containers are used only as simple applications. In real business scenarios, we still suggest using files as much as possible, mainly for the following reasons:

  • Stdout performance: from the application writing to stdout to the log reaching the server, there are several steps (for example with the popular JSON LogDriver): stdout -> DockerEngine -> LogDriver -> serialize to JSON -> write to file -> Agent collects the file -> parse JSON -> upload to the server. The whole chain costs much more than collecting a file directly; in stress tests, outputting 100,000 log lines per second consumed an extra CPU core in DockerEngine.

  • Stdout does not support classification: all output is mixed into one stream and cannot be split into separate files. An application usually has an AccessLog, ErrorLog, InterfaceLog (logs of calls to external interfaces), TraceLog, and so on, each with a different format and purpose; if they are mixed in the same stream, they are hard to collect and analyze.

  • Stdout only captures the output of the container's main process; programs running as daemons or forked processes cannot use stdout.

  • File-based output supports a variety of policies, such as synchronous or asynchronous writes, cache sizes, file rotation, compression, and cleanup, and is therefore more flexible.

Therefore, we recommend that online applications write logs to files, and that stdout be used only for single-function applications or some K8s system/O&M components.

CICD integration: Logging Operator

Kubernetes provides a standardized way to deploy business applications: YAML (the K8s API) can declare routing rules, expose services, mount storage, run services, define scaling rules, and so on, so Kubernetes is easy to integrate with a CICD system. Log collection is an important part of operations monitoring, and all logs must be collected in real time once a service goes online.

The original approach was to manually deploy the log collection logic after publication, which required manual intervention and ran counter to CICD automation. To automate, some people started wrapping an auto-deployed service with an API/SDK based on log collection that would be invoked via CICD’s Webhook after publication, but this approach was expensive to develop.

In Kubernetes, the most standard way to integrate log collection is to register a new resource with the Kubernetes system and manage and maintain it through an Operator (CRD). With this approach the CICD system needs no extra development; you only add the log-related configuration when deploying to Kubernetes.

Kubernetes log collection solution

Long before Kubernetes appeared, we had started developing log collection solutions for container environments. As K8s became more and more stable, we began to migrate many businesses to the K8s platform, so we also developed a K8s log collection solution on top of the earlier work. Its main capabilities are:

  • Real-time collection of all kinds of data, including container files, container stdout, host files, journal, Events, and so on;
  • Multiple collection deployment modes, including DaemonSet, Sidecar, DockerEngine LogDriver, and so on;
  • Log data enrichment, including attaching Namespace, Pod, Container, Image, and Node information;
  • Stable and highly reliable, based on the Logtail collection agent developed at Alibaba, with millions of instances deployed across the network;
  • CRD-based extension, so log collection rules can be deployed the same way Kubernetes deployments are published, integrating seamlessly with CICD.

Install the log collection component

This collection solution is currently open to the public. We provide a Helm installation package that includes Logtail's DaemonSet, the CRD declaration for AliyunLogConfig, and the CRD Controller. Once installed, DaemonSet collection and CRD configuration can be used directly. The installation steps are as follows:

  1. For Alibaba Cloud Kubernetes clusters, the components can be selected at cluster creation time and are then installed automatically; if they were not installed at that point, they can be installed manually later.
  2. For self-built Kubernetes, whether built on Alibaba Cloud, on other clouds, or on-premises, the same collection solution can be used; for details, refer to the self-built Kubernetes installation guide.

After these components are installed, Logtail and the Controller run in the cluster, but they do not collect any logs by default; collection rules must be configured to collect the logs of specific Pods.

Collection rule configuration: Environment variable or CRD

Besides manual configuration in the log service console, two additional configuration methods are supported for Kubernetes: environment variables and CRDs.

  • Environment variables have been the configuration method since the Swarm era: you only declare the address of the data to be collected in the environment variables of the target container, and Logtail automatically collects the data and ships it to the server.

This method is simple to deploy and easy to learn, but it supports very few configuration rules; many advanced options (for example filtering, blacklists and whitelists, and so on) are not available. Moreover, this declarative method does not support modification or deletion: every change actually creates a new collection configuration, and the old configurations must be cleaned up by hand, otherwise resources are wasted.
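A minimal sketch of the environment-variable approach is shown below. The `aliyun_logs_*` variable convention is taken from Aliyun Log Service documentation as I understand it; the logstore names, paths, and image are illustrative assumptions, so verify the exact syntax against the current documentation.

```yaml
# Declare what to collect via container environment variables; Logtail on the
# node picks these up and creates the corresponding collection configurations.
apiVersion: v1
kind: Pod
metadata:
  name: env-var-logging-demo
spec:
  containers:
    - name: app
      image: example.com/my-app:latest       # placeholder image
      env:
        # collect this container's stdout into a logstore named "app-stdout"
        - name: aliyun_logs_app-stdout
          value: stdout
        # collect files matching this path into a logstore named "app-files"
        - name: aliyun_logs_app-files
          value: /var/log/app/*.log
```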

  • CRD configuration follows the standard extension mechanism recommended by Kubernetes: collection configurations are managed as K8s resources, and the data to be collected is declared by deploying AliyunLogConfig resources (a special CRD) to Kubernetes.

For example, the following configuration declares the collection of container standard output: it collects both stdout and stderr, and excludes containers whose environment variables contain COLLEXT_STDOUT_FLAG: false.
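A minimal sketch of what such an AliyunLogConfig declaration might look like is given below; the field names and layout follow Aliyun's documented examples as I recall them and should be verified against the current documentation.

```yaml
# Collect container stdout/stderr, excluding containers whose environment
# contains COLLEXT_STDOUT_FLAG: false (field layout is an assumption).
apiVersion: log.alibabacloud.com/v1alpha1
kind: AliyunLogConfig
metadata:
  name: stdout-collection
spec:
  logstore: stdout-logstore            # destination logstore
  logtailConfig:
    inputType: plugin
    configName: stdout-collection
    inputDetail:
      plugin:
        inputs:
          - type: service_docker_stdout
            detail:
              Stdout: true
              Stderr: true
              ExcludeEnv:
                COLLEXT_STDOUT_FLAG: "false"
```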

CRD-based configuration is managed through Kubernetes' standard extension resources, supports the complete add/delete/modify/query semantics for configurations, and supports various advanced options; it is the collection configuration method we recommend most.

Recommended collection rule configuration

In practice, DaemonSet alone or a combination of DaemonSet and Sidecar is generally used. The advantage of DaemonSet is high resource utilization, but all Logtail instances in the DaemonSet share a global configuration, and a single Logtail has an upper limit on the number of configurations it can hold, so it cannot serve clusters with a very large number of applications.

Our recommended configuration approach is built on the following core ideas:

  • Use one configuration to collect as much similar data as possible, reducing the number of configurations and the pressure on the DaemonSet;
  • Give core applications sufficient collection resources, using Sidecar mode where needed;
  • Use CRD configuration wherever possible;
  • In Sidecar mode each Logtail is configured separately, so there is no limit on the number of configurations, which suits very large clusters.

Practice 1 – Small to medium sized clusters

The vast majority of Kubernetes clusters are small or medium sized. There is no strict definition, but typically these clusters have fewer than 500 applications and fewer than 1000 nodes, and no dedicated Kubernetes platform operations team. Since the number of applications is not very large, a DaemonSet can hold all the collection configurations:

  • Data from most business applications is collected in DaemonSet mode;
  • Core applications (such as order/transaction systems) have their data collected separately in Sidecar mode.

Practice 2 – Large Clusters

Some large or very large clusters are used as PaaS platforms, generally with more than 1000 businesses and more than 1000 nodes, and with dedicated Kubernetes platform operations personnel. In this scenario there is no bound on the number of applications, so a DaemonSet cannot hold all the configurations and Sidecar mode must be used. The overall plan is as follows:

  • The system component and kernel logs of the Kubernetes platform are of relatively fixed types; they are collected by DaemonSet and mainly serve the platform's operations personnel.
  • The logs of each business are collected in Sidecar mode, and each business can independently set the destination address of its Sidecar collection, giving the business's DevOps personnel plenty of flexibility.

There’s an Ali team that needs you!

The Cloud Native Application Platform team invites Kubernetes/Container/Serverless/application delivery technology domain experts (P7-P8) to join us.

  • Technical requirements: Go/Rust/Java/C++, Linux, distributed system;

  • Work experience: P7 from 3 years, P8 from 5 years, depending on actual ability;

  • Location: China (Beijing/Hangzhou/Shenzhen); Overseas (San Francisco Bay Area/Seattle).

Resume: xining.zj AT alibaba-inc.com.
