Server-speaks-first protocols, protocol detection, and opaque ports in Linkerd 2.10

Protocol detection, as the name suggests, allows Linkerd to automatically detect the protocol used on a TCP connection. One of Linkerd’s design principles is “just works”, and protocol detection is an important part of how Linkerd achieves this goal.

What is protocol detection?

In short, protocol detection is the ability to determine the protocol used on a TCP connection by examining the traffic on the connection.

Linkerd uses protocol detection to avoid requiring the user to specify the protocol. Instead of asking the user to configure a protocol for each port, the Linkerd proxy simply performs protocol detection to answer that question itself.

Linkerd’s protocol detection works by peeking at the first few bytes the client sends on a connection to infer what kind of traffic the connection carries. This implementation has some consequences, which we describe below.
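As a rough illustration of peeking at the first client bytes, here is a minimal Python sketch. It is not Linkerd's actual detection code; the function name `sniff_protocol` and the set of protocols checked are assumptions made for this example. It relies only on well-known wire prefixes: HTTP/1 request methods, the HTTP/2 connection preface, and the TLS handshake record header.

```python
# Illustrative only: a simplified classifier for the first bytes of a connection.
HTTP1_METHODS = (
    b"GET ", b"HEAD ", b"POST ", b"PUT ", b"DELETE ",
    b"OPTIONS ", b"PATCH ", b"CONNECT ", b"TRACE ",
)
# Every HTTP/2 client connection begins with this fixed preface.
HTTP2_PREFACE = b"PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n"


def sniff_protocol(first_bytes: bytes) -> str:
    """Guess the protocol from the first bytes sent by the client."""
    if first_bytes.startswith(HTTP2_PREFACE):
        return "http/2"
    if first_bytes.startswith(HTTP1_METHODS):
        return "http/1"
    # A TLS handshake record starts with content type 0x16 and major version 0x03.
    if len(first_bytes) >= 2 and first_bytes[0] == 0x16 and first_bytes[1] == 0x03:
        return "tls"
    # Nothing recognizable: the connection can only be treated as raw TCP.
    return "unknown"


print(sniff_protocol(b"GET /index.html HTTP/1.1\r\nHost: example\r\n\r\n"))
```

Note that the sketch can only ever return something useful if the client sends bytes first, which is exactly the consequence discussed later in this article.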

But first, let’s answer the question of why Linkerd cares about protocols in the first place.

Observability, reliability, and security

We typically group Linkerd’s broad capabilities into three categories: observability, reliability, and security. Understanding the protocol used on a connection is fundamental to each category.

Observability

At the heart of Linkerd’s observability features is traffic instrumentation. This instrumentation requires knowledge of the protocol in use, because that knowledge is what makes rich metrics possible. For example, knowing that a connection is using HTTP allows Linkerd to parse requests, responses, and response codes, and to report metrics such as response latency, request volume, and error rate. These metrics are so valuable that they are part of what Google’s SRE book calls the “golden signals.” On the other hand, if Linkerd only knows that a connection is TCP, it can record only very basic information, such as the number of bytes read and written, because it has no way to interpret those bytes any further.

Security

Mutual TLS (mTLS) is a core feature of Linkerd. Starting with Linkerd 2.9, all TCP traffic between meshed endpoints is secured with mTLS by default by the Linkerd proxies. (There are some caveats; see skip-ports below.)

Here again, knowing the protocol of the connection matters. For example, if the connection is already using TLS (say, because the application set it up), there is no reason to wrap it in TLS a second time. (Strictly speaking, TLS is a transport-layer protocol, not an application-layer protocol like HTTP, but for the purposes of this article, the distinction between the two is not important.)

Reliability

Finally, understanding the protocols of the underlying connections allows Linkerd to provide sophisticated reliability capabilities. One example here is load balancing. Without knowing the connection protocol, Linkerd is limited to balancing connections: once it has established a TCP connection to the server, it cannot further manipulate that connection.

However, if Linkerd knows that the connection is HTTP, it can move from connection balancing to request balancing. Linkerd sets up a pool of connections across endpoints and balances requests over this pool. Because it now has access to requests and responses, Linkerd can be very sophisticated in how it balances requests; in fact, it balances them based on the recent performance of each possible endpoint (using a metric called the exponentially weighted moving average, or EWMA) to avoid tail latency from slow endpoints.

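(This is also why Linkerd is a simple solution for load balancing gRPC in Kubernetes: gRPC multiplexes requests over long-lived HTTP/2 connections, which connection balancing alone cannot spread across endpoints.)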
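The request-balancing idea can be sketched in a few lines of Python. This is a deliberately simplified illustration, not Linkerd's balancer, which is more elaborate; the class name `EwmaBalancer`, the smoothing factor `alpha`, and the endpoint addresses are all assumptions made for this example. Each endpoint keeps an exponentially weighted moving average of its observed latency, and each new request goes to the endpoint with the lowest average.

```python
class EwmaBalancer:
    """Toy request balancer: route each request to the lowest-EWMA endpoint."""

    def __init__(self, endpoints, alpha=0.3):
        # alpha controls how strongly recent samples outweigh older ones.
        self.alpha = alpha
        # Starting at 0.0 means endpoints with no samples yet are tried first.
        self.latency_ms = {endpoint: 0.0 for endpoint in endpoints}

    def record(self, endpoint, sample_ms):
        """Fold one observed response latency into the endpoint's average."""
        previous = self.latency_ms[endpoint]
        self.latency_ms[endpoint] = self.alpha * sample_ms + (1 - self.alpha) * previous

    def pick(self):
        """Return the endpoint whose recent latency is lowest."""
        return min(self.latency_ms, key=self.latency_ms.get)


balancer = EwmaBalancer(["10.0.0.1:8080", "10.0.0.2:8080"])
balancer.record("10.0.0.1:8080", 10.0)   # fast endpoint
balancer.record("10.0.0.2:8080", 200.0)  # slow endpoint
balancer.record("10.0.0.1:8080", 12.0)
print(balancer.pick())
```

None of this is possible at the TCP level: without parsing requests and responses, there are no per-request latencies to average.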

When protocol detection fails

While protocol detection is designed to let Linkerd “just work,” there is one class of cases where it cannot: the infamous server-speaks-first protocols. These protocols (including MySQL and SMTP) work by having the client establish a connection and then wait for the server to send the first bytes. This is perfectly legal behavior from TCP’s point of view, but it means Linkerd cannot detect the protocol, because the identifying information comes from the server, not the client.

(Why not simply use the server’s bytes to detect the protocol? Because at protocol detection time, Linkerd has not even established a connection to the server. Choosing which server to talk to is a function of the load balancer, and which load balancer to use is a function of the protocol. It is a delicious chicken-and-egg problem with a TCP flavor.)

To work around this, Linkerd introduced the skip-inbound-ports and skip-outbound-ports configuration options. These options instruct Linkerd to bypass the proxy entirely for certain ports by modifying the iptables rules that Linkerd uses to route a pod’s traffic through its sidecar proxy. For example, adding the annotation config.linkerd.io/skip-outbound-ports: 3306 to a workload’s PodSpec instructs Linkerd to create an iptables rule ensuring that the Linkerd proxy never handles any traffic to port 3306 (the MySQL port). Likewise, the annotation config.linkerd.io/skip-inbound-ports: 3306 writes an iptables rule so that the proxy never handles MySQL traffic sent to the pod.

The Skip Ports configuration

These options provide a way around protocol detection’s inability to handle server-speaks-first protocols. However, they have an obvious drawback: because they bypass the Linkerd proxy completely, Linkerd cannot apply mTLS or capture any metrics for these ports.
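To make the skip-ports annotations concrete, here is a small Python sketch that builds a patch document placing them on a workload's pod template. The two annotation names come from the text above; the helper name `skip_ports_patch` and the choice to express the result as a merge-style patch dictionary are assumptions made for this example, and how the patch gets applied to the cluster is left out.

```python
import json


def skip_ports_patch(inbound=(), outbound=()):
    """Build a patch that sets Linkerd's skip-ports annotations on a pod template."""
    annotations = {}
    if inbound:
        # Inbound traffic to these ports bypasses the proxy.
        annotations["config.linkerd.io/skip-inbound-ports"] = ",".join(str(p) for p in inbound)
    if outbound:
        # Outbound traffic to these ports bypasses the proxy.
        annotations["config.linkerd.io/skip-outbound-ports"] = ",".join(str(p) for p in outbound)
    # The annotations belong on the pod template, not on the workload object itself.
    return {"spec": {"template": {"metadata": {"annotations": annotations}}}}


# A client workload that talks to MySQL on port 3306.
patch = skip_ports_patch(outbound=[3306])
print(json.dumps(patch, indent=2))
```

Keep in mind the drawback described above: any port listed here gets neither mTLS nor metrics.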

Opaque ports and improved protocol detection in Linkerd 2.10

To address the shortcomings of skip-ports, Linkerd 2.10 will add the concept of opaque ports (and the corresponding opaque-ports annotation). An opaque port is one that Linkerd proxies without performing protocol detection. While this approach still requires configuration, marking a port as opaque allows Linkerd to apply mTLS and report TCP-level metrics, a big improvement over skipping the port entirely.

Opaque Ports configuration
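The difference between the three ways a port can be handled is summarized in the following Python sketch. It models only the behavior described in this article, not Linkerd's internals; the function name `port_handling` and the shape of the returned dictionary are assumptions made for this example.

```python
def port_handling(port, skip_ports=frozenset(), opaque_ports=frozenset()):
    """Describe what Linkerd can do for a port under each configuration."""
    if port in skip_ports:
        # iptables routes this traffic around the proxy: no mTLS, no metrics.
        return {"proxied": False, "mtls": False, "metrics": "none"}
    if port in opaque_ports:
        # Proxied as raw TCP without detection: mTLS and TCP-level metrics.
        return {"proxied": True, "mtls": True, "metrics": "tcp"}
    # Default: proxied with protocol detection, so HTTP metrics when HTTP is detected.
    return {"proxied": True, "mtls": True, "metrics": "http when detected, else tcp"}


MYSQL = 3306
print(port_handling(MYSQL, skip_ports={MYSQL}))
print(port_handling(MYSQL, opaque_ports={MYSQL}))
print(port_handling(8080))
```

The "mtls" field assumes both ends of the connection are meshed, as in the mTLS discussion earlier.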

Linkerd 2.10 will also improve the way protocol detection works by making it “fail open”: if the protocol detection code does not see any client bytes within 10 seconds, it will treat the connection as plain TCP and continue, rather than failing as in 2.9. This means that forgetting to annotate a server-speaks-first port with opaque-ports (or skip-ports) has a worst-case behavior of a 10-second connection delay, rather than a connection failure.
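The fail-open behavior can be sketched with a plain socket timeout. This is an illustration under stated assumptions rather than Linkerd's implementation: the function name `detect_or_fail_open` and the tiny HTTP check are invented for this example, and the demo uses a 50 ms timeout instead of the 10 seconds from the text so that it finishes quickly.

```python
import socket

DETECT_TIMEOUT_SECONDS = 10.0  # the timeout described in the text
HTTP_METHODS = {b"GET", b"HEAD", b"POST", b"PUT", b"DELETE", b"OPTIONS", b"PATCH"}


def detect_or_fail_open(conn, timeout=DETECT_TIMEOUT_SECONDS):
    """Peek at the client's first bytes; fall back to TCP if none arrive in time."""
    conn.settimeout(timeout)
    try:
        # MSG_PEEK leaves the bytes in the buffer so they can still be forwarded.
        first = conn.recv(16, socket.MSG_PEEK)
    except socket.timeout:
        # Fail open: the client stayed silent, so treat the connection as plain TCP.
        return "tcp"
    finally:
        conn.settimeout(None)
    method = first.split(b" ", 1)[0]
    return "http" if method in HTTP_METHODS else "tcp"


# Demo with an in-process socket pair standing in for client and proxy.
client, proxy_side = socket.socketpair()
silent_result = detect_or_fail_open(proxy_side, timeout=0.05)  # client sends nothing
client.sendall(b"GET / HTTP/1.1\r\n\r\n")
http_result = detect_or_fail_open(proxy_side, timeout=0.05)    # client speaks first
client.close()
proxy_side.close()
print(silent_result, http_result)
```

The silent case is what a server-speaks-first client looks like from the proxy's side: the connection still works, but only after the full timeout has elapsed.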

Conclusion

Protocol detection is one of Linkerd’s most powerful features and is the foundation of Linkerd’s “just works” principle. While protocol detection is not a panacea, the opaque-ports feature introduced in Linkerd 2.10 should address most of the shortcomings of the earlier skip-ports feature and allow Linkerd users to extend mTLS across their entire Kubernetes environment, regardless of protocol.

Refs

Protocol Detection and Opaque Ports in Linkerd