Preface

I haven’t written or shared technical documents for a long time. Over the past two years my work has ranged from IaC exploration to building an SRE system to API gateway design, and I have seen the whole life cycle of a cloud platform, from preparation through deployment to operation. Recently I was responsible for building the API gateway in our product, and when it came to cluster management there was no way around well-known implementations such as KubeSphere and Rancher.

Analyses of Rancher’s source code are scarce online, and the few that exist are mostly superficial or outdated. It makes one wonder: is cluster management so simple that the experts don’t consider it worth analyzing?

When I read the Rancher code with that skepticism, the overall structure held no surprises. But as the work progressed and the API gateway was actually built, I came to feel the gap between an open source implementation and an enterprise application. An open source implementation may cover only basic CRUD. In enterprise use, most of the workload lies elsewhere: building the account and permission systems, setting up operational and business workflows, onboarding third-party teams, and defining the product’s SLAs/SLOs. It is hard to get satisfactory results just by stacking open source software. In building the whole technical system, methodology and architecture design become the stumbling blocks. If the designer has not thought the design through, the code ends up being torn down and rebuilt again and again.

With these thoughts in mind, I went back to the Rancher project and wrote this series: Let’s Unbox Rancher.

This series may run to many installments, covering infrastructure integration, cluster management, tunnel management, the control plane and management plane, authentication and authorization, API construction and so on. It also serves as my working notes, and I will write down lessons learned when I have time.

The Rancher architecture

First, look at the architecture of Rancher itself.

The important parts are:

  • Authentication and authorization: what are the specific methods and principles?
  • User management: what are the specific methods and principles?
  • Cluster Controller/Agent: what are their functions and business processes?
  • Communication tunnel: how is it established, and how does communication work?

As for what the Rancher API offers and what it can do, that is too detailed to cover here; readers can refer to the Rancher V3 API.

When reading the source code, I assume the reader wants to manage an existing K8s cluster, not create one through a public cloud’s K8s service or through RKE. Cluster provisioning involves IaaS and is beyond Rancher’s core scope: even though Rancher supports provisioning K8s clusters, in that role it is little more than an interface that calls the real K8s service.

Here are short answers to these questions; with them in hand, we’ll dive into Rancher’s code.

  • Authentication: the agent reports a service account token, and the server uses K8s impersonation to execute requests on behalf of the managed account
  • Authorization: the server exposes Rancher’s own RBAC management system and converts it into K8s RBAC rules that are distributed to the back-end cluster
  • User management: integrates directly with the IdP and creates internal accounts
  • Cluster Controller: watches for changes to back-end cluster resources and performs reconciliation
  • Cluster Agent: registers clusters, reports information, and forwards requests
  • Tunnel: establishes a duplex tunnel based on WebSocket
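
To make the impersonation answer concrete, here is a minimal sketch in Go. It is not Rancher’s code: the function name impersonatedRequest, the URL and the token value are made up for illustration. The only parts taken from Kubernetes itself are the Impersonate-User and Impersonate-Group request headers, which the K8s API server honors when the authenticated caller (here, the service account) is allowed to impersonate.

```go
package main

import (
	"fmt"
	"net/http"
)

// impersonatedRequest builds a GET request for a downstream K8s API server.
// The bearer token authenticates the caller itself (for example the service
// account token reported by the agent). The Impersonate-* headers name the
// end user on whose behalf the request should be authorized.
// Hypothetical helper for illustration only.
func impersonatedRequest(apiURL, saToken, user string, groups []string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, apiURL, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+saToken)
	req.Header.Set("Impersonate-User", user)
	for _, g := range groups {
		// One header value per group.
		req.Header.Add("Impersonate-Group", g)
	}
	return req, nil
}

func main() {
	req, err := impersonatedRequest(
		"https://cluster.example/api/v1/namespaces", // placeholder URL
		"sa-token",                                   // placeholder token
		"u-abc123",
		[]string{"system:authenticated"},
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Header.Get("Impersonate-User"))
}
```

The point of the design is that the back-end cluster only needs to trust one credential, while K8s RBAC is still evaluated against the impersonated user and groups.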

The code structure

This series is based on Rancher V2.5.9 and covers only the Rancher server, not the front end. Once you open the project, the code can be roughly divided into:

  • Main functions of several subprojects: main.go in the root directory is Rancher’s main function, and cmd contains the main functions of Rancher Agent and Rancherd
  • API definitions & code-gen: pkg/api contains the web server framework, pkg/apis contains the CRD definitions, and pkg/codegen contains the code generator
  • K8s operators: pkg/controllers
  • Tunnel module: scattered across multiple packages, such as pkg/dialer, pkg/httpproxy, pkg/k8slookup, pkg/peermanager, pkg/tunnelserver and pkg/clusterrouter. Tunnel management is not fully decoupled from cluster management in Rancher, partly because the RemoteDialer package does not manage clusters itself.
  • Cluster management module: see pkg/multiclustermanager

As you can see, Rancher’s code structure sorts the sub-components into rough categories, but the code is still confusing and not cleanly modularized.

Writing plan

Next, let’s take Rancher apart. Rancher’s capabilities can be summed up in three main parts:

  • Cluster management
  • User management
  • Request forwarding

The important submodules of each part are:

Cluster management:

  • controllers
  • cluster agent
  • multi-cluster manager

Request forwarding:

  • Authentication module
  • Authorization module
  • Tunnel module
  • Request forwarding module

User management:

  • Account management
  • IdP integration
  • Token management
  • Permission management

Among these modules, the most basic and independent are the tunnel module and the cluster management module. Once you understand how Rancher learns that a cluster exists and how it forwards requests from the Rancher API server to the back-end K8s cluster, you have a picture of Rancher’s general working model. The rest is just business logic built on that model.
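
The core idea behind this working model, that the agent dials out and the server then sends requests back over the same connection, can be sketched in a few lines of Go. This is a deliberately simplified illustration and not Rancher’s implementation: it uses a plain TCP connection with a line-based protocol instead of WebSocket, and the names runAgent, roundTrip and demo, the cluster name c-demo and the request string are all hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"strings"
)

// runAgent dials OUT to the server, as an agent inside a private network
// would, registers its cluster name, then answers requests that the server
// pushes back over the same connection.
func runAgent(serverAddr, clusterName string) error {
	conn, err := net.Dial("tcp", serverAddr)
	if err != nil {
		return err
	}
	defer conn.Close()
	// Registration: the first line identifies the cluster.
	if _, err := fmt.Fprintf(conn, "%s\n", clusterName); err != nil {
		return err
	}
	r := bufio.NewReader(conn)
	for {
		line, err := r.ReadString('\n')
		if err != nil {
			return nil // the server closed the tunnel
		}
		// Stand-in for forwarding the request to the local kube-apiserver.
		fmt.Fprintf(conn, "%s handled %s\n", clusterName, strings.TrimSpace(line))
	}
}

// roundTrip accepts one agent connection, reads the registration line, sends
// a single request through the agent-initiated connection and returns the
// registered cluster name together with the agent's reply.
func roundTrip(ln net.Listener, request string) (string, string, error) {
	conn, err := ln.Accept()
	if err != nil {
		return "", "", err
	}
	defer conn.Close()
	r := bufio.NewReader(conn)
	name, err := r.ReadString('\n')
	if err != nil {
		return "", "", err
	}
	if _, err := fmt.Fprintf(conn, "%s\n", request); err != nil {
		return "", "", err
	}
	reply, err := r.ReadString('\n')
	if err != nil {
		return "", "", err
	}
	return strings.TrimSpace(name), strings.TrimSpace(reply), nil
}

// demo wires a server and an agent together on a loopback port.
func demo() (string, string, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", "", err
	}
	defer ln.Close()
	go runAgent(ln.Addr().String(), "c-demo")
	return roundTrip(ln, "GET /api/v1/nodes")
}

func main() {
	name, reply, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println(name, "->", reply) // prints: c-demo -> c-demo handled GET /api/v1/nodes
}
```

Because the connection is opened from the agent side, the server never needs a route into the back-end cluster’s network; it learns that the cluster exists when the agent registers, and it can forward requests for as long as the connection stays open.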

Therefore, we will first go through the tunnel module and, building on the tunnel, analyze how the agent and the multi-cluster manager behave during cluster registration and cluster management. Then we will look at communication between Rancher and the back-end cluster: by explaining K8s authentication and authorization and following the request forwarding process, we will see how authentication and authorization are carried out.

After that, we will analyze Rancher’s user management and permission management modules. With the tunnel module covered, readers can understand how Rancher’s management plane and the K8s control plane work together, and what tricks Rancher plays behind the scenes.

Finally, we will analyze Rancher’s tools, such as CI/CD, service mesh, monitoring and the app store, to understand the working paradigm of cloud native systems.

Let’s get started.