Authors | Kuang Dahu, Kan Junbao

Operator Lifecycle Manager (OLM), as part of the Operator Framework, helps users automatically install, upgrade, and manage the lifecycle of Operators. OLM itself is also installed and deployed as an Operator, so it can be said to manage Operators with an Operator. This declarative, automated management of Operators is consistent with the Kubernetes interaction model. In this article we will look at the basic architecture and installation of OLM.

OLM component model definition

OLM was created so that users without expertise in domains such as big data or cloud monitoring can still deploy and manage complex distributed applications such as etcd clusters, big data analytics, or monitoring services on their own. In terms of design goals, OLM aims to provide common management capabilities for cloud native applications in the following directions:

  • Lifecycle management: manage the upgrades and lifecycle of Operators themselves as well as of the resource models they own;

  • Service discovery: Discover which operators exist in the cluster, which resource models these operators manage, and which operators can be installed in the cluster.

  • Packaging capability: provides a standard mode for distributing, installing, and upgrading operators and dependent components;

  • Interaction capability: after standardizing the above capabilities, OLM also needs to provide a normalized way (for example, a CLI) to interact with user-defined cloud services in the cluster.

The above design goals can be boiled down to requirements in the following directions:

  • Namespaced deployment: Operators and the resource models they manage must be deployed within namespace constraints, which is also necessary for logical isolation and for strengthening access control with RBAC in multi-tenant environments;

  • Custom Resource (CR) definitions: the CR model is the preferred way to define read/write interactions between users and Operators; an Operator declares, through CRDs, the resource models it owns as well as the models owned by other Operators that it depends on. The Operator's own behavior and configuration should also be defined by fields in its CRD;

  • Dependency resolution: an Operator implementation only needs to care about packaging itself and the resources it manages, not about connections to the running cluster, and dependencies should be resolved dynamically, much like dynamic libraries. For example, when vault-operator is deployed, an etcd cluster is created as its backend storage. Instead of bundling the etcd operator container directly inside vault-operator, OLM should resolve the dependency through dependency declarations, which requires a dependency definition specification for Operators;

  • Idempotent deployment: dependency resolution and resource installation can be executed repeatedly, and problems during application installation are recoverable;

  • Garbage collection: in principle, rely on Kubernetes' native garbage collection capabilities as much as possible. When deleting OLM's own extended model ClusterServiceVersion, its associated running resources should be cleaned up at the same time, while resources managed by other ClusterServiceVersions must not be deleted;

  • Support for labeling and resource discovery.

Based on the above design goals, the OLM implementation defines the following models and components for Operators.

First, OLM itself consists of two Operators: the OLM Operator and the Catalog Operator. They manage the basic CRD models that OLM extends, described below.

During the Operator installation lifecycle, the OLM Operator creates the Deployment, ServiceAccount, and the associated RBAC roles and role bindings; the Catalog Operator is responsible for creating resources such as CRDs and CSVs.

Before introducing the two OLM operators, let's first look at the definition of ClusterServiceVersion. As a basic element of the OLM workflow, a ClusterServiceVersion defines the collection of metadata and runtime information for a user's business application managed by OLM, including:

  • Application metadata (name, description, version, links, icon, labels, etc.), as we'll see in the next chapter's practical examples;

  • Installation strategy, including the Deployments, ServiceAccounts, and RBAC roles and bindings required during Operator installation;

  • CRDs: including the CRD types owned by the Operator, the services they belong to, other K8s native resources the Operator interacts with, and the fields and field descriptors, such as spec and status, that carry the semantic information of the model.
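To make these elements concrete, a minimal ClusterServiceVersion skeleton might look roughly like the following; the names, values, and omitted fields here are purely illustrative, not a complete installable CSV:

apiVersion: operators.coreos.com/v1alpha1
kind: ClusterServiceVersion
metadata:
  name: example-operator.v0.1.0   # illustrative name
  namespace: placeholder
spec:
  displayName: Example Operator
  description: An example application managed by OLM
  version: 0.1.0
  maturity: alpha
  install:
    strategy: deployment
    spec:
      permissions:                 # RBAC rules bound to the operator's ServiceAccount
      - serviceAccountName: example-operator
        rules:
        - apiGroups: [""]
          resources: ["pods", "services", "configmaps"]
          verbs: ["*"]
      deployments:                 # the Deployment that runs the operator itself
      - name: example-operator
        spec: {}                   # full Deployment spec omitted in this sketch
  customresourcedefinitions:
    owned:
    - name: examples.example.com   # plural.group of the owned CRD
      version: v1alpha1
      kind: Example
      displayName: Example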

With a basic understanding of ClusterServiceVersion concepts, let's take a look at the OLM Operator.

First, the OLM Operator works based on ClusterServiceVersions. Once the dependent resources declared in a CSV have been registered in the target cluster, the OLM Operator installs the application instance corresponding to the CSV. Note that the OLM Operator does not care about the creation and registration of the CRD models for the dependent resources declared in the CSV; these actions can be done manually by the user with kubectl or by the Catalog Operator. This design also gives users a familiar path to gradually adapt to, and eventually fully adopt, the OLM architecture. In addition, the OLM Operator can watch the custom resource models of dependent resources either globally across all namespaces or restricted to a specified namespace.

The Catalog Operator is responsible for parsing the dependent resource definitions declared in CSVs and for tracking the channels of installation packages in catalogs, updating CSVs to new versions as they become available.

Users can create a Subscription to set the source of the required installation packages and the channel from which to receive updates. When an update becomes available, an InstallPlan is created in the user's namespace. The InstallPlan instance contains the definition of the target CSV and the associated approval policy, and the Catalog Operator creates a corresponding execution plan to create the dependent resource models required by the CSV. Once the user approves it, the Catalog Operator creates the resources in the InstallPlan; at that point the OLM Operator's dependency conditions are satisfied, and the OLM Operator creates the Operator instance defined in the CSV.
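As a rough illustration of what such an InstallPlan looks like (the name and target CSV below are hypothetical), an instance with a manual approval policy might resemble:

apiVersion: operators.coreos.com/v1alpha1
kind: InstallPlan
metadata:
  name: install-example        # hypothetical name; OLM normally generates it
  namespace: default
spec:
  clusterServiceVersionNames:  # the target CSVs to install
  - etcdoperator.v0.9.2
  approval: Manual             # approval policy: Manual or Automatic
  approved: false              # the Catalog Operator proceeds once this is set to true

With approval set to Automatic, the Catalog Operator creates the resources without waiting for the user to flip the approved field.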

OLM architecture introduction

In the previous section, we learned about OLM’s basic component model and related definitions. In this section, we introduce its basic architecture, as shown in the following figure:

The Operator Framework provides these two important meta-operators and their corresponding extended resources (ClusterServiceVersion, InstallPlan, etc., as described in the previous section), which are used to manage the lifecycle of Operator applications. A customized CSV model defines the various resource combinations of an Operator, including how the Operator is deployed, the types of custom resources it manages, and the K8s native resources it uses.

From the definitions in the previous section, we also learned that the OLM Operator requires the custom resource models it manages to be registered in the target cluster before it installs the corresponding Operator instance. This registration can be done manually by the cluster administrator with kubectl, or by the Catalog Operator, which, in addition to registering the target CRD models, is also responsible for automatically upgrading resource model versions. Its workflow includes:

  • Maintain a cache and index of CRD and CSV models for version control and registration of the corresponding models;

  • Listen for unresolved InstallPlans created by users:

    • Find the CSV models that satisfy the dependency conditions and add them to the resolved resources;
    • Add all CRD models managed by or depended on by the target Operator to the resolved resources;
    • Find the CSV that manages each dependent CRD;
  • Listen for resolved InstallPlans and create all dependent resources after user approval, or automatically if automatic approval is configured.

  • Listen to CatalogSource and Subscription models and create corresponding InstallPlans based on their changes.

Once the OLM Operator detects that the resources required in the CSV have been registered or changed, it starts the installation or upgrade of the application Operator, which then enters its own workflow and creates and manages the corresponding custom resource instances in the Kubernetes cluster.

OLM installation

Now that we know OLM's basic architecture, let's look at how to install it. The community code repository contains the templates for the OLM deployment resources, and users can easily modify the corresponding deployment parameters to complete a customized OLM installation.

You can find the latest releases and the installation instructions for each version in the official release notes.

The following uses 0.13.0 as an example to run the automatic installation script:

curl -L https://github.com/operator-framework/operator-lifecycle-manager/releases/download/0.13.0/install.sh -o install.sh
chmod +x install.sh
./install.sh 0.13.0

Alternatively, you can manually install the deployment templates required for OLM:

kubectl apply -f https://github.com/operator-framework/operator-lifecycle-manager/releases/download/0.13.0/crds.yaml
kubectl apply -f https://github.com/operator-framework/operator-lifecycle-manager/releases/download/0.13.0/olm.yaml

After cloning the OLM repository, you can run the make run-local command to start minikube and build the OLM image locally using minikube's own Docker daemon. The YAML files in the repository's deploy directory are used as configuration to build and run the local OLM. You can then run kubectl -n local get deployments to check whether the OLM components have been installed and are running correctly.

In addition, OLM can generate and install a customized deployment template from user-specified parameters. The supported template parameters are:

# sets the apiversion to use for rbac-resources. Change to `authorization.openshift.io` for openshift
rbacApiVersion: rbac.authorization.k8s.io
# namespace is the namespace the operators will _run_
namespace: olm
# watchedNamespaces is a comma-separated list of namespaces the operators will _watch_ for OLM resources.
# Omit to enable OLM in all namespaces
watchedNamespaces: olm
# catalog_namespace is the namespace where the catalog operator will look for global catalogs.
# entries in global catalogs can be resolved in any watched namespace
catalog_namespace: olm
# operator_namespace is the namespace where the operator runs
operator_namespace: operators

# OLM operator run configuration
olm:
  # OLM operator doesn't do any leader election (yet), set to 1
  replicaCount: 1
  # The image to run. If not building a local image, use sha256 image references
  image:
    ref: quay.io/operator-framework/olm:local
    pullPolicy: IfNotPresent
  service:
    # port for readiness/liveness probes
    internalPort: 8080

# catalog operator run configuration
catalog:
  # Catalog operator doesn't do any leader election (yet), set to 1
  replicaCount: 1
  # The image to run. If not building a local image, use sha256 image references
  image:
    ref: quay.io/operator-framework/olm:local
    pullPolicy: IfNotPresent
  service:
    # port for readiness/liveness probes
    internalPort: 8080

You can use the following methods to customize templates and install them in a specified cluster:

  • Create a file such as my-values.yaml and configure the parameters by referring to the configuration template above;
  • Based on the my-values.yaml configuration above, use package_release.sh to generate the specified deployment templates. The first parameter is the target version of the system-compatible Helm chart, the second is the output directory for the templates, and the third is the path of the configuration file:

./scripts/package_release.sh 1.0.0-myolm ./my-olm-deployment my-values.yaml

  • To deploy the template files in the specified directory, run kubectl apply -f ./my-olm-deployment/templates/;

Finally, you can use the GLOBAL_CATALOG_NAMESPACE environment variable to specify the namespace in which the Catalog Operator watches for global catalogs. By default, the installation creates the olm namespace and deploys the Catalog Operator there.
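As a sketch of how this variable could be set (the container name and the surrounding Deployment structure shown here are assumptions for illustration), one would add it to the Catalog Operator's container spec:

# hypothetical excerpt of the catalog operator Deployment
spec:
  template:
    spec:
      containers:
      - name: catalog-operator
        env:
        - name: GLOBAL_CATALOG_NAMESPACE   # namespace watched for global catalogs
          value: olm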

Dependency resolution and upgrade management

Just as apt/dpkg and yum/rpm manage system packages, OLM must also handle dependency resolution and upgrade management for the Operator instances it runs. To keep all Operators available at runtime, OLM needs to ensure during dependency resolution and upgrade management that:

  • Do not install an Operator whose dependent APIs are not provided by already-registered components;
  • If an Operator upgrade will break the dependency conditions of its associated components, do not perform the upgrade.

Here are some examples of how OLM currently handles dependency resolution under version iteration:

First, let's look at CRD upgrades. When a CRD to be upgraded belongs to a single CSV, OLM upgrades the CRD immediately. If the CRD belongs to multiple CSVs, the upgrade must meet the following conditions:

  • All serving versions in the current CRD must also be present in the new CRD;
  • All CR (Custom Resource) instances associated with the serving versions of the CRD can be validated against the new CRD schema.

When you need to add a new version to a CRD, the officially recommended steps are:

  1. Suppose we have a CRD currently in use whose version is v1alpha1, and we want to add a new version v1beta1 and make it the new storage version, as follows:
versions:
  - name: v1alpha1
    served: true
    storage: false
  - name: v1beta1
    served: true
    storage: true
  2. If your CSV needs to use the new CRD version, make sure the CRD version referenced in the CSV's owned field is the new one, as follows:
customresourcedefinitions:
  owned:
  - name: cluster.example.com
    version: v1beta1
    kind: cluster
    displayName: Cluster
  3. Push the updated CRD and CSV to the specified repository directory.

When we need to deprecate or remove a CRD version, OLM does not allow us to remove an in-use CRD version immediately. Instead, we must first deprecate the version by setting its served field in the CRD to false, and then remove the unused version in a subsequent CRD upgrade. The officially recommended steps for removing or deprecating a specific CRD version are as follows:

  1. Set the served field of the deprecated CRD version to false, indicating that the version is no longer in use and will be removed in the next upgrade, for example:
versions:
  - name: v1alpha1
    served: false
    storage: true
  2. If the storage field of the version about to be deprecated is currently true, set it to false and set the storage field of the new version to true, for example:
versions:
  - name: v1alpha1
    served: false
    storage: false
  - name: v1beta1
    served: true
    storage: true
  3. Update the CRD model based on the above modifications;

  4. During subsequent upgrades, the deprecated version that is no longer served is removed from the CRD, and the final state of the CRD versions becomes:

versions:
  - name: v1beta1
    served: true
    storage: true

Note that when deleting a specific CRD version, you must also ensure that the version is removed from the storedVersions field in the CRD status. OLM will remove it for us once it detects that the stored version is no longer used in the new CRD. In addition, make sure the CRD version referenced in the CSV is updated when the old version is removed.

Let’s take a look at two examples of upgrade failures and OLM’s dependency resolution logic:

Example 1: Suppose we have two different types of CRD, A and B.

  • Operators that use A depend on B
  • Operators using B have a Subscription
  • The Operator using B is upgraded to use a new version C and deprecates the old version B

The result of such an upgrade is that version B no longer has a corresponding Operator or APIService providing it, and A, which depends on B, no longer works.

Example 2: Suppose we have two custom apis, A and B.

  • Operators that use A depend on B
  • Operators that use B depend on A
  • Operators using A want to upgrade to A2 and discard the old version A. The new A2 version depends on B2
  • Operators using B want to upgrade to B2 and discard the old version B. The new B2 version depends on A2

At this time, if we only try to upgrade A without synchronously upgrading B, even if the system can find the appropriate upgrade version, the corresponding Operator version cannot be upgraded.

To avoid the problems that might be encountered in the above version iterations, the dependency resolution logic used by OLM is as follows.

Suppose we have a set of operators running under a namespace:

  • For each Subscription in the namespace, if the Subscription has not been checked before, OLM looks up the latest CSV in the corresponding source/package/channel and temporarily adds an Operator matching that version; if the Subscription is already known, OLM queries the corresponding source/package/channel for updates;

  • For each API version that a CSV depends on, OLM selects an Operator that provides it, based on the priority of the sources. If such an Operator is found, it is temporarily added along with the dependent version; if no corresponding Operator is found, the dependent API is added to the set of unsatisfied dependencies;

  • If any API cannot be satisfied from the available sources, the dependent Operator is downgraded (rolled back to its previous version). This degradation process continues until the dependency conditions can be met; in the worst case, all Operators in the namespace stay at their current versions;

  • If a new Operator completes resolution and satisfies the dependency conditions, it is finally created in the cluster, along with a Subscription associated with the channel/package or source in which it was discovered, so that further version updates can continue to be detected.

With the basics of OLM dependency resolution and upgrade management in mind, let's look at the workflow related to OLM upgrades. ClusterServiceVersion (CSV), CatalogSource, and Subscription are the three OLM extension models most closely related to upgrades. In the OLM ecosystem, Operator metadata such as CSVs is stored in CatalogSources; OLM uses the Operator repository API to query CatalogSources for available or upgradable Operators. Within a CatalogSource, Operators use channels to identify the packaged versions of their installation packages.

A Subscription expresses the user's intent to install or upgrade an Operator to a specific channel of a specific package. If the package specified in the Subscription is not yet installed in the target cluster, OLM installs the latest Operator version available from the catalog/package/channel source.

In a CSV definition, we can declare the Operator version it supersedes through the replaces field. When OLM receives a request, it searches the CSV definitions that can be installed from the different channels and builds them into a DAG (directed acyclic graph); in this process, channels can be considered the entry points of the update DAG. During an upgrade, if OLM finds uninstalled intermediate versions between the latest upgradable version and the current version, it automatically builds an upgrade path and ensures that the intermediate versions on the path are installed. For example, if a running Operator is at version 0.1.1 and, after receiving an update request, OLM discovers through the subscribed channel that the latest upgradable version is 0.1.3 with an intermediate version 0.1.2, OLM first installs the 0.1.2 CSV to replace the current version and finally installs 0.1.3 to replace 0.1.2.
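As a sketch of how such an upgrade path is expressed (the operator name and versions below are hypothetical), each CSV simply points at the version it supersedes through replaces, and OLM walks the resulting chain:

# illustrative CSV fragments forming the 0.1.1 -> 0.1.2 -> 0.1.3 path
apiVersion: operators.coreos.com/v1alpha1
kind: ClusterServiceVersion
metadata:
  name: example-operator.v0.1.2
spec:
  replaces: example-operator.v0.1.1
---
apiVersion: operators.coreos.com/v1alpha1
kind: ClusterServiceVersion
metadata:
  name: example-operator.v0.1.3
spec:
  replaces: example-operator.v0.1.2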

Of course, in some cases, such as when an intermediate release has a serious security vulnerability, iterating through every release is neither reasonable nor safe. In that case we can customize the installation path with the skips field to skip specified intermediate versions, as shown below:

apiVersion: operators.coreos.com/v1alpha1
kind: ClusterServiceVersion
metadata:
  name: etcdoperator.v0.9.2
  namespace: placeholder
  annotations:
spec:
  displayName: etcd
  description: Etcd Operator
  replaces: etcdoperator.v0.9.0
  skips:
  - etcdoperator.v0.9.1

If we need to skip a whole range of versions during installation, we can use the following annotation in the CSV:

olm.skipRange: <semver range>

The version range is defined using semver syntax; a CSV example using skipRange is as follows:

apiVersion: operators.coreos.com/v1alpha1
kind: ClusterServiceVersion
metadata:
  name: elasticsearch-operator.v4.1.2
  namespace: placeholder
  annotations:
    olm.skipRange: '>=4.1.0 <4.1.2'

operator-registry

In OLM, the CatalogSource model defines where InstallPlans download installation packages and perform dependency resolution, and Subscriptions pull the latest versions of installation packages from a CatalogSource through the channel they subscribe to. This section uses operator-registry as an example to describe the installation and basic usage of a CatalogSource.

The operator-registry project mainly consists of the following three parts:

  • initializer: receives the operator manifests uploaded by users and imports the data into the database;
  • registry-server: contains the SQLite database that stores the operator manifests and exposes them through a gRPC interface;
  • configmap-server: parses the ConfigMaps containing operator manifests (labels related to operator bundles, or configuration metadata such as CRDs and CSVs) for registry-server and stores them in the SQLite database.

Regarding the format of the operator manifests: in operator-registry, each CSV definition unit contained in the upload directory is called a bundle. A typical bundle consists of a single CSV (ClusterServiceVersion) and one or more CRDs that contain its associated interface definitions, as shown below:

# example bundle
0.6.1
├── etcdcluster.crd.yaml
└── etcdoperator.clusterserviceversion.yaml

When importing manifests into the database, the following validations are performed:

  • Each installation package needs to define at least one channel;
  • Each CSV must be associated with a channel that exists in the installation package;
  • Each bundle directory contains exactly one CSV definition;
  • If a CSV contains a CRD definition, that CRD must also exist in the bundle directory;
  • If a CSV is replaced by another CSV through a replaces definition, both the old and the new CSV need to exist in the package.

In principle, it is best to keep a clear directory structure for each bundle under the manifests directory, for example:

manifests
├── etcd
│   ├── 0.6.1
│   │   ├── etcdcluster.crd.yaml
│   │   └── etcdoperator.clusterserviceversion.yaml
│   ├── 0.9.0
│   │   ├── etcdbackup.crd.yaml
│   │   ├── etcdcluster.crd.yaml
│   │   ├── etcdoperator.v0.9.0.clusterserviceversion.yaml
│   │   └── etcdrestore.crd.yaml
│   ├── 0.9.2
│   │   ├── etcdbackup.crd.yaml
│   │   ├── etcdcluster.crd.yaml
│   │   ├── etcdoperator.v0.9.2.clusterserviceversion.yaml
│   │   └── etcdrestore.crd.yaml
│   └── etcd.package.yaml
└── prometheus
    ├── 0.14.0
    │   ├── alertmanager.crd.yaml
    │   ├── prometheus.crd.yaml
    │   ├── prometheusoperator.0.14.0.clusterserviceversion.yaml
    │   ├── prometheusrule.crd.yaml
    │   └── servicemonitor.crd.yaml
    ├── 0.15.0
    │   ├── alertmanager.crd.yaml
    │   ├── prometheus.crd.yaml
    │   ├── prometheusoperator.0.15.0.clusterserviceversion.yaml
    │   ├── prometheusrule.crd.yaml
    │   └── servicemonitor.crd.yaml
    ├── 0.22.2
    │   ├── alertmanager.crd.yaml
    │   ├── prometheus.crd.yaml
    │   ├── prometheusoperator.0.22.2.clusterserviceversion.yaml
    │   ├── prometheusrule.crd.yaml
    │   └── servicemonitor.crd.yaml
    └── prometheus.package.yaml
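For reference, the etcd.package.yaml shown in the tree above is the package definition that declares the channels. A sketch of its content, assuming the alpha channel points at the 0.9.2 CSV from the same directory, would be:

packageName: etcd
defaultChannel: alpha
channels:
- name: alpha
  currentCSV: etcdoperator.v0.9.2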

Using the official Dockerfile, we can build a minimal operator-registry image that contains the initializer and registry-server.

Create a CatalogSource object and specify the corresponding image so that it uses our operator-registry, as shown below:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: example-manifests
  namespace: default
spec:
  sourceType: grpc
  image: example-registry:latest

After the example-manifests catalog above has started, we can check from the pod logs whether the corresponding gRPC backend service is up:

$ kubectl logs example-manifests-wfh5h -n default

time="2019-03-18T10:20:14Z" level=info msg="serving registry" database=bundles.db port=50051

At the same time, once the catalog is loaded, the package-server component in OLM starts reading the Operator packages defined in the catalog. We can watch the currently available Operator packages with the following command:

$ watch kubectl get packagemanifests
[...]
NAME         AGE
prometheus   13m
etcd         27m

We can also use the following command to view the default channel of a specified Operator package:

$ kubectl get packagemanifests etcd -o jsonpath='{.status.defaultChannel}'

alpha

Using the Operator package name, the channel, and the namespace where the catalog runs obtained above, we can initiate the installation or upgrade of an Operator from the specified catalog source by creating the Subscription object described earlier, as shown in the following example:

apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: etcd-subscription
  namespace: default 
spec:
  channel: alpha
  name: etcd
  source: example-manifests
  sourceNamespace: default

In addition, with grpcurl, a command-line tool that speaks the gRPC protocol, we can send requests to the specified catalog server locally to conveniently inspect the package information in the catalog.

Summary

This chapter introduced the basic architecture and usage of Operator Lifecycle Manager. After reading it, you should have a clear picture of OLM's working principles and architectural design. The sample code is intended to deepen readers' understanding of OLM in practice and to provide guidance for extending product capabilities with the Operator Framework in real-world work.

About the authors

Kuang Dahu, senior technical expert at Alibaba Cloud, works on the development of Kubernetes and container-related products. He pays special attention to cloud native security and is a core member of the Alibaba Cloud Container Service cloud native security team.

Kan Junbao, technical expert at Alibaba Cloud Container Service, focuses on Kubernetes, Docker, and cloud storage, and is a core maintainer of the Alibaba Cloud CSI project.

“Alibaba Cloud originator focuses on micro-service, Serverless, container, Service Mesh and other technical fields, focuses on the trend of cloud native popular technology, large-scale implementation of cloud native practice, and becomes the public account that most understands cloud native developers.”