Authors: Che Yang (Fluid community committer) and Xie Yuandong (Fluid open source community committer) | Alibaba Cloud Native public account

Elastic scaling is one of Kubernetes’ core capabilities, but it has traditionally centered on stateless application workloads. Fluid brings this elasticity to the distributed cache: based on Runtime performance metrics such as cache capacity and the current cached-data ratio, combined with the Runtime’s resource configuration, the data cache can be expanded and shrunk on demand.

Background

As more and more data-intensive applications such as big data and AI workloads are deployed and run in Kubernetes environments, the gap between the design philosophy of data-intensive computing frameworks and cloud-native elastic application orchestration has led to data access and computing bottlenecks. Fluid, a cloud-native data orchestration engine, accelerates data access for applications by abstracting datasets and combining distributed caching technology with the scheduler.

Elastic scaling is one of Kubernetes’ core capabilities, yet it has always revolved around stateless application workloads. Fluid makes the distributed cache elastic, so that the data cache can be flexibly expanded and shrunk. Based on the Runtime, Fluid exposes performance metrics such as cache capacity and the current cached-data ratio, and together with the Runtime’s resource configuration this enables on-demand scaling of the data cache.

This capability is particularly important for big data applications in Internet scenarios, since most of them are implemented as end-to-end pipelines. A typical pipeline consists of the following steps:

  1. Data ingestion: preprocess the raw data with big data technologies such as Spark and MapReduce.
  2. Model training: train the machine learning model on the feature data produced in the first stage and generate the corresponding model.
  3. Model evaluation: evaluate and test the model produced in stage 2 against a test or validation set.
  4. Model inference: the model validated in stage 3 is finally pushed online to serve inference for the business.

As you can see, an end-to-end pipeline contains many different types of computing tasks, and for each task there is a mature, specialized system in practice (TensorFlow, PyTorch, Spark, Presto). These systems are independent of one another, however, and usually rely on an external file system to hand data from one stage to the next. Frequent data exchange through the file system introduces significant I/O overhead and often becomes the bottleneck of the entire workflow.

Fluid fits this scenario well. Users can create a Dataset object as the data exchange medium; it can distribute and cache data on Kubernetes compute nodes, avoiding remote writes and reads and improving data access efficiency. The problem here is resource estimation and reservation for this temporary data cache: before data is produced and consumed, it is hard to estimate the data volume accurately. Estimating too high wastes reserved resources, while estimating too low increases the chance of data write failures. Scaling on demand is far more user-friendly. We want an effect similar to the page cache: the layer is transparent to the end user, but the cache acceleration it brings is real.

We implemented elastic cache scaling in Fluid through a custom HPA mechanism: when the cached data reaches a certain proportion of the total cache capacity, a scale-out is triggered to expand the cache space. For example, with a threshold of 75% and a total cache space of 10 GiB, scale-out is triggered once the cached data grows beyond 7.5 GiB, for instance when it reaches 8 GiB.

The following example walks you through Fluid’s automatic scale-out and scale-in capability.

Prerequisites

Kubernetes 1.18 or later is recommended. Before 1.18, HPA scaling behavior was hard coded and could not be customized; since 1.18, users can define their own scale-up and scale-down policies, for example the cooldown window after a scale-up.
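
For reference, this kind of policy is expressed through the behavior field of the autoscaling/v2beta2 HPA API. The snippet below is only an illustrative sketch; the values mirror the full HPA manifest used later in step 10.

# Illustrative only: the behavior stanza available since Kubernetes 1.18
behavior:
  scaleUp:
    stabilizationWindowSeconds: 0   # cooldown window before another scale-up is considered
    policies:
    - type: Pods
      value: 2                      # add at most 2 Pods
      periodSeconds: 600            # per 600-second window
  scaleDown:
    selectPolicy: Disabled          # disable automatic scale-down entirely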

Specific steps

1. Install the JQ tool to parse JSON.

This example uses CentOS, so jq can be installed with yum.

yum install -y jq
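
As a quick sanity check that jq can parse JSON (the sample document below is made up purely for illustration):

echo '{"kind":"Dataset","metadata":{"name":"spark"}}' | jq -r '.metadata.name'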

2. Download and install the latest version of Fluid.

git clone https://github.com/fluid-cloudnative/fluid.git
cd fluid/charts
kubectl create ns fluid-system
helm install fluid fluid
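
To confirm that the installation succeeded (assuming the fluid-system namespace and release name used above), check that the Fluid control-plane Pods are up:

kubectl get pods -n fluid-system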

3. Deploy or configure Prometheus.

In this example, Prometheus is used to collect the metrics exposed by AlluxioRuntime’s cache engine. If there is no Prometheus in the cluster, deploy one as follows:

$ cd fluid
$ kubectl apply -f integration/prometheus/prometheus.yaml

If you have Prometheus in your cluster, you can write the following configuration to the Prometheus configuration file:

scrape_configs:
  - job_name: 'alluxio runtime'
    metrics_path: /metrics/prometheus
    kubernetes_sd_configs:
      - role: endpoints
    relabel_configs:
    - source_labels: [__meta_kubernetes_service_label_monitor]
      regex: alluxio_runtime_metrics
      action: keep
    - source_labels: [__meta_kubernetes_endpoint_port_name]
      regex: web
      action: keep
    - source_labels: [__meta_kubernetes_namespace]
      target_label: namespace
      replacement: $1
      action: replace
    - source_labels: [__meta_kubernetes_service_label_release]
      target_label: fluid_runtime
      replacement: $1
      action: replace
    - source_labels: [__meta_kubernetes_endpoint_address_target_name]
      target_label: pod
      replacement: $1
      action: replace
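
After changing the configuration, Prometheus has to reload it before the new scrape job takes effect. One way, sketched here under the assumption that the lifecycle API is enabled (--web.enable-lifecycle); otherwise simply restart the Prometheus Pod:

curl -X POST http://<prometheus-address>:9090/-/reload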

4. Verify that Prometheus is installed successfully.

$ kubectl get ep -n kube-system prometheus-svc
NAME             ENDPOINTS        AGE
prometheus-svc   10.76.0.2:9090   6m49s
$ kubectl get svc -n kube-system prometheus-svc
NAME             TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
prometheus-svc   NodePort   172.16.135.24   <none>        9090:32114/TCP   2m7s

If you want to visualize monitoring metrics, you can install Grafana to validate monitoring data, as described in the documentation.

5. Deploy metrics Server.

Check whether a metrics server is already included in the cluster. If it is correctly configured, kubectl top node shows the memory and CPU usage of the nodes:

$ kubectl top node
NAME            CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
192.168.1.204   93m          2%     1455Mi          10%
192.168.1.205   125m         3%     1925Mi          13%
192.168.1.206   96m          2%     1689Mi          11%

Otherwise, run the following command:

kubectl create -f integration/metrics-server

6. Deploy the custom-metrics-API component.

To scale based on custom metrics, you need to have two components:

  • The first collects metrics from the application and stores them in the Prometheus time-series database.
  • The second extends the Kubernetes custom metrics API with the collected metrics; this is the k8s-prometheus-adapter.

The first component was deployed in step 3; this step deploys the second.

If custom-metrics-API is already configured, add the Dataset-related rules to the adapter’s ConfigMap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: adapter-config
  namespace: monitoring
data:
  config.yaml: |
    rules:
    - seriesQuery: '{__name__=~"Cluster_(CapacityTotal|CapacityUsed)",fluid_runtime!="",instance!="",job="alluxio runtime",namespace!="",pod!=""}'
      seriesFilters:
        - is: ^Cluster_(CapacityTotal|CapacityUsed)$
      resources:
        overrides:
          namespace:
            resource: namespace
          pod:
            resource: pods
          fluid_runtime:
            resource: datasets
      name:
        matches: "^(.*)"
        as: "capacity_used_rate"
      metricsQuery: ceil(Cluster_CapacityUsed{<<.LabelMatchers>>}*100/(Cluster_CapacityTotal{<<.LabelMatchers>>}))

Otherwise, run the following command:

kubectl create -f integration/custom-metrics-api/namespace.yaml
kubectl create -f integration/custom-metrics-api

Note: custom-metrics-API connects to the cluster’s Prometheus endpoint, so replace the Prometheus URL in the manifests with the address actually used in your cluster.
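
For example, k8s-prometheus-adapter takes this address as a startup argument on its Deployment. A sketch, assuming the prometheus-svc Service in kube-system from step 4 (adjust the service name, namespace, and port to your environment):

--prometheus-url=http://prometheus-svc.kube-system.svc:9090/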

Check the custom metrics:

$ kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "custom.metrics.k8s.io/v1beta1",
  "resources": [
    {
      "name": "pods/capacity_used_rate",
      "singularName": "",
      "namespaced": true,
      "kind": "MetricValueList",
      "verbs": [
        "get"
      ]
    },
    {
      "name": "datasets.data.fluid.io/capacity_used_rate",
      "singularName": "",
      "namespaced": true,
      "kind": "MetricValueList",
      "verbs": [
        "get"
      ]
    },
    {
      "name": "namespaces/capacity_used_rate",
      "singularName": "",
      "namespaced": false,
      "kind": "MetricValueList",
      "verbs": [
        "get"
      ]
    }
  ]
}

7. Submit the Dataset used for the test.

$ cat <<EOF > dataset.yaml
apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: spark
spec:
  mounts:
    - mountPoint: https://mirrors.bit.edu.cn/apache/spark/
      name: spark
---
apiVersion: data.fluid.io/v1alpha1
kind: AlluxioRuntime
metadata:
  name: spark
spec:
  replicas: 1
  tieredstore:
    levels:
      - mediumtype: MEM
        path: /dev/shm
        quota: 1Gi
        high: "0.99"
        low: "0.7"
  properties:
    alluxio.user.streaming.data.timeout: 300sec
EOF
$ kubectl create -f dataset.yaml
dataset.data.fluid.io/spark created
alluxioruntime.data.fluid.io/spark created

8. Check whether the Dataset is in the available state.

The total size of the data in this dataset is 2.71 GiB. Fluid currently provides one cache node with at most 1 GiB of cache capacity, so the full dataset cannot be cached.

$ kubectl get dataset
NAME    UFS TOTAL SIZE   CACHED   CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
spark   2.71GiB          0.00B    1.00GiB          0.0%                Bound   7m38s

9. After the Dataset is available, check whether the monitoring metrics can be obtained from custom-metrics-API.

$ kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/datasets.data.fluid.io/*/capacity_used_rate" | jq
{
  "kind": "MetricValueList",
  "apiVersion": "custom.metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/datasets.data.fluid.io/%2A/capacity_used_rate"
  },
  "items": [
    {
      "describedObject": {
        "kind": "Dataset",
        "namespace": "default",
        "name": "spark",
        "apiVersion": "data.fluid.io/v1alpha1"
      },
      "metricName": "capacity_used_rate",
      "timestamp": "2021-04-04T07:24:52Z",
      "value": "0"
    }
  ]
}

10. Create an HPA task.

$ cat<<EOF > hpa.yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: spark
spec:
  scaleTargetRef:
    apiVersion: data.fluid.io/v1alpha1
    kind: AlluxioRuntime
    name: spark
  minReplicas: 1
  maxReplicas: 4
  metrics:
  - type: Object
    object:
      metric:
        name: capacity_used_rate
      describedObject:
        apiVersion: data.fluid.io/v1alpha1
        kind: Dataset
        name: spark
      target:
        type: Value
        value: "90"
  behavior:
    scaleUp:
      policies:
      - type: Pods
        value: 2
        periodSeconds: 600
    scaleDown:
      selectPolicy: Disabled
EOF
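
Then create the HPA from the generated manifest (this step is implied by the checks in step 11):

$ kubectl create -f hpa.yaml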

First, let us interpret the configuration in this sample. It has two main parts: the scaling rule and the scaling sensitivity:

  • Rule: scale-out is triggered when the cached data of the Dataset object reaches 90% of the total cache capacity. The scaling target is the AlluxioRuntime, with a minimum of 1 replica and a maximum of 4; the Dataset and AlluxioRuntime objects must be in the same namespace.
  • Policy: requires Kubernetes 1.18 or later, which allows separate stabilization windows and step sizes to be configured for scale-up and scale-down. Here, at most two new replicas are added within each scale-up period (periodSeconds, 600 s, i.e. 10 minutes), never exceeding the maxReplicas limit, so successive scale-ups are at least 10 minutes apart; scale-down is disabled outright. How the HPA turns the metric into a desired replica count is shown in the sketch after this list.
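
For reference, the HPA controller’s standard algorithm derives the desired replica count from the ratio of the current metric value to the target. With the first scale-up observed below (capacity_used_rate of 100 against a target of 90, starting from 1 replica), that works out to:

desiredReplicas = ceil( currentReplicas * currentMetricValue / targetValue )
                = ceil( 1 * 100 / 90 )
                = 2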

11. Check the HPA configuration. The current cached-data ratio of the cache space is 0, far below the threshold that triggers scale-up.

$ kubectl get hpa
NAME    REFERENCE              TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
spark   AlluxioRuntime/spark   0/90      1         4         1          33s
$ kubectl describe hpa
Name:                                                    spark
Namespace:                                               default
Labels:                                                  <none>
Annotations:                                             <none>
CreationTimestamp:                                       Wed, 07 Apr 2021 17:36:39 +0800
Reference:                                               AlluxioRuntime/spark
Metrics:                                                 ( current / target )
  "capacity_used_rate" on Dataset/spark (target value):  0 / 90
Min replicas:                                            1
Max replicas:                                            4
Behavior:
  Scale Up:
    Stabilization Window: 0 seconds
    Select Policy: Max
    Policies:
      - Type: Pods  Value: 2  Period: 600 seconds
  Scale Down:
    Select Policy: Disabled
    Policies:
      - Type: Percent  Value: 100  Period: 15 seconds
AlluxioRuntime pods:   1 current / 1 desired
Conditions:
  Type            Status  Reason               Message
  ----            ------  ------               -------
  AbleToScale     True    ScaleDownStabilized  recent recommendations were higher than current one, applying the highest recent recommendation
  ScalingActive   True    ValidMetricFound     the HPA was able to successfully calculate a replica count from Dataset metric capacity_used_rate
  ScalingLimited  False   DesiredWithinRange   the desired count is within the acceptable range
Events:           <none>

12. Create a data preheating task.

$ cat<<EOF > dataload.yaml
apiVersion: data.fluid.io/v1alpha1
kind: DataLoad
metadata:
  name: spark
spec:
  dataset:
    name: spark
    namespace: default
EOF
$ kubectl create -f dataload.yaml
$ kubectl get dataload
NAME    DATASET   PHASE       AGE   DURATION
spark   spark     Executing   15s   Unfinished
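
The preheat job runs asynchronously; you can watch it with kubectl’s standard --watch flag until the phase leaves Executing:

$ kubectl get dataload spark -w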

13. At this point, the amount of cached data is close to the cache capacity provided by Fluid (1 GiB), which triggers the elastic scaling condition.

$ kubectl get dataset
NAME    UFS TOTAL SIZE   CACHED       CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
spark   2.71GiB          1020.92MiB   1.00GiB          36.8%               Bound   5m15s

From the HPA status, you can see that the AlluxioRuntime has started scaling up, with a scale-up step of 2.

$ kubectl get hpa
NAME    REFERENCE              TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
spark   AlluxioRuntime/spark   100/90    1         4         2          4m20s
$ kubectl describe hpa
Name:                                                    spark
Namespace:                                               default
Labels:                                                  <none>
Annotations:                                             <none>
CreationTimestamp:                                       Wed, 07 Apr 2021 17:56:31 +0800
Reference:                                               AlluxioRuntime/spark
Metrics:                                                 ( current / target )
  "capacity_used_rate" on Dataset/spark (target value):  100 / 90
Min replicas:                                            1
Max replicas:                                            4
Behavior:
  Scale Up:
    Stabilization Window: 0 seconds
    Select Policy: Max
    Policies:
      - Type: Pods  Value: 2  Period: 600 seconds
  Scale Down:
    Select Policy: Disabled
    Policies:
      - Type: Percent  Value: 100  Period: 15 seconds
AlluxioRuntime pods:   2 current / 3 desired
Conditions:
  Type            Status  Reason              Message
  ----            ------  ------              -------
  AbleToScale     True    SucceededRescale    the HPA controller was able to update the target scale to 3
  ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from Dataset metric capacity_used_rate
  ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
Events:
  Type     Reason                        Age                    From                       Message
  ----     ------                        ----                   ----                       -------
  Normal   SuccessfulRescale             21s                    horizontal-pod-autoscaler  New size: 2; reason: Dataset metric capacity_used_rate above target
  Normal   SuccessfulRescale             6s                     horizontal-pod-autoscaler  New size: 3; reason: Dataset metric capacity_used_rate above target

14. After some time, the cache capacity of the dataset has grown from 1 GiB to 3 GiB, and the data is almost fully cached.

$ kubectl get dataset
NAME    UFS TOTAL SIZE   CACHED    CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
spark   2.71GiB          2.59GiB   3.00GiB          95.6%               Bound   12m

The HPA status shows that the Runtime corresponding to the Dataset now has 3 replicas, and the cache usage ratio capacity_used_rate is 85%, which no longer triggers a scale-up.

$ kubectl get hpa
NAME    REFERENCE              TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
spark   AlluxioRuntime/spark   85/90     1         4         3          11m

15. Clean up your environment.

kubectl delete hpa spark
kubectl delete dataset spark
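
If the preheat job from step 12 should also be removed (it is a separate DataLoad object and is not covered by the two commands above):

kubectl delete dataload spark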

Conclusion

By combining Prometheus, the Kubernetes HPA, and the custom metrics capability, Fluid triggers automatic elastic scaling based on the proportion of cache space in use, providing cache capacity on demand. This lets users work with the distributed cache more flexibly and improves data access acceleration. In the future, we will also provide scheduled scale-out and scale-in to bring stronger determinism to capacity changes.

Fluid’s repository: github.com/fluid-cloud… You are welcome to follow the project, contribute code, and give it a star.