Some guidelines for avoiding pitfalls with the CronJob controller

Background: As the only person in the company familiar with cloud products, I usually end up dealing with them. At present, most of our cloud infrastructure and cloud services run on Alibaba Cloud, and each cloud product has its own independent management system. As a result, during operations we often cannot bring related products and their associated information together effectively for rapid diagnosis and lookup. For operations and development colleagues, jumping back and forth between multiple systems to find associated information is inefficient and error-prone. So, whether in operations or development, we want to integrate and correlate the cloud resources and services the company uses to maximize efficiency. As part of this, we used CronJobs in a Kubernetes cluster to fetch some Alibaba Cloud resources periodically. Along the way we ran into some problems, which led us to re-read the official CronJob documentation. The findings are recorded here.

CronJob brief introduction

A CronJob object is like one line of a crontab file in Linux. It periodically creates Jobs on a given schedule, written in crontab format.
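As a rough illustration of the crontab format, the sketch below expands a single schedule field into the concrete values it matches. It is a simplified reading of the format, not how Kubernetes parses schedules: the function name expand_field is made up for this example, and only comma lists, ranges, `*` and `/step` are handled. The two fields used in the demo come from the sample schedule later in this article.

```python
def expand_field(field, low, high):
    """Expand one crontab field into the sorted list of values it matches.

    Handles comma lists, "a-b" ranges, "*" and "/step" only; names such as
    "MON" and other extensions are deliberately left out of this sketch.
    """
    values = set()
    for part in field.split(","):
        rng, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if rng == "*":
            start, end = low, high
        elif "-" in rng:
            first, last = rng.split("-")
            start, end = int(first), int(last)
        else:
            start = end = int(rng)
        if start < low or end > high:
            raise ValueError("field %r is outside %d-%d" % (part, low, high))
        values.update(range(start, end + 1, step))
    return sorted(values)


if __name__ == "__main__":
    # Minute and hour fields of the sample schedule '05,15,25,35,45,55 */1 * * *'
    print(expand_field("05,15,25,35,45,55", 0, 59))
    print(expand_field("*/1", 0, 23))
```

Read together, the two fields mean minutes 5, 15, 25, 35, 45 and 55 of every hour, that is, one run every ten minutes.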

Note: The schedule times of all CronJobs are interpreted in the time zone of the Kubernetes master node (more precisely, of the kube-controller-manager process).

In general, CronJobs are useful for creating periodic and recurring tasks, such as periodic backups and sending emails.

Of course, CronJobs also have some limitations and peculiarities in a Kubernetes cluster, so it is worth understanding them in detail before use.

Note: At the time of writing, the CronJob controller is still officially in beta, which means there are still some issues.

Limitations of CronJobs

A CronJob creates approximately one Job object for each execution time of its schedule. "Approximately" because in some circumstances two Jobs may be created, or none at all. The official implementation tries to make these cases rare, but cannot prevent them entirely. Therefore, all Jobs should be designed to be idempotent.
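A minimal sketch of what "idempotent" can mean for a sync task like ours: records are written by a stable key, so a duplicated run overwrites the same entries instead of adding new ones. The function sync_records, the in-memory store and the record fields are all hypothetical and only stand in for whatever the real task writes to.

```python
def sync_records(store, fetched):
    """Upsert fetched records into the store, keyed by a stable ID.

    Running this twice with the same input leaves the store unchanged,
    so a duplicated Job does no harm.
    """
    for record in fetched:
        store[record["id"]] = record  # overwrite by key, never append
    return store


if __name__ == "__main__":
    store = {}
    batch = [
        {"id": "dns-1", "value": "a.example.com"},  # made-up sample data
        {"id": "dns-2", "value": "b.example.com"},
    ]
    sync_records(store, batch)
    snapshot = dict(store)
    sync_records(store, batch)  # simulated duplicate Job run
    print(store == snapshot)
```

A task that appended rows or sent a notification on every run would not have this property and would need its own deduplication.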

If startingDeadlineSeconds is set to a large value or left unset (the default), and concurrencyPolicy is set to Allow, the Job will always run at least once.

For each CronJob, the CronJob controller checks how many schedules it has missed between the last scheduled time and now. If more than 100 schedules have been missed, the controller stops scheduling the Job and reports the following error:

Cannot determine if job needs to be started. Too many missed start time (> 100). Set or decrease .spec.startingDeadlineSeconds or check clock skew.

Note that if startingDeadlineSeconds is set (not empty), the controller no longer counts missed schedules from the last scheduled time until now, but only within the last startingDeadlineSeconds seconds. For example, with startingDeadlineSeconds: 200, the controller counts how many schedules were missed in the last 200 seconds.

A CronJob is counted as having missed a schedule if its Job could not be created at the scheduled time. For example, if concurrencyPolicy is set to Forbid and the CronJob attempts to schedule a new run while the previous one is still running, the new run is rejected and recorded as a missed schedule.

As another example, suppose a CronJob is set to run every minute starting at 08:30:00, and startingDeadlineSeconds is not set. If the CronJob controller is down from 08:29:00 to 10:21:00, the Job will not start, because the number of missed schedules exceeds 100.

To illustrate further, suppose a CronJob is set to run every minute starting at 08:30:00, with startingDeadlineSeconds: 200. If the CronJob controller is down for the same period, the Job will still start at 10:22:00. Because the controller now only counts the schedules missed in the last 200 seconds, the count stays well below 100. Scheduling therefore resumes once the controller recovers, and subsequent runs are not affected.
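The two scenarios above can be checked with a small simulation. This is a simplified model of the counting rule as described in this section, not the controller's actual code: count_missed and MAX_MISSED are names invented here, the schedule is assumed to have a fixed period, the date is arbitrary, and the check is assumed to happen at 10:22:00.

```python
from datetime import datetime, timedelta

MAX_MISSED = 100  # threshold quoted in the error message


def count_missed(last_schedule, now, period, starting_deadline_seconds=None):
    """Count schedule times that fall after the counting start and up to `now`."""
    earliest = last_schedule
    if starting_deadline_seconds is not None:
        # With a deadline set, only the most recent window is examined.
        window_start = now - timedelta(seconds=starting_deadline_seconds)
        earliest = max(earliest, window_start)
    missed = 0
    tick = last_schedule + period
    while tick <= now:
        if tick > earliest:
            missed += 1
        tick += period
    return missed


if __name__ == "__main__":
    down_from = datetime(2019, 12, 29, 8, 29, 0)   # controller goes down
    back_at = datetime(2019, 12, 29, 10, 22, 0)    # first check after recovery
    every_minute = timedelta(minutes=1)

    unset = count_missed(down_from, back_at, every_minute)
    with_deadline = count_missed(down_from, back_at, every_minute, 200)
    print("deadline unset:", unset, "blocked:", unset > MAX_MISSED)
    print("deadline 200s:", with_deadline, "blocked:", with_deadline > MAX_MISSED)
```

In this model the unset case counts every minute since 08:30:00 and crosses the threshold, while the 200-second window only ever sees a handful of schedule times.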

It should also be noted that a CronJob is only responsible for creating Jobs that match its schedule, while each Job manages the Pods that actually perform the task.

CronJob parameter details

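spec.startingDeadlineSeconds: the deadline, in seconds, for starting a Job that missed its scheduled time. It also defines the window over which missed schedules are counted against the limit of 100. By default (unset), missed schedules are counted from the last scheduled time, and once more than 100 have been missed the CronJob is no longer scheduled.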
spec.concurrencyPolicy: the concurrency policy. Possible values: Allow (concurrent runs are allowed), Forbid (concurrent runs are not allowed), Replace (a new run replaces the current one).
– Allow: when set to Allow, you need to weigh the task's execution time against the scheduling period. The next run starts even if the previous one has not finished, so unfinished runs may pile up and waste resources.
– Replace: the new run replaces the one that is still running, so it is possible that no run ever completes.
– Forbid: concurrent runs are not allowed. A new run starts only at a scheduling time when the previous run has already finished. However, if a task runs for too long, most scheduling times may be missed. If startingDeadlineSeconds is not set in that case, the task may never be scheduled again, and Kubernetes may report: Cannot determine if job needs to be started: too many missed start time (> 100). Set or decrease .spec.startingDeadlineSeconds or check clock skew
spec.schedule: the schedule in standard crontab format, with five fields: minute, hour, day of month, month, day of week.
spec.failedJobsHistoryLimit: the number of failed Jobs to retain. (Usually, keeping one or two is enough to inspect failure details and adjust the scheduling policy.)
spec.successfulJobsHistoryLimit: the number of successful Jobs to retain (decide for yourself how many to keep).
spec.jobTemplate: the template for the Jobs to be created, including the standard Pod template (container runtime parameters).
spec.suspend: optional. When set to true, all subsequent runs are suspended. This does not apply to Jobs that are already running. The default value is false.
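To compare the three concurrencyPolicy values described above, here is a toy simulation. It is only a sketch under strong assumptions, not the controller's logic: the function simulate is invented for this example, every run is assumed to take the same fixed time, and the numbers in the demo (a 60-second period with 150-second runs) are made up to represent a task that runs longer than its scheduling period.

```python
def simulate(policy, period, duration, ticks):
    """Return (started, completed, skipped, max_concurrent) for fixed-length runs.

    Times are plain seconds; one scheduling attempt happens every `period`.
    Runs still in progress after the last tick are not counted as completed.
    """
    running = []  # end times of the runs currently in progress
    started = completed = skipped = max_concurrent = 0
    for i in range(ticks):
        now = i * period
        # Retire the runs that have finished by this tick.
        completed += sum(1 for end in running if end <= now)
        running = [end for end in running if end > now]
        if running and policy == "Forbid":
            skipped += 1  # rejected, recorded as a missed schedule
            continue
        if running and policy == "Replace":
            running = []  # the unfinished run is replaced
        running.append(now + duration)
        started += 1
        max_concurrent = max(max_concurrent, len(running))
    return started, completed, skipped, max_concurrent


if __name__ == "__main__":
    for policy in ("Allow", "Forbid", "Replace"):
        print(policy, simulate(policy, period=60, duration=150, ticks=10))
```

With these numbers, Allow lets runs overlap, Forbid skips most scheduling times, and Replace never lets a run finish, which mirrors the three caveats listed above.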

CronJob sample

$ cat dnsall-cronjob.yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  labels:
    run: dnsall
  name: dnsall
  namespace: myapp
spec:
  # It is strongly recommended to set the concurrency policy based on the scheduling period and the task's characteristics.
  concurrencyPolicy: Forbid
  # It is strongly recommended to limit the number of failed Jobs retained, so that failures can be inspected and the task optimized.
  failedJobsHistoryLimit: 1
  successfulJobsHistoryLimit: 3
  schedule: '05,15,25,35,45,55 */1 * * *'
  suspend: false
  jobTemplate:
    metadata:
    spec:
      template:
        metadata:
          labels:
            run: dnsall
        spec:
          imagePullSecrets:
          - name: mydocker
          containers:
          - args:
            - -cmdbtype
            - dns
            image: harbor.bgbiao.top/cron-job:2019-12-04
            imagePullPolicy: Always
            name: dnsall
            resources: {}
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
          dnsPolicy: ClusterFirst
          # It is strongly recommended to set the restartPolicy for the Job.
          restartPolicy: OnFailure
          schedulerName: default-scheduler
          securityContext: {}
          terminationGracePeriodSeconds: 30

$ kubectl get cronjob -n myapp
NAME     SCHEDULE                      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
dnsall   05,15,25,35,45,55 */1 * * *   False     0        8m41s           23h

# The CronJob creates Jobs periodically, and each Job's Pods are managed by the Job controller. The CronJob above retains three successful Jobs.
$ kubectl get jobs -n myapp | grep dns
dnsall-1577597100   1/1   23s   22m
dnsall-1577597700   1/1   24s   12m
dnsall-1577598300   1/1   24s   2m22s

# The Pod of the most recent Job
$ kubectl get pods -n myapp | grep dnsall
dnsall-1577598300-hdl4z   0/1   Completed   0   3m29s
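In the Job listing above, the numeric suffix of each Job name looks like the Unix timestamp of its scheduled time. The sketch below rests on that assumption, and also shows which Jobs a history limit of 3 would keep. The helper names scheduled_time and prune_history are invented for this example, and the oldest Job in the demo is hypothetical, added only so there is something to prune.

```python
from datetime import datetime, timezone


def scheduled_time(job_name):
    """Read the numeric suffix of a Job name as a Unix timestamp (UTC)."""
    suffix = job_name.rsplit("-", 1)[1]
    return datetime.fromtimestamp(int(suffix), tz=timezone.utc)


def prune_history(job_names, limit):
    """Split Job names into (kept, deleted), keeping the `limit` newest."""
    ordered = sorted(job_names, key=scheduled_time)
    cut = max(len(ordered) - limit, 0)
    return ordered[cut:], ordered[:cut]


if __name__ == "__main__":
    jobs = [
        "dnsall-1577596500",  # hypothetical older Job, not in the listing
        "dnsall-1577597100",
        "dnsall-1577597700",
        "dnsall-1577598300",
    ]
    for name in jobs:
        print(name, scheduled_time(name).strftime("%Y-%m-%d %H:%M:%S UTC"))
    kept, deleted = prune_history(jobs, limit=3)
    print("kept:", kept)
    print("deleted:", deleted)
```

Under that assumption the three Jobs from the listing map to 05:25, 05:35 and 05:45 UTC on 2019-12-29, which matches minutes 25, 35 and 45 of the schedule.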

You are welcome to follow my official account, BGBiao, so we can make progress together ~
