Author: Clfeng


In the chapter “CI/CD Half a Bucket of Water (I)”, we created our own learning project, set up the CI/CD environment, and wrote our first pipeline. In this chapter, we will take a closer look at two core CI/CD concepts: pipelines and jobs. Working through these two related concepts will further improve our ability to write CI/CD pipelines.

Pipelines

Pipelines are the top-level component of CI/CD; once a pipeline is created, it automatically executes a series of tasks without human intervention. When studying pipelines, the author believes it is very important to understand the related architecture and the ways to optimize performance. Even without knowing this material, we can still get a project's pipeline to run. However, understanding it helps us write more sensible pipelines and improve the execution efficiency of the pipeline as a whole.

Pipeline trigger modes

  • Code push
  • Scheduled task
  • Manual trigger
  • API call
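The trigger mode of a running pipeline is exposed to jobs through the predefined `CI_PIPELINE_SOURCE` variable, so a job can react to each trigger differently. A minimal sketch (the job name and script are illustrative):

```yaml
report_trigger:
  script:
    - echo "Triggered by $CI_PIPELINE_SOURCE"
  rules:
    - if: '$CI_PIPELINE_SOURCE == "push"'      # code push
    - if: '$CI_PIPELINE_SOURCE == "schedule"'  # scheduled task
    - if: '$CI_PIPELINE_SOURCE == "web"'       # manual run from the UI
    - if: '$CI_PIPELINE_SOURCE == "api"'       # pipeline API call
```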

Pipeline architecture

  • Basic: Suitable for simple projects where all configuration is centralized in one easy-to-find place.
  • Directed Acyclic Graph: Suitable for large, complex projects that require efficient execution.
  • Child/Parent Pipelines: Suitable for monorepos and projects with a large number of independently defined components.

The Basic model

In the Basic model, you must wait for all tasks in the Build stage to complete before moving on to the Test stage:

stages:
  - build
  - test
  - deploy

build_a:
  stage: build
  script:
    - echo "build a"
  tags:
    - clf-cicd-runner

build_b:
  stage: build
  script:
    - echo "build b"
  tags:
    - clf-cicd-runner

test_a:
  stage: test
  script:
    - echo "test a"
  tags:
    - clf-cicd-runner

test_b:
  stage: test
  script:
    - echo "test b"
  tags:
    - clf-cicd-runner

deploy_a:
  stage: deploy
  script:
    - echo "deploy a"
  tags:
    - clf-cicd-runner

deploy_b:
  stage: deploy
  script:
    - echo "deploy b"
  tags:
    - clf-cicd-runner

DAG model

When dependencies are declared with the needs keyword, a job in a later stage can start early, as soon as the jobs it depends on have finished.

In the example below:

deploy_a depends on test_a, and test_a depends on build_a;

deploy_b depends on test_b, and test_b depends on build_b;

build_a and build_b belong to the build stage; test_a and test_b belong to the test stage; deploy_a and deploy_b belong to the deploy stage.

The entire pipeline executes as follows:

build_a and build_b start executing at the same time. Suppose build_a takes less time to complete; then when build_a has finished but build_b has not, test_a does not have to wait for build_b before it starts executing.

stages:
  - build
  - test
  - deploy

build_a:
  stage: build
  script:
    - echo "build a"
  tags:
    - clf-cicd-runner

build_b:
  stage: build
  script:
    - echo "build b"
  tags:
    - clf-cicd-runner

test_a:
  stage: test
  needs: [build_a]
  script:
    - echo "test a"
  tags:
    - clf-cicd-runner

test_b:
  stage: test
  needs: [build_b]
  script:
    - echo "test b"
  tags:
    - clf-cicd-runner

deploy_a:
  stage: deploy
  needs: [test_a]
  script:
    - echo "deploy a"
  tags:
    - clf-cicd-runner

deploy_b:
  stage: deploy
  needs: [test_b]
  script:
    - echo "deploy b"
  tags:
    - clf-cicd-runner

The Child/Parent model

In the parent pipeline, the trigger keyword points at a child .gitlab-ci.yml file, and rules:changes restricts each child pipeline to run only when files in its own directory change:

stages:
  - triggers

trigger_a:
  stage: triggers
  trigger:
    include: a/.gitlab-ci.yml
  rules:
    - changes:
        - a/*

trigger_b:
  stage: triggers
  trigger:
    include: b/.gitlab-ci.yml
  rules:
    - changes:
        - b/*
# a/.gitlab-ci.yml
stages:
  - build
  - deploy

build:
  stage: build
  script:
    - echo "This job is a build."
  tags:
    - clf-cicd-runner
deploy:
  stage: deploy
  script:
    - echo "This job is a deploy."
  tags:
    - clf-cicd-runner
# b/.gitlab-ci.yml
stages:
  - build
  - deploy

build:
  stage: build
  script:
    - echo "This job is b build."
  tags:
    - clf-cicd-runner
deploy:
  stage: deploy
  script:
    - echo "This job is b deploy."
  tags:
    - clf-cicd-runner
Multiple projects

Multi-project pipelines are organized much like parent/child pipelines, except that the child pipelines specified this way live in other projects.

# upstream yml
# When the test stage succeeds, the staging job runs and triggers the pipeline of the downstream project my/deployment
rspec:
  stage: test
  script: bundle exec rspec

staging:
  variables:
    ENVIRONMENT: staging
  stage: deploy
  trigger: my/deployment
# upstream yml
# The upstream project can mirror the execution status of the downstream project's pipeline with the trigger:strategy keyword
trigger_job:
  trigger:
    project: my/project
    strategy: depend
# downstream yml
# The downstream project can mirror the pipeline status of an upstream project via the needs:pipeline keyword
upstream_bridge:
  stage: test
  needs:
    pipeline: other/project

Pipeline performance optimization

  • Put failure-prone tasks first
  • Avoid unnecessary tasks
  • Optimize the Docker image to make the image smaller
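For "avoid unnecessary tasks", a common tool is `rules:changes`, which adds a job only when relevant files changed. A sketch (the job name and path pattern are illustrative):

```yaml
lint:
  script:
    - npm run lint
  rules:
    # add the job only when source files were modified
    - changes:
        - "src/**/*"
```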

Docker image optimization:

  • Use small base images, for example `debian-slim`.
  • Don't install convenience tools such as vim or curl unless you strictly need them.
  • Create a dedicated image for your specific needs.
  • Disable man pages and documentation installed by software packages to save space.
  • Reduce the number of `RUN` layers by combining software installation steps.
  • Use multi-stage builds to merge multiple Dockerfiles written in the builder pattern into a single Dockerfile and reduce image size.
  • If you use `apt`, add `--no-install-recommends` to avoid installing unnecessary packages.
  • Clean up caches and files that are no longer needed, for example with `rm -rf /var/lib/apt/lists/*` on Debian and Ubuntu, or `yum clean all` on RHEL and CentOS.
  • Use tools such as dive or DockerSlim to analyze and slim down images.
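Several of these tips can be combined in a single Dockerfile. A minimal sketch (the packages and the hello.c program are illustrative), using a multi-stage build so the compiler never reaches the final image:

```dockerfile
# Build stage: compilers and headers live only here
FROM debian:bookworm-slim AS builder
# --no-install-recommends avoids pulling in suggested packages;
# cleaning the apt lists in the same RUN keeps the layer small
RUN apt-get update \
    && apt-get install -y --no-install-recommends gcc libc6-dev \
    && rm -rf /var/lib/apt/lists/*
COPY hello.c /src/hello.c
RUN gcc -o /src/hello /src/hello.c

# Final stage: only the compiled binary is copied over
FROM debian:bookworm-slim
COPY --from=builder /src/hello /usr/local/bin/hello
CMD ["hello"]
```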

Pipelines for merge requests

Define jobs that run in merge request pipelines:

build:
  stage: build
  script: ./build
  only:
    - main

test:
  stage: test
  script: ./test
  only:
    - merge_requests

deploy:
  stage: deploy
  script: ./deploy
  only:
    - main

Jobs

Jobs specify the concrete work performed in each pipeline stage. In practice, besides specifying what a job does, we also need to control when a job runs, depending on the situation. The job keywords rules, only, and except give us this ability.

rules

# Basic example
job:
  script: echo "Hello, Rules!"
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
      when: manual
      allow_failure: true
    - if: '$CI_PIPELINE_SOURCE == "schedule"'

Related sub-keywords of rules

  • if
  • when
  • allow_failure
  • changes

If no other keywords are defined after an if rule, the following default values are used:

when: on_success
allow_failure: false
# Exclude the job in a few specific cases, but add it in all others
job:
  script: echo "Hello, Rules!"
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
      when: never
    - if: '$CI_PIPELINE_SOURCE == "schedule"'
      when: never
    - when: on_success

How rules decides whether a job is added: the job is added when the if, changes, and exists conditions all evaluate to true.
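Of these, `exists` matches on files committed to the repository. A sketch (assuming we only want the job in repositories that actually contain a Dockerfile):

```yaml
docker_build:
  script:
    - docker build -t my-image .
  rules:
    # add the job only if a Dockerfile exists at the repository root
    - exists:
        - Dockerfile
```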

Expression syntax for the if keyword

Equality: ==

Inequality: !=

Ordinary strings must be wrapped in double quotes ("").

Determine whether a variable is defined:

if: $VARIABLE == null
if: $VARIABLE != null

Determines whether a variable is defined but empty

if: $VARIABLE == ""
if: $VARIABLE != ""

Determine whether a variable exists

# true only when the variable is defined and non-empty
if: $VARIABLE

Comparisons are made using regular expressions

if: $VARIABLE =~ /^content.*/
# true when $VARIABLE_1 does not match the regular expression
if: $VARIABLE_1 !~ /^content.*/

Multiple comparisons can be combined with the && and || operators:

$VARIABLE1 =~ /^content.*/ && $VARIABLE2 == "something"
$VARIABLE1 =~ /^content.*/ && $VARIABLE2 =~ /thing$/ && $VARIABLE3
$VARIABLE1 =~ /^content.*/ || $VARIABLE2 =~ /thing$/ && $VARIABLE3

Expressions can be grouped by parentheses

($VARIABLE1 =~ /^content.*/ || $VARIABLE2) && ($VARIABLE3 =~ /thing$/ || $VARIABLE4)
($VARIABLE1 =~ /^content.*/ || $VARIABLE2 =~ /thing$/) && $VARIABLE3
$CI_COMMIT_BRANCH == "my-branch" || (($VARIABLE1 == "thing" || $VARIABLE2 == "thing") && $VARIABLE3)

only

test:
  script: npm run test
  only:
    refs:
      - main
      - schedules
    variables:
      - $CI_COMMIT_MESSAGE =~ /run-end-to-end-tests/
    kubernetes: active

How only decides whether to add the job to the pipeline:

(any listed refs are true) AND (any listed variables are true) AND (any listed changes are true) AND (any chosen Kubernetes status matches)

except

test:
  script: npm run test
  except:
    refs:
      - main
    changes:
      - "README.md"

How except decides whether to exclude the job:

(any listed refs are true) OR (any listed variables are true) OR (any listed changes are true) OR (a chosen Kubernetes status matches)

Script syntax

  • before_script
  • script
  • after_script

before_script and script run in the same shell; after_script runs in a new shell.

job:
  before_script:
    - echo "run before script"
  script:
    - echo "run script 1"
    - echo "run script 2"
  after_script:
    - echo "run after script"
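A consequence of the shared shell is that variables exported in before_script are visible in script, but not in after_script. A sketch (the variable name is illustrative):

```yaml
job:
  before_script:
    - export GREETING="hello"
  script:
    - echo "$GREETING"  # same shell as before_script, so this prints hello
  after_script:
    - echo "$GREETING"  # new shell: the variable is no longer set
```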

Surprising behavior

Problems with multi-line commands

stages:
  - test

.job_template: &job_configuration
  stage: test
  image: centos:7
  tags:
    - clf-cicd-runner

job9:
  <<: *job_configuration
  script:
    - false && true
    - echo $? # the output is 0

How to solve it

stages:
  - test

.job_template: &job_configuration
  stage: test
  image: centos:7
  tags:
    - clf-cicd-runner

job:
  <<: *job_configuration
  script:
    - |
      false && true
      echo $? # the output is 1

Split long commands

stages:
  - test

.job_template: &job_configuration
  stage: test
  image: centos:7
  tags:
    - clf-cicd-runner

job_1:
  <<: *job_configuration
  script:
    - echo "run script 1" # outputs a log line
    - echo "run script 2" # outputs a log line

job_one_line_1:
  <<: *job_configuration
  script:
    - |
      echo "First command line."
      echo "Second command line."
      echo "Third command line."

job_one_line_2:
  <<: *job_configuration
  script:
    - >
      echo "First command line."

      echo "Second command line."

      echo "Third command line."

job_one_line_3:
  <<: *job_configuration
  script: echo "First command line."

    echo "Second command line."

    echo "Third command line."

In the jobs below, the first command is split over two lines in the YAML but is executed as a single command:
job_multi_line_1:
  <<: *job_configuration
  script:
    - |
      echo "First command line
      is split over two lines."
      echo "Second command line."

job_multi_line_2:
  <<: *job_configuration
  script:
    - >
      echo "First command line
      is split over two lines."

      echo "Second command line."

job_multi_line_3:
  <<: *job_configuration
  script:
    - echo "First command line
      is split over two lines."

      echo "Second command line."

job_multi_line_4:
  <<: *job_configuration
  script: echo "First command line
    is split over two lines."

    echo "Second command line."

Use color codes

Color code reference: Misc.flogisoft.com/bash/tip_co…

job:
  script:
    - echo -e "\e[31mThis text is red,\e[0m but this text isn't\e[31m however this text is red again."
You can also define variables for the codes and reuse them:
job:
  before_script:
    - TXT_RED="\e[31m" && TXT_CLEAR="\e[0m"
  script:
    - echo -e "${TXT_RED}This text is red,${TXT_CLEAR} but this part isn't${TXT_RED} however this part is again."
    - echo "This text is not colored"

Variables

Sources of variables

  • Predefined variables
  • Custom variables
    • Project
      • Via .gitlab-ci.yml
      • Via the project settings
      • Via the API
    • Group
    • GitLab instance

Note: variables added to a project can be accessed only by that project, variables added to a group can be accessed by the projects in that group, and variables added to the GitLab instance can be accessed by every project on that GitLab instance.

Create custom variables

variables:
  TEST_VAR: "All jobs can use this variable's value"

job1:
  variables:
    TEST_VAR_JOB: "Only job1 can use this variable's value"
  script:
    - echo "$TEST_VAR" and "$TEST_VAR_JOB"

Types of variables

  • Variable: a traditional key-value pair
  • File: the key is the variable name, and the value is the path to a file that stores the actual content

Note: variables defined in .gitlab-ci.yml can only be of the variable type, while variables defined on projects, groups, and GitLab instances can be of either the variable or the file type.
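With a file-type variable, the job receives a file path rather than the raw value, which suits tools that expect a configuration file. A sketch (assuming a file-type variable named KUBE_CONFIG was added in the project settings; the kubectl call is illustrative):

```yaml
deploy:
  script:
    # $KUBE_CONFIG expands to the path of a temporary file
    # holding the variable's content, not the content itself
    - kubectl --kubeconfig "$KUBE_CONFIG" get pods
```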

Set variable properties

Mask a variable

The value of a masked variable is hidden in job logs.

Protect a variable

A protected variable is only passed to pipelines running on protected branches or protected tags.

Pass variables between jobs

build:
  stage: build
  script:
    - echo "BUILD_VERSION=hello" >> build.env
  artifacts:
    reports:
      dotenv: build.env

deploy:
  stage: deploy
  script:
    - echo "$BUILD_VERSION" # Output is: 'hello'
  dependencies:
    - build
The same value can also be fetched explicitly with needs:
build:
  stage: build
  script:
    - echo "BUILD_VERSION=hello" >> build.env
  artifacts:
    reports:
      dotenv: build.env

deploy:
  stage: deploy
  script:
    - echo "$BUILD_VERSION" # Output is: 'hello'
  needs:
    - job: build
      artifacts: true

Variable priority

  1. Trigger variables, scheduled pipeline variables, and manual pipeline run variables
  2. Project variables
  3. Group variables
  4. Instance variables
  5. Inherited variables
  6. Variables defined in jobs in the .gitlab-ci.yml file
  7. Variables defined outside of jobs (globally) in the .gitlab-ci.yml file
  8. Deployment variables
  9. Predefined variables

variables:
  API_TOKEN: "default"

job1:
  variables:
    API_TOKEN: "secure"
  script:
    - echo "The variable value is $API_TOKEN"

Cache and artifacts

During job execution, a project's dependency packages need to be downloaded (downloading dependencies from the network on every pipeline or job run is time-consuming), and build result files need to be passed on to later jobs. How do we solve these problems? The cache and artifacts keywords solve them nicely.

The difference between them

cache is used to cache dependency packages; the cached content is stored on the gitlab-runner host.

artifacts are used to pass intermediate results between different stages; the content is saved on GitLab and is available for download.

Cache

  • Define cache per job by using the cache: keyword. Otherwise it is disabled.
  • Subsequent pipelines can use the cache.
  • Subsequent jobs in the same pipeline can use the cache, if the dependencies are identical.
  • Different projects cannot share the cache.

Things to note

  1. Caching is an optimization, but it is not always guaranteed to work. You may need to regenerate the cache files for each job that requires them.

    Whether the cache is available depends on:

    • The runner's executor type
    • Whether different executors are used to pass the cache between jobs
  2. Cache storage

    • All cached paths defined in a job are archived into a single cache.zip file, which is stored on the machine where the GitLab Runner is installed.
    • Caches are stored as key-value pairs: the key is the cache:key defined in the job, and the value is the cache.zip archive.
    • Because of this storage model, the cache may be unavailable in certain cases:
      • Different jobs that use the same key upload to the same cache and overwrite each other.
      • The cache is unzipped into the working directory, and the runner does not care whether files extracted by different jobs overwrite each other.

To sum up, the runner does not validate the cache, so the cache cannot be fully trusted.

# Basic example
stages:
  - build
  - test

before_script:
  - echo "Hello"

job A:
  stage: build
  script:
    - mkdir vendor/
    - echo "build" > vendor/hello.txt
  cache:
    key: build-cache
    paths:
      - vendor/
  after_script:
    - echo "World"

job B:
  stage: test
  script:
    - cat vendor/hello.txt
  cache:
    key: build-cache
    paths:
      - vendor/

Cache related keyword

  • Paths: Specifies cached files or directories
  • Key: unique identifier of the cache
    • Files: When a file in the specified file changes, a new key is generated
    • Prefix: Adds the prefix tocache:key:filesBefore calculating the hash
  • Untracked: Uses untracked: true to cache all untracked files in the Git repository
  • When: Define the cache under what conditions. Optional values: on_success, on_failure, always
  • Policy: defines the cache upload and download policies. The options are pull, push, and pull-push
prepare-dependencies-job:
  stage: build
  cache:
    key: gems
    paths:
      - vendor/bundle
    policy: push
  script:
    - echo "This job only downloads dependencies and builds the cache."
    - echo "Downloading dependencies..."

faster-test-job:
  stage: test
  cache:
    key: gems
    paths:
      - vendor/bundle
    policy: pull
  script:
    - echo "This job script uses the cache, but does not update it."
    - echo "Running tests..."

cache:key

Use the cache:key keyword to provide a unique identifying key for each cache. All jobs using the same cache key use the same cache, including in different pipelines.

Note: each key corresponds to one cache archive, which can be reused by different jobs in the same pipeline, or in different pipelines.
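A key does not have to be a fixed string. With `cache:key:files`, the key is derived from a hash of the listed files, so the cache is rebuilt only when those files change. A sketch (assuming an npm project with a package-lock.json):

```yaml
install:
  script:
    - npm ci
  cache:
    key:
      files:
        - package-lock.json  # a new key is generated when the lockfile changes
      prefix: npm            # readable prefix in front of the hash
    paths:
      - node_modules/
```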

Artifacts

  • Define artifacts per job.
  • Subsequent jobs in later stages of the same pipeline can use artifacts.
  • Different projects cannot share artifacts.

Artifacts expire after 30 days unless you define an expiration time. Use dependencies to control which jobs fetch the artifacts.

Artifacts-related keywords

  • name: defines the name of the artifact archive
  • paths: the files or directories to collect as artifacts
  • dependencies: controls which earlier jobs' artifacts the current job downloads
  • exclude: excludes files from the artifacts
  • expire_in: specifies how long job artifacts are stored before they expire and are deleted
  • expose_as: exposes job artifacts in the merge request UI
  • public: sets whether the artifacts are publicly available, i.e. whether anonymous and guest users can download them from public pipelines
  • untracked: adds all Git untracked files as artifacts (in addition to the paths defined in artifacts:paths)
  • when: defines the conditions for uploading artifacts
  • reports
    • api_fuzzing
    • cobertura
    • … (other report types)

Use artifacts to specify a list of files and directories that are attached to the job when it succeeds, fails, or always.

The artifacts are sent to GitLab after the job finishes. They are available for download in the GitLab UI if the size is not larger than the maximum artifact size.

By default, jobs in later stages automatically download all the artifacts created by jobs in earlier stages. You can control artifact download behavior in jobs with dependencies.

When using the needs keyword, jobs can only download artifacts from the jobs defined in the needs configuration.

Job artifacts are only collected for successful jobs by default, and artifacts are restored after caches.

Key summary:

  • Artifacts are stored on GitLab
  • Artifacts created by earlier jobs are automatically downloaded by later jobs and can be used directly
  • The download behavior of artifacts can be controlled with the dependencies keyword
  • If a job uses the needs keyword, only the artifacts of the jobs listed under needs are downloaded

artifacts:dependencies

By default, a job downloads all job artifacts produced by jobs in earlier stages. You can restrict which artifacts are downloaded by using dependencies.

build:osx:
  stage: build
  script: make build:osx
  artifacts:
    paths:
      - binaries/

build:linux:
  stage: build
  script: make build:linux
  artifacts:
    paths:
      - binaries/

test:osx:
  stage: test
  script: make test:osx
  dependencies: # only download the artifacts generated by build:osx
    - build:osx

test:linux:
  stage: test
  script: make test:linux
  dependencies: # only download the artifacts generated by build:linux
    - build:linux

# the deploy job specifies no dependencies, so by default it downloads all artifacts generated by earlier jobs
deploy:
  stage: deploy
  script: make deploy
artifacts:exclude

Exclude prevents files from being added to artifacts

artifacts:
  paths:
    - binaries/
  exclude:
    - binaries/**/*.o
artifacts:expire_in

Use expire_in to specify how long job artifacts should be stored before expiration and deletion

job:
  artifacts:
    expire_in: 1 week

    # Other examples of values that can be specified:
    # expire_in: '42'
    # expire_in: 42 seconds
    # expire_in: 3 mins 4 sec
    # expire_in: 2 hrs 20 min
    # expire_in: 2h20min
    # expire_in: 6 mos 1 day
    # expire_in: 47 yrs 6 mos and 4d
    # expire_in: 3 weeks and 2 days
    # expire_in: never
artifacts:expose_as

test:
  script: ["echo 'test' > file.txt"]
  artifacts:
    expose_as: "artifact 2"
    paths: ["file.txt"]

(The merge request UI then shows a download link named "artifact 2"; the original screenshot is omitted.)

artifacts:paths

Defines the files or directories to collect as artifacts. Paths are relative to the project directory and cannot directly link outside it.

artifacts:untracked

Use artifacts:untracked to add all Git untracked files as artifacts (in addition to the paths defined in artifacts:paths). artifacts:untracked ignores the repository's .gitignore configuration.

artifacts:
  untracked: true
  paths:
    - binaries/
artifacts:when

Defines the conditions for uploading artifacts.

Possible values:

  • on_success (default): upload the artifacts only when the job succeeds
  • on_failure: upload the artifacts only when the job fails
  • always: always upload the artifacts
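Collecting logs only from failed runs is a typical use of `artifacts:when`. A sketch (the test script and log directory are illustrative):

```yaml
test:
  script:
    - ./run-tests.sh
  artifacts:
    when: on_failure   # upload the logs only if the job fails
    paths:
      - test-logs/
    expire_in: 1 week  # and do not keep them forever
```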

Conclusion

In this chapter we learned about pipeline architecture, some techniques for pipeline performance optimization, and how to control when jobs execute. We also learned to use the cache and artifacts keywords to improve job execution efficiency and to preserve build result files generated during a pipeline run. To be honest, the author believes that with this much we can already handle most everyday scenarios, let alone understand a company's existing pipeline configuration. Still, I look forward to taking you deeper into what else CI/CD has to offer: in the next chapter we will take a closer look at the various CI/CD keywords.

Reference links

Docs.gitlab.com/ee/ci/index…