Readers who have followed the previous three installments of this GitLab analysis series (let's pretend there are some):

  • GitLab Series 1: Introduction to basic features and architecture
  • GitLab Series 2: GitLab Workhorse
  • GitLab Series 3: Unicorn

As those readers know, I always open with the GitLab architecture diagram. So far we have covered the first “half” of how GitLab handles HTTP/HTTPS requests: a user request enters through the HTTP/HTTPS entry point and reaches gitlab-workhorse via Nginx. Workhorse handles the requests it can process itself and hands the rest over to Unicorn. Unicorn can be thought of as a shell that ensures requests are scheduled and processed efficiently; the kernel that actually handles most requests is GitLab Rails (gitlab-ce), which takes care of the “other half” of request processing.



I won't cover GitLab Rails in this installment. The project is large and complex; at heart it is a web application backed by a persistent database (PostgreSQL) and a distributed cache (Redis). The goal of this series is to cover the role of each GitLab component as broadly as possible, rather than to walk through the code structure of each open-source project. As mentioned earlier, GitLab Rails is effectively GitLab's real kernel, where most of the logic for handling user requests is implemented. In future installments I will trace and analyze the gitlab-ce source in the context of concrete business scenarios.

So what is this installment about? The previous installments all dealt with routing HTTP/HTTPS requests, but what about SSH? According to the architecture diagram, the first stop for SSH traffic is GitLab Shell. This installment lays the groundwork for understanding how gitlab-shell works.



When we think of shells, we might think of command-line shells such as Bash or Zsh. To some extent, GitLab Shell can also be thought of as a collection of predefined commands, but it is more than that.

Is GitLab Shell a shell?

Remember the first installment's explanation of how SSH logs you in to a server with a key pair? To review: the mechanism is asymmetric (public-key) cryptography, not symmetric encryption. The remote server's $HOME/.ssh/authorized_keys file stores the public keys that users upload, each one usually appended to the end of the file. When a user logs in, the server challenges the client to prove it holds the matching private key: the client signs the challenge data with its local SSH private key and sends the signature back, and the server verifies it with the stored public key. (This is often loosely described as "encrypting with the private key and decrypting with the public key".) If verification succeeds, the user is trusted and logged in without being asked for a password.
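The login check described above can be modeled in a few lines with Ruby's standard openssl library. This is a simplified sketch, not SSH itself: real SSH (RFC 4252) signs a session identifier together with request fields rather than a bare random challenge, and the key pair would come from ssh-keygen instead of being generated in-process.

```ruby
require "openssl"
require "securerandom"

# Simplified model of SSH public-key login. Assumption: a bare random
# challenge stands in for the session data real SSH signs.

# Client side: a key pair (normally created once with ssh-keygen).
client_key = OpenSSL::PKey::RSA.new(2048)
# The part the user uploads; this is what lands in authorized_keys.
server_copy_of_public_key = client_key.public_key

# Server side: issue a fresh random challenge.
challenge = SecureRandom.random_bytes(32)

# Client side: sign the challenge with the private key.
signature = client_key.sign(OpenSSL::Digest.new("SHA256"), challenge)

# Server side: verify the signature using only the stored public key.
accepted = server_copy_of_public_key.verify(OpenSSL::Digest.new("SHA256"), signature, challenge)
puts accepted ? "login accepted" : "login rejected"

# The same signature does not verify against different data.
forged = server_copy_of_public_key.verify(OpenSSL::Digest.new("SHA256"), signature,
                                          SecureRandom.random_bytes(32))
puts forged ? "forgery accepted" : "forgery rejected"
```

The server never needs the private key, which is why uploading only the public key to GitLab is safe.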





When we run Git over SSH from a local machine, can we log in to a Bash terminal on the server and do whatever we like there? Obviously not: the server cannot let us log in to a terminal and execute anything at all, unless its administrator wants users to try rm -rf /* and then "delete the database and run away".



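So GitLab relies on an SSH feature: an entry in authorized_keys can specify, with the command= option, the command sshd runs after login instead of a normal shell. When a user uploads a public key, the server appends it to the end of the $HOME/.ssh/authorized_keys file with such an option in front of it.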



As a server administrator, you certainly do not want users logging in to your server's terminal and messing around. The idea, then, is to allow users to run only the command the administrator specifies, which provides security control, as in the following entry:



$HOME/.ssh/authorized_keys

# General form: command="./cmd" ssh-rsa <my-rsa-key>
command="/home/git/gitlab-shell/bin/gitlab-shell key-10",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty ssh-rsa <my-rsa-key>
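To make the structure of that entry concrete, here is a small Ruby helper that assembles such a line from a key ID and a public key. The helper name authorized_keys_line and its parameters are hypothetical, for illustration only; the option list is copied from the entry above, and this is not GitLab's own code for writing authorized_keys.

```ruby
# Options copied from the example entry: disable forwarding and PTYs so
# the key can do nothing except run the forced command.
FORCED_COMMAND_OPTIONS = %w[
  no-port-forwarding
  no-X11-forwarding
  no-agent-forwarding
  no-pty
].freeze

# Hypothetical helper: builds one authorized_keys line for a key.
def authorized_keys_line(key_id, public_key, shell_path: "/home/git/gitlab-shell/bin/gitlab-shell")
  # command="..." makes sshd run gitlab-shell with the key identifier as
  # its argument, whatever command the client asked for.
  forced = %(command="#{shell_path} key-#{key_id}")
  [forced, *FORCED_COMMAND_OPTIONS].join(",") + " " + public_key
end

puts authorized_keys_line(10, "ssh-rsa <my-rsa-key>")
```

Each uploaded key gets its own line and its own key-N, which is how gitlab-shell can later tell whose key opened the session.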

With this entry, the user cannot use SSH to get a terminal on the server. After authenticating, sshd runs only this command, and the session ends when the command exits:

/home/git/gitlab-shell/bin/gitlab-shell key-10

This is why the name GitLab Shell is so apt: on the server you may run only GitLab Shell, and other shell commands, such as rm -rf /*, are off the table. Here key-10, the argument (ARGV) passed to /home/git/gitlab-shell/bin/gitlab-shell, identifies the user's key.

[Figure: /home/git/gitlab-shell/bin/gitlab-shell]



Note that the program can also read the command the client originally requested from the $SSH_ORIGINAL_COMMAND environment variable. Here's an experiment: in /home/git/gitlab-shell/bin/gitlab-shell, write out the values of who and original_cmd:

# Write the variable values to files for inspection
File.write("/tmp/git_original_cmd", original_cmd)
File.write("/tmp/git_who", who)

Running git push, git fetch, git pull, and git clone locally, one at a time, produced these values:

# action    => original_cmd                                who
git push  => git-receive-pack 'BradeHyj/ToyProject.git'  key-11
git fetch => git-upload-pack 'BradeHyj/ToyProject.git'   key-11
git pull  => git-upload-pack 'BradeHyj/ToyProject.git'   key-11
git clone => git-upload-pack 'BradeHyj/ToyProject.git'   key-11
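A forced-command program receives these two values through different channels: who arrives as a command-line argument baked into authorized_keys, while original_cmd comes from the SSH_ORIGINAL_COMMAND environment variable that sshd sets. The sketch below, built around a hypothetical read_invocation helper, shows that split. It takes argv and env as parameters so it can be exercised with a plain hash instead of the real ENV; deleting the variable right after reading it matches the practice noted for original_cmd in this section.

```ruby
# Hypothetical helper: extract who (key-N) and original_cmd.
# Works with ARGV/ENV or with any array and hash, for testing.
def read_invocation(argv, env)
  who = argv.join[/key-\d+/]                        # e.g. "key-11", or nil
  original_cmd = env.delete("SSH_ORIGINAL_COMMAND") # read once, then remove
  [who, original_cmd]
end

# Usage with a stand-in environment instead of the real ENV:
fake_env = { "SSH_ORIGINAL_COMMAND" => "git-receive-pack 'BradeHyj/ToyProject.git'" }
who, original_cmd = read_invocation(["key-11"], fake_env)
p who          # => "key-11"
p original_cmd # => "git-receive-pack 'BradeHyj/ToyProject.git'"
p fake_env     # => {}
```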

git-receive-pack – Receive what is pushed into the repository

git-upload-pack – Send objects packed back to git-fetch-pack

What does this tell us? A local git push actually logs in to the server over SSH and runs a command there, and the command that follows this login is one gitlab-shell recognizes:

# git push (roughly what Git runs over SSH)
ssh -p <port> user@host "git-receive-pack 'BradeHyj/ToyProject.git'"

To sum up the two values:

  • who: the argument passed to GitLab Shell when sshd invokes it (the key-N from the command= option)
  • original_cmd: the $SSH_ORIGINAL_COMMAND variable mentioned earlier, which is removed from the environment immediately after it is read

Pro Git 2: the smart transfer protocol over SSH: git-scm.com/book/zh/v1/…
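Once original_cmd is in hand, the program must decide whether it is one of the permitted Git commands before doing anything else. Below is a hypothetical whitelist check in that spirit. It splits the string with Ruby's standard Shellwords (so the quoted repository path is handled the way a shell would) and accepts only git-upload-pack and git-receive-pack, the two commands seen in the experiment. The constant and method names are illustrative, and gitlab-shell's real list of allowed commands may be longer.

```ruby
require "shellwords"

# Assumed whitelist: only the two Git service commands from the experiment.
ALLOWED_GIT_COMMANDS = %w[git-upload-pack git-receive-pack].freeze

class DisallowedCommandError < StandardError; end

# Hypothetical parser: returns [command, repo] or raises.
def parse_original_cmd(original_cmd)
  if original_cmd.nil? || original_cmd.strip.empty?
    raise DisallowedCommandError, "no command given"
  end

  args = Shellwords.split(original_cmd) # strips the quotes around the repo path
  command, repo = args
  unless ALLOWED_GIT_COMMANDS.include?(command) && args.length == 2
    raise DisallowedCommandError, "disallowed command: #{original_cmd}"
  end

  [command, repo]
end

p parse_original_cmd("git-receive-pack 'BradeHyj/ToyProject.git'")
# => ["git-receive-pack", "BradeHyj/ToyProject.git"]

begin
  parse_original_cmd("rm -rf /*")
rescue DisallowedCommandError => e
  puts e.message # => disallowed command: rm -rf /*
end
```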

Before gitlab-shell actually executes any Git command, it verifies access, as in the following code:

# lib/gitlab_shell.rb (excerpt)
def verify_access
  status = api.check_access(@git_access, nil, @repo_name, @who || @gl_id, '_any', GL_PROTOCOL)

  raise AccessDeniedError, status.message unless status.allowed?

  status
end
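The excerpt depends on an api object that talks to GitLab Rails. To see the control flow in isolation, here is a self-contained sketch with a hypothetical FakeApi stub standing in for that client. The stub's rule (only key-11 may access BradeHyj/ToyProject.git) is invented for the example, and the protocol is passed as a parameter instead of the GL_PROTOCOL constant.

```ruby
class AccessDeniedError < StandardError; end

# Minimal status object with the allowed?/message interface the excerpt uses.
AccessStatus = Struct.new(:allowed, :message) do
  def allowed?
    allowed
  end
end

# Hypothetical stub for the GitLab Rails access API.
class FakeApi
  def check_access(action, _changes, repo_name, actor, _ref, _protocol)
    if actor == "key-11" && repo_name == "BradeHyj/ToyProject.git"
      AccessStatus.new(true, "ok")
    else
      AccessStatus.new(false, "#{actor} may not run #{action} on #{repo_name}")
    end
  end
end

# Same shape as the excerpt: ask first, raise unless allowed.
def verify_access(api, git_access, repo_name, who, protocol = "ssh")
  status = api.check_access(git_access, nil, repo_name, who, "_any", protocol)
  raise AccessDeniedError, status.message unless status.allowed?

  status
end

api = FakeApi.new
p verify_access(api, "git-receive-pack", "BradeHyj/ToyProject.git", "key-11").allowed? # => true
begin
  verify_access(api, "git-receive-pack", "BradeHyj/ToyProject.git", "key-99")
rescue AccessDeniedError => e
  puts "denied: #{e.message}"
end
```

Raising instead of returning false means no code path can accidentally fall through to running the Git command after a refusal.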

Git hooks

GitLab Shell's processing logic relies on Git hook scripts. The hooks directory of every repository stored on the GitLab server is linked to /home/git/gitlab-shell/hooks, so it is important to understand how gitlab-shell's hook scripts execute.



Pro Git 2: server hooks: git-scm.com/book/zh/v2/…

Hooks are stored in the hooks subdirectory of the Git directory, which in most projects is .git/hooks. When you initialize a new repository with git init, Git places sample scripts in this directory. Besides being usable themselves, these scripts document the parameters each hook receives when triggered. All of the samples are shell scripts, some with a bit of Perl mixed in, but any properly named executable script works: you can write hooks in Ruby, Python, or another language. The sample names end in .sample; remove that suffix to enable one.

To enable a hook script, place an executable file with the appropriate name (and no extension) in the hooks subdirectory of your Git directory; Git will then call it.

As a system administrator, you can use several server-side hooks to enforce all kinds of policies on a project. These scripts run before and after a push to the server. A hook that runs before the push can exit with a non-zero status at any point to reject the push and send an error message back to the client, so you can set up a push policy as complex as you want.

  • pre-receive: When handling a push from a client, the first script invoked is pre-receive. It reads the list of pushed references from standard input. If it exits with a non-zero status, none of the pushed references are accepted. You can use this hook to block non-fast-forward updates, or to control access to every reference and file the push modifies.
  • update: The update script is similar to pre-receive, except that it runs once for each branch being updated. If a user pushes to several branches at once, pre-receive runs only once, whereas update runs once per pushed branch. Instead of reading standard input, it takes three arguments: the name of the reference (branch), the SHA-1 the reference pointed to before the push, and the SHA-1 the user is about to push. If update exits with a non-zero status, only that reference is rejected; the others are still updated.
  • post-receive: The post-receive hook runs after the entire process completes and can be used to update other services or notify users. It receives the same standard input as pre-receive. Uses include emailing a mailing list, notifying a continuous integration server, or updating a ticket-tracking system; you can even parse commit messages to decide whether a ticket should be opened, modified, or closed. This script cannot stop the push, but the client stays connected until it finishes, so be careful with anything that might take a long time.
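Since any executable works as a hook, a pre-receive hook can be written in Ruby. The sketch below implements one hypothetical policy, refusing branch deletions, to show the input format: Git writes one "<old-sha> <new-sha> <ref-name>" line per pushed reference to standard input, and a deleted reference arrives with an all-zero new object name. The method names are illustrative; in a real hooks/pre-receive file you would add a #!/usr/bin/env ruby line, make the file executable, and end with exit run_pre_receive.

```ruby
# Hypothetical policy: refuse any push that deletes a branch.
def rejected_refs(input)
  input.each_line.filter_map do |line|
    _old_sha, new_sha, ref = line.split
    next if ref.nil? # skip blank or malformed lines
    # A deletion is pushed as an all-zero new object name.
    ref if ref.start_with?("refs/heads/") && new_sha.match?(/\A0+\z/)
  end
end

# Returns the exit status the hook should use.
def run_pre_receive(input = $stdin, err = $stderr)
  bad = rejected_refs(input)
  # Hook stderr is relayed to the pushing client.
  bad.each { |ref| err.puts "rejected: deleting #{ref} is not allowed" }
  bad.empty? ? 0 : 1 # non-zero rejects the entire push
end
```

An update hook enforcing the same rule would read ARGV (ref name, old SHA, new SHA) instead of standard input and reject only that one reference.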

Let's look at gitlab-shell's pre-receive logic (the scripts discussed here are in the $gitlab-shell/hooks directory).



As you can see, the user must pass authentication and authorization checks before git-receive-pack updates the repository on the server.
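Specialized to GitLab, that pattern looks roughly like this: read the pushed changes, identify the pusher, ask GitLab Rails, and exit non-zero on refusal. This is a hypothetical sketch based only on the description above, not gitlab-shell's hook source. Reading the pusher from a GL_ID environment variable is an assumption (the verify_access excerpt only shows an @gl_id value), and the injected checker lambda stands in for the GitLab Rails API call so the flow can run offline.

```ruby
# Hypothetical GitLab-style pre-receive flow. Assumptions: GL_ID and
# GL_PROTOCOL env var names; checker replaces the GitLab Rails API call.
def gitlab_pre_receive(stdin:, env:, checker:, err: $stderr)
  changes  = stdin.read                      # all "<old> <new> <ref>" lines
  actor    = env["GL_ID"]                    # who is pushing, e.g. "key-11"
  protocol = env.fetch("GL_PROTOCOL", "ssh")

  allowed, message = checker.call(actor, changes, protocol)
  return 0 if allowed                        # exit 0 lets the push proceed

  err.puts "GitLab: #{message}"              # relayed to the client as "remote:" output
  1                                          # non-zero rejects the whole push
end

# A real hook file would end with something like:
#   exit gitlab_pre_receive(stdin: $stdin, env: ENV, checker: some_api_client)
```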



The post-receive hook's logic differs from that of pre-receive:



After a push to a server-side repository completes, the hook calls the post_receive interface of GitLab Rails to handle the follow-up logic.
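As a rough illustration of that hand-off, the sketch below builds (but does not send) the HTTP request a post-receive hook might make to GitLab Rails. The endpoint path /api/v4/internal/post_receive, the JSON field names, and the gl_repository value are assumptions chosen for illustration; consult the gitlab-shell source for the actual contract.

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical request builder. Endpoint path and payload fields are
# assumptions; the request is constructed but never sent.
def build_post_receive_request(base_url, gl_repository, identifier, changes)
  uri = URI.join(base_url, "/api/v4/internal/post_receive")
  request = Net::HTTP::Post.new(uri)
  request["Content-Type"] = "application/json"
  request.body = JSON.generate({
    gl_repository: gl_repository, # which repository was pushed to
    identifier: identifier,       # who pushed, e.g. "key-11"
    changes: changes              # the same stdin lines pre-receive saw
  })
  request
end

req = build_post_receive_request("http://localhost:8080", "project-1", "key-11",
                                 "a b refs/heads/main\n")
puts req.path
puts req.body
```

Because post-receive cannot undo the push, anything slow (CI triggers, notifications) is better handed off to Rails this way than done inside the hook while the client waits.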



Introduction to GitLab Shell components

The GitLab Shell component handles all Git SSH sessions for GitLab. When a user accesses GitLab over SSH (for example, git pull/push over SSH), GitLab Shell does the following:

  1. Restricts the user to a predefined set of Git commands (those behind git push, git pull, and so on)
  2. Calls the GitLab Rails API to check whether the user is authorized, and to determine which Gitaly server holds the repository being accessed (Gitaly's main job is to perform operations on code repositories)
  3. Copies data back and forth between the SSH client and the Gitaly server
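Step 3 is plain byte relaying in both directions. A minimal sketch with Ruby threads and IO.copy_stream follows. The relay method is hypothetical, and the backend here is any pair of IO objects rather than a real Gitaly session, which this sketch does not model.

```ruby
# Hypothetical relay: one thread per direction.
#   client_in   -> backend_in   (e.g. SSH client's stdin to the backend)
#   backend_out -> client_out   (backend's output back to the SSH client)
def relay(client_in, backend_in, backend_out, client_out)
  upstream = Thread.new do
    IO.copy_stream(client_in, backend_in) # copies until client_in hits EOF
    backend_in.close                      # tell the backend no more input is coming
  end
  downstream = Thread.new do
    IO.copy_stream(backend_out, client_out) # copies until the backend closes
  end
  [upstream, downstream].each(&:join)
end
```

Using two independent threads matters: Git's pack protocol has both sides talking, so copying one direction to completion before starting the other could deadlock.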

When we do git pull/push over SSH, the following things happen:

  1. git pull over SSH -> gitlab-shell -> call the GitLab Rails API to authenticate and authorize the user -> authorization succeeds or fails -> establish a Gitaly session
  2. git push over SSH -> gitlab-shell -> establish a Gitaly session -> run gitlab-shell's pre-receive hook on the Gitaly server -> call the GitLab Rails API to authenticate and authorize the user -> authorization succeeds or fails

For historical reasons, the gitlab-shell component also contains the hook scripts that let GitLab validate a user's git push (for example, whether the current user may push local changes to a protected branch). These hook scripts can also trigger GitLab events, such as starting a CI pipeline after a successful push. Although they are gitlab-shell's Git hooks, they belong to the Gitaly component and run only on the Gitaly server; the gitlab-shell component itself does not need to be installed there. See gitlab.com/gitlab-org/…

Appendix

Reference links

gitlab-shell official repository

Pro Git 2

GitLab Series 2: GitLab Workhorse

GitLab Series 3: Unicorn