“This is the 12th day of my participation in the Gwen Challenge in November. Check out the details: The Last Gwen Challenge in 2021.”


While many programmers are comfortable using Git, I wanted to know more about how Git is implemented. Git uses a number of clever ideas to optimize common version control operations. Learning Git should be about practice rather than reading piles of documentation, which is why I’m writing this article.

This article works through Git using Rust, but you can find implementations of Git in many other languages (for example, Go).

The object model

This series is not an introduction to Git. There are plenty of good explanations on the web about how to use Git. But I do want to review a few core concepts of Git, because they are important to understanding how Git works.

Version control

Git is one of many version control systems (VCS), but it is by far the most popular (hats off to Linus). Although these systems differ in implementation details, they share many of the same core ideas.

The purpose of version control is not just to store the current state of a set of files, but also the history of how those files have changed over time.

This history can be browsed, updated, and shared, making version control a very useful tool in applications where editing history is important, especially in software development.

Commits

Git stores its history as a collection of snapshots called “commits”. You can view the state of the files as of any commit; this operation is called “checking out” the commit.

You can think of commits as a series of backups of your code, although Git has some tricks to reduce the amount of storage required for all these backups as we’ll see later.

Each commit, apart from the very first, builds on earlier commits, which are called its “parent” commits.

In the simplest case, the commit history is “linear” (that is, a straight line), meaning that every commit has exactly one parent and one child (except for the first commit, which has no parent, and the last, which has no child).

For example, there might be three commits, A, B, and C in that order. Here we can illustrate this with a commit diagram:

A --- B --- C
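
To make the parent relationship concrete, here is a minimal Rust sketch of the linear history above. The `Commit` struct and the `linear_history` and `walk_back` helpers are names I made up for illustration, not Git's real data structures, and the single-letter IDs stand in for real commit hashes.

```rust
use std::collections::HashMap;

/// A simplified commit: an ID plus the IDs of its parents.
/// (Hypothetical type for illustration; real commits store much more.)
#[derive(Debug, Clone)]
pub struct Commit {
    pub id: String,
    pub parents: Vec<String>,
}

/// Builds the linear history A --- B --- C as a map from ID to commit.
pub fn linear_history() -> HashMap<String, Commit> {
    let mut commits = HashMap::new();
    let mut parent: Option<String> = None;
    for id in ["A", "B", "C"] {
        let commit = Commit {
            id: id.to_string(),
            // The first commit has no parent; every later one has exactly one.
            parents: parent.iter().cloned().collect(),
        };
        commits.insert(commit.id.clone(), commit);
        parent = Some(id.to_string());
    }
    commits
}

/// Follows the first parent of each commit from `start` back to the root.
pub fn walk_back(commits: &HashMap<String, Commit>, start: &str) -> Vec<String> {
    let mut path = Vec::new();
    let mut current = commits.get(start);
    while let Some(commit) = current {
        path.push(commit.id.clone());
        current = commit.parents.first().and_then(|p| commits.get(p));
    }
    path
}
```

Calling `walk_back(&linear_history(), "C")` visits C, then B, then A. Note that each commit only knows its parents, so history is always walked backwards from a newer commit.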

However, even in small projects, the commit history is rarely completely linear. Git allows the commit history to “branch”, meaning that multiple commits are built on top of the same parent commit. This is useful, for example, when developers are working on features in parallel and don’t want to affect each other’s code until their features are complete. This might result in a commit diagram like this:

A --- B --- C --- D
        \
         E --- F --- G
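
The branched history above can be sketched in Rust as a list of (commit, parents) pairs. This is only an illustration under my own assumptions: the `Entry` alias and the `branched_history`, `children`, and `branch_points` functions are hypothetical names, and the letters stand in for commit hashes. A branch point shows up as a commit that is the parent of more than one other commit.

```rust
use std::collections::BTreeMap;

/// One history entry: a commit ID followed by the IDs of its parents.
/// (Hypothetical alias for illustration.)
pub type Entry = (&'static str, Vec<&'static str>);

/// The branched history from the diagram: E is built on B, just like C.
pub fn branched_history() -> Vec<Entry> {
    vec![
        ("A", vec![]),
        ("B", vec!["A"]),
        ("C", vec!["B"]),
        ("D", vec!["C"]),
        ("E", vec!["B"]),
        ("F", vec!["E"]),
        ("G", vec!["F"]),
    ]
}

/// Inverts the parent links to find each commit's children.
pub fn children(history: &[Entry]) -> BTreeMap<&'static str, Vec<&'static str>> {
    let mut map: BTreeMap<&'static str, Vec<&'static str>> = BTreeMap::new();
    for (id, parents) in history {
        for parent in parents {
            map.entry(*parent).or_default().push(*id);
        }
    }
    map
}

/// A branch point is a commit with more than one child.
pub fn branch_points(history: &[Entry]) -> Vec<&'static str> {
    children(history)
        .into_iter()
        .filter(|(_, kids)| kids.len() > 1)
        .map(|(id, _)| id)
        .collect()
}
```

Here `branch_points(&branched_history())` reports only B, whose children are C and E.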

**Commit history can also be “merged”, i.e. a single commit can have multiple parent commits.** In the example above, commit G can be merged with D, producing a new commit H whose parents are D and G.

A --- B --- C --- D --- H
        \             /
         E --- F --- G
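
As a rough sketch of what a merge means for the data, the Rust snippet below models the merged history as a map from each commit to its parents. The function names `merged_history` and `ancestors` are my own inventions for this example, and the letters again stand in for commit hashes. The merge commit H is simply a commit with two parents, so everything reachable from either D or G becomes part of its history.

```rust
use std::collections::{BTreeSet, HashMap};

/// The merged history from the diagram, as a map from commit ID to parent IDs.
pub fn merged_history() -> HashMap<&'static str, Vec<&'static str>> {
    HashMap::from([
        ("A", vec![]),
        ("B", vec!["A"]),
        ("C", vec!["B"]),
        ("D", vec!["C"]),
        ("E", vec!["B"]),
        ("F", vec!["E"]),
        ("G", vec!["F"]),
        // The merge commit has two parents.
        ("H", vec!["D", "G"]),
    ])
}

/// Collects `start` and every commit reachable from it through parent links.
pub fn ancestors(
    history: &HashMap<&'static str, Vec<&'static str>>,
    start: &'static str,
) -> BTreeSet<&'static str> {
    let mut seen = BTreeSet::new();
    let mut stack = vec![start];
    while let Some(id) = stack.pop() {
        // `insert` returns false for commits we have already visited,
        // which keeps shared ancestors such as B from being walked twice.
        if seen.insert(id) {
            if let Some(parents) = history.get(id) {
                stack.extend(parents.iter().copied());
            }
        }
    }
    seen
}
```

Starting from D, the walk only reaches A, B, C, and D. Starting from H, it reaches all eight commits, because H pulls in both lines of development.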

Branches and tags

Each commit has an ID called the “commit hash” (you’ll see where this hash comes from in a moment). Although we can refer to any commit by its hash, it is often convenient to give a commit a name.

Branches and tags are two ways to do this. Both are references to the commit.

The difference is that committing on a branch updates the branch to point to the new commit, while a tag always points to the same commit.
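
The difference can be sketched in a few lines of Rust. This is a toy model under my own assumptions, not how Git actually stores references: the `Refs` struct and its `commit` and `tag` methods are hypothetical, and the letters stand in for commit hashes.

```rust
use std::collections::HashMap;

/// A toy reference store: names mapped to commit IDs.
/// (Hypothetical type for illustration; it tracks no file contents.)
#[derive(Debug, Default)]
pub struct Refs {
    pub branches: HashMap<String, String>,
    pub tags: HashMap<String, String>,
}

impl Refs {
    /// Records a new commit on `branch`, moving the branch to point at it.
    pub fn commit(&mut self, branch: &str, new_commit: &str) {
        self.branches
            .insert(branch.to_string(), new_commit.to_string());
    }

    /// Creates a tag pointing at `commit`.
    /// Returns false and changes nothing if the tag already exists.
    pub fn tag(&mut self, name: &str, commit: &str) -> bool {
        if self.tags.contains_key(name) {
            return false;
        }
        self.tags.insert(name.to_string(), commit.to_string());
        true
    }
}
```

If main points at C, we tag C as v1.0, and then commit D on main, the branch moves to D while v1.0 still points at C.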

The usual convention is to have the main branch point to the latest production-ready commit, and to use branches such as feature/xyz and fix/xyz to track work on new features or bug fixes. Tags are mainly used to mark the commits that correspond to released versions.