Git has a well-designed data model that supports everything version control requires: maintaining history, branching, and collaboration. Learning Git from the top down, starting with the command-line interface, can nevertheless be very confusing. When something goes wrong, often the only remedy you know is to save your current work elsewhere and make a fresh copy of the repository to continue. If we understand the underlying data structures first, the command-line interface becomes much easier to work with.

Snapshots

Git models the history of a collection of files and folders within a top-level directory as a series of snapshots. In Git terminology, a file is called a blob (a data object), which is just a sequence of bytes. A directory is called a tree; it maps names to blobs or to other trees, so directories can contain other directories. A snapshot is the top-level tree being tracked, and Git records each snapshot as a commit.
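
To make the relationship between blobs and trees concrete, here is a minimal Python sketch. It assumes a blob can be modeled as `bytes` and a tree as a `dict` that maps names to blobs or nested trees. The file names and the helper `list_paths` are hypothetical, chosen only for illustration, and this is not how Git lays out trees on disk.

```python
# Hypothetical in-memory model: a blob is bytes, a tree is a dict that maps
# names to blobs or to other trees.
snapshot = {
    "foo": {                           # tree (a directory)
        "bar.txt": b"hello world",     # blob (a file)
    },
    "baz.txt": b"git is wonderful",    # blob (a file)
}


def list_paths(tree, prefix=""):
    """Return the path of every blob reachable from the given tree."""
    paths = []
    for name, entry in sorted(tree.items()):
        path = prefix + name
        if isinstance(entry, dict):    # a nested tree: recurse into it
            paths.extend(list_paths(entry, path + "/"))
        else:                          # a blob: record its path
            paths.append(path)
    return paths


print(list_paths(snapshot))  # ['baz.txt', 'foo/bar.txt']
```

The top-level `snapshot` dictionary plays the role of the tree that a commit would point to.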

Modeling history: relating snapshots

In Git, history is a directed acyclic graph (DAG) of snapshots. Each snapshot has a set of “parents,” the snapshots that immediately precede it. A snapshot can have more than one parent, for example when it merges two branches.

o <-- o <-- o <-- o <---- o
            ^            /
             \          v
              --- o <-- o
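
The graph in the diagram can be sketched in Python as follows. This is only an illustration: the `Commit` class, the commit labels, and the `ancestors` helper are hypothetical names, and each commit simply keeps a list of its parents. Walking the parent links from the merge commit visits every snapshot exactly once, even though the branch point is reachable along two paths.

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    message: str
    parents: list = field(default_factory=list)


def ancestors(commit):
    """Return the messages of a commit and all its ancestors, each once."""
    seen, order, stack = set(), [], [commit]
    while stack:
        current = stack.pop()
        if id(current) in seen:       # already reached along another path
            continue
        seen.add(id(current))
        order.append(current.message)
        stack.extend(current.parents)
    return order


# Rebuild the diagram: a main line, a side branch from c3, and a merge.
c1 = Commit("c1")
c2 = Commit("c2", [c1])
c3 = Commit("c3", [c2])
c4 = Commit("c4", [c3])               # continues the main line
f1 = Commit("f1", [c3])               # side branch starts at c3
f2 = Commit("f2", [f1])
merge = Commit("merge", [c4, f2])     # a commit with two parents

history = ancestors(merge)
print(len(history))  # 7
```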

Data model and its pseudocode representation

Git's data model is easier to understand when it is written out as pseudocode.

// A file is a bunch of bytes
type blob = array<byte>

// A directory contains named files and directories
type tree = map<string, tree | blob>

// A commit has parents, metadata, and the top-level tree
type commit = struct {
    parents: array<commit>
    author: string
    message: string
    snapshot: tree
}

Objects and content addressing

An object in Git is a blob, a tree, or a commit:

type object = blob | tree | commit

In Git's data store, all objects are content-addressed by their SHA-1 hash.

objects = map<string, object>

def store(object):
    id = sha1(object)
    objects[id] = object

def load(id):
    return objects[id]
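
The pseudocode above can be turned into a small runnable sketch with Python's `hashlib`. Two details here are assumptions of the sketch rather than part of the pseudocode: `store` returns the id it computed, and objects are kept as raw bytes in a dictionary. The helper `git_blob_id` is a hypothetical name; it shows that Git itself hashes a short `blob <size>` header followed by a NUL byte and then the content, so the id of a blob differs from the SHA-1 of the bare content.

```python
import hashlib

objects = {}  # maps a hex SHA-1 id to the raw bytes stored under it


def store(data: bytes) -> str:
    """Store bytes under the SHA-1 of their content and return the id."""
    object_id = hashlib.sha1(data).hexdigest()
    objects[object_id] = data
    return object_id


def load(object_id: str) -> bytes:
    return objects[object_id]


def git_blob_id(content: bytes) -> str:
    """Id Git gives a blob: SHA-1 over a 'blob <size>' header, NUL, content."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


first = store(b"hello world")
second = store(b"hello world")    # identical content yields the identical id
print(first == second, len(objects))  # True 1
print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because the id is derived from the content, storing the same bytes twice adds nothing new, which is why unchanged files cost no extra space across snapshots.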

References

Every object can be identified by its SHA-1 hash, but a 40-character hexadecimal string is hard to remember, so Git lets these hashes be given human-readable names, known as references. A reference is a pointer to a commit. Unlike an object, which is immutable, a reference is mutable: it can be updated to point to a new commit. For example, the master reference usually points to the latest commit on the main branch.

references = map<string, string>

def update_reference(name, id):
    references[name] = id

def read_reference(name):
    return references[name]

def load_reference(name_or_id):
    if name_or_id in references:
        return load(references[name_or_id])
    else:
        return load(name_or_id)

In Git, our current position in the history is recorded by a special reference called “HEAD”.
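
The difference between immutable objects and mutable references can be seen in a short Python sketch. Everything in it is a simplification for illustration: the ids `id-1` and `id-2` are placeholder strings rather than real hashes, the stored objects are plain strings, and HEAD is modeled as a variable holding the name of the current branch (in Git, HEAD usually names a branch, although it can also point directly at a commit).

```python
# Placeholder object store: stand-in ids mapped to stand-in commits.
objects = {"id-1": "first commit", "id-2": "second commit"}
references = {}


def update_reference(name, object_id):
    references[name] = object_id


def load_reference(name_or_id):
    # Resolve a reference name to an id first; otherwise treat it as an id.
    object_id = references.get(name_or_id, name_or_id)
    return objects[object_id]


update_reference("master", "id-1")
head = "master"                     # HEAD modeled as the current branch name
before = load_reference(head)

update_reference("master", "id-2")  # a new commit moves the branch forward
after = load_reference(head)

print(before, "->", after)  # first commit -> second commit
```

Only the `references` mapping changes between the two lookups; both objects remain in the store, untouched.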