1 Introduction

  • Git is flexible, and there are multiple ways to achieve the same result.
  • If you only memorize command combinations for different scenarios, you stay at the level of knowing what works without knowing why it works.
  • Only by understanding Git's internal principles and the operations that underlie its commands can you use Git with ease.

2 Git internal principles

Git is a content-addressable file system written in C, with a VCS user interface built on top of it.

2.1 Git Directory Structure

Use the git init command to initialize the current directory and generate the **.git** folder.

1. Workspace, staging, and Git repository

  • The workspace is the current directory (except .git/), where all editing takes place.
  • The staging area corresponds to the .git/index file, which holds the staging information and can be used to generate Git tree objects. (.git/index is not created by git init; it is created the first time git add is run, which also writes objects under **.git/objects/**.)
  • The Git repository corresponds to .git/, which stores all historical snapshots of the project for use when needed.

2. The .git/ directory

.git/ contains the following directories and files:

  • branches/: no longer used by newer versions of Git
  • description: used only by GitWeb
  • config: configuration options for the current project
  • info/: holds the exclude file. Unlike .gitignore, it configures local ignore patterns that are not pushed to the remote repository and do not affect other people.
  • hooks/: holds hook scripts
  • objects/: stores all data content
  • refs/: stores pointers to commit objects
  • HEAD: the file content points to the current branch
  • index: the file content is the staging area information

2.2 Git Commands

Git contains both low-level commands (plumbing) and high-level commands (porcelain).

  1. The Git commands that users commonly run are high-level commands, such as add, commit, and checkout. High-level commands are user-friendly and easy to understand and operate.
  2. Git was originally designed as a toolkit for building a VCS; the commands in that toolkit are the low-level commands. They are not usually run directly by users, but are invoked by shells or scripts.

    Here are some basic low-level commands:

    • checkout-index: Copy files from the index to the working tree.
    • cat-file: Provide content or type and size information for repository objects.
    • hash-object: Compute an object ID and optionally create a blob from a file.
    • update-index: Register file contents in the working tree to the index.
    • write-tree: Create a tree object from the current index.
    • commit-tree: Create a new commit object.

2.3 Git Objects

Git defines four types of objects: blob, tree, commit, and tag. They are all stored in the **.git/objects/** directory. A Git object adds a header in front of the original content, that is, object content = object header + file content. Objects are stored compressed, so they cannot be read directly with cat; use git cat-file to read them. The format of the object header is: object header = object type + space + data length + NUL byte. For example, for a file whose content is hello world, the blob object header is blob 11\000.

  • blob: Files from the workspace are stored in the Git repository as blob objects, roughly the equivalent of inodes or file contents in UNIX.
  • tree: A tree object contains references to blob objects and other tree objects, the equivalent of a directory in UNIX.
  • commit: Contains the hash of the previous (parent) commit object, the hash of the top-level tree object of the project snapshot at that point in time, author/committer information, timestamps, a blank line, and the commit message.
  • tag: Contains the hash of a commit, the tag name, and other information (depending on the tag type).
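
The header rule above (object type + space + data length + NUL byte) can be sketched in a few lines of Python. This is only an illustration of the layout described in this section; the function names make_object and split_object are made up for the example and are not part of Git.

```python
def make_object(obj_type: str, content: bytes) -> bytes:
    """Return header + content, where header = type + space + length + NUL."""
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\x00"
    return header + content


def split_object(raw: bytes):
    """Split an uncompressed object into (type, declared size, content)."""
    header, _, content = raw.partition(b"\x00")
    obj_type, size = header.decode("ascii").split(" ")
    return obj_type, int(size), content


blob = make_object("blob", b"hello world")
print(blob[:8])            # b'blob 11\x00'
print(split_object(blob))  # ('blob', 11, b'hello world')
```

The declared length counts only the content bytes, which is why hello world gives 11.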

2.4 Content addressing

  1. The low-level command git hash-object adds header information to the file content, computes the hash value, and returns it. Adding the -w option also writes a blob object to the Git repository (blob object = object header + file content).
  2. When a blob object is stored in the Git repository directory (.git/objects/), its location is derived from the 40-character hexadecimal hash string: the first two characters name the subdirectory and the remaining 38 characters name the file. For example, if the hash value of a blob object is 620d4582bfbf773ef15f9b52ac434906a3cdf9c3, then its path in the Git repository is .git/objects/62/0d4582bfbf773ef15f9b52ac434906a3cdf9c3.
  3. Git content addressing essentially means that Git identifies and looks up files by the hash value generated from the file content (with the header added). Commands do not need the complete hash string: a prefix is enough as long as it identifies exactly one object.
  4. Git can be used on large projects without having to worry about hash collisions. The probability of an accidental SHA-1 collision is negligible.
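
The four points above can be tried out without Git itself. The following Python sketch computes an object ID the way this section describes (SHA-1 over header + content), stores the zlib-compressed object under a two-character directory, and resolves an abbreviated hash. The temporary directory stands in for .git/objects/, and all function names are hypothetical helpers, not Git APIs.

```python
import hashlib
import os
import tempfile
import zlib


def object_id(obj_type: str, content: bytes) -> str:
    """SHA-1 of header + content, as a 40-character hex string."""
    store = f"{obj_type} {len(content)}".encode("ascii") + b"\x00" + content
    return hashlib.sha1(store).hexdigest()


def write_blob(objects_dir: str, content: bytes) -> str:
    """Store a blob at <objects_dir>/<first 2 chars>/<remaining 38 chars>."""
    oid = object_id("blob", content)
    path = os.path.join(objects_dir, oid[:2], oid[2:])
    os.makedirs(os.path.dirname(path), exist_ok=True)
    store = f"blob {len(content)}".encode("ascii") + b"\x00" + content
    with open(path, "wb") as f:
        f.write(zlib.compress(store))  # loose objects are zlib-compressed
    return oid


def read_object(objects_dir: str, oid: str) -> bytes:
    """Return the uncompressed header + content for a full object ID."""
    with open(os.path.join(objects_dir, oid[:2], oid[2:]), "rb") as f:
        return zlib.decompress(f.read())


def resolve_prefix(objects_dir: str, prefix: str) -> str:
    """Expand an abbreviated hash; fail unless exactly one object matches."""
    matches = []
    for subdir in os.listdir(objects_dir):
        for name in os.listdir(os.path.join(objects_dir, subdir)):
            if (subdir + name).startswith(prefix):
                matches.append(subdir + name)
    if len(matches) != 1:
        raise LookupError(f"{prefix!r} matches {len(matches)} objects")
    return matches[0]


with tempfile.TemporaryDirectory() as objects_dir:
    oid = write_blob(objects_dir, b"hello world")
    print(oid[:2], oid[2:])                         # directory name, file name
    print(read_object(objects_dir, oid))            # header + content
    print(resolve_prefix(objects_dir, oid[:7]) == oid)
```

Identical content always produces the same ID, so writing the same file twice stores only one object.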

2.5 Git Version Mechanism

  1. HEAD points to the current branch. If master is the current branch, the content of the HEAD file is ref: refs/heads/master.
  2. Branches (local, remote, remote-tracking, tracking) and tags (tag objects) all hold references to commit objects.
  3. A commit object contains a reference to the previous (parent) commit object (similar to a singly linked list) and a reference to the top-level tree object of this commit.
    • Each top-level tree object can be considered a complete version.
    • The commit history and version history are formed through the chain structure of commit objects.
  4. Git branches and tags mark a particular version by referring to a commit object.

    Note: Any reference to a Git object is the 40-character hash string of that object.
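
As a rough model of this mechanism, the sketch below keeps objects in a Python dictionary instead of .git/objects/, builds two commits that share an empty tree, and walks from HEAD through the branch reference and the parent links. The author identity and helper names are invented for the example; real trees would also list blobs and subtrees.

```python
import hashlib

objects = {}  # in-memory stand-in for .git/objects: hash -> (type, content)


def put(obj_type: str, content: bytes) -> str:
    """Store an object under the SHA-1 of header + content; return the hash."""
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\x00"
    oid = hashlib.sha1(header + content).hexdigest()
    objects[oid] = (obj_type, content)
    return oid


def make_commit(tree: str, parent, message: str) -> str:
    """Commit = tree, optional parent, author/committer, blank line, message."""
    who = "Example Author <author@example.com> 0 +0000"  # hypothetical identity
    lines = [f"tree {tree}"]
    if parent is not None:
        lines.append(f"parent {parent}")
    lines += [f"author {who}", f"committer {who}", "", message]
    return put("commit", ("\n".join(lines) + "\n").encode("utf-8"))


def parent_of(commit: str):
    """Return the parent hash recorded in a commit, or None for a root commit."""
    _, content = objects[commit]
    for line in content.decode("utf-8").split("\n"):
        if line == "":  # the blank line ends the header fields
            break
        key, _, value = line.partition(" ")
        if key == "parent":
            return value
    return None


def history(head: str, refs: dict) -> list:
    """Follow HEAD -> branch -> commit, then walk the parent links."""
    commit = refs[head[len("ref: "):].strip()]
    chain = []
    while commit is not None:
        chain.append(commit)
        commit = parent_of(commit)
    return chain


tree = put("tree", b"")  # an empty tree keeps the sketch short
first = make_commit(tree, None, "first commit")
second = make_commit(tree, first, "second commit")

refs = {"refs/heads/master": second}  # a branch is a name for a commit hash
head = "ref: refs/heads/master\n"     # content of the HEAD file

print(history(head, refs) == [second, first])  # True
```

Moving a branch only means storing a different commit hash under its name; the commit objects themselves never change.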

2.6 RefSpec

A reference specification (refspec) is the mapping between branches in the remote repository and local branches. It is written as <src>:<dst>, which also implies that the data flows from src to dst.

2.6.1 Fetch and push commands

Both commands take a refspec to specify the data flow.
git fetch [remote repository] [remote branch]:[local branch]
git push [remote repository] [local branch]:[remote branch]
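
To make the <src>:<dst> form concrete, here is a minimal Python sketch that splits a refspec into its parts, including the optional leading "+" that the config section below describes. The RefSpec type and parse_refspec function are illustrative names only, and a refspec without a colon is simply given an empty dst here, which glosses over how fetch and push each treat that case.

```python
from typing import NamedTuple


class RefSpec(NamedTuple):
    force: bool  # True when the refspec starts with "+"
    src: str     # where the data comes from
    dst: str     # where the data goes


def parse_refspec(text: str) -> RefSpec:
    """Split '[+]<src>:<dst>' into its three parts."""
    force = text.startswith("+")
    if force:
        text = text[1:]
    src, _, dst = text.partition(":")
    return RefSpec(force, src, dst)


# fetch: src is the remote branch, dst is the local reference
print(parse_refspec("+refs/heads/master:refs/remotes/origin/master"))
# push: src is the local branch, dst is the remote branch
print(parse_refspec("master:refs/heads/qa/master"))
```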

2.6.2 Configuring refspecs in the config file

When you run the default fetch/push commands, Git operates according to the refspec configuration in .git/config.

2.6.2.1 When you add a remote repository with git remote add, Git adds a configuration section for it to the .git/config file.

The "+" at the start of the fetch value is optional and tells Git to update the reference even when it cannot fast-forward. Running the default command git fetch origin then fetches all branches of the origin remote repository.

2.6.2.2 Use git log origin/master to view the master branch fetched from the remote repository.
# The following three commands are equivalent; Git expands each of them to refs/remotes/origin/master
git log origin/master
git log remotes/origin/master
git log refs/remotes/origin/master

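
Why the three names are equivalent can be sketched as a lookup over a fixed list of candidate locations. The order below follows the disambiguation rules in Git's gitrevisions documentation; the expand_ref function and the set of existing references are assumptions made for this example.

```python
# Candidate locations, tried in order, for a name given on the command line.
REF_SEARCH_ORDER = (
    "{}",
    "refs/{}",
    "refs/tags/{}",
    "refs/heads/{}",
    "refs/remotes/{}",
    "refs/remotes/{}/HEAD",
)


def expand_ref(name: str, existing_refs: set):
    """Return the first candidate that exists, or None if nothing matches."""
    for pattern in REF_SEARCH_ORDER:
        candidate = pattern.format(name)
        if candidate in existing_refs:
            return candidate
    return None


# Hypothetical repository with one local and one remote-tracking branch.
existing = {"refs/heads/master", "refs/remotes/origin/master"}

for name in ("origin/master", "remotes/origin/master", "refs/remotes/origin/master"):
    print(name, "->", expand_ref(name, existing))
# each line ends with: -> refs/remotes/origin/master
```
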
2.6.2.3 refSpec Specifies branch mapping

1) The fetch behavior can be changed by rewriting the value as fetch = +refs/heads/master:refs/remotes/origin/mymaster, which maps the remote master branch to the local origin/mymaster branch. 2) Multiple mappings can also be specified, so that several chosen branches are fetched at one time.

[remote "origin"]
      url = [email protected]:kivihub/test.git
      fetch = +refs/heads/master:refs/remotes/origin/master
      fetch = +refs/heads/experiment:refs/remotes/origin/experiment
      fetch = +refs/heads/qa/*:refs/remotes/origin/qa/*
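
The effect of several fetch lines can be sketched as a function that maps each remote branch to a local reference, or to nothing if no refspec covers it. This is a simplified model: it assumes a "*" appears only at the end of both sides, as in the configuration above, and the branch names qa/bugfix and dev are made up for the example.

```python
FETCH_REFSPECS = [
    "+refs/heads/master:refs/remotes/origin/master",
    "+refs/heads/experiment:refs/remotes/origin/experiment",
    "+refs/heads/qa/*:refs/remotes/origin/qa/*",
]


def map_ref(refspec: str, remote_ref: str):
    """Return the local ref that remote_ref maps to, or None if no match."""
    spec = refspec[1:] if refspec.startswith("+") else refspec
    src, _, dst = spec.partition(":")
    if src.endswith("*"):
        prefix = src[:-1]
        if remote_ref.startswith(prefix):
            # The part matched by "*" is copied to the destination side.
            return dst[:-1] + remote_ref[len(prefix):]
        return None
    return dst if remote_ref == src else None


def fetch_targets(refspecs: list, remote_refs: list) -> dict:
    """Map every remote ref that some refspec covers to its local ref."""
    result = {}
    for ref in remote_refs:
        for spec in refspecs:
            local = map_ref(spec, ref)
            if local is not None:
                result[ref] = local
                break
    return result


remote_refs = [
    "refs/heads/master",
    "refs/heads/experiment",
    "refs/heads/qa/bugfix",  # hypothetical branch in the qa namespace
    "refs/heads/dev",        # hypothetical branch not covered by any refspec
]
for remote, local in fetch_targets(FETCH_REFSPECS, remote_refs).items():
    print(remote, "->", local)
```

Here refs/heads/dev produces no output line, because none of the three refspecs matches it.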

To make git push send the local master branch to the remote qa/master branch by default, add a push refspec to the configuration:

[remote "origin"]
      url = [email protected]:kivihub/test.git
      fetch = +refs/heads/master:refs/remotes/origin/master
      fetch = +refs/heads/experiment:refs/remotes/origin/experiment
      fetch = +refs/heads/qa/*:refs/remotes/origin/qa/*
      push = refs/heads/master:refs/heads/qa/master

You can delete the master branch on the origin remote by running git push origin :master. Because a refspec has the form <src>:<dst>, leaving <src> empty sets the remote branch to nothing, which means deleting it.
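
The update and delete cases of a push refspec can be modeled with two dictionaries of references. This is a toy sketch, not how Git transfers data: the hashes are placeholders, the helper names are hypothetical, and short names such as master are simply assumed to mean refs/heads/master.

```python
def full_name(name: str) -> str:
    """Assumption for this sketch: a short name refers to a branch."""
    return name if name.startswith("refs/") else "refs/heads/" + name


def apply_push(remote_refs: dict, local_refs: dict, refspec: str) -> None:
    """Apply '<src>:<dst>' to remote_refs; an empty <src> deletes <dst>."""
    src, _, dst = refspec.partition(":")
    dst = full_name(dst)
    if src == "":
        remote_refs.pop(dst, None)  # nothing is pushed, so the branch goes away
    else:
        remote_refs[dst] = local_refs[full_name(src)]


local = {"refs/heads/master": "1" * 40}   # placeholder commit hashes
remote = {"refs/heads/master": "0" * 40}

apply_push(remote, local, "refs/heads/master:refs/heads/qa/master")
apply_push(remote, local, ":master")
print(sorted(remote))  # ['refs/heads/qa/master']
```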

2.7 Other

  1. git gc: the garbage collection command compresses or deletes data to save disk space.
    • Loose objects are packed and stored in packfiles.
    • Objects that are not reachable from any commit and have been around for a while (months) are deleted.