What is version control

Version control is a system that records changes to one or more files over time so that specific versions can be reviewed or recalled later.

Centralized version control system

CVS, Subversion, and Perforce each have a single, centrally managed server that holds every revision of every file; collaborators connect to it through a client to check out the latest files or submit updates. The drawback is that the central server is a single point of failure: if it goes down for an hour, nobody can commit changes or collaborate during that hour.

Distributed version control system

In systems such as Git, Mercurial, Bazaar, and Darcs, clients don't just check out the latest snapshot of the files; they mirror the entire repository, including its full history. If any server used for collaboration fails, it can be restored from any of the mirrored local repositories.

Git Basics

The main difference between Git and other version control systems is how it thinks about data. Most other systems store a base version of each file plus the differences made to it over time. Git does not store its data as such a list of differences. Instead, Git works more like a small file system: each time you commit, it computes a fingerprint of every file, takes a snapshot of what all the files look like at that moment, and saves a reference to that snapshot. To improve performance, Git doesn't store a file again if it hasn't changed; it only stores a link to the identical file it has already saved.
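The snapshot idea can be illustrated with a small toy model. This is only a sketch: SnapshotStore and its fields are made-up names, and the fingerprint is a plain SHA-1 of the file bytes rather than Git's real object format.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    # Plain SHA-1 of the bytes; a stand-in for Git's fingerprint.
    return hashlib.sha1(data).hexdigest()


class SnapshotStore:
    """Hypothetical toy store: every commit is a full snapshot of all files."""

    def __init__(self):
        self.blobs = {}      # fingerprint -> file content, stored once
        self.snapshots = []  # one {file name: fingerprint} map per commit

    def commit(self, files):
        snapshot = {}
        for name, data in files.items():
            key = fingerprint(data)
            # Content that is already stored is not stored again;
            # the snapshot just links to the existing entry.
            self.blobs.setdefault(key, data)
            snapshot[name] = key
        self.snapshots.append(snapshot)
        return snapshot


store = SnapshotStore()
first = store.commit({"a.txt": b"one\n", "b.txt": b"two\n"})
second = store.commit({"a.txt": b"one\n", "b.txt": b"two, edited\n"})

print(len(store.blobs))                   # 3
print(first["a.txt"] == second["a.txt"])  # True
```

Two commits of two files each leave only three stored contents, because the unchanged a.txt is shared between both snapshots.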

Data integrity

Before anything is stored in Git, a checksum of its content is computed, and that checksum serves as the unique identifier and index for the data. Git uses the SHA-1 algorithm for this, computing a hash of a file's contents or a directory's structure as a fingerprint string. The string consists of 40 hexadecimal characters (0-9 and a-f):

24b9da6552252987aa493b52f8696cd6d3b00373

Git’s work depends entirely on these fingerprint strings, so you’ll often see hashes like this. In fact, everything stored in a Git database is indexed by this hash value, not by filename.
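The way a fingerprint protects integrity can be sketched in a few lines. The helper names below are hypothetical, and the hash here is taken over the raw bytes only, which is a simplification of what Git actually hashes.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    # SHA-1 over the raw bytes, rendered as 40 hexadecimal characters.
    return hashlib.sha1(data).hexdigest()


def verify(expected_id: str, data: bytes) -> bool:
    # Recompute the fingerprint and compare it with the recorded one.
    return fingerprint(data) == expected_id


original = b"hello word\n"
object_id = fingerprint(original)

print(len(object_id))                       # 40
print(verify(object_id, original))          # True
print(verify(object_id, b"hello world\n"))  # False
```

Because the identifier is derived from the content, any change to the content, even a single character, no longer matches the recorded identifier.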

Three states

There are only three states in Git for any given file:

- Committed: the file is safely stored in your local database
- Modified: the file has been changed but not yet committed
- Staged: the modified file has been marked to go into the next commit
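One way to picture the three states is to compare a file's fingerprint in three places: the last commit, the staging area, and the working directory. The function below is a simplified sketch with made-up names; in real Git a file can have staged and unstaged changes at the same time, which this model reports only as "modified".

```python
def file_state(head_id, index_id, worktree_id):
    """Classify one file from three hypothetical content fingerprints."""
    if worktree_id != index_id:
        return "modified"   # working copy differs from what is staged
    if index_id != head_id:
        return "staged"     # staged content differs from the last commit
    return "committed"      # all three agree


print(file_state("aaa", "aaa", "aaa"))  # committed
print(file_state("aaa", "aaa", "bbb"))  # modified
print(file_state("aaa", "bbb", "bbb"))  # staged
```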

Get the Git repository for your project

There are two ways to get a Git repository. The first is to create a new one by importing all the files in an existing directory.

$ git init
$ git add *.java
$ git add README
$ git commit -m 'Initial project version'

The second option is to clone a new mirror repository from an existing Git repository.

$ git clone git://github.com.git

Git internals

Git is basically a content-addressable file system. Internally, Git is a simple key-value data store. It allows content of any type to be inserted and returns a key value by which the content can be retrieved at any time.

The directory structure

Each project has a Git directory (.git), which is where Git keeps its metadata and object database.

  • config: file containing project-specific configuration options
  • hooks: directory containing client-side or server-side hook scripts
  • objects: directory storing all data content
  • refs: directory storing pointers to commit objects (branches)
  • HEAD: file pointing to the currently checked-out branch
  • index: file storing staging area information
  • info: directory holding a global exclude file for ignore patterns you don't want to track in .gitignore
  • description: file used only by the GitWeb program

Git object

1. Initialize a repository. Git creates the objects directory with the pack and info subdirectories inside it, but no regular files yet.

$ git init git-test

2. Commit a text file, 111.txt, containing the text hello word.

$ git add 111.txt
$ git commit -m '1'

This is how Git stores its content: it generates one file per object, computes the SHA-1 checksum of the content together with a header, creates a subdirectory named after the first two characters of the checksum, and saves the file in that subdirectory under the remaining 38 characters.
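The storage layout described above can be reproduced with a short sketch. The function name write_loose_object is made up. The header format and the zlib compression of the stored file are background facts about Git's loose objects, not something stated in the text above. The sketch writes into a temporary directory, not a real repository.

```python
import hashlib
import os
import tempfile
import zlib


def write_loose_object(objects_dir, content, obj_type="blob"):
    # Header "<type> <size>" plus a NUL byte, followed by the content.
    stored = f"{obj_type} {len(content)}".encode("ascii") + b"\0" + content
    object_id = hashlib.sha1(stored).hexdigest()
    # The first two characters name the subdirectory, the other 38 the file.
    subdir = os.path.join(objects_dir, object_id[:2])
    os.makedirs(subdir, exist_ok=True)
    path = os.path.join(subdir, object_id[2:])
    with open(path, "wb") as f:
        f.write(zlib.compress(stored))
    return path


with tempfile.TemporaryDirectory() as objects_dir:
    path = write_loose_object(objects_dir, b"hello world\n")
    relative = os.path.relpath(path, objects_dir)
    with open(path, "rb") as f:
        restored = zlib.decompress(f.read())

print(relative.replace(os.sep, "/"))  # 3b/18e512dba79e4c8300dd08aeb37f8e728b8dad
print(restored)                       # b'blob 12\x00hello world\n'
```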

3. Run the cat-file command to retrieve the stored content and view the three objects in turn.

Tree object

$ git cat-file -p 6dfa3fb63d7fa3f6240dc4ec4f8324f04f31a96e
100644 blob 40d8cf85db392991a45f8fdfda2b5b04f9a49b89    111.txt

A tree object stores file names and lets you store a group of files together. A tree corresponds to a directory in the file system. A single tree object contains one or more tree entries, each of which holds a SHA-1 pointer to a blob or subtree object along with its mode, type, and file name.
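The layout of a tree object can be sketched as follows. This assumes Git's binary tree format, in which each entry is the mode, a space, the name, a NUL byte, and the 20 raw bytes of the object id. The function names are hypothetical, and the sorting is simplified to a plain sort by name, which is not exactly Git's rule when directories are involved.

```python
import hashlib


def tree_body(entries):
    """entries: list of (mode, name, 40-character hex object id) tuples."""
    body = b""
    for mode, name, object_id in sorted(entries, key=lambda e: e[1]):
        body += mode.encode("ascii") + b" " + name.encode("utf-8") + b"\0"
        body += bytes.fromhex(object_id)  # 20 raw bytes, not 40 hex chars
    return body


def tree_id(entries):
    body = tree_body(entries)
    header = b"tree " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


# The single entry from the cat-file listing above.
entries = [("100644", "111.txt", "40d8cf85db392991a45f8fdfda2b5b04f9a49b89")]
print(len(tree_body(entries)))  # 35
print(tree_id(entries))

# A tree with no entries has a well-known id.
print(tree_id([]))  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
```

If the listing above comes from a real repository, the second printed value should be the tree id shown there, but that is not verified here.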

Commit object

$ git cat-file -p 08c0d79e1721c7a7297fefef65f9c863bd696e39
tree 6dfa3fb63d7fa3f6240dc4ec4f8324f04f31a96e
author maybe tomorrow 1633934712 +0800
committer [email protected] 1633934712 +0800

The format of the commit object is simple: it names the top-level tree object for the project snapshot at that point in time, the author and committer information (taken from the user.name and user.email configuration settings) with a timestamp, a blank line, and the commit message.
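The commit format can be assembled by hand to see how little it contains. This is a sketch under assumptions: the function names and the identity string are made up for illustration, and the optional parent lines are part of Git's commit format but are only introduced later in this article.

```python
import hashlib


def commit_body(tree, author, committer, message, parents=()):
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]  # absent for the first commit
    lines.append(f"author {author}")
    lines.append(f"committer {committer}")
    # A blank line separates the headers from the commit message.
    text = "\n".join(lines) + "\n\n" + message + "\n"
    return text.encode("utf-8")


def commit_id(body):
    header = b"commit " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


# Hypothetical identity; the timestamp is the one from the listing above.
ident = "A U Thor <author@example.com> 1633934712 +0800"
body = commit_body("6dfa3fb63d7fa3f6240dc4ec4f8324f04f31a96e", ident, ident, "1")
print(body.decode("utf-8"))
print(commit_id(body))

# A second commit records the first one as its parent.
child = commit_body("6dfa3fb63d7fa3f6240dc4ec4f8324f04f31a96e", ident, ident,
                    "2", parents=[commit_id(body)])
```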

Blob object

$ git cat-file -p 40d8cf85db392991a45f8fdfda2b5b04f9a49b89
hello word

Git stores file contents as blob objects. A blob corresponds roughly to the contents of a file, without its name.

Git references

Git stores pointers to commit objects (such as branches) in the refs directory of the Git directory. It contains the heads and tags subdirectories.

HEAD

The HEAD file is a symbolic reference to the branch you are currently on. Unlike a normal reference, a symbolic reference does not generally contain a SHA-1 value; it contains a pointer to another reference.
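Following HEAD to a commit can be sketched like this. The function name resolve_head is made up, and the sketch assumes that the branch is stored as a plain file under refs/heads (Git can also keep references in a packed file, which is ignored here). It runs against a temporary directory that imitates the layout, not a real repository.

```python
import os
import tempfile


def resolve_head(git_dir):
    """Return the commit id that HEAD currently resolves to."""
    with open(os.path.join(git_dir, "HEAD")) as f:
        value = f.read().strip()
    prefix = "ref: "
    if value.startswith(prefix):
        # Symbolic reference: follow it to the branch file it names.
        ref_name = value[len(prefix):]
        with open(os.path.join(git_dir, *ref_name.split("/"))) as f:
            return f.read().strip()
    return value  # otherwise HEAD holds a commit id directly


with tempfile.TemporaryDirectory() as git_dir:
    os.makedirs(os.path.join(git_dir, "refs", "heads"))
    with open(os.path.join(git_dir, "refs", "heads", "master"), "w") as f:
        f.write("08c0d79e1721c7a7297fefef65f9c863bd696e39\n")
    with open(os.path.join(git_dir, "HEAD"), "w") as f:
        f.write("ref: refs/heads/master\n")
    resolved = resolve_head(git_dir)

print(resolved)  # 08c0d79e1721c7a7297fefef65f9c863bd696e39
```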

$ git branch test

When you run git branch <name>, Git creates a file for the new branch under refs/heads and writes into it the SHA-1 value of the last commit on your current branch.

08c0d79e1721c7a7297fefef65f9c863bd696e39
# the object recorded in the test file is your last commit
$ git cat-file -p 08c0d79e1721c7a7297fefef65f9c863bd696e39
tree 6dfa3fb63d7fa3f6240dc4ec4f8324f04f31a96e
author maybe tomorrow 1633934712 +0800
committer [email protected] 1633934712 +0800

Tags

A tag object is very much like a commit object: it contains a tagger, a date, a message, and a pointer. The main difference is that a tag object generally points to a commit rather than a tree. It's like a branch reference that never moves: it always points to the same commit and just gives it a friendlier name.

Remotes

A remote reference records the state of a remote repository. If you add a remote and push to it, Git stores the value you last pushed to that remote for each branch in the refs/remotes directory.

Summary

When you create a commit with git commit, Git computes a checksum for each subdirectory (in the simplest case, just the project root) and stores those directories as tree objects in the repository. Git then creates a commit object that contains the commit metadata and a pointer to the root tree object, so that it can re-create that snapshot when needed.

For a commit that contains three files, the repository then holds five objects: three blob objects, each holding the contents of one file; one tree object that lists the contents of the directory and records which file names map to which blobs; and one commit object that holds the pointer to that root tree and the commit metadata.

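If you make changes and commit again, the new commit object contains a pointer to the commit that came immediately before it (its parent). After two more commits, the repository history is a chain in which each commit points back to its parent.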
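The parent pointers form a linked list that can be walked from the newest commit back to the first. The sketch below uses made-up commit ids (c1, c2, c3) and a plain dictionary in place of the object database.

```python
def history(commits, start):
    """Yield commit ids from start back to the root by following parents."""
    current = start
    while current is not None:
        yield current
        current = commits[current]["parent"]


# Hypothetical history of three commits; the first one has no parent.
commits = {
    "c1": {"parent": None, "message": "first"},
    "c2": {"parent": "c1", "message": "second"},
    "c3": {"parent": "c2", "message": "third"},
}

print(list(history(commits, "c3")))  # ['c3', 'c2', 'c1']
```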

What is a branch

A branch in Git is simply a lightweight, movable pointer to a commit object. Git has traditionally used master as the default branch name. After a few commits, you have a master branch that points to the last commit you made, and it moves forward automatically with each new commit.

How does Git create a new branch? It creates a new branch pointer to the commit object you are currently on.

Because a branch in Git is really just a file containing the 40-character SHA-1 checksum of the commit it points to, creating a branch is as simple as writing 41 bytes to a file (40 characters plus a newline).
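To make the size concrete, here is a sketch that writes such a branch file by hand. create_branch is a hypothetical helper that operates on a temporary directory laid out like a Git directory. It only illustrates the file format and is not a substitute for git branch.

```python
import os
import tempfile


def create_branch(git_dir, name, commit_id):
    """Write refs/heads/<name> containing the commit id and a newline."""
    path = os.path.join(git_dir, "refs", "heads", name)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    # Binary mode so that the newline is exactly one byte on every platform.
    with open(path, "wb") as f:
        f.write((commit_id + "\n").encode("ascii"))
    return path


with tempfile.TemporaryDirectory() as git_dir:
    path = create_branch(git_dir, "test",
                         "08c0d79e1721c7a7297fefef65f9c863bd696e39")
    size = os.path.getsize(path)

print(size)  # 41
```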

Merge and rebase

In general, the simplest workflow is to keep stable code on the master branch, develop new features on a feature branch (or use one to review and test other people's contributions), merge the feature branch into master, and finally delete the feature branch.

The easiest way to integrate branches is the merge command, which performs a three-way merge between the latest snapshots of the two branches (C3 and C4) and their most recent common ancestor (C2).

Instead of just moving the branch pointer forward, Git creates a new snapshot from the result of the three-way merge and automatically creates a new commit (C5) that points to it. This is called a merge commit, and it is special because it has more than one parent.
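The decision rule behind a three-way merge can be sketched at the level of whole files. This is a deliberate simplification: real Git also merges changes inside a file, while this toy version treats any file changed differently on both sides as a conflict. The function name and the snapshot dictionaries are made up. The labels C2, C3, and C4 follow the text above.

```python
def three_way_merge(base, ours, theirs):
    """Merge {path: content} snapshots using their common ancestor."""
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            result = o            # both sides agree
        elif o == b:
            result = t            # only their side changed the file
        elif t == b:
            result = o            # only our side changed the file
        else:
            conflicts.append(path)  # both sides changed it differently
            continue
        if result is not None:    # None means the file was deleted
            merged[path] = result
    return merged, conflicts


c2 = {"a.txt": "1", "b.txt": "1"}                      # common ancestor
c4 = {"a.txt": "2", "b.txt": "1"}                      # our branch
c3 = {"a.txt": "1", "b.txt": "1", "feature.txt": "x"}  # their branch

merged, conflicts = three_way_merge(c2, c4, c3)
print(merged)     # {'a.txt': '2', 'b.txt': '1', 'feature.txt': 'x'}
print(conflicts)  # []
```

The common ancestor is what lets the merge tell which side changed a file. Without it, a difference between the two branches could not be attributed to either one.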

There is another option: you can take the patch of the change introduced in C3 and reapply it on top of C4. In Git, this is called rebasing. With the rebase command, you can take all the changes committed on one branch and replay them on another.

It works by going to the common ancestor of the two branches (the one you are on and the one you are rebasing onto), extracting the diff introduced by each commit on your branch, saving those diffs to temporary files, resetting the current branch to the same commit as the branch you are rebasing onto, and finally applying each diff in turn.

Whether you rebase or do a three-way merge, the snapshot pointed to by the final commit is the same; only the commit history differs. Rebasing replays the changes from one line of work onto another in the order they were made, whereas merging takes the two end points and joins them together.
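The replay step can be sketched with a toy model in which each commit is reduced to the set of files it changes. The names are made up, file deletions and conflicts are not modelled, and the merged snapshot is written out by hand for this simple case.

```python
def apply_change(snapshot, change):
    """Return a new snapshot with the changed files overwritten or added."""
    result = dict(snapshot)
    result.update(change)
    return result


def rebase(onto_snapshot, changes):
    """Replay each change in order on top of onto_snapshot."""
    snapshots = []
    current = onto_snapshot
    for change in changes:
        current = apply_change(current, change)
        snapshots.append(current)  # one new commit per replayed change
    return snapshots


c2 = {"a.txt": "1"}                    # common ancestor
c4 = apply_change(c2, {"a.txt": "2"})  # branch being rebased onto
feature_changes = [{"feature.txt": "x"}]  # what C3 changed relative to C2

rebased = rebase(c4, feature_changes)
merged_snapshot = {"a.txt": "2", "feature.txt": "x"}  # three-way merge result

print(rebased[-1])                     # {'a.txt': '2', 'feature.txt': 'x'}
print(rebased[-1] == merged_snapshot)  # True
```

The final snapshot is the same either way. What differs is that the rebased history is a single line of new commits on top of C4, with no merge commit.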

When you rebase, you abandon some existing commits and create new ones that are similar but different. If you push commits somewhere, others pull them and base work on them, and you then rewrite those commits with git rebase and push again, your collaborators will have to re-merge their work, and things will get messy when you try to pull their work back into yours. Never rebase commits that have already been pushed to a public repository.

Reference

Pro Git (Chinese Version)