Git subtree usage and FAQ analysis

Anyone with a few years of front-end development experience has run into the question of how to share code across projects once a project has been split up. There are generally four options:

NPM package sharing
DLL sharing
submodule
subtree

1. Comparison of schemes

(1) NPM package sharing

The easiest way to do this is through NPM package sharing, which has the following disadvantages in practice:

Management difficulties

- Single-owner release model

The owner carries a heavy workload, because the owner has to manage the order in which code goes live and make sure each higher version contains the code of the lower versions. In practice, because of scheduling, changes handed to the owner later often have to go live first, which demands a lot of the owner. In addition, developers have to coordinate every release with the owner, which is a non-trivial communication cost.

- Developers publish on their own

Without an automated process, the containment relationship between versions is hard to manage, and a package released by A can easily be missing B's code, causing problems in production.

Publish first, test later

To deploy to the test environment, the package must first be published. Because of this backwards order, you end up temporarily changing the referenced version number and performing other workarounds, which are tedious and error-prone.

(2) DLL sharing

DLL sharing means packaging the common module as a separate project. Consumers reference the resulting bundle through a script tag, and the module exposes its methods to them through global variables. Compared with the other approaches, its biggest advantage is that the common module is upgraded once and the upgrade takes effect for every consumer. But this model has the following problems:

- The DLL module must be bundled, and after bundling, common packages such as React and Lodash end up bundled repeatedly. In addition, the whole JS file is loaded at once, so it cannot be loaded on demand. These two problems can be addressed by configuring webpack externals and splitting the bundle, so the approach is still worth a try.
- The common module is loaded at runtime, so the editor cannot see any definitions of the module in the consuming project. Completion and checking in the editor do not work, which reduces development efficiency. If you use TS, you can write TS definitions in the common module and, during development, reference them through npm link, which solves this problem to a certain extent.

(3) Submodule

Submodule is used as a replacement for the NPM package. The principle is that one Git project (the parent project) stores the commit ID of another Git project (the child project). The submodule command then synchronizes the code at that commit ID into the parent project.
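As a rough illustration of this principle, the sketch below builds two throwaway repositories and shows that the parent stores only a commit ID for the submodule directory. The repository names (lib, app), the directory name shared, and the commit identity are invented for the demo. The protocol.file.allow=always setting is only there because the demo clones from a local path, which recent Git versions block by default.

```shell
#!/bin/sh
# Demo only: 'lib', 'app', 'shared' and the identity below are hypothetical names.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# The shared (child) project.
git init -q "$work/lib"
echo "export const x = 1" > "$work/lib/index.ts"
git -C "$work/lib" add index.ts
git -C "$work/lib" commit -q -m "lib: first commit"
lib_head=$(git -C "$work/lib" rev-parse HEAD)

# The parent project, which mounts lib at ./shared as a submodule.
git init -q "$work/app"
git -C "$work/app" -c protocol.file.allow=always submodule --quiet add "$work/lib" shared
git -C "$work/app" commit -q -m "app: add shared as submodule"

# The parent tree holds a 'commit' entry (mode 160000) pointing at lib's commit ID.
recorded=$(git -C "$work/app" ls-tree HEAD shared | awk '{print $3}')
echo "lib HEAD:        $lib_head"
echo "recorded in app: $recorded"
```

Both printed IDs should be the same, since the parent tracks the child purely by commit ID.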

Because submodule simply places the common module code in a directory of the parent project, where it runs together with the parent project, it avoids the problems caused by referencing something that lives outside the project, as NPM packages and DLL bundles do. Submodule is a Git feature, and the whole development process consists of plain Git operations, so the trouble of NPM package management disappears as well. Compared with NPM packages, submodule has clear advantages. So far only a few small problems have come up in practice:

The operation is slightly tedious

The parent project records the commit ID of the submodule. If that ID changes, you have to run the submodule command in the parent project to synchronize the code at the new commit ID to your local copy, which is an extra command. Running git config --global submodule.recurse true makes Git update submodules automatically every time you switch branches or pull code. In addition, modifying submodule code means operating on both the parent and the child Git repository, which is more complicated than working in a single project.

Learning costs

It takes practice to understand the logic of submodules, and time to roll it out across the team.

(4) Subtree

Subtree has a lot in common with submodule, and the two are often compared online.

submodule

The parent project stores the child project's Git address, the commit ID the current code is bound to, and so on. A command synchronizes the code at that commit ID into the parent project, and Git operations on the parent project and the child project are carried out separately.

subtree

The parent project records no information about the child project. Every command must carry the Git address, branch name and so on. Depending on the command, the code of the specified branch is copied into the specified directory of the project, or the code in the specified directory is pushed back to the subtree repository. During development you can ignore the existence of the subtree completely: develop and commit as if there were only one project, and run the subtree push command only when you need to synchronize the code to the subtree repository.

On the surface, subtree has the benefits of submodule while being simpler to operate and easier to understand. The team therefore chose subtree as its scheme for sharing code across projects. In practice it is not that simple, as the later sections show.

2. Using subtree

The following commands are executed in the my-app project, using subtrees/subtreeA as the subtree directory. The subtree repository in the examples is subtree-project.

Git subtree add

Complete command:

git subtree add --prefix=subtrees/subtreeA http://xxxxxxxxxxxx/subtree-project.git master

Usage Scenarios:

The subtree is added for the first time.

Actual operation:

This command copies the code from the master branch of subtree-project into the subtrees/subtreeA directory of my-app and records the operation as a new commit in my-app.

Git subtree pull

Complete command:

git subtree pull --prefix=subtrees/subtreeA http://xxxxxxxx/subtree-project.git master

Usage Scenarios:

The subtree code has been updated and needs to be synchronized to this project.

Actual operation:

This command merges the new commits from the master branch of subtree-project into my-app. In this example, after the pull the subtrees/subtreeA directory of my-app contains an additional file, firstFile.

Git subtree push

Complete command:

git subtree push --prefix=subtrees/subtreeA http://xxxxxxxx/subtree-project.git master

Usage Scenarios:

The subtree contents were modified in the parent project, and the changes need to be synchronized to the subtree Git repository.

Actual operation:

Start by adding three files to my-app:

/subtrees/subtreeA/secondFile.ts

/notSubtreeFileA.ts

/notSubtreeFileB.ts

Commit /subtrees/subtreeA/secondFile.ts and /notSubtreeFileA.ts together, with the message "subtree file commit".

Commit /notSubtreeFileB.ts on its own, with the message "no subtree file commit".

After running the subtree push command, you will find that "subtree file commit" has been synchronized to subtree-project, with the same commit message as in my-app. "no subtree file commit" contains no files in the subtree directory, so the whole commit is ignored.
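The add, pull and push commands above all repeat the same prefix, repository address and branch. One way to avoid retyping them is a small helper, sketched here. The function name subtree_cmd and the variable names are hypothetical, and the URL is the placeholder used in this article. The helper only prints the command so it can be reviewed before being run.

```shell
#!/bin/sh
# Hypothetical helper: keeps the subtree coordinates in one place so they
# do not have to be retyped for every add / pull / push.
SUBTREE_PREFIX="subtrees/subtreeA"
SUBTREE_REPO="http://xxxxxxxxxxxx/subtree-project.git"   # placeholder URL
SUBTREE_BRANCH="master"

# Prints the full git subtree command for the given action (add, pull or push).
subtree_cmd() {
	case "$1" in
	add|pull|push)
		printf 'git subtree %s --prefix=%s %s %s\n' \
			"$1" "$SUBTREE_PREFIX" "$SUBTREE_REPO" "$SUBTREE_BRANCH"
		;;
	*)
		echo "usage: subtree_cmd add|pull|push" >&2
		return 1
		;;
	esac
}

subtree_cmd pull
```

To execute the command rather than print it, the output can be piped to sh, for example subtree_cmd pull | sh.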

Git subtree split

Complete command:

git subtree split --prefix=subtrees/subtreeA --rejoin

Usage Scenarios:

Improve subtree push efficiency.

Actual operation:

As the subtree push example shows, subtree push actually walks through every commit of the project, picks out the files under the subtree directory from each commit, and syncs them to the subtree repository. When the project has many commits, this still ran on Git versions before 2.19, although slowly; from version 2.19 on, an error is thrown outright. After running the split command, the commit history of my-app shows that a new branch has been created. It follows the same logic as subtree push: only the commits that touch the subtree directory are picked out. Finally, that branch is merged back into the original branch, producing a "Split XXXXX" commit. On subsequent subtree push runs, Git only walks the commits made after the split, which reduces the number of commits searched and improves push performance.

3. Reading the subtree source code

There is very little material about subtree online, and almost all of it covers only basic usage. There are practically no articles about troubleshooting or about how subtree works internally. The following is a summary of what I learned from reading the subtree source code while analyzing our problems.

How exactly does subtree traverse commits

The parent project stores no information about the subtree repository, which is why every command needs the Git address and branch name. So how does subtree push know to stop traversing at a split or add commit? How does it know which commit is a split? On a Mac the subtree script is located at /Library/Developer/CommandLineTools/usr/libexec/git-core/git-subtree, and the script can be debugged directly.

cmd_push () {
	if test $# -ne 2
	then
		die "You must provide <repository> <ref>"
	fi
	ensure_valid_ref_format "$2"
	if test -e "$dir"
	then
		repository=$1
		refspec=$2
		echo "git push using: " "$repository" "$refspec"
		localrev=$(git subtree split --prefix="$prefix") || die
		git push "$repository" "$localrev":"refs/heads/$refspec"
	else
		die "'$dir' must already exist. Try 'git subtree add'."
	fi
}

The entry point of the subtree push command is the cmd_push function. You can see that it actually runs subtree split and pushes the output of that command to the subtree project's Git repository, i.e. subtree push = subtree split + git push. Experiments show that, without the --rejoin parameter, the output of subtree split is a commit ID; in other words, the command generates new commits. The earlier subtree split results show that subtree split is the step that extracts the commits touching the subtree directory. --branch puts the extracted commits on a specified branch, while --rejoin, instead of creating a new branch, joins the extracted commits back into the current branch.

This analysis still does not answer our question: how does subtree push find the split commit? For that we need to look at the subtree split code. The split function is long, so only part of it is shown here:

cmd_split () {
	debug "Splitting $dir..."
	cache_setup || exit $?

	if test -n "$onto"
	then
		debug "Reading history for --onto=$onto..."
		git rev-list $onto |
		while read rev
		do
			# the 'onto' history is already just the subdir, so
			# any parent we find there can be used verbatim
			debug "  cache: $rev"
			cache_set "$rev" "$rev"
		done
	fi

	unrevs="$(find_existing_splits "$dir" "$revs")"

	# We can't restrict rev-list to only $dir here, because some of our
	# parents have the $dir contents the root, and those won't match.
	# (and rev-list --follow doesn't seem to solve this)
	grl='git rev-list --topo-order --reverse --parents $revs $unrevs'
	revmax=$(eval "$grl" | wc -l)
	revcount=0
	createcount=0
	extracount=0
	eval "$grl" |
	while read rev parents
	do
		process_split_commit "$rev" "$parents" 0
	done || exit $?


The two parts of this code that deal with splits are the call to the find_existing_splits function and the git rev-list call that uses its result.

find_existing_splits () {
	debug "Looking for prior splits..."
	dir="$1"
	revs="$2"
	main=
	sub=
	local grep_format="^git-subtree-dir: $dir/*\$"
	if test -n "$ignore_joins"
	then
		grep_format="^Add '$dir/' from commit '"
	fi
	git log --grep="$grep_format" \
		--no-show-signature --pretty=format:'START %H%n%s%n%n%b%nEND%n' $revs |
	while read a b junk
	do
		case "$a" in
		START)
			sq="$b"
			;;
		git-subtree-mainline:)
			main="$b"
			;;
		git-subtree-split:)
			sub="$(git rev-parse "$b^0")" ||
			die "could not rev-parse split hash $b from commit $sq"
			;;
		END)

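find_existing_splits uses git log --grep to search for commits whose message contains a git-subtree-dir line; the same messages also carry git-subtree-mainline and git-subtree-split lines. Those commits are the split commits. The commit generated by subtree add has a message similar to the one generated by subtree split, so the same rule finds add commits as well.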
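To make the message format more concrete, here is a small sketch that pulls the three values out of a sample message. The sample follows the shape that, as far as I can tell, git-subtree writes for a --rejoin commit. The two hashes are made-up placeholders, and the helper name trailer_value is hypothetical.

```shell
#!/bin/sh
# Sample commit message in the shape described above. The two hashes are
# made-up placeholders, not real commits.
msg="Split 'subtrees/subtreeA/' into commit '2222222222222222222222222222222222222222'

git-subtree-dir: subtrees/subtreeA
git-subtree-mainline: 1111111111111111111111111111111111111111
git-subtree-split: 2222222222222222222222222222222222222222"

# Hypothetical helper: prints the value of one "key: value" line of the message.
trailer_value() {
	printf '%s\n' "$msg" | sed -n "s/^$1: //p"
}

echo "dir:      $(trailer_value git-subtree-dir)"
echo "mainline: $(trailer_value git-subtree-mainline)"
echo "split:    $(trailer_value git-subtree-split)"
```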

The message of a split commit records two commit IDs, one after git-subtree-mainline and one after git-subtree-split. After finding a split commit, find_existing_splits reads these two commit IDs from the message and outputs them to be used as parameters of git rev-list. Now look at what the whole command, git rev-list --topo-order --reverse --parents $revs $unrevs, does: it outputs all commits that are reachable from $revs and not reachable from $unrevs, i.e. the commits between $unrevs and $revs. $revs is assigned when the script starts executing and is the commit at which the command is run, and $unrevs holds the commit IDs found by find_existing_splits. This is how subtree push is able to traverse only the commits between the current commit and the last split or add.
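The reachability rule can be tried out in a throwaway repository. The sketch below is only an illustration under simplifying assumptions: the history is linear, the commit subjects c1 to c5 are invented, and c3 stands in for a commit that find_existing_splits would have found.

```shell
#!/bin/sh
# Demo only: a throwaway linear history c1..c5; c3 stands in for the commit
# found by find_existing_splits.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
git init -q "$repo"
for n in 1 2 3 4 5
do
	git -C "$repo" commit -q --allow-empty -m "c$n"
done

revs=HEAD
unrevs="^$(git -C "$repo" rev-parse HEAD~2)"   # exclude c3 and its ancestors

# Same rev-list flags as in cmd_split; print the subject of each selected commit.
selected=$(git -C "$repo" rev-list --topo-order --reverse --parents $revs $unrevs |
	while read rev parents
	do
		git -C "$repo" log -1 --format=%s "$rev"
	done)
echo "$selected"
```

Only c4 and c5 are listed: they are reachable from HEAD but not from c3.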

After this round of reading, we basically understand what subtree push does, the rules by which it traverses commits, and how subtree finds subtree add and subtree split commits. This gives us leads for the problems below.

4. Common problems with subtree

(1) subtree push or subtree split fails with Segmentation fault

A common cause of a Segmentation fault is that the program accesses more memory than the system allows it. Given what subtree push and subtree split do, the likely cause is that too many commits are being traversed. There are several possible reasons.

1. So many commits have accumulated between the last split and the current commit that the memory limit is exceeded.

Solutions:
(1) First back up the current branch. Then delete the subtree directory on the current branch and run subtree add again, which generates a new subtree add commit and restores the subtree to the state of master. Then cherry-pick your changes to the subtree from the backup branch, and finally run subtree push.
(2) On a Mac, run ulimit -s unlimited to remove the stack size limit. subtree split then completes successfully, only slowly: it still traverses a large number of commits, but it no longer exceeds the stack size limit. Setup methods for Windows can also be found online; we have not tested them.

2. A branch that contains no subtree add or subtree split commit is merged into the current branch

This can easily happen when a project first introduces subtree. It comes from a commit tree like the following: subtree add is executed at some commit on test1, test2 was checked out before the subtree add, and the two branches are merged at the end. When the traversal reaches the merge commit, it continues in two directions. Along the trunk it reaches the subtree add commit, so the traversal ends there. But test2 was branched off before the subtree add, so its history contains no subtree add commit. The traversal bypasses the subtree add, walks through all commits, and ends in a Segmentation fault.

When our project had just introduced subtree, colleagues' feature branches had been checked out before the subtree was introduced. Once those branches were merged into master, this situation occurred and subsequent subtree push runs failed.

Solution: after introducing subtree, ask everyone to run git rebase master on the branches they are developing.
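One way to check whether a branch still needs this rebase is to test whether it already contains the add commit. The sketch below simulates the situation in a throwaway repository. The branch name feature, the file name and the hand-written "Add ..." commit are hypothetical stand-ins for a real git subtree add commit, and the helper contains_subtree_commit is my own naming, not part of git-subtree.

```shell
#!/bin/sh
# Demo only: branch names and the hand-written "Add ..." commit are hypothetical
# stand-ins for a real 'git subtree add' commit.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "init"
git branch feature                      # checked out before subtree was introduced
git commit -q --allow-empty -m "Add 'subtrees/subtreeA/' from commit 'abc'

git-subtree-dir: subtrees/subtreeA"
git checkout -q feature
echo "work" > feature.txt
git add feature.txt
git commit -q -m "feature work"

# Locate the add commit on master via its message, as find_existing_splits does.
add_commit=$(git log master --grep='^git-subtree-dir: subtrees/subtreeA/*$' --format=%H -n 1)

# Succeeds if the current branch already contains that commit.
contains_subtree_commit() {
	git merge-base --is-ancestor "$add_commit" HEAD
}

if contains_subtree_commit; then before=yes; else before=no; fi
git rebase -q master
if contains_subtree_commit; then after=yes; else after=no; fi
echo "before rebase: $before, after rebase: $after"
```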

3. The commit right before a split commit is a merge commit

If the history looks like this, with a merge commit immediately before the split commit, the split has no effect. After analyzing the script code, I think this should be a Git bug. Going back to the code read above: when find_existing_splits finds a split commit, it calls try_remove_previous and outputs the result, which has the form ^commitId^. This syntax is evaluated as ^(commitId^). (commitId^) is the first parent of the commit, and ^commitId means "not reachable from commitId". Reachable is the word used in the official git rev-list documentation: the commits reachable from a commit are the commit itself and all of its ancestors (its parents, the parents' parents, and so on), and the commits not reachable from it are the later commits. Taken together, ^commitId^ selects the commits that come after the first parent of commitId.

try_remove_previous () {
	if rev_exists "$1^"
	then
		echo "^$1^"
	fi
}

When subtree push reaches a split commit, it does not simply end the traversal there. Instead, it reads the commit IDs recorded after git-subtree-mainline and git-subtree-split in the split commit message and takes the first parent of each as a cutoff point.

The problem lies in commitId^. If that commit is a merge commit, it has two parents. Taking only the first parent lets the branch of the other parent bypass the split: there is no cutoff point on that side, and the whole branch is traversed.

The resulting traversal goes as follows. The split is found. The git-subtree-mainline commit recorded in its message is a merge commit, and its first parent (no subtree file commit) is taken as the cutoff point. On the other side, no cutoff point is set for the merge's second parent, so the traversal continues all the way to init (more precisely, to the cutoff point of the previous split or add commit). The end result is that this split is almost useless.
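The effect of cutting off only the first parent can be reproduced in a small throwaway repository. This is a sketch under simplified assumptions: the commit names and the helper function commit are invented, and the merge commit plays the role of the commit recorded after git-subtree-mainline.

```shell
#!/bin/sh
# Demo only: a throwaway history in which 'merge' has two parents.
#   first parent:  init - main1
#   second parent: init - side1 - side2
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/main

# Hypothetical helper: one commit that adds a file named after its message.
commit() {
	echo "$1" > "$1.txt"
	git add "$1.txt"
	git commit -q -m "$1"
}

commit init
git branch side
commit main1
git checkout -q side
commit side1
commit side2
git checkout -q main
git merge -q --no-ff -m merge side
merge=$(git rev-parse HEAD)

# Same shape as the output of try_remove_previous: "^<commit>^"
unrevs="^$merge^"

# Commits still traversed: the merge plus everything on the second-parent side.
traversed=$(git rev-list --topo-order --reverse HEAD "$unrevs" |
	while read rev
	do
		git log -1 --format=%s "$rev"
	done)
echo "$traversed"
```

Here main1 and init are cut off through the first parent, while side1 and side2 are still walked. In a real project the second-parent side can reach far back into history, which is what makes the split ineffective.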

5. Summary

- As a code sharing scheme, the overall ranking is subtree = submodule > DLL > NPM.
- Subtree finds split and add commits via their commit messages, so the commit messages generated by the split and add commands must not be modified.
- The commit right before a split commit must not be a merge commit, otherwise the split has almost no effect.
- All problems can be solved by deleting the subtree and adding it again.