Git Bisect command

Author: Zhou Kai – Principal of Gitee

Abstract: When Git 2.33.0 was released, an incompatibility appeared between the client and our server. While troubleshooting it, I used the git bisect command again, and in solving the problem I refreshed my understanding of the command. This article walks through that scenario and shares some thoughts on bisect and on other ways it can be used.

Background

Git released 2.33.0 on August 17th.

Gitee.com/mirrors/git…

Unfortunately, shortly after the 2.33.0 release, some users reported that Git 2.33.0 could not perform normal clone/fetch/push operations over the SSH protocol. The symptoms were as follows:

Cloning into 'xxxxxx'...
fetch-pack: unexpected disconnect while reading sideband packet
fatal: early EOF
fatal: fetch-pack: invalid index-pack output

Seeing this message, I immediately recalled that the Git 2.33.0 release notes mention fetch- and sideband-related changes, so I checked them and found the following two entries:

* "git fetch" over protocol v2 left its side of the socket open after it finished speaking, which unnecessarily wasted the resource on the other side. (merge ae1a7eefff jk/fetch-pack-v2-half-close-early later to maint).
* The side-band demultiplexer that is used to display progress output from the remote end did not clear the line properly when the end of line hits at a packet boundary, which has been corrected.

However, to pin down the source of the problem precisely, I used the git bisect command to test and find the specific cause, so that the problem could be better investigated, analyzed and solved.

Git Bisect introduction

Git Bisect: git-scm.com/docs/git-bi…

The git bisect command uses binary search to find the commit that introduced a problem.

The quotation above is from Ruan Yifeng's blog: www.ruanyifeng.com/blog/2018/1…

Simply put, we give bisect a range. It picks the midpoint of the current range, we test at that midpoint and tell bisect whether it is a good commit or a bad commit, and bisect narrows the search accordingly. This quickly locates the offending commit so that we can focus on it.
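The narrowing loop described above can be sketched on a simplified, linear history. This is only an illustration under assumptions: simulate_bisect and its commit numbering are hypothetical, and real git bisect works on the commit graph rather than on a numbered list.

```shell
# Simulate bisecting a linear history of commits numbered 1..n.
# Commit 0 stands for the known-good start, commit n for the known-bad end.
# first_bad is the (hypothetical) commit that introduced the bug.
simulate_bisect() {
    n=$1
    first_bad=$2
    good=0        # newest commit known to be good
    bad=$n        # oldest commit known to be bad
    steps=0
    while [ $((bad - good)) -gt 1 ]; do
        mid=$(( (good + bad) / 2 ))
        steps=$((steps + 1))
        if [ "$mid" -ge "$first_bad" ]; then
            bad=$mid      # the test fails here: like "git bisect bad"
        else
            good=$mid     # the test passes here: like "git bisect good"
        fi
    done
    echo "first bad commit: $bad, found in $steps tests"
}
```

For example, simulate_bisect 8 3 prints "first bad commit: 3, found in 3 tests": each test halves the remaining range.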

Troubleshoot problems

Since the problem appeared with Git 2.33.0, the range to bisect is v2.32.0 to v2.33.0. Finding the commit that introduced the problem lets us locate the cause precisely instead of guessing blindly or relying on intuition.

Enter bisect mode

Enter bisect mode with git bisect start, passing the bad end of the range first and the good end second: git bisect start v2.33.0 v2.32.0

git bisect start takes the range, and bisect tells us that there are 304 commits between the two versions, so it takes about 8 steps to locate a specific commit. That is the benefit of binary search.

Start the first test

We build Git from source with the make command; adding the -j4 parameter speeds up the build. Once the build finishes, the wrapper that Git provides, ./bin-wrappers/git, runs the freshly built Git, so we can test with it:

The clone still fails, so the commit that caused the problem is this commit or an earlier one. We mark this commit as bad with git bisect bad:

Git tells us that about seven more tests are needed to locate the commit that caused the problem. If a commit turns out to be fine, we mark it with git bisect good instead.

Automated Bisect

We just need to repeat the steps above, but there is obviously no need to do so by hand. As the new generation of "migrant workers", we should automate everything that can be automated, both to save time and to avoid human error. git bisect run does exactly this:

git bisect run my_script arguments

The bisect command uses the exit code of my_script to determine whether the current commit is good or bad.

Note that the script (my_script in the above example) should exit with code 0 if the current source code is good/old, and exit with a code between 1 and 127 (inclusive), except 125, if the current source code is bad/new.

To put it simply:

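If the script exits with 0, the current version is good
If it exits with 1 to 124, 126 or 127, the current version is bad
If it exits with 125, the current version cannot be tested (for example, the build fails) and is skipped
Any other exit code aborts the bisect process

So let's write a script based on these rules, but first we need to check what exit code a failed clone returns. It is 128, which the script below checks for.

OK, let's do it with a shell script: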
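The rules above can be written down as a small shell function. This is a sketch for illustration only: bisect_verdict is a hypothetical helper, not part of Git, and it merely mirrors how the git bisect documentation says an exit code is interpreted, including the rule that codes above 127 abort the bisect.

```shell
# Map a test script's exit code to what "git bisect run" does with it.
bisect_verdict() {
    code=$1
    if [ "$code" -eq 0 ]; then
        echo good      # the commit is marked good
    elif [ "$code" -eq 125 ]; then
        echo skip      # the commit cannot be tested and is skipped
    elif [ "$code" -ge 1 ] && [ "$code" -le 127 ]; then
        echo bad       # the commit is marked bad
    else
        echo abort     # anything else stops the bisect run
    fi
}
```

For example, bisect_verdict 128 prints abort, which matters for any script that passes the raw exit code of a failed git clone straight through.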

#!/bin/zsh

# Build, then clone into a directory named after the current commit ID
# each time, to avoid conflicts between runs.
make -j4 && ./bin-wrappers/git clone [email protected]:/kesin/taskover.git "$(git log -n 1 --pretty=format:%H)"
s=$?
if [ $s -eq 0 ]; then
    # Normal
    exit 0;
elif [ $s -eq 128 ]; then
    # Clone failed
    exit 1;
else
    # Anything else XD
    exit 128;
fi
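One possible refinement of the script, offered only as a sketch: a commit that cannot be built is reported with 125 so that bisect skips it instead of aborting. The function name check_commit and its two command-string parameters are assumptions made for illustration, not part of the original script.

```shell
# check_commit BUILD_CMD TEST_CMD
# Returns an exit code that "git bisect run" understands.
check_commit() {
    build_cmd=$1   # for example: "make -j4"
    test_cmd=$2    # for example: the clone command used above
    # A commit that does not build cannot be judged: ask bisect to skip it.
    sh -c "$build_cmd" || return 125
    status=0
    sh -c "$test_cmd" || status=$?
    case $status in
        0)   return 0 ;;     # test passed: good
        128) return 1 ;;     # the failure we are hunting: bad
        *)   return 128 ;;   # unexpected result: abort and investigate
    esac
}
```

A script handed to git bisect run could call check_commit with its build and clone commands and then exit with the returned status.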

Then we hand this script to git bisect run to perform the binary search automatically.

The bisect command runs the build and clone for each candidate commit and narrows the commit range based on the script's exit code.

Based on the script we wrote, bisect run decides that the current commit is bad and automatically moves on to the next step. Of course, good commits also turn up during the search, and bisect narrows the range in the other direction according to the exit code we return.

Eventually, bisect locates the first commit with this problem:

ae1a7eefffe60425e6bf6a2065e042ae051cfb6c is the first bad commit

The bisect command also prints out the details of the commit that caused the problem, and we can then analyze our problem based on the commit changes.
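The whole flow, from git bisect start to the "first bad commit" result, can be tried safely on a throwaway repository. The sketch below is not the Git source tree from this article: the 16 commits, the status file and the choice of commit 11 as the culprit are all made up for the demonstration, and bisect_demo is a hypothetical name.

```shell
# Build a temporary repository with 16 commits in which commit 11
# starts writing "fail" into the file "status", then bisect it.
bisect_demo() (
    set -e
    dir=$(mktemp -d)
    trap 'rm -rf "$dir"' EXIT
    cd "$dir"
    git init -q . 2>/dev/null
    git config user.name demo
    git config user.email demo@example.invalid
    i=1
    while [ "$i" -le 16 ]; do
        if [ "$i" -ge 11 ]; then echo fail > status; else echo pass > status; fi
        echo "$i" > counter          # guarantees every commit has a change
        git add status counter
        git commit -q -m "commit $i"
        i=$((i + 1))
    done
    first=$(git rev-list --max-parents=0 HEAD)
    # Bad end first (HEAD), good end second (the root commit).
    git bisect start HEAD "$first" >/dev/null
    # grep exits 0 (good) while "status" contains pass, 1 (bad) otherwise.
    git bisect run grep -q pass status >/dev/null
    # When the run ends, refs/bisect/bad points at the first bad commit.
    git log -1 --format=%s refs/bisect/bad
)
```

Running bisect_demo prints commit 11, the subject of the first commit whose status file no longer contains pass.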

Analyze and solve problems

Above we used the bisect command to find the offending commit: gitee.com/mirrors/git…

/*
 * this is the final request we'll make of the server;
 * do a half-duplex shutdown to indicate that they can
 * hang up as soon as the pack is sent.
 */
close(fd[1]);
fd[1] = -1;

When Git uses protocol v2 over SSH, the client still has a series of local operations to perform after it receives the data, and these operations no longer need the connection to the server. Therefore, once the client has sent its final request, it sends a FIN to the server and the connection becomes half-closed (what the commit comment calls a half-duplex shutdown). The server can then close the connection as soon as it has finished sending the data, without waiting for the client's lengthy local operations, so network resources are not held unnecessarily.

Knowing the cause, we analyzed Gitee's SSH dispatch proxy and found that it closed the SSH connection immediately after receiving the client's FIN. This produced the problem above: the connection was dropped before the client had finished receiving the data.

The solution is also very simple. After receiving the FIN from the client, the proxy no longer closes the network connection immediately; it closes it only after the data has been sent.

Thoughts on using Git Bisect

In a world where organizations are promoting R&D efficiency, DevOps culture and tooling, can bisect's logic be applied to the entire process?

For example, when a case fails in automated testing, the run simply reports the failure. Even though the report names the failing case and its input and output, it would undoubtedly be valuable for the organization to also use the bisect command to find, automatically, the first commit that introduced the problem:

When a case fails, the owner of the bad case can be notified precisely and fix it quickly
In centralized testing, excessive notifications that distract unrelated people can be avoided
Unclear responsibility, buck-passing and bad moods that hurt the team atmosphere can be avoided
And so on.

Thoughts on algorithms

Git Bisect uses the binary search idea we are all familiar with. In the Git source code, binary search is also used to look up objects in pack index (.idx) files: a neat fan-out partition by the first byte of the hash, combined with binary search, quickly locates an object's offset. Git also uses the SHA-1 hash algorithm, several diff algorithms, a lot of recursion, and so on.

/* hash-lookup.c */
int bsearch_hash(const unsigned char *hash, const uint32_t *fanout_nbo,
                 const unsigned char *table, size_t stride, uint32_t *result)
{
    uint32_t hi, lo;

    hi = ntohl(fanout_nbo[*hash]);
    lo = ((*hash == 0x0) ? 0 : ntohl(fanout_nbo[*hash - 1]));

    while (lo < hi) {
        unsigned mi = lo + (hi - lo) / 2;
        int cmp = hashcmp(table + mi * stride, hash);

        if (!cmp) {
            if (result)
                *result = mi;
            return 1;
        }
        if (cmp > 0)
            hi = mi;
        else
            lo = mi + 1;
    }

    if (result)
        *result = lo;
    return 0;
}

But in day-to-day coding, how many developers can honestly say: while I code, I have the model in mind, I have the algorithm in mind, and I write efficient code with sound logic?

I think the answer is self-evident.

Studying open source projects is a great way to learn good practices and ideas from a large body of good code. From studying to contributing, the result matters less than the process.

Finally

Make good use of tools and keep thinking. Come to Gitee (gitee.com) to study open source projects and contribute code.