Gopher refers to north

I don't know about other people, but the word Lao Xu is most afraid of hearing is "occasionally". I won't spell out the reason; those who know, know.

Let's go straight to the panic information.

runtime error: invalid memory address or nil pointer dereference
panic(0xbd1c80, 0x1271710)
	/root/.go/src/runtime/panic.go:969 +0x175
github.com/json-iterator/go.(*Stream).WriteStringWithHTMLEscaped(0xc00b0c6000, 0x0, 0x24)
	/go/pkg/mod/github.com/json-iterator/[email protected]/stream_str.go:227 +0x7b
github.com/json-iterator/go.(*htmlEscapedStringEncoder).Encode(0x12b9250, 0xc0096c4c00, 0xc00b0c6000)
	/go/pkg/mod/github.com/json-iterator/[email protected]/config.go:263 +0x45
github.com/json-iterator/go.(*structFieldEncoder).Encode(0xc002e9c8d0, 0xc0096c4c00, 0xc00b0c6000)
	/go/pkg/mod/github.com/json-iterator/[email protected]/reflect_struct_encoder.go:110 +0x78
github.com/json-iterator/go.(*structEncoder).Encode(0xc002e9c9c0, 0xc0096c4c00, 0xc00b0c6000)
	/go/pkg/mod/github.com/json-iterator/[email protected]/reflect_struct_encoder.go:158 +0x3f4
github.com/json-iterator/go.(*structFieldEncoder).Encode(0xc002eac990, 0xc0096c4c00, 0xc00b0c6000)
	/go/pkg/mod/github.com/json-iterator/[email protected]/reflect_struct_encoder.go:110 +0x78
github.com/json-iterator/go.(*structEncoder).Encode(0xc002eacba0, 0xc0096c4c00, 0xc00b0c6000)
	/go/pkg/mod/github.com/json-iterator/[email protected]/reflect_struct_encoder.go:158 +0x3f4
github.com/json-iterator/go.(*OptionalEncoder).Encode(0xc002e9f570, 0xc006b18b38, 0xc00b0c6000)
	/go/pkg/mod/github.com/json-iterator/[email protected]/reflect_optional.go:70 +0xf4
github.com/json-iterator/go.(*onePtrEncoder).Encode(0xc002e9f580, 0xc0096c4c00, 0xc00b0c6000)
	/go/pkg/mod/github.com/json-iterator/[email protected]/reflect.go:219 +0x68
github.com/json-iterator/go.(*Stream).WriteVal(0xc00b0c6000, 0xb78d60, 0xc0096c4c00)
	/go/pkg/mod/github.com/json-iterator/[email protected]/reflect.go:98 +0x150
github.com/json-iterator/go.(*frozenConfig).Marshal(0xc00012c640, 0xb78d60, 0xc0096c4c00, 0x0, 0x0, 0x0, 0x0, 0x0)

First of all, I believe in the power of open source, so Lao Xu's first move was to look for logic holes in the business code. The colleague who wrote it is trustworthy too, so my quick guess was that some unanticipated data had hit a boundary condition. Next came the routine step of saving the scene.

As the title says, this is an occasional panic, so the scene had to be saved in whatever way fits the current technology stack. Then it was time to sit back and wait for the harvest, and the wait lasted many days. Several alerts came in, but the saved scene did not show what I expected.
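How the scene was saved is not spelled out here. One common approach in Go, shown purely as an assumption, is to recover at the top of the call and record the panic value, the input being processed, and the stack. `captureScene` and `params` are hypothetical names, not the project's code.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// captureScene is a hypothetical helper: it runs fn and, if fn panics,
// returns a report with the panic value, the input, and the stack trace.
// It returns an empty string when fn completes normally.
func captureScene(input interface{}, fn func()) (report string) {
	defer func() {
		if r := recover(); r != nil {
			report = fmt.Sprintf("panic: %v\ninput: %+v\n%s", r, input, debug.Stack())
		}
	}()
	fn()
	return ""
}

// params is a hypothetical request type with a pointer field.
type params struct{ Name *string }

func main() {
	p := params{} // Name is nil, so dereferencing it panics
	report := captureScene(p, func() { _ = len(*p.Name) })
	fmt.Println(report)
}
```

A report like this only helps when the bad state is still visible in the input at the moment of the panic, which may be why the saved scene looked normal in this case.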

This time, far from panicking, I was even a little excited. Someone once said, "Dare to question, dare to challenge authority." Once that thought took hold there was no stopping it: Lao Xu was about to contribute to the cause of open source! With a touch of anticipation, I started reading the json-iterator source code.

As soon as I started reading, I realized that the saying "When God closes one door, he opens another" was a lie. Lao Xu felt that God had closed all the doors and even all the windows. Let's see how the door was actually closed.

func (cfg *frozenConfig) Marshal(v interface{}) ([]byte, error) {
	stream := cfg.BorrowStream(nil)
	defer cfg.ReturnStream(stream)
	stream.WriteVal(v)
	if stream.Error != nil {
		return nil, stream.Error
	}
	result := stream.Buffer()
	copied := make([]byte, len(result))
	copy(copied, result)
	return copied, nil
}


// WriteVal copy the go interface into underlying JSON, same as json.Marshal
func (stream *Stream) WriteVal(val interface{}) {
	if nil == val {
		stream.WriteNil()
		return
	}
	// omit other code
}


The panic stack shows that a nil pointer dereference caused the panic, yet (*frozenConfig).Marshal already checks for nil internally (in WriteVal). At this point Lao Xu had run out of ideas and made a strategic retreat from this panic. After all, it didn't matter that much, and programmers never run out of bugs to fix. After comforting myself like this, it was much easier to accept.

In fact, I deliberately ignored this problem for a long time; after all, I had not found its root cause. It kept occurring in production until one day, I can't say which, the mood struck and I took another couple of looks. Those two looks were crucial!

func doReq() {
    req := paramsPool.Get().(*model.Params)
    // defer 1
    defer func() {
    	reqBytes, _ := json.Marshal(req)
    	// Omit other log-printing code
    }()
    // defer 2
    defer paramsPool.Put(req)
    // Initialize req, send the request, and perform other operations
}

Note:

  1. The variable names in the code above have been made generic.
  2. The actual code in the project is much more complex, but the above is the smallest version that still reproduces the problem.

The paramsPool in the code above is a variable of type sync.Pool, which you are probably familiar with. sync.Pool exists to reuse objects that have already been allocated (it is safe for concurrent use by multiple goroutines), which reduces memory allocations and GC pressure.

package main

import (
	"fmt"
	"sync"
	"unsafe"
)

type test struct {
	a string
}

var sp = sync.Pool{
	New: func() interface{} {
		return new(test)
	},
}

func main() {
	t := sp.Get().(*test)
	fmt.Println(unsafe.Pointer(t))
	sp.Put(t)
	// t1 normally gets back the object that was just Put; t2 is newly allocated.
	t1 := sp.Get().(*test)
	t2 := sp.Get().(*test)
	fmt.Println(unsafe.Pointer(t1), unsafe.Pointer(t2))
}

Running the code above shows that t and t1 have the same address, so the object was reused (sync.Pool normally hands back an object that was just Put, although it does not guarantee it). With that in mind, a second look at the doReq function above reveals the root cause.

defer 2 and defer 1 are in the wrong order!!

defer 2 and defer 1 are in the wrong order!!

defer 2 and defer 1 are in the wrong order!!
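Go runs deferred calls in last-in, first-out order, which is why the order matters in doReq. A minimal sketch of that rule, with the two defers reduced to labels (`deferOrder` is a name made up for this illustration):

```go
package main

import "fmt"

// deferOrder registers two deferred calls in the same order as doReq
// and records the order in which they actually run.
func deferOrder() []string {
	var order []string
	func() {
		// defer 1: registered first, runs last
		defer func() { order = append(order, "defer 1: json.Marshal(req)") }()
		// defer 2: registered last, runs first
		defer func() { order = append(order, "defer 2: paramsPool.Put(req)") }()
	}()
	return order
}

func main() {
	for _, step := range deferOrder() {
		fmt.Println(step)
	}
	// Output:
	// defer 2: paramsPool.Put(req)
	// defer 1: json.Marshal(req)
}
```

In doReq this means req goes back to the pool first and is serialized afterwards.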

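The Get and Put methods provided by sync.Pool are safe for concurrent use, but the object they hand out is not protected. Deferred calls run in last-in, first-out order, so defer 2 (paramsPool.Put) runs before defer 1 (json.Marshal): req is returned to the pool before it has been serialized. When doReq is called with high concurrency, json.Marshal(req) in one goroutine can therefore race with the initialization of the same req in another. A panic is most likely with a timeline like this: goroutine A's deferred Put returns req to the pool, goroutine B gets that same req and starts initializing it, and goroutine A's deferred logging function marshals req at the same moment.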
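The hazard can be replayed step by step on a single goroutine. This is only a sketch: `Params` is a hypothetical stand-in for model.Params, and the standard library's encoding/json is used in place of json-iterator to keep the example self-contained.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// Params is a hypothetical stand-in for model.Params.
type Params struct {
	Name string `json:"name"`
}

var paramsPool = sync.Pool{
	New: func() interface{} { return new(Params) },
}

// simulate replays the buggy ordering sequentially and reports whether
// the pool handed the same object out twice, plus what "request A" logs.
func simulate() (reused bool, logged string) {
	req := paramsPool.Get().(*Params)
	req.Name = "request A"

	// defer 2 runs first: req is back in the pool.
	paramsPool.Put(req)

	// "Goroutine B" takes an object from the pool and re-initializes it.
	other := paramsPool.Get().(*Params)
	other.Name = "request B"

	// defer 1 runs last: "goroutine A" logs an object it no longer owns.
	b, _ := json.Marshal(req)
	return other == req, string(b)
}

func main() {
	reused, logged := simulate()
	fmt.Println("same object handed out again:", reused)
	fmt.Println("request A logs:", logged)
}
```

Whenever the pool hands the same object back, which is typical here but not guaranteed by sync.Pool, request A ends up logging request B's data. In the real service the read and the write overlap in time, so Marshal can also observe a field in the middle of an update rather than a clean old or new value.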

Once the cause is known, the fix is easy: swap the order in which defer 1 and defer 2 are registered. After Lao Xu deployed the modified code, the panic never appeared again. The root cause of this incident was a tiny detail, so we should stay careful in everyday development to avoid irreparable losses from this kind of rookie mistake. Another rule of thumb: try not to get too agitated while developing or investigating a problem. A pause can work wonders.
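The fix described above can be sketched as follows. It is an illustration under assumptions, not the project's code: `Params`, its fields, and the returned `logged` string are hypothetical, and encoding/json stands in for json-iterator.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// Params is a hypothetical stand-in for model.Params.
type Params struct {
	ID   int    `json:"id"`
	Name string `json:"name"`
}

var paramsPool = sync.Pool{
	New: func() interface{} { return new(Params) },
}

// doReq returns the JSON it would have logged so the result can be checked.
func doReq(id int, name string) (logged string) {
	req := paramsPool.Get().(*Params)
	// Registered first, so it runs last: req returns to the pool only after logging.
	defer paramsPool.Put(req)
	// Registered second, so it runs first, while this goroutine still owns req.
	defer func() {
		reqBytes, _ := json.Marshal(req)
		logged = string(reqBytes) // stands in for the omitted logging code
	}()

	// Initialize req; sending the request is omitted.
	*req = Params{ID: id, Name: name}
	return logged
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			name := fmt.Sprintf("req-%d", i)
			want := fmt.Sprintf(`{"id":%d,"name":%q}`, i, name)
			if got := doReq(i, name); got != want {
				fmt.Println("mismatch:", got, "want", want)
			}
		}(i)
	}
	wg.Wait()
	fmt.Println("done")
}
```

With this order each goroutine finishes reading req before any other goroutine can obtain it from the pool, so every call logs its own data.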

Finally, I sincerely hope that this article can be of some help to all readers.