Preface

Last Saturday I was about to leave work, happily thinking about what to eat, when QA suddenly came over and told me that our DB and ES were no longer synchronizing data. It was a real headache, just when we finally had a day off. Below I share what caused the bug, so that you can avoid stepping into the same pits.

Bug cause: bulk hides error messages

First I checked the error log, and surprisingly there was nothing in it. So the next step was to debug. Before debugging, here is the code:

func (es *UserES) batchAdd(ctx context.Context, user []*model.UserEs) error {
	req := es.client.Bulk().Index(es.index)
	for _, u := range user {
		u.UpdateTime = uint64(time.Now().UnixNano()) / uint64(time.Millisecond)
		u.CreateTime = uint64(time.Now().UnixNano()) / uint64(time.Millisecond)
		doc := elastic.NewBulkIndexRequest().Id(strconv.FormatUint(u.ID, 10)).Doc(u)
		req.Add(doc)
	}
	// Nothing to send; NumberOfActions is a count and is never negative.
	if req.NumberOfActions() == 0 {
		return nil
	}
	if _, err := req.Do(ctx); err != nil {
		return err
	}
	return nil
}

That is the code: it uses the ES bulk operation. Debugging turned up nothing, oh my God! With no clue left, I looked at the ES client source code to see whether there was some hidden detail I had missed, starting with req.Do(ctx):

// Do sends the batched requests to Elasticsearch. Note that, when successful,
// you can reuse the BulkService for the next batch as the list of bulk
// requests is cleared on success.
func (s *BulkService) Do(ctx context.Context) (*BulkResponse, error) {
	/* ... some code omitted ... */
	// Get response
	res, err := s.client.PerformRequest(ctx, PerformRequestOptions{
		Method:      "POST",
		Path:        path,
		Params:      params,
		Body:        body,
		ContentType: "application/x-ndjson",
		Retrier:     s.retrier,
		Headers:     s.headers,
	})
	if err != nil {
		return nil, err
	}

	// Return results
	ret := new(BulkResponse)
	if err := s.client.decoder.Decode(res.Body, ret); err != nil {
		return nil, err
	}

	// Reset so the request can be reused
	s.Reset()

	return ret, nil
}

I have only posted the most important part of the code. Here is what it does:

  • First it builds the HTTP request.
  • Then it sends the HTTP request and parses the response.
  • Finally it resets the request so that it can be reused.

ret := new(BulkResponse) creates a new BulkResponse struct, which is defined as follows:

type BulkResponse struct {
	Took   int                            `json:"took,omitempty"`
	Errors bool                           `json:"errors,omitempty"`
	Items  []map[string]*BulkResponseItem `json:"items,omitempty"`
}
// BulkResponseItem is the result of a single bulk request.
type BulkResponseItem struct {
	Index         string        `json:"_index,omitempty"`
	Type          string        `json:"_type,omitempty"`
	Id            string        `json:"_id,omitempty"`
	Version       int64         `json:"_version,omitempty"`
	Result        string        `json:"result,omitempty"`
	Shards        *ShardsInfo   `json:"_shards,omitempty"`
	SeqNo         int64         `json:"_seq_no,omitempty"`
	PrimaryTerm   int64         `json:"_primary_term,omitempty"`
	Status        int           `json:"status,omitempty"`
	ForcedRefresh bool          `json:"forced_refresh,omitempty"`
	Error         *ErrorDetails `json:"error,omitempty"`
	GetResult     *GetResult    `json:"get,omitempty"`
}

Here is what each field means:

  • Took: total elapsed time in milliseconds.
  • Errors: if any sub-request fails, the errors flag is set to true and the error details are reported on the corresponding request (see Items below).
  • Items: holds the response of each sub-request; its Error field stores the detailed error information.
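
To make the field descriptions above concrete, here is a minimal sketch that decodes a bulk response in which one sub-request succeeded and one failed. The bulkItem and bulkResponse types are hypothetical, trimmed-down mirrors of the library structs, and sampleBody is an illustrative payload rather than output captured from a real cluster.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// bulkItem and bulkResponse are hypothetical, trimmed-down mirrors of the
// library's BulkResponseItem and BulkResponse; only the fields used here are kept.
type bulkItem struct {
	ID     string          `json:"_id"`
	Status int             `json:"status"`
	Error  json.RawMessage `json:"error,omitempty"`
}

type bulkResponse struct {
	Took   int                    `json:"took"`
	Errors bool                   `json:"errors"`
	Items  []map[string]*bulkItem `json:"items"`
}

// sampleBody is an illustrative response body (an assumption for this sketch).
const sampleBody = `{
  "took": 30,
  "errors": true,
  "items": [
    {"index": {"_id": "1", "status": 201}},
    {"index": {"_id": "2", "status": 400,
      "error": {"type": "mapper_parsing_exception", "reason": "failed to parse"}}}
  ]
}`

// decodeBulk parses a bulk response body into the trimmed-down struct.
func decodeBulk(body string) (*bulkResponse, error) {
	var res bulkResponse
	if err := json.Unmarshal([]byte(body), &res); err != nil {
		return nil, err
	}
	return &res, nil
}

func main() {
	res, err := decodeBulk(sampleBody)
	if err != nil {
		panic(err)
	}
	// The top-level flag only says that something failed somewhere.
	fmt.Println("errors flag:", res.Errors)
	// The per-item entries say which sub-request failed and why.
	for _, item := range res.Items {
		for action, r := range item {
			fmt.Printf("%s id=%s status=%d failed=%t\n", action, r.ID, r.Status, r.Error != nil)
		}
	}
}
```

Note that the body decodes without any error even though the second sub-request failed; the failure is visible only through the errors flag and the item's error field.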

Now you can see why our code never reported an err. Each request in a bulk call is executed independently, so the failure of one sub-request does not affect the success of the others, and Do only returns an error when sending the request or decoding the response fails. Therefore we have to pick the sub-request errors out of the BulkResponse ourselves. Here is the corrected code:

func (es *UserES) batchAdd(ctx context.Context, user []*model.UserEs) error {
	req := es.client.Bulk().Index(es.index)
	for _, u := range user {
		u.UpdateTime = uint64(time.Now().UnixNano()) / uint64(time.Millisecond)
		u.CreateTime = uint64(time.Now().UnixNano()) / uint64(time.Millisecond)
		doc := elastic.NewBulkIndexRequest().Id(strconv.FormatUint(u.ID, 10)).Doc(u)
		req.Add(doc)
	}
	// Nothing to send; NumberOfActions is a count and is never negative.
	if req.NumberOfActions() == 0 {
		return nil
	}
	res, err := req.Do(ctx)
	if err != nil {
		return err
	}
	// If any sub-request fails, the errors flag is set to true and the error
	// details are reported on the corresponding item.
	// If the flag is false, every sub-request succeeded.
	if !res.Errors {
		return nil
	}
	for _, it := range res.Failed() {
		if it.Error == nil {
			continue
		}
		return &elastic.Error{
			Status:  it.Status,
			Details: it.Error,
		}
	}
	return nil
}

Now for the res.Failed method. It returns the items of the bulk response that failed, so that is where to look for the error information.
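
The idea behind such a filter can be sketched as follows. This is a simplified stand-in, not the library source: the item and response types are hypothetical, and the sketch assumes that a sub-request counts as failed when its status is outside the 200-299 range.

```go
package main

import "fmt"

// item and response are hypothetical, simplified stand-ins for the library's
// BulkResponseItem and BulkResponse.
type item struct {
	ID     string
	Status int
	Reason string // simplified; the real type carries a structured error
}

type response struct {
	Errors bool
	Items  []map[string]*item
}

// failed returns every item whose status is outside 200-299.
// Assumption: a non-2xx status is what marks a sub-request as failed.
func failed(r *response) []*item {
	var out []*item
	for _, byAction := range r.Items {
		for _, it := range byAction {
			if it.Status < 200 || it.Status > 299 {
				out = append(out, it)
			}
		}
	}
	return out
}

func main() {
	res := &response{
		Errors: true,
		Items: []map[string]*item{
			{"index": {ID: "1", Status: 201}},
			{"index": {ID: "2", Status: 400, Reason: "value out of range"}},
		},
	}
	for _, it := range failed(res) {
		fmt.Printf("id=%s status=%d reason=%s\n", it.ID, it.Status, it.Reason)
	}
}
```

The corrected batchAdd returns on the first failed item; collecting every failed item into a single error would be a reasonable alternative when the caller needs the full picture.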

With that, I had finally found the cause of this bug and could move on to the next one. First, a brief summary:

The bulk API allows multiple create, index, update, and delete requests to be carried out in a single call. Each sub-request is executed independently, so the failure of one does not affect the success of the others. The bulk response structure has an Errors field: if any sub-request fails, the errors flag is set to true and the error details are reported on the corresponding request. The items field is an array holding the result of each request, in the order the requests were submitted. So when using bulk, you must check the response to determine whether there was an error.

Bug cause: value range out of bounds

This one was purely my own misuse, but I still want to talk about the value ranges of the ES mapping numeric types:

The numeric types are as follows:

Type          Description
byte          Signed 8-bit integer, ranging from -128 to 127.
short         Signed 16-bit integer, ranging from -32768 to 32767.
integer       Signed 32-bit integer, ranging from -2^31 to 2^31-1.
long          Signed 64-bit integer, ranging from -2^63 to 2^63-1.
float         32-bit single-precision floating-point number.
double        64-bit double-precision floating-point number.
half_float    16-bit half-precision IEEE 754 floating-point number.
scaled_float  Floating-point number stored as an integer using a fixed scaling factor; for example, a price field accurate to the cent: 57.34 with a scaling factor of 100 is stored as 5734.

These are all signed types. An unsigned type (unsigned_long) was only added in ES 7.10.
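
The scaled_float arithmetic from the table can be checked with a small sketch. The toScaled helper is hypothetical and only illustrates the multiply-and-round step; it is not an ES or client API.

```go
package main

import (
	"fmt"
	"math"
)

// toScaled returns the integer a scaled_float field would hold for the given
// value and scaling factor: the value multiplied by the factor, then rounded.
// Hypothetical helper for illustration only.
func toScaled(value, factor float64) int64 {
	return int64(math.Round(value * factor))
}

func main() {
	fmt.Println(toScaled(57.34, 100)) // 5734
}
```

Rounding matters here because a value such as 57.34 has no exact binary representation, so the raw product may be slightly off from 5734.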

I listed the numeric types and their ranges to make the cause of my bug easier to explain. Here it is:

In the DB, tinyint is stored in one byte, and its unsigned range is 0 to 255. In ES, the mapping type was byte, whose range is -128 to 127. When a value in the DB falls outside that range, this problem occurs during synchronization. So pay attention to value ranges, and don't be like me, spending a long time hunting a bug because of this. There is no need to pick the smallest type to save a little space; it does not take up much space anyway.
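
A guard of the following kind would have caught the mismatch before the document was sent. It is a minimal sketch: fitsESByte is a hypothetical helper, and it assumes the DB column is read into a Go uint8.

```go
package main

import (
	"fmt"
	"math"
)

// fitsESByte reports whether an unsigned tinyint value (0 to 255) fits into
// an ES byte field (-128 to 127). Hypothetical helper for illustration.
func fitsESByte(v uint8) bool {
	return v <= math.MaxInt8
}

func main() {
	for _, v := range []uint8{0, 127, 128, 255} {
		fmt.Printf("%d fits in an ES byte field: %t\n", v, fitsESByte(v))
	}
}
```

Values from 128 to 255 are valid in the DB column but fall outside the ES byte range, which is exactly the gap described above; mapping the field as short or integer avoids it.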

Conclusion

This article is a short summary of problems I ran into at work. I am publishing it as a reminder: someone has already stepped into these pits, so you don't have to and waste your time!

I am Asong, an ordinary programmer. Let's keep getting stronger together. Thanks for following, and see you next time.
