0 Introduction

Context is a widely used Go package, developed by Google and introduced into the standard library in Go 1.7. It simplifies passing request-scoped data across multiple goroutines and terminating a tree of goroutines, either manually or on timeout. For example, the standard HTTP package uses context to carry the data of a request, and gRPC uses context to terminate the goroutine tree spawned by a request. Because it is so simple to use, it has become almost a general convention for writing Go base libraries. I have some experience with context.

This article mainly talks about the following aspects:

  1. Use of context.
  2. How context is implemented, and what to watch out for.
  3. Analysis of the causes of problems encountered in practice.

1 Usage

1.1 Core Interface Context

type Context interface {
    // Deadline returns the time when work done on behalf of this context
    // should be canceled. Deadline returns ok==false when no deadline is
    // set.
    Deadline() (deadline time.Time, ok bool)
    // Done returns a channel that's closed when work done on behalf of this
    // context should be canceled.
    Done() <-chan struct{}
    // Err returns a non-nil error value after Done is closed.
    Err() error
    // Value returns the value associated with this context for key.
    Value(key interface{}) interface{}
}

  • Done returns a channel. When the context is cancelled, the channel is closed, and any goroutine using the context should finish and return.
  • The methods of Context are goroutine-safe. A Context created in a parent goroutine can be passed to any number of goroutines and accessed by them concurrently.
  • Deadline returns the time at which the context expires. A goroutine that obtains this time can use it to set timeouts on I/O operations.
  • Value lets goroutines share data. Reading the data is, of course, goroutine-safe.

While a request is processed, functions at every layer are called, and each layer may start its own goroutines, which together form a goroutine tree. The contexts should therefore mirror this and form a tree as well.

To create a context tree, the first step is to have a root. context.Background returns an empty context, often used as the root of the tree. It is typically created by the first goroutine that receives the request; it cannot be cancelled, has no values, and has no deadline.

func Background() Context

Then how do I create other descendant nodes? The context package provides us with the following functions:

func WithCancel(parent Context) (ctx Context, cancel CancelFunc)
func WithDeadline(parent Context, deadline time.Time) (Context, CancelFunc)
func WithTimeout(parent Context, timeout time.Duration) (Context, CancelFunc)
func WithValue(parent Context, key interface{}, val interface{}) Context

Each of the four functions takes a parent context as its first argument and returns a value of type Context, which is how the layers of nodes are built. A child node is derived from its parent and holds additional state based on the function arguments; it can then be passed down to lower-level goroutines.

WithCancel, WithDeadline, and WithTimeout return an additional value of type CancelFunc, which is defined as:

type CancelFunc func()

Calling the CancelFunc cancels the corresponding Context. This gives the parent the right to revoke its children: when some condition is met, it can call the CancelFunc to end the whole subtree of contexts and their goroutines. In the goroutine of a child node, use something like the following to decide when to exit:

select {
    case <-cxt.Done():
        // do some cleaning and return
}

That is, wait until the channel returned by cxt.Done() is closed. When the top-level request finishes, or the request is cancelled externally, cancelling the top-level context exits the entire goroutine tree for the request.
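As a minimal sketch of the pattern above: the helper names worker and runTree are made up for this illustration. Several goroutines each wait on a child context, and cancelling the root makes all of them exit.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// worker blocks until ctx is cancelled, then records that it exited.
// (Hypothetical helper for this illustration.)
func worker(ctx context.Context, id int, wg *sync.WaitGroup, exited chan<- int) {
	defer wg.Done()
	<-ctx.Done() // closed when ctx or any of its ancestors is cancelled
	exited <- id
}

// runTree starts n workers under one cancellable root, cancels the root,
// and returns how many workers exited.
func runTree(n int) int {
	root, cancel := context.WithCancel(context.Background())
	var wg sync.WaitGroup
	exited := make(chan int, n) // buffered so workers never block on send
	for i := 0; i < n; i++ {
		// Each worker gets its own child context derived from root.
		child, cancelChild := context.WithCancel(root)
		defer cancelChild()
		wg.Add(1)
		go worker(child, i, &wg, exited)
	}
	cancel() // cancelling the root cancels every child
	wg.Wait()
	return len(exited)
}

func main() {
	fmt.Println("workers exited:", runTree(3))
}
```

A real worker would usually select on ctx.Done() alongside its actual work channel instead of only blocking on it.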

WithDeadline and WithTimeout take one more parameter than WithCancel, a time that bounds how long the context can live. Once that time passes, the context and its child contexts are cancelled automatically. The lifetime of a context is therefore determined by its own deadline, by explicit cancellation, and by the lifetime of its parent context.
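The following sketch illustrates both points with the real context API; the helper names timedOut and childBoundedByParent and the millisecond values are only chosen for this example. A context with a timeout ends with context.DeadlineExceeded, and a child never gets a later deadline than its parent.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// timedOut waits on a context with the given timeout (in milliseconds) and
// reports whether it ended with context.DeadlineExceeded.
func timedOut(ms int) bool {
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(ms)*time.Millisecond)
	defer cancel() // always call cancel to release the timer
	<-ctx.Done()
	return ctx.Err() == context.DeadlineExceeded
}

// childBoundedByParent derives a child with its own timeout from a parent
// with a timeout and reports whether the child's deadline is no later than
// the parent's.
func childBoundedByParent(parentMs, childMs int) bool {
	parent, cancelParent := context.WithTimeout(context.Background(), time.Duration(parentMs)*time.Millisecond)
	defer cancelParent()
	child, cancelChild := context.WithTimeout(parent, time.Duration(childMs)*time.Millisecond)
	defer cancelChild()

	parentDeadline, _ := parent.Deadline()
	childDeadline, ok := child.Deadline()
	return ok && !childDeadline.After(parentDeadline)
}

func main() {
	fmt.Println("timed out:", timedOut(10))
	fmt.Println("long child bounded by short parent:", childBoundedByParent(50, 60000))
}
```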

WithValue returns a child of parent that holds the key/value pair passed in; val can be retrieved by calling the Value(key) method of the Context interface. Note that if a key is set again further down the same chain of contexts, the newer value hides the older one for that key.
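A small sketch of this behaviour, assuming a private key type ctxKey and a helper requestID that are invented for the example. Setting the same key twice along one chain means the inner context sees the newer value, while the outer context still sees the original one.

```go
package main

import (
	"context"
	"fmt"
)

// ctxKey is a private key type, so keys from other packages cannot collide
// with ours. (Name chosen for this illustration.)
type ctxKey string

const requestIDKey ctxKey = "requestId"

// requestID returns the request ID stored in ctx, or "" if there is none.
func requestID(ctx context.Context) string {
	id, _ := ctx.Value(requestIDKey).(string)
	return id
}

// shadow sets the same key twice along one chain and returns what the
// outer and the inner context each see.
func shadow() (outer, inner string) {
	parent := context.WithValue(context.Background(), requestIDKey, "req-1")
	child := context.WithValue(parent, requestIDKey, "req-2")
	return requestID(parent), requestID(child)
}

func main() {
	outer, inner := shadow()
	fmt.Println("parent sees:", outer, "child sees:", inner)
}
```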

For more examples, see the official blog.

2 Implementation principles

2.1 Storage and query of context data

type valueCtx struct {
    Context
    key, val interface{}
}

func WithValue(parent Context, key, val interface{}) Context {
    if key == nil {
        panic("nil key")
    }
    ......
    return &valueCtx{parent, key, val}
}

func (c *valueCtx) Value(key interface{}) interface{} {
    if c.key == key {
        return c.val
    }
    return c.Context.Value(key)
}

Context data is stored as a tree, with each node storing only one key/value pair. WithValue() embeds the parent context in a new child context and stores the key/value pair in that node. Value() looks up the value for key; if the current node does not hold that key, the lookup continues recursively in the parent context.

Note that the data in a context is not global. A lookup only consults the node itself and its ancestors, never its sibling nodes.
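To make the lookup direction concrete, here is a short sketch using the real context API; the key type and the values are assumptions for the example. A value stored on one branch is visible to that branch's descendants but not to a sibling branch.

```go
package main

import (
	"context"
	"fmt"
)

// key is a private key type used only in this illustration.
type key string

// visibility stores a value on branch a and reports whether it can be read
// from a descendant of a and from a's sibling b.
func visibility() (fromDescendant, fromSibling bool) {
	root := context.Background()
	a := context.WithValue(root, key("rpcId"), "1.1")
	aChild, cancel := context.WithCancel(a) // a descendant of a
	defer cancel()
	b := context.WithValue(root, key("other"), "x") // a sibling of a

	// The lookup walks from aChild up through a, where the key is found.
	fromDescendant = aChild.Value(key("rpcId")) == "1.1"
	// The lookup walks from b up to root and never visits a.
	fromSibling = b.Value(key("rpcId")) != nil
	return fromDescendant, fromSibling
}

func main() {
	d, s := visibility()
	fmt.Println("visible from descendant:", d, "visible from sibling:", s)
}
```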

2.2 Manual Cancel and Timeout Cancel

cancelCtx embeds the parent Context and implements the canceler interface:

type cancelCtx struct {
    Context // parent Context

    done chan struct{}

    mu       sync.Mutex
    children map[canceler]struct{}
    err      error
}

// A canceler is a context type that can be canceled directly. The
// implementations are *cancelCtx and *timerCtx.
type canceler interface {
    cancel(removeFromParent bool, err error)
    Done() <-chan struct{}
}

The children field of cancelCtx holds all cancelers derived from it. When cancel is triggered externally, cancel() is called on every canceler in children, which terminates all descendants. The done channel indicates whether the context has been cancelled: it is closed when cancel is triggered externally or when the parent Context's Done channel is closed.
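The direction of propagation can be observed from the outside with the public API. The sketch below (helper names isDone and cancelMiddle are invented here) builds a parent, child, and grandchild, then cancels only the middle one: cancellation flows down to the grandchild but not up to the parent.

```go
package main

import (
	"context"
	"fmt"
)

// isDone reports, without blocking, whether ctx has been cancelled.
func isDone(ctx context.Context) bool {
	select {
	case <-ctx.Done():
		return true
	default:
		return false
	}
}

// cancelMiddle builds parent -> child -> grandchild, cancels only the child,
// and reports which of the three contexts are done afterwards.
func cancelMiddle() (parentDone, childDone, grandchildDone bool) {
	parent, cancelParent := context.WithCancel(context.Background())
	defer cancelParent()
	child, cancelChild := context.WithCancel(parent)
	grandchild, cancelGrandchild := context.WithCancel(child)
	defer cancelGrandchild()

	cancelChild() // cancels child and everything below it, but not parent
	return isDone(parent), isDone(child), isDone(grandchild)
}

func main() {
	p, c, g := cancelMiddle()
	fmt.Println("parent:", p, "child:", c, "grandchild:", g)
}
```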

type timerCtx struct {
    cancelCtx // embedded; supplies cancelCtx.Done()
    timer *time.Timer

    deadline time.Time
}

func WithDeadline(parent Context, deadline time.Time) (Context, CancelFunc) {
    ......
    c := &timerCtx{
        cancelCtx: newCancelCtx(parent),
        deadline:  deadline,
    }
    propagateCancel(parent, c)
    d := time.Until(deadline)
    if d <= 0 {
        c.cancel(true, DeadlineExceeded) // deadline has already passed
        return c, func() { c.cancel(true, Canceled) }
    }
    c.mu.Lock()
    defer c.mu.Unlock()
    if c.err == nil {
        c.timer = time.AfterFunc(d, func() {
            c.cancel(true, DeadlineExceeded)
        })
    }
    return c, func() { c.cancel(true, Canceled) }
}

The deadline field of timerCtx holds the expiry time. When that time is reached, the timer triggers cancel.



3 Problems encountered

3.1 Background

One day, in order to connect our system to ETRACE (an internal distributed tracing system), we needed to pass requestId and rpcId through gRPC/MySQL/Redis/MQ operations. Our solution was context.

All MySQL, MQ, and Redis operation interfaces take a context as the first argument, and if the context (or one of its ancestors) is cancelled, the operation fails.

func (tx *Tx) QueryContext(ctx context.Context, query string, args ...interface{}) (*Rows, error)
func(process func(context.Context, redis.Cmder) error) func(context.Context, redis.Cmder) error
func (ch *Channel) Consume(ctx context.Context, handler Handler, queue string, dc <-chan amqp.Delivery) error
func (ch *Channel) Publish(ctx context.Context, exchange, key string, mandatory, immediate bool, msg Publishing) (err error)

After going live, we ran into a series of pitfalls...

3.2 Case 1

Symptom: Five minutes after going live, all user logins failed and alarms kept arriving.

Cause: The program uses localCache, which refreshes cached variables every 5 minutes by calling registered callback functions. localCache holds on to a context and passes it in when it calls the callback. If the callback depends on that context, unexpected results may occur.

The getAppIDAndAlias callback reads data from MySQL. If ctx has been cancelled, it returns an error.

func getAppIDAndAlias(ctx context.Context, appKey, appSecret string) (string, string, error)

The ctx passed to the first localCache.Get(ctx, appKey, appSecret) call was the context of a gRPC call. gRPC cancels that context when the request ends or fails, so every subsequent cache Refresh() failed.

Solution: During Refresh, do not use the context held by localCache; use a context that is never cancelled instead.
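The failure and the fix can be sketched as follows. Everything here is a simplified stand-in, not the real localCache: loadFromDB plays the role of a callback such as getAppIDAndAlias, and the two refresh functions are hypothetical names for the broken and the fixed behaviour.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// loadFromDB stands in for a callback such as getAppIDAndAlias: like a real
// database call, it fails once ctx has been cancelled.
func loadFromDB(ctx context.Context, appKey string) (string, error) {
	if err := ctx.Err(); err != nil {
		return "", err
	}
	return "alias-of-" + appKey, nil
}

// refreshWithRequestCtx reuses the context captured from the first request.
func refreshWithRequestCtx(requestCtx context.Context, appKey string) error {
	_, err := loadFromDB(requestCtx, appKey)
	return err
}

// refreshDetached ignores the request's context and uses one that is never
// cancelled.
func refreshDetached(appKey string) error {
	_, err := loadFromDB(context.Background(), appKey)
	return err
}

// simulate models a request that has ended before the periodic refresh runs.
func simulate() (withRequestCtx, detached error) {
	requestCtx, endRequest := context.WithCancel(context.Background())
	endRequest() // gRPC cancels the request context when the request ends
	return refreshWithRequestCtx(requestCtx, "app"), refreshDetached("app")
}

func main() {
	a, b := simulate()
	fmt.Println("request ctx cancelled:", errors.Is(a, context.Canceled), "detached ok:", b == nil)
}
```

Note that context.Background() drops any values carried by the request context. On Go 1.21 or later, context.WithoutCancel(ctx) may be an option when values such as tracing IDs must be kept.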

3.3 Case 2

Symptom: After the system went live, we continuously received alarms (too many sys err). Logs/ETRACE showed two types of sys err:

  • context canceled
  • sql: Transaction has already been committed or rolled back

3.3.1 Background and reasons


Ticket is a service that handles HTTP requests through a RESTful API. Because the program itself speaks gRPC, a component is needed for protocol conversion, so we introduced grpc-gateway to translate RESTful requests into gRPC.

The context canceled error is reproduced as follows:

  1. The client sends an HTTP RESTful request.
  2. grpc-gateway establishes a connection with the client, receives the request, converts the parameters, and calls grpc-server.
  3. grpc-server processes the request. It starts a stream for each request, and the stream creates the context.
  4. The client disconnects.
  5. grpc-gateway receives a signal that the connection is closed, which cancels its context. Having already sent the RPC request, the gRPC client then sends RST_STREAM, because its request was terminated by an external event (i.e. its context was cancelled).
  6. The gRPC server immediately terminates the request (i.e. the stream context on the gRPC server is cancelled).

In other words, the connection was dropped while the gRPC handler was still processing the request.

Cause of sql: Transaction has already been committed or rolled back:

The program uses the standard database/sql package to run DB transactions. db.BeginTx starts a goroutine running awaitDone:

func (tx *Tx) awaitDone() {
    // Wait for either the transaction to be committed or rolled
    // back, or for the associated context to be closed.
    <-tx.ctx.Done()

    // Discard and close the connection used to ensure the
    // transaction is closed and the resources are released.  This
    // rollback does nothing if the transaction has already been
    // committed or rolled back.
    tx.rollback(true)
}

When the context is cancelled, rollback() runs and modifies an atomic variable. Afterwards, tx.Commit() in another goroutine checks that atomic variable and returns an error if it has already been changed.

3.3.2 Solution

Both errors are caused by the client disconnecting and are expected, so they can be ignored.

3.4 Case 3

After the launch, MySQL transactions blocked once or twice every two days, causing request times to reach 120 seconds. Pangu (an internal MySQL operations platform) showed that all blocked transactions were operating on the same record.


3.4.1 Process

1. We first suspected that transactions from different data centers were operating on the same record. Cross-data-center operations take longer and would block DB transactions from the other data centers.

2. Whenever the problem occurred, we temporarily degraded one interface to reduce the probability of multiple transactions operating on the same record.

3. We reduced the number of transactions:

  • Removed transactions that wrapped a single SQL statement
  • Eliminated unnecessary transactions by moving business logic

4. We adjusted the DB parameter innodb_lock_wait_timeout (120s->50s). This parameter is the maximum time a MySQL transaction waits for a lock before giving up; lowering it reduces the overall operation time. We considered specifying a transaction timeout in the program, but innodb_lock_wait_timeout can only be set globally or per session, and we did not set it for fear of affecting other SQL in the session.

5. We considered using distributed locks to reduce the concurrency of transactions operating on the same record. However, due to time constraints, this improvement was not made.

6. Colleagues from the DAL team found a transaction that had never been committed; checking the code revealed the root cause.

Go's database/sql package has a race condition that can leave a transaction neither committed nor rolled back.

3.4.2 Source Code Description

BeginTxx() starts a transaction and, along with it, a goroutine running awaitDone:

// awaitDone blocks until the context in Tx is canceled and rolls back
// the transaction if it's not already done.
func (tx *Tx) awaitDone() {
    // Wait for either the transaction to be committed or rolled
    // back, or for the associated context to be closed.
    <-tx.ctx.Done()

    // Discard and close the connection used to ensure the
    // transaction is closed and the resources are released.  This
    // rollback does nothing if the transaction has already been
    // committed or rolled back.
    tx.rollback(true)
}

tx.rollback(true) checks whether the atomic variable tx.done is 1. If it is, the function returns immediately; if it is 0, the variable is set to 1 and the rollback is performed.

When the transaction is committed with Commit(), the atomic variable tx.done is updated first, and then the context is checked to see whether it has been cancelled. Only if it has not been cancelled is the COMMIT executed.

// Commit commits the transaction.
func (tx *Tx) Commit() error {
    if !atomic.CompareAndSwapInt32(&tx.done, 0, 1) {
        return ErrTxDone
    }
    select {
    default:
    case <-tx.ctx.Done():
        return tx.ctx.Err()
    }
    var err error
    withLock(tx.dc, func() {
        err = tx.txi.Commit()
    })
    if err != driver.ErrBadConn {
        tx.closePrepared()
    }
    tx.close(err)
    return err
}

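If the context is cancelled right after Commit() has set the atomic variable to 1, Commit() sees the cancelled context and returns without committing, while rollback() in the other goroutine sees that the atomic variable is already 1 and returns without rolling back. The transaction is then neither committed nor rolled back.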
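The interleaving can be replayed deterministically with a toy model. The sketch below is not the database/sql implementation: toyTx and its afterCAS hook are invented for this illustration and only mimic the order of the checks, with the hook forcing the cancellation to land between the compare-and-swap and the context check.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync/atomic"
)

var errTxDone = errors.New("toy: transaction already finished")

// toyTx is a simplified stand-in for a transaction, NOT the real sql.Tx.
type toyTx struct {
	ctx        context.Context
	done       int32
	committed  bool
	rolledBack bool
	// afterCAS is a hypothetical hook that runs between the CAS and the
	// context check in commit, to force the unlucky interleaving.
	afterCAS func()
}

// commit follows the same order of checks as the Commit shown above.
func (tx *toyTx) commit() error {
	if !atomic.CompareAndSwapInt32(&tx.done, 0, 1) {
		return errTxDone
	}
	if tx.afterCAS != nil {
		tx.afterCAS()
	}
	select {
	default:
	case <-tx.ctx.Done():
		return tx.ctx.Err() // returns without committing
	}
	tx.committed = true
	return nil
}

// rollback does nothing if done has already been set.
func (tx *toyTx) rollback() {
	if !atomic.CompareAndSwapInt32(&tx.done, 0, 1) {
		return
	}
	tx.rolledBack = true
}

// simulateRace cancels the context right after commit has set done to 1.
func simulateRace() (committed, rolledBack bool, err error) {
	ctx, cancel := context.WithCancel(context.Background())
	tx := &toyTx{ctx: ctx}
	tx.afterCAS = func() {
		cancel()      // the request is cancelled at the worst moment
		tx.rollback() // what awaitDone would do once ctx.Done() is closed
	}
	err = tx.commit()
	return tx.committed, tx.rolledBack, err
}

func main() {
	c, r, err := simulateRace()
	fmt.Println("committed:", c, "rolled back:", r, "err:", err)
}
```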

3.4.3 Solution

The solution can be any of the following:

  • Pass in a context that is never cancelled when executing a transaction
  • Fix the database/sql source code, then specify the new Go build image at compile time

We later submitted a patch to Go to fix this problem (it was merged into Go 1.9.3).

4 Lessons learned

Because a large number of official and third-party Go libraries use context, be careful when calling functions that accept one: know when the context is cancelled and what triggers the cancellation. I often used the context from gRPC in my programs, which produced some unexpected results; I then spent time summarizing the lifetime and behavior of context in gRPC and our internal base libraries to avoid repeating the same problems.