First, find the problem

  • 1. Recently I found that an API service written in Golang hangs every few days after it starts running. No logs are printed and the API cannot be accessed.
  • 2. The service contains the API endpoints and several scheduled tasks, and the scheduled tasks stop executing as well.
  • 3. The service also exposes a view route, and that route can still be accessed.
  • 4. I checked the CPU, memory, and number of open file handles. (The htop tool is recommended; on CentOS it can be installed with yum install htop.)

Second, solve the problem

(1) Reproduce the bug

  • First, the bug has to be reproduced. The only way is to let the service run for a few days until it shows up again.
  • Sure enough, the bug reappeared today.

(2) Locate the bug

  • 1. The view route can still be accessed, and the HTTP log generated by accessing it is still printed.
  • 2. The cron jobs cannot be executed, and their logs are not printed.
  • 3. A scheduled task that prints the third-party API status did not hang.
  • Everything that touches the database is blocked, so the investigation should start with the database operations.

(3) Fix the bug

Since the service uses GORM, the first thing I need to know is the state of the database connection pool. So I added a scheduled task that prints the database status. The code is as follows:

    func main() {
        // c is the service's cron scheduler (defined elsewhere).
        // Print the connection pool statistics every 5 seconds.
        err := c.AddFunc("@every 5s", func() {
            by, _ := json.Marshal(db.DB.DB().Stats())
            log.Println(string(by))
        })
        if err != nil {
            log.Fatalln(err.Error())
        }
    }

The DBStats struct returned by Stats() is defined as follows:

    // DBStats contains database statistics.
    type DBStats struct {
        MaxOpenConnections int // Maximum number of open connections to the database.

        // Pool Status
        OpenConnections int // The number of established connections both in use and idle.
        InUse           int // The number of connections currently in use.
        Idle            int // The number of idle connections.

        // Counters
        WaitCount         int64         // The total number of connections waited for.
        WaitDuration      time.Duration // The total time blocked waiting for a new connection.
        MaxIdleClosed     int64         // The total number of connections closed due to SetMaxIdleConns.
        MaxLifetimeClosed int64         // The total number of connections closed due to SetConnMaxLifetime.
    }
  • After adding the scheduled task that prints the database state, I found that InUse had reached the maximum and WaitCount kept increasing. Look at the following JSON data:
    {"MaxOpenConnections":30,"OpenConnections":30,"InUse":30,"Idle":0,"WaitCount":18032,"WaitDuration":66343149623,"MaxIdleClosed":0,"MaxLifetimeClosed":4203}
    {"MaxOpenConnections":30,"OpenConnections":30,"InUse":30,"Idle":0,"WaitCount":179968,"WaitDuration":5069116627810,"MaxIdleClosed":0,"MaxLifetimeClosed":5147}
  • The problem is that connections are taken from the pool and never put back, in other words the connections stay occupied.

  • Both the API handlers and the jobs perform database operations. What would cause connections to stay occupied? The first thing that comes to mind is a transaction that was never committed or rolled back.

  • Sure enough, I found the following code. At that moment I really wanted to kick myself 🤦

    tx := o.Begin()
    transactionIsOk, err := chain.EthChainInstance.GetTransactionReceipt(action.Hash)
    if err != nil {
        continue // BUG: tx is neither committed nor rolled back, so its connection is never released
    }
  • The fixed code:

    tx := o.Begin()
    transactionIsOk, err := chain.EthChainInstance.GetTransactionReceipt(action.Hash)
    if err != nil {
        tx.Rollback() // release the connection before moving on to the next item
        continue
    }
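  • A more robust approach is to call defer tx.Commit() or defer tx.Rollback() right after beginning the transaction. Note that a deferred call only runs when the enclosing function returns, not at the end of a loop iteration, so in a loop like this one the per-item work should first be moved into its own function.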

  • After the fix, I rebuilt and redeployed the service and observed it for about ten hours. It has been running stably the whole time.

    2019/05/08 00:18:11 {"MaxOpenConnections":300,"OpenConnections":13,"InUse":0,"Idle":13,"WaitCount":0,"WaitDuration":0,"MaxIdleClosed":0,"MaxLifetimeClosed":2804}
  • To summarize: because the transactions were never completed, every open connection stayed InUse, so other database operations could not obtain a connection from the pool and could not execute their SQL.
  • The end ~