Production incident!

Alarm Information Display

Alarm level: critical fault
Alarm content: Redis: Springboot-digest-consumer
Type: master_slave
Node: 10.111.194.:6391
Impact range of abnormal primary/secondary connection: ****

As a result, many production services could not perform Redis SET and GET operations properly, and a large number of errors were thrown.

But what caused it?

At first I thought it was a network problem, but after investigating, I found that this command had been run:

keys *

But our code never uses this command.

So where did it come from?

Tracing it back, we found that it had been issued from one particular machine's IP, and the operator turned out to be the senior colleague sitting next to me. I had even been watching him while he did it.

At the time, the program was throwing an error, so he used the Redis client RDM on the production jump server to check whether a certain key was stored in Redis.

Redis has 16 logical databases by default, DB0 to DB15. He accidentally opened DB0, which in effect executed

keys *

DB0 holds hundreds of thousands or even millions of keys. Fetching every key in the database in one go is quite scary: Redis executes commands on a single thread and KEYS walks the entire keyspace, so the call sends Redis CPU usage soaring and blocks the server, stalling every operation queued behind it.

And if requests cannot get their data from Redis, they fall through to the database and put it under heavy pressure. This is the cache avalanche effect, and it can bring the database down. For a payment company, the losses would be heavy.
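To make the fall-through concrete, here is a minimal simulation. Everything in it is hypothetical: `FlakyCache` and `CountingDatabase` are in-memory stand-ins for Redis and the database, and `read_through` assumes the services fall back to the database whenever the cache misses or fails to answer. It only counts how many reads reach the database in each case.

```python
class CacheDown(Exception):
    """Raised by the stand-in cache when it cannot answer."""


class FlakyCache:
    """In-memory stand-in for Redis that can be switched to 'unavailable'."""

    def __init__(self, available=True):
        self.data = {}
        self.available = available

    def get(self, key):
        if not self.available:
            raise CacheDown(key)
        return self.data.get(key)

    def set(self, key, value):
        if self.available:
            self.data[key] = value


class CountingDatabase:
    """In-memory stand-in for the database that counts every query."""

    def __init__(self, rows):
        self.rows = rows
        self.queries = 0

    def load(self, key):
        self.queries += 1
        return self.rows.get(key)


def read_through(cache, db, key):
    """Cache-aside read: try the cache, fall back to the database on miss or error."""
    try:
        value = cache.get(key)
    except CacheDown:
        value = None
    if value is None:
        value = db.load(key)
        cache.set(key, value)
    return value


def simulate(requests, cache_available):
    """Return how many of `requests` identical reads end up hitting the database."""
    cache = FlakyCache(available=cache_available)
    db = CountingDatabase({"user:1": "alice"})
    for _ in range(requests):
        read_through(cache, db, "user:1")
    return db.queries


healthy = simulate(1000, cache_available=True)
blocked = simulate(1000, cache_available=False)
print(healthy, blocked)
```

With a healthy cache only the first read reaches the database. With the cache blocked, every single read does, which is the pressure described above.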

Fortunately, it didn’t affect us very much this time.

Analysis

  • Don’t use keys *, otherwise the result can be disastrous. Companies are advised to disable this command;
  • Don’t connect with RDM from the production jump server. If you need to look something up, it is best to connect to the Redis instance directly and search there.
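When a lookup across many keys really is needed, one common alternative to keys * is the cursor-based SCAN command, which returns the keyspace in small batches so that no single call blocks the server for long. The sketch below is not from the incident itself. It assumes a client object exposing `scan(cursor=..., match=..., count=...)` that returns `(next_cursor, keys)`, as redis-py's `Redis.scan` does. The helper name `scan_keys`, the `order:*` pattern, and the connection details in the comments are hypothetical.

```python
def scan_keys(client, pattern="*", batch_size=100, limit=None):
    """Yield keys matching `pattern` using incremental SCAN calls instead of KEYS.

    `batch_size` is passed as SCAN's COUNT hint; `limit` stops the walk early
    once that many keys have been yielded.
    """
    cursor = 0
    yielded = 0
    while True:
        cursor, keys = client.scan(cursor=cursor, match=pattern, count=batch_size)
        for key in keys:
            yield key
            yielded += 1
            if limit is not None and yielded >= limit:
                return
        if cursor == 0:
            # The server hands back cursor 0 once the full iteration is complete.
            return


# Usage against a real server (assumes the redis-py package; host, port and
# key pattern are placeholders):
#
#   import redis
#   client = redis.Redis(host="localhost", port=6379, db=0)
#   for key in scan_keys(client, pattern="order:*", limit=20):
#       print(key)
```

SCAN can return the same key more than once during a full iteration, so collect the results into a set if duplicates matter. Checking one known key is cheaper still: a single GET or EXISTS avoids walking the keyspace at all.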