When something goes wrong in production, viewing logs and troubleshooting exceptions is not the whole job. There is another important task: notification. Who counts as the business side depends on the company's business model. If the business party is a colleague in another group inside the company, consider informing them directly. If the business side is an external user or customer, it is best to notify the company's operations or business staff first and let them handle the external communication.

Without a basic process, a failure will inevitably play out like this:

  1. Business side: Hey, I can't access the service / I'm getting no data. Can you see what the problem is?
  2. Business side: How long will it take you to recover?
  3. Leader: See what’s going on.
  4. Agent: Is this a big problem?
  5. Leader: Let’s inform each business party and then deal with the problem.
  6. Agent: The price information is missing. Is it caused by this fault?
  7. Leader: Hey, have you notified the business group?

See the problem?

Everyone is asking about the specifics: what happened, the scope and size of the impact, and the severity of the failure. None of them prepared their questions, and you did not prepare your answers either, so when asked you can only give vague replies and try to muddle through.

Establish a notification process

The above conversation occurred because we did not fully understand the failure ourselves and did not consider the impact on the business side. Relying solely on verbal instructions, criticism in meetings, and after-the-fact blame doesn't really help, because execution deteriorates over time, especially when everyone is busy.

In this case, establishing a basic process is more effective than criticizing and admonishing. From the above conversation, we can work out the categories of information that people want:

  • [x] The affected business items;
  • [x] The size of the impact scope;
  • [x] The duration of the impact;
  • [x] The impact level.

Once this content is determined, report it to the person in charge above you, along with any other useful information. Then notify the relevant business parties and tell them, in one message, everything they would otherwise ask. With that in mind, let's set up a basic notification process:

  1. Collect fault and impact information: fault status, affected services, impact scope, impact duration, and impact level.
  2. Once the information is collected, organize it briefly.
  3. Inform the person in charge above you: describe the problem, the scenario, and the impact, and if possible the general direction of the investigation or solution. The most important item is the time estimate.
  4. Formally notify the relevant business parties of the fault status, impact, and expected recovery time by email or DingTalk.
  5. Register the fault.
  6. Inform the relevant business parties after recovery.
When the basic notification process is established, inform all members of the team/department that it should be strictly followed.

Practice is king

As mentioned above, execution deteriorates over time. Although the basic notification process is established, it is still hollow. How is the impact scope determined? What about the duration? What is the impact level?

If these cannot be determined, the success rate and effectiveness of the notification will be less than satisfactory. For this to work, a notification list must be defined on top of the notification process. An effective notification list looks like this:

It looks pretty good, doesn't it? A business party or manager who sees a fault notification like this is unlikely to spend more time asking questions, so the recovery work is not delayed.

A simple table is not enough, though. For example, how do I know which business parties are associated with this project? How are the levels determined? How are the numbers generated? Don't panic; let's take them one at a time. We can treat the information in the notification list as the result of a query. Information that is not directly recorded can be obtained through an associated query, so we need several more tables:

  • [x] A table for determining business levels;
  • [x] A table for determining impact levels;
  • [x] A table associating each business with its business parties.

These tables need to be worked out by the team/department together. Once they are finished, the related items in the notification list can be determined from the information in the tables. Below are a few reference tables; adjust them slightly to fit your actual situation.

Impact level grading reference table

Business level grading reference table

Association number naming reference table

This is how the number spider-a-job-price-01 in the example is derived.

Business party association reference table

With these tables and the notification list in place, the process can be carried out smoothly.
