Preface

As a newcomer to Node.js, I was both excited and a little nervous about trying back-end development. Fortunately, everyone around me was very open and offered plenty of suggestions and shared experience. So far I have built three projects on Node.js + TypeScript + IMServer [1], so it is a good time to summarize what I have learned. Below I describe that learning process through one small development task.

Requirements

Once the Node project was set up, I received my first Node back-end task: periodically pull the organizational structure of Enterprise WeChat into the business database, and provide an interface for looking up a user by mobile phone number. At the start I was fairly optimistic about the task and began development full of confidence.

Preliminary plan

The initial design was as follows:

1. During server deployment initialization (init.ts, the startup file), start a node-schedule scheduled task that reads each enterprise's Enterprise WeChat configuration from the database and then runs the organizational structure update for those enterprises in parallel.

2. Enterprise WeChat provides an API for fetching the details of a department's members, so the information for each department is fetched in parallel and written to the MySQL database.

3. When a query request reaches the server, first look up the member with that mobile phone number in the database. If a member is found, return the information from the database. If not, call the Enterprise WeChat API that resolves a mobile phone number to a userID, then fetch the latest user information through the user information API; this covers the gap between periodic updates.

Pitfalls during development

The overall business logic is not complicated, but many problems came up during debugging and deployment. Here are some examples:

1. Limited access frequency. Enterprise WeChat officially limits the number of simultaneous requests for the same resource to a fixed value (60). Because the department details API was called in parallel, the threshold was exceeded, and the IP was banned during testing.

2. Too many processes lead to slow SQL queries. I had not accounted for the multi-site deployment, so 120 update processes ran at the same time (3 sites * 5 servers * 8 workers). Their reads and writes to the MySQL database interfered with each other and consumed a lot of performance, which produced slow queries whenever database read/write pressure was high.

3. Invalid mobile phone numbers block calls to the Enterprise WeChat API. The Enterprise WeChat interface that resolves a mobile phone number to a userID has a restriction: once a certain number of queries use invalid mobile phone numbers, Enterprise WeChat blocks the calling IP address. The business system, however, holds many phone numbers that became invalid when employees left. Because the interface was called whenever a phone number was missing from the database, frequent calls triggered the block.

4. Database read/write conflicts. Multiple servers read and wrote the database at the same time, which left some records duplicated and others missing.

5. The network environment unbalanced the read/write lock and caused follow-on problems. To fix the issues above, a task read/write lock was introduced so that only a single process performs the update. As a result, the intranet servers held the lock all the time, and load balancing lost its effect. Later, while the pre-online environment was being configured, that environment always held the lock because of its good network conditions, which affected live online data.

6. Alarms and recovery on failure were not considered.

In-depth optimization design

The ideas and solutions for each of these problems are described below.

1. Limited access frequency

Here, the parallel requests to the department member information API were changed to a serial mechanism paced against the allowed frequency, designed to make 10 calls per second.
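The serial, rate-limited calling described above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: runSerially, sleep, and the task shape are hypothetical names introduced here, and the only value taken from the text is the 10 calls per second.

```typescript
// Hypothetical helper: resolve after `ms` milliseconds.
const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: run async tasks one at a time and start
// at most `callsPerSecond` of them per second.
async function runSerially<T>(
  tasks: Array<() => Promise<T>>,
  callsPerSecond: number,
): Promise<T[]> {
  const minIntervalMs = 1000 / callsPerSecond;
  const results: T[] = [];
  for (const task of tasks) {
    const startedAt = Date.now();
    // Wait for each call to finish before starting the next one.
    results.push(await task());
    const elapsed = Date.now() - startedAt;
    if (elapsed < minIntervalMs) {
      // Pad fast calls so the overall rate stays under the limit.
      await sleep(minIntervalMs - elapsed);
    }
  }
  return results;
}

// Usage sketch: `fetchMembers` stands in for the real department member API call.
async function updateDepartments(
  departmentIds: number[],
  fetchMembers: (id: number) => Promise<unknown>,
): Promise<unknown[]> {
  return runSerially(
    departmentIds.map((id) => () => fetchMembers(id)),
    10, // 10 calls per second, as in the text
  );
}
```

Running the calls strictly one after another is slower than a bounded-concurrency pool, but it keeps the request rate easy to reason about against the official limit.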

2. Too many processes cause slow SQL queries

The obvious fix is to reduce the number of processes that start the scheduled task. Back-end services are split into test, pre-online, and production environments, and setting SCHEDULE_ENV at deployment time (using SKTE as an example) decides whether an environment starts the timer script. Each server starts 8 worker processes, and each worker can read its own ID from the process.env.IMSERVER_WORKER_ID variable, so the scheduled task can be started only in the "worker1" process.
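A minimal sketch of that startup check is shown below. The variable names SCHEDULE_ENV and IMSERVER_WORKER_ID come from the text, but their values are assumptions made for illustration: here SCHEDULE_ENV is taken to be "on" where the timer should run, and IMSERVER_WORKER_ID is taken to be "1" for the first worker. shouldStartSchedule is a hypothetical helper name.

```typescript
// Shape of process.env, declared locally so the sketch is self-contained.
type Env = Record<string, string | undefined>;

// Assumed values (not from the original): SCHEDULE_ENV === "on" enables the
// timer in this environment, and IMSERVER_WORKER_ID === "1" marks worker1.
function shouldStartSchedule(env: Env): boolean {
  const scheduleEnabled = env["SCHEDULE_ENV"] === "on";
  const isFirstWorker = env["IMSERVER_WORKER_ID"] === "1";
  return scheduleEnabled && isFirstWorker;
}

// Example: 8 workers on one server; only one of them starts the task.
const workers = Array.from({ length: 8 }, (_, i) => ({
  SCHEDULE_ENV: "on",
  IMSERVER_WORKER_ID: String(i + 1),
}));
const starters = workers.filter((env) => shouldStartSchedule(env));
console.log(starters.length); // 1
```

In the startup file the check would be called with process.env before the timer is registered. With 3 sites and 5 servers per site, this alone reduces the 120 competing processes to 15.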

3. Invalid mobile phone numbers block calls to the Enterprise WeChat API

This problem was missed during the technical research, which shows that the early research was not thorough enough. The business caller cannot know whether a mobile phone number is valid and should not have to care about this restriction. The real-time query mechanism (for a phone number missing from the database, query the official Enterprise WeChat API in real time and return the result) had been introduced so that new records would not have to wait for the next update, but given this restriction it was unreasonable. It was removed, and a separate real-time query interface based on the official Enterprise WeChat API was provided instead; each call to it by a business triggers an update of the organizational structure.

4. Database read/write conflicts

A Redis task lock was introduced so that only one process updates the database at a time. Updates for different enterprises still run in parallel: they do not touch each other's records, so they cause no read/write conflicts on the same record, and the overall update is faster.
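The locking idea can be sketched as follows, under stated assumptions. LockStore is a hypothetical interface invented for this example; with a real Redis client, setIfAbsent would correspond to the Redis command SET key value NX PX ttl. MemoryStore is an in-memory stand-in so the sketch runs without a Redis server, and the key name "org-sync-lock" and the 60-second TTL are illustrative values only.

```typescript
// Hypothetical store interface; setIfAbsent maps to `SET key value NX PX ttlMs`.
interface LockStore {
  setIfAbsent(key: string, value: string, ttlMs: number): Promise<boolean>;
  get(key: string): Promise<string | null>;
  del(key: string): Promise<void>;
}

// In-memory stand-in for Redis, for demonstration only.
class MemoryStore implements LockStore {
  private data = new Map<string, { value: string; expiresAt: number }>();

  async setIfAbsent(key: string, value: string, ttlMs: number): Promise<boolean> {
    const hit = this.data.get(key);
    if (hit && hit.expiresAt > Date.now()) return false; // lock is held
    this.data.set(key, { value, expiresAt: Date.now() + ttlMs });
    return true;
  }

  async get(key: string): Promise<string | null> {
    const hit = this.data.get(key);
    return hit && hit.expiresAt > Date.now() ? hit.value : null;
  }

  async del(key: string): Promise<void> {
    this.data.delete(key);
  }
}

// Run `job` only if this process wins the lock; return false if it was skipped.
async function withTaskLock(
  store: LockStore,
  key: string,
  ttlMs: number,
  job: () => Promise<void>,
): Promise<boolean> {
  const token = Math.random().toString(36).slice(2); // identifies this holder
  if (!(await store.setIfAbsent(key, token, ttlMs))) return false;
  try {
    await job();
  } finally {
    // Release only our own lock. With real Redis, this check-and-delete
    // should be made atomic (for example with a Lua script).
    if ((await store.get(key)) === token) await store.del(key);
  }
  return true;
}

// Enterprises do not share records, so they are updated in parallel
// inside the single lock holder.
async function updateAllEnterprises(
  store: LockStore,
  corpIds: string[],
  updateOne: (corpId: string) => Promise<void>,
): Promise<boolean> {
  return withTaskLock(store, "org-sync-lock", 60000, async () => {
    await Promise.all(corpIds.map((id) => updateOne(id)));
  });
}
```

The TTL matters: if the lock holder crashes mid-update, the key expires on its own and another process can take over on the next run.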

5. The network environment unbalances the read/write lock and causes follow-on problems

In the initial design, I hoped the servers would compete fairly for the task lock according to their own load. In practice, because of the multi-site deployment, servers in the stable intranet environment always acquired the lock first, so there was no fairness to speak of. The problem became serious when the pre-online environment was deployed for a stress test without a read-only DB account and without the environment variable that controls the scheduled task. After code that restructured the update logic went online, the online data still looked as if the old logic were running. After a series of checks we found that the pre-online environment had acquired the read/write lock and was updating the database with the old logic. The fix was to add an environment variable that controls whether scheduled tasks start, to separate database permissions for the stress-test environment, and to add a read-only mode.

6. Alarm and error recovery

A bit of front-end mindset carries over here: alarms and error recovery matter just as much on the back end.

Alarms

The service is integrated with the IMLog Node SDK. With Kibana and Grafana configured, updates to the organizational structure can be monitored effectively.

Error recovery

The errors here mainly occur when the access_token of the Enterprise WeChat API expires, which usually happens in one of two situations:

1. Enterprise WeChat actively expires the access_token.

2. The access_token becomes invalid in the middle of an organizational structure update, so an HTTP request to Enterprise WeChat that is already in flight fails.

Neither situation can be avoided. A middleware therefore wraps the fetch call and checks the response: if the Enterprise WeChat API returns WX_CODE.INVALIDE_TOKEN, it logs a warning and resets the accessToken.

// WX_CODE and wxService are assumed to come from the project's own modules;
// their imports are not shown in the original.
export default (app) => {
  const { utils: { imlogHelper } } = app;

  // Wrap fetch so every response is checked for an invalid access_token
  const wrapperLogFetch = (originFetch, { traceId, header, client_ip }) => async (...args) => {
    const res = await originFetch(...args);
    if (res.errcode === WX_CODE.INVALIDE_TOKEN) {
      // Perform the update logic: clear the cached Redis keys, then log a warning
      wxService.clearAllRedisKey();
      imlogHelper({
        cmd: args[0], // assumed: the request URL is the first fetch argument
        message: 'accessToken_update_warning',
        body: JSON.stringify(res),
        trace_id: traceId,
        retcode: res.errcode, // assumed: the API error code serves as the return code
        headers: header,
      });
    }
    return res;
  };

  // Override the context.fetch method
  return async (ctx, next) => {
    if (!ctx.logFetch) {
      const originFetch = ctx.fetch;
      const { traceId, ip: client_ip } = ctx.request;
      const header = JSON.stringify(ctx.request.header);
      const logFetch = wrapperLogFetch(originFetch, {
        traceId,
        header,
        client_ip,
      });
      ctx.logFetch = logFetch;
    }
    if (ctx.fetch !== ctx.logFetch) {
      ctx.fetch = ctx.logFetch;
    }
    await next();
  };
};

Conclusion

After the redesign and verification, the final design includes the following optimizations:

1. A task lock based on Redis SETNX ensures that only a single process updates the database at any time.

2. An environment variable that controls whether scheduled tasks start, together with separate database read/write accounts set at deployment, keeps the environments isolated.

3. Updating enterprises in parallel while calling the department data API serially maximizes performance and avoids having API calls blocked.

4. Improved error recovery and alarms make it possible to check the running status in real time.

I hope this article gives you some insight on your own journey with Node. Please feel free to like, bookmark, and comment.


  [1] IMServer is the IMWeb team's Node.js web framework, similar to Koa.