Node.js Operation and Maintenance Problems (1)

Front-end technology changes quickly, and I wanted to raise my team's front-end skill level. So at the end of 2016 I proposed to my boss a complete scheme that separates the front end from the back end, modeled on Alibaba's Midway ("Midway Island") architecture. The overall technical architecture of our project team is Node.js + Java. On the Java back end, Spring Cloud began to emerge in 2016 as microservices were adopted. On the front end, we boldly use Node.js as a front-end server that adapts the various back-end interfaces, which also helps raise the team's front-end skill level.

Because this article is an introduction to Node.js operations and maintenance, it will not discuss the architecture itself, its business characteristics, or its application scenarios.

Beyond front-end engineering, Node.js is also productive for implementing business logic: compared with object-oriented Java, JavaScript's syntax makes development faster.

I remember that when the project team separated the front end from the back end, it often paired one front-end developer with four back-end developers to deliver a single business requirement.

However, due to the nature of the JavaScript language, code running on the server is prone to memory leaks and high CPU load.
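As an illustration of how such a leak typically arises, here is a minimal sketch that is not taken from the project. A hypothetical `handleRequest` function stores per-request data in a module-level `Map` that is never evicted, so the heap grows with every request. The names `requestCache`, `handleRequest`, and `heapUsedMB` are assumptions made for the example; `process.memoryUsage()` is the standard Node.js API.

```javascript
// Hypothetical module-level cache: entries are added per request but never
// removed, so everything stored here stays reachable for the process lifetime.
const requestCache = new Map();

// Hypothetical handler: keeps a payload of 1,000 numbers per request ID.
function handleRequest(id) {
  const payload = new Array(1000).fill(id);
  requestCache.set(id, payload); // the leak: no eviction, no size limit
  return payload.length;
}

// Heap currently in use by V8, in megabytes.
function heapUsedMB() {
  return process.memoryUsage().heapUsed / (1024 * 1024);
}

const before = heapUsedMB();
for (let id = 0; id < 2000; id += 1) {
  handleRequest(id);
}
const after = heapUsedMB();

console.log(`cached entries: ${requestCache.size}`);
console.log(`heap grew by about ${(after - before).toFixed(1)} MB`);
```

In a long-running server the loop is replaced by real traffic, so the growth never stops. Bounding the cache (for example with a size limit or expiry) removes this particular leak.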

Apart from a few of the better solutions, there is no truly effective open-source monitoring system on the market.

In real operations monitoring, CPU and memory indicators alone cannot resolve a performance problem when it occurs. The problem has to be analyzed online, at the moment it happens.
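To make the point concrete, the sketch below samples the basic indicators and uses them only as a trigger for deeper analysis at the moment a limit is crossed. It is an assumption-laden example, not the project's code: the names `cpuPercent`, `needsDiagnosis`, `takeSample`, `startSampling`, and the values in `LIMITS` are hypothetical, while `process.cpuUsage()` and `process.memoryUsage()` are standard Node.js APIs.

```javascript
// Convert a process.cpuUsage() delta (microseconds) over an interval into a
// percentage of one CPU core.
function cpuPercent(cpuDelta, elapsedMs) {
  const busyMs = (cpuDelta.user + cpuDelta.system) / 1000;
  return (busyMs / elapsedMs) * 100;
}

// Hypothetical thresholds; real values depend on the service.
const LIMITS = { cpuPercent: 90, heapUsedMB: 1024 };

// Decide whether a sample is bad enough to justify capturing a dump or
// profile right now, while the problem is still happening.
function needsDiagnosis(sample, limits = LIMITS) {
  return sample.cpuPercent >= limits.cpuPercent ||
         sample.heapUsedMB >= limits.heapUsedMB;
}

// Take one sample relative to the previous CPU reading and timestamp.
function takeSample(prevCpu, prevTimeMs) {
  const nowMs = Date.now();
  const delta = process.cpuUsage(prevCpu); // usage since prevCpu
  return {
    cpuPercent: cpuPercent(delta, Math.max(nowMs - prevTimeMs, 1)),
    heapUsedMB: process.memoryUsage().heapUsed / (1024 * 1024),
  };
}

// Sample periodically and call onAlert the moment a limit is crossed.
function startSampling(onAlert, intervalMs = 1000) {
  let prevCpu = process.cpuUsage();
  let prevTime = Date.now();
  const timer = setInterval(() => {
    const sample = takeSample(prevCpu, prevTime);
    prevCpu = process.cpuUsage();
    prevTime = Date.now();
    if (needsDiagnosis(sample)) onAlert(sample);
  }, intervalMs);
  timer.unref(); // do not keep the process alive just for monitoring
  return timer;
}
```

A caller would pass `startSampling` a callback that captures a heap dump or CPU profile. The numbers themselves only say that something is wrong, not what.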

At present, the available operations tools all fall short of this need to some degree, and none of them fully solves the operational pain points. The tools I have in mind are the following:

1. Tools represented by OneAPM and New Relic, which are hard-coded into the business code, cannot reach into the lower layers and analyze the running state. They only show information such as CPU, memory, whether the process has restarted, and so-called transactions. In most cases, operations staff also need to dump the live server under heavy traffic so that the problem can be analyzed and located. Unfortunately, these tools cannot do that.
2. Complex operations tools represented by PM2. Apart from charging for some features, PM2 is too complex, far more complex than a typical web application project. A Google search for PM2 sometimes turns up memory leaks in PM2 itself. When poorly written business code leaks memory, PM2 restarts the process. Does that solve the memory leak? No! It only sidesteps it, because restarting the process simply frees the memory. The paid version of PM2 uses v8-profiler, an open-source third-party package that makes full use of the V8 API in Node.js and implements complex JS logic internally.
3. Monitoring tools such as Tencent's TSW and Alibaba's Pandora.js. Whether or not they ship their own dashboards, they cannot reach into the lower layers and analyze the runtime.
4. Easy-monitor, written by @Hyj1991, combines some characteristics of PM2 and TSW and can locate and fix Node.js memory leaks offline. But it cannot be used online.
5. I will not comment on the others.

Having looked at all the Node.js operations tools on the market, we come to one more: Alinode. Alinode was created by @Piuling and others. It is said to have kept many of Alibaba's business lines running stably. It even opened a column on GitHub after Tmall's Double 11 in 2015.

Alinode's distinguishing feature is that it can profile Node.js processes and locate problems online. This seemingly magical tool is essentially a modified Node.js with some new features added.

What impressed me most were the following points:

1. Online profiling, dumps, and GC tracing (trace-GC).
2. Visualization of dump files. Dump files can reach gigabytes in size, which is more than Chrome DevTools can handle on its own.
3. No changes to business code: it deploys directly and is fully compatible with official Node.js.
4. Logs are generated and automatically reported by AgentX.

However, my project runs on an intranet, and sending performance data outside is unlikely to be allowed. So we developed our own Node.js, drawing on the Node.js monitoring projects currently on the market.

Our Node.js has the following features:

1. Logs are generated directly, without changing any code.
2. Unlike Alinode, we do not take CPU time from the Node.js process; instead, an agent written in Go parses the logs efficiently.
3. We can also dump and profile a Node.js process online.
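
Stock Node.js already offers building blocks for on-demand dumps and profiles. The following is a minimal sketch using only built-in modules (`v8.writeHeapSnapshot` and the `Profiler` domain of the `inspector` module). It is an assumption about how such a feature could be wired up in plain Node.js, not the implementation of the customized runtime described here. The function names, the `SIGUSR2` trigger, and the output paths are hypothetical.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const v8 = require('v8');
const inspector = require('inspector');

// Write a heap snapshot of the running process and return the file path.
// The call is synchronous and blocks the event loop while the file is written.
function dumpHeap(dir = os.tmpdir()) {
  const file = path.join(dir, `heap-${process.pid}-${Date.now()}.heapsnapshot`);
  return v8.writeHeapSnapshot(file);
}

// Record a CPU profile for durationMs and write it as JSON (.cpuprofile).
function profileCpu(durationMs, outFile, done) {
  const session = new inspector.Session();
  session.connect();
  const finish = (err) => {
    session.disconnect();
    done(err, err ? undefined : outFile);
  };
  session.post('Profiler.enable', (enableErr) => {
    if (enableErr) return finish(enableErr);
    session.post('Profiler.start', (startErr) => {
      if (startErr) return finish(startErr);
      setTimeout(() => {
        session.post('Profiler.stop', (stopErr, result) => {
          if (stopErr) return finish(stopErr);
          fs.writeFileSync(outFile, JSON.stringify(result.profile));
          finish(null);
        });
      }, durationMs);
    });
  });
}

// Hypothetical trigger: `kill -USR2 <pid>` captures both artifacts (POSIX only).
function installDiagnostics(dir = os.tmpdir()) {
  process.on('SIGUSR2', () => {
    console.error('heap snapshot written to', dumpHeap(dir));
    const out = path.join(dir, `cpu-${process.pid}-${Date.now()}.cpuprofile`);
    profileCpu(5000, out, (err, file) => {
      console.error(err ? `profile failed: ${err.message}` : `profile written to ${file}`);
    });
  });
}
```

Calling `installDiagnostics()` once at startup would be enough to enable the trigger. The resulting `.heapsnapshot` and `.cpuprofile` files can be loaded into Chrome DevTools, subject to the size limits mentioned earlier. Note that this approach requires a small code change in the application, which a modified runtime avoids.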

Our model is as follows: