Contents

  • Foreword
  • What data to collect
    • Performance
    • Errors
    • Auxiliary information
    • Summary
  • Client SDK (probe): principles and APIs
    • Web
    • WeChat mini programs
  • Writing test cases
    • Unit test
    • Process test
      • Providing a Web environment
      • Mocking the Web APIs
  • Conclusion

1. Foreword

As front-end development has matured and drawn more attention, the industry has come to take front-end monitoring systems more seriously. I won't explain here why monitoring is necessary; let's go straight to the requirements.

Small and medium-sized companies can use third-party monitoring directly. For example, they can set up a free Sentry instance to capture exceptions and report events, or use Alibaba Cloud ARMS, which is full-featured and not too expensive. There are many similar open-source or paid systems that can meet these needs.

As a company grows into a large one, its user base, business services, and overall architecture all keep evolving, and third-party monitoring systems may gradually fail to meet its needs. For example, such systems stay isolated from the company's other internal systems: they cannot use the internal unified login, cannot link to one another, and cannot quickly support collecting additional fields. These problems keep efficiency below what the company's development requires. A front-end monitoring system that is under internal control and can respond quickly to the company's needs becomes necessary.

We have invested a fair amount of time and effort in our internal front-end monitoring system. Today we would like to share the front-end monitoring SDK part, covering three aspects:

  • What data to collect
  • Client SDK (probe): principles and APIs
  • Write test cases

2. What data to collect

The core of a front-end monitoring system is collecting data from the client. The client probes we currently support cover Web, WeChat mini programs, Android, and iOS. They mainly collect the following information:

2.1 Performance

Collect performance information for page loads, static resources, and Ajax requests, including load time, HTTP protocol version, and response body size. This data supports improving overall service quality and tracking down slow requests.

2.2 Errors

JS errors, static resource loading errors, and Ajax request errors are all fairly well understood. Business interface errors deserve an explanation:

When a client sends an Ajax request to a back-end business interface, the interface returns a JSON structure that generally contains two fields: errorCode and message. errorCode is a status code defined internally by the business interface. A normal response follows an internal convention such as errorCode === 0. If errorCode is not 0, the response represents an unexpected or anticipated exception, and this error data needs to be collected.

Because different teams or interfaces may follow different conventions, we only provide a default method, which is called after the Ajax response arrives. The business side writes its own judgment logic in this method according to its convention and the response JSON. Something like this:

errcodeReport(res) {
  // Report only plain objects whose errcode field is present and non-zero
  if (
    Object.prototype.toString.call(res) === '[object Object]' &&
    res.hasOwnProperty('errcode') &&
    res.errcode !== 0
  ) {
    return { isReport: true, errMsg: res.errmsg, code: res.errcode };
  }
  return { isReport: false };
}
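As a rough illustration of how an SDK might call such a hook, the sketch below runs a business-supplied errcodeReport on a parsed response body and forwards a record only when isReport is true. The names handleBusinessResponse and report, the record shape, and the sample URL are hypothetical and are not taken from our SDK.

```javascript
// Default hook, following the convention above: errcode !== 0 means a business error.
function errcodeReport(res) {
  const isPlainObject = Object.prototype.toString.call(res) === '[object Object]';
  if (
    isPlainObject &&
    Object.prototype.hasOwnProperty.call(res, 'errcode') &&
    res.errcode !== 0
  ) {
    return { isReport: true, errMsg: res.errmsg, code: res.errcode };
  }
  return { isReport: false };
}

// Hypothetical glue code: run the hook on a parsed response body and
// hand a record to `report` only when the hook asks for it.
function handleBusinessResponse(url, body, hook, report) {
  let verdict;
  try {
    verdict = hook(body);
  } catch (e) {
    // A faulty business hook must never break the page.
    return false;
  }
  if (!verdict || !verdict.isReport) return false;
  report({ type: 'business-error', url, code: verdict.code, errMsg: verdict.errMsg });
  return true;
}

// Usage: only the first response carries a non-zero errcode.
const reported = [];
const collect = (record) => reported.push(record);
handleBusinessResponse('/api/order', { errcode: 40001, errmsg: 'invalid token' }, errcodeReport, collect);
handleBusinessResponse('/api/order', { errcode: 0, data: {} }, errcodeReport, collect);
console.log(reported.length);
```

Wrapping the hook call in try/catch is a design choice for this sketch: the hook is written by the business side, so an exception inside it should be swallowed instead of interrupting the original Ajax flow.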

2.3 Auxiliary information

Besides the two kinds of hard metrics above, we also need a lot of other information, such as the user's access path, click behavior, user ID, device version, device model, UV/UA identifiers, and traceId. Often the problems we need to solve cannot be tracked down simply and directly, and in some cases front-end monitoring even has to be correlated with other systems, so this soft information matters too.

A special note on traceId:

Today's back-end services typically use an APM (Application Performance Management) system. At the start of a complete request, the APM tool generates a unique ID, usually called a traceId, under which the server-side call chain of the whole request is recorded. If the front end can get hold of it, it can be used to look up the logs of that request in the back-end APM system. As long as the back-end interface is configured accordingly, the traceId can be returned to the client in the HTTP response, and the SDK can collect the traceId of each Ajax request. In this way, front-end and back-end monitoring can be correlated.
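A minimal sketch of the collection step, assuming the back end returns the traceId in a response header. The header name x-trace-id, the helper names, and the record shape are placeholders for illustration; real APM tools use their own header names.

```javascript
// Assumption: the backend exposes the trace ID in this response header.
const TRACE_HEADER = 'x-trace-id';

// `headers` is anything with a get(name) method,
// for example the Headers object of a fetch Response.
function extractTraceId(headers, headerName = TRACE_HEADER) {
  if (!headers || typeof headers.get !== 'function') return null;
  const value = headers.get(headerName);
  return value ? String(value).trim() : null;
}

// Attach the trace ID to the Ajax record the SDK is about to report.
function withTraceId(record, headers) {
  return Object.assign({}, record, { traceId: extractTraceId(headers) });
}

// Usage with a Map standing in for response.headers
const fakeHeaders = new Map([['x-trace-id', 'abc123']]);
const ajaxRecord = withTraceId({ url: '/api/getData', status: 200 }, fakeHeaders);
console.log(ajaxRecord.traceId);
```

For cross-origin requests, the browser only lets scripts read a custom response header if the server lists it in Access-Control-Expose-Headers, which is part of the back-end configuration mentioned above.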

2.4 Summary

By collecting the information above and building a management console, we can monitor front-end performance and exceptions. Imagine this scenario: when we receive an alarm from the monitoring system or feedback from colleagues, we open the management console and look at the real-time errors first. If the problem is caused by JS code, we can quickly locate the error in the front-end code. If it is not a front-end error, the collected business interface errors may show that it is a back-end interface problem, and we can promptly tell our back-end colleagues which interface reported an error and with which errorCode. We can also use the traceId to look up the back-end trace data for that Ajax request directly. If the problem is still not obvious, we can reconstruct the user's situation by analyzing the collected user path, device information, network requests, and other data from several angles, which helps us troubleshoot bugs or compatibility problems that are hard to reproduce.

In this scenario, monitoring improves the front end's troubleshooting ability and even helps back-end colleagues. Most of the time, when a bug appears, the front end is the first to receive the feedback, so the front end is the vanguard of troubleshooting. With such a front-end monitoring system we are no longer at a loss each time a problem occurs, and problems get solved much faster.

[List of specific fields]

Having determined what information to collect, the next step is to implement the client SDK, which automatically collects data from business projects and reports it to the server.

3. Client SDK (probe): principles and APIs

We call it a probe because the SDK depends on the runtime environment of the monitored front-end project and hooks probe functions into the low-level APIs of that environment to collect information. The main principles and APIs used to implement the Web and WeChat mini program SDKs are shared below.

3.1 Web

The following figure shows the main Web APIs used by the SDK. Through these APIs we can obtain page performance, resource performance, Ajax, and error information respectively.

3.1.1 Performance

performance.timing can be used to obtain performance data such as DNS lookup time, TCP connection time, and white screen time for the initial page load. However, performance.timing is deprecated, so we have also switched to performance.getEntriesByType('navigation'). The white screen time measured here may differ from what real users actually perceive and is for reference only.
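To make the metrics concrete, here is a small sketch that derives a few durations from a navigation entry. The property names are those of PerformanceNavigationTiming; the function name, the chosen set of metrics, and in particular the white screen approximation (time until the first response byte) are assumptions for illustration, not our SDK's definitions.

```javascript
// Derive a few page-load metrics from a PerformanceNavigationTiming-like entry.
// All timestamps are milliseconds relative to the start of navigation.
function navigationMetrics(entry) {
  return {
    dns: entry.domainLookupEnd - entry.domainLookupStart,
    tcp: entry.connectEnd - entry.connectStart,
    ttfb: entry.responseStart - entry.requestStart,
    download: entry.responseEnd - entry.responseStart,
    // Assumption: "white screen" is approximated as time to first response byte.
    whiteScreen: entry.responseStart - entry.startTime,
    domReady: entry.domContentLoadedEventEnd - entry.startTime,
    load: entry.loadEventEnd - entry.startTime,
  };
}

// In a browser, pass performance.getEntriesByType('navigation')[0]
// after the load event. Here a hand-written sample entry is used instead.
const sampleNavigationEntry = {
  startTime: 0,
  domainLookupStart: 5,
  domainLookupEnd: 25,
  connectStart: 25,
  connectEnd: 60,
  requestStart: 62,
  responseStart: 180,
  responseEnd: 210,
  domContentLoadedEventEnd: 650,
  loadEventEnd: 900,
};

const pageMetrics = navigationMetrics(sampleNavigationEntry);
console.log(pageMetrics);
```

Reading the entry before the load event has finished gives a loadEventEnd of 0, so a real probe would typically wait until after load before computing these values.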

With a new PerformanceObserver, we can listen for the performance data of every loaded resource (CSS, script, img, Ajax, etc.): load time, response size, HTTP protocol version (HTTP/1.1 or HTTP/2), and so on. We manage the resource performance data in an array and empty the array after the data is reported.
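The buffering described here could look roughly like the following. ResourceBuffer, observeResources, and the send callback are hypothetical names; the entry fields (name, initiatorType, duration, transferSize, nextHopProtocol) are standard PerformanceResourceTiming properties.

```javascript
// Keeps resource entries in an array and empties it once they are reported.
class ResourceBuffer {
  constructor(send) {
    this.send = send; // function that uploads an array of records
    this.entries = [];
  }

  add(entry) {
    this.entries.push({
      name: entry.name,
      initiatorType: entry.initiatorType,
      duration: entry.duration,
      transferSize: entry.transferSize,
      protocol: entry.nextHopProtocol,
    });
  }

  flush() {
    if (this.entries.length === 0) return 0;
    const batch = this.entries;
    this.entries = []; // swap in a fresh array before sending
    this.send(batch);
    return batch.length;
  }
}

// Wire the buffer to PerformanceObserver where resource entries are supported.
function observeResources(buffer) {
  if (typeof PerformanceObserver === 'undefined') return null;
  const supported = PerformanceObserver.supportedEntryTypes || [];
  if (!supported.includes('resource')) return null;
  const observer = new PerformanceObserver((list) => {
    list.getEntries().forEach((entry) => buffer.add(entry));
  });
  observer.observe({ entryTypes: ['resource'] });
  return observer;
}

// Usage with hand-written entries instead of a live observer
const sentBatches = [];
const resourceBuffer = new ResourceBuffer((batch) => sentBatches.push(batch));
resourceBuffer.add({ name: '/app.js', initiatorType: 'script', duration: 120, transferSize: 34567, nextHopProtocol: 'h2' });
resourceBuffer.add({ name: '/api/getData', initiatorType: 'fetch', duration: 272, transferSize: 512, nextHopProtocol: 'http/1.1' });
const flushedCount = resourceBuffer.flush();
console.log(flushedCount, resourceBuffer.entries.length);
```

Swapping in a new array before calling send means entries that arrive while the upload is in progress land in the next batch instead of being cleared by accident.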

3.1.2 fetch / XMLHttpRequest

Browsers do not provide a unified API for collecting Ajax request and response data, and whether we use axios or another HTTP request library, it is ultimately built on fetch or XMLHttpRequest. Collection can therefore only be achieved by overriding fetch and XMLHttpRequest and inserting custom code into the corresponding functions. There are many articles on this, so I won't go into detail here.

const _fetch = window.fetch;
window.fetch = function () {
  // custom code (before the request is sent)
  return _fetch.apply(this, arguments).then((res) => {
    // custom code (after the response arrives)
    return res;
  });
};
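The same wrapping idea applies to XMLHttpRequest. The sketch below patches open and send on an XMLHttpRequest-like class; patchXHR, the onRecord callback, the __monitor property, and the record shape are hypothetical names chosen for illustration. A small stand-in class is used so the example also runs outside a browser.

```javascript
// Patch open/send on an XMLHttpRequest-like class so every request yields one record.
function patchXHR(XHRClass, onRecord) {
  const originalOpen = XHRClass.prototype.open;
  const originalSend = XHRClass.prototype.send;

  XHRClass.prototype.open = function (method, url) {
    // Remember request details on the instance for use in send().
    this.__monitor = { method: String(method).toUpperCase(), url: String(url) };
    return originalOpen.apply(this, arguments);
  };

  XHRClass.prototype.send = function () {
    const info = this.__monitor || { method: 'GET', url: '' };
    const start = Date.now();
    // loadend fires after success, error, abort, and timeout alike.
    this.addEventListener('loadend', () => {
      onRecord({
        method: info.method,
        url: info.url,
        status: this.status,
        duration: Date.now() - start,
      });
    });
    return originalSend.apply(this, arguments);
  };

  // Return a function that restores the original methods.
  return function restore() {
    XHRClass.prototype.open = originalOpen;
    XHRClass.prototype.send = originalSend;
  };
}

// Stand-in for XMLHttpRequest: completes synchronously with status 200.
class FakeXHR {
  constructor() {
    this.status = 0;
    this.handlers = [];
  }
  open() {}
  addEventListener(type, fn) {
    if (type === 'loadend') this.handlers.push(fn);
  }
  send() {
    this.status = 200;
    this.handlers.forEach((fn) => fn());
  }
}

const xhrRecords = [];
const restoreXHR = patchXHR(FakeXHR, (record) => xhrRecords.push(record));
const xhr = new FakeXHR();
xhr.open('get', '/api/getData');
xhr.send();
restoreXHR();
console.log(xhrRecords[0].method, xhrRecords[0].url, xhrRecords[0].status); // GET /api/getData 200
```

In a browser the call would be patchXHR(window.XMLHttpRequest, report). Returning a restore function is a convenience for tests, so that one test's patch does not leak into the next.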

3.1.3 window.onerror | unhandledrejection | console.error | framework error hooks

These last APIs collect JS-related error information. Two issues need attention:

First, onerror cannot get the details of errors thrown by cross-origin scripts. The fix is simple: set the crossorigin attribute on cross-origin script tags and have the static server return CORS response headers for those resources.

Second, errors from minified code have to be mapped back to the line, column, and message of the source code through the sourceMap file. A sourceMap is a data structure that stores the mapping between source code and minified code, and a parsing library converts positions easily. The real problem a front-end monitoring system has to solve is automating the management and use of sourceMap files: the company's static resource publishing system and the front-end monitoring system need to be integrated to eliminate inefficient manual packaging and uploading.
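For the error and unhandledrejection listeners, a rough sketch might look like this. installErrorHandlers, the report callback, and the record fields are illustrative assumptions; the event properties used (message, filename, lineno, colno, error, reason) are the standard ErrorEvent and PromiseRejectionEvent fields. A fake window object drives the example so it runs outside a browser.

```javascript
// Install error listeners on a window-like target and forward normalized records.
function installErrorHandlers(target, report) {
  target.addEventListener('error', (event) => {
    // Resource load failures dispatch a plain Event without a message; skip them here.
    if (!event.message) return;
    report({
      type: 'js-error',
      message: event.message,
      file: event.filename,
      line: event.lineno,
      column: event.colno,
      stack: event.error ? event.error.stack : undefined,
      // Errors from cross-origin scripts loaded without CORS are masked as "Script error."
      crossOriginMasked: event.message === 'Script error.',
    });
  }, true);

  target.addEventListener('unhandledrejection', (event) => {
    const reason = event.reason;
    report({
      type: 'unhandled-rejection',
      message: reason instanceof Error ? reason.message : String(reason),
      stack: reason instanceof Error ? reason.stack : undefined,
    });
  });
}

// Fake window: stores listeners so the example can fire events by hand.
const registered = {};
const fakeWindow = {
  addEventListener(type, fn) {
    (registered[type] = registered[type] || []).push(fn);
  },
};

const errorRecords = [];
installErrorHandlers(fakeWindow, (record) => errorRecords.push(record));
registered.error[0]({ message: 'Script error.', filename: '', lineno: 0, colno: 0, error: null });
registered.unhandledrejection[0]({ reason: new Error('boom') });
console.log(errorRecords.map((record) => record.type));
```

The crossOriginMasked flag makes it easy to see in the console how many errors still lack details, which is a useful signal that some script tags are missing the crossorigin attribute or the CORS headers.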

3.2 WeChat mini programs

WeChat mini programs run on JS underneath, have their own life cycle, and provide global APIs. By overriding some of the global functions and related APIs we can obtain network requests, error messages, device and version information, and so on. Because the loading process of a mini program is controlled by the WeChat app, and JS and other resources are hosted inside WeChat, we cannot obtain the page and resource loading information that Performance provides on the Web. (We later found that mini program version v2.11.0 (2020-04-24) added a new API that provides performance metrics, which can be used in the future.) The following figure shows the main APIs used by the SDK.

3.2.1 App and Component

We override the global App function to bind an onError method that listens for errors, and wrap its onShow method to run the logic the SDK needs when the mini program starts. By overriding Component's onShow method, we can run our path collection and reporting logic when page components switch.

// Inside the SDK class
init() {
  this.appMethod = App;
  this.componentMethod = Component;
  const ctx = this;
  Component = (opts) => {
    overrideComponent(opts, ctx);
    return ctx.componentMethod(opts);
  };
  App = (app) => {
    overrideApp(app, ctx);
    return ctx.appMethod(app);
  };
}

// Wrap the component's onShow so the SDK logic runs first
function overrideComponent(opts, ctx) {
  opts.methods = opts.methods || {};
  const compOnShow = opts.methods.onShow || function () {};
  opts.methods.onShow = function () {
    // do something
    return compOnShow.apply(this, arguments);
  };
}

// Wrap the app's onError and onShow
function overrideApp(app, ctx) {
  const _onError = app.onError || function () {};
  const _onShow = app.onShow || function () {};
  app.onError = function (err) {
    reportError(err, ctx);
    return _onError.apply(this, arguments);
  };
  app.onShow = function () {
    // do something
    return _onShow.apply(this, arguments);
  };
}

3.2.2 Overriding wx.request

As with fetch/XMLHttpRequest, there is no global API for capturing request information, so we can only collect it by overriding wx.request.

const originRequest = wx.request;
const ctx = this;
Object.defineProperty(wx, 'request', {
  value: function () {
    // The options object passed to wx.request (assumed to be the first argument)
    const config = arguments[0];
    // sdk code
    const _complete = config.complete || function (data) {};
    config.complete = function (data) {
      // sdk code
      return _complete.apply(this, arguments);
    };
    return originRequest.apply(this, arguments);
  },
});

Once the SDK is implemented, or while it is being implemented, we need to write test code, so let's talk about writing test cases.

4. Writing test cases

The SDK is a standalone library that has to be maintained and updated over a long period. It is used in many business projects and therefore needs to be especially stable, because fixing a problem is expensive: update the code -> release a new version -> have each business project upgrade its dependency. If an SDK change introduces another problem along the way, the whole cycle starts again, which is a real burden for business colleagues. As more systems adopted the monitoring SDK, changing any code during iteration became scary, because much of the logic is tied to multi-step flows and we were afraid of breaking something. During one round of refactoring and optimization, we resolved to build out unit tests and process tests.

4.1 Unit Test

Unit testing mainly targets common methods with clear inputs and outputs, such as the shared methods in the SDK's utils and the SDK's parameter configuration methods. For a monitoring SDK, most of the test code focuses on process testing, so unit testing is not covered in detail here.

4.2 Process Test

After the monitoring SDK is initialized in a business project, it collects and uploads information mainly by adding probes that monitor the running state of the project. In most cases its logic is not triggered by an explicit call from the business side. For example, when a page loads for the first time, the SDK collects and uploads the first-load information at an appropriate moment, so our test code needs to simulate this process to make sure the reported data is what we expect.

Our SDK runs in a browser environment, and Node does not support the Web APIs it relies on. So we need to either run our test code in a browser or provide support for those APIs. Here we look at two ways to get the test code running properly.

4.2.1 Providing a Web environment

If we use Mocha as the test framework, we can write the test code in an HTML page, execute it through Mocha's mocha.run method, and open the page in a browser. For Jest, jest-lite likewise supports running Jest in a browser.

However, sometimes we don't want to open a browser and would rather run the tests in a terminal. We can then use a headless browser, such as PhantomJS or Puppeteer, to load a browser environment from Node. Tools such as mocha-phantomjs run the HTML file directly from the terminal to execute the test flow.

Based on an HTML test file, mocha-phantomjs, and PhantomJS, here is the package.json command configuration.

"scripts": {
    "test": "mocha-phantomjs -p ./node_modules/.bin/phantomjs ./test/unit/index.html"
}

PhantomJS is no longer maintained and is not recommended. Puppeteer is recommended instead; it supports the relevant functions and has similar tools.

For example:

We used this approach before in a WebSocket code base. The code relies on the Web API WebSocket: new WebSocket() is needed to complete the test flow, and Node did not provide this API. You can write Mocha test cases in HTML, and if you want to run the tests entirely in a terminal, you can use mocha-phantomjs so that the test HTML file runs in the terminal instead of in a locally opened web page.

Of course, we could have just opened the HTML in a browser to see the test results, and the PhantomJS dependencies are very large and slow to install. But at the time we were using the continuous integration service Travis: whenever code was pushed to the remote repository, Travis started several separate containers and executed our test files in a terminal. Without mocha-phantomjs running the tests in the terminal, the Travis checks could not pass.

4.2.2 Mocking the Web APIs

While improving the tests for the monitoring SDK, I tried another approach: mocking everything.

The Web-environment approach above requires either a browser or a headless browser. But the code we actually need to test is not the Web APIs themselves; we merely use them. We assume they are stable and care only about their inputs and outputs. If they have internal bugs, that is beyond our control and is the browser vendor's business. So all we have to do is simulate the relevant Web APIs in the Node environment.

Take WebSocket as an example. Since WebSocket is not supported in Node, there is no way to call new WebSocket() directly. A third-party library can provide it, though: const WebSocket = require('WebSocket'); With that, we don't have to run the tests in a browser or headless browser environment.

The following is a concrete example of how we simulate the fetch flow of our monitoring SDK in a process test. In general, three things need to be supported:

  1. Start an HTTP server to provide the interface service
  2. Introduce a third-party library so that Node supports fetch
  3. Manually simulate part of the Performance API in Node

First, the normal fetch flow in the SDK. When our SDK is initialized in a business project, it overrides fetch, so that when the project actually uses fetch to request a business interface, the SDK obtains the HTTP request and response information through the overridden logic. In addition, the SDK obtains the performance information of the fetch request through Performance and reports it. The test code we are going to write verifies that this flow works.

(1) HTTP server

Since we are validating the complete fetch flow, we need to start an HTTP server that provides an interface to receive and respond to the fetch request.
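A minimal sketch of such a server using Node's built-in http module. The path /api/getData mirrors the URL used in the test code of this article; the function names and the response bodies are assumptions for illustration.

```javascript
const http = require('http');

// Request handler for the test server: one JSON endpoint, everything else is 404.
function handleRequest(req, res) {
  if (req.method === 'GET' && req.url === '/api/getData') {
    res.writeHead(200, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify({ errcode: 0, data: { ok: true } }));
    return;
  }
  res.writeHead(404, { 'Content-Type': 'application/json' });
  res.end(JSON.stringify({ errcode: 404, errmsg: 'not found' }));
}

// Create the server without listening; the tests decide when to start and stop it.
function createTestServer() {
  return http.createServer(handleRequest);
}

// Usage in a test file (port 0 lets the OS pick a free port):
//   const server = createTestServer();
//   server.listen(0, () => {
//     const { port } = server.address();
//     // run the tests against http://localhost:<port>/api/getData
//   });
//   // afterwards: server.close();
```

Keeping the handler as a separate function makes it possible to check the responses with fake req/res objects, without opening a port at all.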

(2) Mock fetch

To support fetch in a Node environment that lacks it, we can use the third-party library node-fetch directly and define fetch ahead of time, at the top of the execution environment.

/** MockFetch.js */
import fetch from 'node-fetch';

// Define a minimal window object for the SDK and expose fetch on both globals
global.window = {};
window.fetch = fetch;
global.fetch = fetch;

(3) Mock performance

Performance is a special case: no third-party library supports it. For the fetch flow, simulating Performance only requires simulating the PerformanceObserver we use, and even its inputs and return values only need to cover what we need. The following code is an example of PerformanceObserver usage. In the SDK, we use it in much the same way.

/** PerformanceObserver */
var observer = new PerformanceObserver(function (list, obj) {
  var entries = list.getEntriesByType('resource');
  for (var i = 0; i < entries.length; i++) {
    // Process "resource" events
  }
});
observer.observe({ entryTypes: ['resource'] });

Inside the browser, the Performance layer automatically monitors resource requests, and we merely provide a PerformanceObserver to collect its data. In essence, the probe that actively collects this data is implemented inside Performance.

Let's simulate part of PerformanceObserver to support the test flow we need. We define window.PerformanceObserver as a constructor and push the fn passed to it into an array. mockPerformanceEntriesAdd is a method we call manually: after we fire a fetch, we call it once to pass mock data to the registered observer callbacks. This lets an instance of PerformanceObserver receive our mock data, simulating what Performance does inside the browser.

/** MockPerformance.js */
let observerCallbacks = [];

// Simulate the PerformanceObserver constructor: register the callback in the queue
window.PerformanceObserver = function (fn) {
  this.observe = function () {};
  observerCallbacks.push(fn);
};

// Manually trigger the queued callbacks with a simulated resource entry
window.mockPerformanceEntriesAdd = (resource) => {
  observerCallbacks.forEach((cb) => {
    cb({
      getEntriesByType() {
        return [resource];
      },
    });
  });
};

A down-to-earth analogy: a company pays wages into its employees' bank cards on the 10th of each month, and the next day the mortgage payment is deducted from that salary card. What the employee cares about most is that the mortgage is deducted normally, since a missed payment would hurt their credit record. Normally the employee only needs to check that the bank completed the deduction. But suppose the employee has recently lost the job and no company pays into the salary card, so they have to transfer money into the card from their savings so that the bank can still deduct the mortgage. The company is the browser's underlying Performance layer, and the employee transferring their own money is mockPerformanceEntriesAdd: instead of the company paying in, the employee pays in, going from passive to active. Think it over.

mockPerformanceEntriesAdd actively plays the browser's role and feeds in performance information, so we can simply hard-code mockData (see below). Here is the test code:

/** test/fetch.js */
import 'MockFetch.js';
import 'MockPerformance.js';
import webReportSdk from '../dist/monitorSDK';

// Initialize the SDK (it overrides fetch)
const monitor = webReportSdk({
  appId: 'appid_test',
});

const mockData = {
  name: 'http://localhost:xx/api/getData',
  entryType: 'resource',
  startTime: 90427.23999964073,
  duration: 272.06500014290214,
  initiatorType: 'fetch',
  nextHopProtocol: 'h2',
  // ...
};

test('web api: fetch', () => {
  // GET
  const requestAddress = mockData.name;
  fetch(requestAddress, {
    method: 'GET',
  });
  // After sending the request, feed the simulated browser performance data to the observer
  window.mockPerformanceEntriesAdd(mockData);
});

When mockPerformanceEntriesAdd executes, the PerformanceObserver inside the SDK collects the mock performance information. (Note that we also need to start an HTTP server that serves the interface http://localhost:xx/api/getData.)

When the test code above runs, the SDK obtains the request, response, and performance information of the fetch to http://localhost:xx/api/getData, and the SDK then sends its own fetch request to report the collected data to the back-end service. We can override window.fetch once more to intercept the SDK's report request, read the request content, and use it to assert the expected result.

// Override fetch again to intercept the report request and skip the actual upload
const monitorFetch = window.fetch;
let reportData;
window.fetch = function () {
  // The SDK marks the data it reports with a type flag, which we match here
  if (arguments[1] && arguments[1].type === 'reportable-data') {
    // Capture the reported data
    reportData = JSON.parse(arguments[1].body);
    return Promise.resolve();
  }
  return monitorFetch.apply(this, arguments);
};

// Assert on the captured report
expect(reportData.resourceList[0].name).toEqual(mockData.name);

The merged test code

/** test/fetch.js */
import 'MockFetch.js';
import 'MockPerformance.js';
import webReportSdk from '../dist/monitorSDK';

// Initialize the SDK (it overrides fetch)
const monitor = webReportSdk({
  appId: 'appid_test',
});

// Override fetch again to intercept the report request and skip the actual upload
const monitorFetch = window.fetch;
let reportData;
window.fetch = function () {
  // The SDK marks the data it reports with a type flag, which we match here
  if (arguments[1] && arguments[1].type === 'reportable-data') {
    // Capture the reported data
    reportData = JSON.parse(arguments[1].body);
    return Promise.resolve();
  }
  return monitorFetch.apply(this, arguments);
};

const mockData = {
  name: 'xxx.com/api/getData',
  entryType: 'resource',
  startTime: 90427.23999964073,
  duration: 272.06500014290214,
  initiatorType: 'fetch',
  nextHopProtocol: 'h2',
  // ...
};

test('web api: fetch', (done) => {
  // GET
  const requestAddress = mockData.name;
  fetch(requestAddress, {
    method: 'GET',
  });
  // After sending the request, feed the simulated browser performance data to the observer
  window.mockPerformanceEntriesAdd(mockData);
  // Give the SDK time to send its report, then assert on the captured data
  setTimeout(() => {
    expect(reportData.resourceList[0].name).toEqual(mockData.name);
    // more expect...
    done();
  }, 3000);
});

As shown above, this is the main pattern we use for writing the SDK's process tests. With test code in place, stability and controllability during iterative maintenance are largely guaranteed, and a lot of later testing cost is saved.

5. Conclusion

These are the three core aspects of our monitoring SDK. There are many other details and implementation points, such as throttling, report timing, data merging, and initial configuration. During iteration, take care to avoid compatibility problems between the client SDK and the back-end service. It is even more important to consider the later database query and storage requirements: collection, storage, and query together make up a complete front-end monitoring system.

– End –
