Web pages, and SPAs in particular, run in the user’s browser, so developers have no direct way to see the errors, performance data, and other information that are critical to improving page stability and optimizing page performance.

Front-end monitoring is all about collecting this information and sending it to the server. With the logs aggregated and visualized through ELK (Elasticsearch, Logstash, Kibana), developers can observe errors and user behavior on the page, collect performance data, and then fix and improve the page code.

For example, when a user complains that a button on the page does not respond when clicked and the page shows no message, the page may have hit a bug that was not discovered during development or testing. We can then find the corresponding error log from the page address and user information the user reported, analyze the cause of the error, and fix it.

Developing a front-end monitoring system is not very difficult; the work consists mainly of overriding native browser methods and sending logs. From an application perspective, a competent front-end monitoring system should be able to:

  • Locate where a log was triggered and which user triggered it.
  • Collect unhandled JS errors and trace user behavior.
  • Listen for AJAX requests and responses.
  • Calculate performance metrics such as page load time.
  • Report logs without affecting the page’s main functions, and lose as few logs as possible.

This article summarizes the key features of a front-end monitoring system and the points to watch for when developing one.

Positioning

Basic information

To determine which page triggered a log, the monitoring system needs to record some basic information, usually:

  • Application name (to distinguish between different projects)
  • Environment name (to distinguish between environments such as test and production)
  • Web URL
  • Source URL (document.referrer)
  • Time

User ID

So that the logs can be searched when a user reports a problem or files a complaint, the monitoring system can record some information about the user (such as the user ID).
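As a rough illustration, the fields above could be collected by one small helper that every log record passes through. This is only a sketch: `buildBaseInfo` and the field names (`app`, `env`, `userId`, and so on) are hypothetical, and the page object is passed in as a parameter so that the function can also be exercised outside a browser.

```javascript
// Hypothetical helper: builds the positioning fields shared by every log record.
// `config` holds values the application supplies; `page` is any object that
// exposes `location` and `document` the way the browser's `window` does.
function buildBaseInfo(config, page) {
  return {
    app: config.app,                   // application name
    env: config.env,                   // environment name, e.g. "test" or "production"
    userId: config.userId ?? null,     // null for anonymous visitors
    url: page.location.href,           // page URL
    referrer: page.document.referrer,  // source URL
    time: Date.now()                   // timestamp in milliseconds
  }
}

// In a browser this could be called as:
// buildBaseInfo({ app: 'shop', env: 'production', userId: 'u-42' }, window)
```

Merging this object into each error, AJAX, and performance log keeps the search fields consistent across log types.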

Device fingerprinting

Sometimes you encounter strange problems, which are not necessarily page errors, but may be caused by a particular user’s system or browser environment. If you can determine that these errors are all from the same device, then you can use this to decide whether to ignore the error.

A device fingerprint is a code, derived from certain data about the device, that identifies the device stably and uniquely. Device fingerprints can be implemented in various ways, such as Canvas fingerprinting and AudioContext fingerprinting.

See FingerprintJS for an implementation and its components.

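import FingerprintJS from '@fingerprintjs/fingerprintjs'
// load() returns an agent; get() computes the fingerprint for this visitor
const agent = await FingerprintJS.load()
const result = await agent.get()
const fpid = result.visitorId // Device fingerprint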

Errors

Any JS error (logical or syntactic) that is not caught, whether by try...catch or by a promise’s catch, is thrown globally. These global errors can be captured with two events on window, error and unhandledrejection:

window.addEventListener('error', ev => {
    // ev: ErrorEvent | Event
}, true)
window.addEventListener('unhandledrejection', ev => {
    // ev: PromiseRejectionEvent
}, true)

An error can provide roughly the following information:

  • message: error message
  • stack: call stack
  • filename: name of the file in which the error occurred
  • lineNo: line number
  • colNo: column number

Depending on the event type, different information can be obtained:

let error
if (event instanceof PromiseRejectionEvent) {
  // reason can be any value, not necessarily an Error
  const reason = event.reason
  error = {
    type: 'promise',
    message: reason?.message ?? String(reason),
    stack: reason?.stack,
    name: reason?.name
  }
} else if (event instanceof ErrorEvent) {
  error = {
    type: 'error',
    message: event.message,
    stack: event.error?.stack, // event.error is null for cross-origin script errors
    filename: event.filename,
    lineNo: event.lineno, // the position is reported on the event, not on the Error object
    colNo: event.colno,
    name: event.error?.name
  }
} else if (event instanceof Event) {
  // Resource load failure: the target is the element that failed to load
  const target = event.target

  error = {
    type: 'event',
    message: event.type,
    stack: target.outerHTML
  }
} else {
  // Custom error object passed in by application code
  error = {
    type: 'custom',
    message: event.message,
    stack: event.stack,
    name: event.name
  }
}

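When the event is a plain Event, a resource (such as an img, script, or link element) has usually failed to load, and the event carries no message or stack. In that case you can record the element’s outerHTML to identify the element that failed. Resource load errors do not bubble, which is why the listeners are registered in the capture phase (the third argument, true).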

Tracing user behavior

In addition to knowing where in the code the error occurred, we also want to know which user action led to it. The most common action on a page is a mouse click, so we can listen for click events on the document in the capture phase, record the id and className of the clicked element, keep the records in a queue with a maximum length of 10, and send the queue along with the error log.
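The bounded queue described above can be sketched as follows. The names `createClickTrail`, `record`, and `snapshot` are hypothetical and chosen only for illustration; the listener is attached only when a `document` exists.

```javascript
// Hypothetical helper: keeps the most recent clicks in a bounded queue.
function createClickTrail(maxLength = 10) {
  const queue = []
  return {
    // Record the id and className of a clicked element.
    record(target) {
      queue.push({
        id: target.id || '',
        // className is an object rather than a string on SVG elements, so only keep strings
        className: typeof target.className === 'string' ? target.className : '',
        time: Date.now()
      })
      if (queue.length > maxLength) {
        queue.shift() // drop the oldest record
      }
    },
    // Return a copy so the caller can attach it to an error log.
    snapshot() {
      return queue.slice()
    }
  }
}

const clickTrail = createClickTrail(10)

// Only attach the listener when running in a browser.
if (typeof document !== 'undefined') {
  // Capture phase, so clicks are recorded even if a handler stops propagation.
  document.addEventListener('click', ev => clickTrail.record(ev.target), true)
}
```

When an error is reported, `clickTrail.snapshot()` can be added to the error log so the last few clicks travel with it.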

AJAX

Although the back-end server logs the AJAX calls made by the front end, timeouts caused by network problems or by an unresponsive server are sometimes not recorded there. More importantly, the front end needs its own log so that it can integrate and process the information itself.

To listen to AJAX, we need to modify the XMLHttpRequest class, primarily by overriding the open and send methods.

Log infinite loop

Before doing that, however, because the logs themselves are sent using AJAX, we need to keep references to the original open and send methods (the versions that do not generate logs):

const xhrOpen = XMLHttpRequest.prototype.open
const xhrSend = XMLHttpRequest.prototype.send

Otherwise we would end up in an endless cycle:

Send request --> AJAX log --> send AJAX log request --> AJAX log --> send AJAX log request ...

open

The main change to open is to attach a member to the XMLHttpRequest instance that holds information for use after the request ends.

const open = function (method, url, ...rest) {
  this._requestParams = {
    url
  }

  // Forward only the arguments the caller supplied: passing an undefined
  // async argument explicitly would turn the request into a synchronous one
  return xhrOpen.call(this, method, url, ...rest)
}
XMLHttpRequest.prototype.open = open

In the _requestParams member object, the request address is stored.

send

In the send method, we need to record the request body and the start time, and then, when the request times out (timeout event) or completes (readystatechange event), the elapsed time and the response status.

const send = function (data) {
  try {
    if (this._requestParams) {
      const startTime = Date.now()
      let reported = false
      this._requestParams.body = data || ''

      // A timed-out request also reaches readyState DONE, so report only once
      const report = () => {
        if (reported) return
        reported = true
        this._requestParams.time = Date.now() - startTime
        this._requestParams.code = this.status // 0 when the request timed out or failed

        handler(this._requestParams)
      }

      this.addEventListener('timeout', report)

      this.addEventListener('readystatechange', () => {
        if (this.readyState === XMLHttpRequest.DONE) {
          report()
        }
      })
    }
  } catch (er) {
    console.error(er) // eslint-disable-line
  }

  return xhrSend.call(this, data)
}
XMLHttpRequest.prototype.send = send

The handler function passes each log to the sender, which takes care of sending all logs in one place. To keep the logging system from interfering with normal functionality, the logging code is wrapped in try...catch.

Performance

Performance, especially loading performance, is an important indicator for web pages. The faster a page loads the better, but a page that loads quickly in development and test environments can still be slow in production because of the user’s own network and device conditions. Therefore, we need to collect this information, analyze why the page opens slowly for users, and consider improvements, such as whether to add a CDN.

Key time points

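To measure load times, we need to know some key points in time during page loading, such as when the request started and when the resources finished loading. The browser stores this information in window.performance.timing. (The specification now deprecates performance.timing in favor of PerformanceNavigationTiming, which exposes time points with the same names.) The most useful time points are: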

  • fetchStart: the browser starts the request or starts fetching the HTML from the cache
  • domainLookupStart: domain name resolution starts
  • domainLookupEnd: domain name resolution completes
  • connectStart: the TCP connection starts
  • connectEnd: the TCP connection is established
  • requestStart: the browser starts sending the HTTP request
  • responseStart: the first byte of HTML is received from the server or cache
  • responseEnd: the last byte of HTML is received from the server or cache
  • domInteractive: DOM parsing ends and embedded resources start to load
  • domContentLoadedEventEnd: all scripts that must run immediately have finished
  • loadEventStart: all resources are loaded and the load event fires

fetchStart is usually treated as the start of a page visit. From these points in time, we can calculate some important indicators:

  • FPT (First Paint Time): responseEnd - fetchStart
  • TTI (Time to Interactive): domInteractive - fetchStart
  • Ready (HTML load completion time): domContentLoadedEventEnd - fetchStart
  • Load (full page load completion time): loadEventStart - fetchStart
  • DNS (DNS query time): domainLookupEnd - domainLookupStart
  • TCP (TCP connection time): connectEnd - connectStart
  • TTFB (Time to First Byte): responseStart - requestStart
  • Trans (content transfer time): responseEnd - responseStart
  • DOM (DOM parsing time): domInteractive - responseEnd
  • Res (resource load time): loadEventStart - domContentLoadedEventEnd

Based on these indicators, we can analyze why a page loads slowly and optimize each stage specifically.
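The subtractions in the list above can be written as one pure function. This is a minimal sketch: `computeTimingMetrics` and its result keys are hypothetical names, and the sample time points are made up purely to show the arithmetic.

```javascript
// Hypothetical helper: derives the indicators listed above from an object with
// the same fields as window.performance.timing (all values in milliseconds).
function computeTimingMetrics(t) {
  return {
    firstPaint: t.responseEnd - t.fetchStart,          // first paint time
    tti: t.domInteractive - t.fetchStart,              // time to interactive
    ready: t.domContentLoadedEventEnd - t.fetchStart,  // HTML load completion time
    load: t.loadEventStart - t.fetchStart,             // full page load completion time
    dns: t.domainLookupEnd - t.domainLookupStart,      // DNS query time
    tcp: t.connectEnd - t.connectStart,                // TCP connection time
    ttfb: t.responseStart - t.requestStart,            // time to first byte
    trans: t.responseEnd - t.responseStart,            // content transfer time
    dom: t.domInteractive - t.responseEnd,             // DOM parsing time
    res: t.loadEventStart - t.domContentLoadedEventEnd // resource load time
  }
}

// Made-up time points (milliseconds since an arbitrary origin).
const sampleTiming = {
  fetchStart: 0, domainLookupStart: 5, domainLookupEnd: 25,
  connectStart: 25, connectEnd: 60, requestStart: 60,
  responseStart: 160, responseEnd: 200, domInteractive: 500,
  domContentLoadedEventEnd: 650, loadEventStart: 900
}
console.log(computeTimingMetrics(sampleTiming))
```

In a browser, the same function could be called with `window.performance.timing` once the load event has fired.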

Resource load time

JS and CSS are key resources of a page, without which the functionality and style of the page may be incomplete or unusable. Also, their load time affects the first screen time. So we need to collect the load time of the resource.

Fortunately, the browser provides an interface for this: the performance.getEntries function returns an array containing the names of all resources together with their loading times. We pick out the JS and CSS entries, record their number and total time, and calculate the average time.

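let cssTime = 0
let cssCount = 0
let jsTime = 0
let jsCount = 0

const entries = window.performance.getEntries()
for (const entry of entries) {
  // entry.name is the resource URL; allow a query string or hash after the extension
  if (/\.css(?:[?#]|$)/i.test(entry.name)) {
    cssTime += entry.duration
    cssCount++
  } else if (/\.js(?:[?#]|$)/i.test(entry.name)) {
    jsTime += entry.duration
    jsCount++
  }
}

// Guard against division by zero when no matching resource was loaded
const css = cssCount ? Math.round(cssTime / cssCount) : 0
const js = jsCount ? Math.round(jsTime / jsCount) : 0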

Web Vitals

In addition to the standard time points defined by the W3C, Google promotes a set of performance metrics for describing the page experience, called Web Vitals. The three most commonly used metrics (also used by Chrome Lighthouse) are:

  • LCP (Largest Contentful Paint): Measures load performance and indicates how long it takes for the largest block of content to appear on the page. It should be within 2.5 seconds.
  • CLS (Cumulative Layout Shift): Measures visual stability and indicates how much and how often page content shifts during loading. It should be less than 0.1.
  • FID (First Input Delay): Measures interactivity and represents the delay between the user’s first action and the page’s response. It should be within 100 milliseconds.
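As a small illustration, the three targets in the list above can be encoded as a lookup table and checked against measured values. `VITAL_LIMITS` and `meetsTarget` are hypothetical names; treating a value exactly at the limit as passing is an assumption made here for simplicity.

```javascript
// Targets taken from the list above. LCP and FID are in milliseconds;
// CLS is a unitless score.
const VITAL_LIMITS = { LCP: 2500, CLS: 0.1, FID: 100 }

// Hypothetical helper: returns true when a measured value meets its target.
// Assumption: a value exactly at the limit counts as meeting the target.
function meetsTarget(name, value) {
  if (!Object.prototype.hasOwnProperty.call(VITAL_LIMITS, name)) {
    throw new Error('Unknown metric: ' + name)
  }
  return value <= VITAL_LIMITS[name]
}
```

A monitoring dashboard could use such a check to flag page views whose metrics miss the targets.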

There is no direct way to read these three indicators because they require complex calculations. An npm package, web-vitals, is available to calculate them for us.

See the web-vitals documentation for details.

It is important to note that these three metrics cannot be obtained synchronously. LCP and CLS are not available in server-side rendered pages, and FID can only be obtained once the user interacts with the page for the first time (for example, by clicking an input box).

When to record

A field in performance.timing is 0 until the corresponding time point has been reached, so to obtain accurate values we need to wait until the page has finished loading (the window load event) before calculating these indicators and uploading the log.

But in some bad cases, the user’s device takes so long to request resources that the load event never fires and the page stays blank. The user will probably just close the page, and the performance log for that visit will not be sent. Unfortunately, logs with long load times are the more valuable ones, because they reveal more problems with the page or the server.

Therefore, we need to find ways to send even incomplete logs whenever possible. There are roughly two approaches:

  1. Use sendBeacon to send the logs in the beforeunload or unload event. (Described in the following section.)
  2. Save the logs to localStorage and send them the next time the page is opened. (For browsers that do not support sendBeacon.)
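The second approach can be sketched as a pair of helpers. This is only an illustration under stated assumptions: `STASH_KEY`, `stashLogs`, and `flushStashedLogs` are hypothetical names, and the storage object is passed in so that anything with the getItem/setItem/removeItem interface of window.localStorage will work.

```javascript
const STASH_KEY = '__monitor_pending_logs__' // hypothetical storage key

// Append unsent logs to storage. `storage` is anything with the
// getItem/setItem/removeItem interface of window.localStorage.
function stashLogs(storage, logs) {
  try {
    const pending = JSON.parse(storage.getItem(STASH_KEY) || '[]')
    storage.setItem(STASH_KEY, JSON.stringify(pending.concat(logs)))
  } catch (er) {
    // Storage may be full, disabled, or hold corrupted data; never break the page.
  }
}

// On the next page start, hand the stashed logs to `send` and clear the stash.
// Returns the number of logs that were handed over.
function flushStashedLogs(storage, send) {
  try {
    const pending = JSON.parse(storage.getItem(STASH_KEY) || '[]')
    storage.removeItem(STASH_KEY)
    if (pending.length) {
      send(pending)
    }
    return pending.length
  } catch (er) {
    return 0
  }
}
```

In a browser, `stashLogs(window.localStorage, queue)` could be called when the page is closing, and `flushStashedLogs(window.localStorage, logs => ...)` when the monitoring script starts.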

Sending

Logs can be sent either by requesting a dummy resource (such as an image) from the server or by AJAX. Note that because we have overridden the open and send methods of XMLHttpRequest, sending logs with the overridden methods would cause an infinite loop, so we need to use the original methods.

To minimize the impact of log sending on the page’s functionality, we can use the requestIdleCallback API to send logs when the browser is idle. It works like this: if a frame (typically about 16 ms) has time left after rendering work is complete, the browser runs the supplied callback. This means that a log is not sent immediately; it is stored in a queue to wait, and the queue is emptied once the logs have been sent.

Therefore, sending logs is divided into three steps:

  1. Add the log to the queue
  2. Wait for idle time
  3. Send the logs

Written as a class, it looks something like this:

const timeout = 5000
const xhrOpen = XMLHttpRequest.prototype.open
const xhrSend = XMLHttpRequest.prototype.send

/** The sender: queues logs and sends them when the browser is idle */
class Sender {
  /**
   * Constructor
   * @param {string} url Address the logs are sent to
   */
  constructor(url) {
    this.url = url
    this.queue = []
    this.callbackId = 0

    // Handled in unload because other processing needs to run first in beforeunload
    window.addEventListener('unload', this.clear.bind(this))
  }

  /** Wait for idle time */
  wait() {
    if (this.callbackId) {
      window.cancelIdleCallback?.(this.callbackId)
      this.callbackId = 0
    }

    if (this.queue.length) {
      if ('requestIdleCallback' in window) {
        this.callbackId = window.requestIdleCallback(
          () => {
            this.callbackId = 0
            this.send()
          },
          { timeout } // Force sending once the timeout elapses
        )
      } else {
        this.send() // If requestIdleCallback is not supported, send immediately
      }
    }
  }

  /** Send the queued logs */
  send() {
    try {
      const xhr = new XMLHttpRequest()
      // Use the original open and send so that this request does not generate a log itself
      xhrOpen.call(xhr, 'POST', this.url, true)
      xhr.timeout = timeout
      xhrSend.call(xhr, JSON.stringify(this.queue))
    } catch (er) {
      console.error(er) // eslint-disable-line
    } finally {
      this.queue = []
    }
  }

  /** Send the remaining logs when the page is closing */
  clear() {
    if (this.queue.length) {
      const data = JSON.stringify(this.queue)

      if ('sendBeacon' in window.navigator) {
        window.navigator.sendBeacon(this.url, data)
        this.queue = []
      } else {
        this.send()
      }
    }
  }

  /**
   * Add a message to the queue
   * @param {object} message Log message
   */
  add(message) {
    this.queue.push(message)

    this.wait()
  }
}

When used, we simply call sender.add to add the log, and the sender will send it when appropriate.

Remaining logs

Notice that the Sender class has a clear method, which sends as many of the remaining logs as possible if the user closes the page while some logs are still unsent. It sends them with sendBeacon. An asynchronous XMLHttpRequest is cancelled when the page unloads; sendBeacon also sends asynchronously, but it hands the task over to the browser, so the request is still completed even after the page has unloaded, which improves the success rate of log sending.
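One detail worth handling is that sendBeacon returns false when the browser could not queue the data. A minimal sketch of a flush that takes this into account is shown below; `sendRemaining` is a hypothetical helper, `nav` stands in for window.navigator, and `fallback` is whatever alternative the application chooses (for example, saving the payload to localStorage).

```javascript
// Hypothetical helper: try sendBeacon first, then fall back to `fallback`.
// `nav` is an object with the same interface as window.navigator.
// Returns a string describing which path was taken.
function sendRemaining(nav, url, logs, fallback) {
  if (!logs.length) {
    return 'empty'
  }
  const data = JSON.stringify(logs)
  // sendBeacon returns false when the browser could not queue the data,
  // for example because the payload is too large.
  if (typeof nav.sendBeacon === 'function' && nav.sendBeacon(url, data)) {
    return 'beacon'
  }
  fallback(data)
  return 'fallback'
}
```

In a browser this could be called from an unload handler as `sendRemaining(window.navigator, url, queue, data => ...)`.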

References

  1. PerformanceTiming
  2. Alibaba Cloud Application Real-Time Monitoring Service (ARMS) monitoring metrics
  3. Web Vitals
  4. requestIdleCallback
  5. sendBeacon