Regarding the website performance work my project team is currently optimizing, I was confused at first. How do you decide that a mobile web page is excellent? Is it fully loading within 200 ms, or even within 1 s? Or do you rely on all the classic metrics such as TTI and FCP?
Guided by my team lead, studying the most popular solutions Google has published became one of my recent goals. Below are the web performance metrics that emerged in 2020. They can also help you answer performance questions in an interview.
Core Web Vitals
Core Web Vitals is a set of field metrics that measure important aspects of the real user experience on the web. Core Web Vitals includes metrics as well as target thresholds for each metric, which help developers qualitatively understand whether the experience of their site is "good," "needs improvement," or "poor." This article explains the approach used to choose thresholds for Core Web Vitals metrics generally, as well as how the thresholds for each specific Core Web Vitals metric were chosen.
Core Web Vitals metrics and thresholds
In other words: the site's key performance indicators and their thresholds.

In 2020, there are three key performance indicators: Largest Contentful Paint (LCP), First Input Delay (FID), and Cumulative Layout Shift (CLS).
Each metric measures the user experience in a different way:
LCP (Largest Contentful Paint)

LCP measures perceived load speed and marks the point in the page load timeline when the page's main content has likely loaded. (To me it feels a little like FCP.)
FID (First Input Delay)

FID measures responsiveness and quantifies the experience users feel when trying to interact with the page for the first time. (To me it feels a bit like TTI.)
CLS (Cumulative Layout Shift)

CLS measures visual stability and quantifies the amount of unexpected layout shift of visible page content.
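As a sketch of how these three metrics are collected in the browser: each is exposed through `PerformanceObserver` entry types (`largest-contentful-paint`, `first-input`, `layout-shift`). The small helper functions below are my own illustrative names, not part of any library:

```javascript
// LCP: the latest 'largest-contentful-paint' entry wins; prefer renderTime,
// falling back to loadTime/startTime for entries without render timing.
function latestLcp(entries) {
  const last = entries[entries.length - 1];
  return last ? (last.renderTime || last.loadTime || last.startTime) : undefined;
}

// FID: delay between the user's first input and the start of its event handler.
function fidOf(entry) {
  return entry.processingStart - entry.startTime;
}

// CLS: sum of layout-shift scores that were not triggered by recent user input.
function clsOf(entries) {
  return entries
    .filter((e) => !e.hadRecentInput)
    .reduce((sum, e) => sum + e.value, 0);
}

// In a browser, the helpers would be fed by observers, e.g.:
if (typeof window !== 'undefined' && 'PerformanceObserver' in window) {
  new PerformanceObserver((list) => {
    console.log('LCP:', latestLcp(list.getEntries()));
  }).observe({ type: 'largest-contentful-paint', buffered: true });
}
```

In a real project I would reach for Google's `web-vitals` npm library, which wraps exactly this kind of wiring plus the edge cases this sketch ignores.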
Each Core Web Vitals metric has associated thresholds that classify performance as "good," "needs improvement," or "poor":
Percentiles are the values commonly used in performance monitoring, e.g., the 75th, 85th, or 95th percentile.
In addition, to classify the overall performance of a page or site, we use the 75th percentile of all page views to that page or site.
In other words, a site would be classified as having “good” performance if at least 75% of page views reached the “good” threshold. Conversely, if at least 25% of page views meet the “poor” threshold, the site is classified as having “poor” performance.
Thus, for example, a 75th-percentile LCP of 2 seconds is classified as "good," while a 75th-percentile LCP of 5 seconds is classified as "poor."
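A minimal sketch of this classification rule in JavaScript (the nearest-rank percentile and the function names are my own; the example thresholds are LCP's 2.5 s / 4 s from later in this article):

```javascript
// Nearest-rank percentile: the smallest value such that at least p% of
// samples are less than or equal to it.
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const idx = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, idx)];
}

// Classify a page/site from its 75th-percentile value and a metric's thresholds.
function classify(p75, good, poor) {
  if (p75 <= good) return 'good';
  if (p75 <= poor) return 'needs improvement';
  return 'poor';
}

// Example: per-visit LCP samples in ms, against LCP's 2500/4000 thresholds.
const lcpSamples = [1800, 2200, 2400, 6000];
classify(percentile(lcpSamples, 75), 2500, 4000); // 'good': 3 of 4 visits <= 2400 ms
```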
Criteria for the Core Web Vitals Metric Thresholds
When establishing thresholds for the Core Web Vitals metrics, we first identified the criteria each threshold must meet. Below, I explain the criteria used at Google to evaluate the 2020 Core Web Vitals metric thresholds. The sections that follow describe in more detail how these criteria were applied to select the thresholds for each metric in 2020.
Over the next few years, we expect to improve and extend these criteria and thresholds to further improve our ability to measure user experience.
High-quality user experience
Our main goal is to optimize for the user and the quality of their experience. With this in mind, we aim to ensure that pages that meet the “good” threshold of Core Web Vitals provide a high-quality user experience.
To determine thresholds associated with a high-quality user experience, we looked to research on human perception and HCI. Although a single fixed threshold is sometimes cited to summarize this research, we find that the underlying research is usually expressed as a range of values.
For example, research on how long a user will typically wait before losing focus is sometimes cited as one second, while the underlying research is actually expressed as a range, from a few hundred milliseconds to a few seconds.
The fact that perception thresholds vary depending on user and context is further supported by aggregated and anonymized Chrome metrics data, which shows that there is not a single amount of time users wait for a web page to display content before aborting the page load. Rather, this data shows a smooth and continuous distribution. For a more in depth look at human perception thresholds and relevant HCI research, see The Science Behind Web Vitals.
Where suitable user-experience research results are available, we use them to guide our threshold selection. Where such research is not available, as with a newer metric like CLS (Cumulative Layout Shift), we instead evaluate real pages that meet different candidate thresholds for the metric, to identify the thresholds that deliver a good user experience.
Achievability
In addition, to ensure that site owners can realistically optimize their sites to meet the "good" thresholds, we require that those thresholds be achievable. For example, while 0 ms would be the ideal "good" LCP threshold, representing an instantaneous loading experience, a 0 ms threshold is unattainable in most cases due to network and device processing delays. Therefore, 0 ms is not a reasonable "good" LCP threshold for Core Web Vitals.
When evaluating candidate Core Web Vitals "good" thresholds, we verify that those thresholds are achievable based on data in the Chrome User Experience Report (CrUX). To confirm that a threshold is achievable, we require that at least 10% of origins currently meet the "good" threshold. In addition, to ensure that well-optimized sites are not misclassified because of variability in field data, we verify that well-optimized content consistently meets the "good" threshold.
Conversely, we determine the "poor" threshold by identifying a level of performance that only a minority of origins fail to meet. Unless there is research relevant to defining a "poor" threshold, by default the worst-performing 10-30% of origins are classified as "poor".
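The two checks above can be sketched as simple predicates over per-origin 75th-percentile data (the function names are my own illustration, not a CrUX API):

```javascript
// Share of origins whose 75th-percentile value meets a candidate "good" threshold.
function fractionAchieving(originP75s, goodThreshold) {
  return originP75s.filter((v) => v <= goodThreshold).length / originP75s.length;
}

// A candidate "good" threshold is considered achievable if >= 10% of origins meet it.
const isAchievableGood = (originP75s, t) => fractionAchieving(originP75s, t) >= 0.1;

// A candidate "poor" threshold is acceptable if the worst-performing
// 10-30% of origins fall beyond it.
function isAcceptablePoor(originP75s, t) {
  const worse = originP75s.filter((v) => v > t).length / originP75s.length;
  return worse >= 0.1 && worse <= 0.3;
}

// Example with made-up per-origin p75 LCP values in ms:
const origins = [1000, 2000, 3000, 3500, 4500, 5000, 6000, 2600, 3900, 4100];
isAchievableGood(origins, 2500); // true: 20% of origins meet 2500 ms
isAcceptablePoor(origins, 4600); // true: 20% of origins exceed 4600 ms
```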
The choice of percentiles
As mentioned earlier, to classify the overall performance of a page or site, we use the 75th percentile of all visits to that page or site. The 75th percentile was chosen based on two criteria. First, the percentile should ensure that a majority of visits to the page or site experienced the target level of performance. Second, the value at the selected percentile should not be overly affected by outliers.
These goals are somewhat at odds with each other. To meet the first goal, a higher percentile is usually the better choice. However, the higher the percentile, the more likely the resulting value is to be affected by outliers. If a few visits to a site happen to produce large outlier LCP samples over a flaky network connection, we do not want those outlier samples to determine the site's classification. For example, if we evaluated a site with 100 visits using a high percentile such as the 95th, it would take only 5 outlier samples for the 95th-percentile value to be affected by the outliers.
Given that these goals are somewhat in conflict, our analysis concluded that the 75th percentile strikes a reasonable balance. Using the 75th percentile, we know that most visits to the site (3 of 4) experienced the target level of performance or better. In addition, the 75th-percentile value is less likely to be affected by outliers. Returning to our example: for a site with 100 visits, 25 of those visits would have to report large outlier samples for the 75th-percentile value to be affected by the outliers. While 25 of 100 samples being outliers is possible, it is far less likely than in the 95th-percentile case.
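The 100-visit example can be made concrete with synthetic numbers (my own illustration; note that under this nearest-rank percentile definition, the 95th percentile tips over as soon as more than 5 of 100 samples are outliers):

```javascript
// Nearest-rank percentile: smallest value with at least p% of samples <= it.
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  return sorted[Math.max(0, Math.ceil((p / 100) * sorted.length) - 1)];
}

// 100 visits: 94 normal LCP samples around 2000 ms, plus 6 outliers from a
// flaky connection. (6 rather than 5, because with nearest-rank exactly 5 of
// 100 sits right on the 95th-percentile boundary.)
const normal = Array.from({ length: 94 }, () => 2000);
const outliers = [30000, 31000, 32000, 33000, 34000, 35000];
const visits = normal.concat(outliers);

percentile(visits, 75); // 2000 - untouched by the outliers
percentile(visits, 95); // 30000 - dragged up by just 6 bad samples
```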
Largest Contentful Paint (LCP)
Quality of experience
One second is often cited as the amount of time a user will wait before beginning to lose focus on a task. On closer inspection of the relevant research, we found that one second is an approximation describing a range of values, from roughly a few hundred milliseconds to a few seconds.
Card et al. and Miller are the two sources most commonly cited for the 1-second threshold. Card references Newell's Unified Theories of Cognition in defining an "immediate response" threshold of 1 second. Newell describes immediate responses as having "to respond to some stimulus within about one second (i.e., from about 0.3 to about 3 seconds)." This follows Newell's discussion of "real-time constraints on cognition," in which "interactions with the environment that give rise to cognitive consideration occur on the order of a few seconds," ranging from about 0.5 to 2-3 seconds.
Miller is another often-cited source for the one-second threshold, noting that human interaction with a computer changes character significantly if response delays are greater than two seconds, with perhaps an extension to three seconds.
Miller's and Card's research puts the range of time a user will wait before losing focus at roughly 0.3 to 3 seconds, suggesting that our LCP "good" threshold should fall within this range. Also, given that the existing First Contentful Paint (FCP) metric already has a "good" threshold of 1 second, and that Largest Contentful Paint (LCP) usually occurs after FCP, we further constrained the candidate LCP thresholds to the range of 1 to 3 seconds. To select the threshold within this range that best meets our criteria, let's look at the achievability of these candidate thresholds.
Achievability
Using CrUX data, we can determine the percentage of origins on the web that meet each candidate LCP "good" threshold.
While fewer than 10% of origins met a threshold below 1 second, all the other thresholds between 1.5 and 3 seconds satisfied our requirement that at least 10% of origins meet the "good" threshold, so they remained valid candidates.
In addition, to ensure that the selected threshold is consistently achievable for well-optimized sites, we analyzed the LCP performance of the best-performing sites on the web to determine which thresholds those sites can consistently achieve at the 75th percentile. We found that the 1.5 and 2 second thresholds were not consistently achievable, while a 2.5 second threshold was.
To determine the "poor" threshold for LCP, we used CrUX data to identify a threshold that only a minority of origins fail to meet:
For a 4-second threshold, approximately 26% of mobile origins and 21% of desktop origins would be classified as "poor." This falls within our target range of 10-30%, so we conclude that 4 seconds is an acceptable "poor" threshold.
Therefore, we conclude that 2.5 seconds is a reasonable “good” threshold for LCP and 4 seconds is a reasonable “poor” threshold.
First Input Delay
High-quality user experience
Research is reasonably consistent in concluding that delays in visual feedback of up to about 100 ms are perceived as being caused by an associated source, such as a user input. This suggests that a 100 ms FID "good" threshold is likely a minimum bar: if the delay in processing an input exceeds 100 ms, there is no chance for the other processing and rendering steps to complete in time.
Jakob Nielsen's oft-cited "Response Times: 3 Important Limits" defines 0.1 second as the limit for users to feel that a system is reacting instantaneously. Nielsen cites Miller and Card, who in turn cite Michotte's 1962 work on the perception of causality. In Michotte's research, participants were shown "two objects on a screen. Object A starts and moves towards B. It stops at the moment it comes into contact with B, while the latter then starts and moves away from A." Michotte varied the interval between the moment object A stops and the moment object B starts moving. Michotte found that, for delays of up to about 100 ms, participants had the impression that object A caused object B's motion. For delays of roughly 100 ms to 200 ms, the perception of causality was mixed, and for delays over 200 ms, object B's motion was no longer perceived as being caused by object A.
Similarly, Miller defined the response threshold for "responses to control activation" as "the indication of action given, ordinarily, by the movement of a key, switch, or other control member that signals it has been physically activated. This response should be … no more than 0.1 second", and later "the delay between depressing a key and the visual feedback should be no more than 0.1 to 0.2 seconds". More recently, in Towards the Temporally Perfect Virtual Button, Kaaresoja et al. studied the perceived simultaneity between touching a virtual button on a touch screen and the subsequent visual feedback indicating that the button was touched. When the delay between button press and visual feedback was 85 ms or less, participants reported that the visual feedback appeared at the same time as the press 75% of the time. In addition, for delays of 100 ms or less, participants reported a consistently high perceived quality of the button press; perceived quality fell for delays of 100 ms to 150 ms, and dropped to very low levels at 300 ms.
So we take 100 ms as the FID "good" threshold.
Cumulative Layout Shift
Quality of Experience
Cumulative Layout Shift (CLS) is a new metric that measures how much the visible content of a page moves around. Since CLS is new, we are not aware of research that directly informs the thresholds for this metric.
Therefore, to determine a threshold consistent with user expectations, we evaluated real pages with different amounts of layout shift, to determine the maximum shift a user will tolerate before consuming the page's content is significantly disrupted. In our internal testing, we found that shifts of 0.15 and above were consistently perceived as disruptive, while shifts of 0.1 and below were noticeable but not excessively disruptive. Thus, while zero layout shift is the ideal, we conclude that values up to 0.1 are candidate "good" CLS thresholds.
Achievability
While the CrUX data suggest that 0.05 might be a reasonable CLS "good" threshold, we recognize that in some use cases it is currently difficult to avoid disruptive layout shifts. For example, with third-party embedded content (such as social-media embeds), the height of the embed is sometimes not known until it finishes loading, which can lead to a layout shift greater than 0.05. We therefore conclude that while many origins meet the 0.05 threshold, the slightly more lenient CLS threshold of 0.1 strikes a better balance between experience quality and achievability. We hope the web ecosystem will continue to find solutions to the layout shifts caused by third-party embeds, which would allow a stricter CLS "good" threshold of 0.05 or 0 to be used in future iterations of Core Web Vitals.
In addition, to determine the "poor" threshold for CLS, we used CrUX data to identify a threshold that only a minority of origins fail to meet:
For a 0.25 threshold, about 20% of mobile origins and 18% of desktop origins would be classified as "poor." This falls within our target range of 10-30%, so we conclude that 0.25 is an acceptable "poor" threshold.
Original article
Some thoughts of my own
For the work I am currently doing, LCP and CLS can be added as performance indicators of the current project at this stage:
LCP (Largest Contentful Paint)
CLS (Cumulative Layout Shift)
At present, the project's focus is the page's loading speed; the post-interaction response time measured by FID is an indicator to pay attention to in the future.
LCP
For LCP, 2.5 seconds is a reasonable "good" threshold and 4 seconds a reasonable "poor" threshold; 2.5-4 s is the "needs improvement" range.
CLS
For CLS, values up to 0.1 are candidate "good" thresholds, 0.25 is an acceptable "poor" threshold, and 0.1-0.25 is the "needs improvement" range.
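As a monitoring sketch for the project, the two summaries above could be encoded directly (the thresholds come from this article; the object and function names are my own):

```javascript
// Thresholds from the article: LCP in ms, CLS unitless.
const THRESHOLDS = {
  lcp: { good: 2500, poor: 4000 },
  cls: { good: 0.1, poor: 0.25 },
};

// Map a metric's 75th-percentile value to its rating.
function rate(metric, p75) {
  const t = THRESHOLDS[metric];
  if (p75 <= t.good) return 'good';
  if (p75 <= t.poor) return 'needs improvement';
  return 'poor';
}

rate('lcp', 2000); // 'good'
rate('lcp', 3200); // 'needs improvement'
rate('cls', 0.3);  // 'poor'
```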
Remaining issues:
How are LCP and CLS calculated and collected? Where do we stand right now? How do we speed things up? And what LCP and CLS improvements can we achieve at the lowest cost today?