• Building a Shop with sub-second Page Loads: Lessons Learned
  • Erik Witt
  • The Nuggets translation Project
  • Translator: luoyaqifei
  • Proofreader: Romeo0906, L9m

Here’s what we learned from leveraging our research on web caching and NoSQL systems to create an online mall that could accommodate hundreds of thousands of visitors attracted by a TV campaign.

TV shows such as “Shark Tank” (US), “Dragons’ Den” (UK) or “Die Höhle der Löwen” (Germany) give young startups a chance to pitch their products to business tycoons in front of a large audience. However, the main benefit is often not the strategic investment offered by the jury, but the few minutes of attention during the TV broadcast: even a short appearance can bring hundreds of thousands of new users to the website and raise the baseline of active users for weeks, months, or even permanently. That is, if the site can absorb the initial load spike and does not reject user requests…

Availability is not enough — latency is key!

Online shops are under particular pressure to monetize, because they are not pure entertainment ventures (such as blogs) but usually have to turn a profit, especially when the founders have taken on substantial investment. Obviously, the worst-case scenario for such a business is an overloaded site, where the server has to drop user requests or even crashes completely. This is not as rare as you might think: during one DHDL season, about half of the featured online stores went down during the live broadcast. And staying online is only half the battle, because user satisfaction is tightly coupled to conversion rates, which translate directly into revenue.


There is plenty of research backing this up: page load times have a measurable impact on customer satisfaction and conversion rates. The Aberdeen Group, for example, found that an extra second of delay results in an 11% reduction in page views and a 7% loss in conversions. But you can also ask Google or Amazon and they will tell you the same thing.

How to speed up your website

The startup Thinks pitched on DHDL in the episode aired on September 6, and we took on the challenge of building its online shop to handle hundreds of thousands of visitors with stable load times of under one second. Here is what we learned along the way and from recent performance research on databases and networks.

There are three main factors that determine page load time in today's web applications, as shown below:

  1. Back-end processing: The web server needs time to load data from the database and assemble the page.
  2. Network latency: Every request takes time to travel from the client to the server and back (request latency). This matters even more when you consider that the average website needs more than 100 requests to load completely.
  3. Front-end processing: The browser needs time to render the page.

To speed up our online shop, let's tackle these three bottlenecks one by one.

Front-end performance

The most important factor for front-end performance is the Critical Rendering Path (CRP), which describes the five steps the browser has to perform to display a page to the user, as shown below.

The steps of the critical rendering path:

  • DOM: When a browser parses HTML, it incrementally generates a tree model of HTML tags, called the Document Object Model (DOM), that describes the content of the page.
  • CSSOM: Once the browser has received all the CSS, it builds a tree model of the tags and classes referenced in the CSS, called the CSS Object Model (CSSOM), with the style information attached to its nodes. This tree describes how the page content should be styled.
  • Render tree: By combining DOM and CSSOM, the browser constructs a render tree that contains the page content and the style information to be applied.
  • Layout: The layout step calculates the actual location and size of the page content on the screen.
  • Paint: The final step is to paint the actual pixels onto the screen, using the layout information.

The individual steps are fairly simple; what makes things difficult and limits performance are the dependencies between them. Constructing the DOM and CSSOM typically has the greatest performance impact.

The diagram below shows the steps of the critical rendering path, with the wait dependencies indicated by the arrows.

Wait dependencies in the critical rendering path

Nothing is displayed to the user until the CSS has been loaded and the complete CSSOM has been constructed. This is why CSS is said to be render-blocking.

JavaScript (JS) is even worse, because it can access and change both the DOM and the CSSOM. This means that as soon as a script tag is found in the HTML, DOM construction is paused and the script is requested from the server. Once the script has loaded, it cannot execute until all CSS has been fetched and the CSSOM has been built. After the CSSOM is ready the script runs and, as in the example below, may access and change the DOM as well as the CSSOM. Only then can DOM construction continue and the page be displayed to the user. This is why JavaScript is said to be parser-blocking.

An example of JavaScript accessing the CSSOM and changing the DOM:


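A minimal, self-contained sketch of such a script (the selector and the inserted message are purely illustrative):

```js
// Run once the HTML has been parsed, so the element we query exists.
document.addEventListener('DOMContentLoaded', function () {
  var headline = document.querySelector('h1');

  // Reading a computed style: this requires the CSSOM to be fully built.
  var color = window.getComputedStyle(headline).color;

  // Changing the DOM based on that style information.
  var note = document.createElement('p');
  note.textContent = 'The headline is rendered in ' + color;
  document.body.appendChild(note);
});
```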

JS can make things even worse. A jQuery plugin, for example, may read the computed layout information of HTML elements and then change the CSSOM again and again until the desired layout is achieved. As a result, the browser has to repeatedly execute JS, construct the render tree, and perform layout before the user sees anything but a white screen.

There are three basic concepts for optimizing CRP:

  1. Reduce critical resources: Critical resources are the resources (HTML, CSS, JS files) required for the initial rendering of the page. You can greatly reduce them by inlining the CSS and JS needed to render the part of the page that is visible without scrolling (known as above the fold). The remaining JS and CSS should be loaded asynchronously (see the sketch after this list); files that cannot be loaded asynchronously can be concatenated into a single file.
  2. Minimize bytes: By minifying and compressing CSS, JS, and images, you can greatly reduce the number of bytes loaded on the CRP.
  3. Shorten the CRP length: The CRP length is the maximum number of consecutive round trips to the server required to fetch all critical resources. It can be shortened by reducing the number of critical resources and minimizing their size (large files need multiple round trips to fetch). Placing CSS at the top of the HTML and JS at the bottom further reduces the length, because JS execution blocks fetching CSS and constructing the CSSOM and DOM.
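
One common way to take non-critical CSS off the critical path is to inject the stylesheet from JavaScript after the page has loaded. This is a generic sketch (the file path is a made-up example), not the exact build output used for the shop:

```js
// Append a stylesheet after the load event so it never blocks first render.
function loadCss(href) {
  var link = document.createElement('link');
  link.rel = 'stylesheet';
  link.href = href;
  document.head.appendChild(link);
}

window.addEventListener('load', function () {
  // Hypothetical path for the styles of below-the-fold content.
  loadCss('/css/below-the-fold.css');
});
```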

In addition, browser caching is very effective and should be used in every project. It helps with all three optimizations, because cached resources do not have to be loaded from the server at all.

The whole topic of CRP optimization is quite complex, especially because inlining, concatenation, and asynchronous loading can break the reusability of your code. Fortunately, there are powerful tools that do these optimizations for you and that can be integrated into your build and deployment chain. You really should check out the following tools…

  • Analysis: GTmetrix measures page speed, WebPagetest analyzes your resources, and Google PageSpeed Insights generates tips on how to optimize the CRP of your site.
  • Inlining and optimization: Critical (github.com/addyosmani/…) is perfect for automatically inlining your above-the-fold CSS and loading the rest of your CSS asynchronously, processhtml concatenates your resources, and PostCSS further optimizes your CSS.
  • Minification and compression: We used TinyPNG for image compression, UglifyJS and cssmin for minification, and Google Closure for JS optimization.

With these tools you can build a very fast front end with little effort. Here is the page speed test for a first-time visit to the store:

Thinks.com’s Google PageSpeed score

Interestingly, the only complaint PageSpeed Insights had was that the cache lifetime of the Google Analytics script is too short. So Google is basically complaining about itself.

First page load from Canada (GTmetrix), server hosted in Frankfurt

Network performance

Network latency is the most important factor for page load time and also the hardest to optimize. But before we start optimizing, let's break down the initial browser request:

When we type www.thinks.com/ into the browser and press Enter, the browser starts with a DNS lookup to resolve the IP address behind the domain; this has to be done for every distinct domain.

With the received IP address, the browser then opens a TCP connection to the server. The TCP handshake requires two round trips (one with TCP Fast Open). Over a secure SSL connection, the TLS handshake requires two additional round trips (one with TLS False Start or Session Resumption).

After the initial connection, the browser sends the actual request and waits for the data to arrive. The time until the first byte arrives depends heavily on the distance between client and server and on the time the server needs to render the page (including session lookups, database queries, template rendering, and so on).

The final step is to download the resource (here the HTML), which may take multiple round trips. New connections in particular often need many round trips, because the initial congestion window is small. This means TCP does not start at full bandwidth but increases it over time (see TCP congestion control). Download speed is governed by the slow-start algorithm, which doubles the number of segments in the congestion window on every round trip until a packet is lost. Packet loss on mobile and Wi-Fi networks therefore has a considerable performance impact.
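
To get a feeling for the effect of slow start, here is a rough back-of-the-envelope estimate (assuming an initial congestion window of 10 segments, an MSS of 1,460 bytes, and no packet loss; real connections will differ):

```js
// Estimate the round trips needed to download `bytes` on a fresh connection
// under TCP slow start. Assumptions: the window doubles every round trip,
// there is no packet loss, and server processing time is ignored.
function slowStartRoundTrips(bytes, initialSegments = 10, mss = 1460) {
  let windowSegments = initialSegments;
  let delivered = 0;
  let roundTrips = 0;
  while (delivered < bytes) {
    delivered += windowSegments * mss;
    windowSegments *= 2;
    roundTrips += 1;
  }
  return roundTrips;
}

// Example: estimate for a 100 KB resource on a brand-new connection.
console.log(slowStartRoundTrips(100 * 1024));
```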

Another thing to keep in mind: with HTTP/1.1, you only get 6 parallel connections (2 if the browser still follows the original standard). Therefore, you can only request a maximum of six resources in parallel.

To get a sense of how important network performance is for page speed, take a look at the HTTP Archive, which holds plenty of statistics. The average website, for example, loads about 2.5 MB of data in more than 100 requests.


So websites make a lot of small requests to load a lot of resources, but network bandwidth is increasing all the time. The evolution of physical networks will save us, right? Well, not really…

High Performance Browser Networking by Ilya Grigorik

It turns out that increasing bandwidth beyond 5 Mbps has hardly any effect on page load time, whereas reducing the latency of individual requests lowers it almost proportionally: doubling the bandwidth gives you roughly the same load time, while halving the latency roughly halves the load time.

So if latency is the determinant of network performance, what can we do about it?

  • Persistent connections are a must. There’s nothing worse than when your server closes the connection after every request and the browser has to perform a handshake and TCP slow start over and over again.
  • Avoid redirects as much as possible, as they can greatly slow down your initial page load. For example, always link to the full address (use www.thinks.com instead of thinks.com, which redirects).
  • Use HTTP/2 if you can. It brings server push, which can deliver multiple resources for a single request, header compression to reduce the size of requests and responses, and request pipelining and multiplexing to send arbitrary parallel requests over a single connection. With server push, your server can send the HTML and then follow up with the CSS and JS your site needs, without waiting for the actual requests.
  • Set explicit cache headers for your static resources (CSS, JS, and static images such as logos). This way, you tell the browser how long to cache these resources and when to revalidate them (see the sketch after this list). Caching saves a lot of round trips and bytes that would otherwise have to be downloaded. If no explicit cache header is set, the browser falls back to heuristic caching, which is better than no caching at all but far from optimal.

  • Use a Content Delivery Network (CDN) to cache images, CSS, JS, and HTML. These distributed caching networks significantly reduce the distance to your users and can therefore deliver resources much faster. They also speed up your initial connection, because you perform the TCP and TLS handshakes with a nearby CDN node, which in turn maintains warm, persistent connections to your back end.

  • Consider building a single-page application with a small initial page that loads other components asynchronously. This way, you can use cacheable HTML templates, load dynamic data in small requests, and update only parts of the page during navigation.
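
As an illustration of explicit cache headers, here is a minimal sketch using Node.js with Express; the framework, paths, and lifetime are assumptions made for the example, not the shop's actual configuration:

```js
// Serve fingerprinted static assets with explicit, long-lived cache headers.
const express = require('express');
const app = express();

app.use('/static', express.static('public', {
  setHeaders: (res) => {
    // Cache for one year; 'immutable' tells browsers not to revalidate.
    res.setHeader('Cache-Control', 'public, max-age=31536000, immutable');
  }
}));

app.listen(3000);
```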

All in all, there are a number of do's and don'ts when it comes to network performance, but the limiting factor is always the combination of round trips and physical network latency. The only effective way to overcome this limit is to bring the data closer to the client. That is exactly what state-of-the-art web caching does, but only for static content.

For Thinks we followed the guidelines above, used the Fastly CDN together with aggressive browser caching, and even kept cached dynamic data consistent with a novel Bloom-filter-based algorithm.

Repeat page load of www.thinks.com, showing browser cache coverage

The only requests not served from the browser cache on a repeat page load (see the figure above) are two asynchronous calls to the Google Analytics API and the initial HTML request, which comes from the CDN. Repeat page loads therefore appear virtually instantaneous.

Back-end performance

For back-end performance, we need to consider both latency and throughput. To achieve low latency we have to minimize the server's processing time, and to sustain high throughput and cope with load spikes we need a horizontally scalable architecture. We won't go into too much detail here, because the design space is huge, but these are the most important components and properties to look for:

Scalable back-end technology stack components: load balancers, stateless application servers, distributed databases

First, you need load balancing (for example Amazon ELB or DNS load balancing) to distribute incoming requests across your application servers. It should also provide auto-scaling, to spin up additional application servers when needed, and failover, to replace broken servers and reroute requests to healthy ones.

Application servers should minimize shared state to keep coordination to a minimum and use stateless session handling to allow free load balancing. In addition, the servers should have efficient code and I/O to keep processing time low.

The database needs to withstand load spikes and keep processing time low. At the same time, it has to be expressive enough to model and query the data as required. There are many scalable databases (NoSQL databases in particular), each with its own trade-offs. For details, see our survey and decision guidance on the topic:

NoSQL Databases: a Survey and Decision Guidance

Together with our colleagues at the University of Hamburg, that is, Felix Gessert, Wolfram Wingerath, Steffen… medium.baqend.com

The Thinks online shop is built on Baqend and uses the following back-end stack:

Baqend's back-end stack: MongoDB as the primary database, stateless application servers, the HTTP cache hierarchy, and a REST API with a JS SDK for the web front end

MongoDB serves as the primary database for the shop. To maintain our expiring Bloom filter (used for browser caching) we rely on Redis because of its high write throughput. Stateless application servers (Orestes servers) expose the interfaces for the back-end functionality (file hosting, data storage, real-time queries, push notifications, access control, and so on) and take care of cache coherence for dynamic data. They receive their requests from the CDN, which also acts as the load balancer. The website's front end talks to the back end through the JS SDK, which builds on the REST API and automatically leverages the full HTTP cache hierarchy to speed up requests while keeping cached data up to date.

The load test

To test the Thinks shop under heavy load, we ran two application servers on t2.medium AWS instances in Frankfurt, with MongoDB on two t2.large instances. The load tests were built with JMeter and executed on 20 machines at IBM SoftLayer, simulating 200,000 users visiting and browsing the site within 15 minutes. Twenty percent of the users (40,000) were configured to additionally go through the payment process.

Load test setup for the online shop

We found a few bottlenecks in the payment implementation; for example, we had to switch from transactional updates of the inventory (implemented with findAndModify) to MongoDB's partial update operator ($inc). After this fine-tuning, the servers handled the load at an average request latency of 5 ms.
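
The change looks roughly like the following sketch with the MongoDB Node.js driver; the collection and field names are made up for illustration and the actual shop schema may differ:

```js
// Instead of a read-modify-write cycle (findAndModify-style), reserve one
// item with a single atomic partial update using $inc.
const { MongoClient } = require('mongodb');

async function reserveItem(sku) {
  const client = await MongoClient.connect('mongodb://localhost:27017');
  const items = client.db('shop').collection('items');

  // Atomically decrement the stock, but only while stock is still available.
  const result = await items.updateOne(
    { sku: sku, stock: { $gt: 0 } },
    { $inc: { stock: -1 } }
  );

  await client.close();
  return result.modifiedCount === 1; // true if the reservation succeeded
}
```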

JMeter output during the load test: 6.8 million requests in 12 minutes with an average delay of 5 ms

In total, the load tests generated about 10 million requests, transferred 460 GB of data, and achieved a CDN cache hit rate of 99.8%.

Overview of the dashboard after the load test

Conclusion

In summary, a good user experience depends on three pillars: front-end, network, and back-end performance.

Front-end performance is, in our view, the easiest to get right, because there are plenty of tools and straightforward best practices. Still, many websites do not follow these best practices and have not optimized their front end at all.

Network performance is the most important factor for page load time and the most difficult to optimize. Caching and CDN are the most effective optimization methods, but even for static content considerable effort is required.

Back-end performance depends on both single-server performance and the ability to distribute work across machines. Horizontal scalability is particularly hard to achieve and has to be considered from the start. Many projects treat scalability and performance as an afterthought and run into serious trouble once their business grows.

Literature and tool recommendations

There are many books on web performance and scalable system design. High Performance Browser Networking by Ilya Grigorik contains almost everything you need to know about network and browser performance, and the continuously updated edition is available online for free. Designing Data-Intensive Applications by Martin Kleppmann is still in early release but already one of the best books in its field, covering most of the fundamentals behind scalable back-end systems in considerable detail. Designing for Performance by Lara Callender Hogan covers many best practices for building fast, user-friendly websites.

There are also great online guides, tutorials, and tools worth a look: from the beginner-friendly Udacity course Website Performance Optimization and Google's developer performance guides to analysis tools such as Google PageSpeed Insights, GTmetrix, and WebPagetest.

Recent developments in web performance

Accelerated Mobile Pages

Google is raising awareness of website performance through programs such as PageSpeed Insights and its developer guides, and by making page speed a major factor in its search ranking.

The latest concept for speeding up web pages and enhancing the user experience in Google search is Accelerated Mobile Pages (AMP). The goal is for news articles, product pages, and other search content to load instantly straight from Google search. To that end, these pages have to be built as AMP pages.

An example AMP page

AMP does two things:

  1. Sites built for AMP use simplified versions of HTML and use JS loaders to render quickly and asynchronously load as many resources as possible.

  2. Google caches the site in the Google CDN and distributes it over HTTP/2.

The first point essentially means that AMP restricts your HTML, JS, and CSS in a way that yields pages with an optimized critical rendering path that Google can easily crawl. AMP enforces several restrictions, for example that all CSS must be inlined, all JS must be asynchronous, and all content on the page must have a static size (to prevent re-layouts). While you can achieve the same results without these restrictions by following the web performance best practices described above, AMP can be a good trade-off for very simple sites.

The second point means that Google crawls your site and caches it in the Google CDN for fast delivery. The content is updated whenever the crawler re-indexes your site. The CDN also respects the TTLs set by the server for static resources, but at minimum applies micro-caching: resources are considered fresh for at least one minute and are refreshed in the background when a user requests them. AMP is therefore best suited to use cases where the content is mostly static, such as news sites and other editorially maintained publications.

Progressive Web Apps

Another approach by Google is Progressive Web Apps (PWA). The idea is to use a service worker in the browser to cache the static parts of a site, so that these parts load instantly on repeat views and are even available offline, while the dynamic parts are still loaded from the server.
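
A minimal sketch of such a service worker is shown below; the cache name and asset list are placeholders, and a production worker would also need an update and cleanup strategy:

```js
// sw.js — cache the static app shell on install, serve it cache-first.
const CACHE = 'app-shell-v1';
const ASSETS = ['/', '/css/main.css', '/js/app.js', '/img/logo.png'];

self.addEventListener('install', (event) => {
  event.waitUntil(caches.open(CACHE).then((cache) => cache.addAll(ASSETS)));
});

self.addEventListener('fetch', (event) => {
  // Answer from the cache when possible, otherwise fall back to the network.
  event.respondWith(
    caches.match(event.request).then((hit) => hit || fetch(event.request))
  );
});
```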

The app shell (the single-page application logic) can be revalidated in the background; if an updated version is detected, the user is prompted to refresh the page. Inbox by Gmail, for example, works this way.

However, writing service worker code that caches static resources and revalidates them takes considerable effort for every single site. In addition, only Chrome and Firefox fully support service workers so far.

Baqend's Cache Sketch approach

At Baqend we have researched and developed a way for the client to check whether a URL is stale before actually fetching it. At the beginning of each user session, the client loads a very small data structure, a Bloom filter, which is a highly compressed representation of the set of all stale resources. By consulting the Bloom filter, the client can tell whether a resource is potentially stale (contained in the Bloom filter) or definitely fresh. For potentially stale resources we bypass the browser cache and fetch the content from the CDN; in all other cases we serve the content straight from the browser cache. Using the browser cache saves network traffic and bandwidth, and it is fast.
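
Conceptually, the client-side check looks roughly like the sketch below. This is not the Baqend SDK's actual API; the hash function and bit layout are simplified for illustration:

```js
// Simple seeded string hash (FNV-1a style), for illustration only.
function hash(str, seed) {
  let h = 2166136261 ^ seed;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 16777619);
  }
  return h >>> 0;
}

// Check whether a URL might be stale according to the Bloom filter bits.
function mightBeStale(url, bits, hashCount, bitCount) {
  for (let i = 0; i < hashCount; i++) {
    const bit = hash(url, i) % bitCount;
    if ((bits[bit >> 3] & (1 << (bit & 7))) === 0) {
      return false; // at least one bit unset: the resource is definitely fresh
    }
  }
  return true; // all bits set: possibly stale, better revalidate
}

function loadResource(url, bloom) {
  return mightBeStale(url, bloom.bits, bloom.hashes, bloom.size)
    ? fetch(url, { cache: 'no-cache' }) // bypass the browser cache, revalidate
    : fetch(url);                       // fresh: the browser cache may answer
}
```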

In addition, we make sure that the CDN (and other invalidation-based caches such as Varnish) always holds the most recent data by purging resources the moment they become stale.

An example of how Baqend keeps cached dynamic data fresh

Bloom filters are probabilistic data structures with a tunable false-positive rate: the filter may occasionally report membership for a resource that was never added, but it never misses an entry that actually is in the set. In other words, we may occasionally revalidate a fresh resource, but we never serve stale data. The false-positive rate can be kept very low while the filter stays tiny; for example, we only need about 11 kB to represent 20,000 distinct updates.
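
The size follows from the standard Bloom filter formula m = -n · ln(p) / (ln 2)², where n is the number of entries and p the false-positive rate. The 10% rate in the sketch below is an assumption chosen only to show that the quoted ~11 kB for 20,000 entries is plausible; the article does not state the exact rate used:

```js
// Bits needed for a Bloom filter with n entries and false-positive rate p.
function bloomFilterBits(n, p) {
  return Math.ceil(-n * Math.log(p) / Math.pow(Math.log(2), 2));
}

const bits = bloomFilterBits(20000, 0.1);           // assumed 10% rate
console.log((bits / 8 / 1024).toFixed(1) + ' kB');  // roughly 11.7 kB
```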

Much of the heavy lifting at Baqend happens on the server side: stream processing (query match detection), machine learning (optimal TTL estimation), and distributed coordination (scalable Bloom filter maintenance). If you are interested in the details, take a look at this article or these slides to dig deeper.

Performance gains

It all boils down to this.

What page speed gains can be made using Baqend’s caching infrastructure?

To demonstrate the benefit of using Baqend, we built a very simple news application on top of each of the leading competitors in the Backend-as-a-Service (BaaS) space and measured page load times from different locations around the world. As shown below, Baqend consistently loads in under a second and is on average 6.8 times faster than the competitors. Even when all clients are located right next to the server, Baqend is still 150% faster thanks to browser caching.

Comparison of average load times for simple news apps

We have published this comparison as a hands-on web application so you can try the BaaS competitors yourself.

A screenshot of the hands-on comparison

But of course this is a test scenario, not a web application with real users. So let's return to the Thinks online shop for a real-world example.

The Thinks online shop — all the facts

When DHDL (the German version of “Shark Tank”) aired on September 6th with 2.7 million viewers, we sat in front of our TV and our Google Analytics dashboard, excited for the Thinks founders pitching their product.

From the moment their pitch started, the shop's number of concurrent users quickly climbed to around 10,000. The real peak came during the commercial break, when suddenly more than 45,000 concurrent users visited the store to buy the Towell+:

Google Analytics view, starting shortly before the commercial break

During the 30 minutes around our TV appearance we served 3.4 million requests from 300,000 visitors, with up to 50,000 concurrent visitors and up to 20,000 requests per second, at a 98.5% cache hit rate on the CDN level and an average server CPU load of just 3%.

Page load times stayed below 1 second throughout, and the shop achieved a remarkable overall conversion rate of 7.8%.

Looking at the other shops featured in the same DHDL episode, four of them crashed completely under the load, and the rest had applied hardly any performance optimizations.

Availability overview and Google PageSpeed scores of the shops presented on DHDL on September 6

Conclusion

We have seen the bottlenecks that need to be overcome to build fast and scalable websites: we have to master the critical rendering path, understand the limits of the network and the importance of caching, and design the back end for horizontal scalability.

We have looked at a range of tools that address individual problems, as well as Accelerated Mobile Pages (AMP) and Progressive Web Apps (PWA), which take a more holistic approach. The problem of caching dynamic data, however, remains.

Baqend's approach is to simplify web development: you concentrate on the front end and use the JS SDK to access back-end functionality provided by Baqend's fully managed cloud service, including data and file storage, (real-time) queries, push notifications, user management with OAuth, and access control. The platform automatically accelerates all requests through the full HTTP cache hierarchy and takes care of availability and scalability.

Our vision at Baqend is a web without load times, and we want to give you the tools to get there.

Visit www.baqend.com to try it out for free.


Don't want to miss our next post on web performance? Sign up for our newsletter and we'll deliver it straight to your inbox.