We Rendered a Million Web Pages to Learn How the Web Breaks

This translation is meant to broaden horizons and thinking. It is not word for word; I aim for a smooth read instead. Along the way I will also add my own views or quote others. Anyway, that is the idea. I hope you like it.

Why render a million pages?

In short, there is an argument being made that the web is somehow slower than it was 15 years ago. The reasoning is that the proliferation of JS frameworks, web fonts, and polyfills has offset the gains from faster computers, faster networks, and better network protocols.

So the argument goes.

So the team of authors wanted to find out if this was true, and try to identify the common causes of slow and crashing websites in 2020.

How does this work?

Using Puppeteer, the team scripted a web browser (Chrome), launched 200 EC2 instances, and had them render the root pages of the top 1 million domains over a weekend.

Judging by those numbers alone, this is a pretty crazy undertaking.

During the run, they recorded every error caught by window.onerror.

Normally the team tracks errors for its customers, but this time they tracked errors for the entire web! That should make for a convincing answer to the question: how exactly do web pages break in practice?

Most common errors

Analysis of the data shows that most errors fall into a handful of categories. This, in turn, can guide the future of web technology: fixing a small set of problems could reduce the number of errors on the web by a factor of ten.

The top 10 offenders are:

  1. ReferenceError
  2. TypeError
  3. SyntaxError
  4. Error
  5. Invoke exception
  6. OneSignal
  7. RangeError
  8. IntegrationError
  9. ChunkLoadError
  10. EvalError

To paraphrase Tolstoy: working websites are all different, but broken websites all break in the same way. As you can see, the distribution of these errors follows Zipf's law. Three error types account for the bulk of all errors. That is:

ReferenceError, TypeError, and SyntaxError account for 85% of all errors!
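To make this grouping concrete, here is a minimal sketch of how messages collected by an error handler could be tallied by error type. The sample messages and the function names (errorTypeOf, tallyErrorTypes) are my own assumptions for illustration; this is not the authors' actual pipeline.

```typescript
// Hypothetical sample of messages, in the shape Chrome typically reports them.
const sampleMessages: string[] = [
  "Uncaught ReferenceError: $ is not defined",
  "Uncaught TypeError: Cannot read property 'appendChild' of null",
  "Uncaught SyntaxError: Unexpected token '<'",
  "Uncaught ReferenceError: jQuery is not defined",
];

// Extract the error type: the identifier right before the first colon,
// ignoring an optional "Uncaught" prefix. Anything else is "Unknown".
function errorTypeOf(message: string): string {
  const match = /^(?:Uncaught\s+)?([A-Za-z_$][\w$]*)\s*:/.exec(message);
  return match?.[1] ?? "Unknown";
}

// Count how many messages fall under each error type.
function tallyErrorTypes(messages: string[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const message of messages) {
    const type = errorTypeOf(message);
    counts.set(type, (counts.get(type) ?? 0) + 1);
  }
  return counts;
}

const tally = tallyErrorTypes(sampleMessages);
// Print the types ordered from most to least frequent.
console.log(Array.from(tally.entries()).sort((a, b) => b[1] - a[1]));
```

Grouping by the type prefix is crude, but it is enough to see a few types dominating the count.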

Of course, there are many ways to cause each of these error types. The specific string in the error message tells us more about what actually happened. Looking at the most common messages will feel familiar: as a web developer, you have probably come across some of these before. (I have seen these mistakes myself.)

Let's look at the top 10 error messages.

  1. $ is not defined (a jQuery classic, common)
  2. QQ_Qun is not defined (?)
  3. jQuery is not defined (common)
  4. Unexpected token '<' (common)
  5. Invalid or unexpected token
  6. Cannot read property 'envelope' of undefined (common)
  7. $ is not a function (common)
  8. Cannot read property 'addEventListener' of null
  9. Unexpected identifier
  10. Cannot read property 'appendChild' of null

The team went on to debug samples of each of these error messages to understand the specific circumstances behind them. The result was unexpected: ReferenceErrors and SyntaxErrors turn out to share a common root cause, resource load failures, and TypeErrors are essentially all the same kind of problem.

The team's in-depth research resulted in the following articles, each describing the findings for one error type:

How to resolve ReferenceError: by looking at which global variables most often show up as undefined, we can map each variable name to the specific library that defines it. In other words, a small set of frequently missing variables, traced back to the libraries that define them, accounts for most of these errors.
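As a rough illustration of mapping a missing variable back to a library, here is a small sketch. The lookup table and the function names (missingGlobal, guessMissingLibrary) are hypothetical and only cover a few well-known globals. It assumes the Chrome and Firefox message shape "X is not defined", so other browsers' wording is not handled.

```typescript
// Illustrative table only: a few well-known globals and the libraries
// that normally define them. A real table would come from the data.
const globalToLibrary = new Map<string, string>([
  ["$", "jQuery"],
  ["jQuery", "jQuery"],
  ["_", "Lodash or Underscore"],
  ["React", "React"],
]);

// Pull the variable name out of a message like "... $ is not defined".
function missingGlobal(message: string): string | null {
  const match = /([A-Za-z_$][\w$]*) is not defined/.exec(message);
  return match?.[1] ?? null;
}

// Guess which library failed to load, or null when we cannot tell.
function guessMissingLibrary(message: string): string | null {
  const name = missingGlobal(message);
  if (name === null) return null;
  return globalToLibrary.get(name) ?? null;
}

console.log(guessMissingLibrary("Uncaught ReferenceError: $ is not defined"));
```

The point is that a ReferenceError for a well-known global usually says less about the page's own code than about a script that never arrived.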

What Causes TypeError on Live Web Sites: 97% of TypeErrors are caused by null or undefined. Most of these occur because a third-party library or browser feature that the code depends on is missing, or because the document is not in the expected state, so a selector returns nothing.
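Here is a minimal sketch of that failure mode and of a guard against it. To keep it runnable outside a browser, findButton is a hypothetical stand-in for document.querySelector, and ButtonLike, page, and the two attach functions are likewise made up for illustration.

```typescript
// Minimal shape of the element we care about.
interface ButtonLike {
  addEventListener(type: string, listener: () => void): void;
}

// Stand-in for document.querySelector: returns null when nothing matches.
function findButton(elements: Map<string, ButtonLike>, selector: string): ButtonLike | null {
  return elements.get(selector) ?? null;
}

// Records which event types were registered, so the effect is observable.
const registered: string[] = [];
const page = new Map<string, ButtonLike>([
  ["#buy", { addEventListener: (type) => { registered.push(type); } }],
]);

// Unguarded: the type assertion hides the possible null from the compiler,
// so a missing element throws a TypeError at runtime.
function attachUnguarded(selector: string): void {
  const button = findButton(page, selector) as ButtonLike;
  button.addEventListener("click", () => {});
}

// Guarded: handle the null case explicitly and report whether we attached.
function attachGuarded(selector: string): boolean {
  const button = findButton(page, selector);
  if (button === null) return false;
  button.addEventListener("click", () => {});
  return true;
}

console.log(attachGuarded("#missing")); // false
```

With strict null checks enabled, the compiler pushes you toward the guarded version; the unguarded one only compiles because of the explicit assertion.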

What Causes SyntaxError on Live Web Sites: during development, most syntax errors are caused by typos. In production, most syntax errors come from network failures or JS coding errors.
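One way a network failure turns into a syntax error: the server answers a script request with an HTML error page, and the browser tries to parse that HTML as JavaScript. The sketch below reproduces the effect with the Function constructor, which compiles source without running it. The sample strings and the name parsesAsScript are my own assumptions.

```typescript
// Hypothetical body of a "not found" page returned where a script was expected.
const htmlErrorPage = "<!DOCTYPE html><html><body>Not Found</body></html>";

// Hypothetical script cut off mid-download.
const truncatedScript = "function init() {";

// Returns true when the source compiles as JavaScript, false on SyntaxError.
function parsesAsScript(source: string): boolean {
  try {
    new Function(source); // compiles the source but never calls it
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) return false;
    throw error; // anything else is unexpected, so let it surface
  }
}

console.log(parsesAsScript("var x = 1;"));
console.log(parsesAsScript(htmlErrorPage));
console.log(parsesAsScript(truncatedScript));
```

Markup that starts with an angle bracket is the likely source of the "Unexpected token '<'" message in the list above.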

How to predict the number of errors?

The authors initially used a logistic regression classifier to try to predict whether a website has errors based on the JS libraries it loads. The underlying assumption is that the mere presence of certain code is indicative of errors.

Further analysis shows that most errors are due to missing code rather than present code, so this approach is not very predictive. However, we can still list the regression coefficients that the classifier learned, which is pretty neat. They show how strongly each technology choice is associated with errors. In fact, code split into chunks with Webpack is strongly linked to some errors, but those chunks are also critical dependencies in the browser.

One more finding: sites that use a product for tracking JS errors have lower error counts.

Let's look at the regression coefficients for these libraries and see whether each one predicts errors.

For example, the regression coefficient for Baidu Tongji (Baidu's analytics service) is large, which means the probability of missing code is small and the predicted number of errors is lower. However, I suspect the Great Firewall also has something to do with this; I have always used Google Analytics and personally feel it works better.

The web's resilience to errors

Of the 1 million web pages surveyed, 12 percent had one or more unhandled errors. That is a staggering number indeed. These errors indicate that the execution of a program has been aborted due to some unexpected circumstances, resulting in some functionality being broken.

That 12% figure also shows how resilient the web is to errors: whatever broke on those pages must have been minor enough that nobody bothered to fix it.

The data shows that most errors come from code, data, or document objects that are missing at runtime. This is most likely due to the late-binding nature of the web: names and types are resolved at runtime (late) rather than at compile time (early). Late binding does make loading libraries easier and more natural, but it also opens the door to errors such as missing libraries or API changes. Of course, late binding is not the only option; many languages bind at compile time.

Had the web been built on Java Applets, for example, things would be different. (What can we learn from this dinosaur?)

How do you build an error-resistant web?

In a strictly, statically typed language, dynamically loading and running arbitrary libraries can be difficult, especially when the libraries are highly customized and their APIs are open-ended. This applies not just to code loaded from the web, but also to the browser runtime itself.

If the Java runtime is not correctly installed, an applet refuses to run until you download and install the proper runtime. On the web, you can view pages with an older browser, although browsers and websites may break as the environment changes over many iterations. Still, you can write a web page that works in both current and older browsers. Seen this way, runtime binding is also critical to the evolution of the web!

In 2006, Alan Kay and the Viewpoints Research Institute launched an ambitious project to build a complete system, from the bare machine up to a GUI operating system, in 20,000 lines of code. Although the project stalled due to funding issues, the final report describes KScript, a dynamic language with late-bound references that are resolved at runtime. That was six years before TypeScript!

Alan Kay's model resembles an ecosystem: a distributed system whose parts cooperate without being tightly interlocked.

At this point, we have not reached a final conclusion. Static typing lets the compiler rule out certain type errors, which developers appreciate. TypeScript is interesting because it straddles dynamic and static typing, at the cost that the type the compiler assumes at compile time may not be the type that actually shows up at runtime.
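A small sketch of that gap between compile-time and runtime types, and one common way to narrow it with a runtime check. The Price interface, the sample JSON, and the helper names (isPrice, parsePrice) are hypothetical.

```typescript
interface Price {
  amount: number;
  currency: string;
}

// The cast satisfies the compiler but checks nothing at runtime:
// here "amount" is really a string and "currency" is missing.
const trusted = JSON.parse('{"amount":"12.50"}') as Price;
const runtimeType: string = typeof trusted.amount;
console.log(runtimeType); // "string", despite the declared number

// Type guard: verifies at runtime what the type claims at compile time.
function isPrice(value: unknown): value is Price {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return typeof record["amount"] === "number" && typeof record["currency"] === "string";
}

// Parse JSON and return a Price only when the shape really matches.
// Invalid JSON still throws from JSON.parse; that case is not handled here.
function parsePrice(json: string): Price | null {
  const value: unknown = JSON.parse(json);
  return isPrice(value) ? value : null;
}

console.log(parsePrice('{"amount":12.5,"currency":"USD"}'));
console.log(parsePrice('{"amount":"12.50"}'));
```

The guard costs a few lines, but it moves the check to the boundary where untyped data enters the program, which is where the compiler cannot help.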

The web's runtime binding has a downside: if the browser does not support a new web feature, the code breaks. Even so, for the web this seems better than the "all or nothing" approach of the Java Applet model, where programs run only if the right runtime environment is installed. In the early 2000s, XHTML had a similar story. With XHTML, the document is required to be valid XML, and invalid markup causes the page to not be displayed at all. At the time, this was advocated by many, perhaps because invalid HTML was seen as the culprit behind browsers rendering pages differently. Over the past decade, a better idea emerged: standardizing the handling of invalid markup and incorporating it into HTML5. As of now, HTML5 has beaten XHTML and JavaScript has beaten Java Applets.

There is still a dark horse, and that is WebAssembly. Many use cases revolve around compiling statically typed languages to WASM (Blazor, for example). We may be at the beginning of an era in which front-end development no longer involves a lot of JS code. However, a technology that becomes an island this way is doomed to fail. Learning from history, it seems necessary to find a better solution that embraces dynamism and takes runtime binding into account!

While statically typed languages provide safety, dynamic typing is key to the web's resilience to errors. In the end, balance is the key! The data shows that when the web breaks, it is because the code does not run as expected: document errors, type errors, third-party libraries or data that fail to load, and so on. Think of a type system as a way of ensuring that dependencies are checked at compile time. Is there an ergonomic way to perform this kind of checking in a dynamic environment, one that would eliminate most of the errors that plague today's web?

Conclusion

  1. First of all, praise for the team's work: they dared to think big and to act on it! In fact, the study of how pages break is only one part of a larger study. The parent project is JavaScript Performance in the Wild 2020, which also covers network connections, third-party library usage, page render time, number of requests, repaint counts, and so on.

  2. TypeScript looks like the best solution to the JS type problem so far! I hope to see more solutions for reference and syntax problems too. ESLint can catch some of these, but it is only an add-on tool. To go further, perhaps the programming language itself has to be powerful enough.

  3. Webpack is so important! In today's front-end engineering, where development is largely about calling dependent libraries, the key is how to package everything into a "nice" project that runs reliably in production.

  4. As things stand, the longer-term future is WebAssembly.

  5. The browser is the foundation of everything in front-end web development. One statistic: Google Chrome was launched in 2008 and had 69.89% of the market by 2020. But who remembers when Netscape had 90 percent of the market in the mid-1990s? Maybe the die is not yet cast! As the author says, we are on the cusp of history. Where will the browser go? Let's wait and see.

  6. Writing code with low coupling and high cohesion matters well beyond the front end. It's computer science!

  7. You have read this far? How about a like? I am Anthony of Juejin: not a tough guy, but a talkative one……