I took over the Taobao home page right after Double 12 in 2014. That was almost a year and a half ago, and I finished handing over the home page work not long ago. After two redesigns and a migration from PHP to Node, I have some thoughts to share.

I. Background

The Taobao home page is the face of Taobao. It carries the entry points for almost every business in the Taobao family, and its traffic is very large, measured in units of 100 million. In recent years, with the rise of mobile, the business focus has shifted toward mobile (it can hardly be called a shift any more; the business is basically mobile-first), so traffic to the PC home page has been cut. Even so, its daily PV is still quite high.

The Taobao home page has always been a testing ground for internal platforms and technologies, and it is always changing. New frameworks and systems look to the Taobao home page for a pilot. As you can imagine, if an upgrade or optimization that needs promoting has already gone live on the Taobao home page and shown good data and stability, what reason do other businesses have not to try it? Also, last year I was in Taobao's front-end technical architecture group, so I naturally took the initiative to push some experimental work into the business.

Pages on the Taobao family of sites include the home page, other channel pages, and campaign pages. These pages are not written line by line by Taobao front-end engineers; with this much business, even double the headcount could not keep up with that approach. In fact, most pages are assembled from modules on the internal platform, either by operations staff or by front-end engineers. The front end focuses on building the platform itself and on keeping modules general and reusable, plus some engineering work.

For a page built on the platform, the front end only needs to develop the atomic modules that make up the page; overall rendering is handled entirely by a unified script that the platform provides. On the Taobao home page the rendering model is slightly different, given the huge number of page modules and the cross-department and cross-team communication involved.

II. Overall changes to the Taobao home page

As the background section mentions, the Taobao home page relies on the internal page-building platform, so its changes naturally follow the changes in that platform.

1. The Taobao home page under PHP

Not long after I took over the Taobao home page, the annual redesign arrived while the page was still running on PHP. To be clear, all the code of the Taobao home page is fully controlled by the front end, the front end does not deal with the database directly, and its data comes from two kinds of sources.

Data sources

The first is data filled in by operations staff. The front end "digs pits", that is, reserves slots that operators fill with data, such as (pseudocode):

The above code produces a PHP template plus form slots for the INFO fields. This process is referred to as "digging".

When operators fill in these slots, the corresponding data for the PHP template is generated, and the template is finally rendered into a complete HTML fragment (rendered in real time).
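As a rough illustration of the digging idea, here is a minimal TypeScript sketch. It is not the platform's PHP template syntax: the `{{name}}` slot notation and the names `digSlots` and `renderModule` are assumptions made for this example. `digSlots` lists the slots a template reserves (what the operator's form would be built from), and `renderModule` combines the template with the filled-in data.

```typescript
type SlotData = { [field: string]: string };

// Hypothetical slot syntax: {{name}} marks a slot for operators to fill.
const SLOT_SOURCE = "\\{\\{\\s*([A-Za-z_][A-Za-z0-9_]*)\\s*\\}\\}";

// "Digging": collect the distinct slot names in order of first appearance.
function digSlots(template: string): string[] {
  const pattern = new RegExp(SLOT_SOURCE, "g");
  const slots: string[] = [];
  let match = pattern.exec(template);
  while (match !== null) {
    const name = match[1];
    if (name !== undefined && slots.indexOf(name) === -1) {
      slots.push(name);
    }
    match = pattern.exec(template);
  }
  return slots;
}

// Escape operator input so it cannot break the surrounding HTML.
function escapeHtml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Rendering: substitute the filled data; slots without data render as empty.
function renderModule(template: string, data: SlotData): string {
  return template.replace(
    new RegExp(SLOT_SOURCE, "g"),
    (_whole: string, name: string): string => {
      const value = data[name];
      return escapeHtml(value === undefined ? "" : value);
    }
  );
}

const bannerTemplate = '<a href="{{link}}"><img src="{{img}}" alt="{{title}}"></a>';
const bannerSlots = digSlots(bannerTemplate);
const bannerHtml = renderModule(bannerTemplate, {
  link: "/sale",
  img: "/a.png",
  title: "Sale & more",
});
```

Here `bannerSlots` holds the three slot names an operator would see as form fields, and `bannerHtml` is the rendered HTML fragment.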

This is how a submodule is built in the older page-building system. I have described it very simply, but a platform has many more things to consider, such as data ordering, scheduled publishing, rollback, filtering, data synchronization, data updates, version control, permission control, references to other systems, and so on.

The second is data provided by back ends or personalization platforms. Different businesses have different demands. Some have their own back end and need data generated by their business; some want each user to see different content (personalized per user) and expect to plug in algorithms; some deal directly with sellers and expect to use merchant-recruitment data; some want to use data filtered from a data pool... In short, the Taobao home page has to connect to all kinds of systems and interfaces. Integration of dynamic data sources is covered later.

These systems live on different domains, so JSONP is the natural choice. But some special systems, such as advertising, are not rendered by a simple JSONP request; they may take part in the whole ad rendering process, for example by loading their own JS and taking over control of rendering.

Page architecture

The data sources and the structure of submodules are described above, so how does the entire page come together? There are two ways to assemble modules. One is visual assembly: operations staff or front-end engineers drag developed modules (or modules picked from the module library) into a container to form a page:

A system has many other issues to consider here, such as page layout, multi-device adaptation, temporarily hiding modules, position adjustment, skin selection, module copying, and so on.

The other is source-code assembly, such as (pseudocode):

Modules are included by module ID, and tags such as LazyLoad are added to control the rendering pace and the data entry points. The difference between source-code assembly and visual assembly is that the former makes it easier to control the structure of modules and the order in which they are rendered.

Dynamic data sources

The home page faces many interfaces and platforms and connects to dozens of businesses, so interfaces are a big problem. Because the back-end systems differ, there is basically no way to unify the format of the data sources. If an operator decides on a whim one day to switch to a system or data set they find more comfortable, the front end and back end have to go through several more rounds of integration and communication. So here is the picture:

The platform can connect to data sources. That is, the slots we dig can not only be filled in by operators but can also import data directly from various data sources. Of course, the data fields need to be mapped and transformed. The interface provided by the back end looks like this:

The interface format agreed on by the front end is:

The system must provide a binding strategy for this mapping:

After binding, data can be output synchronously or asynchronously; both are capabilities provided by the platform. This solution largely solves the problem of back-end system or interface changes and reduces the cost of communication between the front and back ends.
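The mapping step can be sketched as follows. This is an illustration only: the field names, the dotted-path binding format, and the functions `readPath` and `applyBinding` are assumptions made for this example, not the platform's real binding strategy. A binding maps each field the front end expects to a path inside the back-end record.

```typescript
// Hypothetical binding format: front-end field name -> dotted path in the back-end record.
type Binding = { [frontField: string]: string };
type Json = { [key: string]: unknown };

// Walk a dotted path such as "pic.url"; returns undefined if any step is missing.
function readPath(record: unknown, path: string): unknown {
  let current: unknown = record;
  for (const key of path.split(".")) {
    if (typeof current !== "object" || current === null) {
      return undefined;
    }
    current = (current as Json)[key];
  }
  return current;
}

// Convert back-end records into the shape the front-end module agreed on.
function applyBinding(records: unknown[], binding: Binding): Json[] {
  return records.map((record) => {
    const mapped: Json = {};
    for (const field of Object.keys(binding)) {
      const path = binding[field];
      if (path !== undefined) {
        mapped[field] = readPath(record, path);
      }
    }
    return mapped;
  });
}

// Made-up back-end response and binding, for illustration only.
const backendItems = [
  { item_title: "Phone", pic: { url: "/p.png" }, jump: "/item/1" },
];
const itemBinding: Binding = { title: "item_title", img: "pic.url", link: "jump" };
const frontendItems = applyBinding(backendItems, itemBinding);
```

With a layer like this, a change of back-end system only requires a new binding, and the module code that consumes `title`, `img`, and `link` stays untouched.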

Note, however, that although the platform tidies up all the interfaces on the page, it also means that every request from the page flows through the platform first and is then distributed to the various back ends, which demands a high level of resilience from the platform.

2. PHP to Node

Daily requests to the Taobao home page are of a magnitude that ten or twenty servers cannot withstand; the page has to be backed by a service cluster.

Each CDN node has PHP rendering capability. When a page is published, we synchronize all of the page's modules and data to every CDN node; that is the basic pattern. It looks pretty good, but after a period of operation and maintenance, many security and performance problems gradually emerged:

Performance issues. Each PHP page contains multiple submodules, and submodules may include other submodules. PHP's include operation has a cost, each include being a disk I/O, and a rendering node runs thousands of PHP pages similar to the Taobao home page, so you can imagine the efficiency under high concurrency.

That said, although PHP's include does have a cost, once loading and execution have warmed up the bytecode goes straight into the cache and there is no frequent disk I/O. The poor performance of PHP on the CDN is mainly caused by two problems: 1. the PHP version is too old, and the performance difference between 5.4 and 7 is several-fold; 2. FastCGI mode has no advantage over Node in high-concurrency scenarios.

The push mechanism is flawed. File synchronization (the sync action in the picture) is a troublesome mechanism. First, there is no control over timing: a file may reach all nodes in a few seconds or take a minute or two. Synchronization can also fail, and the cost of health checks is quite high. When releases come in quick succession, many files need to be synchronized, which easily causes queues to pile up and makes the poor synchronization experience worse.

Demanding real-time requirements. Before a file is pushed, it may pass through several upstream systems. The longer the release chain, the slower changes take effect online, and it can take about five minutes. Such a delay is totally unacceptable for highly time-sensitive needs (such as flash sales).

Of course, there are many other problems, such as higher operational costs, higher security risks, and an insufficient PHP talent pool. So the fate of the PHP rendering container was to be killed off.

The new service cluster is a Cache CDN. It only handles static files and has no PHP/Node rendering capability, so it is efficient, performs well, and withstands heavy load. The Cache cluster can also be expanded by paying for more capacity.

When a user visits the page, the request passes through Nginx to the Cache CDN. On a cache hit, the cached content is returned directly from the CDN; on a miss, the request goes back to the origin server. The origin server is a Node service with module rendering capability, and it can do a number of things:

  • Control the Cache response headers: max-age and s-maxage control how long content is cached on the client and on the Cache CDN respectively, and the durations can be adjusted at any time as needed, for example to lengthen the cache time;

  • Control the external network environment and A/B test state;

  • Integrate front-end toolchains, such as detection, compression, and filtering.

Its advantages are many and are not all listed here. This mode also adds a layer of disaster recovery (DR): the origin server pushes data at intervals to a backup server in the same machine room as the Cache. If the origin server fails, DR automatically switches to the backup data.

The change of mode is not only a breakthrough in operations; it also reduces the security risk when the CDN is attacked. It also removes the various detection mechanisms that Sync required, saving millions in costs per year. The advantages are quite obvious.
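The cache-duration control mentioned in the list above can be sketched as a small helper that builds a Cache-Control header value. In standard HTTP caching, max-age applies to any cache and s-maxage overrides it for shared caches such as a CDN. The `CachePolicy` type, the function name, and the durations below are assumptions for illustration, not the values Taobao uses.

```typescript
interface CachePolicy {
  clientSeconds: number; // becomes max-age: how long the browser may reuse the page
  cdnSeconds: number;    // becomes s-maxage: how long shared caches (the CDN) may reuse it
}

// Build the Cache-Control value the origin would attach to a rendered page.
function cacheControl(policy: CachePolicy): string {
  const values = [policy.clientSeconds, policy.cdnSeconds];
  for (const seconds of values) {
    if (seconds < 0 || Math.floor(seconds) !== seconds) {
      throw new Error("cache durations must be non-negative whole seconds");
    }
  }
  return "public, max-age=" + policy.clientSeconds + ", s-maxage=" + policy.cdnSeconds;
}

// Hypothetical durations: the browser always revalidates, the CDN keeps the page briefly.
const everydayPolicy = cacheControl({ clientSeconds: 0, cdnSeconds: 60 });
// Lengthening the CDN duration reduces how often requests reach the origin.
const longerPolicy = cacheControl({ clientSeconds: 0, cdnSeconds: 300 });
```

Because the header is produced by the origin at render time, changing a policy takes effect as soon as the CDN next goes back to the origin, without touching the CDN nodes themselves.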

3. A different mode under Node

In the PHP section above we only covered the HTML and data. Readers will have noticed that static resources such as CSS and JS were not mentioned, so how does the page render?

In the old PHP page, we included one CSS file and one JS file directly. Taobao uses Git-based, versioned iterative releases, and these static resources sat in a single Git repository, like this:

After each Git release, we changed the version number in the PHP and released the PHP code. Of course, some optimizations were made, such as automatically updating the version number when releasing from Git.

The page rendering mode of the new page-building platform differs from that of PHP.

A module's CSS/JS lives together with its template, and it is independent of the static resources of the other modules on the page. The aim is for a single module to run completely on its own, which makes module reuse easier.

The module's slots are also separated from the template, and the data format is defined as a JSON Schema:

The page-building platform parses this JSON Schema into the slots shown in Figure 1. Rendering a module becomes a matter of combining index.xtpl with the slot data.
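To make the schema-to-slot step more concrete, here is a minimal sketch of how a JSON Schema style description could be turned into form fields. The schema below is a made-up example covering only a tiny subset of JSON Schema (`type`, `title`, `properties`, `required`), and `schemaToFields` is a hypothetical helper, not the platform's parser.

```typescript
interface SchemaProperty {
  type: "string" | "number" | "boolean";
  title?: string;
}

interface ObjectSchema {
  type: "object";
  properties: { [name: string]: SchemaProperty };
  required?: string[];
}

interface FormField {
  name: string;
  label: string;
  input: "text" | "number" | "checkbox";
  required: boolean;
}

// Turn each schema property into one form field (one "slot") for operators.
function schemaToFields(schema: ObjectSchema): FormField[] {
  const required = schema.required === undefined ? [] : schema.required;
  const fields: FormField[] = [];
  for (const name of Object.keys(schema.properties)) {
    const prop = schema.properties[name];
    if (prop === undefined) {
      continue;
    }
    const input = prop.type === "string" ? "text" : prop.type === "number" ? "number" : "checkbox";
    fields.push({
      name: name,
      label: prop.title === undefined ? name : prop.title,
      input: input,
      required: required.indexOf(name) !== -1,
    });
  }
  return fields;
}

// Made-up module schema, for illustration only.
const bannerSchema: ObjectSchema = {
  type: "object",
  properties: {
    title: { type: "string", title: "Banner title" },
    priority: { type: "number" },
  },
  required: ["title"],
};
const bannerFields = schemaToFields(bannerSchema);
```

Keeping the data description in a schema, separate from the template, means the form can be generated and validated without parsing the template at all.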

Modules are isolated from each other, so there is a degree of redundancy, but the benefits of decoupling modules far outweigh this redundancy. In fact, we manage each module in its own repository. Page rendering is simpler: the origin Node container combines all the index.xtpl files into a single page.xtpl. To reduce page requests, CSS and JS are also combined into single files through combo URLs such as http://cdn/??mod1.css,mod2.css,mod3.css.
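Building such a combo URL is simple string assembly. The sketch below assumes the `??` convention shown above, with file names separated by commas; the function name `comboUrl` and the duplicate removal are choices made for this example.

```typescript
// Join several static files into one combo request: <base>/??a.css,b.css
function comboUrl(cdnBase: string, files: string[]): string {
  // Drop duplicates so a module referenced twice is only requested once.
  const unique = files.filter((file, index) => files.indexOf(file) === index);
  if (unique.length === 0) {
    throw new Error("at least one file is required");
  }
  // Remove trailing slashes so the result has exactly one "/" before "??".
  const base = cdnBase.replace(/\/+$/, "");
  return base + "/??" + unique.join(",");
}

const cssUrl = comboUrl("http://cdn", ["mod1.css", "mod2.css", "mod3.css", "mod1.css"]);
```

In this example `cssUrl` is `http://cdn/??mod1.css,mod2.css,mod3.css`, so three module stylesheets cost the page a single request.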

The page is aware of any module update; the next time you enter the system, it prompts you to decide whether to upgrade the module and the page.

III. Taobao home page performance optimization

There are so many modules on the home page that if they were all output at once, the DOM node count would certainly exceed 4K and the first-screen time would be very long. Under the TMS development specification, each TMS module contains an index.js and an index.css, and the page ends up with one combo JS file and one combo CSS file. The home page does not execute every index.js at load time, otherwise the page would block badly at the start.

Page rendering logic

The loading logic of the home page framework is roughly as shown in the figure above:

  • Iterate over all TMS modules (each carries a J_Module hook);

  • Some TMS modules have no JS content but still load an index.js; a tb-pass class is added to such modules to skip their JS execution;

  • The page is divided into two parts: the first screen and everything below it. First-screen modules are added to lazy-load monitoring first;

  • Once the first-screen modules are loaded, or the user interacts with the page (scrolling, mouse movement, etc.), the modules below the first screen are added to lazy-load monitoring;

  • Handle special modules, which start loading a few hundred pixels before they enter the viewport;

  • Listen for scrolling and render modules according to the logic above.

Some modules do not necessarily render even when their JS executes, because their priority is not high. These modules have event listeners inside, for example waiting for a mouseover or onload event before rendering.
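The decision at the heart of the steps above (which modules should run their JS right now) can be sketched as a pure function. This is a simplified model, not the home page's actual loader: the `PageModule` shape, the field names, and the pixel values are assumptions, and `hasScript: false` stands in for the tb-pass case.

```typescript
interface PageModule {
  id: string;
  top: number;        // distance from the top of the document, in px
  height: number;
  hasScript: boolean; // false models a module marked to skip JS execution
  preloadPx?: number; // special modules start loading this many px early
}

interface Viewport {
  scrollTop: number;
  height: number;
}

// A module is "near" when it overlaps the viewport, extended by its preload margin.
function isNearViewport(mod: PageModule, view: Viewport): boolean {
  const margin = mod.preloadPx === undefined ? 0 : mod.preloadPx;
  const visibleTop = view.scrollTop - margin;
  const visibleBottom = view.scrollTop + view.height + margin;
  return mod.top < visibleBottom && mod.top + mod.height > visibleTop;
}

// Pick the modules whose JS should execute for the current scroll position.
function modulesToRender(
  mods: PageModule[],
  view: Viewport,
  firstScreenHeight: number,
  belowFoldUnlocked: boolean, // true once the first screen is done or the user interacted
  rendered: string[]
): string[] {
  return mods
    .filter((mod) => mod.hasScript)
    .filter((mod) => mod.top < firstScreenHeight || belowFoldUnlocked)
    .filter((mod) => rendered.indexOf(mod.id) === -1)
    .filter((mod) => isNearViewport(mod, view))
    .map((mod) => mod.id);
}

// Made-up page layout, for illustration only.
const pageModules: PageModule[] = [
  { id: "nav", top: 0, height: 100, hasScript: true },
  { id: "banner", top: 100, height: 400, hasScript: false },
  { id: "ad", top: 1000, height: 200, hasScript: true, preloadPx: 300 },
  { id: "feed", top: 1200, height: 600, hasScript: true },
];
const onLoad = modulesToRender(pageModules, { scrollTop: 0, height: 800 }, 800, false, []);
const afterFirstScreen = modulesToRender(pageModules, { scrollTop: 0, height: 800 }, 800, true, ["nav"]);
const afterScroll = modulesToRender(pageModules, { scrollTop: 500, height: 800 }, 800, true, ["nav", "ad"]);
```

In a browser, a scroll listener (usually throttled) would call a function like this and then execute the returned modules; the sketch leaves that wiring out so the logic can be read and tested on its own.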

I have written about performance optimization before, so there is no need to repeat it here:

  • A look at personalization on the Taobao home page (http://www.barretlee.com/blog/2016/03/31/personality-in-taobao-home-page/)
  • Taobao home page performance optimization in practice (http://www.barretlee.com/blog/2016/04/01/optimization-in-taobao-homepage/)

Optimizing code performance is delicate work. If you want to optimize a large page that has never been optimized, you may be facing a refactor. The articles above cover the detailed optimizations inside the page, but standardization of the development process and optimization of each link in the online access path are not covered.

IV. Taobao home page stability guarantees

Under heavy traffic, any small problem is magnified into a big one, so every unexpected problem met during development needs to be taken seriously. However, many intermittent problems cannot be found in our test environment, such as region-related problems (for example, a CDN node in Shanghai going down), user-attribute problems (for example, a blank area on the page, a so-called skylight, for users whose nickname ends with the character S), browser plug-in problems, ISP ad-injection problems, and so on.

It is hard to think of every problem before going live, but two things must be done: fallback disaster recovery, and monitoring with early warning.

1. Fallback disaster recovery

Fallback disaster recovery has two aspects:

  • Asynchronous interface request errors, including interface data format errors, request timeouts, etc.

  • Synchronous rendering errors, where the origin fails to render the page.

Asynchronous interface requests mainly involve back-end systems. There are many connected systems, and their stability and load tolerance vary. There are many ways to safeguard this; the following are the most common:

The result of each data request is cached locally, and each interface is also given hard-coded fallback data. Another option is to retry: if the first request fails, make a second one. For a detailed discussion, see my earlier article, "Fallback and disaster recovery schemes under heavy traffic" (http://www.barretlee.com/blog/2015/09/16/backup-solution-at-big-traffic/).
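The options above can be combined into one fallback chain: retry, then the local cache, then the hard-coded data. The sketch below is a simplified model under stated assumptions: a real request would be asynchronous, the plain object `cache` stands in for browser storage, and the function and type names are invented for this example.

```typescript
type DataSource = "live" | "cache" | "hardcoded";

interface Resolved<T> {
  data: T;
  source: DataSource;
}

function loadWithFallback<T>(
  key: string,
  request: () => T,            // throws on failure, bad format, or timeout
  cache: { [key: string]: T }, // stands in for local storage
  hardcoded: T,                // last-resort data shipped with the module
  maxAttempts: number = 2      // first request plus one retry
): Resolved<T> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      const data = request();
      cache[key] = data; // remember the latest good response
      return { data: data, source: "live" };
    } catch (error) {
      // Ignore the error and try again until the attempts run out.
    }
  }
  const cached = cache[key];
  if (cached !== undefined) {
    return { data: cached, source: "cache" };
  }
  return { data: hardcoded, source: "hardcoded" };
}

// Illustration: the first call fails once and succeeds on retry; later the back end is down.
const localCache: { [key: string]: string[] } = {};
let requestCount = 0;
const flakyRequest = (): string[] => {
  requestCount++;
  if (requestCount === 1) {
    throw new Error("timeout");
  }
  return ["item-a"];
};
const failingRequest = (): string[] => {
  throw new Error("backend down");
};
const firstLoad = loadWithFallback("feed", flakyRequest, localCache, ["default-item"]);
const secondLoad = loadWithFallback("feed", failingRequest, localCache, ["default-item"]);
const neverCached = loadWithFallback("other", failingRequest, localCache, ["default-item"]);
```

The `source` field is returned alongside the data so that monitoring can count how often a module is running on cached or hard-coded data.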

Synchronous rendering needs only the page template and the synchronous data. If either has an error, the origin site reports an error, and what the origin returns is an error page with a 5XX status code. This error may not be caused by the developer; it may come from a synchronization failure or a broken system link. To solve this problem, I made a mirror page for the Taobao home page:

Once the origin is abnormal, Nginx falls back to the home page mirror in the same machine room as the Cache CDN. The mirror content is a backup of the Taobao home page's HTML source.
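The mirror fallback is implemented at the Nginx layer; the following TypeScript sketch only models the decision it makes, under the assumption that any 5XX status or a failed connection to the origin triggers the mirror. The type and function names are invented for this example.

```typescript
interface OriginResult {
  status: number;
  body: string;
}

interface Served {
  status: number;
  body: string;
  servedBy: "origin" | "mirror";
}

// Serve the origin's page unless it errors, in which case serve the HTML backup.
function serveWithMirror(renderOrigin: () => OriginResult, mirrorHtml: string): Served {
  try {
    const result = renderOrigin();
    if (result.status < 500) {
      return { status: result.status, body: result.body, servedBy: "origin" };
    }
  } catch (error) {
    // A thrown error (for example a broken link to the origin) is treated like a 5XX.
  }
  return { status: 200, body: mirrorHtml, servedBy: "mirror" };
}

const mirrorBackup = "<html><!-- backup copy of the home page --></html>";
const healthy = serveWithMirror(() => ({ status: 200, body: "<html>fresh</html>" }), mirrorBackup);
const broken = serveWithMirror(() => ({ status: 502, body: "Bad Gateway" }), mirrorBackup);
```

The trade-off is that the mirror may be slightly stale, which is acceptable here because the alternative is an error page.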

2. Monitoring and early warning

Monitoring also has two layers:

  • Module-level monitoring: tracking points on interface requests, module skylight (blank area) detection, and so on;

  • Page-level monitoring: add a special marker to the page and periodically poll all CDN nodes to check whether the marker is present.
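The page-level check can be sketched as follows. Fetching the page from each CDN node is left out; the function only takes the fetched HTML per node and reports which nodes lack the marker. The marker string and node names are made up for this example.

```typescript
interface NodeSnapshot {
  node: string; // identifier of the CDN node the page was fetched from
  html: string; // the HTML that node returned
}

// Return the nodes whose page does not contain the expected marker.
function nodesMissingMarker(snapshots: NodeSnapshot[], marker: string): string[] {
  return snapshots
    .filter((snapshot) => snapshot.html.indexOf(marker) === -1)
    .map((snapshot) => snapshot.node);
}

// Hypothetical marker placed at the very end of the page, so truncated pages are caught too.
const pageMarker = "<!-- homepage-complete -->";
const unhealthyNodes = nodesMissingMarker(
  [
    { node: "node-a", html: "<html>...</html>" + pageMarker },
    { node: "node-b", html: "<html>502 Bad Gateway</html>" },
    { node: "node-c", html: "<html>..." },
  ],
  pageMarker
);
```

Placing the marker at the end of the document is a design choice of this sketch: it lets the same check detect error pages and pages that were cut off part-way.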

At the module level there is quite a lot to monitor. The finer the monitoring points, the faster problems can be located. For example, on a moderately complex module I place these tracking points:

  • Interface response format error, request failure, and request timeout: at least three tracking points;

  • A tracking point for failure to load the hard-coded fallback data;

  • A tracking point for a module that has not finished rendering within 5s;

  • Tracking points for module links and images that match the blacklist.

Some monitors also handle clear-cut errors automatically; for example, an HTTP image appearing on an HTTPS page is dealt with immediately.
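Part of the tracking-point list above can be sketched as a classifier that turns one module run into the points to report. The 5s render limit comes from the list; the 3s request timeout, the `ModuleRun` shape, the point names, and the assumption that a valid payload is an array are all invented for this example.

```typescript
type TrackPoint = "request_timeout" | "request_failed" | "format_error" | "render_timeout";

interface ModuleRun {
  requestOk: boolean;      // whether the interface answered successfully
  requestMs: number;       // how long the request took
  payload: unknown;        // the parsed response body
  renderMs: number | null; // null if the module never finished rendering
}

const REQUEST_TIMEOUT_MS = 3000; // assumed value
const RENDER_LIMIT_MS = 5000;    // the 5s limit from the list above

function trackPointsFor(run: ModuleRun): TrackPoint[] {
  const points: TrackPoint[] = [];
  // The three interface problems are mutually exclusive, checked from most to least severe.
  if (run.requestMs > REQUEST_TIMEOUT_MS) {
    points.push("request_timeout");
  } else if (!run.requestOk) {
    points.push("request_failed");
  } else if (!Array.isArray(run.payload)) {
    points.push("format_error");
  }
  // Rendering is judged independently of the request outcome.
  if (run.renderMs === null || run.renderMs > RENDER_LIMIT_MS) {
    points.push("render_timeout");
  }
  return points;
}

const cleanRun = trackPointsFor({ requestOk: true, requestMs: 120, payload: [], renderMs: 300 });
const badFormat = trackPointsFor({ requestOk: true, requestMs: 120, payload: { error: 1 }, renderMs: 300 });
const stuckRun = trackPointsFor({ requestOk: false, requestMs: 4000, payload: null, renderMs: null });
```

Keeping the classification separate from the reporting call makes it easy to add finer points later without touching the module code.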

3. Automated checks before going live

This is part of Taobao's overall engineering environment: front-end automated testing. These checks are typically run before going live:

  • Check whether the HTML conforms to the specification

  • Check the HTTPS upgrade

  • Check link validity

  • Check the validity of static resources

  • Detect JavaScript errors

  • Check whether a pop-up box is displayed when the page is loaded

  • Check whether the page calls console.*

  • Record the page's JS memory usage

Of course, you can also add your own test cases, such as checking interface data formats and module skylight problems. The automated checks can also be scheduled as regular regressions, which is safer.

V. Agility measures for the Taobao home page

1. Health check

There are many page modules. To track every small change on the page, I collect detailed statistics at every step of requesting and rendering, as shown in the figure below:

Whenever an interface request fails, an interface falls back to the disaster-recovery logic, or a module takes more than 5s to render, a yellow warning appears in the console, and the alert statistics are of course also sent to the server.

2. Interface Hub

The interface Hub is a management tool for data requests, as shown in the following figure:

Rendering many of the page's modules requires more than one data source. When operations staff report that a page is rendering abnormal data, the data can be looked up directly through the Hub, which speeds up bug location. The Hub can also switch environments, pointing an interface request at the daily or pre-release environment; it is a powerful debugging tool.

3. Fast track

I placed a fast-track channel before and after the page's script execution. For urgent online problems, such as broken or overflowing styles or an interface error causing a skylight, you can modify the page's CSS and JS directly through the fast track and have the change live within two minutes.

However, this channel is only suitable for emergency fixes; after all, inserting JS code arbitrarily is very risky.

VI. Summary

This write-up ends a little abruptly (there was too much to code and draw), and many points were not expanded. I hope it gives you a basic understanding of the Taobao home page.
