Double 11 of 2019 has arrived: 10 billion yuan in 1 minute 36 seconds, over 30 billion in 5 minutes 25 seconds, over 50 billion in 12 minutes 49 seconds… Without Singles’ Day, it would have taken China’s Internet technology another 20 years to get where it is today.

In the 11 years since Singles’ Day was born, one scene has recurred in Alipay’s technical team: every year, when the target was set, everyone would be incredulous, exclaiming or groaning, “Impossible! It’s too much!” Yet every year those exaggerated goals have miraculously come true.

Fruit that one year could only be reached with a desperate leap becomes ordinary, everyday fare the next. Almost without anyone noticing, Double 11 has grown from the little boat that first set sail into a great ship carrying the happiness and dreams of hundreds of millions of people. Behind this remarkable “miracle project”, technology and the people who build it have been running tirelessly at an incredible “Chinese speed” for more than a decade.

What drives engineers is often extremely simple: once they decide to do something, they go all out and push ahead. Only after advancing step by step do they suddenly look back and find that the miracle is already there.

Only seconds from an outage

For Alipay engineer Chen Liang, November 11, 2009 was no different from any other day.

Back then there was no Alipay Building, let alone Z Space. As he sat in the office at Huaxing Times Square during the morning rush, an email from CTO Cheng Li arrived on his computer: Taobao Mall is holding a promotion today and transaction volume is expected to be large, so everyone should keep a close eye on the systems.

The main responsibility of Chen Liang’s team was to ensure the stability and reliability of the whole system. During a promotion, that means, in layman’s terms, keeping the servers “firm” so they are not overwhelmed by a rush of users.

Taobao Mall had only just relaunched in August 2009, and at Alipay’s average daily transaction volume at the time, handling it steadily was not a problem. Even if the promotion brought a temporary flood peak, there was nothing to fear: they could simply add capacity.

And that is exactly how it went. The team gathered in the office, staring at their screens and adding capacity as soon as transaction volume approached the system’s limit.

The rise in transactions was a little unusual. In the quiet office, filled only with the sound of keyboards, someone shouted: “I got one!” Then someone else: “I grabbed one in the flash sale too!” The office began to buzz.

It turned out that some people, curious about what Taobao Mall was doing to push transaction volume this high, had found a “flash sale” with discounts of more than 50% and could not resist trying it.

“I can’t remember exactly what happened back then, but I remember everyone was very happy,” the Cat said.

Happiness was Chen Liang’s first and most vivid impression of that promotion.

However, apart from the colleagues busy adding capacity all day, most people at Alipay were unaware of the promotion at the time. “I only found out afterwards that there had been a promotion the day before, and a colleague said the traffic had been a bit heavy,” says Mr. Li, now a researcher at Ant Financial. He recalls that at a review meeting the next day, the head of operations and maintenance nervously protested: “What is Taobao Mall up to? Payment volume went up so much. If we don’t prepare enough in advance, it’s dangerous.”

What had Taobao Mall done? Looking back today, nothing very big: a one-day promotion with 27 brands and a GMV of 50 million yuan.

At the time, no one could predict how the promotion would grow, but Alipay sensed the coming storm in the data: the transaction peak of this event was more than 5 times that of a normal day. Although it went smoothly this time, it came close to Alipay’s carrying limit at the time.

Just after mid-2010, Alipay checked with Taobao Mall: would last year’s promotion happen again this year? Taobao Mall said it would.

Never fight a battle unprepared: how to prepare for “Double 11” went onto the agenda of Alipay’s weekly stability meeting. The first task was to prepare enough capacity. But how much? Nobody had any experience.

“Honestly, we just pulled a number out of our heads, multiplied it by three, and bought machines accordingly,” Mr. Li said bluntly.

To check whether that would hold, he and his team tested how much traffic a single machine could handle, manually changing the configuration to direct traffic from several machines onto one. “Looking back, that was the earliest prototype of load testing.”
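The sizing method described above, measuring what one machine can carry and then padding a guessed peak by a factor of three, boils down to simple arithmetic. Here is a minimal Python sketch of it; the `machines_needed` helper and the numbers in it are hypothetical illustrations, not Alipay’s actual planning tool.

```python
import math


def machines_needed(expected_peak_tps: float,
                    per_machine_tps: float,
                    safety_factor: float = 3.0) -> int:
    """Estimate how many machines to provision for a promotion peak.

    Hypothetical sketch: `per_machine_tps` is what a single-machine test
    (traffic from several machines redirected onto one) showed one box can
    handle; `expected_peak_tps` is the guessed peak; `safety_factor` is the
    "multiply by three" headroom described in the text.
    """
    if per_machine_tps <= 0:
        raise ValueError("per_machine_tps must be positive")
    if expected_peak_tps < 0 or safety_factor < 1:
        raise ValueError("peak must be non-negative and safety_factor >= 1")
    # Round up: a fraction of a machine still means buying a whole one.
    return math.ceil(expected_peak_tps * safety_factor / per_machine_tps)


if __name__ == "__main__":
    # Made-up numbers: a guessed peak of 2,000 TPS, one machine handles 500 TPS.
    print(machines_needed(2000, 500))  # 12
```

The weak point is the guessed peak itself: the factor of three only helps if the guess is in the right order of magnitude, which is exactly what went wrong in 2010.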

They even set up a backup contact group. DingTalk did not exist yet, and all the work groups were on Wangwang. “What if something went wrong with the Wangwang servers and we couldn’t reach each other in time?”

Preparation did not take long, but everything had been thought through. “Still, no matter how much you prepare, something unexpected happens every year,” said Zhao Zunkui, an engineer in the Financial Core Technology Department. He was on the accounting team, where every operation involves money and there is no room for error.

The unexpected really came.

In the early hours of the 11th, shortly after the promotion began, Alipay’s accounts database ran out of capacity.

Trouble came on fast, like a landslide. By the time the problem was discovered, the situation was critical: “Only a few minutes left!” If operations could not find a solution immediately, Alipay would risk going down, the transaction chain would break, and nobody would be able to buy anything.

What to do? Operations made a hard call: take the accounting system offline to make room for the core accounting system.

With a group of executives standing behind him, Jiang Tao, an engineer on Alipay’s middleware team, felt more nervous than ever before. “My hands were shaking while I was doing it,” he said.

The prompt decision pulled Alipay back from a collapse that was only seconds away. According to the figures afterwards, 21 million users took part in Double 11 in 2010 and total GMV reached 1 billion yuan, 20 times that of the previous year. No one could have predicted such a jump in advance.

“You expected it to go up, but you never thought it would go this far. That was when we began to feel that something bigger was coming,” Zhao Zunkui said.

The power of code

Xiao Han, born in 1985, and Zheng Yangfei, born in 1990, both learned about Double 11 in college.

Xiao Han loved online shopping. In 2009 she was among the first to try Double 11, and in a technical discussion group she met Alipay engineers who had worked on it. Zheng Yangfei regularly bought Computer News, which reported that Singles’ Day sales in 2010 equaled Hong Kong’s total retail sales for a day.

“That seemed really awesome. I wanted to get in and see for myself.”

Two young people who did not know each other at the time came up with the same idea.

Xiao Han joined Alipay in 2011. By then, Alipay had adopted a rhythm of “build in the first half of the year, run promotions in the second half”, with preparation starting in May and June.

The system she worked on acts as the first entry point to Alipay’s transaction chain, “like a restaurant cart. In an ordinary restaurant, a waiter can only carry one dish at a time, but the challenge of Double 11 is that one waiter has to serve ten dishes at once, so we need a cart. But no ready-made cart could meet Alipay’s needs, so we had to build our own.”

For almost a year, Xiao Han and her team worked hard on the project, and Spanner finally faced its first big exam on November 11, 2012.

Who would have thought it: another incident struck.

By that year, Alipay’s big-promotion monitoring system was also online, showing the traffic curve in real time at one-second granularity. As midnight approached, everyone stared eagerly at the screen.

At midnight the traffic poured in and the curve began to climb. It was a beautiful curve, and everybody started cheering, but all of a sudden it dropped and began jittering like an electrocardiogram.

The monitoring system showed no problems and no errors, so why wasn’t the traffic coming in?

One stone stirred up a thousand waves. The jitter was also displayed in real time in Taobao’s operations room, where Alipay engineer He Yan, Alipay’s only “representative”, was preparing for battle alongside Taobao’s engineers. It was a job that severely tested one’s nerves. The moment the payment curve jittered, “the Taobao engineers suddenly surrounded me and kept asking, ‘What’s happening to Alipay?’” He Yan recalled.

Xiao Han’s mind went blank. Her only thought was: “We can’t let transactions stop.”

From 0:00 to 0:20, ten minutes went to locating the problem and ten to fixing it. In those same short 20 minutes, the outside world was in uproar: “‘Alipay can’t pay’ was trending on Weibo. Family, relatives and friends all called to ask what was going on. My phone was blowing up.”

After a health-check monitoring module was shut down, stability was restored. More than nervous, Xiao Han felt shocked as never before: what she did was closely tied to countless people, and every tiny oversight could affect an immeasurably large group.

“It’s hard to realize the weight of every line of code you write until you’ve been through it,” Zheng Yangfei said. He joined Alipay as an intern in 2013, and a remark from his senior colleague Gong Jie left a deep impression on him: “Look at the customer service staff. If you write your code carefully and make even one fewer mistake, who knows how many fewer complaint calls they will get.”

Architectural revolution

After getting past 2012, the DBAs warned repeatedly that scaling up had reached its limit and could only last a few more months. At that rate, without other measures, the system would certainly not last until the next Double 11.

Misfortunes never come singly. Other “curses” followed one after another: the maximum number of Oracle database connections became the bottleneck for scaling. Worse, because the data centers had been expanded again and again, Hangzhou’s power supply could no longer keep up. Sometimes, to guarantee power for the machine rooms, “the office power was cut in summer, and we had to bring in ice to cool down,” Gong Jie said with a wry smile. Anyone who has lived through a Hangzhou summer knows what that means.

A new solution had to attack the root cause: in other words, a “revolution” that would unify the architecture.

A revolution is not a dinner party; this one meant a fundamental restructuring of the architecture. It was hard for three reasons. First, there was no successful precedent to learn from, so they had to feel their way. Second, it involved many departments, whose differing needs inevitably produced conflicting opinions. Third, a revolution demands a long view: not just solving this year’s or next year’s problems, but planning for at least the next three years.

Meanwhile, after talking to Taobao Mall, now known as Tmall, Alipay unsurprisingly set another jaw-dropping “impossible” target: a peak of 20,000 payments per second.

The stakes were high and everyone was cautious. “The restructuring plan was discussed for a long time,” said Chen Liang, the project’s architect, who spent a great deal of time persuading everyone to agree on it.

Half of the burden fell on him; the other half fell on Jiang Tao, who had defused the crisis in 2010. Jiang Tao worried more about stability: “Making structural changes to the architecture while keeping the business running is very complicated and technically risky.”

Time was running out. The LDC architecture project launched at the end of 2012, less than a year before Double 11 of 2013. For a project this huge, everything came down to one word: hurry.

Chen Liang originally envisioned a grand design that would unify all systems in one go, but Cheng Li rejected it: “The main problem is Taobao transactions, so Taobao first.” As he saw it, even with that part tackled first, it still had to be online in time for 2013.

A whole set of impossible targets had piled up. But once the goal was set, the only direction left was forward.

“After the project was approved, we released almost every month,” Jiang Tao said, several times more often than other projects. Even so, the whole system was not fully deployed until half a month before Double 11, and there were still many small bugs. But as more and more small problems were found and fixed, he finally felt “confident in my heart”.

In 2013, Alipay’s LDC architecture made its Singles’ Day debut, and for the first time Alipay sent a “representative” to “Bright Top”, the Singles’ Day command center in Alibaba’s Xixi Park.

The lucky representative was Li Junkui. “I was there as cannon fodder,” he laughed, recalling the intense pressure he felt the moment he walked into Bright Top. In front of hundreds of engineers from across the group, commander Li Jin pointed at the big screen and called out to him: “Xiang Xiu (Li Junkui’s nickname)! Look at Alipay!”

Li Junkui did this stressful job for several years, long enough to distill some lessons from it. “The first thing is: don’t panic. Whatever feedback you get, say, ‘Got it.’ You can’t actually fix anything on the spot by yourself. Your real job is to relay it, as quickly as possible, to your teammates back at base, and then trust them.”

He says this is one of the key secrets behind the Alipay technical team’s victories: “You’re never alone, and you can’t fight alone, but you always have the most reliable partners behind you.”

As for that year’s results, in Jiang Tao’s words, “we carried it through.” The new architecture had taken its first step forward.

Guan Gong, Lingyin Temple and load testing

Another special feature of Double 11 in 2013 was a picture of Guan Gong hanging in Alipay’s preparation room.

The painting was “invited” in by Zheng Yangfei, but “worshipping Guan Gong”, as a tradition of Alipay’s technical team, goes back long before he joined. It reportedly dates to Alipay’s early days: whenever an important system update went out, engineers would post Guan Gong stickers in their Wangwang groups, praying for a smooth update with “no bugs”.

A year later, Guan Gong got an “upgrade”: a colleague on a campus recruiting trip to Xi’an saw Guan Gong shadow puppets and “invited” one back to the war room. Later, Cheng Li bought a wooden statue of Guan Gong, and last year deputy CTO Hu Xi bought a bronze one.

Besides worshipping Guan Gong, burning incense at temples is also routine. Depending on the destination, there are two major schools: the Lingyin Temple school and the Faxi Temple school. Opinions differ on which one works, but every year after Double 11, Cheng Li and Hu Xi personally lead the team up the mountain to fulfill their vows, walking from the Alipay Building all the way to Faxi Temple in Tianzhu and picking up litter along the way back as a good deed.

Technology is pure science. Do engineers really believe that praying to the gods will prevent system failures and bugs?

“Psychologically, I think it helps,” Chen Liang said. “It mainly expresses awe for the unpredictable. We’ve been doing technology for years, but the road of technology is full of unpredictable things.”

That unpredictability is the biggest source of anxiety for engineers facing Double 11 every year.

Each has their own way of easing the pressure as Double 11 looms. The athletic ones clear their heads by running or playing basketball; the obsessive-compulsive ones check their code over and over to feel at ease; and the foodies have to go to Haidilao together before the battle.

Zhao Zunkui and Chen Liang have both taken part in every one of the past 11 years. Asked which year was the worst, Chen Liang said: “Before 2014, if we had to put a number on it, our confidence going into Double 11 midnight was 60 percent.”

But he quickly added: “After 2014, it was 95 percent.”

Chen Liang’s confidence came from the load testing system Alipay built around that time. It no longer meant manually adjusting configurations to test a single machine; instead, it created a simulated environment for the system to run in, so problems could be found and fixed in advance rather than catching the team by surprise on the real battlefield.

“Load testing turned keeping Double 11 stable from an uncertain thing into a certain one,” said Zheng Yangfei, known as “the little prince of load testing”.

Although the 2014 load test covered only the core systems, it helped enormously. In the month leading up to Singles’ Day, it exposed at least a hundred critical problems ahead of time. “If even one of them hadn’t been fixed, our 2014 Double 11 would definitely have failed,” the Cat said.

1%? Or 10%?

This load test not only flushed out many hidden dangers but also exposed a big problem: the Oracle database Alipay used “shook” under load, and its performance visibly hit the ceiling.

2014 was the year the mobile Internet exploded. Exponential growth in mobile payments was bound to bring bigger traffic spikes than in previous years, and Oracle clearly could not support them.

Buy more servers? The cost was unbearable, and machines added to handle the peak would sit idle on normal days, a total waste of resources.

Was there another way? There was. OceanBase, a distributed database developed in-house by Alibaba, had lain dormant for two years after being transferred from Taobao to Alipay, and was eager for a stage to show what it could do.

But when the business teams heard it was a home-grown database, they were skeptical. For a database directly tied to transactions and money, a single wrong record could have unimaginable consequences. Never mind traffic on the scale of Double 11; even on an ordinary day, using an unproven product would call for careful thought.

First try shifting 1% of traffic to OceanBase: that was the compromise reached after much debate. But Oracle’s showing in the load tests revealed a shortfall not of 1% but of 10%.

OceanBase said, let’s take the 10%.

Ten percent doesn’t sound like much, but 10 percent of Singles’ Day traffic equals a normal weekday’s peak. If OceanBase could take that 10% without incident, it could take on Alipay’s day-to-day operations.
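Moving a fixed percentage of traffic to a new database, with the option to widen the share or cut it back, is often done with a deterministic routing rule. The article does not say how Alipay split its traffic, so the Python sketch below is only an assumption: a hypothetical `route_to_new_db` function hashes a user ID into 100 buckets, so a given user always lands on the same side and the `percent` knob can move from 1 to 10, or back to 0.

```python
import hashlib


def route_to_new_db(user_id: str, percent: int) -> bool:
    """Return True if this user's requests should go to the new database.

    Hypothetical sketch: the user ID is hashed into one of 100 buckets.
    Buckets below `percent` go to the new store; the rest stay on the old
    one. Because the hash is stable, raising `percent` from 1 to 10 only
    adds users to the new side, and setting it to 0 sends everyone back.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # SHA-256 rather than Python's built-in hash(), which is salted per process.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < percent


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(100_000)]
    share = sum(route_to_new_db(u, 10) for u in users) / len(users)
    print(f"share routed to the new database at 10%: {share:.3f}")
```

Hashing on a stable key rather than picking requests at random keeps each user’s reads and writes on one database, which makes the split easier to verify and to roll back.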

OceanBase had to prove it could. “We went to the Taobao team and coordinated a lot of resources to run a test, mainly to verify that Taobao order amounts and Alipay transaction amounts were consistent.” The plan at the time was cautious: if OceanBase had a problem, traffic could always be switched back, says Shi Wenhui, an engineer on the DBA team.
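The consistency check described here, confirming that Taobao order amounts and Alipay transaction amounts agree, is a form of reconciliation. A minimal Python sketch, assuming (hypothetically) that both sides can be exported as mappings from order ID to amount in integer cents, which avoids floating-point rounding on money:

```python
from typing import Dict, List


def reconcile(orders: Dict[str, int], payments: Dict[str, int]) -> Dict[str, List[str]]:
    """Compare order amounts with payment amounts, keyed by order ID.

    Hypothetical sketch: amounts are integer cents. Returns the order IDs
    that are missing on either side or whose amounts disagree.
    """
    return {
        # Orders with no matching payment record.
        "missing_payment": sorted(orders.keys() - payments.keys()),
        # Payments with no matching order record.
        "missing_order": sorted(payments.keys() - orders.keys()),
        # Present on both sides, but with different amounts.
        "amount_mismatch": sorted(
            oid for oid in orders.keys() & payments.keys()
            if orders[oid] != payments[oid]
        ),
    }


if __name__ == "__main__":
    orders = {"A": 1999, "B": 500, "C": 120}
    payments = {"A": 1999, "B": 50, "D": 300}
    print(reconcile(orders, payments))
    # {'missing_payment': ['C'], 'missing_order': ['D'], 'amount_mismatch': ['B']}
```

Under this sketch, a clean run means all three lists come back empty.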

OceanBase did not get a single record wrong. Cheng Li made the call on the spot: the 10% would go to OceanBase. That decision gave OceanBase its Singles’ Day debut, “thanks to Oracle and Lu Su (Cheng Li’s nickname) helping us,” Shi Wenhui said with a smile.

It was less than two weeks before Double 11 of 2014. Its reliability had passed the test, but OceanBase was a young database, only four years old, with plenty of rough edges, such as response times of 10 milliseconds, far slower than Oracle. In the final ten days, Shi Wenhui and the whole team managed to optimize that to under 1 millisecond.

“After doing this for so many years, I have confidence in its capacity and performance,” said Shi Wenhui.

He says it lightly, but it is hard to convey how much work he and his entire team poured into a product that had once faced cancellation and the breakup of its team.

OceanBase was not originally created for Double 11, but it was on the Double 11 stage that it first stepped into the spotlight, and it performed well. From then on, Alipay began moving its core transaction systems entirely onto OceanBase.

This year, OceanBase carries 100 percent of Ant’s internal traffic. In the TPC-C benchmark, known as the “World Cup of the database field”, OceanBase broke the world record that the American company Oracle had held for nine years, becoming the first Chinese database product to top the list.

I won an Apple Watch

In 2015, Li Junkui visited the Shanghai Stock Exchange, whose trading system was deployed on six mainframe computers and could peak at 100,000 transactions per second.

He was amazed: 100,000! What an unattainable number! When would Alipay ever manage that? Back in Hangzhou, he immediately shared what he had seen with his colleagues, who told him that this year’s target would exceed 100,000.

Li Junkui thought: a target that sounds impossible is simply Alipay’s style, and there is nothing wrong with that.

Zheng Yangfei, meanwhile, was struggling with the target. He had just made a bet with his supervisor: as the person in charge of the 2015 Double 11 full-link load test, could he guarantee that Double 11 payments would go through without any problems? The stake was an Apple Watch.

That year was the first time Zheng, born in the 1990s, took the lead, going from a Double 11 participant to the leader of a project. But it was also a rough year for him and his team. In the first half, frequent availability problems kept hitting the stability team and sapping morale; many colleagues chose to leave, and doubts came from both inside and outside. For those few months, the word “hard” seemed to hang in the air.

“At that time there were few people on the team, but everyone was holding their breath, determined to get Double 11 right. I just didn’t want people to think Alipay wasn’t good enough,” Zheng said.

Lose, and you have to wait another year to make up for it. Because Double 11 comes only once a year, it is not just an annual exam but also an annual stage. As Yang Haiti, a senior technical expert in the Systems Department, put it: “Everyone wants to bring a year’s worth of work to Double 11 to prove and show it, but they are also on edge.”

Compared with the 2014 full-link load test, 2015 brought drastic improvements in several respects: first, coverage expanded from the core systems to all systems; second, it was turned into a platform, that is, a platform tool for full-link load testing was built; third, it was linked with the load testing of the whole Alibaba group.

“To be honest, it was very unsettling,” Zheng admitted.

By the time Double 11 hit its midnight peak, he had forgotten all about the Apple Watch. A scheduled database task that the load test had not verified made the curve look less smooth. Under the principle of “stability above all else”, the whole team’s hearts trembled whenever the system wobbled. A quick check showed that the system as a whole had not gone wrong, but the load test had missed some details, leaving the result less than perfect.

“The curve wasn’t particularly pretty,” he said regretfully.

Zheng won the Apple Watch, but for him, it was more than just a prize. It was a reminder that even the best preparation is not foolproof.

The quest for “silky smoothness” never ends

In fact, every Alipay engineer carries a “perfect curve” in their heart.

Ideally, it goes like this: at Double 11 midnight the flood peak arrives and the curve climbs beautifully, with no sudden rises or drops and no frequent jitter to torture everyone’s fragile nerves.

Boiled down to one word: silky.

But with every technical evolution and architectural change made toward that end, “you find that at a certain scale, something that starts out as a great idea runs into challenges that keep piling up. It is no myth that quantitative change produces qualitative change,” Yang Haiti lamented.

The “volume” of Double 11 had long since entered uncharted territory. In 2016, it took just over 6 hours for transaction volume to surpass the whole day of 2014. Year after year, they kept breaking their own records.

At such volumes, the difficulty of keeping things stable rises by more than an order of magnitude.

Remember the “three-year plan” for the architecture revolution laid out at the end of 2012? It really did take three years. The revolution was originally designed to solve the limits on database connections and data center capacity. Over those three years, many other architectural capabilities, such as geo-distributed multi-active deployment, disaster recovery and elastic capacity scheduling, were built up, but they were not fully in place until 2016. Each step of this evolution aimed to give the system the ability to scale out and back in dynamically and smoothly.
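The article does not explain how the LDC architecture achieves this. Unit-based designs of this kind are generally described as partitioning users by user ID into shards and assigning each shard to one self-contained unit, so that moving shards between units is what enables multi-active deployment, disaster recovery and elastic scheduling. The Python sketch below illustrates only that general idea under stated assumptions: the 100 shards keyed on the last two digits of the user ID, the unit names and the `UnitRouter` class are all hypothetical, not Alipay’s implementation.

```python
from typing import Dict, List


class UnitRouter:
    """Route users to logical units by user-ID shard (hypothetical sketch).

    Each user ID maps to one of 100 shards (its last two decimal digits),
    and each shard is owned by exactly one unit. Reassigning shards between
    units is how this sketch models failover and capacity scheduling.
    """

    NUM_SHARDS = 100

    def __init__(self, units: List[str]) -> None:
        if not units:
            raise ValueError("need at least one unit")
        # Spread shards round-robin across the initial units.
        self.shard_owner: Dict[int, str] = {
            shard: units[shard % len(units)] for shard in range(self.NUM_SHARDS)
        }

    @staticmethod
    def shard_of(user_id: int) -> int:
        # The last two decimal digits of the user ID pick the shard.
        return user_id % UnitRouter.NUM_SHARDS

    def unit_for(self, user_id: int) -> str:
        return self.shard_owner[self.shard_of(user_id)]

    def fail_over(self, failed_unit: str, standby_unit: str) -> None:
        # Hand every shard owned by the failed unit to the standby unit.
        for shard, owner in self.shard_owner.items():
            if owner == failed_unit:
                self.shard_owner[shard] = standby_unit


if __name__ == "__main__":
    router = UnitRouter(["unit-a", "unit-b", "unit-c"])  # hypothetical unit names
    uid = 2088000000000017  # made-up user ID; last two digits -> shard 17
    print(router.unit_for(uid))  # unit-c (shard 17, and 17 % 3 == 2)
    router.fail_over("unit-c", "unit-a")
    print(router.unit_for(uid))  # unit-a
```

In this sketch, failover and rebalancing are just updates to the shard table; a real system would also have to replicate or migrate each shard’s data, which the sketch leaves out.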

“Big” is a test, and so is “small”.

For Shi Wenhui, the most memorable moment was not OceanBase’s rise to fame in 2014 but a small test in 2016. In that test, he found a slight anomaly in one metric: a very inconspicuous deviation of two milliseconds.

“Two milliseconds. Anywhere else, it would probably have been judged insignificant and let through.” But one of his teammates took the problem seriously enough to investigate, and luckily so. It turned out that if the problem had not been solved, that year’s Double 11 would have been in big trouble.

“Even though resources were scarce, time was tight and the software had all sorts of imperfections, our people never let a problem slip,” Shi Wenhui said with feeling.

In the early years, traffic was the main challenge to achieving a perfect curve, but as the business expanded, engineers came to see that while stability was paramount, technology was even more about the future.

2017 was He Yan’s ninth year at Alipay. That year, Alipay achieved co-location of offline and online workloads: large amounts of idle resources from offline tasks could be used for online tasks, greatly improving resource utilization.

“In some small scenarios the savings may not be dramatic, but at a volume like ours, the long-term payoff is immeasurable,” He Yan said.

On the foundation laid by predecessors over so many years, the road ahead is getting smoother and smoother.

At Double 11 in 2018, Alipay in fact supported two big promotions at once: Tmall’s, and Alipay’s own gameplay such as “Code on Double 11”. That year, the number of failures fell by 70-80% compared with the previous year, and the whole day ran stably for the first time.

Battalion leader Li Zheng was very calm: “To put it bluntly, we can control all kinds of risks better through systematic, engineering processes. The spikes were all within our expectations.”

2019 marks the first year of “cloud native” for Singles’ Day.

If technology is like stacking building blocks layer by layer, the cloud is the foundation at the very bottom. Once that foundation is laid, the applications above it stand on the shoulders of giants and are born with a range of powerful capabilities. The business no longer needs to worry so much about the technology and can focus on business code.

Light up the world

At this point, you may still remember the colleagues who cried “impossible” at the peak target of 20,000 payments per second.

Back then, 20,000 payments per second was a peak they could reach only after half a year of all-out struggle. By last year, 20,000 payments per second had become an everyday matter for Alipay, reached casually in a second.

It was a real change, but at the time no one thought much of it. Almost every engineer says: “Every year after Double 11, next year’s target comes out, and then we prepare and work toward it.”

Goal after goal, in conquering one “impossible” after another, they left behind heights once thought unreachable. The once-untouchable 100,000 transactions per second is now a breeze.

Only when looking back do they realize, one day, how far they have walked and how high they have climbed.

And the young people who cried “impossible” yet fought desperately to make the impossible real have grown up. They are calmer now. No one knows where the ceiling is; more likely, there is no ceiling.

Growing traffic is not the only thing Double 11 technology has to guarantee. More complex businesses and gameplay grow out of the technology, which in turn fuels the technology. Beyond Double 11, it also powers many other scenarios: Chinese New Year red envelopes, collecting the Five Blessings cards…

Or even beyond Alipay and Alibaba.

The miracles created by Alipay’s engineers are becoming products that serve more financial institutions. So far, dozens of banks and financial institutions use OceanBase, and technologies such as the load testing platform and cloud native have also been commercialized. With the technology and experience accumulated on Double 11, Alipay is helping China’s whole Internet financial technology sector take off together.

Or even the world.

Singles’ Day has long been not just a Chinese event, but a global carnival. With Double 11, technology is also going global.

“As for our ideal: one year in the future, the whole preparation room will be empty except for the statue of Guan Gong. No one will need to stay behind, because the intelligent system can solve every problem, and we’ll just sit with a cup of tea or a glass of wine, watching the silky-smooth curve.”

Gong Jie looks forward to this future, and the picture makes colleagues laugh: “No way!?” Every year, Double 11 is like fighting a war, and sending reinforcements is as urgent as putting out a fire.

But who says it’s impossible? After all, these are people who have already made so much of the impossible possible.

Financial-Grade Distributed Architecture (Antfin_SOFA)