This article is by Ren Fake and was featured on GitChat; follow “Read the post” there to see the questions readers discussed with the author.

“The universe says no to us, but we respond with flesh and blood, saying ‘Yes!'”

Ray Bradbury

Founded in 1994, Amazon has always been a controversial company. For years, Jeff Bezos, guided by the philosophy of Customer Obsession, has focused the entire e-commerce business on the customer experience through price, selection and availability, and has continued to reinvest revenue in keeping prices low, expanding scale, and optimizing infrastructure such as warehousing and logistics for long-term advantage.

Amazon’s sustained, deliberate lack of profit has not only strained its delicate relationship with Wall Street investors (it has never paid shareholders a dividend); its model has also repeatedly been called a Ponzi scheme.

At the same time, Bezos’s skepticism of social cohesion and his unconventional management style have strongly shaped Amazon’s conflict-driven “gladiator” work culture, which is crystallized in 14 leadership principles that guide day-to-day decisions across the company, from top to bottom.

While there has been criticism of the culture, many employees who have left Amazon have expressed love for the way they work.

Technology R&D under this culture also shows unusual characteristics: Amazon has long been at the forefront of the industry in logistics management, recommendation engines, Web 2.0, cloud computing, artificial intelligence, data management, platform design, DevOps and other fields, and it has no shortage of popular products.

Seen from outside, however, Amazon’s technologists have little voice in the open source community or in public talks. In contrast to this external silence, R&D inside Amazon is another story: one idea after another is proposed and tried, teams compete and cooperate efficiently toward common goals, and knowledge and experience are constantly accumulated and shared.

From joining Amazon in 2009 to leaving at the end of 2014, I was involved in the technical R&D and R&D management of multiple teams.

After I left Amazon to join startups, problems that the R&D teams ran into often brought me back to the same questions: “What makes Amazon R&D so different?” and “How can we learn from Amazon’s experience to build an effective R&D team?”

This article collects my thoughts on Amazon’s R&D and management from recent years, combined with some experience in putting them into practice.

Because Amazon’s R&D and management system is complex, and most of my conclusions are based on inference from practice, there are bound to be inaccuracies and mistakes; I hope readers will point them out.

Finally, Amazon’s gladiator culture, which encourages competition directly, is highly controversial and does not prioritize employee feelings.

But I also felt the satisfaction of constantly innovating and surpassing myself under pressure.

Therefore, the discussion of culture here is not about right and wrong; readers need to judge for themselves what to adopt.

Amazon Flywheel

To be the world’s most customer-focused company, where people can find and discover everything they want to buy online.

— Amazon.com’s mission statement

Bezos explained Amazon’s operating logic with a diagram (see Figure 1), which gradually evolved into the Flywheel, the core of Amazon’s e-commerce operation. The flywheel can be explained simply as follows:

Amazon provides rich selection and shopping convenience, which brings better user experience. The improvement of user experience leads to more consumers coming to the website, and the increase of traffic will attract more suppliers to join in, thus further enriching the selection.

This, in turn, enables Amazon to reduce prices further through supplier competition and by spreading costs over a broader base, which leads to further improvement in consumer satisfaction. This virtuous cycle keeps turning and makes the Amazon platform as a whole better and better.



Figure 1: Amazon Flywheel

In The Amazon Way, John Rossman discusses the Amazon flywheel and its associated “holy trinity” of price, selection and availability in more detail, under the headings “The Virtuous Cycle Goes Fractal” and “The Flywheel Effect”. [Note 3]

The logic behind the Amazon Flywheel is this: Once the core is in place and the flywheel is activated, the flywheel will generate sustained power and get stronger over time.

Throughout the cycle, each part generates its own energy, which is used to drive other positive energy in the system.

In Amazon’s R&D system, virtuous-cycle mechanisms like the flywheel have a wide and deep influence, from the recruitment and training system to the design of the post-mortem mechanism, and even the idea of turning Amazon into a platform. [Note 4]

In addition, the leadership principles introduced later often interact with each other to form a flywheel mechanism.

As John Rossman said in The Amazon Way, “Once you establish Ownership in an organization, it will drive innovation and simplification like a flywheel”.

Leadership principles

“It would certainly be much easier and socially cohesive to just compromise and not debate, but that may lead to the wrong decision.”

Tony Galbato, Amazon vice president for human resources 

Have Backbone; Disagree and Commit. Leaders are obligated to respectfully challenge decisions when they disagree, even when doing so is uncomfortable or exhausting. 

Leaders have conviction and are tenacious. They do not compromise for the sake of social cohesion.

 Once a decision is determined, they commit wholly. They are willing to champion an unpopular or difficult message.

When we analyze the corporate culture of a company, the personality of its leader should be one of the important factors to be considered.

This is especially true for startups, where the way founders behave almost completely influences and defines the company’s culture.

It is therefore understandable that a boss who likes to play financial games has a hard time focusing on the long-term development of users and products, and the rapid decline of his company is not surprising.

Similarly, a man who made his fortune through speculation would naturally use the Internet as an even bigger pyramid-scheme platform to peddle his philosophy of clever shortcuts, without a decent product or a well-thought-out methodology of his own. [Note 6]

In his book One Click, Richard L. Brandt gives a detailed account of Jeff Bezos’ upbringing and describes the Amazon founder’s unique personality traits.

From this material and from analyses published on the Internet, it is not hard to identify Bezos’s traits: high standards, frugality, stubbornness, detachment, attention to detail, trust in numbers, a focus on results, and so on.

A well-known example is the story, told in One Click and elsewhere, of Bezos as a child trying to persuade his grandmother to quit smoking. Interestingly, the same story is used in three different places to make three completely different points! [Note 7]

First, One Click uses this story to illustrate Bezos’ lack of empathy:

Bezos has no natural empathy. When he was 10, on a trip with his grandparents, he tried to get his grandmother to quit smoking.

He relied more on his nerdy approach to an awkward subject than on understanding. He calculated that the amount of nicotine she inhaled would shave nine years off her life.

His grandmother cried. His grandfather had to teach him to be more compassionate.

“My grandfather looked at me, was silent for a moment, and then quietly said, ‘Jeff, one day you’ll understand that it’s harder to be kind than smart,'” Bezos said.

Second, the New York Times article uses this example to illustrate Bezos’s data-driven management genius:

Jeff Bezos turned to data-driven management very early.

He wanted his grandmother to stop smoking, he recalled in a 2010 graduation speech at Princeton. He didn’t beg or appeal to sentiment.

He just did the math, learning that every puff cost her a few minutes. “You’ve taken nine years off your life!” he told her. She burst into tears.

Finally, Bezos himself, in his Princeton commencement address, used this example to show that choice matters more than talent:

… My grandfather looked at me, and after a bit of silence, he gently and calmly said, “One day you’ll understand that it’s harder to be kind than clever.”

What I want to talk to you about today is the difference between gifts and choices. 

Cleverness is a gift, kindness is a choice. Gifts are easy — they’re given after all.

Choices can be hard. You can seduce yourself with your gifts if you’re not careful, and if you do, it’ll probably be to the detriment of your choices.

Back to business. When Amazon was founded, Bezos was thinking about how to keep the company from becoming more bureaucratic, wasteful and extravagant over time.

He wanted to turn his ideas about work into simple instructions that newcomers could understand, generic enough to apply to all businesses, and strict enough to prevent the mediocrity he feared. The result was the 14 leadership principles discussed here (Figure 2).

A brief explanation of leadership principles can be found online at Amazon. It is not hard to see that some of these leadership principles reflect Bezos’ personal characteristics (Frugality, Dive Deep, Have Backbone, etc.).

Others are reflections on how Amazon works (Customer Obsession, Ownership, etc.).



Figure 2: Amazon leadership principles

Unlike the bombastic, vague values of other companies, Amazon’s leadership principles have never been pretty words buried in an employee handbook.

These principles are used company-wide, from top to bottom, to guide daily work, performance reviews and hiring, and even to resolve arguments in meetings and in cross-team collaboration (Figure 3).

Amazon’s leadership principles are like the lifeblood of the giant that is Amazon. They give Amazonians a shared understanding of how things should be done and to what standard, and provide a common language for efficient communication. [Note 10]



Figure 3. Using leadership principles in daily emails to communicate, report problems and suggest solutions.

In addition, by using the leadership principles in recruitment and performance evaluation, Amazon establishes them as the standard competencies that Amazonians should possess.

For example, Amazon’s performance evaluation uses the Organization and Leadership Review (OLR) mechanism, where the L stands for Leadership Principles. The evaluation is divided into three parts: employee self-evaluation, manager evaluation and 360-degree peer evaluation.

These evaluations need to be grounded in specific leadership principles and include specific suggestions for improvement.

To make the principles clearer and more practical, Amazon gives every employee a guide that explains how to apply each one and what overuse and underuse look like (see Figure 4).

Moreover, as the basis for daily decisions at Amazon, the principles stay closely aligned with the company’s strategy and people.

That is, the leadership principles are regularly reviewed for applicability. Vocally Self Critical, for example, has recently been replaced by Learn and Be Curious.



Figure 4. Specific operational guidance of leadership principles

So how can we learn from Amazon’s leadership principles?

We must understand that what Amazon presents is the result (the status quo) of its development, and that this result is path-dependent: it was shaped by the problems Amazon encountered along the way.

Therefore, mechanically copying Amazon’s leadership principles is definitely not feasible. We first need to know the problems we want to solve and the goals we want to achieve, and then choose accordingly. When necessary, we need to develop our own leadership principles.

Much of the public discussion of Amazon’s culture is about customer orientation and taking a long-term view, which can be seen as Think Big.

The web is full of material on these topics, so I won’t discuss these two aspects of Amazon here. But that doesn’t mean they aren’t important, or that they’re easy to achieve.

In fact, for start-ups these two points are classic cases of “easier said than done”. In my experience, it is necessary to emphasize both within the team. Focusing on customers helps establish a customer-oriented service culture in the company, which not only helps team members develop empathy and a win-win mindset but also smooths cooperation between teams.

A long-term view helps the team develop the habit of thinking about long-term solutions to root causes while using short-term fixes for immediate problems.

What the Amazon leadership principles achieve is also worth borrowing: establish a company-wide standard and method of conduct, and select and develop people according to that standard.

To create a culture of accountability, action and service across the company or team, what I usually do in startups is to introduce, step by step, the requirements of Customer Obsession, Ownership, Bias for Action, Invent and Simplify, Dive Deep, and Deliver Results.

This makes it easy and quick for the whole team to build a service mindset and focus on results for the customer (internal business teams and dependent teams should also be treated as customers); to take responsibility rather than pass the buck (in the name of Ownership, we explicitly ask the team not to dismiss a problem out of hand but to consider the goals of the whole company or product and, when something is clearly out of their scope, to explain why and help escalate it); and to let the technical team naturally focus on the business (through the Dive Deep requirement).

At the same time, in interviews we also look at the candidate’s sense of ownership, execution, attention to detail, and even Think Big, rather than just technical ability.

When people have these qualities, a hard-working, self-driven team is no longer out of reach.

It is also important to note that this standard must be applied consistently at all levels. You can’t hold the team accountable while taking no responsibility yourself. And when the team runs into problems, you need to coach them and, if necessary, establish discipline and penalties.

For example, one of our systems had a problem. Just before leaving work on a Friday, an engineer sent an email to the other development team and then went home with an easy mind. When later asked why the problem had not been handled in time, the engineer said the email had already been sent to the other team and that the responsibility no longer rested with them (it had been handed over).

We analyzed the problem in a later meeting and criticized the team members on both sides. We made it clear that whoever is responsible for a problem is responsible for it from start to finish, and that sending an email does not mean the job is done.

Since then, when a similar problem occurs and the email does not get a quick response, the engineers proactively call the other team’s members or leaders, and they can quickly report on progress.

Through the leadership principles, Bezos wants every Amazonian to be a leader or owner. He wants you to drive Amazon’s business as if it were your own business, not just come to work and socialize.

This is a problem many business owners and managers face, and Amazon does a good job of it, but the gladiator culture isn’t for everyone.

For enterprises, can we learn from Amazon to establish a more moderate competitive environment?

For job seekers, whether you want an environment that pushes you out of your comfort zone and pushes you to grow or a comfortable job depends on your choice.

The right person

Leaders of good-to-great companies do not aspire to a leadership model that “casts the net widely and develops people later”. Instead, they take the line: “We’re going to go through a lot of rigorous selection up front. Once we find the right people, we try to keep them with us. If it doesn’t work out, we’ll be honest about it so we can get on with our jobs and they can get on with their lives.”

Good to Great

The audience, makeup and influence of a culture are inseparable from people. Culture is shaped by people and, in turn, shapes people.

An excellent company like Amazon or Google needs people who can adapt to its culture in order to run efficiently. Amazon’s conflict-driven gladiator culture in particular must constantly absorb people who can cope with its competitiveness (abroad, Amazon has very high turnover within the first two years).

As we said when introducing Amazon’s leadership principles, they are used to recruit and develop people. How is this done?

On the one hand, two of the leadership principles relate directly to selecting and developing people:

Hire and Develop the Best

Leaders constantly raise the bar for hiring and promoting employees. They recognize outstanding talent and are willing to hone it through rotations in the organization. Leaders nurture leaders, and they take their responsibility to nurture people seriously. Leaders create career development mechanisms from the perspective of employees.

Insist on the Highest Standards

Leaders have exacting high standards — standards that many may think are unreasonably high. Leaders constantly raise standards and motivate their teams to deliver quality products, services and processes. The leader will make sure that any problems do not spread, that they are solved thoroughly in a timely manner and that they do not recur.

On the other hand, Amazon’s recruitment process focuses on the competencies required by the leadership principles.

Let’s start with a brief description of Amazon’s interview process. (I’m not sure whether Amazon’s technical interviews are the hardest in the world, but they are certainly long.) [Note 12]

From the candidate’s perspective, once candidates have passed the initial HR screening and the technical phone interview, they are invited to Amazon’s offices for onsite interviews.

The onsite consists of five to eight interviews and lasts from half a day to a full day, depending on the position.

Taking five rounds as an example, the schedule consists of two technical interviews, one examining programming ability and the other design ability, followed by the hiring manager interview, the HR interview, and finally the Bar Raiser interview.

From the interviewers’ perspective, HR first enters the candidate’s resume information into the internal interview management system and then chooses the interviewers for the phone interviews. After the candidate passes the phone interviews (there may be an HR round and a technical round), the interview feedback is entered into the system for future reference.

HR then works with the hiring manager to choose the onsite interviewers, especially the Bar Raiser.

Before the onsite, there is a quick kickoff meeting in which HR introduces the candidate and the position.

The hiring manager then works with the interviewers to assign the leadership principles and other areas to be examined to specific interviews.

For example, the first round of technical interviews will focus on candidates’ Ownership and Dive Deep abilities in addition to programming abilities. During the interview, each interviewer will constantly probe and challenge the candidate by asking questions, so as to have an objective and comprehensive understanding of the candidate.

After the interview, each interviewer should give detailed feedback in the system, which includes:

  • A clear vote, in or out

  • Brief summary

  • Strengths and weaknesses of the candidates

  • Questions, answers and analysis
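The feedback items listed above map naturally onto a small record type. The following Python sketch is only an illustration of that structure; the class names, field names and the `is_complete` check are assumptions made for this example, not Amazon’s internal system.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Vote(Enum):
    """The clear in-or-out vote each interviewer must give."""
    HIRE = "hire"
    NO_HIRE = "no_hire"


@dataclass
class QuestionRecord:
    """One interview question with the candidate's answer and the interviewer's analysis."""
    question: str
    answer: str
    analysis: str


@dataclass
class InterviewFeedback:
    """Written feedback from a single interviewer (hypothetical schema)."""
    interviewer: str
    vote: Vote
    summary: str
    strengths: List[str] = field(default_factory=list)
    weaknesses: List[str] = field(default_factory=list)
    questions: List[QuestionRecord] = field(default_factory=list)

    def is_complete(self) -> bool:
        # Assumed rule: feedback needs a non-empty summary, at least one
        # recorded question, and at least one strength or weakness.
        return (
            bool(self.summary.strip())
            and bool(self.questions)
            and bool(self.strengths or self.weaknesses)
        )
```

A startup could use a check like `is_complete()` to keep empty or one-line feedback out of the debrief, which is the point of the narrative format.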

Finally, the Bar Raiser leads a debrief meeting. At the beginning, the interviewers take some time to read all the feedback, and then they discuss the candidate together.

In the end, the Bar Raiser and the hiring manager decide whether to make the candidate an offer based on this analysis.

Here, we see two features of Amazon’s interview design.

The first is a love of narrative documents, the most commonly used tool in Amazon’s internal management, whereas PowerPoint is banned except for technical sharing.

This is because PowerPoint slides carry only a small amount of information, and the audience can grasp only the bullet points; the format is friendly to the speaker but hard on the audience.

In contrast, when writing documents, complete sentences and paragraphs force the author to think deeply and express themselves more clearly, and these documents can share more information without additional explanation.

In the case of candidate feedback, since coding in the interview is usually done on paper, it can be quite a burden (usually 30 to 60 minutes) for the interviewer to enter feedback with the code and Q&A details into the system.

But it is this process that allows the interviewer to review the candidate again during the entry process and evaluate the candidate as objectively as possible.

On the other hand, such feedback exercises the interviewer’s analytical skills and enables him to quickly reflect on the interview, thus enabling him to better complete the interview in the future.

The second is the Bar Raiser role. Bar Raisers are a group of specially trained in-house interviewers who take part as a third party in interviews throughout the company.

They represent the company’s long-term interests in hiring, largely as a counterbalance to the hiring manager’s short-term desire to fill positions quickly. In any interview loop, the Bar Raiser must agree before a candidate can receive an offer.

How does a Bar Raiser represent the company’s long-term interests in recruitment? When making their judgment, Bar Raisers need to consider the following two questions:

  • Is the candidate better than 50% of the people currently in that position at Amazon?

  • Can the candidate make a long-term impact at Amazon?
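As a minimal sketch, the decision rule described here (the Bar Raiser weighs two questions, and no offer goes out without the Bar Raiser’s agreement) can be written as a small function. The names below are hypothetical, and requiring both questions to be answered “yes” is an assumption of this example; the text only says that both are considered.

```python
from dataclasses import dataclass


@dataclass
class BarRaiserAssessment:
    """The two questions a Bar Raiser considers (hypothetical field names)."""
    better_than_half_of_peers: bool  # better than 50% of people now in the role?
    long_term_impact: bool           # can the candidate make a long-term impact?

    def approves(self) -> bool:
        # Assumption: the Bar Raiser agrees only if both answers are yes.
        return self.better_than_half_of_peers and self.long_term_impact


def should_extend_offer(hiring_manager_approves: bool,
                        assessment: BarRaiserAssessment) -> bool:
    """An offer needs both the hiring manager and the Bar Raiser to agree,
    so the Bar Raiser effectively holds a veto."""
    return hiring_manager_approves and assessment.approves()
```

The design point is the veto: a hiring manager who is eager to fill a seat cannot override a Bar Raiser who answers “no” to either question.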

As mentioned earlier in the discussion of the flywheel, the Bar Raiser is an important part of the flywheel that drives the continuous improvement of people’s capabilities.

With the Bar Raiser, Amazon hopes to raise the level of staff throughout the company by constantly raising the bar for recruitment. The whole interview cycle is shown below.



Figure 5. Raising the level of staff by raising the hiring bar

There’s no doubt that the whole interview process at Amazon is cumbersome. But when hiring the best is the goal, this process is safe and effective.

If we unthinkingly applied this process directly to startups, we would have a problem!

When I first joined a startup, I brought along the whole set of practices for improving hiring quality through narrative feedback and feedback-based debriefs. After a few attempts, HR and the interviewers started complaining.

A little analysis shows that what most startups want is to quickly find people who can do the work; optimizing the interview system, raising the overall level of staff, and assessing long-term potential are secondary.

In this case, we can adjust Amazon’s recruitment process to form a recruitment system that suits us:

  • Identify competencies required within the company (accountability, execution, ability to dig into details, etc.).

  • Clarify the technical requirements of the position in the job description (JD).

  • Arrange 3~4 rounds of interviews, including 1~2 rounds of technical interviews, 1 round of hiring manager interviews, and 1 round of HR interviews. Communicate with the interviewer before the interview what you need to look for, especially the key competency areas.

  • Hold a quick discussion among all interviewers after the interviews. In the discussion, each person describes the candidate’s strengths and weaknesses, adding details of their observations as appropriate. Finally, everyone votes hire or no hire.

  • For important roles where the whole team needs to be considered, the Bar Raiser way of thinking is a good choice.

As part of the process, potential interviewers within the company receive regular training. The training focuses on how to evaluate and analyze the abilities and potential of candidates, as covered in a previous GitChat share on becoming an effective interviewer.

Another common question is what to do when people are in doubt about the outcome of an interview: be bold enough to turn the candidate down.

From the previous discussion, we know that candidates who make it through the interviews fit Amazon’s leadership principles better than 50% of the people currently in the role, and that they have the potential to make an impact on Amazon in the future.

So, in terms of specific technologies, what does Amazon require of its technical staff? How do these requirements affect the efficiency of Amazon’s R&D?

SDE: Someone Do Everything

SDE stands for Software Development Engineer. SDEs are the builders of Amazon’s systems and the backbone of its technology R&D. [Note 14]

When we discuss Amazon’s requirements for technical staff and its R&D efficiency, we mainly focus on the people in this position.

Before we look specifically at Amazon’s requirements for technical staff, let’s first analyze the problems they face. Amazon’s R&D work covers roughly the following areas:

  1. Amazon’s retail website, known as Retail

  2. All the support systems involved in the e-commerce business, known as OpsTech

  3. Internal support tool development (should be under Engineering Excellence)

  4. Hardware-related systems and their business support systems

  5. Cloud computing services and systems

  6. Strategic advanced research in research labs, such as the mysterious Lab126

Whether in the traditional e-commerce R&D teams of areas 1 to 3 or in the later, more technical product development work, this was unprecedented exploration at the time (and still is).

As a result, its technical teams often know little beyond the problem itself, sometimes only vaguely, and have few ready-made references.

It’s no surprise, then, that we see this description in the job descriptions (JDs) for technical positions:

The candidate can independently and creatively solve challenging (vague) problems.

Therefore, Amazon first needs its R&D staff to be able to solve problems quickly and independently.

In the 2016 shareholder letter, Bezos talked about high-velocity decision making:

Day 2 companies make high-quality decisions, but they make high-quality decisions slowly. 

To keep the energy and dynamism of Day 1, you have to somehow make high-quality, high-velocity decisions.

 Easy for start-ups and very challenging for large organizations. 

The senior team at Amazon is determined to keep our decision-making velocity high. Speed matters in business — plus a high-velocity decision making environment is more fun too.

We don’t know all the answers, but here are some thoughts.

On the other hand, fast decision-making means a fast-moving business, and a fast-moving business requires fast R&D support! In a business-driven environment, developers must be able to work creatively and quickly with the business team to solve problems with limited information, which is also an expression of Ownership.

Of course, on careful examination, Amazon’s leadership principles all aim at moving fast, at low cost and with high quality.

Taking it a step further, when we have an idea, we need the R&D team to build the supporting systems so that the business can actually run.

Once the business is running, the systems give us new insight into how it operates, which produces new ideas; R&D then understands these ideas, builds accordingly, and improves the operation of the business. And the cycle continues…

This cycle leads us to our first R&D fact: software development is essentially a learning process, in particular the rapid learning of the business domain.

The more we understand the business, the simpler and more user-friendly the systems we develop.

Therefore, Amazon’s R&D staff must have excellent learning ability. Given the speed at which business and technology change, they must also be able to learn quickly.

Further, once we understand the business and understand the requirements, we need to turn those business requirements into a working system. This process involves a series of tasks:

  • Product design (prototyping, interaction design, etc.)

  • System design/software design

  • Coding

  • Testing

  • Deployment

  • Operations

Which of these tasks is the core? From a business perspective, product design and system design are the core, while coding is more like a kind of translation work. Which brings us to our second R&D fact: software development is essentially design.

If we leave product design to a TPM (similar to a product manager) or a PM (a business person), we can restate this fact for developers: the most critical software development skill is design ability.

So, to sum up a little bit, who is Amazon’s SDE?

They are competitive against the requirements of Amazon’s leadership principles, especially Customer Obsession, Ownership, Bias for Action, Dive Deep, Invent and Simplify, Think Big, and Deliver Results. In addition, they focus on the business, learn fast, and can creatively solve business problems by designing, implementing and operating systems.

That’s probably what you want for a developer, isn’t it? Anyway, that’s how we look for peers in startups!

Why is SDE known internally as “Someone who Does Everything”? This leads to two interesting internal Amazon R&D ideas:

  • Everyone is a (potential) architect

  • Developers do everything

First, let’s discuss that everyone is an architect.

Everyone is an architect

Amazon believes that the most critical technical ability of R&D staff is design ability, so system and software design skills are key points in job requirements and interview arrangements.

For example, SDE1 needs to understand design, SDE2 needs to be able to design independently, and SDE3 needs to understand the architecture of complex systems.

When a developer joins the company, he or she already has (or has the potential to acquire) design, architecture and problem-solving skills, as well as the ability to dive into detail and learn quickly.

At this point, it makes little sense to have a separate architecture team, and having a separate architecture team inevitably introduces additional communication and decision-making processes.

Moreover, it is difficult for architecture teams to come up with suitable architectures when they are separated from the specific business, and such teams become shackles on efficient R&D.

The prevailing view in recent years has been to involve the architect in development, meaning that the architect is involved in coding or actually implementing the architecture.

Amazon approaches it from the other side: since everyone has architecture and design skills, let the developers who actually work on the business do the architecture.

Admittedly, even for Amazon’s R&D engineers, architecture is still very complex. For developers to truly take charge of the architecture, some measures must be put in place to reduce its complexity. These measures include:

  • Reduce problem size by dividing teams

  • Share architectural work through proper division of labor

  • Internal frameworks limit choice and improve efficiency

  • Encapsulate supporting architecture and operations maintenance efforts through tools or services

  • Ensure the accumulation of knowledge through institutions

Later, we’ll touch on these initiatives, which, it can be said, make specific research and development work simple and efficient. Before we do that, let’s talk about developers getting everything done.

You build it, You run it.

In a 2006 interview with ACM Queue, Amazon CTO Werner Vogels talked about his experience of moving Amazon to a service-oriented architecture. He gave the idea behind having Amazon’s R&D staff be responsible for both development and operations: “You build it, you run it”. The following is an excerpt from the interview.

There is another lesson here: Giving developers operational responsibilities has greatly enhanced the quality of the services, both from a customer and a technology point of view. 

The traditional model is that you take your software to the wall that separates development and operations, and throw it over and then forget about it. Not at Amazon. 

You build it, you run it. This brings developers into contact with the day-to-day operation of their software. 

It also brings them into day-to-day contact with the customer. This customer feedback loop is essential for improving the quality of the service.

The excerpt points to the benefits of letting R&D handle operations and maintenance:

  • Breaking down the wall between development and operations improves the overall efficiency of both.

  • Developers face the customer directly and improve service quality through customer feedback. This also encourages R&D to build customer awareness.

Elsewhere in the interview, Werner Vogels’ answer on how to decide whether to release a feature reveals another benefit of R&D taking over operations and maintenance: it builds awareness of the data behind product operations. Data-driven, data-based management is a very important practice at Amazon. The original text reads as follows:

JG Let’s postulate that somebody has come up with an idea and the team has gone off and built something. How does the go/no-go decision get made?

WV It may depend on the criteria for success that were defined up front. When the service is ready for beta testing, we will slowly introduce this to our customers, and then we measure relentlessly.

Now, think back to 2006. It was not until three years later, in 2009, that the Agile community tentatively put forward the concept of DevOps, and at that time it only meant Dev, QA, and operations working together.

By 2006, Amazon was already letting developers do (most of) the testing and operations.

The entire R&D operations support system (then known as ABB) was already in place around service construction and deployment, allowing developers to focus on learning the business and implementing core modules.

Support work such as system setup, deployment, and monitoring can be done easily through these tools. This is one of the measures mentioned earlier for reducing architectural complexity: encapsulate supporting architecture and operations maintenance efforts through tools or services.

The supporting architecture here usually means the physical architecture and the operations architecture. It may also include common system architecture (for example, architecture that supports high availability and high concurrency, or information-based architecture). Although this work is not directly related to the business, it cannot be skipped.

For Amazon, the business part is changing the fastest. New businesses keep emerging and the old businesses need to be optimized and adjusted constantly. Accordingly, the business team and r&d team should focus on the work related to the business.

In contrast, problem solving for physical architecture, system architecture, and operations architecture tends to follow fixed patterns, which are easily encapsulated and offered as service-style tools. Their variability depends more on the scale of the service than on its content. [note] 18

When this knowledge and these practices are encapsulated in tools or services, developers only need to learn to use them, without mastering the complex knowledge behind them.

Therefore, from the perspective of system development and implementation, the related design, coding, testing, deployment, operation, and maintenance work can all be undertaken by R&D staff. For example, for most of the internal service systems that OpsTech is responsible for, even product design is done by R&D.

This is what developers do under You build it, You run it, as shown below.

It is important to note that while SDEs will in most cases take priority on, and do their best to handle, all of the work involved in R&D, Amazon is not shy about creating roles within the team and handing them the work when greater expertise is required.

For example, in the Seller Centre system of a service provider, the product manager and front-end team are placed across the entire team, as user experience and interaction are critical to improving user efficiency.

Similarly, some businesses require specialists such as Data Engineers or Embedded System Architects, who need to learn the supporting systems and sometimes even the necessary development skills, so that business progress and system operations can be sustained with limited resources.

Another area of misunderstanding is the positioning and use of tests and testers. There is no doubt that testing is very important work.

But for most systems or services, testing is done more automatically and programmatically, and even if there is some manual validation, it is often digested internally by the system’s developers.

In some systems or services that emphasize user experience and usability, such as mobile applications, dedicated testers are also deployed.

In addition to QA, Amazon has the SDET (Software Development Engineer in Test), a position for automating tests with programs.

The “full-stack engineer”, so popular in recent years, is never mentioned internally. When Amazon’s baseline expectation of an SDE is to be as independent and as versatile as possible, and to learn whatever it takes to Deliver Results, who would think learning a bit more is such a big deal? After all, in this environment, full stack is just a byproduct of doing everything!

At my current startup, we try to build a support tool chain corresponding to Amazon’s out of open source tools, as shown in the figure below, and let R&D staff do all of the development, operation, and maintenance work.

In a research and development team of nearly 100 people, we only retained two traditional operation and maintenance personnel to take charge of the machine, network and other infrastructure maintenance, and there were no testers at all.

In the interview, Werner Vogels put it lightly: “This brings developers into contact with the day-to-day operation of their software.” For Amazon’s SDEs, day-to-day operations and maintenance can be very concrete and very uncomfortable…

7×24 OnCall

When developers start operating and maintaining their own systems, they naturally need to keep an eye on how the system is performing.

When problems occur in the production system, the monitoring and alarm system captures them and generates corresponding tickets. Each ticket is automatically routed to the right person according to the owning team and on-call schedule configured in advance, and that person is then notified by pager or SMS.
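The routing step described above can be sketched in a few lines. This is only an illustration under assumed data: the alarm name, the team name, the engineers, and the `route_alarm` helper are all hypothetical, and nothing here reflects Amazon’s actual ticketing system.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Shift:
    engineer: str
    start: datetime  # inclusive
    end: datetime    # exclusive


# Hypothetical data: which team owns which alarm, and each team's rotation.
ALARM_OWNERS = {"orders-latency-high": "orders"}
ROTATIONS = {
    "orders": [
        Shift("alice", datetime(2024, 1, 1), datetime(2024, 1, 8)),
        Shift("bob", datetime(2024, 1, 8), datetime(2024, 1, 15)),
    ],
}


def route_alarm(alarm: str, fired_at: datetime) -> dict:
    """Turn an alarm into a ticket assigned to whoever is on call."""
    team = ALARM_OWNERS[alarm]
    for shift in ROTATIONS[team]:
        if shift.start <= fired_at < shift.end:
            return {"alarm": alarm, "team": team, "assignee": shift.engineer}
    raise LookupError(f"no one on call for {team} at {fired_at}")


# An alarm at 3 a.m. on January 9 lands on the second shift.
ticket = route_alarm("orders-latency-high", datetime(2024, 1, 9, 3, 0))
print(ticket["assignee"])  # bob
```

The point of the sketch is that ownership and schedule are configured ahead of time, so no human has to decide who gets woken up.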

No matter where you are or what you’re doing (most likely sleeping), you need to open your computer immediately to follow up and deal with the problem, which is why your friends at Amazon always bring their laptops along when they travel with you. [note] 19

It is not hard to see that 7×24 OnCall is a heavy burden, especially when it interferes with sleep or personal life, but this is exactly what makes having SDEs carry OnCall interesting.

When things are painful, there are two ways to deal with them. One is to walk away and let the problem get worse. The other is to stay, bite the bullet, sort it out, and make the world a better place. Obviously, every piece of workplace self-help advice encourages the latter.

In fact, pain can spur innovation: Amazon’s deployment toolchain and Google’s application operations system were both born out of developers being unable to get any sleep.

R&D teams have also come up with ways to ease the pain of OnCall.

For example, some early teams tried setting up a support team in India to focus on OnCall and bug resolution, but after a while the drawbacks of this approach became apparent.

First, turnover on the support team was high, because problem-solving jobs are not only low-paid but also less fulfilling. Naturally, members of the support team either left or, once familiar with the system, became SDEs on it.

Second, the quality of the system did not improve significantly. Support team members were not directly responsible for the business system, so they focused on getting the immediate problem resolved. Whether the hidden root cause got fixed depended on the mood and personality of whoever handled it, and as a result some problems kept recurring.

The other way is to have R&D teams in different time zones cover each other’s night-time OnCall, so that everyone can finally sleep at night!

The OnCall system has other benefits.

First, OnCall lets newcomers quickly get familiar with the business and the system. Some teams put new employees on OnCall for a period after they join, during which they learn about the development process, tools, business, systems, and people involved by solving real-world problems.

In addition, OnCall is tied to OE (Operational Excellence), which drives system quality and efficient use of R&D resources at the R&D level.

How do we judge the quality of a system? Obviously, the number, type, and severity of problems in the production system give us an outside view of it.

A system that breaks all the time, or has serious problems from time to time, is of low quality. Worse, because developers are responsible for operations and maintenance, if system problems take up too much of their time, they will have no time to develop new features.

Google uses SRE and error budgets to balance R&D and operations, while Amazon uses YOY (Year over Year) OE goals to drive system improvement.

Bezos’s requirement:

Each year, the R&D team needs a 10% reduction in the number of problems and a 10% reduction in support staff, even as the business continues to grow…

For example, if a team had 10 SDEs and 1,000 tickets last year, the number of system tickets should be below 900 this year, and one SDE should be freed up so that the team can handle more business with the same headcount, or downsize if the business does not grow significantly.

Here 10% is a baseline value for reference, and usually the team will set a reasonable value based on its own situation. If that value is below 10%, or the goal cannot be met, the exception needs to be reported to senior management for approval.
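The arithmetic behind the example can be written down as a small helper. This is a minimal sketch: the function name `oe_targets` and its return format are made up for illustration, and only the 10% baseline and the 10 SDE / 1,000 ticket example come from the text.

```python
def oe_targets(sde_count: int, tickets_last_year: int, reduction_pct: int = 10) -> dict:
    """Compute year-over-year OE targets from last year's numbers.

    Integer arithmetic is used so that the results are exact.
    """
    ticket_ceiling = tickets_last_year * (100 - reduction_pct) // 100
    sdes_freed = sde_count * reduction_pct // 100
    return {"ticket_ceiling": ticket_ceiling, "sdes_freed": sdes_freed}


# The example from the text: 10 SDEs and 1,000 tickets last year.
print(oe_targets(10, 1000))  # {'ticket_ceiling': 900, 'sdes_freed': 1}
```

A team that negotiates a different baseline would simply pass another `reduction_pct`, and per the rule above any value under 10 would need senior management approval.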

As OE goals are derived from Bezos’ requirements, and also serve as the content of performance appraisal of managers at all levels, the number and trend of system problems, as well as post-mortem analysis of serious problems, will be discussed in the weekly management meeting.

Amazon’s method for post-incident summary and analysis is called Correction of Error (COE). It addresses problems by identifying root causes and tracking the resulting action items, in order to improve the overall quality of services (or systems) and strengthen the accountability of the responsible team.

Note that COE is not a process of finding and punishing those responsible for the problem!

COE is similar in form and function to the postmortem in Google SRE, and includes the following parts:

  • A brief description of what went wrong

  • Impact of the problem on the business and customers

  • A timeline of events

  • A detailed root cause analysis using the 5 Whys

  • Lessons learned

  • Corrective actions

  • Action items

Through the COE, the R&D team can identify the root cause of a problem and address it with traceable, deliverable action items, ultimately preventing the problem from recurring.

And, in the process, we need to preserve and share the lessons learned from our analysis with others.
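To make the COE structure concrete, here is a minimal sketch of it as a record type, with one helper that reports which sections are still empty and one that lists unfinished action items. The class and field names are assumptions chosen to mirror the list above, and the `owner` field is hypothetical. This is not Amazon’s actual template.

```python
from dataclasses import dataclass, field, fields


@dataclass
class ActionItem:
    description: str
    owner: str          # hypothetical field: who is accountable for the item
    done: bool = False


@dataclass
class COE:
    summary: str = ""                                       # what went wrong
    impact: str = ""                                        # business and customer impact
    timeline: list = field(default_factory=list)            # ordered events
    five_whys: list = field(default_factory=list)           # root cause analysis
    lessons_learned: list = field(default_factory=list)
    corrective_actions: list = field(default_factory=list)
    action_items: list = field(default_factory=list)        # ActionItem objects


def missing_sections(coe: COE) -> list:
    """Names of the sections that have not been filled in yet."""
    return [f.name for f in fields(coe) if not getattr(coe, f.name)]


def open_action_items(coe: COE) -> list:
    """Action items that still need to be delivered."""
    return [item for item in coe.action_items if not item.done]


draft = COE(summary="Checkout errors spiked", impact="Some orders failed")
draft.action_items.append(ActionItem("Add alarm on error rate", owner="alice"))
print(missing_sections(draft))
print([item.description for item in open_action_items(draft)])
```

Tracking action items as data rather than prose is what makes them traceable: a weekly review can list the open ones instead of rereading every document.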

To sum up:

  • Through OnCall, developers can actually feel the impact of system problems and generate motivation to solve them.

  • Through OE, operations and maintenance work is kept within a reasonable range at the system level, and continuous optimization of the system and the team is promoted.

  • COE enables developers to identify root problems and focus on landing long-term solutions.

Putting developers directly in charge of production incidents may be the best way to quickly improve business service levels and system quality.

At two startups, I tried to set up a system where developers were in charge of operations maintenance and 7×24 OnCall.

At the first company, the head of R&D who came in later thought the R&D team should be protected, and that the pain of operations should be handled by dedicated operations staff.

In the end, the same problems in the system recurred over and over again. At the second company, we built the entire process and support tools, as shown in the picture below.

Organizational structure and 2 Pizza Team

Around 2002, Amazon made one of its best-known organizational and architectural changes. After this adjustment, Amazon gradually moved its system architecture from a monolithic application with directly shared data access to a service-based SOA, and gradually changed its organization into a composite structure built on small teams. [note] 20

This kind of small team is often called a 2 Pizza Team. For the origin of the name, see Inside the Mind of Jeff Bezos, published by Fast Company, excerpted below.

One of Bezos’s more memorable behind-the-scenes moments came during an off-site retreat, says Risher.

“People were saying that groups needed to communicate more. Jeff got up and said, ‘No, communication is terrible!’” The pronouncement shocked his managers.

But Bezos pursued his idea of a decentralized, disentangled company where small groups can innovate and test their visions independently of everyone else. 

He came up with the notion of the “two-pizza team”: If you can’t feed a team with two pizzas, it’s too large. That limits a task force to five to seven people, depending on their appetites.

There are plenty of articles about the 2 Pizza Team on the Internet, and their analysis of its benefits is fairly detailed, so there is no need to repeat it here. I will only add two other benefits that small teams bring:

First, when business systems are assigned to 2 Pizza Team by function (or specific business), there is a limit to the size and complexity of the problems they can handle.

Similarly, the size and complexity of the architectural problems that teams face is reduced. This is what we meant earlier, when discussing “everyone is an architect”, by reducing problem size through dividing teams.

Second, when there are fewer people, it’s harder for bureaucracy to develop, and it’s easier for the team to develop a positive, autonomous atmosphere.

Understandably, in a results-oriented gladiator culture, a team is at risk of falling apart if someone is slacking off or scheming, whether or not they think they are doing the right thing.

Here’s a diagram from the Web that outlines Amazon’s transformed organizational structure.

Under this organizational structure, Amazon divides the organization vertically along lines of business, and technology usually sits within the lines of business, as shown in the figure below. [note] 21

Lines of business are often further divided by business + region, with technical teams typically responsible for global support.

Therefore, the technical team is often further divided by business or system based on business needs. The aggregation of business and technology generally occurs at the VP layer and sometimes at the Director layer.

It is also worth mentioning that the 2 Pizza Team principle applies at any level of the organizational hierarchy. If a Director has too many direct reports, the group is split further, though of course there are exceptions.

This division enables the technical team to focus on solving business problems. In other words, it is the organizational expression of business-driven technology (or business first).

Since the team size is usually 7+/-2 at this point, there is usually no particularly complex work, and design decisions about the business are digested internally by the team.

Another benefit of this division is that technical people, especially technical team leaders, are familiar with the business, so some of the senior business leaders (Directors and VPs) come from the technical line!

However, one challenging consequence is the growing number of teams. A single business need may require several teams to cooperate before it can be delivered, so cross-team coordination becomes a problem.

Amazon uses Operation Planning to help teams strategically solve work scheduling problems, which we will discuss later. Now, how can the technical and business teams work together?

As shown in the figure above, cooperation between the technical team and the business team is not coordinated from above. The two sides communicate mainly team to team. That is, at the grassroots level, requirements, problems, and daily communication go directly from the business team to the head of the technical team.

In addition, there is peer-to-peer communication at every level, and the upward reporting mechanism is mainly used to escalate problems or report business progress.

Let’s look at how a requirement is handled after it is submitted, as shown in the figure below. Once a requirement is raised, it is usually the receiving team that identifies the external systems it depends on, but this work involves a lot of communication and an understanding of the high-level picture, especially of the business process.

Sometimes the business person can help the technical team identify the teams it depends on, and sometimes the technical team can delegate this to the TPM, but more often the technical lead handles it personally.

Once the dependencies are identified, the technical team takes full Ownership and works with the relevant technical teams to get the feature delivered.

OP1 & OP2

Whether at the team level or the company level, the following questions must be answered before products and features can be developed:

  • Who is it for?

  • Why do it?

  • What to do?

  • When to do it?

  • How to do it?

From a business perspective, some product planning is needed to point the way forward. At project launch, an Amazon R&D manager or business manager takes responsibility for the product, and tries to answer these questions by writing a press release.

For more information on the press release, see Chris Vander Mey’s How Google and Amazon Make Products.

After a team split, business needs may require different teams to work together. An important part of the planning process is to identify the dependent team and incorporate your requirements into the dependent team’s planning.

The method used by Amazon is Operation Planning.

Operation Planning is carried out twice a year. The first round, in early February, is OP1.

The second round comes mid-year, in August, to review OP1 and add new requirements. This is OP2.

During OP, all of Amazon’s teams, from business to technology, become active. They put forward what the business needs done, the innovation that technology wants to pursue, and the technical holes that need filling. In particular, anything that needs another team’s cooperation must be raised in that team’s OP.

These pooled requirements are refined through R&D and business discussions, prioritized, and estimated to produce one grand plan.

This plan is then rolled up along the respective reporting lines (business and R&D each have their own OP), contested level by level, and finally rolled up to Bezos.

An example of OP1 is shown below, with deliberately blurred text. It is important to note that this plan includes information on priorities, effort estimates, external team dependencies, preliminary business value judgments, estimated delivery times, and so on.

The small table at the top of the schedule will automatically calculate the cumulative number of personnel needed to complete each priority project, and compare with the current number of personnel to provide a preliminary suggestion for the number of personnel to be recruited in the future.
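The calculation in that small table can be sketched as follows. The project list, the staff number, and the function name `cumulative_headcount` are made up for illustration, and a lower priority number is assumed to mean a more important project.

```python
def cumulative_headcount(projects, current_staff):
    """For each priority, return (priority, cumulative people needed, hiring gap).

    `projects` is a list of (priority, people_needed) pairs.
    """
    needed_by_priority = {}
    for priority, people in projects:
        needed_by_priority[priority] = needed_by_priority.get(priority, 0) + people

    rows = []
    running_total = 0
    for priority in sorted(needed_by_priority):
        running_total += needed_by_priority[priority]
        gap = max(0, running_total - current_staff)
        rows.append((priority, running_total, gap))
    return rows


# Hypothetical plan: four projects across three priorities, eight people on the team.
plan = [(1, 4), (2, 3), (1, 2), (3, 5)]
for row in cumulative_headcount(plan, current_staff=8):
    print(row)
# (1, 6, 0)
# (2, 9, 1)
# (3, 14, 6)
```

Reading the rows top to bottom shows where the current team runs out: in this made-up plan, everything at priority 1 fits, and each lower priority adds to the number of people to recruit.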

During the actual implementation of OP1 and OP2, the business team will still have ad hoc requirements, and even top-down requirements from Bezos. These requirements (except Bezos’s) are discussed between the business and technical teams, compared by business value against the requirements already in the OP, and the plan is adjusted accordingly.

Amazon’s planning process isn’t suitable for startups, but it’s instructive:

  • Before the product starts, it is necessary to determine the users, strategies and scope of the product. In addition to press releases, you can also refer to the methods in User Story Map or Agile Warrior, which are more suitable for the development of medium-sized Internet products.

  • We need a medium- to long-term plan for product evolution.

  • We need to do iteration planning for product implementation.

We’ve talked about reducing architectural complexity through organizational adjustments, and reducing R&D complexity through tools and services that encapsulate supporting architecture and operations. Now we’ll look at how architecture work is distributed among developers.

SDE ranks, responsibilities, and other roles

Amazon has an internal document detailing the soft and hard skills required at each level from SDE1 to Senior Principal, as well as what it takes to move from one level to the next, so that developers can measure themselves against it.

In the figure below, we briefly analyze the team, shared status, responsibilities, and influence of each level. Because Amazon does not have a dedicated team of architects and expects everyone to be able to do architecture work, the more senior the engineer, the bigger the role and the greater the responsibility.

Although R&D staff above SDE3 still belong to a business unit, they are already something of a shared resource, and the scope of their work extends from inside the department to outside it.

While their focus may no longer be on implementing a small team’s business needs, they still get involved, especially on key projects.

Principals often do pioneering exploration and development work. For example, SWF (Simple Workflow), which is extremely useful on AWS, was created by a Principal leading SDE3s.

For those above SDE3, there are additional responsibilities at the company level, such as design reviews, mentoring, and spreading ideas.

For Principals, Amazon has a regular sharing session called Principal Talk, where a Principal is asked to talk about new technologies and ideas from their current work.

As for design reviews, extensions to existing features usually do not require one, and design reviews for new features are usually done within the team.

When the business impact is relatively large or there are real technical challenges, the team manager will look for senior technical staff within the department (usually further up the organizational structure) to review the design, and occasionally for domain experts in other departments.

Although this occasionally takes calling in a personal favor or asking a higher-level manager to step in and coordinate, R&D people are usually warm-hearted, and a sincere email is enough. Moreover, as Amazonians, we have to be shameless in the name of Ownership!

There are several other roles that SDEs often deal with in their daily work: SDM (Software Development Manager), TPM (Technical Program Manager), and PM (Program Manager). Amazon has technical requirements for SDMs and TPMs, which means they need to know programming and design.

So it’s no surprise, when you’re walking around the Amazon office, to see an SDM carrying a stack of specs for AWS products, because their team is moving to the cloud.

An Amazon SDM is first a People Manager, then a Project Manager, and sometimes even a Product Manager, which is another example of one person wearing multiple hats.

A TPM, depending on what the team needs, is usually a technical Product Manager who can draw prototypes, and sometimes a Project Manager who can help the team identify dependent teams, even work out interface details with them, and move the project forward.

PMs are usually business people who can prototype their own products based on requirements…

Architecture: democracy or centralization?

Part of architecture work is choosing among different solutions. The choice can involve both business and technical aspects, but system architecture is mostly a technical choice.

Understandably, some technical problems have common solutions, and over time these solutions become fixed (or recommended) choices. In the figure below, these sit in autocratic waters.

Experience tells us that autocracy brings advantages in efficiency but stifles innovation.

So how do you make such choices simple and efficient within a company without stifling innovation?

Company-wide, Amazon has mandated or recommended some key technology architectures.

For example, RPC must use Coral, DJS (Distributed Job Scheduler) should be used for scheduled tasks, SWF is recommended for workflows, SQS is recommended for messaging middleware, and so on.

Sometimes the same problem has several frameworks in autocratic waters, each of which may perform better in certain situations.

On the other hand, within each team, the language, the framework, and even whether to reinvent the wheel are the team’s own decisions. However, if the system has high impact and uses non-recommended technology, the team must bring evidence to convince the participants in the design review, especially the senior SDEs, who will often ask why the technology already available was not used.
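The selection rule described above can be expressed as a small decision function. This is a sketch under stated assumptions: the dictionary keys, the return labels, and the function `review_choice` are hypothetical, and treating DJS as “recommended” rather than “mandated” is an interpretation of the text, not a confirmed policy.

```python
# Company-wide choices named in the text. Keys are hypothetical concern labels.
MANDATED = {"rpc": "Coral"}
RECOMMENDED = {"scheduled_jobs": "DJS", "workflow": "SWF", "messaging": "SQS"}


def review_choice(concern: str, technology: str, high_impact: bool) -> str:
    """Classify a team's technology choice for a given concern."""
    if concern in MANDATED and technology != MANDATED[concern]:
        return "rejected"            # mandated choices leave no room
    recommended = RECOMMENDED.get(concern)
    if recommended and technology != recommended and high_impact:
        return "justify_in_review"   # bring evidence to the design review
    return "ok"                      # the team decides for itself


print(review_choice("rpc", "HomegrownRPC", high_impact=False))    # rejected
print(review_choice("workflow", "CustomEngine", high_impact=True))  # justify_in_review
print(review_choice("workflow", "CustomEngine", high_impact=False)) # ok
```

The asymmetry is the interesting part: a low-impact system can experiment freely, while a high-impact one has to argue its case, which keeps the efficient default without closing the door on new ideas.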

In some cases, when a team’s work proves to have broader impact, it moves from democratic waters to autocratic waters, where it becomes a company-wide recommended solution.

For example, SWF started as a Java library and proved to greatly simplify business development, so it became not only a service on AWS but also the recommended workflow solution within the company.

Knowledge sharing

As mentioned earlier, the software development process is a learning process. Every day, a lot of questions arise, and every day, a lot of knowledge accumulates. Making good use of this knowledge will greatly save the time of those who come after it. Amazon, for example, makes videos of historical problems for people to watch and learn from.

Amazon has always attached importance to the accumulation and sharing of knowledge. Here is a brief introduction:

  • WIKI: This is the core of the whole knowledge system. All the information related to the business, system and process is written down in a fixed format that is easy to read and share.

  • Mailing lists: Amazon’s mailing list is an open service where anyone can create a list, and there are standing company-wide lists for asking for help with Linux, Java, and so on.

  • SAGA: An internal Q&A community similar to Stack Overflow, designed to replace the Q&A function of mailing lists.

  • Broadcast: An internal video-sharing site similar to YouTube, where you can find all of Bezos’s internal talks, Principal Talks, and training on certain systems.

  • Community: Amazon’s internal technical community, which maintains internal open source projects and discussions.

  • Amazon Patterns: Specifications for UI design. Each product line can provide its own specifications and usage guidelines.

  • Issues: A business issue tracking tool and Scrum-like project management tool that replaces JIRA and part of the Ticket system’s capabilities.

  • OminSearch: A code search tool that makes it easy to copy and paste other people’s code.

  • CodeBrowser: Code browsing tool.

  • Simple Search: Internal Search tool for wikis, mailing lists, and Issues.

These tools also reveal Amazon’s attitude toward code. Except for certain critical projects, Amazon does not restrict employees from reading and learning from other teams’ code.

By contrast, Amazon is much stricter about data, and there have been cases of people having to leave for browsing core data.

Another point is that supporting tools are in constant evolution. As internal supporting systems and tools scale, more efficient and easier-to-use tools are created to increase developer productivity.

Simple Search, for example, launched around 2012, can search Amazon’s three major internal knowledge bases at once.

Finally, Amazon has an innate fascination with tools and automation, in both business and R&D. By a rough count, there are nearly 50 internal tools for R&D and management, most of them self-developed and most of them used in daily work.

Now that we’ve covered most of the areas directly related to R&D efficiency, let’s talk about some of the interesting things that happen inside Amazon that indirectly affect R&D efficiency.

Good Intentions Don’t Work, Mechanisms Work.

Bezos shared this idea at an all-hands meeting in February 2008. Most interesting of all, he told the story of how he and Amazon’s customer service team created a customer service “pull cord” modeled on the andon cord of the Toyota Production System (TPS).

The story is described in detail in former colleague Zhang Sihong’s Success at Amazon is Simple, But You Just Can’t Do It! I strongly recommend reading it. For reasons of space, I will not repeat it here.

Another lesson from the story is Bezos’s constant focus, as a company executive, on efficiency and new knowledge.

As far as I know, Amazon had been practicing ToC and Lean before 2008, with the boss’s personal attention. This is why I often joke that your company is inefficient because your boss is inefficient.

Another is Amazon’s problem-oriented, problem-solving culture, in which people rarely talk about concepts.

Platformization and Self-service

In the tools Amazon develops, both internal and external, self-service and platformization are the characteristics it emphasizes. John Rossman has two essays on this in the appendix of The Amazon Way, and I recommend that readers study them. The following excerpt gives Bezos’s view on self-service.

“I am emphasizing the self-service nature of these platforms because it is important for a reason I think is somewhat non-obvious,” wrote Jeff Bezos in his 2011 Letter to Shareholders.

“Even well-meaning gatekeepers slow innovation. When a platform is self-service, even the improbable ideas can get tried, because there’s no expert gatekeeper ready to say, ‘That will never work!’ Guess what? Many of those improbable ideas do work.”

Day 1, Day 2?

Bezos mentioned Day 1 in his 1997 shareholder letter and Day 2 in his 2016 shareholder letter. If you are interested, read the 1997 Letter to Shareholders and the 2016 Letter to Shareholders for a deeper understanding.

This is also explained in The Amazon Way: 14 Leadership Principles Behind the World’s Most Disruptive Company.

Conclusion

What are the characteristics of Amazon’s R&D?

  • It has a gladiator culture that encourages competition.

  • It uses a whole set of mechanisms to find the right people for the culture.

  • It uses tools to simplify the complexity of research and development.

  • It promotes autonomous small teams through organizational systems.

  • It encourages trial and error.

  • It lets developers do everything they can.

  • It focuses on continuous improvement.

  • It mercilessly makes developers feel the pain of operating and maintaining their own systems.

  • It is problem-oriented, with a particular focus on root problems.

  • It loves numbers and manages and explains everything by them.

  • It understands that great innovation often comes from developers who know the technology, so it wants to build an engineering culture that respects engineers. [Note 22]

This article was written in a hurry. Some topics were not covered at all and others were not explored in depth, such as the OLR performance evaluation and Dog Fight, Engineering Excellence, and the toolchain. I hope to think these through and write them up in more depth in the future.

Amazon has an internal tagline: “Work Hard, Have Fun, Make History!”

We often joked that we only needed to Work Hard and could forget the other two. However, two years after leaving Amazon, I have found that Amazon changed me a great deal…

Finally, let me mark the time I spent at Amazon with a group of former colleagues with this Bezos quote from The Everything Store:

“You can work long, hard, or smart, but at Amazon.com you can’t choose two out of three.”

Notes

Note 1: Since 2015, when Amazon began reporting consecutive profits, no one has doubted that it is the titan of twenty-first-century commerce.

Note 2: Bezos and Jobs are both notorious as bad bosses, the kind who yell at and belittle their subordinates. Some examples:

  • “Are you lazy or just incompetent?”

  • “I’m sorry, did I take my stupid pills today?”

  • After an engineer’s presentation: “Why are you wasting my life?”

Note 3: Rossman explains the growth part of the flywheel here. It comes from Bezos’s judgment that “it’s still Day One of the Internet”: the Internet’s potential for growth is gargantuan and still fundamentally unexploited.

As a result: “So he’s ready to slash prices and create programs like free shipping to cultivate customer loyalty and drive sales growth toward the unimaginable heights he foresees. Then he invests the revenues generated back into ‘the holy trinity’: price, selection, and availability.”

Note 4: The essence of a complex business system is often simple; only deception needs to be sophisticated and complex. Similarly, once you sort it out, the foundation of an R&D system usually turns out to be simple. The complexity meant here is the “complicated” kind rather than the “complex” kind. For example, the supplier onboarding process involves integrating more than a dozen systems, which makes it complicated, but its underlying principle is not complex.

Note 5: All companies have a corporate culture, and at worst, a boss culture!

Note 6: Don’t judge whether the Internet’s loudest claims are correct by how many followers they have. As George Bernard Shaw reportedly said, “He’s a bonehead, but there are dumber boneheads cheering for him!”

Note 7: This also reminds us of the importance of independent thinking. Better to have no books at all than to believe everything in them!

Note 8: There is a similar idea that regularly floods social feeds in China: character is more important than ability.

Note 9: In The Amazon Way, John Rossman maps Bezos’s personality traits to Amazon’s Leadership Principles.

Note 10: Mr. Liu, a fanatic, gives us a side view of the influence of language on organizations in his article “Knowing the Mother — On the Characteristic Rhetoric Invention of the Taiping Heavenly Kingdom”.

In my opinion, Bezos sets a good example in this regard: every year, he attaches his 1997 shareholder letter to his open letter to shareholders.

Ma Yun in China is also thoughtful about this; you can still watch the video of him promoting China Yellow Pages today. Everyone, pick up a pen and a camera and use them!

Note 12: A jab at another former employer of mine: ever since a platform ranked it among the hardest companies to interview at in 2012, it has brought this up in its marketing articles every year.

Note 13: An interviewer cannot see anyone else’s feedback until they have submitted their own. This prevents other people’s opinions from influencing the interviewer’s judgment.

Note 14: In an internal talk, I once saw SDE expanded as System Development Engineer. I actually think this reading is more accurate.
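The rule in Note 13 can be sketched as a simple gate on reads. This is only an illustration: the `InterviewLoop` class and its method names are hypothetical and say nothing about how Amazon’s real interview tooling is built.

```python
class InterviewLoop:
    """Hypothetical feedback store: others' notes are readable only after you submit your own."""

    def __init__(self, interviewers):
        self._interviewers = set(interviewers)
        self._feedback = {}  # interviewer name -> written feedback

    def submit(self, interviewer, text):
        if interviewer not in self._interviewers:
            raise ValueError(f"{interviewer} is not on this loop")
        self._feedback[interviewer] = text

    def read_others(self, interviewer):
        # Gate: refuse access until this interviewer's own feedback exists.
        if interviewer not in self._feedback:
            raise PermissionError("submit your own feedback first")
        return {who: text for who, text in self._feedback.items() if who != interviewer}
```

Because the check sits in the read path, an interviewer’s written judgment is always formed before they can see what anyone else wrote.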

Note 15: With the Internet developing so rapidly, everything needs to be fast. There are two similar ideas in China. One is that on the Internet only speed is unbeatable, and the other is the “get it done” mindset that has become popular in the past two years. In fact, this mindset is a combination of problem-solving ability, responsibility, and execution.

Note 16: My second favorite fat man, my technical spiritual mentor…

Note 17: ABB stands for the following systems:

  • Apollo: Service deployment system. Information about Apollo is available

  • Brazil: Package build system

  • BSF: First-generation RPC framework. Steve Yegge famously praised BSF in his Google+ rant, but by then he had been away from Amazon for a few years and BSF had already been replaced by Coral, the next-generation RPC framework, so some Amazon employees used this to poke fun at him.

The core toolchain is now ABCP, with P standing for Pipeline, the continuous deployment system. Information on Amazon’s delivery systems is available for reference; this article will not go into the details of how these systems work.

Note 18: At Amazon, business requirements matter more and change more often, while high concurrency and stability are needed only by a few systems with very large user bases, such as the Retail site. That is why something like Google’s SRE system could not have emerged from a company like Amazon. And once systems and tools like Apollo and Coral are in place, most high-concurrency and stability problems become simple operational problems.

In terms of resource maintenance, Amazon presumably has a distributed resource scheduling system similar to Google Borg. However, from the perspective of application development and maintenance, once systems moved to the cloud, what this scheduling looks like to a developer is the Auto Scaling Group (ASG). Resource switching and physical machine management are completely transparent to SDEs!

Note 19: Amazon’s internal production tickets are classified into severity levels 1-5. The lower the number, the more serious the problem. Only Sev1 and Sev2 problems page the on-call engineer and require a quick response.

If you don’t respond within the allotted time, a Sev1 problem is escalated all the way to Bezos and a Sev2 problem is escalated up the chain to the VP. Congratulations, by then the whole world is looking for you. Even if your manager finds someone else to solve it, you still have to handle the follow-up and the post-incident analysis. And of course, you apologize if necessary.

Tickets at levels 1-4 each have an SLA, and their performance data is analyzed regularly.
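As a rough illustration of the escalation rule in Note 19, the Python sketch below models it. The `Ticket` class, the function names, and above all the response windows (15 and 30 minutes) are assumptions made up for the example; the note gives no actual timings, and this is not Amazon’s ticketing system.

```python
from dataclasses import dataclass

# Assumed response windows in minutes; the real values are not given in the note.
RESPONSE_WINDOW_MIN = {1: 15, 2: 30}

# Who is notified when the on-call engineer misses the window (taken from the note).
ESCALATION_TARGET = {1: "Bezos", 2: "VP"}


@dataclass
class Ticket:
    severity: int                # 1 (most serious) to 5 (least serious)
    minutes_unanswered: int = 0  # time elapsed without an on-call response


def pages_oncall(ticket):
    """Only level 1 and level 2 tickets page the on-call engineer."""
    return ticket.severity in (1, 2)


def escalation_target(ticket):
    """Return who is notified once the response window is missed, or None."""
    if not pages_oncall(ticket):
        return None
    if ticket.minutes_unanswered > RESPONSE_WINDOW_MIN[ticket.severity]:
        return ESCALATION_TARGET[ticket.severity]
    return None
```

With these assumed windows, `escalation_target(Ticket(severity=1, minutes_unanswered=20))` returns `"Bezos"`, while a level 4 ticket never escalates no matter how long it waits.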

Note 20: In 2002, Bezos reportedly sent a letter to the entire company, sounding the trumpet for the evolution of the entire architecture:

  • Starting today, all teams will provide data and functionality in the form of service interfaces.

  • Teams must communicate with each other through interfaces.

  • No other form of interoperability is allowed: no direct linking, no direct reading of other teams’ data, no shared memory, no backdoors of any kind. The only permitted means of communication is to invoke the service over the network.

  • The implementation technology is not prescribed: HTTP, CORBA, Pub/Sub, or a custom protocol are all acceptable.

  • All service interfaces must be designed to be publicly available from the outset, without exception. That is, every interface must be designed on the assumption that it will be opened to outsiders, with no room for bargaining.

  • If you don’t follow the rules above, you’re fired.
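To make the mandate concrete, here is a minimal Python sketch of one team exposing its data only through a network service interface. Everything in it is hypothetical: the inventory data, the `/stock/<sku>` path, and the function names are invented for illustration, and HTTP is just one of the protocols the letter allows.

```python
import json
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer

# The owning team's private data. No other team reads this directly.
_INVENTORY = {"sku-1": 12, "sku-2": 0}


class InventoryHandler(BaseHTTPRequestHandler):
    """The only door into the inventory data: GET /stock/<sku>."""

    def do_GET(self):
        sku = self.path.rsplit("/", 1)[-1]
        if self.path.startswith("/stock/") and sku in _INVENTORY:
            status, payload = 200, {"sku": sku, "count": _INVENTORY[sku]}
        else:
            status, payload = 404, {"error": "unknown resource"}
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def start_service():
    """Start the service on a free local port and return the server object."""
    server = HTTPServer(("127.0.0.1", 0), InventoryHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def get_stock(host, port, sku):
    """Consumer-side call: a network request, never a direct read of _INVENTORY."""
    conn = HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", f"/stock/{sku}")
        resp = conn.getresponse()
        data = json.loads(resp.read())
        if resp.status != 200:
            raise LookupError(data["error"])
        return data["count"]
    finally:
        conn.close()
```

A consuming team would call `get_stock("127.0.0.1", server.server_port, "sku-1")` after `server = start_service()`. Because the consumer depends only on the interface, the owning team can change how it stores inventory without breaking anyone.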

Note 21: There may appear to be exceptions. For example, the team responsible for developing internal support tools should be part of Engineering Excellence, but their business is in fact serving internal development.

Note 22: Obviously, Amazon doesn’t care about your work/life balance. It just respects your ideas and results, of course.
