1. How to ensure system reliability

  1. Design the system in a way that minimizes opportunities for error

    Thinking: Reduce the risk of callers misusing the system by not exposing an overly general-purpose interface. The cost is reduced reusability, so this is a trade-off to weigh in practice.

  2. Find ways to isolate the interfaces that are most prone to errors and failures

    Thinking: For complex business flows, add logging at key steps, keep them separate from the normal business flow, and reduce coupling. This makes troubleshooting and later changes easier.

  3. Test thoroughly

    Thinking: boundary testing, automated testing

  4. When errors occur, provide a quick recovery mechanism to minimize the impact of the failure

    Thinking: routine code rollback, scheduled jobs that compare and repair data, alerting

  5. Monitoring (performance, error rate)

    Thinking: monitor not only machine-level system metrics but also business-level metrics

  6. Put sound management processes in place and train people in them
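The scheduled data comparison mentioned under item 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not a production design: the two dictionaries stand in for a source-of-truth table and a downstream copy, and the names `reconcile`, `source`, and `replica` are hypothetical.

```python
from typing import Dict, List, Tuple


def reconcile(source: Dict[str, int], replica: Dict[str, int]) -> Tuple[Dict[str, int], List[str]]:
    """Compare a replica against the source of truth and repair it.

    Returns the repaired replica and the keys that needed fixing,
    which a real job would report through an alert.
    """
    repaired = dict(replica)
    fixed: List[str] = []
    # Add missing records and overwrite records whose values differ.
    for key, value in source.items():
        if key not in repaired or repaired[key] != value:
            repaired[key] = value
            fixed.append(key)
    # Remove records that no longer exist in the source.
    for key in list(repaired):
        if key not in source:
            del repaired[key]
            fixed.append(key)
    return repaired, fixed


# Hypothetical data: order id -> amount.
source = {"order-1": 100, "order-2": 250, "order-3": 75}
replica = {"order-1": 100, "order-2": 200, "order-4": 10}
repaired, fixed = reconcile(source, replica)
print(fixed)
```

Running such a job on a schedule limits how long an inconsistency can go unnoticed, and the list of fixed keys gives the alert something concrete to report.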

2. Load

When we say load, we usually mean the load on the machine. But the load on a system may include more than that.

The key load parameters vary from system to system: for example, requests per second to a web server, the write rate of a database, or the number of concurrent users in a chat room.

For Twitter, the distribution of followers per user is a key load parameter.
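To see why the follower distribution matters, assume a design in which every new tweet is copied into the home timeline of each follower (fan-out on write). The sketch below uses made-up follower and tweet counts, and `timeline_writes` is a hypothetical helper, not a Twitter API.

```python
from typing import Dict


def timeline_writes(followers: Dict[str, int], tweets: Dict[str, int]) -> int:
    """Total home-timeline writes if each tweet is copied to every follower."""
    return sum(count * followers.get(user, 0) for user, count in tweets.items())


# Hypothetical data: follower counts, and tweets posted in some time window.
followers = {"alice": 50, "bob": 200, "celebrity": 3_000_000}
tweets = {"alice": 10, "bob": 10, "celebrity": 1}

total = timeline_writes(followers, tweets)
print(total)
```

In this example a single tweet from the account with many followers causes far more writes than the twenty tweets from the other two accounts combined, so the raw tweet rate alone says little about the load.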

3. Performance indicators

As the load increases, we focus on performance. Which metric matters most varies from system to system.

For example, in big data systems we focus more on throughput (the number of records processed per second, or the total time taken to process a dataset), whereas online systems such as e-commerce tend to focus more on service response time.
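The difference between the two metrics can be illustrated with a small calculation. The request timings below are invented, and `throughput_per_second` and `mean_response_time` are hypothetical helpers.

```python
from typing import List, Tuple

# Each tuple is (start_time_s, end_time_s) for one completed request.
Timing = Tuple[float, float]


def throughput_per_second(timings: List[Timing]) -> float:
    """Completed requests divided by the wall-clock span they cover."""
    start = min(s for s, _ in timings)
    end = max(e for _, e in timings)
    return len(timings) / (end - start)


def mean_response_time(timings: List[Timing]) -> float:
    """Average time one request took from start to finish."""
    return sum(e - s for s, e in timings) / len(timings)


# Two requests run concurrently in each of two one-second windows.
timings = [(0.0, 0.5), (0.0, 1.0), (1.0, 1.5), (1.0, 2.0)]
print(throughput_per_second(timings), mean_response_time(timings))
```

Throughput describes the system as a whole, while response time describes what each individual caller experiences; a system can improve one without improving the other.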

4. How to accurately evaluate the performance of a system

Even if all requests are identical, response times will still vary because of network jitter, process scheduling, packet loss, GC pauses, disk I/O, and dirty-page flushing.

We usually use percentiles to evaluate system performance: collect the response times and sort them from fastest to slowest.

For example, agree on an acceptable response time such as 200 ms. If 80% of the sorted responses complete within 200 ms, then the system's 80th-percentile (P80) response time is at most 200 ms.
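As a minimal sketch of this calculation, the function below uses the nearest-rank definition of a percentile (one of several common definitions). The sample response times are invented for illustration, and `percentile` is a hypothetical helper rather than a library function.

```python
import math
from typing import List


def percentile(samples: List[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample such that
    at least p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(samples)  # fastest to slowest
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical response times in milliseconds.
response_times_ms = [120, 95, 180, 150, 400, 110, 130, 900, 160, 140]
p50 = percentile(response_times_ms, 50)
p80 = percentile(response_times_ms, 80)
p99 = percentile(response_times_ms, 99)
print(p50, p80, p99)
```

Here the P80 value is below 200 ms even though two requests were much slower, which is why high percentiles such as P99 are usually tracked alongside the median.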

Percentiles are commonly used to describe and define service level objectives (SLOs) and service level agreements (SLAs).
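An objective of this kind can be written as a threshold plus a target fraction and then checked against measured data. The sketch below assumes a hypothetical objective ("at least 80% of requests finish within 200 ms") and invented samples; `meets_slo` is an illustrative name.

```python
from typing import List


def meets_slo(response_times_ms: List[float], threshold_ms: float, target_fraction: float) -> bool:
    """True if at least target_fraction of responses finished within threshold_ms."""
    if not response_times_ms:
        raise ValueError("no samples to evaluate")
    within = sum(1 for t in response_times_ms if t <= threshold_ms)
    return within / len(response_times_ms) >= target_fraction


# Hypothetical response times in milliseconds.
samples = [120, 95, 180, 150, 400, 110, 130, 900, 160, 140]
print(meets_slo(samples, threshold_ms=200, target_fraction=0.8))
print(meets_slo(samples, threshold_ms=200, target_fraction=0.95))
```

The same data satisfies the looser objective and fails the stricter one, which shows that the target fraction matters as much as the threshold when agreeing on an SLO.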

5. How to make a maintainable system

We often find that projects become harder to work with as time passes and as the business and the people change, and “refactoring” has become a buzzword in many companies. How do you avoid the cycle of repeated refactoring in the first place?

Three principles of software system design

  1. Operability

    The design should take day-to-day operations and monitoring fully into account, and leave enough capacity to handle emergencies.

    Preserve and pass on knowledge of the system, so that personnel changes do not leave new staff with a different understanding of how it works.

    Keep the system's behavior predictable.

  2. Simplicity

    Reduce system complexity so that every engineer can easily understand the system.

    One of our biggest goals in building a system should be to keep it simple by abstracting the complex real-world business into something manageable. Good abstractions can hide a lot of implementation detail and expose a clean, understandable interface to the outside world.

  3. Evolvability

    In other words, the system should be able to adapt easily as business requirements change.
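The point about abstraction under the simplicity principle can be illustrated with a small interface. This is only a sketch: `KeyValueStore`, `InMemoryStore`, and `remember_user` are hypothetical names chosen for the example.

```python
from typing import Dict, Optional, Protocol


class KeyValueStore(Protocol):
    """The small interface that callers depend on."""

    def put(self, key: str, value: str) -> None: ...

    def get(self, key: str) -> Optional[str]: ...


class InMemoryStore:
    """One possible implementation; callers never see the dict inside."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)


def remember_user(store: KeyValueStore, user_id: str, name: str) -> Optional[str]:
    """Business code written only against the interface."""
    store.put(f"user:{user_id}", name)
    return store.get(f"user:{user_id}")


store = InMemoryStore()
print(remember_user(store, "42", "Ada"))
```

Because `remember_user` knows only about `put` and `get`, the storage could later be replaced by a database-backed class without touching the business code, which also supports evolvability.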

6. Summary

A business system must fulfill a variety of expected requirements, including functional requirements (mostly product requirements) and some non-functional requirements.

1. Functional requirements

That is, what the system needs to do: storing, retrieving, and processing data in various ways.

2. Non-functional requirements

General properties such as security, reliability, compliance, scalability, compatibility, and maintainability.

With these less visible requirements in mind, I hope you will feel more confident the next time you face a challenge from the product side.

References:

Designing Data-Intensive Applications
