The wallet team of the ByteDance Open Platform was responsible end to end for crediting, displaying and using rewards across ByteDance's eight apps during the 2022 Spring Festival campaign. This article introduces that work. It first covers the business background and the overall technical architecture, then walks through the concrete solution to each difficulty, and finally abstracts some general patterns, in the hope that they will be a useful guide for later campaigns.

1. Background & Challenges & Goals

1.1 Service Background

(1) Eight apps: ByteDance's 2022 Spring Festival campaign had to support reward interoperability across eight apps (Douyin, Douyin Volcano, Douyin Extreme Version, Watermelon, Toutiao, Toutiao Extreme Version, Tomato Novel and Tomato Chang Listen). Users can take part in the campaign in any one app and withdraw or use their rewards in any of the others.

(2) Varied gameplay: mainly card collection, friends-page red envelope rain, red envelope rain, the card-collection lottery and the fireworks gala.

(3) Varied rewards: reward types include cash red envelopes, subsidized video red envelopes, commercial advertising coupons, e-commerce coupons, payment coupons, consumer finance coupons, insurance coupons, credit card coupons, Xicha coupons, movie coupons, Dou+ coupons, Douyin cultural-creative coupons, avatar pendants, etc.

1.2 Core Challenges

(1) Design and implement a high-traffic solution for crediting, displaying and using rewards across the eight apps, with reward issuance estimated to peak at 3.6 million QPS (360W, where W = 10,000).

(2) Many reward scenarios and varied gameplay: there are more than 10 reward types, and they connect to multiple downstream systems.

(3) Guarantee, across the board, the stability of the reward system, the user experience, fund security and basic operational capabilities, so that the campaign runs smoothly.

1.3 Final Objectives

(1) Reward crediting: design and implement a reward crediting system for the eight apps that connects to the various downstream reward systems, smooths over their differences, hides the details of the underlying crediting from upstream callers, and exposes one unified interface protocol to upstream businesses. It provides unified error handling, idempotent crediting and reward budget control.

(2) Reward display/use: design and implement the activity wallet page, which displays the rewards a user has obtained in any of the eight apps and lets the user view and withdraw cash, use coupons and pendants, and so on.

(3) Basic capabilities:

  • [Basic SDK] Provide a basic SDK that business teams can call to query the red envelope balance, accumulated income, whether a user has received any reward in the Spring Festival campaign, etc.
  • [Budget control] Work with the algorithm strategy of the upstream issuing side to control inventory when coupons are credited under heavy traffic, preventing over-issuance.
  • [Withdrawal control] After the multiple rounds of reward issuance on New Year's Eve, provide gray-scale (gradual) rollout of cash withdrawal, and handle rewards that have not yet been credited at the moment a user withdraws.
  • [Operational intervention] Provide flexible operational configuration of the activity page so that announcements can be published quickly and reach users in time. To cope with black swan events, support bulk reissue of coupons and red envelopes.

(4) Stability guarantee: keep the wallet's core path stable and robust under heavy crediting traffic, and protect the core user experience on the reward side through common stability measures such as resource scaling, rate limiting, circuit breaking, degradation, fallbacks and resource isolation.

(5) Fund security: under heavy crediting traffic, guarantee fund security through idempotency, reconciliation, monitoring and alerting, so that users receive exactly what they are owed, no more and no less.

(6) Activity isolation: isolate reward crediting and display data between the three stages (internal test, gray-scale rehearsal and the official Spring Festival campaign) so that they do not affect each other.

2. Product requirements

Users can take part in ByteDance's Spring Festival campaign in any of the eight apps to obtain rewards. Taking a cash red envelope from Douyin's red envelope rain as an example, the business flow is as follows:

Log in to Douyin → take part in the campaign → activity wallet page → tap the withdraw button → withdrawal page → withdrawal result page. The activity wallet page can also be reached from the Wallet page.

Core reward issuance scenarios:

  1. Card collection: drawing cards awards various coupons, the card-collection koi (lucky winner) also receives a large cash red envelope, and the card-collection prize draw distributes prize money and coupons;
  2. Red envelope rain: issues red envelopes, coupons and video-subsidy red envelopes, each with an estimated peak of 1.8 million QPS (180W);
  3. Fireworks gala: issues red envelopes, coupons and avatar pendants.

3. Design and implementation of the wallet asset middle platform

In the 2022 Spring Festival campaign, UG was mainly responsible for the gameplay itself, including the business logic and stability of card collection, red envelope rain, the fireworks gala and other specific activities. The wallet side was positioned to handle reward crediting, reward display, reward use and fund security under heavy traffic. Within it, the asset middle platform was responsible for reward distribution and reward display.

3.1 Overall architecture of the asset middle platform during the Spring Festival

The overall architecture diagram of the asset middle platform is as follows:

The core systems of the wallet asset middle platform are divided as follows:

  1. Asset order layer: converges the reward crediting links of the eight apps and provides a unified interface protocol to upstream activity businesses such as UG, the incentive middle platform and video red envelopes. It implements reward issuance while hiding the downstream reward-specific processing logic from upstream callers, and supports budget control, compensation and order-number idempotency.
  2. Activity wallet API layer: converges the reward display links of the eight apps and supports heavy-traffic scenarios.

3.2 Asset order center design

Core issuance model:

Description:

  1. The activity ID uniquely identifies one activity. This Spring Festival campaign was assigned its own parent activity ID.
  2. A scenario ID maps one-to-one to a specific reward type and defines the configuration specific to issuing rewards in that scenario. Configurable items include: the bill copy shown for the reward; whether compensation is needed; rate-limiting configuration; whether inventory control is applied; and whether reconciliation is performed. These optional services are pluggable.

Effect:

  1. Configuration is isolated between different activities.
  2. The configuration of each activity is a tree, so one activity can issue several reward types and one reward type can issue several reward IDs.
  3. One reward ID can have several issuance scenarios, each with its own configuration.

Order number design:

The asset order layer makes reward issuance idempotent at the order-number level. The order number is built as ${actID}_${scene_id}_${rain_id}_${award_type}_${statge}. Because repeated requests for the same user, scenario, round, reward type and stage always produce the same order number, the order number design itself guards against over-issuance.
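The idempotency described above can be sketched as follows. This is a minimal illustration, not the wallet's actual code: the field types, the OrderKey and Ledger names, and the in-memory map (standing in for a database unique key on the order number) are all assumptions.

```go
package main

import (
	"fmt"
	"sync"
)

// OrderKey holds the fields named in the order-number template.
// The Go types are assumptions; the Stage field corresponds to the
// last placeholder of the template.
type OrderKey struct {
	ActID     int64
	SceneID   string
	RainID    int32
	AwardType string
	Stage     string
}

// BuildOrderNo joins the fields with underscores, in template order.
func BuildOrderNo(k OrderKey) string {
	return fmt.Sprintf("%d_%s_%d_%s_%s", k.ActID, k.SceneID, k.RainID, k.AwardType, k.Stage)
}

// Ledger is a hypothetical in-memory stand-in for a store that enforces
// uniqueness on the order number.
type Ledger struct {
	mu      sync.Mutex
	credits map[string]int64
}

func NewLedger() *Ledger { return &Ledger{credits: make(map[string]int64)} }

// Credit records the amount only the first time an order number is seen.
// It reports whether this call actually credited the reward.
func (l *Ledger) Credit(orderNo string, amount int64) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if _, done := l.credits[orderNo]; done {
		return false // duplicate request: nothing is credited again
	}
	l.credits[orderNo] = amount
	return true
}

func main() {
	no := BuildOrderNo(OrderKey{ActID: 1001, SceneID: "rain", RainID: 3, AwardType: "cash", Stage: "formal"})
	l := NewLedger()
	fmt.Println(no, l.Credit(no, 188), l.Credit(no, 188))
}
```

A retried request rebuilds the same order number, so the second Credit call is a no-op rather than a second payout.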

4. Solving the core difficulties

4.1 Difficulty 1: Supporting reward data interoperability across the eight apps

As described in the background, eight apps took part in the 2022 Spring Festival campaign, and the Douyin family and the Toutiao family of apps use different account systems, so their data cannot be joined directly by user ID. The solution is that the ByteDance account center generates a unique actID for each user across the account systems of the eight apps (the mobile phone number has the highest priority: if the same phone number is logged in on different apps, the actID is the same on all of them). On top of the unique actID provided by the account center, the wallet side designed and implemented a general scheme for crediting, viewing and using rewards across the eight apps: each user's reward data is bound to the actID, and both crediting and querying are keyed on the actID, which makes rewards interoperable across all eight apps.

The schematic diagram is as follows:

4.2 Difficulty 2: Crediting rewards under high-traffic scenarios

In every year's Spring Festival campaign, issuing cash red envelopes is the most critical link, and this year was no exception. There are several reasons:

  1. Cash red envelope issuance was estimated to peak at 1.8 million TPS (180W).
  2. Cash red envelopes are high in value, so fund security must be guaranteed.
  3. Users are very sensitive to cash, so the user experience and functional completeness must be guaranteed.

For these reasons, issuing cash red envelopes poses a considerable technical challenge.

Issuing a red envelope is in essence a transfer: money flows from the company's account to a personal account.

(1) The technical solution must be idempotent at the order-number level: multiple requests with the same order number may be credited only once. The order number is generated as ${actID}_${scene_id}_${rain_id}_${award_type}_${statge}.

(2) To support high concurrency, there are two traditional schemes:

Scheme 1: Synchronous crediting
  • Implementation approach: apply for computing and storage resources that match the estimated traffic.
  • Advantages: 1. Simple to develop; 2. Not error-prone.
  • Disadvantages: wastes storage cost. Taking the account database as an example, according to actual stress-test results, supporting 300,000 (30W) red envelopes per second requires 152 database instances, and supporting 1.8 million (180W) requires at least 1152 database instances, not counting TCE, Redis and other computing and storage resources.

Scheme 2: Asynchronous crediting
  • Implementation approach: apply for only part of the computing and storage resources, so the actual crediting capacity is lower than the estimated issuance traffic.
  • Advantages: 1. Simple to develop; 2. Not error-prone; 3. Does not waste resources.
  • Disadvantages: the user experience suffers badly. Crediting is heavily delayed; for this year's campaign the delay would be more than ten minutes. After taking part in the game and winning a reward, users would not see it on the activity wallet page and could not withdraw it. This would generate a large number of complaints and hurt the effect of the Douyin campaign.

Both traditional schemes have obvious drawbacks. Is there a solution that saves resources and still protects the user experience? In the end we adopted the red envelope rain token scheme. It combines asynchronous crediting with a small amount of distributed storage, at the cost of a more complex design. The details are described below.

4.2.1 Red envelope rain token scheme

In this Spring Festival campaign, the red envelope rain, card-collection prize draw and fireworks gala all involve issuing red envelopes under enormous traffic. As mentioned above, peak issuance was estimated at 1.8 million QPS (180W), and supporting that with the existing crediting design would require a large amount of storage and computing resources. By our calculation, the wallet needs to support at least 300,000 TPS (30W) of actual crediting, so during issuance orders back up in a queue before they are credited.

Design objectives:

Protect the core user experience even though there is a large gap between what the campaign issues to users (estimated at 1.8 million per second) and what is actually credited (300,000 per second). Users must not notice the backlog when viewing or using rewards on the front end, that is, neither display nor use may be affected. Display covers the balance, accumulated income and the red envelope transaction list; use covers withdrawal and so on.

Specific design scheme:

Under heavy traffic, every time a red envelope is issued to a user we generate an encrypted token (asymmetrically encrypted, containing the meta-information of the issuance: the red envelope amount, actID, issuing time, etc.). The token is stored on both the client and the server, each acting as a disaster recovery backup for the other. Every user has a token list, and each time a red envelope is issued the token's status is recorded in Redis. The red envelope transaction list, balance and other data that the user sees on the activity wallet page are then the merged result of: the list of credited red envelopes, plus the token list, minus the tokens that have already been credited or are being credited. In addition, so that users do not notice the backlog when withdrawing, all tokens that have not yet been credited are force-credited when the user enters the withdrawal page or taps the withdraw button. The account balance at withdrawal time is therefore the full amount the user is owed, and the withdrawal flow is never blocked.
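The merge rule above (credited list + token list − tokens already credited) can be illustrated with a short sketch. The Entry type and the choice to deduplicate by order number are assumptions made for illustration; the article does not show how the real merge is keyed.

```go
package main

import "fmt"

// Entry is a hypothetical minimal record shared by credited red envelopes
// and tokens: an order number plus an amount in the smallest currency unit.
type Entry struct {
	OutTradeNo string
	Amount     int64
}

// DisplayBalance returns the balance the wallet page would show: every
// credited red envelope, plus every token whose order number has not been
// credited yet. Each order number is counted at most once.
func DisplayBalance(credited, tokens []Entry) int64 {
	seen := make(map[string]bool, len(credited)+len(tokens))
	var total int64
	for _, e := range credited {
		if seen[e.OutTradeNo] {
			continue
		}
		seen[e.OutTradeNo] = true
		total += e.Amount
	}
	for _, t := range tokens {
		if seen[t.OutTradeNo] {
			continue // already credited: do not count it twice
		}
		seen[t.OutTradeNo] = true
		total += t.Amount
	}
	return total
}

func main() {
	credited := []Entry{{"a", 100}, {"b", 200}}
	tokens := []Entry{{"b", 200}, {"c", 50}} // "b" is already credited, "c" is still queued
	fmt.Println(DisplayBalance(credited, tokens))
}
```

With this rule the displayed balance does not change at the moment a queued token is finally credited, which is what keeps the backlog invisible to the user.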

The schematic diagram is as follows:

Token data structure:

The token uses the PB (Protocol Buffers) format. Unit tests showed that it takes about half the storage of the JSON equivalent, which saves network bandwidth and storage cost. Serialization and deserialization also consume less CPU.

    type RedPacketToken struct {
        AppID      int64  `protobuf:"varint,1,opt"`
        ActID      int64  `protobuf:"varint,2,opt" json:"UserID,omitempty"`     // actID
        ActivityID string `protobuf:"bytes,3,opt" json:"ActivityID,omitempty"`  // activity ID
        SceneID    string `protobuf:"bytes,4,opt" json:"SceneID,omitempty"`     // scene ID
        Amount     int64  `protobuf:"varint,5,opt" json:"Amount,omitempty"`
        OutTradeNo string `protobuf:"bytes,6,opt"`
        OpenTime   int64  `protobuf:"varint,7,opt"`
        RainID     int32  `protobuf:"varint,8,opt,name=RainID"`
        Status     int64  `protobuf:"varint,9,opt,name=Status" json:"Status,omitempty"`
    }

Token state machine flow:

Before the account service is actually called to credit the reward, the token is set to the processing state (2). When the call succeeds, the token is set to the success state (8). Issuing a red envelope has no failed state: any failed attempt can be retried later until it succeeds.
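A minimal sketch of this state flow is shown below. The status values 2 and 8 come from the text; the initial status value, the Token type, the Settle function and the credit callback are assumptions introduced for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	StatusInit       int64 = 0 // assumption: the initial value is not given in the article
	StatusProcessing int64 = 2 // set before the account service is called
	StatusSuccess    int64 = 8 // set after the account service call succeeds
)

// ErrDownstream simulates a transient failure of the account service.
var ErrDownstream = errors.New("downstream timeout")

// Token is a reduced, hypothetical view of the red envelope token.
type Token struct {
	OutTradeNo string
	Amount     int64
	Status     int64
}

// Settle tries to credit one token. On failure the token stays in the
// processing state so a later retry can finish it; there is no failed state.
func Settle(t *Token, credit func(orderNo string, amount int64) error) error {
	if t.Status == StatusSuccess {
		return nil // already credited, nothing to do
	}
	t.Status = StatusProcessing
	if err := credit(t.OutTradeNo, t.Amount); err != nil {
		return err
	}
	t.Status = StatusSuccess
	return nil
}

func main() {
	calls := 0
	flaky := func(orderNo string, amount int64) error {
		calls++
		if calls == 1 {
			return ErrDownstream
		}
		return nil
	}
	tok := &Token{OutTradeNo: "o1", Amount: 100, Status: StatusInit}
	fmt.Println(Settle(tok, flaky), tok.Status)
	fmt.Println(Settle(tok, flaky), tok.Status)
}
```

Retrying is only safe because crediting is idempotent on the order number: a retry after an ambiguous timeout cannot pay the same red envelope twice.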

Token security guarantee:

Asymmetric encryption is used to make the token as hard as possible to crack on the client. The encryption code lives in a private repository with restricted access. In the extreme case that the token encryption is broken by attackers, this can be detected through monitoring and alerting, and the token scheme can then be degraded.

4.2.2 Displaying the red envelope transaction list on the activity wallet page

Requirement background:

The red envelope transaction list on the activity wallet page merges three data sources: cash red envelope records, withdrawal records and C2C red envelope records. It is sorted by creation time in descending order, must support paging, must be degradable, and must keep the backlog in cash red envelope issuance invisible to users.

4.3 Difficulty 3: Guaranteeing stability when the reward issuance link has many dependencies

The degradation diagram of the red envelope issuance flow is as follows:

Experience shows that the more complex a feature is, the more dependencies it has, and the higher the corresponding stability risk. How can we keep a system with many dependencies stable?

Solution:

The most basic function of crediting a cash red envelope is to record the red envelope the user has received, while supporting idempotency and budget control (to avoid over-issuance). The idempotent design of red envelope accounting relies strongly on the database to maintain transactional consistency. In extreme cases, however, intermediate links may fail. Weak dependencies must be degradable so that they do not affect the main issuance flow. The shortest path for issuing a red envelope on the wallet side depends only on the compute resources of the service instances and on MySQL storage.

Strong and weak dependencies of the red envelope issuance path:

Asset middle platform (psm)
  • tcc. Strong dependency: yes. Degradation scheme: degrade to reading the local cache. Impact after degradation: none.
  • ByteKV. Strong dependency: no. Degradation scheme: active degradation switch; skip ByteKV and rely on the downstream for idempotency. Impact after degradation: none.

Fund transaction layer (psm)
  • Distributed lock Redis. Strong dependency: no. Degradation scheme: passive degradation; if the call fails, skip it. Impact after degradation: basically none.
  • Token Redis. Strong dependency: no. Degradation scheme: active degradation switch; Redis is not called. Impact after degradation: users notice delayed crediting, and there will be many complaints.
  • MySQL. Strong dependency: yes. Degradation scheme: if the master fails, contact the DBA to switch the master. Impact after degradation: red envelope issuance is unavailable during the failure.

4.4 Difficulty 4: Budget control when issuing coupons under heavy traffic

Requirement background:

The fireworks gala starts at 7:30 p.m. on New Year's Eve and is a scenario of concentrated, high-traffic coupon issuance. The wallet side and the algorithm strategy work together to control coupon inventory and prevent over-issuance.

Concrete implementation:

(1) The wallet asset middle platform maintains the consumed amount and the total issuable amount of each coupon template ID.

(2) Before each coupon is issued, the algorithm strategy calls the wallet SDK to read the consumed amount and total inventory of that coupon template ID. A threshold is also set: if the remaining inventory of a coupon is below 10%, that coupon is no longer issued (a fallback coupon or a blessing message is given instead).

(3) In addition, the wallet asset middle platform accumulates the consumption of each coupon template ID during issuance (using the Redis INCR command for atomic accumulation) and compares it with the total inventory of the activity. If consumption exceeds the total inventory, the request is rejected to prevent over-issuance. This is the last line of defense.
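The two checks above can be sketched together. This is an illustration under assumptions: an in-process atomic counter stands in for the Redis INCR counter, and the CouponStock type and its method names are hypothetical.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// CouponStock tracks one coupon template ID. The used counter plays the
// role of the Redis key that is incremented on every issuance.
type CouponStock struct {
	Total int64
	used  int64
}

// CanIssue mirrors the strategy-side pre-check: stop offering the coupon
// once the remaining inventory drops below 10% of the total.
func (s *CouponStock) CanIssue() bool {
	remaining := s.Total - atomic.LoadInt64(&s.used)
	return remaining*10 >= s.Total
}

// Consume mirrors the wallet-side safeguard: atomically add one to the
// consumption and reject the request if the total is exceeded. Like a plain
// INCR, a rejected request still leaves the counter incremented.
func (s *CouponStock) Consume() bool {
	return atomic.AddInt64(&s.used, 1) <= s.Total
}

func main() {
	s := &CouponStock{Total: 100}
	for i := 0; i < 91; i++ {
		s.Consume()
	}
	// 9 coupons remain, which is below the 10% threshold.
	fmt.Println(s.CanIssue(), s.Consume())
}
```

The pre-check is advisory and may act on slightly stale numbers; the atomic increment is what actually enforces the ceiling.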

Specific flow chart:

Directions for optimization:

(1) When Redis is used for counting under heavy traffic, a single key can become a hot key; it needs to be split into multiple keys.

(2) Redis operations can time out under heavy traffic. When a timeout is returned to the upstream, the upstream retries the issuance, so more inventory is consumed than is actually issued (under-issuance). For this Spring Festival campaign, the configured inventory was set 5% above the estimated inventory to mitigate the under-issuance caused by timeouts.

4.5 Difficulty 5: Read and write stability of a hot key under high QPS

Requirement background:

When the fireworks gala starts at 7:30 p.m. on New Year's Eve, the page shows in real time the total amount of all red envelopes issued by the red envelope rain and the fireworks gala. Peak traffic was estimated at 1.8 million QPS (180W) for reads and 300,000 QPS (30W) for writes.

This is a typical scenario with heavy traffic on a hot key, where update delay is tolerable and strong data consistency is not required (the number only needs to keep increasing). Disaster recovery and degradation must also be supported. The end requirement is that the displayed amount differs from the amount the product expects to have issued by less than 1%.

4.5.1 Scheme 1

Access is provided as an SDK, which reuses the compute resources of the calling service's instances. For high-QPS reads and writes of a single key, a Redis distributed cache is the obvious choice, but all reads and writes of a single key land on one instance, and stress tests showed a single-instance bottleneck of 30,000 QPS (3W). So one optimization was to split the counter into multiple keys and to use a local cache as a fallback.

Write process:

Each request picks one of 100 keys by computing actID % 100, and uses the INCR command to accumulate the amount on that key.

Read process:

As with writes, the local cache is read first. If the local cache value is 0, the values of all the Redis keys are read and summed, and the sum is returned.
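The key-splitting idea of Scheme 1 can be sketched as below. An in-memory array stands in for the 100 Redis keys, and the ShardedCounter type, the ShardKey helper and the key naming are assumptions; actID is assumed to be non-negative.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

const shardCount = 100

// ShardedCounter spreads one logical counter over shardCount slots so that
// no single slot (Redis key) receives all of the write traffic.
type ShardedCounter struct {
	shards [shardCount]int64
}

// ShardKey shows how a per-shard key name could be derived (hypothetical naming).
func ShardKey(prefix string, actID int64) string {
	return fmt.Sprintf("%s_%d", prefix, actID%shardCount)
}

// Add accumulates the amount on the shard selected by actID % 100.
func (c *ShardedCounter) Add(actID, amount int64) {
	atomic.AddInt64(&c.shards[actID%shardCount], amount)
}

// Total reads every shard and sums them. With real Redis keys this is the
// read fan-out that makes Scheme 1 expensive.
func (c *ShardedCounter) Total() int64 {
	var sum int64
	for i := range c.shards {
		sum += atomic.LoadInt64(&c.shards[i])
	}
	return sum
}

func main() {
	c := &ShardedCounter{}
	c.Add(7, 100)
	c.Add(107, 50) // same shard as actID 7
	c.Add(8, 1)
	fmt.Println(ShardKey("total_amount", 107), c.Total())
}
```

Writes scale with the number of shards, but every read has to touch all of them, which leads to the problems listed next.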

Problems:

(1) Splitting into 100 keys causes read amplification: more Redis resources are needed and the storage cost is high. Reads can also time out, so there is no guarantee that all keys are read successfully every time, and the returned total may therefore be lower than the previous one.

(2) For disaster recovery, applying for a backup Redis requires more storage resources and additional storage cost.

4.5.2 Scheme 2

Design idea:

Scheme 2 optimizes Scheme 1 while taking into account that the number must keep increasing, that cost should be reduced, and that disaster recovery is needed. For writes, a local cache merges write requests through atomic accumulation. For reads, the local cache value is returned, which avoids extra storage resources. Redis serves as the central store, so every instance eventually reads the same value.

Specific design scheme:

When each Docker instance starts, it launches scheduled tasks: one that reads from Redis and one that writes to Redis.

Read process:

  1. A local scheduled task runs once per second, reads the value of the single Redis key, and updates the local cache if the value read is greater than the local cache value.
  2. The exposed SDK simply returns the local cache value.
  3. One caveat: during the first second after an instance starts there is no data yet, so the read blocks until data is available.

Write process:

  1. Since all reads are served from the local cache (which never expires), the write path only has to handle concurrent writes.
  2. The local write cache is a variable updated with Go's atomic.AddInt64, which gives atomic accumulation of locally written values.
  3. On each scheduled Redis update, the local write cache is first copied into a variable amount, then amount is atomically subtracted from the local write cache, and finally amount is added to the single Redis key with INCR. In this way the value of the single Redis key keeps increasing.
  4. For disaster recovery, data is also written to a backup Redis cluster. If the primary cluster fails, a configuration switch makes reads go to the backup Redis. Data consistency between the two Redis clusters is maintained by a scheduled task.
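The read and write steps above can be sketched as follows. A shared int64 stands in for the single Redis key, and the LocalCounter type with its Flush and Refresh methods is a hypothetical shape for what the two scheduled tasks do; timers, the startup blocking read and the backup cluster are left out.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// LocalCounter is the per-instance state: a write cache that merges local
// increments and a read cache that serves every SDK read.
type LocalCounter struct {
	pending int64 // local write cache, not yet pushed to the central store
	cached  int64 // local read cache, last value seen in the central store
}

// Add merges one write into the local write cache.
func (c *LocalCounter) Add(amount int64) { atomic.AddInt64(&c.pending, amount) }

// Get serves reads from the local cache only.
func (c *LocalCounter) Get() int64 { return atomic.LoadInt64(&c.cached) }

// Flush is the scheduled write task: copy the pending value, subtract exactly
// that copy, then add it to the central counter. Writes that arrive between
// the copy and the subtraction stay in pending for the next flush.
func (c *LocalCounter) Flush(central *int64) {
	amount := atomic.LoadInt64(&c.pending)
	if amount == 0 {
		return
	}
	atomic.AddInt64(&c.pending, -amount)
	atomic.AddInt64(central, amount) // stand-in for incrementing the Redis key
}

// Refresh is the scheduled read task: only move the cached value forward,
// so the displayed number never goes down. It is assumed to run on a single
// goroutine per instance.
func (c *LocalCounter) Refresh(central *int64) {
	if v := atomic.LoadInt64(central); v > atomic.LoadInt64(&c.cached) {
		atomic.StoreInt64(&c.cached, v)
	}
}

func main() {
	var central int64
	a, b := &LocalCounter{}, &LocalCounter{}
	a.Add(100)
	a.Add(50)
	b.Add(25)
	a.Flush(&central)
	b.Flush(&central)
	a.Refresh(&central)
	b.Refresh(&central)
	fmt.Println(a.Get(), b.Get())
}
```

Subtracting the copied amount, rather than resetting the write cache to zero, is what keeps concurrent increments from being lost during a flush.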

In this scheme the traffic to Redis is proportional to the number of instances. According to our survey, the main-venue service on the read side has 20,000 instances and the asset middle platform on the write side has 8,000 instances, so the QPS that Redis must actually support is 28,000 divided by the execution interval of the scheduled task (in seconds). Stress tests showed that a single Redis instance can sustain 20,000 GET and 8,000 INCR operations per second on a single key, so the execution interval was set to 1 s. If there are more instances, the interval can be lengthened.

The specific flow chart is as follows:

4.5.3 Comparison of schemes

Scheme 1
  • Advantages: 1. Low implementation cost.
  • Disadvantages: 1. Wastes storage resources; 2. Disaster recovery is difficult; 3. Cannot guarantee that the number keeps increasing.

Scheme 2
  • Advantages: 1. Saves resources; 2. The disaster recovery solution is simple and also saves resource cost.
  • Disadvantages: 1. The implementation is slightly more complex, and atomic accumulation under concurrency has to be handled.

Conclusion:

Considering implementation effect, resource cost and disaster recovery, Scheme 2 was chosen for production.

4.6 Difficulty 6: Smooth switching between parent and child activities

Requirement background:

To ensure the final launch effect and delivery quality of this Spring Festival campaign, it was carried out in three stages.

(1) The first stage is the internal test stage.

(2) The second stage is the external rehearsal stage, in which some external users are selected to verify the Spring Festival features (gray-scale rollout). This is the most effective way to expose problems and to verify the corresponding response mechanisms, while keeping the impact under control.

(3) The third stage is the official Spring Festival campaign.

The product requirement is that the three stages are independent: obtaining, displaying and using rewards are all isolated between stages.

Technical challenges:

Several upstream systems call the wallet to issue rewards, and the wallet in turn connects to several downstream reward businesses. Asking all of them to change configuration together has a high communication cost and a relatively high probability of misconfiguration. Moreover, the changes cannot be made at exactly the same moment, which creates a significant technical risk.

Design idea:

As the only entry point for reward crediting, the wallet asset middle platform takes over the switching of activity configuration for the whole campaign. Configuration is organized hierarchically into a parent activity and child activities. Upstream request parameters always carry the parent activity ID, which represents the Spring Festival campaign. The wallet asset middle platform decides, based on the request time, which child activity's configuration to use for issuing rewards, which meets the product requirement of running different activities in different time periods. This reduces communication cost, lowers the probability of misconfiguration, and lets all parties switch at the same moment, which greatly improves development and testing efficiency.
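A minimal sketch of time-based resolution from a parent activity to a child activity follows. The type names, the activity IDs and all dates are hypothetical; the article does not give the real stage boundaries.

```go
package main

import (
	"fmt"
	"time"
)

// SubActivity is one stage of the campaign with a half-open time window.
type SubActivity struct {
	ID    string
	Start time.Time // inclusive
	End   time.Time // exclusive
}

// ParentActivity is the only ID that upstream callers need to know.
type ParentActivity struct {
	ID   string
	Subs []SubActivity
}

// Resolve picks the child activity whose window contains the request time.
func (p ParentActivity) Resolve(at time.Time) (SubActivity, bool) {
	for _, s := range p.Subs {
		if !at.Before(s.Start) && at.Before(s.End) {
			return s, true
		}
	}
	return SubActivity{}, false
}

// date is a small helper for the made-up 2022 stage boundaries below.
func date(month time.Month, day int) time.Time {
	return time.Date(2022, month, day, 0, 0, 0, 0, time.UTC)
}

// DemoParent builds an example configuration; every value is illustrative.
func DemoParent() ParentActivity {
	return ParentActivity{ID: "spring_2022", Subs: []SubActivity{
		{ID: "internal_test", Start: date(time.January, 5), End: date(time.January, 12)},
		{ID: "gray_rehearsal", Start: date(time.January, 12), End: date(time.January, 20)},
		{ID: "formal", Start: date(time.January, 20), End: date(time.February, 16)},
	}}
}

func main() {
	p := DemoParent()
	sub, ok := p.Resolve(date(time.January, 15))
	fmt.Println(sub.ID, ok)
}
```

Because the windows are half-open and adjacent, a request at a stage boundary resolves to exactly one child activity, and callers never need a coordinated configuration change.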

Schematic diagram:

4.7 Difficulty 7: Fund security under heavy traffic

During this Spring Festival campaign the wallet side did three things to secure the funds issued as cash red envelopes under heavy traffic and a large budget:

  1. Interception based on the overall budget for cash red envelope issuance
  2. Interception based on the upper limit of a single cash red envelope
  3. Fund reconciliation for high-traffic red envelope scenarios
  • Hour-level reconciliation: H+1 hourly reconciliation is supported for red envelope rain, card collection and fireworks red envelopes, with an additional H+2 check for some scenarios.
  • Quasi-real-time reconciliation: red envelope data credited by the red envelope rain is reconciled in quasi-real time between the wallet asset middle platform and the activity side.

Multi-dimensional checking diagram:

Quasi-real-time reconciliation flow chart:

Description:

Monitoring and alerting on quasi-real-time reconciliation reveals abnormal crediting promptly. If an alert fires, an emergency plan is in place to handle it.

5. General pattern abstraction

Having gone through the design and implementation of the high-traffic Spring Festival campaign, we have some conclusions and lessons to share.

5.1 Disaster recovery and degradation

In heavy-traffic scenarios, disaster recovery must be in place to keep the campaign online. Follow common industry practice such as degradation, rate limiting, circuit breaking and resource isolation, and estimate storage usage from the expected number of participants and the expected campaign effect.

5.1.1 Rate limiting

(1) For rate limiting we used inbound rate limiting at the API layer (Nginx), distributed inbound rate limiting and distributed outbound rate limiting. These limiters are company-wide ByteDance middleware that has been proven under heavy traffic.

(2) We first stress-tested a single instance. Capacity was then expanded according to the traffic a single instance can carry and the traffic the service was estimated to receive during the campaign. Taking the capacity of downstream systems into account, we then completed a detailed configuration of TLB inbound rate limiting, inbound rate limiting and outbound rate limiting.

Rate-limiting objectives:

Keep our own services stable, prevent unexpected external traffic from overwhelming them, prevent an avalanche effect, and protect the core business and the core user experience.

Simple cluster rate limiting is rate limiting per instance: the QPS limit of each instance equals the total configured QPS limit divided by the number of instances. With many machines and a low QPS limit, the result can be inaccurate.
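As a rough illustration of why per-instance limiting becomes inaccurate at low QPS, the sketch below divides a cluster-wide limit across instances with integer division. The function names and numbers are made up, and rounding is only one possible source of error (uneven traffic distribution across instances is another).

```go
package main

import "fmt"

// PerInstanceLimit splits a cluster-wide QPS limit evenly across instances.
func PerInstanceLimit(totalQPS, instances int) int {
	if instances <= 0 {
		return 0
	}
	return totalQPS / instances
}

// EffectiveClusterLimit is what the cluster really admits when every
// instance enforces its own rounded-down share.
func EffectiveClusterLimit(totalQPS, instances int) int {
	return PerInstanceLimit(totalQPS, instances) * instances
}

func main() {
	// High QPS: the split is exact.
	fmt.Println(PerInstanceLimit(300000, 1000), EffectiveClusterLimit(300000, 1000))
	// Low QPS on many machines: 500 / 300 rounds down to 1 per instance,
	// so the cluster admits only 300 QPS instead of 500.
	fmt.Println(PerInstanceLimit(500, 300), EffectiveClusterLimit(500, 300))
}
```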

Distributed inbound and outbound rate limiting each come in two modes, one for high QPS and one for low QPS. The only difference for callers is which SDK mode and functions they use. Low-QPS limits generally need high accuracy, so a Redis counter is used and the caller provides its own Redis cluster. High-QPS limits tolerate lower accuracy and fall back to single-instance rate limiting, with each instance limited to the total QPS divided by the number of TCE instances.

5.1.2 Degradation

In high-traffic scenarios, every core feature must have a corresponding degradation scheme so that the core links stay stable in an emergency.

(1) For this Spring Festival campaign, reward crediting and the activity wallet page were covered by a full operations plan with 26 degradation switches in total. At the critical moment, less important features are sacrificed to protect the core, so that a single point of failure cannot affect the core link.

(2) Taking the cash red envelope issuance link as an example, in the most degraded state the wallet side depends only on Docker and MySQL; every other dependency can be degraded. If the MySQL master fails, the DBA can be contacted for an emergency master switch. Although none of these measures was needed in the end, they must be designed in advance to keep the campaign safe.

5.1.3 Resource isolation

(1) Improve development efficiency and do not reinvent the wheel. Because the wallet asset middle platform also serves Douyin's regular asset distribution needs, the existing interfaces and code paths were reused for campaign reward issuance.

(2) At the same time, clusters were isolated at the service level for this campaign: a dedicated activity cluster was created and the underlying storage resources were isolated, so that campaign traffic and regular traffic do not affect each other.

5.1.4 Storage estimation

(1) It is not enough to verify that Redis or MySQL can withstand the expected traffic. You must also estimate whether storage capacity is sufficient, based on how many users actually obtain, take part in and are issued rewards.

(2) ByteDance's Redis component can be scaled vertically (more storage per instance, up to 10 GB) or horizontally (up to 500 instances per data center). Because Redis is synchronized across three data centers, only the storage limit of one data center counts when calculating capacity. Leave enough buffer, because horizontal scaling is slow. If storage runs short in an emergency, the only option is a configuration switch that removes the dependency on that storage, and this has to be designed in advance.

5.1.5 Stress testing

For this Spring Festival campaign, reward crediting and the activity wallet page went through thorough full-link stress testing. Some lessons learned:

  1. Before the stress test, build monitoring dashboards for the whole link so that problems can be found quickly and easily during the test.
  2. For MySQL, before high-traffic official activities such as the red envelope rain begin, run a small-traffic stress test to warm up the database. Connections are then established before the traffic peak, which avoids the time cost of creating a large number of connections during the official activity and keeps the database layer of the red envelope link stable.
  3. During the stress test, the stress-test flag must be propagated along the whole link, so that stress-test traffic can be given special handling and does not interfere with normal online business.
  4. During the stress test, verify that computing and storage resources can withstand the estimated traffic.
  • Prepare the stress-test plan, set a reasonable initial traffic level based on past experience, raise the traffic gradually, and watch the stress-test metrics in real time.
  • Stress-test data in storage must be isolated from online data. For MySQL and ByteKV, create separate stress-test tables. For Redis and Abase, add a stress-test prefix to the online key.
  • Clean up stress-test data promptly. For Redis and Abase, setting a short expiration time lets the expiry mechanism do the cleanup. If the expiration time was forgotten, a script can identify keys by the stress-test prefix and delete them.
  5. After the stress test, check whether the storage resource metrics meet expectations.
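The key isolation and cleanup rules for Redis and Abase in item 4 can be sketched in a few lines. The prefix string, the TTL value and both function names are hypothetical choices for illustration.

```go
package main

import (
	"fmt"
	"time"
)

const (
	// StressPrefix is a made-up marker that separates stress-test keys
	// from online keys.
	StressPrefix = "stress_"
	// StressTTL is a made-up short expiration so that stress-test data
	// cleans itself up.
	StressTTL = 10 * time.Minute
)

// StorageKey returns the key to use for a request, adding the stress-test
// prefix when the request carries the stress-test flag.
func StorageKey(key string, isStress bool) string {
	if isStress {
		return StressPrefix + key
	}
	return key
}

// KeyTTL returns the expiration to set: a short one for stress-test data,
// the normal one otherwise.
func KeyTTL(isStress bool, normal time.Duration) time.Duration {
	if isStress {
		return StressTTL
	}
	return normal
}

func main() {
	fmt.Println(StorageKey("balance_1001", true), KeyTTL(true, 24*time.Hour))
	fmt.Println(StorageKey("balance_1001", false), KeyTTL(false, 24*time.Hour))
}
```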

5.2 Reflections on microservices

In day-to-day technical design we all follow the principles and conventions of microservice design, splitting modules by system responsibility and core data model so that teams can iterate efficiently without affecting each other. But microservices also have drawbacks. In high-traffic scenarios with complex features, a request passes through many hops, which consumes a great deal of computing resources. In this Spring Festival campaign, the asset middle platform provided an SDK package in place of RPC, aggregating the microservice links to expose basic capabilities such as querying the balance, checking whether a user has received a reward, and force-crediting. Peak access traffic reached the tens of millions, and compared with a microservice architecture this saved computing resources on the order of tens of thousands of CPUs.

6. Future system evolution

(1) Sort out the needs and pain points of upstream and downstream teams, optimize the design and implementation of the asset middle platform, improve its basic capabilities and service architecture, and provide a one-stop service so that integrating teams can focus on the business logic of their activities.

(2) Strengthen real-time and offline data dashboards so that reward distribution data is presented more clearly and accurately.

(3) Strengthen configuration tooling and documentation to reduce integration cost for internal teams and improve onboarding efficiency for external activity businesses.