One, foreword

As Vivo Mall's business channels continue to expand and the number of promotion mechanics keeps growing, the mall's original V2.0 architecture can no longer keep up. The promotion system therefore needed to be built as an independent system, decoupled from the mall, that provides dedicated support for the mall's marketing mechanics.

In this series we will introduce the problems we encountered while building the Vivo Mall promotion system and how we solved them, and share our architecture design experience.

Two, system architecture

2.1 Business review

Before introducing the business architecture, let us briefly review how the promotion capabilities of Vivo Mall were built up, starting with the state of promotion in Mall V2.0. The promotion function in V2.0 had the following problems:

1. The promotion model was not abstract enough, maintenance was chaotic, and activities had no independent inventory;

2. Mutual-exclusion relationships between activities were managed haphazardly, and there was no unified promotion pricing capability.

In the mall's core transaction flow, the pricing logic of the product detail page, the shopping cart and the order was maintained separately in each place, as shown in the figure below. As promotional offers are added or their rules change, the amount of duplicated development on the mall side grows significantly.

(Figure 2-1. Before the unification of promotion pricing)

3. Promotion performance could not keep up with the scale of activities and often degraded the performance of the mall's main site.

Because promotion was coupled with the mall system, it could not be optimized for performance on its own, so the system could not support increasingly frequent promotional activities under heavy traffic.

Based on these pain points, in the first phase we split the promotion system out of the mall as an independent system and built its core capabilities:

Promotion activity management

A unified promotion model and configuration management interface are abstracted for all promotional activities, providing creation, modification, query and statistics functions for activities. Activity inventory is also managed in one place, which makes it easier to control activity resources centrally.

Promotion pricing

Built on a flexible, abstract pricing engine, we defined a layered promotion pricing model and established unified rules for stacking offers and a unified pricing process. We then rolled promotion pricing out to every core step of the Vivo Mall transaction flow, so that promotional prices are calculated in one place across the whole flow, as shown in the figure below:

(Figure 2-2. After unified promotion pricing)

With the core capabilities of the first phase in place, business needs were largely met and all kinds of promotion mechanics were launched. This in turn brought operational pain points:

  • Configured promotional activities cannot be checked in advance to verify that they behave as expected;
  • As promotion mechanics multiply, a single product can qualify for more and more offers and configuration becomes more and more complex, so misconfiguration can easily cause production incidents.

We therefore started the second phase of the promotion system, focused on these operational pain points:

  • A time travel function lets users “travel” to a chosen point in the future, so that promotional activities can be checked in advance;
  • A price monitoring function, combined with the planning in the “mall marketing price capability matrix”, applies multi-dimensional monitoring before, during and after an activity, with the goal of “reducing the probability of errors and stopping losses promptly when errors occur”.
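As an illustration of the time travel idea, the following is a minimal Python sketch in which the "current time" used to evaluate activities can be overridden. The `Activity` type, the `active_activities` function and the sample activity names are hypothetical; the article does not describe how the real feature is implemented.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class Activity:
    """Hypothetical promotional activity with a validity window."""
    name: str
    start: datetime  # inclusive
    end: datetime    # exclusive


def active_activities(activities: List[Activity],
                      travel_to: Optional[datetime] = None) -> List[str]:
    """Return the names of activities in effect at `travel_to`.

    When `travel_to` is omitted, the real current time is used, so the same
    code path serves both normal requests and "time travel" previews.
    """
    moment = travel_to if travel_to is not None else datetime.now()
    return [a.name for a in activities if a.start <= moment < a.end]


activities = [
    Activity("spring-sale", datetime(2030, 3, 1), datetime(2030, 3, 8)),
    Activity("new-phone-launch", datetime(2030, 3, 5), datetime(2030, 3, 6)),
]

# Preview which activities a user would see at noon on 5 March 2030.
preview = active_activities(activities, travel_to=datetime(2030, 3, 5, 12, 0))
```

The point of the sketch is that the evaluation time is passed in as a parameter rather than read from the system clock deep inside the logic, which is what makes an advance check possible.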

2.2 Promotion and coupons

The main purpose of promotion is to convey offers on products to users and give them a benefit, attracting purchases in order to acquire new users and increase sales. From this perspective, coupons are part of promotion.

However, for the following reasons, the coupon system of Vivo Mall was not merged into the promotion system when the latter was made independent:

  • First, the coupon system has been independent since V2.0 and is already integrated with many upstream businesses; it has become a mature mid-platform system.
  • Second, coupons have business characteristics that other promotional offers do not, such as issuing and claiming coupons.

Given the cost of redesign and migration, coupons are not included in the scope of the promotion system. Coupons are still part of the discount on a product's price, however, so promotion pricing depends on the coupon system to supply coupon discounts.

2.3 Business architecture & processes

So far, we have sorted out the general capability matrix of the whole promotion system, and the overall architecture is designed as follows:

(Figure 2-3. Promotion System Architecture)

With the promotion system made independent, the mall's shopping flow relates to the promotion system as follows:

(Figure 2-4. The latest shopping process in the mall)

Three, technical challenges

As a mid-platform capability system, the promotion system faces the following technical challenges:

  • Given complex, frequently changing promotion mechanics and offer-stacking rules, how can the system be made extensible enough to meet changing promotion requirements while improving development and operations efficiency?
  • Given high-traffic scenarios such as new product launches and Double 11, how can the system meet the performance requirements of high concurrency?
  • Given a complex environment with untrusted calls from upstream business parties and unreliable services from downstream dependencies, how can the overall stability of the system be improved and high availability be ensured?

Based on our own business characteristics, we worked out the following technical solutions.

3.1 Extensibility

Extensibility is improved mainly in two areas:

  • The promotion model: a unified promotion model and configuration management interface are abstracted for all promotional activities;
  • The promotion pricing engine: a pricing engine is built and the pricing model is unified.

The detailed design will be explained in subsequent articles.

3.2 High Concurrency/High Performance

Caching

Caching is close to a “silver bullet” for performance problems, and the promotion system uses it heavily, both Redis caching and local (in-process) caching. Caching raises data consistency concerns. A Redis cache is shared by all service instances, so it is relatively easy to keep consistent; local caches live separately in each instance and are harder to invalidate. Local caching is therefore used only where the business scenario allows it: the data changes infrequently and the business can tolerate some inconsistency.
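The trade-off described above can be sketched as a short-lived local cache in front of a shared cache. This is a minimal Python illustration, not the system's actual code: `TwoLevelCache` is a hypothetical name, a plain dict stands in for Redis, and the key name and TTL are made up.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TwoLevelCache:
    """Local TTL cache in front of a shared cache (a dict stands in for Redis)."""

    def __init__(self, remote: Dict[str, str], local_ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._remote = remote
        self._local: Dict[str, Tuple[float, str]] = {}  # key -> (expiry, value)
        self._ttl = local_ttl_seconds
        self._clock = clock

    def get(self, key: str) -> Optional[str]:
        entry = self._local.get(key)
        if entry is not None and entry[0] > self._clock():
            return entry[1]                    # fresh local hit, no remote call
        value = self._remote.get(key)          # fall back to the shared cache
        if value is not None:
            self._local[key] = (self._clock() + self._ttl, value)
        else:
            self._local.pop(key, None)         # drop an expired local entry
        return value


remote = {"sku:1001:price": "2999"}
now = [0.0]                                    # controllable clock for the demo
cache = TwoLevelCache(remote, local_ttl_seconds=5.0, clock=lambda: now[0])

first = cache.get("sku:1001:price")            # loaded from the shared cache
remote["sku:1001:price"] = "2799"              # shared cache is updated
stale = cache.get("sku:1001:price")            # local copy is still served
now[0] = 6.0                                   # local entry has expired
fresh = cache.get("sku:1001:price")            # reloaded from the shared cache
```

The `stale` read shows the inconsistency window that the business has to accept: until the local entry expires, an instance can keep serving the old value.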

Batching

The promotion system is a typical read-heavy, write-light workload. On the read path, the biggest performance cost comes from I/O: database queries, Redis calls and remote calls to third parties. Converting these I/O operations into batch operations, trading space for time and reducing the number of I/O round trips, is one of our main performance optimizations.
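The effect of batching on round trips can be shown with a small Python sketch. `CountingStore` is a hypothetical in-memory stand-in for a remote store such as Redis or a database; the key format and batch size are assumptions chosen for illustration.

```python
from typing import Dict, List, Optional


class CountingStore:
    """In-memory stand-in for a remote store that counts round trips."""

    def __init__(self, data: Dict[str, str]) -> None:
        self._data = data
        self.round_trips = 0

    def get(self, key: str) -> Optional[str]:
        self.round_trips += 1                  # one round trip per key
        return self._data.get(key)

    def mget(self, keys: List[str]) -> List[Optional[str]]:
        self.round_trips += 1                  # one round trip per batch
        return [self._data.get(k) for k in keys]


def load_one_by_one(store: CountingStore, skus: List[str]) -> Dict[str, Optional[str]]:
    return {sku: store.get(f"promo:{sku}") for sku in skus}


def load_batched(store: CountingStore, skus: List[str],
                 batch_size: int = 100) -> Dict[str, Optional[str]]:
    result: Dict[str, Optional[str]] = {}
    for i in range(0, len(skus), batch_size):
        chunk = skus[i:i + batch_size]
        values = store.mget([f"promo:{sku}" for sku in chunk])
        result.update(zip(chunk, values))
    return result


skus = [str(n) for n in range(250)]
data = {f"promo:{sku}": f"offer-{sku}" for sku in skus}

single_store = CountingStore(data)
single_result = load_one_by_one(single_store, skus)

batch_store = CountingStore(data)
batch_result = load_batched(batch_store, skus, batch_size=100)
```

Both functions return the same data, but the batched version needs 3 round trips for 250 SKUs instead of 250. The batch size bounds the memory and payload size of each call, which is the "space" being traded for time.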

Streamlining/Asynchronous

Simplify the implementation of each function and move non-core tasks to asynchronous processing, for example cache processing after an activity is edited, message synchronization after resources are pre-reserved, and message notification of group-buying status transitions.
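A minimal Python sketch of moving a non-core task off the request path follows. The function names, the in-memory `activity_db` and the use of a thread pool are assumptions for illustration; a production system might well use a message queue instead.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Dict, List

# Hypothetical in-memory stand-ins for the activity table and a cache refresher.
activity_db: Dict[str, dict] = {}
cache_refresh_log: List[str] = []

_background = ThreadPoolExecutor(max_workers=2)


def refresh_cache(activity_id: str) -> None:
    """Non-core follow-up work: rebuild cached data for the edited activity."""
    cache_refresh_log.append(activity_id)


def save_activity(activity_id: str, config: dict) -> Future:
    """Persist the edit synchronously, then hand cache processing to a worker."""
    activity_db[activity_id] = config                       # core step
    return _background.submit(refresh_cache, activity_id)   # non-core step


future = save_activity("act-1", {"discount": "10%"})
future.result(timeout=5)          # the demo waits; a real caller would not
_background.shutdown(wait=True)
```

The caller gets control back as soon as the core write completes. The cost is that the cache may briefly lag behind the database, so this only suits tasks where a short delay is acceptable.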

Hot and cold separation

For read-heavy, write-light scenarios, the other major factor in performance besides I/O is data volume. The promotion system holds some user data, such as promotion resource pre-reservation records and users' group-buying information. This data is time-sensitive and shows a hot-tail effect: most requests need only the most recent data. Separating hot and cold data is a good fit for such scenarios.
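The idea can be sketched as a periodic job that keeps recent records in the hot store and moves older ones to a cold store. This Python example is illustrative only: `ReservationRecord`, the 30-day window and the list-based stores are assumptions, not details from the article.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass(frozen=True)
class ReservationRecord:
    """Hypothetical record of a user's reserved promotion resource."""
    record_id: int
    created_at: datetime


def split_hot_cold(records: List[ReservationRecord], now: datetime,
                   hot_window: timedelta
                   ) -> Tuple[List[ReservationRecord], List[ReservationRecord]]:
    """Keep records newer than the cutoff as hot; the rest become cold."""
    cutoff = now - hot_window
    hot = [r for r in records if r.created_at >= cutoff]
    cold = [r for r in records if r.created_at < cutoff]
    return hot, cold


now = datetime(2030, 6, 30)
records = [
    ReservationRecord(1, datetime(2030, 1, 15)),
    ReservationRecord(2, datetime(2030, 6, 10)),
    ReservationRecord(3, datetime(2030, 6, 29)),
]

hot, cold = split_hot_cold(records, now, hot_window=timedelta(days=30))
```

With the cold records archived elsewhere, queries on the hot store scan far fewer rows, while old data remains available for the rare request that needs it.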

3.3 System stability

Rate limiting and degradation

Using the company's rate-limiting component, we limit traffic to non-core service functions and degrade them, so that the core services of the system are fully protected under high concurrency.
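The company's rate-limiting component is not described in the article, so the following Python sketch uses a simple token bucket as a stand-in to show rate limiting combined with degradation. `TokenBucket` and `recommend_promotions` are hypothetical names, and the class is not thread-safe.

```python
import time
from typing import Callable, List


class TokenBucket:
    """Simple single-threaded token bucket (illustrative stand-in only)."""

    def __init__(self, rate_per_second: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._rate = rate_per_second
        self._capacity = capacity
        self._tokens = float(capacity)
        self._clock = clock
        self._last = clock()

    def try_acquire(self) -> bool:
        now = self._clock()
        # Refill in proportion to elapsed time, capped at the bucket capacity.
        self._tokens = min(self._capacity,
                           self._tokens + (now - self._last) * self._rate)
        self._last = now
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False


def recommend_promotions(limiter: TokenBucket, sku: str) -> List[str]:
    """Non-core feature: return an empty (degraded) result when over the limit."""
    if not limiter.try_acquire():
        return []
    return [f"recommended-offer-for-{sku}"]


now = [0.0]                                    # controllable clock for the demo
limiter = TokenBucket(rate_per_second=1.0, capacity=2, clock=lambda: now[0])

r1 = recommend_promotions(limiter, "1001")     # allowed
r2 = recommend_promotions(limiter, "1001")     # allowed
r3 = recommend_promotions(limiter, "1001")     # bucket empty, degraded
now[0] = 1.0                                   # one second later, one token back
r4 = recommend_promotions(limiter, "1001")     # allowed again
```

Returning a cheap default instead of an error is what makes this degradation rather than plain rejection: the non-core feature gives up quality so that core functions keep their capacity.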

Idempotence

All interfaces are idempotent, to avoid system exceptions caused by callers retrying after network timeouts.
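One common way to make an interface idempotent is to have the caller send a unique request ID and to replay the stored result on a retry. The Python sketch below assumes that approach; `ReservationService` and the request IDs are hypothetical, and the article does not say which mechanism the real interfaces use.

```python
from typing import Dict


class ReservationService:
    """Hypothetical stock reservation that is safe to retry with the same ID."""

    def __init__(self, stock: int) -> None:
        self.stock = stock
        self._results: Dict[str, bool] = {}    # request_id -> earlier outcome

    def reserve(self, request_id: str) -> bool:
        if request_id in self._results:
            # A retry of a request already handled: replay the outcome
            # without touching the stock again.
            return self._results[request_id]
        ok = self.stock > 0
        if ok:
            self.stock -= 1
        self._results[request_id] = ok
        return ok


service = ReservationService(stock=1)
first_try = service.reserve("req-1")           # reserves the last unit
retry = service.reserve("req-1")               # timeout retry, same outcome
stock_after_retry = service.stock
other = service.reserve("req-2")               # a different request, no stock
```

In a multi-instance deployment the check and the update would need to be atomic, for example through a unique constraint in the database; the in-memory dict here only illustrates the control flow.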

Circuit breaking

Calls to external systems are wrapped with the Hystrix component for circuit-breaker protection, so that a failure in an external system cannot bring down the entire promotion system.
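To show what circuit-breaker protection does, here is a simplified Python model of the pattern. It is not Hystrix and does not use Hystrix's API; `CircuitBreaker`, the thresholds and the coupon-call functions are all hypothetical.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Simplified circuit breaker: closed, open, then a half-open trial call."""

    def __init__(self, failure_threshold: int, reset_timeout: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._threshold = failure_threshold
        self._reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, func: Callable[[], str], fallback: Callable[[], str]) -> str:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_timeout:
                return fallback()              # open: skip the external call
            # Timeout elapsed: fall through and allow one trial call.
        try:
            result = func()
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self._threshold:
                self._opened_at = self._clock()  # open, or re-open after a failed trial
            return fallback()
        self._failures = 0                     # success closes the breaker
        self._opened_at = None
        return result


now = [0.0]
attempts = [0]


def flaky_coupon_call() -> str:
    attempts[0] += 1
    raise ConnectionError("coupon service unavailable")


def healthy_coupon_call() -> str:
    return "coupon-discount"


def no_coupon() -> str:
    return "no-coupon-discount"


breaker = CircuitBreaker(failure_threshold=2, reset_timeout=10.0,
                         clock=lambda: now[0])
r1 = breaker.call(flaky_coupon_call, no_coupon)    # failure 1
r2 = breaker.call(flaky_coupon_call, no_coupon)    # failure 2, breaker opens
r3 = breaker.call(flaky_coupon_call, no_coupon)    # short-circuited
attempts_while_open = attempts[0]
now[0] = 11.0
r4 = breaker.call(healthy_coupon_call, no_coupon)  # trial call succeeds
```

While the breaker is open, the failing dependency is not called at all, so threads are not tied up waiting on it and the caller falls back immediately.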

Monitoring and alerting

Through configurable error-log alerts on the logging platform, service analysis alerts on the call chain, and monitoring alerts from the company's middleware and infrastructure components, we can detect system anomalies as early as possible.

Four, pitfalls encountered

4.1 Using the Redis SCAN command

When clearing cached data in Redis, some cache keys were found by pattern (fuzzy) matching and then deleted, which relies on the Redis SCAN command under the hood.

The SCAN command is a cursor-based iterator. Each call returns a new cursor, which the caller must pass as the cursor argument of the next SCAN call to continue the iteration.

Compared with the KEYS command, SCAN does not return all matching results at once, which reduces the risk of a single command blocking Redis. That does not mean SCAN can be used freely. With large data volumes it carries much the same risk as KEYS: it can easily increase the load on Redis and slow its responses, affecting the stability of the whole system.
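To make the cursor loop and its cost concrete, here is a minimal Python sketch. `FakeRedis` is a hypothetical in-memory stand-in, not a real client; its `scan` method only mimics the shape of the real command (a cursor goes in, the next cursor and a batch of keys come out, and the iteration ends when the cursor returns to 0). Real Redis cursors are opaque values rather than list offsets, and the key names are made up for illustration.

```python
import fnmatch
from typing import List, Optional, Tuple


class FakeRedis:
    """In-memory stand-in that mimics the cursor contract of SCAN."""

    def __init__(self, keys: List[str]) -> None:
        self._keys = list(keys)
        self.keys_examined = 0

    def scan(self, cursor: int = 0, match: Optional[str] = None,
             count: int = 10) -> Tuple[int, List[str]]:
        window = self._keys[cursor:cursor + count]
        self.keys_examined += len(window)      # every key in the window is read
        next_cursor = cursor + count
        if next_cursor >= len(self._keys):
            next_cursor = 0                    # 0 signals the end of iteration
        if match is not None:
            # The pattern filter runs after the keys have been read.
            window = [k for k in window if fnmatch.fnmatchcase(k, match)]
        return next_cursor, window


def scan_keys(client: FakeRedis, pattern: str,
              count: int = 100) -> Tuple[List[str], int]:
    """Collect all keys matching `pattern`; also return the number of calls."""
    found: List[str] = []
    calls = 0
    cursor = 0
    while True:
        cursor, batch = client.scan(cursor=cursor, match=pattern, count=count)
        calls += 1
        found.extend(batch)
        if cursor == 0:
            break
    return found, calls


keys = [f"sku:{n}:price" for n in range(10000)]
keys += ["activity:42:rules", "activity:42:stock", "activity:42:skus"]
client = FakeRedis(keys)

found, calls = scan_keys(client, "activity:42:*", count=100)
```

In this toy run, finding 3 keys among roughly ten thousand still takes 101 calls and examines every key, because the MATCH filter is applied only after each batch of keys has been read. This is one plausible reading of why pattern-based cleanup became expensive as the keyspace grew.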

(Figure 4-1 Redis load increased)

(Figure 4-2 Redis response spikes)

The solution was:

  • Optimize the Redis key design to reduce unnecessary cache keys;
  • Stop using the SCAN command, and clean up keys by exact match instead.
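The article does not say how the exact keys to delete are obtained. One possible approach, sketched below in Python, is to record each cache key under its owning activity at write time, so that cleanup can delete the recorded keys directly without any pattern search. `IndexedCache` and the key names are hypothetical, and dicts stand in for Redis.

```python
from collections import defaultdict
from typing import Dict, Optional, Set


class IndexedCache:
    """Cache that remembers which keys belong to which activity."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}
        self._keys_by_activity: Dict[str, Set[str]] = defaultdict(set)

    def put(self, activity_id: str, key: str, value: str) -> None:
        self._store[key] = value
        self._keys_by_activity[activity_id].add(key)   # maintain the index

    def get(self, key: str) -> Optional[str]:
        return self._store.get(key)

    def clear_activity(self, activity_id: str) -> int:
        """Delete exactly the keys recorded for this activity."""
        keys = self._keys_by_activity.pop(activity_id, set())
        for key in keys:
            self._store.pop(key, None)
        return len(keys)


cache = IndexedCache()
cache.put("act-42", "promo:sku:1001", "10% off")
cache.put("act-42", "promo:sku:1002", "10% off")
cache.put("act-7", "promo:sku:2001", "free gift")

removed = cache.clear_activity("act-42")
```

The cleanup cost now depends on the number of keys belonging to the activity rather than on the size of the whole keyspace, at the price of maintaining the index on every write.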

4.2 Hot Key Issues

The promotion system relies heavily on Redis caching for performance, and much of the cached data is keyed at the SKU level. Hot key problems arise very easily in scenarios such as new product launches and promotions on specific phone models.

Hot keys concentrate traffic, which unbalances the load across the nodes of the Redis cluster and makes the whole system unstable. This problem cannot be solved simply by adding machines. The following figure shows the Redis load during a stress test:

There are two common solutions:

  • Hashing scheme: hash the Redis keys so that they are spread evenly across the Redis Cluster nodes, breaking up the concentration caused by hot keys.
  • Multi-level cache scheme: add a local cache for hot keys to maximize access performance and reduce the load on Redis nodes.
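One common form of the hashing scheme is to store several copies of a hot value under suffixed keys and let each read pick a copy at random. The Python sketch below assumes that form; the replica count, key format and function names are hypothetical, and a dict stands in for the cluster.

```python
import random
from typing import Dict, List, Optional

REPLICAS = 4  # hypothetical number of copies kept for each hot key


def replica_keys(key: str) -> List[str]:
    """Derive the suffixed keys under which copies of a hot value are stored."""
    return [f"{key}#{i}" for i in range(REPLICAS)]


def write_hot_key(store: Dict[str, str], key: str, value: str) -> None:
    for replica in replica_keys(key):          # every copy must be updated
        store[replica] = value


def read_hot_key(store: Dict[str, str], key: str,
                 rng: random.Random) -> Optional[str]:
    # Each read picks one copy at random, spreading load across the copies.
    return store.get(f"{key}#{rng.randrange(REPLICAS)}")


store: Dict[str, str] = {}
write_hot_key(store, "promo:sku:1001", "launch-offer")

rng = random.Random(0)
reads = [read_hot_key(store, "promo:sku:1001", rng) for _ in range(20)]
```

In Redis Cluster a key's slot is derived from a hash of the key name, so the suffixed copies will generally fall into different slots and can be served by different nodes. The trade-off is write amplification: every update has to touch all copies.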

We adopted the multi-level cache scheme. With reference to an excellent open-source hot-key caching framework, we built a customized hot-key solution that supports hot-key detection, local caching, cluster broadcast and hot-key preheating. It detects hot keys in near real time and notifies the instances in the cluster to cache them locally, which largely prevents floods of repeated calls from hitting the distributed cache and improves the efficiency of the system.
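The detection-then-local-cache idea can be illustrated with a much reduced Python sketch. It counts accesses inside a single process and promotes a key to the local cache once it crosses a threshold; it does not model cluster broadcast, preheating or expiry, and `HotKeyCache` and the threshold are hypothetical rather than taken from the framework the article refers to.

```python
from collections import Counter
from typing import Callable, Dict


class HotKeyCache:
    """Promotes frequently read keys to a local cache (single process only)."""

    def __init__(self, load_remote: Callable[[str], str],
                 hot_threshold: int) -> None:
        self._load_remote = load_remote
        self._threshold = hot_threshold
        self._hits: Counter = Counter()
        self._local: Dict[str, str] = {}

    def get(self, key: str) -> str:
        if key in self._local:
            return self._local[key]            # hot key served locally
        value = self._load_remote(key)
        self._hits[key] += 1
        if self._hits[key] >= self._threshold:
            self._local[key] = value           # key detected as hot: promote
        return value

    def is_hot(self, key: str) -> bool:
        return key in self._local

    def reset_window(self) -> None:
        """Start a new counting window (called periodically in practice)."""
        self._hits.clear()


remote_calls = [0]


def load_from_redis(key: str) -> str:
    """Stand-in for a distributed cache read; counts how often it is hit."""
    remote_calls[0] += 1
    return f"value-of-{key}"


cache = HotKeyCache(load_from_redis, hot_threshold=3)
for _ in range(10):
    cache.get("promo:sku:1001")                # a heavily requested key
calls_for_hot_key = remote_calls[0]

cache.get("promo:sku:2002")
cache.get("promo:sku:2002")                    # below the threshold
```

After the third read the hot key no longer reaches the distributed cache, while the rarely read key stays remote-only. A real solution would also need to expire or invalidate the local copy when the value changes.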

Five, summary

This article is the overview of the Vivo Mall promotion system series. It briefly reviews how the system's business capabilities were built up and its system architecture, and shares the technical problems we encountered and their solutions. In future articles we will cover the design of the core function modules of the promotion system (promotion activity management, promotion pricing, price monitoring and time travel) one by one. Stay tuned.

Authors: Vivo Internet official mall development group