What is ABTest

Product changes should not be driven by gut feeling. They need to be driven by actual data, with user feedback guiding how we improve our service. As Chen Gang, CEO of Mafengwo, said in an interview: “Intuition is needed for some things, but most things can be judged with science.”

ABTest is familiar to many readers. To put it simply, ABTest is a process in which users are divided into different groups, different versions of a product are run online at the same time, and the better version is chosen based on the real data fed back by users.

We run ABTest with the original release as the control, and as a principle we iterate each release with as little traffic as possible. Once the metric analysis is complete, the version with the best user-feedback data is rolled out to full traffic.

Many times, a change to a button, a picture or a line of text can lead to significant growth. Here is an example of ABTest at Mafengwo:

Our e-commerce team wanted to optimize the search results list for “skiing.” As you can see, the page before optimization looks rather thin, but we were not sure whether a richer presentation would strike users as cluttered and put them off. Therefore, we put the before and after pages online for an ABTest. The final data showed that with the optimized style, UV improved by 15.21% and the conversion rate increased by 11.83%. Using ABTest helped us reduce the risk of the iteration.

Through this example, we can understand several features of ABTest more intuitively:

  • Prior verification: traffic is segmented and tested at small scale. An idea is first verified by letting a small share of online users use it, and then promoted to full traffic based on data feedback, reducing product losses.

  • Parallelism: we can test two or more versions at the same time and compare them under a consistent environment. A decision that used to take a quarter of back-and-forth before release may now take only a week, avoiding a long, complex cycle and saving verification time.

  • Scientific: when evaluating test results, ABTest requires statistical indicators to judge whether the results are credible, so we do not have to rely on experience alone to make decisions.

To make our verification conclusions more accurate, reasonable and efficient, we implemented a set of algorithmic guarantees, following Google’s practice, to allocate traffic scientifically and strictly.

Multi-layer traffic-splitting model based on OpenResty

At most companies, ABTest is implemented by exposing an interface: the business side fetches the user’s split data and then calls the interface itself. This amplifies the original traffic, intrudes noticeably into business code, and supports only relatively narrow scenarios, so many separate splitting systems end up being built and are hard to reuse across different business requirements.

To solve these problems, our splitting system is built on OpenResty and transmits split information over HTTP or gRPC. The splitting system thus sits upstream of the business, and because OpenResty distributes traffic natively, no secondary traffic is generated. The business side only needs to provide the differentiated services; its code is not intruded upon.

There are several reasons for choosing OpenResty for ABTest:

The whole process

In designing the ABTest system, we broke the problem into three elements: the first is identifying the terminal, which carries device and user information; the second is identifying the URI; the third is matching the allocation strategy, that is, deciding how traffic is allocated.

First, the device initiates a request and the AB gateway extracts the device ID, URI and other information from it, determining the terminal and the URI. The URI is then matched against the configured policies, and once the split algorithm finds the hit AB experiment and version, the gateway notifies the downstream in one of two ways. For applications running on ordinary web servers, a key named abtest is added to the request header, carrying the hit experiment and version. For microservice applications, the matched information is added to a cookie to be processed by the microservice gateway.
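To make the handoff concrete, here is a minimal, hypothetical sketch in Python of how a gateway might annotate a request once the split algorithm has chosen an experiment and version; the production gateway is Lua on OpenResty, and the types and names below are illustrative only.

```python
# Hypothetical sketch of the gateway's handoff step (the real gateway is Lua on
# OpenResty); the Hit type and function names are illustrative, not Mafengwo's API.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Hit:
    experiment_id: str
    version: str
    is_microservice: bool   # does a microservice gateway sit downstream?

def annotate_request(headers: dict, cookies: dict, hit: Optional[Hit]) -> None:
    """Attach the hit experiment/version so the downstream service can read it."""
    if hit is None:
        return                                  # request is not in any experiment
    value = f"{hit.experiment_id}:{hit.version}"
    if hit.is_microservice:
        cookies["abtest"] = value               # picked up by the microservice gateway
    else:
        headers["abtest"] = value               # 'abtest' header for web applications

# Example: the request hit version B of a hypothetical experiment "exp_42".
headers, cookies = {}, {}
annotate_request(headers, cookies, Hit("exp_42", "B", is_microservice=False))
print(headers)   # {'abtest': 'exp_42:B'}
```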

Stable splitting guarantee: the MurmurHash algorithm

The splitting algorithm uses MurmurHash. Its hash factors are the device ID, the policy ID, and the traffic layer ID.

MurmurHash is commonly used for ABTest splitting and appears in many open-source projects, such as Redis, Memcached, Cassandra, and HBase. MurmurHash has two distinct features:

  1. Fast: dozens of times faster than secure hash algorithms.

  2. Strong avalanche behavior: similar strings such as “ABC” and “abd” are distributed evenly over the hash ring. This is mainly what enables orthogonal and mutually exclusive experiment splitting.
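As a quick, hedged illustration of this avalanche property, the snippet below maps near-identical strings onto a 1–100 ring; it uses the mmh3 package (MurmurHash3 bindings) as a stand-in for the MurmurHash2 used in the system.

```python
# Illustration only: mmh3 (pip install mmh3) provides MurmurHash3, used here
# as a stand-in for the MurmurHash2 that the splitting system actually uses.
import mmh3

for key in ("ABC", "ABD", "abd", "abe"):
    # Python's % always yields a non-negative result, so this lands in 1..100.
    print(key, mmh3.hash(key) % 100 + 1)   # near-identical keys, unrelated buckets
```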

Here is a brief explanation of mutual exclusion and orthogonality:

  • Mutual exclusion: the traffic of two experiments is independent and a user can enter only one of them. This generally applies to experiments on the same traffic layer, for example an experiment on a mixed image-and-text list and an experiment on a pure image list: the same user can only see one of them at a time, so they are mutually exclusive.

  • Orthogonality: there is no correlation between the experiments a user enters. For example, users who entered version A of experiment 1 are evenly distributed across the other experiments rather than concentrated in one place.

Traffic-layer experiment splitting

At the traffic layer, the hash factors are the device ID and the traffic layer ID. When a request flows through a traffic layer, at most one experiment in that layer is hit; that is, the same user can hit at most one experiment per layer. First, the hash factors are hashed with MurmurHash2, which ensures that a slight change in the input changes the result dramatically. Then we take the result mod 100 and add 1, giving a value between 1 and 100.
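The sketch below shows what this layer-level bucketing might look like in Python, under stated assumptions: mmh3 (MurmurHash3) stands in for MurmurHash2, and the bucket ranges owned by the two example experiments are invented for illustration.

```python
# Sketch of layer-level splitting: hash(device_id, layer_id) -> bucket 1..100,
# with each experiment in the layer owning a disjoint slice of buckets.
# mmh3 (MurmurHash3) stands in for the MurmurHash2 used in production.
import mmh3

def layer_bucket(device_id: str, layer_id: str) -> int:
    """Stable bucket in 1..100 for this device on this traffic layer."""
    return mmh3.hash(f"{device_id}:{layer_id}") % 100 + 1

def hit_experiment(device_id: str, layer_id: str):
    """At most one experiment per layer is hit, so experiments stay mutually exclusive."""
    b = layer_bucket(device_id, layer_id)
    if b <= 10:                 # illustrative: experiment A owns buckets 1-10
        return "experiment_A"
    if b <= 20:                 # illustrative: experiment B owns buckets 11-20
        return "experiment_B"
    return None                 # remaining buckets see the original version

print(hit_experiment("device-123", "layer-search-list"))
```

Because the layer ID is part of the hash factor, the same device lands on unrelated buckets in different layers, which is what gives the orthogonality between layers described above.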

The schematic diagram is as follows:

Experiment version splitting

The hash factors are the device ID, policy ID, and traffic layer ID, and the same approach is used for version matching. The matching rules are as follows:
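As an illustrative sketch only (the version names and traffic shares are invented, and mmh3 again stands in for MurmurHash2), version matching by cumulative traffic share might look like this in Python:

```python
# Sketch of version splitting inside a hit experiment: adding the policy ID to
# the hash factors re-shuffles users, so the version a user gets is independent
# of which layer bucket they fell into.
import mmh3

def version_bucket(device_id: str, policy_id: str, layer_id: str) -> int:
    return mmh3.hash(f"{device_id}:{policy_id}:{layer_id}") % 100 + 1

def hit_version(device_id: str, policy_id: str, layer_id: str, versions):
    """versions: list of (name, traffic share in percent) pairs summing to <= 100."""
    b = version_bucket(device_id, policy_id, layer_id)
    upper = 0
    for name, share in versions:
        upper += share
        if b <= upper:
            return name
    return None

# Illustrative 50/50 split between versions A and B of a hypothetical experiment.
print(hit_version("device-123", "exp_42", "layer-search-list", [("A", 50), ("B", 50)]))
```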

Stability guarantee: multi-level cache strategy

As mentioned, for each incoming request the system tries to find an experiment policy that matches it. Experiment policies are configured in the admin backend, and the configured policies are synchronized to our policy pool via a message queue.

Our initial plan was to read from Redis on every request, which demands very high Redis stability, and a large volume of requests would also put Redis under heavy pressure. We therefore introduced a multi-level caching mechanism to form the policy pool. The policy pool consists of three layers:

The first layer, lrucache, is a simple and efficient cache. Its lifetime is tied to the Nginx worker process: it is exclusive to each worker and therefore very fast. Because of this exclusivity, a copy of the cache exists in every worker process, so it consumes more memory.

The second layer, lua_shared_dict, is shared across workers, as the name suggests. Its data survives an Nginx reload and is lost only on restart. However, to keep reads and writes safe it is implemented with a read-write lock, so there may be performance issues in some extreme cases.

The third layer is Redis.

Even with the multi-level cache, there is still a risk: when both the L1 and L2 caches are empty (for example, after an Nginx restart), heavy traffic could overwhelm Redis. We use lua-resty-lock to solve this problem: on a cache miss, only the requests that hold the lock go back to the source, keeping the pressure on Redis low.
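The following is a conceptual Python sketch of that three-level lookup with stampede protection; in production the layers are lua-resty-lrucache, lua_shared_dict, and Redis inside OpenResty, guarded by lua-resty-lock, so the plain dicts and threading lock here are only stand-ins.

```python
# Conceptual sketch of the three-level policy lookup with stampede protection;
# dicts and a threading lock stand in for lrucache, lua_shared_dict and lua-resty-lock.
import threading
import time

l1, l2 = {}, {}                     # per-worker cache / shared cache stand-ins
redis_lock = threading.Lock()       # plays the role of lua-resty-lock
L1_TTL, L2_TTL = 30, 30             # seconds, matching the 30-second cache in the text

def _fresh(entry):
    return entry is not None and entry[1] > time.time()

def get_policy(key, fetch_from_redis):
    """Return the policy for `key`, going back to Redis only once per miss."""
    entry = l1.get(key)
    if _fresh(entry):
        return entry[0]
    entry = l2.get(key)
    if _fresh(entry):
        l1[key] = entry             # repopulate L1 from L2
        return entry[0]
    with redis_lock:                # only the lock holder goes back to Redis
        entry = l2.get(key)         # another request may have filled it already
        if not _fresh(entry):
            value = fetch_from_redis(key)
            entry = (value, time.time() + L2_TTL)
            l2[key] = entry
    l1[key] = (entry[0], time.time() + L1_TTL)
    return entry[0]

# Example with a hypothetical key and a fake Redis reader.
print(get_policy("uri:/search/ski", lambda k: {"experiment": "exp_42"}))
```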

With a 30-second cache, online statistics show that the first-level cache hit ratio is above 99%, the second-level cache hit ratio is 0.5%, and only 0.03% of requests go back to Redis.

Key features

  • Throughput: 5% of total site traffic

  • Low latency: the average online latency is less than 2ms

  • Full platform: supports App, H5, WxApp, and PC; cross-language

  • Disaster tolerance:

  • Automatic degradation: when the policy cannot be read from Redis, AB automatically enters no-split mode and retries Redis every 30 seconds until the data is read, avoiding a flood of repeated requests (see the sketch after this list)

  • Manual degradation: when there are too many server_event logs or the system load is too high, all experiments or the AB split can be closed with the backend’s “one-click shutdown”
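As a rough, hypothetical sketch of the automatic-degradation rule (the real logic lives in the OpenResty gateway, and the names here are invented):

```python
# Hedged sketch of automatic degradation: when Redis cannot be read, fall back
# to a no-split mode and retry only every 30 seconds, instead of hammering Redis.
import time

RETRY_INTERVAL = 30          # seconds between retries while degraded
_degraded = False
_last_attempt = 0.0

def load_policies(read_redis):
    """Return the policy set, or None to signal no-split mode."""
    global _degraded, _last_attempt
    if _degraded and time.time() - _last_attempt < RETRY_INTERVAL:
        return None                      # stay degraded; do not touch Redis yet
    _last_attempt = time.time()
    try:
        policies = read_redis()          # any Redis client call would go here
        _degraded = False
        return policies
    except Exception:
        _degraded = True                 # enter (or stay in) no-split mode
        return None

def broken_reader():
    raise ConnectionError("redis down")

print(load_policies(broken_reader))      # -> None (no-split mode)
```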

Performance

The test tool was JMeter, with 100 concurrent sessions for 300 s.

In terms of response time, apart from a large deviation for the very first requests, the average response time was under 1 ms. The deviation at the start was because the multi-level cache held no data yet.

TPS under load dropped slightly, since the hash algorithm does add some work, but overall it stayed within acceptable limits.

A/B publishing

Conventional A/B publishing is mainly handled by the API gateway. For more complex business requirements, A/B publishing opens up more complex, multi-dimensional publishing capabilities through interaction with the microservices.

Summary

Note that ABTest is not suitable for every product: its results need a large amount of data to back them up, and sites with more daily traffic get more accurate results. In general, it is recommended that each version in an A/B test receive at least 1,000 UVs per day; otherwise the test cycle becomes long, or it is hard to obtain accurate (converged) results from the data.

A lot of careful work goes into designing a complete ABTest platform. Due to space limitations, this article focuses only on the splitting algorithm. In summary, Mafengwo’s ABTest splitting system has achieved results in the following areas:

  • It intercepts and distributes traffic, replacing the old interface-based approach: business code is not intruded upon, performance is not noticeably affected, and no secondary traffic is generated.

  • Traffic layering and binding experiments to layers make split experiments more precise and intuitive to define. By reporting the hit experiment version to the client, business-side data storage is reduced and serial experiment splitting becomes possible.

  • For data transfer, the split information is added to the HTTP header, so the business side does not need to care about the implementation language.

Improvements planned for the near term:

  • Monitoring system.

  • User profiling and other fine-grained AB customization.

  • Statistical functions such as confidence intervals, feature values and other productized capabilities.

  • Evaluating the influence of experiments on the North Star metric with the AARRR model.

There are many areas still to improve, and we will keep exploring. We look forward to discussing them with you.

Authors: Li Pei, technical expert, Mafengwo infrastructure platform R&D; Zhang Lihu, engineer, Mafengwo Hotel static data R&D team.
