Architecture practice and technical details of JD.com’s Spark-based risk control system

1. Background

The rapid development of the Internet has provided fertile soil for the rise of e-commerce. In 2014, the transaction volume of China’s e-commerce market reached 13.4 trillion yuan, up 31.4% year on year. Of this, the B2B e-commerce market accounted for 10 trillion yuan, up 21.9% year on year. Behind these high-growth figures, criminals covet Internet assets, and malicious behavior against the e-commerce industry is becoming increasingly fierce. The most typical cases are scalpers who snap up goods in order to hoard them, and merchants who place fake orders (“order brushing”). Scalper hoarding deprives the majority of normal users of the discounts that merchants offer, while merchants who fake orders and positive reviews not only interfere with users’ shopping choices but also disturb the order of the whole market.

As a leading domestic e-commerce company, JD faces severe risk threats today. Machine-registered accounts, malicious ordering, scalper purchasing, merchant order brushing and similar problems, if not effectively prevented, will bring incalculable losses to JD and to consumers.

In the Internet industry, risk control systems are usually used to defend against this kind of malicious access. Technically, the field has gradually developed from the traditional “rule-base” approach (rule-based judgment) to today’s two-layer recognition, real-time plus offline, built on big data. The continuous development of distributed big data processing frameworks such as Hadoop and Spark provides effective support for risk control technology.

2. What is Skynet?

Against this background, JD’s risk control department built the “Skynet” system. After years of accumulation, Skynet now fully covers dozens of business nodes of the JD mall and effectively supports the risk control needs of JD Group businesses such as JD Home and overseas purchasing, safeguarding the interests of both users and JD throughout the business process.

As the core tool of JD’s risk control, Skynet has built a graph computing platform dedicated to risk control on top of Spark. Its main analysis dimensions are user profiles, users’ social relationship networks, and behavioral feature models of transaction risk.

Internally, Skynet contains three groups of systems. Facing the business are the transaction order risk control system, the hot-product flash-sale risk control system and the merchant anti-brushing system. Behind them is the Risk Credit Service (RCS) system, which acts as the recognition engine and stores user risk credit information and rules. Finally, there is the user risk rating system, which focuses on building risk profiles of users.

Below, we analyze Skynet in two parts: the front-end business risk control systems that users can directly perceive, and the background support systems.

3. Front-end business risk control systems

1
Transaction order risk control system

The transaction order risk control system is mainly devoted to controlling malicious behavior at the ordering stage. Based on basic information such as the mobile phone number used at registration and the delivery address, combined with the current ordering behavior, historical purchase records and other dimensions, the system discriminates in real time and intercepts abnormal orders such as machine-placed orders, manual batch orders and abnormally large orders.

At present, the system has separate recognition rules for different categories of goods such as books, daily necessities, 3C products, clothing and home furnishing. After several rounds of iterative optimization, recognition accuracy has exceeded 99%. Suspicious orders that the system cannot identify with certainty are automatically pushed to the back-office risk control operations team for manual review. The operations team decides whether an order is malicious based on the account’s historical order information and the current order. Combining automatic recognition by the system with manual review in the back office guarantees the validity of order transactions to the greatest extent.

2
Hot-product flash-sale risk control system

On JD’s e-commerce platform, flash-sale (“seckill”) products are launched regularly every day. Most of them come from first-tier brand merchants that debut new products or run hot-product promotions on the JD platform, so flash-sale prices are heavily discounted compared with market prices.

But this also creates a huge temptation for scalpers. They use batches of machine-registered accounts, automated purchasing software and similar means to grab flash-sale goods. The limited stock is often snatched up in an instant, and ordinary consumers find it hard to enjoy the discounts. The flash-sale risk control system is the weapon built for exactly this business scenario.

Real flash-sale scenarios are characterized by huge instantaneous traffic. Even so, the hot-product flash-sale risk control system has proved highly effective against this kind of high-concurrency, high-traffic machine purchasing. At present, the recognition engine running on JD’s cluster can process more than 100 million concurrent requests per minute, with real-time computation at millisecond level. During flash sales it blocks more than 98% of scalper attempts from generating orders, giving normal users the fairest possible chance to buy.

3
Merchant anti-brushing system

With the continuous development of the e-commerce industry, many dishonest merchants try to use fake orders and fake reviews (“brushing”) to raise their search ranking and boost their product sales. Since JD introduced its third-party seller platform, some merchants have tried to exploit this loophole as well. We apply a principle of “zero tolerance” to such behavior, and the merchant anti-brushing system was built to enforce it.

The merchant anti-brushing system uses the big data platform built by JD to analyze orders, goods, users, logistics and other dimensions, and computes feature values under each dimension. It examines hundreds of features, such as the gap between a product’s historical price and the actual order price, abnormal SKU sales, abnormal logistics and delivery, abnormal reviews, and abnormal user purchase categories, and pinpoints brushing by combining several intelligent algorithms such as Bayesian learning, data mining and neural networks.

For suspected brushing identified by the system, background offline algorithms combine order and user information and pull data from the big data mart for in-depth offline mining and computation, so that the behavior keeps being identified and has nowhere to hide. Once brushing is confirmed, the merchant anti-brushing system passes the relevant merchant information directly to the operations team for severe punishment, so as to ensure a good experience for consumers.

So far, the front-end business systems cover essentially the whole transaction process and crack down, from every dimension, on malicious behavior that infringes on consumers’ interests.

4. Background support system

As JD’s risk control system, Skynet deals every day with risk scenarios that have very different characteristics. It could be tens of millions of malicious flash-sale requests per minute, or it could be new order-brushing tactics emerging from scalpers everywhere. How does Skynet solve these problems one after another through its underlying systems? Let’s take a look at Skynet’s two core systems: the Risk Credit Service (RCS) and the Risk Control Data Support System (RDSS).

1
Risk credit service

The risk credit service (RCS) is the core risk control engine embedded in each business system. It not only supports efficient online identification by the dynamic rule engine, but also bridges the accumulated data and the business systems. It is the only channel through which the risk control data layer provides services externally, so its importance and the performance pressure on it are self-evident.

See Figure 1.1 for the RCS service framework

As the only outlet through which Skynet provides risk control services, RCS relies on JSF, a service framework developed in-house by JD. JSF gives RCS efficient RPC calls, a highly available registry and complete disaster recovery under a distributed architecture. It also supports service governance functions such as blacklists and whitelists, load balancing, dynamic provider grouping, and dynamic call switching.

Facing tens of millions of calls per minute, RCS combines JSF’s load balancing and dynamic grouping functions to deploy multiple distributed clusters according to service characteristics and to provide services in groups. Each group is deployed across data centers to maximize the availability of the system.

1.2 Recognition principle of RCS dynamic rule engine

RCS implements a self-developed engine for dynamically configuring and parsing rules. Users can submit or modify online recognition models in real time. When a real-time request arrives, the system slices the request data by time, keyed on the core features of the model, in a high-performance middleware and keeps running statistics for each slice. Once the statistic for a feature in the model exceeds its threshold, the front-end risk control system intercepts the request immediately.
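The time-sliced counting described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the RCS engine itself: `SlicedCounter`, `should_intercept`, the rule dictionary layout and the example threshold are all hypothetical, and a plain in-memory dictionary stands in for the key/value middleware.

```python
import time
from collections import defaultdict


class SlicedCounter:
    """Per-key event counts kept in fixed-width time slices.

    An in-memory stand-in for the key/value middleware (hypothetical,
    not the API of any real cache service).
    """

    def __init__(self, slice_seconds=1, window_slices=60):
        self.slice_seconds = slice_seconds
        self.window_slices = window_slices
        self.slices = defaultdict(dict)  # key -> {slice index: count}

    def hit(self, key, now=None):
        """Record one event and return the count over the current window."""
        now = time.time() if now is None else now
        current = int(now // self.slice_seconds)
        buckets = self.slices[key]
        buckets[current] = buckets.get(current, 0) + 1
        # Drop slices that have fallen out of the window.
        oldest = current - self.window_slices + 1
        for index in [i for i in buckets if i < oldest]:
            del buckets[index]
        return sum(buckets.values())


def should_intercept(counter, rule, request, now=None):
    """Return True when the count for the rule's features exceeds its threshold."""
    values = ":".join(str(request[name]) for name in rule["features"])
    return counter.hit(rule["name"] + ":" + values, now) > rule["threshold"]


# Hypothetical rule: more than 3 orders to one address within 60 seconds.
rule = {"name": "orders_per_address", "features": ["address"], "threshold": 3}
counter = SlicedCounter(slice_seconds=1, window_slices=60)
decisions = [
    should_intercept(counter, rule, {"address": "A"}, now=100 + i)
    for i in range(5)
]
print(decisions)  # [False, False, False, True, True]
```

Keeping one small counter per slice means expired data can be dropped slice by slice, which is why a rule can be changed online without recomputing history.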

The high-performance middleware mentioned above is JIMDB, which was also developed in-house. It is a distributed cache and high-speed key/value storage service based on Redis. It uses “pre-sharding” to distribute cached data across multiple shards (every shard has the same composition, for example one master node and one slave node), which makes it possible to build very large caches. It supports I/O policies such as read/write separation and dual-write, as well as dynamic capacity expansion and asynchronous replication. It plays an important role in RCS online identification.
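As a rough illustration of the pre-sharding idea, the sketch below fixes the number of shards up front and maps each key to a shard with a stable hash, so that adding machines only changes which node hosts a shard, not which shard a key belongs to. Everything here is an assumption made for illustration: the shard count, the node names, the CRC32 hash and the helper functions are hypothetical and are not JIMDB’s actual design or API.

```python
import zlib

NUM_SHARDS = 16  # fixed when the cache is created (hypothetical value)


def shard_for(key, num_shards=NUM_SHARDS):
    """Map a key to a shard index with a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


def build_shard_map(nodes, num_shards=NUM_SHARDS):
    """Give every shard one master and one slave, placed on different nodes."""
    if len(nodes) < 2:
        raise ValueError("need at least two nodes for a master/slave pair")
    return {
        shard: {
            "master": nodes[shard % len(nodes)],
            "slave": nodes[(shard + 1) % len(nodes)],
        }
        for shard in range(num_shards)
    }


def locate(key, shard_map, for_write):
    """Read/write separation: writes go to the master, reads to the slave."""
    shard = shard_map[shard_for(key, len(shard_map))]
    return shard["master"] if for_write else shard["slave"]


small = build_shard_map(["node-1", "node-2"])
large = build_shard_map(["node-1", "node-2", "node-3", "node-4"])
# The key stays in the same shard after expansion; only shard placement changes.
print(shard_for("user:42"), locate("user:42", small, True), locate("user:42", large, True))
```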

1.3 Data flow steps of RCS

The risk database is the core component of RCS, which stores the basic data of various dimensions. The following diagram shows the basic data flow in the whole service system:

(Figure: basic data flow of the RCS service system)

1) Each front-end business risk control system performs risk identification for its own business scenario, and the result data flows back into the risk database for subsequent offline analysis and risk value judgment.

2) The risk database cleans the incoming data identified by the business risk control systems, verifies it manually, and defines and extracts risk control index data. After this process, the metadata of the risk database is basically usable.

3) The background data mining tools compute weights for the data from the various sources according to their algorithms, and the results are used in the subsequent risk value calculation.

4) When the risk credit service receives a risk value query, it reads the user’s risk control index data from the JIMDB cache cloud in real time, computes the risk level value using Euclidean distance combined with the configured weights, and serves each business risk control system in real time.
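Step 4 can be illustrated with a small sketch of a weighted Euclidean distance. The article does not give the exact formula, so this is one plausible reading under stated assumptions: the indicator names, the all-zero “clean user” baseline, the weights and the level cut-offs below are all hypothetical.

```python
import math


def weighted_distance(indicators, baseline, weights):
    """Weighted Euclidean distance between a user's indicators and a baseline."""
    return math.sqrt(
        sum(weights[name] * (indicators[name] - baseline[name]) ** 2 for name in weights)
    )


def risk_level(distance, cutoffs=(0.3, 0.6)):
    """Map a distance to a coarse level; the cut-offs are illustrative only."""
    if distance < cutoffs[0]:
        return "low"
    if distance < cutoffs[1]:
        return "medium"
    return "high"


# Hypothetical indicators, each normalized to the range [0, 1].
weights = {"order_frequency": 0.5, "shared_address": 0.3, "refund_rate": 0.2}
baseline = {"order_frequency": 0.0, "shared_address": 0.0, "refund_rate": 0.0}
user = {"order_frequency": 0.9, "shared_address": 0.8, "refund_rate": 0.1}

distance = weighted_distance(user, baseline, weights)
print(round(distance, 3), risk_level(distance))
```

In this reading, the weights produced by the data mining step decide how strongly each indicator pulls a user away from the baseline.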

1.4 Technical innovation and planning of RCS

Entering 2015, the RCS system faced great challenges. First, as the volume of data kept growing, the previous processing framework could no longer meet demand. Meanwhile, constantly evolving malicious tactics placed ever higher demands on risk control and required the system to keep adding targeted rules, which also brought considerable business pressure.

In the face of these challenges, RCS has worked more closely with JD’s big data platform. For storing real-time recognition data, which amounts to more than one billion recognition records every day, the combination of Kafka and Presto was introduced. Through Presto, the recognition data retained in Kafka for the past week can be queried in real time. Data more than one week old is written to HDFS through ETL, where Presto supports historical queries. To broaden the dimensions RCS can identify, RCS has been connected to the JD user risk rating system and has obtained more than 100 million risk levels computed from social network dimensions for use in risk credit identification. For real-time calculation of risk levels, it has gradually switched to JRC, a Storm-based streaming computing platform developed by the Big Data Department.

5. Risk control data support system

The risk control data support system (RDSS) is a risk control data mining system built around the JD user risk rating system.

1
The core architecture of RDSS

1) Data layer

As shown in the figure, the data layer is responsible for data extraction, cleaning and preprocessing. At present, the ETL programs ingest business data from more than 500 production systems, including a large amount of unstructured data, through JMQ, Kafka, the data mart, basic information interfaces and logs. After handling the diversity, dependencies and instability of the data, the layer outputs complete and consistent risk control index data and provides it to the algorithm engine layer through the data interface. The most critical part of this layer is the collation of risk control index data, because the quality of the index data directly determines the final output of the system. At present, indicators are organized along the following three dimensions:

A) Index data collation based on user life cycle

In e-commerce, an ordinary user basically passes through the following stages of stickiness: from trying to register, to trying to buy, to being deeply attracted, to gradually settling into rational consumption. Each stage is accompanied by certain consumption characteristics, and these characteristics are useful data for catching abnormal user behavior.

B) Collation of risk control index data based on user purchase process

Ordinary users have broadly similar purchasing habits. For example, they usually search for the goods they need, browse and compare the brands that interest them in the search results, and make a purchase decision only after several rounds of comparison. They also look for coupons before actually buying, and they pause here and there during payment. Scalpers, by contrast, have a clear target: they go straight to it after logging in and pay without hesitation. These differences in browsing behavior are also useful data for finding malicious users.

C) Collation of risk control index data based on users’ social networks

Index data based on users’ social networks was established against the background that the black-market industry chain in the risk control field has gradually become systematic. Malicious users tend to cluster around certain shared features, and behind such a cluster there is often a scalper ring or an order-brushing company. Working this way, catching one leads to catching the whole string: starting from one individual, we can find the accomplices.
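The “catch one, catch the whole string” idea can be sketched as a connected-components computation over accounts that share a feature. This is a minimal single-machine illustration using union-find, not the production graph job; the record format, the attribute strings and the function names are hypothetical.

```python
from collections import defaultdict


def group_accounts(records):
    """Group accounts that are linked, directly or transitively, by a shared attribute.

    records: iterable of (account, attribute) pairs, e.g. ("u1", "device:d1").
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    first_seen = {}  # attribute -> first account seen with it
    for account, attribute in records:
        find(account)
        if attribute in first_seen:
            root_a, root_b = find(first_seen[attribute]), find(account)
            if root_a != root_b:
                parent[root_b] = root_a
        else:
            first_seen[attribute] = account

    groups = defaultdict(set)
    for account in list(parent):
        groups[find(account)].add(account)
    return list(groups.values())


def accomplices(records, known_bad):
    """Return the other accounts in the same group as a known malicious account."""
    for group in group_accounts(records):
        if known_bad in group:
            return group - {known_bad}
    return set()


records = [
    ("u1", "device:d1"), ("u2", "device:d1"),
    ("u2", "addr:a9"), ("u3", "addr:a9"),
    ("u4", "device:d7"),
]
print(sorted(accomplices(records, "u1")))  # ['u2', 'u3']
```

Here u3 never shares anything with u1 directly; it is reached through u2, which is the effect the paragraph above describes.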

2) Algorithm engine layer

The algorithm engine layer is a collection of data mining algorithms. In the system they are organized into commonly used sets such as classification, clustering, association and recommendation algorithms, and are provided to the analysis engine layer for invocation.

3) Analysis engine layer

The analysis engine layer is the main platform for risk control data analysts. Data analysts can set up projects based on business in the analysis engine layer, carry out the whole process of data mining on the platform, and finally produce risk control models and identification rules.

4) Decision engine layer

The decision engine layer is responsible for the management of models and rules. All models and rules produced by the system are gathered here for unified management and update.

5) Application layer

The application layer covers the scenarios in which the models and rules output by the decision engine layer are applied. The most important of these is the risk credit service (RCS), whose main function is to connect to the underlying data and provide risk identification services to the outer-layer business risk control systems.

Before models and rules can be put into use, we have to go through another important system, which is the Risk Control Data Analysis Platform (FBI), because all models and rules are evaluated in this platform. The input is the output data of all rules and models, and the output is the evaluation results. The evaluation results will also be fed back to the decision engine layer for further rule and model optimization.
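The article does not say which metrics the evaluation platform computes. As a hedged illustration only, the sketch below scores one rule’s output against manually confirmed cases using precision and recall, two common choices for this kind of check; the function name and the order identifiers are hypothetical.

```python
def evaluate_rule(flagged, confirmed_malicious):
    """Compare the orders a rule flagged with the orders confirmed as malicious."""
    flagged = set(flagged)
    confirmed_malicious = set(confirmed_malicious)
    true_positives = len(flagged & confirmed_malicious)
    # Precision: share of flagged orders that really were malicious.
    precision = true_positives / len(flagged) if flagged else 0.0
    # Recall: share of malicious orders that the rule managed to flag.
    recall = true_positives / len(confirmed_malicious) if confirmed_malicious else 0.0
    return {"precision": precision, "recall": recall}


result = evaluate_rule(
    flagged=["o1", "o2", "o3", "o4"],
    confirmed_malicious=["o2", "o3", "o5"],
)
print(result["precision"], round(result["recall"], 3))  # 0.5 0.667
```

Results of this kind are what would be fed back to the decision engine layer to decide whether a rule should be tightened, loosened or retired.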

2
User risk rating system in RDSS

The JD user risk rating system is the first data project incubated by the Skynet data mining system. Its main purpose is to classify all JD users and determine which are loyal users and which are malicious users that need close attention. It works by relying on the social relationship network described above to identify the risk level of JD users, an approach that is at the leading edge of the data field. The first phase of the JD user risk rating system has produced 100 million records and already provides services externally through the RCS system. According to the evaluation of identification results, the number of identified loyal users increased by 37% and the number of identified malicious users increased by 10% compared with the RCS risk database.

At present, the JD user risk rating system has achieved the following:

1) The data layer produces more than 50 risk indicators based on the social network dimension.

2) Vertices and edges are defined, and hundreds of thousands of community networks are identified, through algorithms such as PageRank, triangle counting, connected components and community detection.

3) The risk index of hundreds of millions of users is computed using the classic idea of energy propagation over a weighted network.
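The article does not spell out the propagation algorithm in item 3, so the following is only a toy sketch of the general idea: known malicious accounts act as energy sources, and every other account repeatedly takes a damped, weight-averaged share of its neighbors’ risk. The update rule, the damping factor, the iteration count and the graph are all assumptions made for illustration; a production version would run as a distributed graph job rather than a Python loop.

```python
from collections import defaultdict


def propagate_risk(edges, seeds, damping=0.5, iterations=10):
    """Spread risk from seed accounts over an undirected weighted graph.

    edges: {(u, v): weight}; seeds: {account: initial risk in [0, 1]}.
    """
    neighbors = defaultdict(dict)
    for (u, v), weight in edges.items():
        neighbors[u][v] = weight
        neighbors[v][u] = weight

    nodes = set(neighbors) | set(seeds)
    risk = {node: seeds.get(node, 0.0) for node in nodes}

    for _ in range(iterations):
        updated = {}
        for node in nodes:
            links = neighbors.get(node, {})
            total_weight = sum(links.values())
            # Weighted average of the neighbors' current risk.
            incoming = (
                sum(w * risk[other] for other, w in links.items()) / total_weight
                if total_weight else 0.0
            )
            # Seeds keep at least their initial risk; others receive damped energy.
            updated[node] = max(seeds.get(node, 0.0), damping * incoming)
        risk = updated
    return risk


# A chain a - b - c where only "a" is a known malicious account.
risk = propagate_risk({("a", "b"): 1.0, ("b", "c"): 1.0}, seeds={"a": 1.0})
print({node: round(value, 3) for node, value in sorted(risk.items())})
```

In this toy chain, risk decreases with distance from the seed, which matches the intuition that accounts closer to a known scalper deserve a higher risk index.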

6. Conclusion

All that is past is prologue. JD Risk Control is building a super risk control computing framework in which data defines everything. This framework will unify risk control model management (data models, recognition models, rule engine), risk control service management (JRC, Presto, Streaming) and risk control data management (HDFS, HBase, Kafka). Spanning cloud computing, big data and artificial intelligence, it will adjust risk control strategies intelligently to handle rapidly changing e-commerce transaction risks in real time.

About the author

Denver

Senior R&D engineer of JD Chengdu Research Institute, graduated from Xihua University, joined JD Risk Control R&D Department in 2012, and participated in the research and development of several risk control businesses and data core systems.

The temporary

Data product manager at JD Chengdu Research Institute, with a master’s degree from Sichuan University. Has participated in the research and development of several Skynet risk control systems and data-related business systems.

Meng meng

Senior manager at JD Chengdu Research Institute, with a master’s degree from the University of Electronic Science and Technology of China. Mainly responsible for the back end of JD’s Skynet risk control system and for the research and development of data processing, data mining, decision support and other related business systems.

Big data chatter

ID: BigdataTina2016

