About the author

Liu Jiang is the director of risk management in Ctrip’s Financial Management Department, responsible for overall risk management across Ctrip Group. With nearly 15 years of experience in risk management, he has held key management positions at Guangfa Bank, Opera Solutions, Alibaba, and Tencent, working on risk control policy, risk control models, big data credit reporting, and related areas.

After more than 10 years of development and accumulation, Ctrip’s anti-fraud system has matured in real-time parallel computing over big data and in real-time multi-dimensional correlation analysis, which together form the basis for the stable and efficient operation of the whole system.

In the past two years, we have invested R&D resources in big data and artificial intelligence and produced a series of innovative projects, such as device fingerprinting, CDNA, and a real-time complex variable computing engine, all of which have performed well in production. In 2017, the overall card BP decreased by more than 50%, far below the peer average, providing favorable conditions for Ctrip’s business development and globalization.

High performance and high complexity can coexist

Like most third-party payment platforms, Ctrip’s risk control system is built around real-time risk control, which brings the following requirements and challenges:

  1. In the payment flow, risk control verification generally takes no more than 1 s, and at business risk control points the check completes within 100 ms. The pursuit of performance is also the pursuit of the best possible user experience.

  2. In the past two years, Ctrip’s orders have increased by more than 50% annually, and the volume of business risk control interventions, in areas such as marketing activities and malicious occupation of resources, has grown more than 10x annually.

  3. The number of rules has increased fivefold in two years. Rules also draw on more data: beyond product, payment, and account information, they now use weakly associated data such as behavioral data.

  4. Complex models are deployed in real-time risk control scenarios so that a model can reject a transaction as directly as a rule can. On average, executing one model and its associated variable calculations can consume as many resources as 200 ordinary rules, which challenges both the architecture and the performance of the system.

  5. We need more data to identify fraudsters, for example by detecting emulators and proxy servers, and we have invested considerable R&D resources in this area.

Let me show you some statistics:

The computational complexity of Ctrip’s risk control rules behind a single payment request:

Close to 2,000 variables are generated during the calculation. More than 90% of them are Velocity and Ratio variables, and a large share of these are accurate down to the current transaction. A complete rule check, ending with risk control returning a pass or reject instruction to the payment system, takes less than 150 ms on average, and the 99.9th percentile is only about 500 ms.
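To make the two variable types concrete, here is a minimal sketch in Python. It assumes that a Velocity variable is a count of events for one entity within a trailing time window and that a Ratio variable is a proportion over the same window. The `Event` type, the field names, and the sample data are hypothetical and are not taken from Ctrip’s system.

```python
from dataclasses import dataclass


@dataclass
class Event:
    user_id: str
    timestamp: float  # seconds since an arbitrary epoch
    success: bool     # whether the payment attempt succeeded


def in_window(events, user_id, now, window_seconds):
    """Return the user's events in the trailing window (now - window_seconds, now]."""
    return [
        e for e in events
        if e.user_id == user_id and now - window_seconds < e.timestamp <= now
    ]


def velocity(events, user_id, now, window_seconds):
    """Velocity variable: how many events the user produced in the window."""
    return len(in_window(events, user_id, now, window_seconds))


def failure_ratio(events, user_id, now, window_seconds):
    """Ratio variable: share of failed events among the user's events in the window."""
    selected = in_window(events, user_id, now, window_seconds)
    if not selected:
        return 0.0
    return sum(1 for e in selected if not e.success) / len(selected)


# Hypothetical history; the event at t=170 is the current transaction.
events = [
    Event("u1", 100.0, True),
    Event("u1", 130.0, False),
    Event("u1", 170.0, False),
    Event("u2", 160.0, True),
]

print(velocity(events, "u1", now=170.0, window_seconds=60))
print(failure_ratio(events, "u1", now=170.0, window_seconds=100))
```

Because the window is closed on the right, the transaction being checked is itself counted, which is one way to read "accurate down to the current transaction".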

       

A brief history of Ctrip’s risk control architecture

Ctrip started building its own risk control system around 2011. In 2015, coinciding with the company’s migration of its technology stack from .NET to Java, the risk control system was completely rewritten.

The architecture, design complexity, and projected processing capacity of the new system took the company’s expected business growth into account, allowing technology to get ahead of the business for the first time. With one major version iteration every year since then, Ctrip’s risk control technology now ranks in the first echelon of the industry.

Architecture overview and core services

Let’s take a look at Ctrip’s risk control architecture:

The figure above may be a bit abstract, so let’s look at a concrete example:

Concept: login/registration, order placement, payment, payment result notification, invoicing, and so on are what we call risk control access points.

Some access points are used for real-time verification, while others are used for data collection. There are more than 400 risk control access points across Ctrip’s systems. They review or monitor every link in a Ctrip transaction, protecting the safety of each transaction and the interests of users.

Every day, risk control collects more than 5 billion pieces of data. More than 100 million of these are requests that require a real-time risk check and a response telling the business system whether the current operation may continue.

Risk control is involved from the moment the user logs in. While the user browses and places orders, risk assessment and calculation for that user continue throughout. By the time the user initiates a payment request, risk control’s hot data already contains a complete profile of the user, and the risk control engine can compute and derive the variables needed by rules and models in real time from that data.

Supporting the high availability and high performance of the risk control system requires powerful infrastructure. Here are several core services and components of Ctrip risk control:

Risk control engine

We named it Matrix, meaning that it is as flexible as a Rubik’s cube. Thousands of rules are distributed and executed in parallel so that execution time shows no significant positive correlation with the number of rules. The engine can also be grouped dynamically by business line, which keeps computing resources well isolated between businesses while still providing enough flexibility.
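The fan-out idea can be sketched with Python’s standard `concurrent.futures` module. This is only an illustration of running rules concurrently and collecting the ones that fire. The rule functions, thresholds, and transaction fields are hypothetical, and a thread pool inside one process stands in for what the article describes as distributed execution.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical rules: each takes a transaction dict and returns True when it fires.
def large_amount(txn):
    return txn["amount"] > 10_000


def too_many_cards(txn):
    return txn["cards_used_24h"] >= 3


def new_device(txn):
    return txn["device_age_days"] < 1


RULES = {
    "large_amount": large_amount,
    "too_many_cards": too_many_cards,
    "new_device": new_device,
}


def evaluate(txn, rules=RULES, max_workers=8):
    """Submit every rule to the pool and return the sorted names of rules that fired."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(rule, txn) for name, rule in rules.items()}
        return sorted(name for name, future in futures.items() if future.result())


txn = {"amount": 12_500, "cards_used_24h": 1, "device_age_days": 0}
fired = evaluate(txn)
decision = "reject" if fired else "pass"
print(fired, decision)
```

With independent rules, wall-clock time is bounded by the slowest rule plus scheduling overhead rather than by the sum of all rules, which is the property the paragraph above describes. For CPU-bound rules in CPython, a process pool or separate workers would be needed to get real parallelism.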

Rule engine

The initial version was based on Drools. After two rounds of optimization, it was completely replaced by a self-developed engine that is compatible with Drools scripts, so migrating to the new engine cost almost nothing. After the migration, rule execution performance improved by more than an order of magnitude, and stability improved as well.

Model execution engine

The risk control engine supports deploying models trained with tools such as SAS and Spark into the risk control system. It supports the DOT and PMML formats.

We implemented our own DOT model interpreter, which is more than 20 times more efficient than Python.

Real-time traffic service

Internally, it is called Counter Server. It is responsible for deriving all Velocity and Ratio variables, so its importance is self-evident: the performance of Counter directly affects the latency and accuracy of the entire transaction check.

We built a sliding window on top of a Redis cluster. The implementation is lightweight but very useful. The granularity of the time window is mapped to the Redis key. Seconds, minutes, hours, days, months, and so on are currently supported, and real-time statistics can be configured flexibly and dynamically to match the needs of each variable. The capacity of the cluster ranges from 2 TB to 5 TB.

The Counter service handles more than 10 billion queries every day, and a single traffic query takes only about 1 ms on average, which keeps variable derivation reliable.
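The article does not give the key layout, so the following is a minimal sketch of one way to map time buckets to keys. A plain dictionary stands in for Redis, and the key format, metric name, and entity name are assumptions. In a real Redis deployment the write would typically be an `INCR` on the bucket key plus an `EXPIRE`, and the read an `MGET` over the bucket keys.

```python
from collections import defaultdict

# Bucket size in seconds for each supported granularity (months omitted for brevity).
GRANULARITY_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}


class SlidingWindowCounter:
    def __init__(self):
        # Stand-in for Redis: maps a bucket key to its counter value.
        self.store = defaultdict(int)

    @staticmethod
    def _key(name, entity, granularity, bucket):
        # Hypothetical key layout: the time bucket is part of the key.
        return f"{name}:{entity}:{granularity}:{bucket}"

    def incr(self, name, entity, ts, granularity):
        """Increment the counter of the bucket that contains timestamp ts."""
        bucket = int(ts) // GRANULARITY_SECONDS[granularity]
        self.store[self._key(name, entity, granularity, bucket)] += 1

    def count(self, name, entity, now, granularity, buckets):
        """Sum the current bucket and the (buckets - 1) buckets before it."""
        current = int(now) // GRANULARITY_SECONDS[granularity]
        return sum(
            self.store.get(self._key(name, entity, granularity, b), 0)
            for b in range(current - buckets + 1, current + 1)
        )


counter = SlidingWindowCounter()
for ts in (0, 30, 61, 125, 130):
    counter.incr("pay_attempts", "card:1234", ts, "minute")

last_two = counter.count("pay_attempts", "card:1234", now=130, granularity="minute", buckets=2)
last_three = counter.count("pay_attempts", "card:1234", now=130, granularity="minute", buckets=3)
print(last_two, last_three)
```

The window slides in steps of one bucket, so accuracy depends on the chosen granularity: a finer granularity gives a more precise window at the cost of more keys to read per query.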

 

Device fingerprinting

Traditionally, the IP address was used to identify a device, but with the spread of mobile networks it has largely lost that role. Much of what you see are base station IPs and egress IPs, so blocking an IP may well be a mistake.

In apps, hardware IDs such as IMEI or IDFA can identify a device, but on PC and H5 you need a device identifier that is more accurate than IP. Well-known providers include ThreatMetrix, and several service providers in China also specialize in device fingerprinting.

Device fingerprinting is a key technology for identifying fraudulent transactions in risk control, and a core technology like this should be kept in our own hands. The device fingerprint service developed by Ctrip Risk Control has been deployed on all Ctrip sites and on several sites under Ctrip Group, and rule detection accuracy has improved significantly since its adoption.

Structure and key indicators of device fingerprint:

  

CDNA

We need a complete and in-depth understanding of the “lifetime” behavior and “footprint” of the same person or the same type of fraud gang in Ctrip.

With this goal in mind, we developed the CDNA service. It aggregates the data belonging to the same person by applying multi-dimensional, unlimited-depth convergent association to all data flowing through risk control. The CDNA service processes more than 100 terabytes of data per day.

CDNA is very useful for detecting new fraud signatures and for making rule detection more accurate.
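The article does not describe CDNA’s algorithm. As one possible illustration of association across several dimensions, the sketch below links records that share any attribute value, directly or through a chain of other records, using a union-find structure. The record ids, dimension names, and values are hypothetical.

```python
class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        """Return the representative of x's set, registering x on first use."""
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        """Merge the sets that contain a and b."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def group_records(records):
    """Group record ids connected through any shared (dimension, value) pair."""
    uf = UnionFind()
    for rec_id, attrs in records.items():
        for dim, value in attrs.items():
            # Link the record node to the attribute node it carries.
            uf.union(("record", rec_id), (dim, value))
    groups = {}
    for rec_id in records:
        groups.setdefault(uf.find(("record", rec_id)), []).append(rec_id)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical data: order1 and order2 share a device, order2 and order3 share a card.
records = {
    "order1": {"device": "d1", "card": "c1"},
    "order2": {"device": "d1", "card": "c2"},
    "order3": {"phone": "p9", "card": "c2"},
    "order4": {"device": "d7", "card": "c8"},
}

groups = group_records(records)
print(groups)
```

Here `order1` and `order3` share no attribute directly but end up in the same group through `order2`, which is the kind of multi-hop link that association across dimensions is meant to surface.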

 

Proxy and emulator identification

Fraudsters’ techniques are also constantly evolving, and their activity is becoming harder to detect. Proxy servers and emulators are very effective means of concealment and appear in many scenarios, such as transaction brushing (fake orders) and credit card fraud.

We have studied TCP signatures, time gaps, user behavior, and experimental data from a variety of emulators, and have developed our own methodology and identification scheme.

 

Manual rules vs. models

Models are a significant complement to rules because they can cover the blind spots of manual rules. A model can capture the characteristics of historical fraud well and greatly reduce the number of rules needed.

Both rules and models must be grounded in a good understanding of the business context. Features extracted by analyzing the data alone, without business context, are often biased and incomplete, and in practice their online performance is bound to be less than ideal.

Here is a brief introduction to our feature variable extraction method:

Variable derivation method:

 

Conclusion

“Make the Travel More Freely and Securely” is the internal culture and mission of Ctrip’s risk control team. As Ctrip’s globalization advances and trading volume grows, the underground fraud industry at home and abroad is becoming increasingly sophisticated, and the fraud situation is becoming more serious.

Ctrip is a leader in the OTA industry, and Ctrip’s anti-fraud technology team will likewise lead technological progress in the anti-fraud field, studying and mastering advanced tools such as big data and artificial intelligence ahead of need, so as to meet greater challenges in the future and provide better service to users.
