Introduction: In marketing scenarios, algorithms provide advertisers with personalized marketing tools, helping them run fine-grained campaigns and achieve better ROI growth at a controllable cost. Over the past period we have supported several real-time business scenarios, such as real-time prediction for bidding strategies, batch keyword synchronization, and real-time features. We found that most colleagues on the business side can use ODPS flexibly, but are still less familiar with Blink, so we have written up the experience we accumulated in these scenarios in the hope that it will be of some help to you.

Author: Shigeru | Source: Ali Technology official account

I. Background

In marketing scenarios, algorithms provide advertisers with personalized marketing tools, helping them run fine-grained campaigns and achieve better ROI growth at a controllable cost. Over the past period we have supported several real-time business scenarios, such as real-time prediction for bidding strategies, batch keyword synchronization, and real-time features. We found that most colleagues on the business side can use ODPS flexibly, but are still less familiar with Blink, so we have written up the experience we accumulated in these scenarios in the hope that it will be of some help to you.

II. Technology Selection

Why choose Blink? For most offline scenarios, if there is no timeliness requirement and the data source is batch rather than streaming, ODPS is a good choice. Conversely, if the data source is a real-time stream (such as TT, SLS, or SWIFT), must be consumed sequentially, and the scenario demands low latency, Blink is a good choice.

Blink currently supports both Batch and Streaming modes. Batch mode runs with a fixed start time and end time; compared with ODPS, its biggest advantage is that resources are requested in advance and held exclusively, which guarantees timeliness. Streaming mode is real-time consumption in the traditional sense, enabling millisecond-level processing.

From a development-mode perspective, Blink offers two options: the DataStream API, similar to ODPS MR, and SQL mode. In terms of ease of use, SQL is undoubtedly the cheapest to adopt; for complex scenarios, however, DataStream offers the finest control, allowing flexible definition of caches and data structures and supporting multiple sub-scenarios in a single job.

III. Main Scenarios

1 Real-time replay bid strategy evaluation

Business background

The Replay system collects online bidding logs, structures them, and simulates downstream processing. It records the bidding information of the through-train engine online after recall, mainly covering queue information such as recall, bidding, and scoring. Combined with the ranking and charging formulas, these logs can be used to simulate the online bidding environment; in short, they let you evaluate what would have happened if a different bid had been used on a BidWord. With Replay, algorithm teams and advertisers can use offline traffic to estimate the effect of a policy change before running an online A/B test. This minimizes the impact of policy changes on online traffic and makes the results more controllable; it also reduces the impact on marketplace revenue as much as possible when testing potentially negative strategies.

The algorithm team wanted to evaluate multiple business-side bidding strategies against online recall logs: replay one day of sampled logs (about 1 billion records), evaluate each bidding strategy, and perceive ads going offline in real time so that already-offline ads do not distort the evaluation. The 1 billion records were expected to be processed within 1-2 hours.

The main challenge

  • How to load 10 million material records;
  • How to synchronize ads going offline in real time at high QPS (million-level);
  • How to decouple the whole real-time job pipeline from business logic.

The solution

  • Material data loading: load all the data at Blink job startup, avoiding pressure on iGraph from high-QPS lookups; in addition, broadcast mode is used so that the ODPS data is loaded only once per node, avoiding repeated loads.
  • Ad offline information is stored in iGraph in buckets, and the full set of offline ads is read through a periodically refreshed cache, reducing 2-million-plus QPS to around 10,000. The RateLimit throttling component controls access concurrency, capping concurrent iGraph access at around 400,000 and smoothing the overall traffic.
  • The real-time engineering framework reserves a UDF interface, so the business side only needs to implement the SDK; engineering concerns such as performance, concurrency, throttling, and instrumentation are handled inside the framework, decoupling the engineering framework from the algorithm's Replay strategy.
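To illustrate the throttling idea in the bullets above, here is a minimal token-bucket limiter in Python. This is only a sketch of the general pattern, not the actual Blink RateLimit component; the class and method names are our own.

```python
import time

class TokenBucket:
    """Minimal token-bucket rate limiter: requests draw tokens from a
    bucket that refills at a fixed rate, smoothing bursty traffic."""

    def __init__(self, rate, capacity=None):
        self.rate = float(rate)                     # tokens added per second
        self.capacity = float(capacity if capacity is not None else rate)
        self.tokens = self.capacity                 # start with a full bucket
        self.last = time.monotonic()

    def try_acquire(self, n=1):
        """Non-blocking: take n tokens if available, else refuse."""
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False
```

In a setup like the one described, each parallel subtask would hold its own bucket with rate set to (global QPS budget / parallelism), so the aggregate load on iGraph stays bounded.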


Based on the flexibility of Blink's Streaming Batch mode, we implemented processing of TT data with fixed start and end times. The reusable TT read/write, ODPS, iGraph, and instrumentation components built here support subsequent job development for similar businesses, and they provide the basic capability for productizing such jobs later.

2 Real-time Features

Business background

As B-side algorithms evolve, the dividend from incremental model updates is shrinking. To further capture user intent from real-time customer information, and to mine potential demand more comprehensively and promptly, we start from B-side user behavior: based on online user behavior logs, we produce real-time features, which the algorithm team uses to improve the online models.

Based on this demand, we built a user real-time feature output pipeline that derives user real-time features by parsing the upstream A+ data source. The real-time features mainly include the following:

  • Extract roughly 50 feature values per user and write them to iGraph;
  • Output user IDs that exhibit certain behaviors, aggregated by minute;
  • Output the sum, mean, or count of a feature over the last hour.

The main challenge

  • The volume of real-time feature development is very large: each feature requires its own real-time data pipeline to be developed and maintained, so development and operations costs are high.
  • Feature development requires the developer to understand: the data source, and ETL processing over it; the compute engine — Flink SQL has its own computational semantics that must be learned and applied fluently per scenario; the storage engine — real-time data must land in a serving store, so a storage engine (iGraph, HBase, Hologres, etc.) must be selected; and query optimization — each storage engine has its own client, usage patterns, and optimization techniques that must be learned separately.

The solution

From a product-design perspective, we designed a set of real-time platform capabilities that make developing real-time features as easy as developing offline tables in ODPS. The product's advantage is that users only need to understand SQL to develop real-time features:

  • No need to understand real-time data sources;
  • No need to understand the underlying storage engine;
  • Real-time feature data can be queried with SQL alone, without learning each engine's query methods.

The real-time development product integrates the Aurora platform, the Dolphin engine, the Blink engine, and the storage engines, connecting the whole pipeline end to end and giving users a complete development experience without exposing technical details irrelevant to their work.

Related platform introduction:

Dolphin intelligent acceleration analysis engine: Dolphin originated in the data marketing platform Damopan (Alimama's DMP). On top of a general OLAP MPP compute framework, it applies storage-, index-, and operator-level performance optimizations for typical marketing computations (such as audience selection by tag and insight analysis), achieving order-of-magnitude improvements in compute performance, storage cost, and stability, and supporting business at a correspondingly larger scale. Dolphin itself is positioned as an acceleration engine; data storage and compute operators rely on underlying engines such as ODPS and Hologres. Via plug-ins, operator integration and the optimization of underlying data storage and indexes are completed in Hologres, improving compute performance in specific scenarios. Dolphin's core compute capabilities include a cardinality computation kernel, an approximate computation kernel, a vector computation kernel, SQL result materialization, and cross-DB access. Dolphin also implements a layer of SQL translation and optimization that automatically converts the user's original input into the optimized underlying storage format and compute operators; users need not care about the underlying storage and computation details and simply write SQL against the original tables, which greatly improves convenience.

Aurora consumer operation platform: Aurora is a one-stop R&D platform for accelerated marketing scenarios. By productizing the feature-engine capabilities, it empowers users more effectively. Featured scenarios supported by Aurora include ultra-large-scale tag alignment (millisecond-level output for audience selection over tens of billions of tags), audience insight (second-level queries over hundreds of billions of records), second-level effect attribution (event analysis, attribution analysis), and real-time million-level audience targeting. On top of the marketing data engine, Aurora provides one-stop operations management, data governance, and self-service access, making it more convenient to use. Aurora has accumulated common data-engine templates for search advertising, including cardinality computation, reporting, attribution, audience insight, vector computation, approximate computation, and real-time delivery templates; based on these mature templates, users can get started at zero cost, without writing code.

Based on the current business requirements, the usage of real-time data sources and storage data sources is encapsulated as follows:

---- create the source table
create table if not exists source_table_name (
    user_id       STRING comment '',
    click         STRING comment '',
    item_id       STRING comment '',
    behavior_time STRING comment ''
) with (
    bizType = 'tt',
    topic = 'topic',
    pk = 'user_id',
    timeColumn = 'behavior_time'
);

---- create the output table
create table if not exists output_table_name (
    user_id STRING,
    click   STRING
) with (
    bizType = 'feature',
    pk = 'user_id'
);

Real-time feature operator concat_id:


  • Description: select a field from the input table and arrange its values in reverse timestamp order. Parameters allow deduplication by ID and timestamp, and the user can keep at most the latest K values.


---- latest 50 clicked items per user, in reverse timestamp order
insert into table ${output_table_name}
select
    user_id,
    concat_id(true, item_id, behavior_time, 50) as rt_click_item_seq
from ${source_table}
group by user_id;

---- user id list aggregated by minute
insert into table ${output_table_name}
select
    window_start(behavior_time) as time_id,
    concat_id(true, user_id) as user_id_list
from ${source_table}
group by window_time(behavior_time, '1 MINUTE');
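The concat_id semantics described above — deduplicate by ID, order by timestamp descending, keep at most K values — can be sketched in plain Python. This is a hedged illustration of the operator's behavior, not the actual UDF implementation:

```python
def concat_id(records, k, dedupe=True):
    """records: iterable of (item_id, timestamp) pairs.
    Returns up to k ids ordered by timestamp descending, optionally
    keeping only the most recent occurrence of each id."""
    if dedupe:
        latest = {}
        for item_id, ts in records:
            # Keep the newest timestamp seen for each id.
            if item_id not in latest or ts > latest[item_id]:
                latest[item_id] = ts
        records = list(latest.items())
    ordered = sorted(records, key=lambda r: r[1], reverse=True)
    return [item_id for item_id, _ in ordered[:k]]
```

For example, applied to one user's click events it would yield the user's most recently clicked item IDs, newest first, truncated at K.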

sum, avg, count operators:

  • Description: select a field from the input table's records and compute its sum, average, or count over a specified time range.

Usage example:

insert into table ${output_table_name}
select
    user_id,
    window_start(behavior_time) as time_id,
    sum(pv) as pv,
    sum(click) as click
from ${source_table}
group by user_id, window_time(behavior_time, '1 HOUR');
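The hourly aggregation above can be sketched in ordinary Python as a tumbling-window group-by: each event is assigned to the window containing its timestamp, then summed, counted, and averaged per (user, window). This is an illustration of the window semantics only, with our own function and field names, not the engine's implementation:

```python
from collections import defaultdict

def hourly_aggregate(events, window_seconds=3600):
    """events: iterable of (user_id, ts_seconds, value).
    Returns {(user_id, window_start): {'sum', 'count', 'avg'}} where
    window_start is the beginning of the tumbling window (epoch secs)."""
    acc = defaultdict(lambda: {"sum": 0.0, "count": 0})
    for user_id, ts, value in events:
        # Align the timestamp down to its window boundary.
        window_start = ts - ts % window_seconds
        bucket = acc[(user_id, window_start)]
        bucket["sum"] += value
        bucket["count"] += 1
    return {key: {**v, "avg": v["sum"] / v["count"]}
            for key, v in acc.items()}
```

Events at second 3599 and second 3600 land in different one-hour windows, mirroring how `window_time(behavior_time, '1 HOUR')` partitions the stream.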


Based on the B-side algorithm's real-time feature requirements, we built a real-time feature output system on Blink SQL + UDF. User-submitted SQL is translated, a Blink SQL Streaming job is generated on the Bayes platform, and the resulting real-time feature data is written to iGraph. Basic capabilities such as the Blink-to-iGraph writer component, the concat_id operator, and the aggregation operators were accumulated here, laying the foundation for the subsequent Dolphin Streaming real-time feature output system and enabling such user needs to be supported quickly through multiple feature-operator extension modes.

3 Keyword Batch synchronization

Business background

Every day, many merchants join the through-train product through different channels, and there is considerable room to improve how new customers are onboarded. On the other hand, the low-activity portion of existing customers also has room for optimization. System-bought keywords are an important lever both for onboarding new customers and for activating low-activity ones: by updating keywords for new and low-activity through-train customers at a higher frequency (day-level to hour-level), we help target customers try more keywords, keep the good and drop the bad, and thus promote activity.

Based on this demand, we added an hour-level message update pipeline on top of the existing day-level offline pipeline, supporting system keyword updates for every word package under both standard and smart plans. The hourly update volume is in the tens of millions. Using Blink, the full set of request parameters is read from ODPS, the FaaS function service is called per request, and each request's result is written to an ODPS output table. The update frequency is every two hours, running from 8:00 to 22:00, with roughly 5 million additions and 5 million deletions per round.

The main challenge

  • Blink Batch jobs need to be scheduled hourly
  • FaaS function calls need throttling

The solution

  • Use a Blink UDF to invoke the function service over HSF per request
  • The Blink UDF throttles with RateLimiter, and the QPS against the function service is strictly controlled via node parallelism
  • A shell script configured on the Dataworks platform schedules the batch compute jobs on the Bayes platform
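The second bullet — controlling service QPS through node parallelism — can be sketched as follows: each UDF instance (subtask) paces its own calls so that parallelism times per-subtask QPS never exceeds the service's global budget. This is a minimal illustration of the pattern with invented names, not the actual Blink UDF or RateLimiter API:

```python
import time

class ThrottledCaller:
    """Each subtask waits at least `interval` seconds between calls,
    so the aggregate rate across all subtasks stays within total_qps."""

    def __init__(self, total_qps, parallelism):
        # Per-subtask budget is total_qps / parallelism calls per second,
        # i.e. one call every parallelism / total_qps seconds.
        self.interval = parallelism / float(total_qps)
        self.next_allowed = 0.0

    def call(self, fn, *args):
        now = time.monotonic()
        if now < self.next_allowed:
            time.sleep(self.next_allowed - now)   # block until our slot
        self.next_allowed = max(now, self.next_allowed) + self.interval
        return fn(*args)
```

With, say, a 1,000 QPS budget and parallelism 2, each subtask spaces its FaaS calls at least 2 ms apart.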


Based on this requirement, Blink SQL Batch mode implements this update pipeline in near real time, establishing the scheduling pattern for such batch jobs and laying the groundwork for productizing them later.

IV. Future Prospects

For B-side algorithm scenarios, the Dolphin engine has designed and implemented the Dolphin Streaming pipeline: developing real-time features on the Aurora platform is as easy as developing offline tables in ODPS, and users can query real-time feature data with SQL alone, without knowing the real-time data sources or underlying storage engines. However, B-side algorithm businesses also include batch jobs like those described in this article, which currently require developing Blink Batch SQL, Blink Streaming Batch mode jobs, ODPS UDFs, and Java code tasks, providing scheduling scripts, and finally packaging the project for the algorithm team to use. In the future, we hope users can develop batch compute services directly on the Aurora platform, reducing algorithm development cost and providing an extensible, low-cost batch compute capability that supports rapid business iteration and helps the business land results quickly.


This article is original content from Aliyun and may not be reproduced without permission.