Background

Xueqiu (Snowball) is the largest stock investment community in China. Its mission is to “connect everything about investment”, and its vision is to become “the world's largest platform for investment exchange and trading”. Snowball's core goal is to give stock and fund investors a high-quality experience for obtaining information, exchanging investment ideas, and trading.

To realize “connecting everything about investment”, Snowball set up a big data department. The department is responsible for building Snowball's data middle platform and carries the mission of “connecting all” of Snowball's data. That data comes from many departments, from each app, and from a wide variety of services. Snowball needed to integrate and manage this data quickly, deliver the data reports the business requires, and support data analysis and even data-driven decision-making. Doing all of this had become a prominent problem holding back Snowball's business development. How could it be solved?

AIBO data middle platform – connecting all data

Each requirement implies a capability:

- To integrate many kinds of data quickly, storage must be separated from computing.
- To support data analysis and decision-making, the platform must provide convenient analysis tools.
- To integrate the various business lines, it needs catalogue permissions and data management.
- To respond quickly to the business, it must provide data APIs, and services must be quick to build and easy to govern.

Breaking the problem down this way, we concluded that solving Snowball's data problem as a whole requires a data middle platform built specifically for Snowball. No enterprise can solve all of its business problems by buying generic services, because a data middle platform is not universal. It must be tightly integrated with the business. It must also give the business the ability to obtain data quickly, integrate and analyze it, and produce reports.

With the goal set, we can look at the capabilities an enterprise data middle platform needs. There is no unified consensus on this, so the list below reflects Snowball's own situation:

  • Data integration: quickly integrate all kinds of structured and unstructured data

  • Data catalogue: identify the company's important data assets and build a unified data catalogue

  • Data tags and linking: connect the data of every business line, and organize data tags along the way (user tags, stock tags, post tags, advertising tags, etc.)

  • Data analysis: exploring and analyzing business value requires efficient data analysis tools that are both unified and flexible

  • Data permissions: data access must be managed to prevent leaks while still letting teams share data and collaborate

  • Data services: support the business with data by developing data services rapidly, following a microservice approach. Service governance matters too. Data services are updated and iterated based on usage analysis. Unused services are taken offline, and core services are kept highly available and stable.

From the perspective of data flow and data application, the entire AIBO architecture is as follows:

Data integration

The Snowball data middle platform uses an architecture that separates storage from computing. Data integration covers collecting data and running ETL on it. The results are stored in the Hive data warehouse. Data comes mainly from the following sources:

  • Kafka topic data: Flume synchronizes the data to Hive. This requires configuring a Flume cluster. A standard data-access interface will be developed so that message-queue data can be onboarded quickly in a configuration-driven way

  • MySQL business database data: tables are imported into Hive with Sqoop. AIBO provides a dedicated interface for collecting database tables quickly, which standardizes MySQL data integration

  • Data ETL: the underlying data sources need some ETL work to meet business requirements. This part is flexible and can be done in SQL, Python, shell, etc., depending on the business need

  • DolphinScheduler manages data dependencies graphically
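The post does not show AIBO's onboarding interface. The Python below is a minimal sketch of what configuration-driven access could look like. For one Kafka topic, it renders a Flume agent config made of a Kafka source, a memory channel, and an HDFS sink. For one MySQL table, it builds a Sqoop import command. The dataclasses, agent name, hosts, paths, and table names are all hypothetical. The Flume property keys and Sqoop flags are standard ones. A production setup would still need tuning such as channel capacity, file rolling, and Hive partitioning.

```python
from dataclasses import dataclass


@dataclass
class KafkaTopicSource:
    """Hypothetical onboarding config for one Kafka topic."""
    agent: str              # Flume agent name
    topic: str
    bootstrap_servers: str
    hdfs_path: str          # landing directory, e.g. the location of a Hive external table


@dataclass
class MysqlTableSource:
    """Hypothetical onboarding config for one MySQL table."""
    jdbc_url: str
    username: str
    password_file: str      # HDFS path to a file holding the password
    table: str
    hive_table: str
    mappers: int = 1


def flume_properties(src: KafkaTopicSource) -> str:
    """Render a Flume agent config: Kafka source -> memory channel -> HDFS sink."""
    a = src.agent
    return "\n".join([
        f"{a}.sources = r1",
        f"{a}.channels = c1",
        f"{a}.sinks = k1",
        f"{a}.sources.r1.type = org.apache.flume.source.kafka.KafkaSource",
        f"{a}.sources.r1.kafka.bootstrap.servers = {src.bootstrap_servers}",
        f"{a}.sources.r1.kafka.topics = {src.topic}",
        f"{a}.sources.r1.channels = c1",
        f"{a}.channels.c1.type = memory",
        f"{a}.sinks.k1.type = hdfs",
        f"{a}.sinks.k1.hdfs.path = {src.hdfs_path}",
        f"{a}.sinks.k1.hdfs.fileType = DataStream",
        f"{a}.sinks.k1.channel = c1",
    ])


def sqoop_import_args(src: MysqlTableSource) -> list[str]:
    """Build a Sqoop 1 command line that imports one MySQL table into Hive."""
    return [
        "sqoop", "import",
        "--connect", src.jdbc_url,
        "--username", src.username,
        "--password-file", src.password_file,
        "--table", src.table,
        "--hive-import",
        "--hive-table", src.hive_table,
        "--hive-overwrite",
        "--num-mappers", str(src.mappers),
    ]


if __name__ == "__main__":
    print(flume_properties(KafkaTopicSource(
        "a1", "user_events", "kafka1:9092", "/warehouse/ods/user_events")))
    print(" ".join(sqoop_import_args(MysqlTableSource(
        "jdbc:mysql://mysql1:3306/app", "etl", "/user/etl/mysql.pwd",
        "user_info", "ods.user_info"))))
```

Returning the Sqoop command as an argument list rather than a single string means it can be passed straight to `subprocess.run` without shell quoting issues.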

Data catalogue – User+Event

The data catalogue is the data management center. It must bring together as many valuable data assets as possible and provide definitions for each kind of data. Snowball is a ToC product serving a mass of users, so AIBO's underlying data naturally centers on the user. Each record captures who did what, at what time, and with which specific attributes. AIBO therefore abstracts its data into a User+Event model. The data catalogue is essentially the set of Event definitions familiar from common user-analytics systems.
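A User+Event model like the one described can be sketched in a few dataclasses. This is a minimal illustration, not AIBO's actual schema. The Python below assumes a hypothetical `EventCatalogue` that registers event definitions with their required properties and validates incoming events against them. An event carries who (`uid`), what (`name`), when (`ts`), and its attributes (`props`).

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class EventDefinition:
    """One catalogue entry: an event name and the properties it must carry."""
    name: str
    description: str
    required_props: frozenset[str]


@dataclass
class Event:
    uid: int                 # who
    name: str                # did what
    ts: datetime             # when
    props: dict[str, object] = field(default_factory=dict)  # with which attributes


class EventCatalogue:
    """Hypothetical catalogue of event definitions used to validate incoming events."""

    def __init__(self) -> None:
        self._defs: dict[str, EventDefinition] = {}

    def register(self, definition: EventDefinition) -> None:
        if definition.name in self._defs:
            raise ValueError(f"event {definition.name!r} is already defined")
        self._defs[definition.name] = definition

    def validate(self, event: Event) -> list[str]:
        """Return a list of problems; an empty list means the event matches its definition."""
        definition = self._defs.get(event.name)
        if definition is None:
            return [f"unknown event {event.name!r}"]
        missing = sorted(definition.required_props - set(event.props))
        return [f"missing property {p!r}" for p in missing]
```

Returning a list of problems instead of raising lets a pipeline count and report malformed events without stopping ingestion.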

Data tags and linking – User+Event+Item

An Item is material data, and it is what gives data tags their meaning.

When a user completes an event, we know who the user is, and we naturally want to know that user's tag attributes. Examples include age, gender, number of followers, number of big-V accounts followed, number of bookmarked posts, and posting frequency. These tags are drawn from every line of business to describe the user.

The same idea extends to other kinds of Item:

- Stocks: when someone on Snowball views a particular stock, we may want to know what kinds of stocks that person likes to watch. That requires stock tags such as sector, P/E ratio, and net profit margin.
- Ads: when a user clicks on an ad, we want to know what kinds of ads the user tends to click. That requires the ad's material information.

Technically, this is easy to understand. One of an Event's attributes is a foreign-key ID of the Item material data, and that key is used to join the material data. For example, `uid` in an Event joins to the user tags in `user_info`, `adid` joins to the ad tags, and `stock_code` joins to the stock tags.
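The foreign-key joins described above can be sketched in plain Python. This is an illustration only: in AIBO the joins would run in the warehouse rather than in application code. The tag tables, their values, and the `enrich` helper below are hypothetical sample data. Only the key names `uid`, `stock_code`, and `adid` come from the text.

```python
# Hypothetical Item tag tables, keyed by the IDs that events carry.
USER_INFO = {1001: {"gender": "F", "followers": 320}}
STOCK_TAGS = {"STK001": {"sector": "Technology"}}
AD_TAGS = {"ad-7": {"category": "brokerage"}}

# Event attribute (foreign key) -> (tag prefix, Item tag table).
FOREIGN_KEYS = {
    "uid": ("user", USER_INFO),
    "stock_code": ("stock", STOCK_TAGS),
    "adid": ("ad", AD_TAGS),
}


def enrich(event: dict) -> dict:
    """Return a copy of the event with the tags of every Item it references attached."""
    enriched = dict(event)
    for key, (prefix, table) in FOREIGN_KEYS.items():
        if key in event:
            # Unknown IDs simply contribute no tags, like a left join.
            for tag, value in table.get(event[key], {}).items():
                enriched[f"{prefix}.{tag}"] = value
    return enriched
```

For example, `enrich({"uid": 1001, "event": "view_stock", "stock_code": "STK001"})` adds `user.gender`, `user.followers`, and `stock.sector` to the event. It adds no ad tags, because the event has no `adid`.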

Item tags can describe any material information, such as orders, products, or logistics. Snowball has a large amount of material information from its various businesses, as well as tag data covering its entire user base.

Item tags can be added and maintained flexibly through data integration and custom ETL, with dependencies managed in DolphinScheduler.
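DolphinScheduler handles task dependencies through its own UI and workflow definitions. The sketch below does not use DolphinScheduler's API. It only illustrates the underlying idea that a tag-building job runs after the jobs it depends on. It uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+), and the task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each task maps to the set of tasks it depends on.
TASKS = {
    "ods_user_events": set(),            # Kafka topic landed by Flume
    "ods_user_info": set(),              # MySQL table imported by Sqoop
    "dim_user_tags": {"ods_user_info"},  # custom ETL that builds user tags
    "dwd_events_enriched": {"ods_user_events", "dim_user_tags"},
}


def run_order(tasks: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into batches; every task's dependencies finish in an earlier batch,
    so tasks within one batch could run in parallel. Raises CycleError on a cycle."""
    sorter = TopologicalSorter(tasks)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)  # here we pretend each task succeeded immediately
    return batches
```

`run_order(TASKS)` yields three batches. The first holds the two ODS loads, the second `dim_user_tags`, and the third `dwd_events_enriched`.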

Data analysis – General model analysis capability

Applying data successfully means analyzing and exploring it and using it to influence the business. To close the loop of data applications, analysis tools are an important part of the data middle platform. Analyzing data is an indispensable step in creating value, so the platform must offer sufficient analysis capabilities. AIBO currently offers the following:

  • Event analysis

  • Retention analysis

  • Funnel analysis

  • Custom data dashboards

  • User group comparison

  • ABTest

Event analysis

User actions on platforms such as the app or web are called events. App launch, registration, login, viewing posts, and deposits are all events. Event analysis computes and displays metrics (app-launch users, registered users, wealth-management users, etc.) broken down by specified dimensions (app version, device brand, operating system, etc.). It helps answer everyday operational analysis questions quickly and efficiently.

Features: custom dimensions, custom metrics, group comparison, multiple chart types, and data download.
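At its core, an event-analysis metric such as "app-launch users by operating system" is a distinct-user count per dimension value. The minimal sketch below assumes events are dicts with hypothetical `uid`, `event`, and dimension keys. The real platform would compute this in the warehouse with SQL.

```python
from collections import defaultdict


def event_metric(events, event_name, dimension):
    """Count distinct users who triggered `event_name`, segmented by one dimension."""
    users = defaultdict(set)
    for e in events:
        if e["event"] == event_name:
            # Events missing the dimension are grouped under "unknown".
            users[e.get(dimension, "unknown")].add(e["uid"])
    return {value: len(uids) for value, uids in sorted(users.items())}
```

Counting distinct `uid`s rather than raw events means a user who launches the app twice is counted once.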

Retention analysis

Retention analysis measures user engagement and activity. It looks at how many of the users who performed an initial behavior go on to perform a follow-up behavior later. This is an important indicator of how much value a product delivers to users.

Features: custom initial and follow-up behaviors, grouping by any dimension of the initial or follow-up behavior, user group comparison, and multiple chart views.
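The sketch below shows one common definition, day-N retention. Users are grouped into cohorts by the day they first performed the initial behavior. For each cohort, it reports the share of users who performed the follow-up behavior exactly N days later. AIBO's exact definition is not stated in the post, so treat this as one reasonable variant. Another variant counts the follow-up behavior at any time within N days. The event fields `uid`, `event`, and `day` are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta


def day_n_retention(events, initial, followup, n):
    """For each cohort day d (the day a user first did `initial`), return the share
    of that cohort who did `followup` exactly n days later."""
    first_day: dict[int, date] = {}
    followup_days: dict[int, set[date]] = defaultdict(set)
    for e in events:
        uid, day = e["uid"], e["day"]
        if e["event"] == initial and (uid not in first_day or day < first_day[uid]):
            first_day[uid] = day
        if e["event"] == followup:
            followup_days[uid].add(day)

    cohorts: dict[date, list[int]] = defaultdict(list)
    for uid, day in first_day.items():
        cohorts[day].append(uid)

    return {
        day: sum(day + timedelta(days=n) in followup_days[u] for u in uids) / len(uids)
        for day, uids in sorted(cohorts.items())
    }
```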

Funnel analysis

The funnel model analyzes conversion and drop-off at each step of a multi-step process. It can verify whether users reach the final goal along the path the product was designed for, and it shows how users complete the core conversion steps. Analyzing conversion and churn between steps helps identify potential problems and locate the users who were lost.

Features: freely defined funnel steps, custom filter conditions on each step, group comparison, and multiple chart types for the funnel.
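A strict-order funnel can be computed by tracking, per user, how far through the step list they have progressed. The sketch below is a deliberately simplified assumption. It applies no conversion time window and no per-step filters, and a step counts only if it happens after the previous step was reached. The event fields `uid`, `event`, and `ts` are hypothetical.

```python
def funnel(events, steps):
    """Return how many users reached each step, in order.

    Events are replayed in time order; a user advances from step i to i + 1 only
    when their next matching event is steps[i], so out-of-order events do not count.
    """
    progress = {}  # uid -> number of steps completed so far
    for e in sorted(events, key=lambda e: e["ts"]):
        done = progress.get(e["uid"], 0)
        if done < len(steps) and e["event"] == steps[done]:
            progress[e["uid"]] = done + 1
    return [sum(1 for p in progress.values() if p > k) for k in range(len(steps))]
```

Dividing adjacent counts gives the step-to-step conversion rate. For example, `[2, 1, 1]` means half of the users who entered the funnel reached step two.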

Custom data dashboards

Users can add the metrics and charts they save during analysis to a custom dashboard that fits their own tracking needs. Dashboards can be shared with designated colleagues to make daily data monitoring easier.

Features: charts can be saved quickly from the event analysis, retention, funnel, and other analysis pages, and dashboards can be freely defined.

User group comparison

To make the analysis models more flexible, AIBO supports user groups. A user group can be saved from any event or from the filter conditions of any analysis model. Groups can be static, configured by uploading a file of users. They can also be dynamic, defined by custom conditions, so that group membership changes over time.

Features: quickly compare how two or more user groups perform in each analysis model.
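The difference between static and dynamic groups comes down to when membership is decided. The sketch below assumes a hypothetical profile store mapping `uid` to tags. A static group holds a fixed set of uids, as from an uploaded file. A dynamic group holds a condition that is re-evaluated on every query. `compare` applies the same metric to each group.

```python
from dataclasses import dataclass
from typing import Callable

Profile = dict  # hypothetical user profile: tag name -> value


@dataclass
class StaticGroup:
    """Fixed membership, e.g. uids loaded from an uploaded file."""
    name: str
    uids: frozenset

    def members(self, profiles: dict) -> set:
        return {u for u in self.uids if u in profiles}


@dataclass
class DynamicGroup:
    """Membership defined by a condition that is re-evaluated on every query."""
    name: str
    condition: Callable[[Profile], bool]

    def members(self, profiles: dict) -> set:
        return {u for u, p in profiles.items() if self.condition(p)}


def compare(groups, profiles, metric):
    """Apply the same metric (a function of a set of uids) to every group."""
    return {g.name: metric(g.members(profiles)) for g in groups}
```

If a user's follower count crosses the threshold, a dynamic "big accounts" group picks them up on the next query. A static group only changes when a new file is uploaded.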

ABTest

To support product iteration, AIBO supports custom A/B experiments. Working with Snowball's front-end team, we built traffic splitting for ABTest into the app's underlying layer. The business side only needs to define the page versions under test and the experiment's target user group. The target group can be selected through user groups or through freely defined conditions. Once an experiment starts, T+1 and T+0 statistics are provided so that the strengths and weaknesses of a product change can be judged quickly.

Features: support for product iteration, custom audience splitting, real-time statistics on experiment effects, and freely defined core and auxiliary metrics.
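The post does not say how the app splits traffic. A common technique is deterministic hash-based bucketing, sketched below as an assumption rather than Snowball's implementation. Hashing the experiment name together with the user ID gives each user a stable bucket, so the same user always sees the same version. Different experiments also split roughly independently. The function name and parameters are hypothetical.

```python
import hashlib


def assign_variant(uid: int, experiment: str, variants: list[str], weights: list[int]) -> str:
    """Deterministically map a user to a variant.

    Hashing (experiment, uid) gives each user a stable bucket in [0, 100);
    `weights` are percentages, one per variant, that must sum to 100.
    """
    if len(variants) != len(weights) or sum(weights) != 100:
        raise ValueError("need one weight per variant, summing to 100")
    digest = hashlib.sha256(f"{experiment}:{uid}".encode()).hexdigest()
    bucket = int(digest[:8], 16) % 100
    cumulative = 0
    for variant, weight in zip(variants, weights):
        cumulative += weight
        if bucket < cumulative:
            return variant
    raise AssertionError("unreachable: weights sum to 100")
```

Restricting an experiment to a target user group would be a membership check before calling `assign_variant`. Users outside the group simply see the default version.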

Data services

Architecturally, data services take the form of microservices that provide data support to the business lines. By extending and combining the existing data models, services can share data configuration and the data catalogue's metadata. Common data services are integrated into AIBO's generic data service interface. Business-specific services are developed and managed as separate microservices and adjusted flexibly to business needs.
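One benefit of routing common services through a single generic interface is that usage becomes measurable, which feeds service governance. The sketch below is a hypothetical in-process registry, not AIBO's actual service framework. Every call goes through `call`, so the registry can report which registered services have never been used and are therefore candidates to take offline.

```python
from collections import Counter
from typing import Any, Callable


class DataServiceRegistry:
    """Hypothetical registry behind a single generic data-service interface.

    Every call goes through `call`, so usage can be counted and fed back into
    service governance (e.g. finding services nobody uses any more).
    """

    def __init__(self) -> None:
        self._services: dict[str, Callable[..., Any]] = {}
        self._calls: Counter = Counter()

    def register(self, name: str, handler: Callable[..., Any]) -> None:
        self._services[name] = handler

    def call(self, name: str, **params: Any) -> Any:
        if name not in self._services:
            raise KeyError(f"no data service named {name!r}")
        self._calls[name] += 1
        return self._services[name](**params)

    def unused(self) -> list[str]:
        """Registered services that have never been called: candidates to take offline."""
        return sorted(n for n in self._services if self._calls[n] == 0)
```

In a real deployment the handlers would sit behind HTTP or RPC endpoints, and call counts would come from gateway logs rather than an in-memory counter.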

Summary and outlook

To build its data middle platform, Snowball combined three elements: an architecture that separates storage and computing, data APIs built on microservices, and extensible data analysis models. Together they abstract and integrate common business needs to keep the business running smoothly. The result is a closed loop of data integration, data catalogue configuration, data analysis, and data application that keeps iterating.

Next, we will gradually add capabilities to AIBO and support finer-grained data security. We will also improve data lineage, so that every ETL process is visible in the data catalogue. We believe that, through continuous business iteration, Snowball's data middle platform will become ever more complete and capable, using data to drive the business to a higher level.

Readers interested in the technology can follow the Snowball engineering team's public account. We will share the technology and design of each module in future posts.