The opening article, an overview of the mPaaS core component system, introduced the main functions and data link of the Mobile Analysis Service (MAS), including basic analysis, custom analysis, performance analysis, and log management. This chapter takes a closer look at the architecture behind MAS.

1. Core capabilities of MAS mobile analysis:

  1. Computes multidimensional data in real time to give a complete picture of mobile application performance.
  2. Analyzes user attributes and behavior offline to drive product optimization and operational promotion.
  3. Automatically collects process logs to speed up troubleshooting and resolve R&D problems.

Functions that extend from core competencies include:

  1. User behavior analysis: Provides application usage analysis, including statistics for indicators such as user reports, user logins, and new users. It supports multidimensional analysis and comparison by platform, version, region, and time, so that users can understand how their app is used more quickly and conveniently.
  2. Stability analysis: Provides application stability analysis, including crash monitoring, exception monitoring, performance monitoring, and user diagnosis, helping developers discover and locate problems in a timely manner.
  3. Diagnostic analysis: Provides application log diagnosis, including individual user diagnosis and diagnostic log collection. Individual user diagnosis retrieves the behavior of a user's client in real time. Diagnostic log collection pushes an instruction to the client, which then uploads its local logs.

In short, the role MAS plays in mobile client R&D and other parts of an enterprise is this: it takes in massive volumes of log data from mobile clients, processes it through real-time or offline computation, and outputs concrete analysis results and reports. With these data analysis capabilities, enterprises can build a mutually beneficial mobile service ecosystem, monitor their clients, and gain insight into user behavior and industry changes, which in turn supports strategic planning and decision-making.

2. Analysis of the MAS architecture

Let’s start with the data link diagram:

As the data link diagram shows, data originates in the client SDK. The event tracking (log instrumentation) SDK provided by the mPaaS framework currently has four main functions:

  1. Fetches the log collection and reporting rules from the server, such as whether to report, under which network conditions, how often, the packet size, and how long logs are stored.
  2. Automatically monitors basic client behavior and records tracking logs, for example: report, page jump, network, timing, and crash events.
  3. Provides an API for business modules to call. The API wraps the basic parameters (so the business code only has to supply its own log data) and writes the result to the client log file.
  4. Reports client logs to the server-side log gateway (MDAP) according to the collection and reporting rules.
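
The rule-driven reporting described in the list above can be sketched as follows. This is a minimal illustration, not the mPaaS SDK API: the `ReportRule` fields, `should_report`, and `build_packets` are hypothetical names chosen to mirror the rule items the text mentions (whether to report, network conditions, frequency, packet size, storage time).

```python
from dataclasses import dataclass


@dataclass
class ReportRule:
    """Hypothetical reporting rule, as it might be fetched from the server."""
    enabled: bool             # whether to report at all
    allowed_networks: tuple   # network types on which reporting is allowed
    min_interval_s: int       # minimum seconds between two uploads
    max_packet_bytes: int     # upper bound on the size of one upload packet
    retention_days: int       # how long logs are kept on the device


def should_report(rule, network, seconds_since_last):
    """Decide whether the client may upload logs right now."""
    return (rule.enabled
            and network in rule.allowed_networks
            and seconds_since_last >= rule.min_interval_s)


def build_packets(rule, log_lines):
    """Group log lines into packets of at most max_packet_bytes.

    A single line larger than the limit still gets a packet of its own.
    """
    packets, current, size = [], [], 0
    for line in log_lines:
        line_size = len(line.encode("utf-8"))
        if current and size + line_size > rule.max_packet_bytes:
            packets.append(current)
            current, size = [], 0
        current.append(line)
        size += line_size
    if current:
        packets.append(current)
    return packets


rule = ReportRule(enabled=True, allowed_networks=("wifi",),
                  min_interval_s=60, max_packet_bytes=10, retention_days=7)
print(should_report(rule, "wifi", 120))               # True
print(build_packets(rule, ["aaaa", "bbbb", "cccc"]))  # [['aaaa', 'bbbb'], ['cccc']]
```

Keeping the rules on the server means reporting behavior can be tuned without shipping a new client version, which is presumably why the SDK fetches them rather than hard-coding them.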

After all required logs are reported to the MDAP log gateway, MDAP writes them to server log files. Collection tools deployed on the server, such as Logtail and Flume, collect these logs and forward them to message middleware such as SLS and Kafka. The different computing platforms then consume the logs and process them according to their own rules and schedules.

Currently, MAS supports real-time and offline computing:

A. Real-time computing

Real-time computing is backed by the Kepler or JStorm computing engine. Kepler is Ant Financial's real-time computing platform. It comprises Kepler, the underlying computing engine, and Kepler-UI, the corresponding development platform. Kepler supports exactly-once semantics and provides transaction support, among other features.

Kepler offers two programming paradigms, SQL and higher-order operators:

  1. Kepler SQL: Compatible with most Streaming SQL semantics, and supports Apache Beam concepts such as Window and Trigger. Because SQL is easy to use and maintain, it is the approach most Kepler users choose.
  2. Higher-order operators: Kepler has a complete set of built-in real-time operators, including Filter, Transform (UDF, UDTF), Aggregate (UDAF), multi-stream Join, Union, and Split. The same concepts appear in the high-level APIs of Spark and Flink. By assembling these operators, you can easily implement a DataStream with custom logic. The higher-order API lets users describe the computation itself at a finer granularity. Because the higher-order operators encapsulate State and Retract capabilities, users are spared the tedious details of storage and error rollback.
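
To make the idea of assembling operators concrete, here is a minimal sketch in plain Python. It is not Kepler's API, which the text does not show: `filter_op`, `transform_op`, and `tumbling_count` are hypothetical stand-ins for the Filter, Transform, and Aggregate operators, and the event fields (`ts`, `type`, `version`) are assumed for illustration. State handling, Retract, and distributed execution are deliberately left out.

```python
from collections import defaultdict


def filter_op(predicate, stream):
    """Filter: keep only the events that satisfy the predicate."""
    return (event for event in stream if predicate(event))


def transform_op(fn, stream):
    """Transform: map each event to a new value (the role a UDF plays)."""
    return (fn(event) for event in stream)


def tumbling_count(stream, window_s, ts_fn, key_fn):
    """Aggregate: count events per (window start, key) in fixed-size windows."""
    counts = defaultdict(int)
    for event in stream:
        window_start = ts_fn(event) // window_s * window_s
        counts[(window_start, key_fn(event))] += 1
    return dict(counts)


def crash_counts_per_minute(events):
    """Assemble the operators: crashes per app version per 60-second window."""
    crashes = filter_op(lambda e: e["type"] == "crash", events)
    pairs = transform_op(lambda e: (e["ts"], e["version"]), crashes)
    return tumbling_count(pairs, 60, ts_fn=lambda p: p[0], key_fn=lambda p: p[1])


events = [
    {"ts": 3, "type": "crash", "version": "1.0"},
    {"ts": 15, "type": "login", "version": "1.0"},
    {"ts": 42, "type": "crash", "version": "1.1"},
    {"ts": 59, "type": "crash", "version": "1.0"},
    {"ts": 61, "type": "crash", "version": "1.0"},
]
print(crash_counts_per_minute(events))
# {(0, '1.0'): 2, (0, '1.1'): 1, (60, '1.0'): 1}
```

The SQL paradigm would express the same pipeline as a filtered, windowed GROUP BY. The operator form trades that brevity for control over each step.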

The Kepler computing engine layer focuses on scheduling and executing data streams, while the execution engine underneath the operator layer is pluggable. Kepler currently supports the default execution engine and Raya, Ant Financial's distributed computing framework.

Raya is Ant Financial's version of Ray. To learn more about Ray, see: https://ray.readthedocs.io/en/latest/index.html

In the I/O layer, the Kepler project supports all storage components, including but not limited to SLS, AntQ, DRC, HBase, MySQL, Kudu, Pangu, and Explorer, each of which has a corresponding built-in I/O component. Users can read and store data without writing a single line of I/O code.

In addition, to adapt to different deployment environments, MAS also supports the JStorm real-time computing platform, where data analysis is performed by submitting a computing topology to JStorm.

JStorm is a distributed real-time computing engine, comparable to Hadoop MapReduce in how it is used: users implement a task according to a specified programming model and submit it to JStorm, which then keeps the task running around the clock (7×24). The core concepts are as follows:

  1. A program submitted to JStorm to run is called a Topology.
  2. The smallest unit of message that a Topology handles is a Tuple, which is an array of arbitrary objects.
  3. A Topology consists of Spouts and Bolts. A Spout is a node that emits Tuples. A Bolt can subscribe to the Tuples emitted by any Spout or Bolt. Spouts and Bolts are collectively called Components.
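
The relationship between these concepts can be modeled in a few lines. The sketch below is a single-process toy in Python, not the JStorm API (JStorm topologies are written against its Java interfaces and run continuously on a cluster). The `Topology` class, its method names, and the sample log format are all assumptions made for illustration.

```python
class Topology:
    """Toy in-memory model of spouts, bolts, and tuple subscriptions."""

    def __init__(self):
        self.spouts = {}       # spout name -> iterable of tuples to emit
        self.bolts = {}        # bolt name -> function(tuple) -> list of tuples
        self.subscribers = {}  # component name -> names of downstream bolts

    def add_spout(self, name, source):
        self.spouts[name] = source

    def add_bolt(self, name, fn, subscribe_to):
        self.bolts[name] = fn
        for upstream in subscribe_to:
            self.subscribers.setdefault(upstream, []).append(name)

    def run(self):
        """Push every spout tuple through its subscribers.

        Returns a dict mapping each bolt name to the tuples it emitted.
        """
        emitted = {}

        def deliver(component, tup):
            for bolt_name in self.subscribers.get(component, []):
                for out in self.bolts[bolt_name](tup):
                    emitted.setdefault(bolt_name, []).append(out)
                    deliver(bolt_name, out)

        for name, source in self.spouts.items():
            for tup in source:
                deliver(name, tup)
        return emitted


def build_demo():
    """Spout of raw log lines -> bolt that splits them -> bolt that keeps crashes."""
    topo = Topology()
    topo.add_spout("logs", [("u1,login",), ("u2,crash",), ("u1,crash",)])
    topo.add_bolt("split", lambda t: [tuple(t[0].split(","))],
                  subscribe_to=["logs"])
    topo.add_bolt("crash_only", lambda t: [t] if t[1] == "crash" else [],
                  subscribe_to=["split"])
    return topo


print(build_demo().run()["crash_only"])  # [('u2', 'crash'), ('u1', 'crash')]
```

Note that `crash_only` subscribes to another bolt rather than to a spout, which is the chaining the third point describes.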

MAS tracking data flows into the computing topology through message middleware such as SLS or Kafka. Within the topology, the data is processed in several stages according to pre-configured log splitting and aggregation rules, and finally flows out to the persistence layer. At the persistence layer, MAS uses MESDB (based on Elasticsearch), OTS (HBase), and Explorer.

  • Thanks to Elasticsearch's full-text retrieval and its strong write and query performance, MESDB is used to store basic behavior data, such as active users, new users, cumulative users, and log playback data.

  • OTS(HBase) stores partial results and intermediate data for calculation. Explorer stores data for user-defined analysis.

  • Explorer is Ant Financial's distributed, low-latency, petabyte-scale columnar database for real-time analysis:

    First, a columnar design minimizes I/O contention, which is a major cause of latency in analytical processing. Columnar designs also achieve very high compression ratios, typically four to five times those of row-oriented databases. MPP data warehouses typically scale linearly, which means that doubling the size of a two-node MPP warehouse effectively doubles its performance.

    Second, the Explorer protocol layer exposes a MySQL-protocol interface. Through the MySQL JDBC driver, you can issue INSERT and SELECT requests to Explorer. The computing layer is based on Drill and supports multiple storage types, linear cluster expansion, and customizable execution plans. The storage layer is based on Druid, with storage formats and computing capabilities designed for OLAP. The overall architecture of Explorer is shown as follows:

For MAS custom analysis, the user-defined aggregation rules and attribute dimensions cannot be known in advance, so Explorer was chosen for its powerful pre-aggregation capability. In the Kepler/JStorm real-time computing topology, logs are split along the user-defined attribute dimensions and inserted into Explorer in real time, and Explorer completes the aggregation.
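
The idea of aggregating along dimensions that are only known at run time can be sketched as below. This is an illustration of pre-aggregation (rollup) in general, not a description of how Explorer implements it, which the text does not detail. The function name `pre_aggregate` and the event fields are hypothetical.

```python
from collections import defaultdict


def pre_aggregate(events, dimensions, metric):
    """Roll raw events up into one row per combination of dimension values.

    `dimensions` is supplied at run time, standing in for the user-defined
    attribute dimensions; each row keeps an event count and a metric sum.
    """
    rollup = defaultdict(lambda: {"count": 0, "sum": 0})
    for event in events:
        key = tuple(event.get(d) for d in dimensions)
        row = rollup[key]
        row["count"] += 1
        row["sum"] += event.get(metric, 0)
    return dict(rollup)


events = [
    {"city": "Hangzhou", "level": "vip", "amount": 10},
    {"city": "Hangzhou", "level": "vip", "amount": 5},
    {"city": "Shanghai", "level": "free", "amount": 1},
]
print(pre_aggregate(events, ["city", "level"], "amount"))
# {('Hangzhou', 'vip'): {'count': 2, 'sum': 15}, ('Shanghai', 'free'): {'count': 1, 'sum': 1}}
```

Queries then read the rolled-up rows instead of the raw events, so their cost depends on the number of distinct dimension combinations rather than on the raw log volume.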

In addition, the core criterion for judging how strong a real-time query capability is comes down to one thing: response time.

Leaving aside super apps such as the Alipay client, even customers of the mPaaS public cloud generate hundreds of millions of log entries every day, with data storage at the terabyte level. Returning query results within seconds at this scale is very difficult. Explorer uses the HyperLogLog algorithm to solve this problem and fully meet the demands of MAS. (To learn more about this class of algorithm, search for "cardinality estimation".)
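
For readers who want a feel for cardinality estimation, the following is a minimal HyperLogLog sketch in Python. It shows the textbook algorithm (register array, harmonic-mean estimate, small-range correction), not Explorer's implementation, which the text does not describe. The class name, the choice of SHA-1 as the hash, and the default precision `p=12` are assumptions of this example.

```python
import hashlib
import math


class HyperLogLog:
    """Textbook HyperLogLog with 2**p registers (p >= 7 assumed)."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        x = int.from_bytes(digest[:8], "big")         # 64-bit hash value
        idx = x >> (64 - self.p)                      # first p bits pick a register
        rest = x & ((1 << (64 - self.p)) - 1)         # remaining 64 - p bits
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def merge(self, other):
        """Union of two sketches: register-wise maximum."""
        self.registers = [max(a, b)
                          for a, b in zip(self.registers, other.registers)]

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)         # bias constant, m >= 128
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw


hll = HyperLogLog()
for i in range(10000):
    hll.add(f"user-{i}")
print(round(hll.estimate()))  # close to 10000; the exact value depends on the hash
```

With `p=12` the sketch holds only 4096 small registers however many distinct users it has seen, and the expected relative error is roughly 1.04 / sqrt(4096), about 1.6%. Because two sketches merge by taking the register-wise maximum, per-hour or per-shard sketches can be combined at query time without rescanning the raw logs.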

B. Offline computing

To support different deployment scenarios, MAS can use Alibaba Cloud DataWorks, Hadoop/Spark, or the Ant Financial Cloud Data Intelligent R&D platform as its offline computing platform, as required.

  1. DataWorks: An offline data platform provided by Alibaba Cloud that is widely used in China.
  2. Hadoop/Spark: The de facto standard open-source data platform, and the preferred choice for self-built big data platforms.
  3. Data Intelligent R&D platform (Caiyunjian): A data R&D platform hosted on Ant Financial Cloud that has been put into practice at many large institutions.

The above offline computing platforms all provide core capabilities required by MAS: data integration, data development, data management, and data governance. They can transmit, transform, and integrate data, introduce data from different data sources, transform and develop data, and finally transfer data to other data systems.

In addition, the core concept of offline computing is the ETL task. ETL (extract, transform, load) covers the cleaning, processing, and loading of big data: it formats, validates, and supplements log data, runs a series of statistical analyses, and loads the results into the online system. Whichever of the three platforms above is used, MAS offline analysis is always built and managed around ETL. Over a long period, the common, general-purpose calculation methods and logic have been extracted into the current MAS task set. MAS ships with hundreds of preset tasks, divided into three layers: ODS (data access layer), CDM (common data layer), and ADS (application data layer). The data model uses a star schema, which is quick to understand and quick to develop against.
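
The layering can be illustrated with a small sketch. It follows the three layers named above but is otherwise hypothetical: the comma-separated log format, the function names, and the choice of daily active users as the final metric are assumptions, and real MAS tasks run as scheduled jobs on the offline platforms rather than as Python functions.

```python
def to_ods(raw_lines):
    """ODS (data access layer): parse raw lines as-is, dropping malformed ones."""
    rows = []
    for line in raw_lines:
        parts = line.strip().split(",")
        if len(parts) == 3:
            rows.append({"day": parts[0], "user_id": parts[1], "event": parts[2]})
    return rows


def to_cdm(ods_rows):
    """CDM (common data layer): deduplicated (day, user) activity facts."""
    return sorted({(row["day"], row["user_id"]) for row in ods_rows})


def to_ads(cdm_rows):
    """ADS (application data layer): report-ready metric, daily active users."""
    dau = {}
    for day, _user in cdm_rows:
        dau[day] = dau.get(day, 0) + 1
    return dau


raw = [
    "2020-03-01,u1,login",
    "2020-03-01,u1,page_view",
    "2020-03-01,u2,login",
    "bad line",
    "2020-03-02,u1,login",
]
print(to_ads(to_cdm(to_ods(raw))))  # {'2020-03-01': 2, '2020-03-02': 1}
```

Keeping a shared middle layer means several application-layer reports (retention, funnels, device analysis) can reuse the same cleaned facts instead of each re-parsing the raw logs.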

  • Task relationship:

  • Task tree Demo:

At present, offline computing mainly supports several modules of MAS, such as device analysis, retention analysis, page analysis, funnel analysis and mPaaS component analysis.

3. Core competitiveness and advantages of MAS

  1. Diversity: In addition to user-based analyses such as behavior analysis, active-user distribution, engagement, funnel, retention, device, page, event, and custom analysis, MAS also provides startup, crash, lag, freeze, channel, and spatial analysis, among other dimensions. It can meet most of the analysis requirements of conventional apps.
  2. Compatibility: MAS supports both mainstream open-source solutions and solutions based on Ant Financial Cloud and Alibaba Cloud, so that a deployment can be matched to the resources and environment at hand.
  3. Scalability: The MAS architecture has been tested under massive data volumes and scales out well horizontally. Even if only the minimum unit is deployed at the initial stage, capacity can be expanded quickly as the user base grows, raising service capacity to meet the enterprise's future plans.
  4. Adaptability: MAS customers include financial enterprises such as banks and securities firms, as well as non-financial industries such as travel and subway operators. MAS also lets enterprises customize business reports based on their own characteristics.

Through this section, I hope to have introduced the basics of big data technology and the technologies behind mPaaS MAS. I also hope to have the opportunity to exchange ideas with you on topics such as full-text indexing, analytical data warehouse systems, real-time stream computing, and offline development.

Welcome to join mPaaS Technical Exchange Group:

  • DingTalk group: Search for group number 23124039.

Looking forward to your joining us.
