Summary: There are many terms related to data. Although their definitions differ, they are essentially complementary and are often used in combination to get results. Examples include data analysis, data mining, and data insight. This article describes the data insight work we did during a business link upgrade.

Author | Jin Duo  Source | Ali Technology official account

I. Overview

There are many terms related to data. Although their definitions differ, they are essentially complementary and are often used in combination to get results.

Examples include data analysis, data mining, and data insight.

The definitions below are taken from the wiki:

  • Data analysis: a common family of statistical methods whose main characteristics are that they are multidimensional and descriptive. Some geometric methods help reveal the relationships between different data sets and plot statistical information so that the main information in those data sets can be explained more succinctly.
  • Data mining: an interdisciplinary branch of computer science. It is a computational process that uses methods at the intersection of artificial intelligence, machine learning, statistics and databases to discover patterns in relatively large data sets.
  • Data insight: no wiki entry existed before this project. It builds on general knowledge, data analysis and data mining. Combined with business scenarios, it defines a unified caliber around business links, so that problems can be analyzed better and strategies improved further.

In essence, all three methods process data to obtain information, but their goals differ. The following is my personal understanding.

  • Data analysis is more focused. Analysts combine their understanding of user flows with their understanding of the business and the data to produce analysis results. The emphasis is on human analysis.
  • Data mining pursues the same goal as data analysis, except that the analyst's role passes from human to machine.
  • Data insight builds on data analysis and data mining, introduces the concept of the business scenario, and sorts out the influencing factors and links around the results of that scenario. The goal is to attribute and break down abstract problems, and to form improvement directions, better and faster. This is also where those of us in business development have the greatest advantage.

II. Core Elements

We found that data insight can be broken down into three core elements: data, business scenarios, and caliber.

Each is explained briefly below.

1 Data

Clean, valid data is what we need; otherwise it will mislead the conclusions that follow. Because the login link is the first step in keeping services secure, it often receives artificially generated traffic. Keeping traffic produced by grey and black market activity from distorting later judgments is therefore a top priority.

2 Business scenarios

The business scenario is the core difference between data insight and other data analysis methods, and it is probably where business developers add the most value compared with BI analysis. Any analysis strategy requires an understanding of the business scenario, not just of the data.

The core task is to define the “behavior of a complete business link” and then analyze, around that definition, which strategies are useful for the link.

3 Caliber

What is a caliber? As I understand it, a caliber is an agreed metric definition: it expresses how we understand a business scenario, based on reasonable data dimensions and a clear objective. It therefore combines an understanding of the business scenario with an understanding of the business objective. The available data dimensions can be many and varied.

Take login as an example. One user logging in on one device is the normal case, but in mobile shopping several accounts may log in on the same device, which is also a normal data pattern. When defining the login success rate, we therefore have to decide whether to use the device dimension (a device counts as successful as long as at least one user logs in on it) or the user dimension (only per-user data is considered, and no device-level indicator is defined).
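To make the difference between the two dimensions concrete, here is a minimal sketch. The record layout and sample values are assumptions made up for illustration, not the production schema; the point is only that the same attempts give different success rates depending on the dimension chosen.

```python
from collections import defaultdict

# Hypothetical login attempts: (device_id, user_id, succeeded).
attempts = [
    ("dev-1", "user-a", False),
    ("dev-1", "user-b", True),
    ("dev-2", "user-c", False),
    ("dev-3", "user-d", True),
]


def success_rate_by(records, key_index):
    """Share of keys (devices or users) with at least one successful login."""
    succeeded = defaultdict(bool)
    for record in records:
        key = record[key_index]
        succeeded[key] = succeeded[key] or record[2]
    return sum(succeeded.values()) / len(succeeded) if succeeded else 0.0


device_rate = success_rate_by(attempts, key_index=0)  # device dimension
user_rate = success_rate_by(attempts, key_index=1)    # user dimension
print(f"device dimension: {device_rate:.2f}")
print(f"user dimension:   {user_rate:.2f}")
```

In this sample, device dev-1 counts as successful because user-b got in, even though user-a did not, so the device-dimension rate is higher than the user-dimension rate.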

III. Data Construction

1 Data cleaning is a means of ensuring data validity

For example, some data sources have device information but no user information, while others have user information but incomplete device information. Even the format of the same time field is not uniform across sources.

In such cases the data must be processed first: eliminate dirty records, fill in missing values, reduce each record to clean single-dimension information, and make sure that every data source ends up with the same dimensions and formats, such as a standard device ID or user ID and a standard time format.
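The cleaning step described above could be sketched roughly as follows. The field names, the accepted time formats, and the rule of lowercasing device IDs are all assumptions chosen for illustration; real sources will need their own rules.

```python
from datetime import datetime

# Assumed input formats for the time field; real sources may differ.
TIME_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y/%m/%d %H:%M:%S", "%Y%m%d%H%M%S")


def parse_time(raw):
    """Try each known format; return None if the value cannot be parsed."""
    for fmt in TIME_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt)
        except ValueError:
            continue
    return None


def clean_record(record):
    """Return a normalized record, or None if the record is dirty."""
    device_id = (record.get("device_id") or "").strip().lower()
    user_id = (record.get("user_id") or "").strip()
    event_time = parse_time(record.get("time") or "")
    # Keep a record only if it has a valid time and at least one identifier.
    if event_time is None or not (device_id or user_id):
        return None
    return {
        "device_id": device_id or None,
        "user_id": user_id or None,
        "time": event_time.strftime("%Y-%m-%d %H:%M:%S"),
    }


raw_records = [
    {"device_id": " DEV-1 ", "time": "2021/12/01 11:20:54"},  # no user info
    {"user_id": "user-a", "time": "20211201112120"},          # no device info
    {"device_id": "dev-2", "time": "not a time"},             # dirty record
]
cleaned = [r for r in map(clean_record, raw_records) if r is not None]
for row in cleaned:
    print(row)
```

Records that cannot be repaired are dropped rather than guessed at, so the downstream analysis only sees one device ID format, one user ID format, and one time format.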

2 Data construction is both supplement and evolution

Data quality depends not only on how clean the data is but also on how it is produced at the source. If data is missing or inconsistent and cleaning cannot fix it, development work is needed, such as adding a field to the database or adding tracking logic to the event-tracking framework.

Data construction is a long-term process. It should not only fill the gaps in what is being analyzed now but also lead to a standard set of deliverables. Data quality should also be considered when handling day-to-day requirements and projects. After all, shipping a requirement is not the result; achieving the business goal is.

IV. Business Scenarios

1 Defining the business scenario

The business scenario is the most business-specific part of the whole insight process. How it is defined directly affects how effectively problems get solved.

Different business scenarios have their own particularities and need to be analyzed according to the characteristics of the business.

In my experience so far, a few core questions help in defining a business scenario.

  • What is the end product of the business scenario?

Take login as an example. The ultimate goal of login must be to deliver a login state; otherwise nobody would come just to “play” with login. The link built around the login state is therefore the business link we want.

The same goes for other businesses: orders, for example, revolve around inventory.

  • How deep do the analysis dimensions need to go in the business scenario?

This one is easier to understand. Continuing the example above: when looking at the login business link, do you need to split the link by login method, or is it enough to look at the login link as a whole?

Generally, in the early stage of insight, the finer the dimensions the better. As the analysis proceeds, the dimensions gradually become coarser: with more insight into the business, it turns out that some dimensions, although deeper and more complete, yield nothing that can be analyzed. This is also called “over-analysis”.

  • What counts as “a complete business action” in the business scenario?

The biggest advantage of data insight over other analysis methods is that it analyzes the business in its business context, so what directly affects the business result must be a complete business link.

This point is hard to explain without an example, so consider the login process.

Have you ever considered what tracking-point data looks like, and how it differs from a complete business action?

A normal tracking point looks as follows.

Table 1

These two discrete tracking points together make up one complete login behavior, but they are recorded at the granularity of individual RPC requests.

2 Data structure evolution based on business scenarios

Tracking-point data describes a staged outcome. In the example above, the user initiated an account-password login request at 2021-12-1 11:20:54. Because the environment was judged unsafe, a security challenge required the user to verify their identity (for example by SMS verification). The user completed the identity verification, then initiated a no-login (password-free) request, and the login state was delivered at 2021-12-1 11:21:20.

Together, these steps make up one login action, and business insight revolves around exactly this unit.

If we analyze login as a whole, or login broken down by login mode, the two separate data points do not suit us. We only need the login mode, the final result, the time, and the device ID.
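Collapsing request-level points into one record per login action could look roughly like this. It is a sketch under assumptions: the `trace_id` field that ties the points of one action together, the other field names, and the choice to keep the time of the final point are all hypothetical.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical request-level tracking points; field names are illustrative.
points = [
    {"trace_id": "t1", "device_id": "dev-1", "time": "2021-12-01 11:20:54",
     "login_mode": "password", "result": "challenge"},
    {"trace_id": "t1", "device_id": "dev-1", "time": "2021-12-01 11:21:20",
     "login_mode": "no_login", "result": "success"},
    {"trace_id": "t2", "device_id": "dev-2", "time": "2021-12-01 11:30:00",
     "login_mode": "password", "result": "challenge"},
]


def to_login_actions(raw_points):
    """Collapse the points of each trace into one record per login action."""
    ordered = sorted(raw_points, key=itemgetter("trace_id", "time"))
    actions = []
    for _, group in groupby(ordered, key=itemgetter("trace_id")):
        steps = list(group)
        first, last = steps[0], steps[-1]
        actions.append({
            "device_id": first["device_id"],
            "login_mode": first["login_mode"],  # mode the user started with
            "result": "success" if last["result"] == "success" else "fail",
            "time": last["time"],               # time of the final point
        })
    return actions


actions = to_login_actions(points)
for action in actions:
    print(action)
```

The first trace becomes a single successful password login, and the second, where the challenge was never completed, becomes a single failed one.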

Table 2.

Or, when identity verification does not pass:

Table 3

However, the behavior described by this data is incomplete. For example, Table 2 cannot express the fact that the login went through identity verification.

At this point the data structure needs to evolve to its next stage.

We introduced statustag to describe the path taken.

Statustag format: 12 | 0 0 ^ 0 ^ ^ 1 ^ abcde.

The parts before and after | use two different formats. The first is a bitmap, version 0. The second is a string, representing the version 1 format. The string lists the nodes passed through that have not been added to the bitmap.

This tag describes a path whose bitmap is 0b1100 (decimal 12), that is, one that passed through nodes 4 and 8 of version 1 and through the ABCDE nodes of version 2.

With this tag, much more information can be described.
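The bitmap idea can be illustrated with a small decoder. This sketch assumes a simplified, hypothetical two-part tag of the form `<bitmap>|<extra nodes>`; the real statustag carries more fields than that, and they are not modeled here.

```python
def decode_bitmap(bitmap):
    """Return the bit values (1, 2, 4, 8, ...) that are set in the bitmap."""
    nodes = []
    bit = 1
    while bit <= bitmap:
        if bitmap & bit:
            nodes.append(bit)
        bit <<= 1
    return nodes


def parse_simple_tag(tag):
    """Parse a simplified, hypothetical '<bitmap>|<extra nodes>' tag."""
    bitmap_part, _, extra_part = tag.partition("|")
    return {
        "bitmap_nodes": decode_bitmap(int(bitmap_part.strip())),
        "extra_nodes": extra_part.strip(),  # nodes not yet in the bitmap
    }


parsed = parse_simple_tag("12|abcde")
print(parsed)
```

Decimal 12 is binary 1100, so the decoder reports nodes 4 and 8, and the string part is carried through unchanged for nodes that have no bit assigned yet.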

With this tag, you can describe more information.

3 Visual representation of data in business scenarios

Raw data is hard to read, and it is not a sensible basis for managing long-term operations. This is where visualization comes in.

A visualization carries what we want to express, for example as a funnel or a curve.

The most common visual representations today are funnels and reports.

  • For example, a funnel:

Figure 1

Building a funnel is cumbersome and requires defining it manually, point by point. But a funnel is very helpful for an initial understanding of a link and for analyzing problems.

What we need at this point is the ability to generate visual funnels quickly from structured data sources.

We can produce that structured data quickly by agreeing on conventions that apply at the moment the data is generated.

  • Based on a state machine plus conventions
  1. Introduce a log that records state machine changes.
  2. Emit the log in the agreed format and combine it with a structured drawing capability, so that the funnel is drawn dynamically.
  • The core elements of the state machine
  1. statustag records path information.
  2. status and old_status record the upstream and downstream information of a node.
  3. depth records the node depth.
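As a rough sketch of how such a state-change log could drive a funnel, the snippet below counts how many distinct login actions reached each status, ordered by depth. The `action_id` field, the status names, and the sample rows are hypothetical, and statustag is left out to keep the example short.

```python
from collections import defaultdict

# Hypothetical state-change log: one row per transition of one login action.
log = [
    {"action_id": "a1", "old_status": "start",
     "status": "password_submitted", "depth": 1},
    {"action_id": "a1", "old_status": "password_submitted",
     "status": "identity_check", "depth": 2},
    {"action_id": "a1", "old_status": "identity_check",
     "status": "logged_in", "depth": 3},
    {"action_id": "a2", "old_status": "start",
     "status": "password_submitted", "depth": 1},
    {"action_id": "a2", "old_status": "password_submitted",
     "status": "identity_check", "depth": 2},
    {"action_id": "a3", "old_status": "start",
     "status": "password_submitted", "depth": 1},
]


def build_funnel(rows):
    """Count distinct actions per (depth, status), from shallow to deep."""
    reached = defaultdict(set)
    for row in rows:
        reached[(row["depth"], row["status"])].add(row["action_id"])
    return [(depth, status, len(reached[(depth, status)]))
            for depth, status in sorted(reached)]


funnel = build_funnel(log)
for depth, status, count in funnel:
    print(f"{'  ' * (depth - 1)}{status}: {count}")
```

Because every row follows the same convention, the funnel stages do not have to be defined by hand; a new node shows up in the output as soon as it starts writing log rows.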

From the login data we can produce the following sample of login behavior (the data is not real user data).

V. Caliber

A caliber is an output built on data and business scenarios. It is also the most important point, because it represents our understanding of business results based on data and business scenarios. For example, the login caliber was defined at the beginning of the fiscal year, and under it the login success rate rose from 9x% to 9y%.

1 Do not change the caliber often

Once a caliber is defined, do not change it often. Defining the caliber is usually the hardest and most time-consuming step: by the time it is defined, we have generally already broken down the targets, identified the opportunities, and done the final calculations.

2 A caliber is not necessarily a single caliber

Besides the property above, calibers can be single or multiple, and the two generally coexist. In the login process, for example, on top of one overall caliber we split even a single login behavior into several business stages.

Take login as an example. We define the user's willingness over the stage from entering the page to initiating a login, and the success rate over the stage from initiating the login to the login result. The two stages address different problems. If they are merged, the problem becomes more complicated and harder to analyze.
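The two-stage split can be sketched as follows. The per-visit flags and the sample values are assumptions for illustration only.

```python
# Hypothetical per-visit records; field names are illustrative.
visits = [
    {"entered_page": True, "initiated_login": True, "login_succeeded": True},
    {"entered_page": True, "initiated_login": True, "login_succeeded": False},
    {"entered_page": True, "initiated_login": False, "login_succeeded": False},
    {"entered_page": True, "initiated_login": True, "login_succeeded": True},
]


def ratio(numerator, denominator):
    """Safe division that returns 0.0 for an empty denominator."""
    return numerator / denominator if denominator else 0.0


entered = sum(v["entered_page"] for v in visits)
initiated = sum(v["initiated_login"] for v in visits)
succeeded = sum(v["login_succeeded"] for v in visits)

willingness_rate = ratio(initiated, entered)  # page entry -> login initiated
success_rate = ratio(succeeded, initiated)    # login initiated -> login result
combined_rate = ratio(succeeded, entered)     # both stages merged into one

print(f"willingness: {willingness_rate:.2f}")
print(f"success:     {success_rate:.2f}")
print(f"combined:    {combined_rate:.2f}")
```

The combined figure alone cannot say whether a drop comes from users not attempting to log in or from attempts failing, which is why the stages are measured separately.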

Multiple calibers have a further advantage: the work can be staged, so that at each stage the link upgrade deals with only one part of the link.

3 Defining the dimension of a caliber

The dimension of a caliber has to be defined in light of the characteristics of the domain and the business. Even the same business link may be defined differently in different domains and groups.

This is hard to explain in the abstract, so here is an example.

In our definition, the C-end (consumer side) caliber uses the device dimension. C-end users naturally engage in bonus hunting (“wool pulling”), so we consider a successful login on a device to be what benefits the C end.

For the same login link, however, the B-end (business side) caliber uses the user dimension, because each individual merchant at the B end is highly valuable and there is little behavior like the bonus hunting seen at the C end. The user dimension lets us see user behavior more clearly and so optimize the experience.

VI. Summary

We are still learning and practicing data insight. We have achieved some results along the way, but there is still a lot of room to grow. This path plays to the strengths of business development, and a business platform has a unique advantage in the richness of its business scenarios, which gives us more freedom in what we do with data insight. You are welcome to discuss and explore it with us.

Data insight is a powerful tool for empowering the business. Gaining insight into the business from the data it produces is also a very broad topic for us.

This article is original content from Aliyun and may not be reproduced without permission.