Preface

Data analysis is the process of collecting and organizing data around a clearly defined purpose, working out the analysis logic, and applying statistical and data mining techniques to extract useful information and present conclusions. It is a core skill in data science.

This article starts from the common logical frameworks and technical methods of data analysis, then works through a hands-on Python project, so that readers can systematically learn the standard routines of data analysis and get up to speed quickly.

1. The logic of data analysis: building a systematic set of analysis dimensions and indicators

1.1 PEST Analysis

PEST analysis examines the macro environment, that is, the macro-level forces that affect all industries and enterprises. P stands for Politics, E for Economy, S for Society and T for Technology. Strategy consultants typically use it to help enterprises review their external macro environment. Take Geely's acquisition of Volvo as an example.

1.2 5W2H analysis

The 5W2H method, also known as the seven-question method, asks Why, What, Where, When, Who, How and How much. It is mainly used for user behavior analysis, business topic analysis, marketing campaigns and similar work, and is a simple, practical tool.

1.3 Logic tree analysis

A logic tree is one of the most commonly used tools for analyzing a problem. It lists all the sub-problems of a problem hierarchically, starting at the highest level and expanding downward step by step. Its main advantages are that it keeps the problem-solving process complete, breaks work into manageable tasks, makes the priority of each part explicit, and assigns clear responsibility to individuals.

1.4 4P marketing theory

The 4Ps are Product, Price, Place and Promotion. This market-oriented marketing mix theory is the one most widely applied by enterprises. By combining and coordinating the four elements, an enterprise aims to increase its market share and ultimately its profit.

The 4P theory is suited to analyzing an enterprise's operating status, which can be regarded as its internal environment, whereas PEST analysis covers its external environment.

1.5 SCQA analysis

SCQA analysis is a “structured expression” tool: S (Situation), C (Complication), Q (Question) and A (Answer).

The structure first describes the situation the parties are in, then brings out the conflict and the core question, and finally offers a solution. Take an SCQA analysis of campus recruitment as an example.

1.6 SMART analysis

SMART is a goal management method: S (Specific), M (Measurable), A (Attainable), R (Relevant) and T (Time-based).

1.7 SWOT analysis

SWOT analysis, also called situation analysis, covers S (Strengths), W (Weaknesses), O (Opportunities) and T (Threats). It is often used to identify an enterprise's internal strengths and weaknesses and its external opportunities and threats, so that company strategy can be aligned with the internal and external environment. Take a SWOT analysis of HUAWEI as an example.

2. Technical methods of data analysis

The technical methods of data analysis are the specific methods used to extract key indicator information, such as comparative analysis, cross analysis and regression-based prediction.

2.1 Comparative analysis

Comparative analysis compares two or more sets of data and analyzes the differences between them to reveal how things develop and change and what patterns they follow.

  • Static comparison: comparing the same indicator across different objects, such as departments, cities or stores, over the same period. It is also called horizontal comparison.

  • Dynamic comparison: comparing the same indicator across different periods. It is also known as longitudinal comparison.

Example: sales performance of each car company
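The two kinds of comparison can be sketched with pandas. The company names, months and unit counts below are made up for illustration and do not come from the article's data.

```python
import pandas as pd

# Hypothetical monthly sales figures (made-up numbers, not from the article).
sales = pd.DataFrame(
    {
        "company": ["A", "B", "C", "A", "B", "C"],
        "month": ["2020-01", "2020-01", "2020-01", "2020-02", "2020-02", "2020-02"],
        "units": [120, 90, 60, 150, 81, 66],
    }
)

# Static (horizontal) comparison: same month, different companies.
static = sales[sales["month"] == "2020-02"].set_index("company")["units"]

# Dynamic (longitudinal) comparison: same company, different months.
by_month = sales.pivot(index="company", columns="month", values="units")
growth = (by_month["2020-02"] - by_month["2020-01"]) / by_month["2020-01"]

print(static.sort_values(ascending=False))
print(growth.round(2))
```

Here `static` ranks the companies within one month, while `growth` shows how each company changed from one month to the next.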

2.2 Grouping analysis method

  • After the data has been processed, it is divided into groups, and each group is then analyzed.

  • Grouping exists to make comparison easier: it separates objects with different properties, puts objects with the same properties together, and keeps attributes consistent within each group and different between groups, so that other analysis methods can then expose the underlying quantitative relationships.

Example: sales of new books in various distribution channels
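A minimal sketch of grouping in pandas, assuming a small made-up table of book orders. The channel names, the bin edges and the size labels are illustrative choices, not values from the article.

```python
import pandas as pd

# Hypothetical new-book orders (illustrative values only).
orders = pd.DataFrame(
    {
        "channel": ["online", "online", "bookstore", "bookstore", "wholesale", "online"],
        "copies": [30, 45, 20, 10, 200, 25],
    }
)

# Group by a categorical attribute: total and mean copies per channel.
per_channel = orders.groupby("channel")["copies"].agg(["sum", "mean"])

# Group a numeric attribute into intervals: (0, 25], (25, 50], (50, inf).
bins = [0, 25, 50, float("inf")]
labels = ["small", "medium", "large"]
orders["size"] = pd.cut(orders["copies"], bins=bins, labels=labels)
per_size = orders["size"].value_counts(sort=False)

print(per_channel)
print(per_size)
```

Grouping by `channel` uses an attribute that already exists, while `pd.cut` creates groups from a numeric column. Both keep members of a group alike and the groups distinct from each other.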

2.3 Structural analysis

  • Structural analysis, also known as proportion analysis, builds on grouping analysis: it calculates each component's share of the whole and uses those shares to analyze the internal characteristics of the data.

Example: Market share is a typical structural analysis.
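As a rough illustration, market share is each group's total divided by the overall total. The brand names and sales figures below are hypothetical.

```python
import pandas as pd

# Hypothetical sales by brand (made-up values).
sales = pd.Series({"Brand A": 500, "Brand B": 300, "Brand C": 200}, name="sales")

# Share of each component in the whole.
share = sales / sales.sum()

print((share * 100).round(1).astype(str) + "%")
```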

2.4 Average Analysis (Standard parameter analysis)

  • An average reflects the general level of some quantitative characteristic of a population at a given time and place.

  • Averages can be used to compare the same phenomenon across regions, departments or units, and also across different points in time.

Example: indices are often used in seasonal analysis and price analysis.
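The sketch below shows averages used for comparison over time, plus one simple form of a seasonal index: the average for a quarter divided by the overall average. The quarterly figures are invented, and this ratio is only one of several ways a seasonal index can be defined.

```python
import pandas as pd

# Hypothetical quarterly sales over two years (made-up values).
q = pd.DataFrame(
    {
        "year": [2019] * 4 + [2020] * 4,
        "quarter": ["Q1", "Q2", "Q3", "Q4"] * 2,
        "sales": [80, 100, 120, 100, 100, 120, 140, 140],
    }
)

# Average level per year: the same phenomenon compared at different times.
year_mean = q.groupby("year")["sales"].mean()

# Simple seasonal index: quarter average relative to the overall average.
quarter_mean = q.groupby("quarter")["sales"].mean()
overall_mean = q["sales"].mean()
seasonal_index = quarter_mean / overall_mean

print(year_mean)
print(seasonal_index.round(2))
```

An index above 1 marks a quarter that tends to run above the overall level, and an index below 1 marks a weaker quarter.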

2.5 Cross analysis method

  • Cross analysis is usually used to examine the relationship between two variables. The values of two related variables are arranged in a table so that each cell is the intersection of one value of each variable, forming a cross table (contingency table).

Example: the data table behind a typical bubble chart.
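A cross table can be built with `pd.crosstab`. The two variables here, `region` and `paid`, are hypothetical and serve only to show the shape of the result.

```python
import pandas as pd

# Hypothetical customer records (made-up values).
customers = pd.DataFrame(
    {
        "region": ["East", "East", "West", "West", "East", "West"],
        "paid": ["yes", "no", "yes", "yes", "yes", "no"],
    }
)

# Counts at each intersection of the two variables, with row/column totals.
table = pd.crosstab(customers["region"], customers["paid"], margins=True)

# Row-wise proportions: the share of paid / unpaid within each region.
rate = pd.crosstab(customers["region"], customers["paid"], normalize="index")

print(table)
print(rate.round(2))
```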

2.6 Funnel chart analysis

  • A funnel chart shows the conversion rate of each step in a process on a website. Combined with comparative analysis, it can compare the same step before and after an optimization to judge how well that step converts.

Example: commodity turnover rate performance chart
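The numbers behind a funnel chart are step-to-step and overall conversion rates. The sketch below computes them for a hypothetical four-step funnel; the step names and counts are assumptions for illustration.

```python
import pandas as pd

# Hypothetical funnel: number of users reaching each step (made-up values).
steps = pd.Series({"visit": 1000, "add_to_cart": 400, "place_order": 200, "pay": 150})

funnel = pd.DataFrame({"count": steps})
# Conversion from the previous step (NaN for the first step).
funnel["step_rate"] = steps / steps.shift(1)
# Conversion relative to the top of the funnel.
funnel["overall_rate"] = steps / steps.iloc[0]

print(funnel)
```

Running the same calculation on data from before and after a change to one step gives the before/after comparison described above.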

3. Chart presentation of data analysis

Charts help us understand data better and more intuitively.

Choosing a chart is not just about its style; what matters is the data and what the chart is meant to do. Charts can be chosen by the function they serve: composition, comparison, trend, distribution or association.

4. Hands-on project (Python)

4.1 Data Content

The data is a set of real Tmall transaction orders from Kesci, consisting mainly of behavioral data.

A. Order No.: the order number

B. Total amount: the total amount of the order

C. Actual amount paid by buyer: total amount minus refund amount if the order was paid; 0 if the order was not paid

D. Delivery address: one of the provinces of China

E. Order creation time: the time the order was placed

F. Order payment time: the time of payment (NaN if not paid)

G. Refund amount: the refund amount requested after payment; 0 if the order was not paid

4.2 Tmall order analysis process

4.2.1 Background and analysis purpose

Using one month of Tmall order data, we look at the month's order volume and sales, and analyze how factors such as order date and delivery address affect order volume and order conversion. The goal is to raise order volume and conversion rate, and in turn the amount users actually pay.

4.2.2 Analyzing logic

Following the order process, this article uses the logic tree method to analyze the factors that influence orders along the following dimensions:

4.2.3 Data reading and processing

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import os
import warnings

warnings.filterwarnings('ignore')

# read data
df = pd.read_csv('tmall_order_report.csv')
df.head()
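After loading, typical processing steps are removing duplicate rows, trimming text fields, parsing the time columns and counting missing values. The sketch below runs on a tiny stand-in table rather than the real file. The English column names are assumed equivalents of the fields in 4.1, and the rows are made up, so the real CSV may need different names and steps.

```python
import numpy as np
import pandas as pd

# Tiny stand-in for tmall_order_report.csv. Column names are ASSUMED English
# equivalents of the fields listed in 4.1; all rows are made up.
df = pd.DataFrame(
    {
        "order_no": [1, 2, 3, 3],
        "total_amount": [100.0, 50.0, 80.0, 80.0],
        "paid_amount": [100.0, 0.0, 60.0, 60.0],
        "province": ["Shanghai ", "Beijing", "Guangdong", "Guangdong"],
        "created_at": ["2020-02-01 10:00:00", "2020-02-01 11:30:00",
                       "2020-02-02 09:15:00", "2020-02-02 09:15:00"],
        "paid_at": ["2020-02-01 10:05:00", np.nan,
                    "2020-02-02 09:20:00", "2020-02-02 09:20:00"],
        "refund_amount": [0.0, 0.0, 20.0, 20.0],
    }
)

# Drop exact duplicate rows and tidy up text fields.
df = df.drop_duplicates().reset_index(drop=True)
df["province"] = df["province"].str.strip()

# Parse timestamps; unpaid orders have no payment time and become NaT.
df["created_at"] = pd.to_datetime(df["created_at"])
df["paid_at"] = pd.to_datetime(df["paid_at"])

# Count missing values per column.
missing = df.isnull().sum()
print(missing)
```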


4.2.4 Analysis of overall operation indicators

Analyze the trend of the number of orders in February

Summary 1: In the first half of February most companies had not yet resumed work and express delivery was suspended, so goods could not be shipped and orders were few. Orders began to rise in the second half of February as more companies resumed work.
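A daily order count of this kind can be derived by resampling on the order creation time. This sketch uses a few made-up timestamps and assumed column names (`order_no`, `created_at`); it does not reproduce the article's figures.

```python
import pandas as pd

# Hypothetical orders; column names and timestamps are assumptions.
orders = pd.DataFrame(
    {
        "order_no": [1, 2, 3, 4, 5, 6],
        "created_at": pd.to_datetime(
            [
                "2020-02-03 10:00", "2020-02-20 09:00", "2020-02-20 15:30",
                "2020-02-20 21:10", "2020-02-21 11:00", "2020-02-21 18:45",
            ]
        ),
    }
)

# Orders per calendar day; days without orders appear with a count of 0.
daily = orders.set_index("created_at")["order_no"].resample("D").count()

# Compare the two halves of the month.
first_half = daily[daily.index.day <= 15].sum()
second_half = daily[daily.index.day > 15].sum()

print(daily[daily > 0])
print(first_half, second_half)
```

Calling `daily.plot()` with matplotlib installed would draw the trend line.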

Use a visual map to observe the distribution of orders

Summary 2: Region has a large influence on order volume. In general, developed regions place more orders and remote regions fewer. The causes in each region, such as product categories, consumer groups, promotions and express delivery coverage, would need further analysis before targeted measures can raise order volume and sales in the weaker regions.
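Before drawing a map, the per-region figures can be tabulated directly. The following is a minimal sketch with made-up rows and assumed column names (`province`, `paid_amount`).

```python
import pandas as pd

# Hypothetical orders; values and column names are assumptions.
orders = pd.DataFrame(
    {
        "province": ["Shanghai", "Guangdong", "Shanghai", "Tibet", "Guangdong", "Shanghai"],
        "paid_amount": [100.0, 60.0, 0.0, 30.0, 45.0, 80.0],
    }
)

# Order count and total amount paid per province, largest first.
by_province = (
    orders.groupby("province")
    .agg(order_count=("paid_amount", "size"), paid_total=("paid_amount", "sum"))
    .sort_values("order_count", ascending=False)
)

print(by_province)
```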

4.2.5 Sales conversion indicators

Order numbers and order conversion rate presentation

Summary 3: In terms of order conversion, the conversion rate from paid orders to orders with payment actually received is 79%. A natural next step is to look at the refund rate and analyze why orders are refunded, in order to improve this conversion rate.
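One way to compute such conversion rates from order-level data is sketched below. The stage definitions (paid, payment kept after refunds, no refund at all), the column names and the rows are all assumptions made for illustration. They are not necessarily the stages behind the 79% figure above.

```python
import numpy as np
import pandas as pd

# Hypothetical orders; column names follow the fields in 4.1 but are assumed.
orders = pd.DataFrame(
    {
        "total_amount": [100.0, 60.0, 80.0, 50.0, 30.0],
        "paid_at": pd.to_datetime(
            ["2020-02-20 10:05", None, "2020-02-21 09:00",
             "2020-02-22 14:30", "2020-02-23 08:45"]
        ),
        "refund_amount": [0.0, 0.0, 80.0, 10.0, 0.0],
    }
)

paid = orders["paid_at"].notna()
# Actual amount paid: total minus refund if paid, otherwise 0 (as in 4.1).
orders["paid_amount"] = np.where(
    paid, orders["total_amount"] - orders["refund_amount"], 0.0
)

# Assumed funnel stages, each a subset of the previous one.
stages = pd.Series(
    {
        "all orders": len(orders),
        "paid orders": int(paid.sum()),
        "payment kept": int((orders["paid_amount"] > 0).sum()),
        "no refund": int((paid & (orders["refund_amount"] == 0)).sum()),
    }
)

step_rate = stages / stages.shift(1)      # conversion from the previous stage
overall_rate = stages / stages.iloc[0]    # conversion from all orders

print(pd.DataFrame({"count": stages, "step_rate": step_rate, "overall_rate": overall_rate}))
```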
