Author: Cherich_sun Source: public account “Jie Ge’S IT Journey” ID: Jake_Internet
This is a case about how to use Excel to do data analysis. The goal is to help you quickly use Excel to analyze small data samples. Therefore, the focus of this article is the realization of analysis ideas + data processing + visualization, because the data comes from the network, so no in-depth analysis.
specifications
Analyze the sales trend of an e-commerce enterprise and find the factors affecting sales growth. At the same time, analyze the sales status of products in different markets and find the sales differences in different markets. Dig the sales status of different products and find the sales differences of different products. Analyze the user groups and excavate the portraits and values of enterprise users. Draw a conclusion from the above analysis, and according to the analysis results, put forward some ideas and suggestions for the future development of the enterprise.
The data shows that
The data of this project are annual daily order detail data and user information data of an e-commerce platform, including two data tables, sales order table and user information table.
The sales order table is the details of each order, and an order corresponds to a sale. The table structure of these two tables is as follows:
Sales order Form structure:
User information table structure:
First, analysis of ideas
This project is mainly divided into three dimensions, namely, the user portrait. It analyzes the user group to find out the main characteristics of the platform. In the dimension of goods, find out the sales difference of goods in different markets; In the dimension of user value, the intrinsic laws are mined mainly according to the consumer behavior of users. The detailed thinking of the analysis is as follows:
Ii. Data Processing (Excel)
1. Data cleaning
The final data state: “one more point is fat, one less point is thin”, so the commonly used cleaning methods are mainly the following three.
① Repeated data processing
COUNTIF(A:A,A2) COUNTIF(A$2:A2,A2) COUNTIF(A$2:A2,A2)
Advanced Filtering select the data area to be processed, click the menu bar [data] — [Sorting and Filtering] select [Advanced], the advanced filtering will pop up. The operations are as follows:
Conditional Format Select the data area to be processed, click on the menu bar [Start] — [Conditional format] and select [Highlight cell rule] — [Repeat value]. The operations are as follows:
Pivottables also calculate how often data is repeated. It’s much easier to use than COUNTIF, just drag. First, select the first row to the left of the title data, press Shift+Ctrl+End/ down arrow, select all data — insert — PivotTable as follows:
We have learned how to handle various duplicate values above, so in real business, duplicate values are usually removed. Select all data — click — [data] — [Delete duplicates] in the menu bar, as follows:
② Missing data processing
The acceptable standard for missing values is less than 10%. The usual handling is as follows:
- Average value filling;
- A value calculated by a statistical model;
- Delete the record of missing value;
- Reserving, only making necessary exclusions in the corresponding analysis
Batch fill how to break up the combined cells in the table below and convert them into canonical data.
Select the area to transform — [Start] — [Center after merge] — cancel cell merge — Continue to select the area to transform — Press Ctrl+G — Pop up [Position] — [Position condition], select null — OK — continue to Enter “=” in A3 cell, press the up arrow, then press Ctrl+Enter, Automatic filling is complete.
** Note: ** Batch removal formula: select the data, paste into a number, this will improve the overall operation efficiency of Excel. Next, we need to use batch population to process the product name field in the sales order table, batch removing the value and leaving only the product name. Create a blank column, first enter several correct product names, press Ctrl+E, quickly and intelligently fill.
Find and replace shortcut keys are: Ctrl+F; Ctrl+H
③ Blank data processing
- Replace the space with Ctrl + H
- For special Spaces, use trim()
2. Data consolidation
① Field Merging
- CONCATENATE()
- &
- ATE()
② Field Matching
What is possible is to match data that is not in the table, but is in other tables, by corresponding common keywords (the data type must be the same). Now, let’s merge the user table with the order table to make the order table more field rich for the following analysis (the main characteristics of consumption). The desired results are as follows:
The implementation mode is VLOOKUP. The syntax is as follows:
VLOOKUP(lookup value, lookup range, lookup column number (serial number), fuzzy/exact lookup) by using the above syntax, we can successfully obtain the gender column data, but there are several fields, if the copy and paste form, change the serial number, the efficiency is too slow. We have a more efficient way. As follows:
As above, the position of the sequence number is located by the COLUMN function. Remember to add an absolute reference $to the row. In this case, just go right and drag it down, and it will automatically complete the field.
3. Data extraction
① Field Splitting
Now that we have successfully merged into a large table, we will break down the payment time into adult and monthly forms for later visualization. Select [unit price] column — insert — insert three blank columns in the same way — copy a [payment time column], and then name the column as year, month — select [payment time] column — [data] — [next] — [other] enter “/” — [Done]
Function (intercepting string) : LEFT(); RIGHT(); MID()
② Random sampling
-
RAND(): returns the decimal between 0 and 1; =INT(RAND()*10+60)
-
RANDBETWEEN() : returns a random number between the specified number, =RANDBETWEEN(60,70)
4. Data calculation
① Simple calculation
② Function calculation
DATEDIF(D2,TODAY(),”Y”);
The requirement of the data grouping project is to label users according to age (younger than 21, labeled as “post-00s”; Older than 21 years old and less than 31 years old, labeled as “post-90s”; Older than 31 years old and younger than 41 years old, labeled as “post-80s”; Older than 41 and younger than 51, labeled “after 70”), can be implemented in two ways.
-
A:
The IF () IF (W2 < 21, “00”, IF (AND (W2 > 21, W2 (31), “after 90”, IF (AND (W2 > 31, W2 (41), “after 80”, “after 70”)))
- Method 2:
VLOOKUP() this way we only need to set the threshold and display the label, it is worth noting that the third parameter to VLOOKUP is fuzzy matching.
5. Data conversion
Table row and column conversion
To do this, select the data you want to convert — right click, copy — select blank cell — start — Paste — Selective Paste — Select Transpose — done
② The two-dimensional table is transformed into a one-dimensional table
Click File — Options — Customize Features — Find PivotTable and Pivotchart Wizard in Commands not in Functional areas and select — Add New TAB under the Data TAB on the right and select — Add — OK, finally, In data, the PivotTable and Pivotchart Wizard appears as follows:
Click PivotTable and PivotChart Wizard — Select Multiple Merge calculation data area — Create single page Field — Next — Select the area to operate — Next — Select New Worksheet — Finish — Double click total — convert the table from 2d to 1D, as follows:
3. Visualization
Visualization graphics are mostly simple, I believe we can easily achieve. So, some special shapes, because it’s also important.
1. User portrait
The chart explains that the main features of the platform are: female users; Post-90s single people are mainly technical secondary school, master’s degree; The region is concentrated in Tianjin area. Girls prefer to buy drinks and daily necessities, while men prefer to buy drinks and seafood.
2. Product advantages
The chart shows that overall sales are on a downward trend and have been picking up since October. According to the 80-20 rule, it can be seen that Tianjin, Nanjing and Beijing account for 43% of the total sales, which can be considered as a key market. The most popular categories are beverages, daily necessities, and meat/poultry.
3. User value
Note: as the number of users in the above data set is limited, it is not obvious after doing cohort. So here’s a new user data set.
Chart description: The RFM model has classified users, and corresponding operation strategies can be adopted for different types of users. Focus on maintaining users whose consumption frequency ranges from 82 to 100. Based on cohort analysis, we can see that the number of new users on the platform is gradually increasing, but the retention rate is low.
Public account: Jie Ge’s IT journey, background reply:”20210226“To obtain the complete data of this paper.