This is the 17th day of my participation in Gwen Challenge

What do people like to buy? Taking a certain "East" e-commerce platform as an example, this article uses Python to crawl data on the popular products of the 618 promotion, cleans the data, and then uses visualization to look at the top products from several angles: how are their sales, how well do users rate them, and so on.

This article is organized as follows:

1. Crawl best-selling product data from the platform

2. Clean the data and conduct simple analysis

3. Visualize the data

The fields of the data are as follows:

In total, data for 243 popular products was crawled.

01. Get data

1. Analyze the web page

Before writing any code, let's first analyze the web page.

The page loads asynchronously: the first 10 products are loaded statically with the page, and the rest are loaded dynamically through asynchronous requests. We therefore need to send requests ourselves to obtain the full data.

2. Get product links from the static page

Sales, review counts and other data for each product are on its detail page, so we first collect the links to the product detail pages.
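A minimal sketch of this step, using only the standard library. The HTML structure, the `shop.example.com` host and the `/item/` URL marker are assumptions made for illustration; the real page layout is not given here.

```python
from html.parser import HTMLParser


class ProductLinkParser(HTMLParser):
    """Collect href values of <a> tags that look like product detail links."""

    def __init__(self, marker):
        super().__init__()
        self.marker = marker  # substring identifying a detail-page URL (assumed)
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if self.marker not in href:
            return
        if href.startswith("//"):   # protocol-relative link: add a scheme
            href = "https:" + href
        if href not in self.links:  # keep page order, drop duplicates
            self.links.append(href)


def extract_product_links(html, marker="/item/"):
    """Return the detail-page links found in one HTML document."""
    parser = ProductLinkParser(marker)
    parser.feed(html)
    return parser.links


# Hypothetical fragment standing in for the statically loaded product list
sample_html = """
<ul>
  <li><a href="//shop.example.com/item/1001.html">Product A</a></li>
  <li><a href="//shop.example.com/item/1002.html">Product B</a></li>
  <li><a href="//shop.example.com/help.html">Help</a></li>
</ul>
"""
print(extract_product_links(sample_html))
```

In practice the HTML string would come from an HTTP request to the list page; only the parsing part is shown.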

The results are as follows:

3. Get product links from the dynamically loaded part

By capturing packets (inspecting the network requests the page makes), we can find the link used for dynamic loading, which returns each product's title and product ID. The product ID can later be used to build the link to the product detail page.

After the JSON data is fetched, the product title and product ID are extracted.
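The extraction could look roughly like the sketch below. The key names `items`, `title` and `id`, the JSONP wrapper and the URL template are all hypothetical; the actual response format has to be read from the captured request.

```python
import json
import re


def parse_items(raw):
    """Return (title, item_id) pairs from a JSON or JSONP response body."""
    # Strip an optional JSONP wrapper such as callback({...});
    match = re.match(r"^\s*[\w$.]+\((.*)\)\s*;?\s*$", raw, re.S)
    body = match.group(1) if match else raw
    data = json.loads(body)
    # "items", "title" and "id" are assumed key names
    return [(item["title"], str(item["id"])) for item in data.get("items", [])]


def detail_url(item_id, template="https://shop.example.com/item/{}.html"):
    """Build a detail-page link from a product ID (template is a placeholder)."""
    return template.format(item_id)


# Hypothetical response body
sample = 'callback({"items": [{"title": "Product C", "id": 1011}, {"title": "Product D", "id": 1012}]});'
for title, item_id in parse_items(sample):
    print(title, detail_url(item_id))
```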

4. Get the discount, original price, and flash-sale price

Given the product ID, we can obtain the product's discount, original price, and flash-sale price. This uses an interface found through packet capture; readers who are interested can try finding it themselves, and those who are not can simply use it as is.

This logic is wrapped in a function: pass in a product ID and it returns the product's discount, original price, and flash-sale price.
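A sketch of what such a function might look like. The endpoint `PRICE_API` is a placeholder and the keys `discount`, `original_price` and `flash_price` are invented for illustration, since the real interface is not reproduced here. The `fetch` parameter is also an addition so that the function can be tried without network access.

```python
import json
from urllib.request import urlopen

# Placeholder endpoint: substitute the interface found by packet capture
PRICE_API = "https://api.example.com/price?id={}"


def http_get(url):
    """Download a URL and return the body as text."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8")


def get_price(item_id, fetch=http_get):
    """Return (discount, original_price, flash_price) for one product ID."""
    payload = json.loads(fetch(PRICE_API.format(item_id)))
    # All three key names are assumptions about the response format
    discount = payload["discount"]  # kept as raw text, e.g. "8.5折"
    original = float(payload["original_price"])
    flash = float(payload["flash_price"])
    return discount, original, flash


# Offline demonstration with a canned response instead of a real request
def fake_fetch(url):
    return '{"discount": "8.5折", "original_price": "199.00", "flash_price": "169.00"}'


print(get_price("1001", fetch=fake_fetch))
```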

The results are as follows:

5. Get the number of reviews, positive reviews, neutral reviews, negative reviews, and the positive review rate

Given the product ID, we can also obtain the number of reviews, positive reviews, neutral reviews and negative reviews, as well as the positive review rate. Again, this uses an interface found through packet capture; readers who are interested can try finding it themselves, and those who are not can simply use it as is.
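A sketch of the corresponding function, named `CommentCount` as in this article. The key names in the response are hypothetical, and `fetch` (a callable mapping a product ID to a JSON string) is an assumption that keeps the example runnable offline. Counts are kept as raw text because they may contain the unit 万 that the cleaning step deals with.

```python
import json


def parse_comment_summary(payload):
    """Return (total, good, medium, bad, good_rate) from one summary record."""
    # Every key name below is an assumption about the response format
    return (
        payload["comment_count"],  # raw text, e.g. "1.2万"
        payload["good_count"],
        payload["medium_count"],
        payload["bad_count"],
        float(payload["good_rate"]),  # e.g. 0.98 for 98%
    )


def CommentCount(item_id, fetch):
    """Fetch and parse the review summary for one product ID."""
    return parse_comment_summary(json.loads(fetch(item_id)))


# Offline demonstration with a canned response
def fake_fetch(item_id):
    return json.dumps({
        "comment_count": "1.2万", "good_count": "1.1万",
        "medium_count": "600", "bad_count": "400", "good_rate": 0.98,
    })


print(CommentCount("1001", fake_fetch))
```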

The results are as follows:

6. Save the file to Excel

We then iterate over the products, fetch each product's sales data by ID (using the functions from steps 4 and 5), and finally save the data to Excel.

Define the header

Write data

Here get_price and CommentCount are the functions from steps 4 and 5. count is the current row number in Excel, so it is incremented by 1 in each loop iteration and the next product is written to the next row.
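The header-then-rows pattern can be sketched as follows. To stay dependency-free, this sketch writes CSV (which Excel can open) with the standard `csv` module instead of an Excel library, so it is a substitute for the original approach, not a reproduction of it. The column names are guesses, and `get_price` and `CommentCount` are stubs with made-up return values.

```python
import csv
import io

# Assumed column names; the actual field list is not reproduced here
HEADER = ["title", "discount", "original_price", "flash_price",
          "comments", "good", "medium", "bad", "good_rate"]


def get_price(item_id):  # stub standing in for step 4
    return ("8.5折", 199.0, 169.0)


def CommentCount(item_id):  # stub standing in for step 5
    return ("1.2万", "1.1万", "600", "400", 0.98)


def save_items(items, out):
    """Write a header row, then one row per (title, item_id) pair."""
    writer = csv.writer(out)
    writer.writerow(HEADER)  # first row: the header
    count = 0                # number of data rows written so far
    for title, item_id in items:
        writer.writerow([title, *get_price(item_id), *CommentCount(item_id)])
        count += 1           # the next product goes on the next row
    return count


buffer = io.StringIO()
rows = save_items([("Product A", "1001"), ("Product B", "1002")], buffer)
print(rows)
print(buffer.getvalue())
```

To write a real file, pass a handle opened with `open(path, "w", newline="", encoding="utf-8-sig")`; the `utf-8-sig` encoding helps Excel detect UTF-8 text.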

Final save result

In total, data for 243 popular products was crawled.

02. Data Analysis & Visualization

1. Clean data

Three columns mainly need cleaning, as shown in the figure: title, discount, and number of positive reviews.

Cleaning objectives:

  1. Titles are too long to plot conveniently later, so they are truncated to at most 10 characters.

  2. The discount field contains the character 折 ("discount"), so it cannot be converted directly to a numeric type for sorting.

  3. In the number of positive reviews, the unit 万 (10,000) is expanded into a concrete value. For example, 1.2万 becomes 12000.
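The three cleaning goals above can be sketched as small helper functions. The function names are made up for illustration, and the input formats ("8.5折", "1.2万") are assumed from the description.

```python
def shorten_title(title, max_len=10):
    """Keep at most max_len characters of a title."""
    return title[:max_len]


def parse_discount(text):
    """Convert discount text such as '8.5折' to the number 8.5."""
    return float(text.replace("折", "").strip())


def parse_count(text):
    """Convert a count such as '1.2万' to 12000; plain digits pass through."""
    text = text.strip()
    if text.endswith("万"):
        # round() guards against floating-point error in the multiplication
        return int(round(float(text[:-1]) * 10000))
    return int(text)


print(shorten_title("A very long product title"))
print(parse_discount("8.5折"))
print(parse_count("1.2万"), parse_count("600"))
```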

Cleaning results:

2. Visualization – Discount level

From the cleaned data, take the product name and discount columns and sort them from the highest discount to the lowest. The top 15 are then taken for visualization.

The core code is as follows:
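A minimal sketch of this step, not the author's exact listing. It assumes the cleaned data is a list of dicts with hypothetical `title` and `discount` keys, and it only prepares the two series a chart would need; the charting call itself depends on the plotting library used.

```python
def top_n(rows, key, n=15, reverse=True):
    """Sort dict rows by a numeric column and keep the first n."""
    return sorted(rows, key=lambda row: row[key], reverse=reverse)[:n]


# Made-up sample rows; n=2 keeps the demonstration short
rows = [
    {"title": "Product A", "discount": 8.5},
    {"title": "Product B", "discount": 9.5},
    {"title": "Product C", "discount": 5.0},
]
top = top_n(rows, "discount", n=2)
names = [row["title"] for row in top]      # x-axis labels
values = [row["discount"] for row in top]  # bar heights
print(names, values)
```

With values in 折, a smaller number means a deeper price cut, so passing `reverse=False` would rank the deepest cuts first instead.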

Visual effects:

3. Visualization – Positive review rate statistics

From the data, take the positive review rate column and count the products at each rate, for example the number of products with a 100% (1) positive review rate, the number with 99% (0.99), and so on.

The core code is as follows:
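A minimal sketch of the counting step, not the author's exact listing. It assumes the rates are floats such as 1.0 or 0.99 and uses `collections.Counter` from the standard library.

```python
from collections import Counter


def rate_distribution(rates):
    """Count products per positive review rate, highest rate first."""
    counts = Counter(round(rate, 2) for rate in rates)
    return sorted(counts.items(), reverse=True)


# Made-up sample rates
rates = [1.0, 0.99, 0.99, 0.98, 1.0, 0.99]
distribution = rate_distribution(rates)
print(distribution)
```

Each `(rate, count)` pair maps directly onto one slice or bar of a chart.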

Visual effects:

4. Visualization – Sales ranking of best-selling items

From the data, take the product name and number of reviews columns. Using the number of reviews as a proxy for sales, rank the products by sales from high to low and take the top 15 for visualization.

The core code is as follows:

Visual effects:

5. Visualization – Comparison of original price and flash-sale price for the top 15 best-selling products

The analysis above gives the top 15 best-selling products. Here the original price and the flash-sale price of these 15 products are compared visually.

The core code is as follows:
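A minimal sketch of the data preparation, not the author's exact listing. It assumes cleaned rows stored as dicts with hypothetical keys `title`, `comments`, `original_price` and `flash_price`, and returns the label list plus the two price series that a grouped bar chart would take.

```python
def price_comparison(rows, n=15):
    """Pick the n most-reviewed products and return their two price series."""
    top = sorted(rows, key=lambda row: row["comments"], reverse=True)[:n]
    names = [row["title"] for row in top]
    original = [row["original_price"] for row in top]
    flash = [row["flash_price"] for row in top]
    return names, original, flash


# Made-up sample rows; n=2 keeps the demonstration short
rows = [
    {"title": "Product A", "comments": 12000, "original_price": 199.0, "flash_price": 169.0},
    {"title": "Product B", "comments": 600, "original_price": 59.0, "flash_price": 49.0},
    {"title": "Product C", "comments": 35000, "original_price": 899.0, "flash_price": 799.0},
]
names, original, flash = price_comparison(rows, n=2)
print(names, original, flash)
```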

Visual effects:

03. Summary

Taking a certain "East" e-commerce platform as an example, this article used Python to crawl data on the popular products of the 618 promotion, cleaned the data, and then used visualization to look at the top products from several angles: how are their sales, how well do users rate them, and so on.

If anything is unclear, feel free to leave a comment below.