Hello, everyone, I am the talented brother.

Recently, some of my friends joked with me that their girlfriends were A-cups.

Today, let's look at how the different sizes of one bra brand sell on JD.com (Jingdong Mall) to see which sizes are currently mainstream!

1. Requirements

This article is fairly simple: we collect the comment counts for each size of the most-commented product of one bra brand on JD, and then compute the share of each size.

Since JD.com does not expose sales figures (or the number of buyers), we use the number of comments as the only dimension for comparison. For how to get the comment counts, you can refer to the previous post "Scraping JD Hairy Crabs"; we will not go into detail here, and you can read the code directly.

By selecting Underwear > Bras > Suitable for young people and sorting by the number of comments, we get the list of top products. Since the first two items are one-size-fits-all (no size options), and the third item is a bra laundry bag (also without sizes), we choose the fourth item.

We then opened the details page of the fourth item and found that it comes in 7 colors and 10 sizes, which makes for a lot of combinations.

To get the comment data for each product, we first need the productId of each product. So we pressed F12 to open the developer tools, searched the Elements panel for one of the product IDs, and finally found the place where all the product IDs are stored (it can be parsed with the re module).

Once we have all the product IDs, we can call the comment interface with each ID to get the comment data for that product. Let's start coding!

2. Data collection

In the data collection part, we first extract all product IDs with the re module, and then fetch the comment data for each product ID. At that point we have all the data we need.

Get all product IDs

import ast
import re

import pandas as pd
import requests

headers = {
    # "Accept-Encoding": "gzip",  # use gzip compression to transfer data faster
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36",
    # "Cookie": cookie,
    "Referer": "https://item.jd.com/",
}

url = 'https://item.jd.com/100003749316.html'
r = requests.get(url, headers=headers, timeout=6)

# Strip all whitespace so the pattern below matches regardless of the page's spacing
text = re.sub(r'\s', '', r.text)
# The colorSize list in the page source holds one entry per product (skuId, size, color);
# ast.literal_eval parses it without executing arbitrary code, unlike eval
colorSize = ast.literal_eval(re.findall(r'colorSize:(\[.*?\])', text)[0])
df = pd.DataFrame(colorSize)
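To see what the regular expression does without hitting JD, here is a minimal offline sketch. The `sample_html` string is a made-up, shortened stand-in for the real page source (the real keys and values may differ), and the sketch uses `ast.literal_eval` rather than `eval` to parse the captured list.

```python
import ast
import re

# Hypothetical, shortened stand-in for the page source (an assumption, not real JD data).
sample_html = """
    var pageConfig = {
        colorSize: [{"skuId": 1001, "size": "34/75A", "color": "white"},
                    {"skuId": 1002, "size": "34/75B", "color": "black"}],
        other: [1, 2, 3]
    };
"""

# Strip all whitespace so spacing in the page source cannot break the pattern.
text = re.sub(r'\s', '', sample_html)

# Non-greedy match: capture from "[" up to the first "]" after "colorSize:".
raw = re.findall(r'colorSize:(\[.*?\])', text)[0]

# Parse the captured text as a Python literal (a list of dicts).
color_size = ast.literal_eval(raw)

sku_ids = [item["skuId"] for item in color_size]
print(sku_ids)
```

Because the match is non-greedy, it stops at the first closing bracket, so it only works while the list itself contains no nested lists.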

Get the comment data corresponding to the product ID

import json

# Get comment info
def get_comment(productId, proxies=None):
    # time.sleep(0.5)
    url = 'https://club.jd.com/comment/skuProductPageComments.action?'
    params = {
        'callback': 'fetchJSON_comment98',
        'productId': productId,
        'score': 0,
        'sortType': 6,
        'page': 0,
        'pageSize': 10,
        'isShadowSku': 0,
        'fold': 1,
    }
    # print(proxies)
    r = requests.get(url, headers=headers, params=params,
                     proxies=proxies,
                     timeout=6)
    # The response is JSONP: strip the callback wrapper, then parse the JSON inside
    comment_data = re.findall(r'fetchJSON_comment98\((.*)\)', r.text)[0]
    comment_data = json.loads(comment_data)
    comment_summary = comment_data['productCommentSummary']

    # Total comments = sum of the five per-score counts
    return sum(comment_summary[f'score{i}Count'] for i in range(1, 6))

# get_proxies() is not defined in this article; use proxies = None to run without a proxy
proxies = get_proxies()
rows = []
for productId in df.skuId[44:]:  # the slice starts at row 44; use df.skuId for every product
    rows.append({
        "skuId": productId,
        "commentCount": get_comment(productId, proxies),
    })
# DataFrame.append was removed in pandas 2.0, so build the frame from a list of rows
df_commentCount = pd.DataFrame(rows, columns=['skuId', 'commentCount'])

df = df.merge(df_commentCount, how='left')
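The parsing step can be tried on its own. Below is a small sketch that unwraps a JSONP body and adds up the five score counts; `sample_body` is an invented response with only the fields used here, and `parse_comment_count` is a hypothetical helper name, not part of any JD API.

```python
import json
import re

# Hypothetical response body (an assumption); the real one has many more fields.
sample_body = (
    'fetchJSON_comment98({"productCommentSummary": '
    '{"score1Count": 2, "score2Count": 1, "score3Count": 4, '
    '"score4Count": 10, "score5Count": 83}, "comments": []});'
)

def parse_comment_count(body, callback="fetchJSON_comment98"):
    """Strip the JSONP wrapper and add up the five per-score counts."""
    # Greedy match: capture everything between the first "(" and the last ")".
    match = re.search(re.escape(callback) + r'\((.*)\)', body, flags=re.S)
    if match is None:
        raise ValueError("response is not wrapped in the expected callback")
    summary = json.loads(match.group(1))["productCommentSummary"]
    return sum(summary[f"score{i}Count"] for i in range(1, 6))

total = parse_comment_count(sample_body)
print(total)
```

Raising an error when the wrapper is missing makes a blocked or empty response easier to spot than an IndexError from `re.findall(...)[0]`.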

3. Statistical display

First, we split the cup letter (A, B, C, ...) out of the size into its own column.

df['cup'] = df['size'].str[-1]
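Here is a small sketch of that split on made-up size labels. The labels assume the "34/75B" style mentioned later in this post; the regex variant and the `uk`, `cm` and `cup_re` column names are only illustrations.

```python
import pandas as pd

# Hypothetical size labels in the "34/75B" style (an assumption about the format).
sizes = pd.DataFrame({"size": ["32/70A", "34/75B", "36/80C", "34/75A"]})

# The last character is the cup letter, same idea as df['size'].str[-1].
sizes["cup"] = sizes["size"].str[-1]

# Everything before the cup letter is the band part.
sizes["band"] = sizes["size"].str[:-1]

# A regex alternative that also validates the label; rows that do not match become NaN.
parts = sizes["size"].str.extract(r'^(?P<uk>\d+)/(?P<cm>\d+)(?P<cup_re>[A-Z])$')
print(sizes.join(parts))
```

Taking the last character is enough for single-letter cups; the regex version is safer if some labels have a different shape.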

Let’s begin our simple statistical presentation

First look at the data information overview

>>> df.info()
    
<class 'pandas.core.frame.DataFrame'>
Int64Index: 64 entries, 0 to 63
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   size          64 non-null     object
 1   skuId         64 non-null     object
 2   color         64 non-null     object
 3   commentCount  64 non-null     object
 4   cup           64 non-null     object
dtypes: object(5)
memory usage: 3.0+ KB

3.1. Cup distribution

In the data we collected, there are only three cup sizes: A, B and C.

cupNum = df.groupby('cup')['commentCount'].sum().to_frame('number')
cupNum

cup    number
A        6049
B       11618
C        4076

import matplotlib.pyplot as plt
from matplotlib import font_manager as fm

plt.rcParams['font.sans-serif'] = ['Microsoft YaHei']
plt.rcParams['axes.unicode_minus'] = False

labels = cupNum.index
sizes = cupNum['number']
explode = (0, 0.1, 0)  # pull the second slice (B) out slightly

fig1, ax1 = plt.subplots(figsize=(6, 5))
patches, texts, autotexts = ax1.pie(sizes, explode=explode, labels=labels, autopct='%1.1f%%',
                                    shadow=True, startangle=90)
ax1.axis('equal')  # equal aspect ratio so the pie is drawn as a circle

# Resize the font
proptease = fm.FontProperties()
proptease.set_size('large')
plt.setp(autotexts, fontproperties=proptease)
plt.setp(texts, fontproperties=proptease)
ax1.set_title('cup distribution')
plt.show()

We can see that B-cups account for as much as 53.4% of the comments, followed by A-cups at 27.8%.
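The percentages can be checked from the counts alone. This is a minimal sketch that rebuilds the cup table by hand (the numbers are copied from the table above) and computes each share; `cup_num` and `share_pct` are names chosen just for this example.

```python
import pandas as pd

# Comment counts per cup, copied from the table above.
cup_num = pd.DataFrame(
    {"number": [6049, 11618, 4076]},
    index=pd.Index(["A", "B", "C"], name="cup"),
)

# Share of each cup in percent, rounded to one decimal place.
total = cup_num["number"].sum()
cup_num["share_pct"] = (cup_num["number"] / total * 100).round(1)
print(cup_num)
```

With these counts the total is 21743 comments, and the shares come out at 27.8% for A, 53.4% for B and 18.7% for C.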

3.2. Color distribution

colorNum = df.groupby('color')['commentCount'].sum().to_frame('number')
colorNum

color              number
Light skin           3627
Light blue gray      3058
Light silver gray    3837
White                1439
Lotus root pink      8286
Wine red             1429
Black                  67

As we can see, lotus root pink is the dominant color by a wide margin, followed by light silver gray, light skin and light blue gray.

Lotus root pink has the highest share, at 38.1%.
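The ranking can be reproduced from the counts in the color table. In this sketch the counts are copied from that table, while the color names are English renderings of the listing's labels and `color_num`, `ranked` and `share` are names made up for the example.

```python
import pandas as pd

# Comment counts per color, copied from the table above (names are English renderings).
color_num = pd.Series(
    {
        "Light skin": 3627,
        "Light blue gray": 3058,
        "Light silver gray": 3837,
        "White": 1439,
        "Lotus root pink": 8286,
        "Wine red": 1429,
        "Black": 67,
    },
    name="number",
)

# Rank the colors by comment count, largest first.
ranked = color_num.sort_values(ascending=False)

# Share of each color in percent, rounded to one decimal place.
share = (ranked / ranked.sum() * 100).round(1)
print(pd.DataFrame({"number": ranked, "share_pct": share}))
```

Sorting before computing the shares keeps the two columns in the same order, so the top row is the lotus root pink figure quoted above.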

4. That's all

The size we see most is 34/75B: 34 is the UK band size, 75 is the underbust measurement in centimeters (34 and 75 actually mean the same thing), and B is the cup.

For a comparison table of cup and chest sizes, please refer to:

That's all for this time. Because the sample size is small, this is for entertainment only!