More than 50% of Chinese women wear a B cup, but A cup is the most popular.

Hello, everyone, I am the talented brother.

Recently, some of my friends joked to me that their girlfriends were A cups.

Today, let's look at how different sizes of one bra brand sell on JD.com (Jingdong Mall) to see which sizes are currently mainstream!

1. Requirements

This article is fairly simple. It collects the number of comments on each size of the bra with the most comments on JD, and then calculates the share of each size.

Since JD.com does not expose data such as sales volume (or how many people paid), we use the number of comments as the only dimension for comparison. For how to get the comment count, see the earlier post “Climb jingdong Hairy Crab”; we won't go into it again here, and you can read the code directly.

By selecting Underwear > Bra > Suitable for young people and sorting by number of comments, we get the list of top products. The first two items are one-size products with no size options, and the third is a bra laundry bag (also one-size), so we choose the fourth item.

Then we open the details page of the fourth item and find that it comes in 7 colors and 10 sizes, which makes for a lot of combinations.

To get the comment data for each variant, we first need its productId. So we press F12 to open the developer tools, search the Elements panel for one of the item IDs, and finally find the place where all the item IDs are stored (it can be parsed with the re module).
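To make the idea concrete, here is a minimal sketch of pulling such a list out of page source with a regular expression. The snippet in `sample_html` is made up for illustration; the real page is much larger and its exact layout is an assumption here.

```python
import json
import re

# Hypothetical stand-in for the page source (not the real JD markup).
sample_html = """
var pageConfig = {
    product: {
        colorSize: [{"skuId": 1001, "size": "34/75B", "color": "white"},
                    {"skuId": 1002, "size": "36/80C", "color": "black"}],
        other: 1
    }
};
"""

# Collapse whitespace so the non-greedy match can span former line breaks.
text = re.sub(r"\s+", " ", sample_html)

# Non-greedy match from "colorSize:" up to the first closing bracket.
match = re.search(r"colorSize:\s*(\[.*?\])", text)
color_size = json.loads(match.group(1))

sku_ids = [item["skuId"] for item in color_size]
print(sku_ids)  # [1001, 1002]
```

The non-greedy `.*?` only works here because the list contains no nested `]`; a page with nested arrays would need a different approach.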

Now that we can get all the item IDs, we can call the comment interface with each ID to get the comment data for that item. Let's code!

2. Data collection

In the data collection part, we first extract all the item IDs with the re module, then use each item ID to fetch its comment data. That gives us all the data we need.

Get all item IDs

import ast
import re

import pandas as pd
import requests

headers = {
    # "accept-encoding": "gzip",  # use gzip compression to transfer data for faster access
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36",
    # "Cookie": cookie,
    "Referer": "https://item.jd.com/",
}

url = 'https://item.jd.com/100003749316.html'
r = requests.get(url, headers=headers, timeout=6)

# Remove all whitespace so the pattern matches regardless of spacing and line breaks
text = re.sub(r'\s', '', r.text)
# Parse the colorSize list literal; literal_eval avoids running arbitrary code like eval would
colorSize = ast.literal_eval(re.findall(r'colorSize:(\[.*?\])', text)[0])
df = pd.DataFrame(colorSize)

Get the comment data corresponding to the product ID

import json

# Get comment info: returns the total number of comments for one product
def get_comment(productId, proxies=None):
    # time.sleep(0.5)
    url = 'https://club.jd.com/comment/skuProductPageComments.action?'
    params = {
        'callback': 'fetchJSON_comment98',
        'productId': productId,
        'score': 0,
        'sortType': 6,
        'page': 0,
        'pageSize': 10,
        'isShadowSku': 0,
        'fold': 1,
    }
    # print(proxies)
    r = requests.get(url, headers=headers, params=params,
                     proxies=proxies,
                     timeout=6)
    # The response is JSONP: strip the fetchJSON_comment98(...) wrapper before parsing
    comment_data = re.findall(r'fetchJSON_comment98\((.*)\)', r.text)[0]
    comment_data = json.loads(comment_data)
    comment_summary = comment_data['productCommentSummary']

    # Total = score1Count + ... + score5Count
    return sum(comment_summary[f'score{i}Count'] for i in range(1, 6))

# get_proxies() is the author's own proxy helper and is not shown in this article
proxies = get_proxies()
rows = []
for productId in df.skuId:
    rows.append({
        "skuId": productId,
        "commentCount": get_comment(productId, proxies),
    })
# DataFrame.append was removed in pandas 2.0, so build the frame from a list of rows
df_commentCount = pd.DataFrame(rows, columns=['skuId', 'commentCount'])

df = df.merge(df_commentCount, how='left')
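The parsing step inside get_comment can be tried offline. The sketch below uses a made-up response body: the field names follow the ones used in the code above, but the numbers are invented, and `parse_comment_count` is a hypothetical helper name rather than part of the original script.

```python
import json
import re

# Hypothetical JSONP body; the counts are invented for illustration.
sample_body = (
    'fetchJSON_comment98({"productCommentSummary": '
    '{"score1Count": 2, "score2Count": 1, "score3Count": 4, '
    '"score4Count": 10, "score5Count": 83}});'
)

def parse_comment_count(body):
    """Strip the JSONP callback wrapper and total the per-score comment counts."""
    match = re.search(r"fetchJSON_comment98\((.*)\)", body, flags=re.S)
    if match is None:
        raise ValueError("unexpected response format")
    summary = json.loads(match.group(1))["productCommentSummary"]
    return sum(summary[f"score{i}Count"] for i in range(1, 6))

print(parse_comment_count(sample_body))  # 100
```

Raising an explicit error when the wrapper is missing makes it easier to notice a blocked or changed response than an IndexError from `findall(...)[0]` would.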

3. Statistical display

First, we split the cup letter (A, B, C, ...) out of the size into its own column

df['cup'] = df['size'].str[-1]
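As a small illustration of that step, the sketch below applies the same last-character trick to a few made-up size labels, and also shows a regex-based variant that yields NaN instead of a wrong letter when a label does not end in a capital letter. The sample labels and the `cup_checked` column name are assumptions for this example.

```python
import pandas as pd

# Made-up size labels in the same style as the listing.
sizes = pd.DataFrame({"size": ["32/70A", "34/75B", "36/80C"]})

# The last character of each label is the cup letter.
sizes["cup"] = sizes["size"].str[-1]

# Alternative: extract a trailing capital letter; labels without one become NaN.
sizes["cup_checked"] = sizes["size"].str.extract(r"([A-Z])$", expand=False)

print(sizes)
```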

Let’s begin our simple statistical presentation

First look at the data information overview

>>> df.info()
    
<class 'pandas.core.frame.DataFrame'>
Int64Index: 64 entries, 0 to 63
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   size          64 non-null     object
 1   skuId         64 non-null     object
 2   color         64 non-null     object
 3   commentCount  64 non-null     object
 4   cup           64 non-null     object
dtypes: object(5)
memory usage: 3.0+ KB

3.1. Cup distribution

Note that the data we collected covers only three cup sizes: A, B and C.

cupNum = df.groupby('cup')['commentCount'].sum().to_frame('number')
cupNum

cup	number
A	6049
B	11618
C	4076

import matplotlib.pyplot as plt
from matplotlib import font_manager as fm

plt.rcParams['font.sans-serif'] = ['Microsoft YaHei']
plt.rcParams['axes.unicode_minus'] = False

labels = cupNum.index
sizes = cupNum['number']
explode = (0, 0.1, 0)  # pull the second slice (B) out slightly

fig1, ax1 = plt.subplots(figsize=(6, 5))
patches, texts, autotexts = ax1.pie(sizes, explode=explode, labels=labels, autopct='%1.1f%%',
                                    shadow=True, startangle=90)
ax1.axis('equal')  # keep the pie circular

# Resize the font
proptease = fm.FontProperties()
proptease.set_size('large')
plt.setp(autotexts, fontproperties=proptease)
plt.setp(texts, fontproperties=proptease)
ax1.set_title('cup distribution')
plt.show()

We can see that B cup accounts for 53.4% of the comments (our stand-in for buyers), followed by A cup at 27.8%.
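These percentages can be checked directly from the comment counts in the table above. This is only a quick sketch; the `cup_counts` and `cup_share` names are introduced here for illustration.

```python
import pandas as pd

# Comment counts per cup, copied from the table above.
cup_counts = pd.Series({"A": 6049, "B": 11618, "C": 4076}, name="number")

# Share of each cup in percent, rounded to one decimal place.
cup_share = (cup_counts / cup_counts.sum() * 100).round(1)

print(cup_share)
```

With a total of 21,743 comments, this gives 27.8% for A, 53.4% for B and 18.7% for C.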

3.2. Color distribution

colorNum = df.groupby('color')['commentCount'].sum().to_frame('number')
colorNum

color	number
Pale skin	3627
Light blue gray	3058
Pale silver grey	3837
White	1439
Lotus root pink	8286
Red wine	1429
Black	67

As we can see, lotus root pink is the dominant color by a wide margin, followed by pale silver grey, pale skin and light blue gray.

Lotus root pink ranks highest, accounting for 38.1% of the comments.

4. Wrapping up

The size we see most often is 34/75B. Here 34 is the UK (imperial) size, 75 can be understood as the underbust measurement (34 and 75 effectively express the same thing in different units), and B is the cup.
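For readers who want to work with such labels in code, here is a minimal sketch that splits a label into its three parts. The `parse_size` helper and the assumed pattern (two numbers separated by a slash, then one capital letter) are illustrative only; labels on real listings may vary.

```python
import re

def parse_size(label):
    """Split a label such as '34/75B' into (imperial band, metric band, cup letter)."""
    match = re.fullmatch(r"(\d+)/(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"unrecognized size label: {label!r}")
    imperial_band, metric_band, cup = match.groups()
    return int(imperial_band), int(metric_band), cup

print(parse_size("34/75B"))  # (34, 75, 'B')
```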

For a comparison table of cup and bust sizes, please refer to:

That's all for this time. The sample is small, so take this as entertainment only!
