I am participating in the Mid-Autumn Festival Creative Submission Contest.

The Mid-Autumn Festival, also known as the Moon Festival, the Moon Worship Festival, the Zhongqiu Festival, or the Reunion Festival, is a traditional Chinese folk festival. Its customs, handed down since ancient times, include offering sacrifices to the moon, moon gazing, eating mooncakes, playing with lanterns, admiring osmanthus blossoms, and drinking osmanthus wine.

In this issue, we analyze Mid-Autumn Festival mooncake sales on a certain e-commerce platform to see which mooncake flavors sell well and where mooncakes sell best. We hope you find it helpful.

Libraries involved:

  • Pandas – Data processing
  • Pyecharts – Data visualization
  • Jieba – Chinese word segmentation
  • Collections – Counting and statistics

Visualization section:

  • Bar – Bar chart
  • Pie – Pie chart
  • Map – Map
  • Stylecloud – Word cloud

1. Import modules

import re
import jieba
import stylecloud
import numpy as np
import pandas as pd
from collections import Counter
from pyecharts.charts import Bar
from pyecharts.charts import Map 
from pyecharts.charts import Pie
from pyecharts.charts import Grid
from pyecharts.charts import Page
from pyecharts.components import Image
from pyecharts.charts import WordCloud
from pyecharts import options as opts
from pyecharts.globals import SymbolType
from pyecharts.commons.utils import JsCode

2. Pandas data processing

2.1 Reading Data

df.head(10)

Results:

2.2 Removing duplicate values

print(df.shape)
df.drop_duplicates(inplace=True)
print(df.shape)

(4520, 5)

(1885, 5)

The raw data contains 4520 records, of which 1885 remain after deduplication. The same store can be recommended on several result pages of the platform, which produces a large number of duplicate rows.
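
As a minimal sketch of what deduplication does, the example below uses a tiny made-up table in which the same listing appears twice. The frame `df_demo` and its values are hypothetical and are not taken from the article's dataset.

```python
import pandas as pd

# Hypothetical sample: the same listing scraped from two result pages.
df_demo = pd.DataFrame(
    {
        "product name": ["Lotus paste mooncake", "Lotus paste mooncake", "Five-nut mooncake"],
        "store name": ["Store A", "Store A", "Store B"],
        "price": [39.9, 39.9, 58.0],
    }
)

print(df_demo.shape)                 # shape before deduplication
deduped = df_demo.drop_duplicates()  # keeps the first copy of each identical row
print(deduped.shape)                 # shape after deduplication
```

Rows count as duplicates only when every column matches, so two listings of the same product at different prices would both be kept.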

2.3 Null value processing

Fill in the records where the number of purchasers is empty:

df['payment status'] = df['payment status'].replace(np.nan, '0 people paid')

Payment counts above 10,000 are abbreviated with the unit 万 (ten thousand). These rows can be listed with:

df[df['payment status'].str.contains('万')]

Here we need to restore the full numbers:

# Extract the numeric part
df['num'] = [re.findall(r'(\d+\.{0,1}\d*)', i)[0] for i in df['payment status']]
df['num'] = df['num'].astype('float')

# Extract the unit (万 = ten thousand)
df['unit'] = [''.join(re.findall(r'(万)', i)) for i in df['payment status']]
df['unit'] = df['unit'].apply(lambda x: 10000 if x == '万' else 1)

# Calculate the sales volume
df['sales volume'] = df['num'] * df['unit']

# Drop rows without an address and take the province from the first address token
df = df[df['address'].notna()]
df['province'] = df['address'].str.split(' ').apply(lambda x: x[0])

# Delete the columns that are no longer needed
df.drop(['payment status', 'num', 'unit'], axis=1, inplace=True)
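
The conversion above can also be written as a small standalone function, which makes it easy to check on a few values. This is only a sketch: the function name `parse_payment_count` and the sample strings are assumptions about what the payment field looks like, not values from the dataset.

```python
import re


def parse_payment_count(text):
    """Convert a payment string into a number, expanding the unit 万 (x10,000)."""
    match = re.search(r"\d+\.?\d*", text)
    if match is None:
        return 0.0  # no digits found: treat as zero purchasers
    number = float(match.group())
    multiplier = 10000 if "万" in text else 1
    return number * multiplier


# Hypothetical sample values for the payment field.
for sample in ["1.5万+人付款", "320人付款", "0 people paid"]:
    print(sample, parse_payment_count(sample))
```

Keeping the logic in one function avoids the temporary `num` and `unit` columns, at the cost of a row-by-row `apply` when used on a DataFrame.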

3. Pyecharts data visualization

3.1 Top10 sales volume of mooncake products

Code:

# Total sales volume per product, ten largest
shop_top10 = df.groupby('product name')['sales volume'].sum().sort_values(ascending=False).head(10)
bar0 = (
    Bar()
    .add_xaxis(shop_top10.index.tolist()[::-1])
    .add_yaxis('sales_num', shop_top10.values.tolist()[::-1])
    .reversal_axis()  # horizontal bars
    .set_global_opts(
        title_opts=opts.TitleOpts(title='Mooncake product sales Top10'),
        xaxis_opts=opts.AxisOpts(axislabel_opts=opts.LabelOpts(rotate=-30)),
    )
    .set_series_opts(label_opts=opts.LabelOpts(position='right'))
)

Effect:

The product names are too long to display in full, so we adjust the chart margins:

# color_js is a JavaScript color expression defined elsewhere; its definition is not shown in this article
bar1 = (
    Bar()
    .add_xaxis(shop_top10.index.tolist()[::-1])
    .add_yaxis('sales_num', shop_top10.values.tolist()[::-1],
               itemstyle_opts=opts.ItemStyleOpts(color=JsCode(color_js)))
    .reversal_axis()
    .set_global_opts(
        title_opts=opts.TitleOpts(title='Mooncake product sales Top10'),
        xaxis_opts=opts.AxisOpts(axislabel_opts=opts.LabelOpts(rotate=-30)),
    )
    .set_series_opts(label_opts=opts.LabelOpts(position='right'))
)
# Place the bar chart in a Grid and widen the left margin so the labels fit
grid = (
    Grid()
    .add(bar1, grid_opts=opts.GridOpts(pos_left='45%', pos_right='10%'))
)

Isn’t that much better?

Other settings, such as the bar shape, can also be adjusted:

3.2 TOP10 shops in mooncake sales

Code:

# Total sales volume per store, ten largest
shop_top10 = df.groupby('store name')['sales volume'].sum().sort_values(ascending=False).head(10)
bar3 = (
    Bar(init_opts=opts.InitOpts(width='800px', height='600px'))
    .add_xaxis(shop_top10.index.tolist())
    .add_yaxis('', shop_top10.values.tolist(), category_gap='30%')
    .set_global_opts(
        xaxis_opts=opts.AxisOpts(axislabel_opts=opts.LabelOpts(rotate=-30)),
        title_opts=opts.TitleOpts(
            title='Mooncake sales TOP10 stores',
            pos_left='center',
            pos_top='4%',
            title_textstyle_opts=opts.TextStyleOpts(color='#ed1941', font_size=16),
        ),
        # Color the bars by value without showing the visual map control
        visualmap_opts=opts.VisualMapOpts(
            is_show=False,
            max_=600000,
            range_color=["#CCD3D9", "#E6B6C2", "#D4587A", "#FF69B4", "#DC364C"],
        ),
    )
)
bar3.render_notebook()

Effect:

Daoxiangcun leads the way in mooncake sales.
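
The ranking behind this chart is a group, sum, sort, and truncate chain. A minimal sketch on a made-up table shows each step's effect; the frame `sales_demo` and its numbers are hypothetical.

```python
import pandas as pd

# Hypothetical per-listing sales; one store can have several listings.
sales_demo = pd.DataFrame(
    {
        "store name": ["Store A", "Store B", "Store A", "Store C", "Store B"],
        "sales volume": [1200, 300, 800, 150, 450],
    }
)

top_stores = (
    sales_demo.groupby("store name")["sales volume"]
    .sum()                         # total per store
    .sort_values(ascending=False)  # largest first
    .head(2)                       # keep the top N (2 here, 10 in the article)
)
print(top_stores)
```

The result is a Series indexed by store name, which is why the chart code reads the labels from `.index` and the bar heights from `.values`.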

3.3 Mooncake sales by region in China

# Total sales volume per province
province_num = df.groupby('province')['sales volume'].sum().sort_values(ascending=False)
map_chart = Map(init_opts=opts.InitOpts(theme='light', width='800px', height='600px'))
map_chart.add(
    '',
    [list(z) for z in zip(province_num.index.tolist(), province_num.values.tolist())],
    maptype='china',
    is_map_symbol_show=False,
    itemstyle_opts={
        'normal': {
            'shadowColor': 'rgba(0, 0, 0, .5)',  # shadow color
            'shadowBlur': 5,                     # shadow size
            'shadowOffsetY': 0,                  # vertical shadow offset
            'shadowOffsetX': 0,                  # horizontal shadow offset
            'borderColor': '#fff'
        }
    }
)
map_chart.set_global_opts(
    visualmap_opts=opts.VisualMapOpts(
        is_show=True,
        is_piecewise=True,
        min_=0,
        max_=1,
        split_number=5,
        series_index=0,
        pos_top='70%',
        pos_left='10%',
        range_text=[' ', ''],
        # Sales volume bands and their colors
        pieces=[
            {'max': 2000000, 'min': 200000, 'label': '> 200000', 'color': '#990000'},
            {'max': 200000, 'min': 100000, 'label': '100000-200000', 'color': '#CD5C5C'},
            {'max': 100000, 'min': 50000, 'label': '50000-100000', 'color': '#F08080'},
            {'max': 50000, 'min': 10000, 'label': '10000-50000', 'color': '#FFCC99'},
            {'max': 10000, 'min': 0, 'label': '0-10000', 'color': '#FFE4E1'},
        ],
    ),
    legend_opts=opts.LegendOpts(is_show=False),
    tooltip_opts=opts.TooltipOpts(is_show=True, trigger='item', formatter='{b}, {c}'),
    title_opts=dict(
        text='Mooncake sales by region in China',
        left='center',
        top='5%',
        textStyle=dict(color='#DC143C'),
    ),
)
map_chart.render_notebook()

Results:

The map shows that sales come mainly from stores in Beijing, Shandong, Zhejiang, Guangdong, Yunnan, and other regions, mostly in the east and south of the country.

3.4 Proportion of mooncake sales in different price ranges

# The labels below are reconstructed from the thresholds; the price column is assumed to be named 'price'
def price_range(price):
    if price <= 50:
        return '<=50'
    elif price <= 100:
        return '50-100'
    elif price <= 500:
        return '100-500'
    else:
        return '500+'

df['price_range'] = df['price'].apply(lambda x: price_range(x))
price_cut_num = df.groupby('price_range')['sales volume'].sum()
data_pair = [list(z) for z in zip(price_cut_num.index.tolist(), price_cut_num.values.tolist())]
print(data_pair)

# Pie chart
pie1 = (
    Pie(init_opts=opts.InitOpts(width='750px', height='350px'))
    .add(
        series_name='sales',
        radius=['35%', '50%'],
        data_pair=data_pair,
        label_opts=opts.LabelOpts(formatter='{b}\n{d}%'),
    )
    .set_global_opts(
        title_opts=dict(
            text='Share of mooncake sales by price range',
            left='center',
            top='5%',
            textStyle=dict(color='#DC143C'),
        ),
        legend_opts=opts.LegendOpts(type_='scroll', pos_left='80%', pos_top='50%', orient='vertical'),
    )
    .set_colors(["#F08080", "#FFCC99", "#DC143C", "#990000"])
)

Mooncakes priced at 50 yuan or less account for 52% of sales, so more than half of the mooncakes sold cost no more than 50 yuan, and those priced at 100 yuan or less account for 85%. Although some mooncakes are priced above 1,000 yuan, prices overall are relatively affordable.
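
Percentages like these can be checked directly in pandas without drawing the chart. The sketch below swaps the hand-written bucketing function for `pd.cut`, which is a different technique with the same intent. The bin edges, labels, and the frame `price_demo` are illustrative assumptions, not the article's data.

```python
import pandas as pd

# Hypothetical prices (yuan) and sales volumes.
price_demo = pd.DataFrame(
    {
        "price": [29.9, 45.0, 88.0, 120.0, 680.0],
        "sales volume": [500, 300, 150, 40, 10],
    }
)

# Right-inclusive bins: (0, 50], (50, 100], (100, 500], (500, inf)
bins = [0, 50, 100, 500, float("inf")]
labels = ["<=50", "50-100", "100-500", "500+"]
price_demo["price_range"] = pd.cut(price_demo["price"], bins=bins, labels=labels)

# Sum sales per bucket, then convert to a percentage of the total.
range_sales = price_demo.groupby("price_range", observed=False)["sales volume"].sum()
share = (range_sales / range_sales.sum() * 100).round(1)
print(share)
```

Because `pd.cut` returns an ordered categorical, the buckets stay in price order rather than alphabetical order, which keeps the pie legend readable.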

3.5 Moon cake flavor distribution

# roles_num is a DataFrame of flavor counts with the columns 'taste' and 'merchants'; its construction is not shown here
roles_num = roles_num.sort_values(by='merchants', ascending=False)
roles_num = roles_num.reset_index(drop=True)
bar4 = (
    Bar()
    .add_xaxis(list(roles_num['taste']))
    .add_yaxis('frequency', list(roles_num['merchants']),
               itemstyle_opts=opts.ItemStyleOpts(color=JsCode(color_js)))
    .set_global_opts(
        title_opts=dict(
            text='Mooncake flavor distribution',
            textStyle=dict(color='#DC143C'),
        ),
        legend_opts=opts.LegendOpts(is_show=False),
        xaxis_opts=opts.AxisOpts(axislabel_opts=opts.LabelOpts(rotate=-30)),
        yaxis_opts=opts.AxisOpts(
            name='frequency',
            name_location='middle',
            name_gap=50,
            name_textstyle_opts=opts.TextStyleOpts(font_size=16),
        ),
    )
)
bar4.render_notebook()

Lava custard, five-nut, egg yolk with lotus seed paste, and red bean paste are the all-time favorites!
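
The flavor frequencies behind a chart like this come from counting how often each flavor appears in the product names. The article lists Jieba and Collections for this step; the sketch below is a simplified stand-in that uses plain keyword matching with `Counter` instead of word segmentation. The product names and the keyword list are hypothetical.

```python
from collections import Counter

# Hypothetical product names; the real data would be the scraped listing titles.
product_names = [
    "Lava custard mooncake gift box",
    "Five-nut mooncake traditional",
    "Egg yolk lotus paste mooncake",
    "Red bean paste mooncake",
    "Egg yolk lotus paste and red bean paste assorted box",
]
flavor_keywords = ["lava custard", "five-nut", "egg yolk lotus paste", "red bean paste"]

# Count the number of listings that mention each flavor keyword.
flavor_counts = Counter()
for name in product_names:
    lowered = name.lower()
    for keyword in flavor_keywords:
        if keyword in lowered:
            flavor_counts[keyword] += 1

print(flavor_counts.most_common())
```

A listing that mentions two flavors is counted once for each, so the counts measure how many listings offer a flavor rather than how many units were sold.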

END

That is everything for this issue, so go ahead and try it yourself. If you enjoyed the article, feel free to like and bookmark it, or leave a message in the comment area to discuss. Python programming tips are published here every day. I hope you enjoy them.