Hello, everyone, I am the talented brother.

As we know, the 14th National Games of the People's Republic of China, the country's top multi-sport event, is being held in Shaanxi this year. More than 10,000 athletes from all over the country have come to compete, and the competition data can show us which provinces are the real sporting powerhouses!

Let's take a look.

1. Data collection

The National Games has a low-profile official website with very detailed data, and I collected all of it with Python. I won't walk through the collection process in detail; the code speaks for itself.

The 14th National Games Information Release System:

info.2021shaanxi.com/
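The collection code below sends one plain `requests.get` per URL, and the loops in sections 1.4 and 1.5 make dozens of requests. A shared session with automatic retries and a timeout can make such a crawl more robust. This is a minimal sketch, not the author's code: the retry count, back-off factor, status codes, 10-second timeout and the helper names `make_session` and `get_json` are all my own assumptions.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Browser-like User-Agent, as in the article's headers (shortened here)
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}


def make_session(total_retries: int = 3) -> requests.Session:
    """Build a Session that retries transient server errors (assumed settings)."""
    retry = Retry(
        total=total_retries,
        backoff_factor=0.5,                     # growing pause between attempts
        status_forcelist=[500, 502, 503, 504],  # retry only on these status codes
    )
    session = requests.Session()
    session.headers.update(HEADERS)
    adapter = HTTPAdapter(max_retries=retry)
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session


def get_json(session: requests.Session, url: str, timeout: float = 10.0):
    """Hypothetical helper: fetch a URL, raise on HTTP errors, decode the JSON body."""
    r = session.get(url, timeout=timeout)
    r.raise_for_status()
    return r.json()
```

Usage would look like `get_json(make_session(), 'https://info.2021shaanxi.com/Data/commonCode/disciplineList.json')`; reusing one session also keeps the TCP connection alive across the many requests.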

1.1. Discipline (major event) data

The discipline data covers the 49 disciplines (major events) listed on the official website, mainly each discipline's ID and name.

import requests
import pandas as pd

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36",
}

# Disciplines (major events)
url = 'https://info.2021shaanxi.com/Data/commonCode/disciplineList.json'

r = requests.get(url, headers=headers)

data = r.json()

disciplineList = pd.DataFrame(data)
# Replace hyphens with spaces in every string cell
disciplineList.replace(r'-', ' ', regex=True, inplace=True)
disciplineList.sample(5)

1.2. Event (minor event) data

Events are the individual medal events, such as Archery – Men – Men's Recurve Individual. There are 427 events in total.

# Events (minor events)
url = 'https://info.2021shaanxi.com/Data/commonCode/event.json'

r = requests.get(url, headers=headers)

data = r.json()

# The JSON is nested (discipline -> event code -> list of records);
# keep the first record of every event
items = []
for key in data.keys():
    for key1 in data[key].keys():
        item = data[key][key1][0]
        items.append(item)

event = pd.DataFrame(items)
event.replace(r'-', ' ', regex=True, inplace=True)
event.sample(5)
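The nested loop above implies that `event.json` is a dictionary keyed by discipline, whose values map each event code to a list of records. The same flattening can be written as one comprehension. Below is a sketch on an invented miniature payload; the codes and fields are made up to mirror that assumed shape and are not real data. Unlike the original loop, it skips empty lists instead of raising `IndexError`.

```python
def flatten_events(payload: dict) -> list:
    """Take the first record under every discipline -> event_code entry.

    Assumes payload looks like {discipline: {event_code: [record, ...]}},
    the shape the article's nested loop implies.
    """
    return [
        records[0]
        for events in payload.values()
        for records in events.values()
        if records  # skip empty lists rather than raising IndexError
    ]


# Invented miniature payload, for illustration only
sample = {
    "ARC": {
        "ARCMIND": [{"Code": "ARCMIND", "Discipline": "ARC"}],
        "ARCWIND": [{"Code": "ARCWIND", "Discipline": "ARC"}],
    },
    "SWM": {
        "SWMM100MFR": [{"Code": "SWMM100MFR", "Discipline": "SWM"}],
        "SWMEMPTY": [],
    },
}

items = flatten_events(sample)
print([item["Code"] for item in items])  # ['ARCMIND', 'ARCWIND', 'SWMM100MFR']
```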

1.3. Delegation data

A delegation is a provincial or institutional organization, or an individual entry: for example Beijing, Hubei Province, Beijing Sport University, or individuals (some swimming and mountain-bike athletes compete in their own name). There are 58 delegations in total.

# Delegations
url = 'https://info.2021shaanxi.com/Data/commonCode/delegation.json'

r = requests.get(url, headers=headers)

data = r.json()

# Each delegation code maps to a list; keep its first record
items = []
for key in data.keys():
    item = data[key][0]
    items.append(item)

delegation = pd.DataFrame(items)
delegation.replace(r'-', ' ', regex=True, inplace=True)
delegation.sample(5)

1.4. Athlete data

There are 12,434 registered athletes, of whom 12,037 actually competed.

# Athletes: one request per discipline
items = []
for Id in disciplineList['Id']:
    url = f'https://info.2021shaanxi.com/api/participant/list?discipline={Id}'
    r = requests.get(url, headers=headers)

    data = r.json()
    items.extend(data['data'])

AthleteList = pd.DataFrame(items)
AthleteList.replace(r'-', ' ', regex=True, inplace=True)
# Number of events each athlete is entered in
AthleteList['event_count'] = AthleteList.EventEntry.apply(len)
# Match discipline names. Both lookups have a CHI_Description column, so after the
# second merge pandas names them CHI_Description_x (discipline) and CHI_Description_y (delegation)
AthleteList = AthleteList.merge(disciplineList[['Id', 'CHI_Description']],
                                right_on='Id', left_on='DisciplineCode', how='left')
# Match delegation names
AthleteList = AthleteList.merge(delegation[['Id', 'CHI_Description']],
                                left_on='OrganisationCode', right_on='Id', how='left')
AthleteList.sample(5)
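Because both lookup tables carry a `CHI_Description` column, the second merge above makes pandas add its default `_x`/`_y` suffixes, which is why later code has to remember that `CHI_Description_x` means discipline and `CHI_Description_y` means delegation. Renaming the lookup columns before merging gives self-describing names instead. A sketch on invented toy frames; the new names `discipline_name` and `delegation_name` are my own choice.

```python
import pandas as pd

# Toy stand-ins for the article's tables (invented rows)
athletes = pd.DataFrame({
    "AthleteCode": ["A1", "A2", "A3"],
    "DisciplineCode": ["SWM", "ATH", "SWM"],
    "OrganisationCode": ["SD", "GD", "GD"],
})
discipline_list = pd.DataFrame({"Id": ["SWM", "ATH"], "CHI_Description": ["Swimming", "Athletics"]})
delegation = pd.DataFrame({"Id": ["SD", "GD"], "CHI_Description": ["Shandong", "Guangdong"]})

# Rename lookup columns up front so the merge keys match and no suffixes are needed
disc = discipline_list.rename(columns={"Id": "DisciplineCode", "CHI_Description": "discipline_name"})
dele = delegation.rename(columns={"Id": "OrganisationCode", "CHI_Description": "delegation_name"})

merged = (
    athletes
    .merge(disc, on="DisciplineCode", how="left")
    .merge(dele, on="OrganisationCode", how="left")
)
print(merged)
```

Merging `on=` a shared key name also avoids the duplicate `Id` columns that `left_on`/`right_on` leave behind.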

1.5. Final ranking data

The final-ranking data is the result data of each event, including the gold, silver and bronze medalists. This is the data we will use to find out which provinces are strongest!

# Final (medal) results
dataList = []
# Medal dates: the data was collected on September 25, so the range ends on September 24
dateList = pd.date_range('2021-07-12', '2021-09-24').strftime('%Y%m%d')


def get_member_field(members, field):
    """Return `field` of a single-member (individual) entry; None for team entries."""
    if isinstance(members, list) and len(members) == 1:
        return members[0][field]
    return None


for date in dateList:
    url = f'https://info.2021shaanxi.com/Data/Medal/Daily/{date}/data.json'

    r = requests.get(url, headers=headers)
    try:
        data = pd.DataFrame(r.json())
        data.replace(r'-', ' ', regex=True, inplace=True)

        data['BirthDate'] = data['Members'].apply(get_member_field, field='BirthDate')
        data['AthleteCode'] = data['Members'].apply(get_member_field, field='AthleteCode')
        dataList.append(data)
    except (ValueError, KeyError):
        # Days without finals return no usable medal JSON, so skip them
        continue
data = pd.concat(dataList, ignore_index=True)

df = data[['Date', 'EventCode', 'MedalRank', 'OrganisationCode', 'Gender', 'Name', 'BirthDate', 'AthleteCode']]
# Match delegation names (-> CHI_Description)
df = df.merge(delegation[['Id', 'CHI_Description']],
              left_on='OrganisationCode', right_on='Id', how='left')
# Match event names; the name clash gives CHI_Description_x (delegation) and CHI_Description_y (event)
df = df.merge(event[['CHI_Description', 'Code', 'Discipline', 'Team_Event']],
              right_on='Code', left_on='EventCode', how='left')
# Match discipline names (-> CHI_Description, no clash left)
df = df.merge(disciplineList[['Id', 'CHI_Description']],
              right_on='Id', left_on='Discipline', how='left')
# Keep: date, discipline, event, gender, medal, name, delegation, birth date, athlete code, team flag
df = df[['Date', 'CHI_Description', 'CHI_Description_y', 'Gender', 'MedalRank', 'Name',
         'CHI_Description_x', 'BirthDate', 'AthleteCode', 'Team_Event']]
df.sample(5)
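`pd.date_range(...).strftime('%Y%m%d')` simply yields one `YYYYMMDD` string per calendar day (both ends included), and each string is dropped into the daily medal URL; days with no finals fail to parse and are skipped. A standard-library sketch of the same URL list, using the date range from the code above, shows exactly what gets requested:

```python
from datetime import date, timedelta

BASE = "https://info.2021shaanxi.com/Data/Medal/Daily/{}/data.json"


def daily_urls(start: date, end: date) -> list:
    """One medal-data URL per calendar day from start to end, both inclusive."""
    n_days = (end - start).days + 1
    return [BASE.format((start + timedelta(days=i)).strftime("%Y%m%d")) for i in range(n_days)]


urls = daily_urls(date(2021, 7, 12), date(2021, 9, 24))
print(len(urls))  # 75
print(urls[0])    # https://info.2021shaanxi.com/Data/Medal/Daily/20210712/data.json
```

That is 20 days in July, 31 in August and 24 in September, so 75 requests in total.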

2. Statistics and charts

In this part, we compute statistics on events, delegations (especially the provincial ones) and athletes.

2.1. Events

There are 49 disciplines and 427 events in total, distributed as follows:

Events per discipline

Athletics and swimming have the most events, with 52 and 37 respectively; diving has 17 and table tennis 7.

# Keep only real events (non-empty Event and Gender fields)
event = event[(event['Event'].str.len() > 0) & (event['Gender'].str.len() > 0)]
# Match the discipline name (the discipline's CHI_Description becomes CHI_Description_y)
Discipline_event_Num = (event.merge(disciplineList, left_on='Discipline', right_on='Id')
                        .groupby('CHI_Description_y')['Code'].nunique().to_frame('events'))
Discipline_event_Num.sort_values(by='events', ascending=False).reset_index()

Number of participants per discipline

Athletics and football are the largest disciplines, each with more than 1,000 participants, and team sports in general unsurprisingly draw many athletes.

# Number of athletes per discipline (CHI_Description_x is the discipline name)
Discipline_Num = AthleteList.groupby('CHI_Description_x')['AthleteCode'].nunique().to_frame('number')
Discipline_Num.sort_values(by='number', ascending=False).reset_index()

import matplotlib.pyplot as plt
import numpy as np

# Use a font with Chinese glyphs, since the discipline names are in Chinese
plt.rcParams['font.sans-serif'] = ['Microsoft YaHei']

# Bar chart colors
colors = ['turquoise', 'coral']

# Sort ascending so the largest discipline appears at the top of the horizontal chart
Discipline_Num = Discipline_Num.sort_values(by='number')
labels = Discipline_Num.index
y = Discipline_Num['number']

# One bar per discipline, centered on its tick label
x = np.arange(len(labels))

fig, ax = plt.subplots(figsize=(8, 16))
rects1 = ax.barh(x, y, color=colors[1], edgecolor='grey')

ax.set_title('Number of participants per discipline', fontsize=16)
ax.set_yticks(x)
ax.set_yticklabels(labels, fontsize=12)

# Show value labels (Axes.bar_label needs Matplotlib 3.4 or later)
ax.bar_label(rects1, padding=3)

# Hide the top, right and left borders
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['left'].set_visible(False)
fig.tight_layout()
# Save before plt.show(), which may close the figure in non-interactive runs
fig.savefig('Number of participants per discipline.png')
plt.show()

Individual vs. team events

Of the 427 events, 139 (32.6%) are team events (such as football or table tennis mixed doubles) and the other 288 (67.4%) are individual events.

import matplotlib.font_manager as fm

# Individual vs. team events (Team_Event flag)
Team_Event_Num = event.groupby('Team_Event')['Code'].nunique().to_frame('events')
labels = Team_Event_Num.index
sizes = Team_Event_Num['events']
explode = (0, 0.1)  # pull the second wedge out slightly

fig1, ax1 = plt.subplots(figsize=(6, 5))
patches, texts, autotexts = ax1.pie(sizes, explode=explode, labels=labels, autopct='%1.1f%%',
                                    shadow=True, startangle=90)
ax1.axis('equal')  # keep the pie circular
ax1.set_title('Distribution of Individual and Team Events', fontsize=16)
# Enlarge the label and percentage fonts
proptease = fm.FontProperties()
proptease.set_size('large')
plt.setp(autotexts, fontproperties=proptease)
plt.setp(texts, fontproperties=proptease)

plt.show()
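The shares quoted for individual and team events follow directly from the counts: 139 + 288 = 427 events, so the team share is 139/427 and the individual share 288/427. A quick arithmetic check:

```python
counts = {"team": 139, "individual": 288}
total = sum(counts.values())  # 427 events in all

# Percentage of events in each category, rounded to one decimal place
shares = {kind: round(100 * n / total, 1) for kind, n in counts.items()}
print(total, shares)  # 427 {'team': 32.6, 'individual': 67.4}
```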

Gender distribution of events

There are 207 men's events, 193 women's events, 21 mixed events and 6 open events in which men and women compete together (equestrian).

import matplotlib.font_manager as fm

# Gender distribution of events
Gender_Event_Num = event.groupby('Gender')['Code'].nunique().to_frame('events')
labels = Gender_Event_Num.index
sizes = Gender_Event_Num['events']
explode = (0,) * len(sizes)  # no wedge pulled out

fig1, ax1 = plt.subplots(figsize=(8, 7))
patches, texts, autotexts = ax1.pie(sizes, explode=explode, labels=labels, autopct='%1.1f%%',
                                    shadow=True, startangle=90)
ax1.axis('equal')  # keep the pie circular
ax1.set_title('Gender Distribution of Events', fontsize=16)
# Enlarge the label and percentage fonts
proptease = fm.FontProperties()
proptease.set_size('large')
plt.setp(autotexts, fontproperties=proptease)
plt.setp(texts, fontproperties=proptease)

plt.show()
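The pie-chart percentages come from `autopct`, a printf-style format string that Matplotlib fills with each wedge's percentage: `'%1.1f%%'` means one decimal place followed by `%%`, which printf-style formatting turns into a literal percent sign. Applying the same format by hand to the gender counts above shows the labels the chart should display; the dictionary keys are my own labels, not the dataset's gender codes.

```python
counts = {"men": 207, "women": 193, "mixed": 21, "open": 6}
total = sum(counts.values())  # 427 events

# Same format string that is passed to ax.pie(autopct=...)
wedge_labels = {label: "%1.1f%%" % (100 * n / total) for label, n in counts.items()}
print(wedge_labels)  # {'men': '48.5%', 'women': '45.2%', 'mixed': '4.9%', 'open': '1.4%'}
```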

2.2. Delegations

There are 58 delegations in total, including 34 provincial-level regions plus 4 sports associations (together the 38 of type D), 13 clubs and 2 schools, among others. OO is the joint team (the combined team of Olympic athletes).

Delegations by type

Type    Number
D           38
CLUB        13
RO           3
SCHOOL       2
GR           1
OO           1

# Number of delegations of each type
Type_delegation_Num = delegation.groupby('Type')['Id'].nunique().to_frame('number')
Type_delegation_Num.sort_values(by='number', ascending=False).reset_index()

Comparison of delegation size

Although there are 58 delegations, only 53 actually competed; the Taiwan delegation, for example, did not take part, and we hope it can join as soon as possible. Below we compare the 53 participating delegations in two groups: provincial-level regions and others.

Provincial-level regions

Shandong, Guangdong, Jiangsu, Shanghai and Shaanxi sent the most athletes.

Other delegations

Of the two university delegations, Tianjin Sport Institute sent 24 athletes, while Beijing Sport University sent 2.

Disciplines entered by each delegation

Host Shaanxi competed in 47 of the 49 disciplines, while Jiangsu, Guangdong and Shanghai, which also fielded large teams, each competed in 44.
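The article does not show the code behind this size comparison. A minimal sketch, assuming the athlete table carries `AthleteCode` and `OrganisationCode`, that the delegation table has `Id`, `CHI_Description` and `Type`, and that type `D` marks the provincial-level group; all rows below are invented toy data. An athlete entered in two disciplines appears twice in the per-discipline download, hence `nunique`.

```python
import pandas as pd

# Invented toy rows; athlete a2 appears twice, as if entered in two disciplines
athletes = pd.DataFrame({
    "AthleteCode": ["a1", "a2", "a2", "a3", "a4", "a5", "a6"],
    "OrganisationCode": ["SD", "SD", "SD", "GD", "TJSU", "TJSU", "BSU"],
})
delegation = pd.DataFrame({
    "Id": ["SD", "GD", "TJSU", "BSU"],
    "CHI_Description": ["Shandong", "Guangdong", "Tianjin Sport Institute", "Beijing Sport University"],
    "Type": ["D", "D", "SCHOOL", "SCHOOL"],
})

size = (
    athletes
    .merge(delegation, left_on="OrganisationCode", right_on="Id", how="left")
    .groupby(["Type", "CHI_Description"])["AthleteCode"]
    .nunique()                      # count each athlete once per delegation
    .rename("athletes")
    .reset_index()
    .sort_values(["Type", "athletes"], ascending=[True, False])
)
provincial = size[size["Type"] == "D"]
others = size[size["Type"] != "D"]
print(provincial)
print(others)
```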

# Number of distinct disciplines entered by each delegation (CHI_Description_y = delegation)
Organisation_Discipline_Num = AthleteList.groupby('CHI_Description_y')['CHI_Description_x'].nunique().to_frame('disciplines')
Organisation_Discipline_Num.sort_values(by='disciplines', ascending=False).reset_index()

Medals won by each delegation

Note: the data in the following sections runs up to 24:00 on September 24.

Shandong, Guangdong, Jiangsu, Zhejiang and Shanghai lead the table in total medals.

# Medal count per delegation; margins=True adds an 'All' row and column with the totals
Organisation_Medal_Num = df.pivot_table(values='Name', margins=True, index='CHI_Description_x',
                                        columns='MedalRank', aggfunc='count').fillna(0)
Organisation_Medal_Num = Organisation_Medal_Num.sort_values(by='All', ascending=False).reset_index()
Organisation_Medal_Num.columns = ['Delegation', 'gold', 'silver', 'bronze', 'total']
Organisation_Medal_Num.convert_dtypes()
    Delegation                        Gold  Silver  Bronze   Total
0   All                                331     331     391    1053
1   Shandong                            35      38      35     108
2   Guangdong                           30      26      48     104
3   Jiangsu                             27      27      31      85
4   Zhejiang                            25      25      28      78
5   Shanghai                            16      19      23      58
6   Sichuan                             14      17      20      51
7   Shaanxi                             16      13      20      49
8   Fujian                              18      15      16      49
9   Hubei                               12      16      15      43
10  Joint team                          35       4       1      40
11  Liaoning                             7      13      17      37
12  Beijing                             10      12      12      34
13  Tianjin                              8       8      16      32
14  Hebei                                8      10      13      31
15  Henan                               10       8      13      31
16  Hunan                               13      11       7      31
17  Anhui                                6       9       9      24
18  Shanxi                               4       7      10      21
19  Inner Mongolia                       7       4       7      18
20  Yunnan                               5       7       5      17
21  Jiangxi                              2       6       5      13
22  Guangxi                              2       6       4      12
23  Jilin                                2       3       7      12
24  Xinjiang                             2       5       3      10
25  Gansu                                2       4       3       9
26  Guizhou                              2       2       5       9
27  Chongqing                            1       6       2       9
28  Heilongjiang                         1       4       4       9
29  Hong Kong                            1       0       5       6
30  Qianwei Sports Association           2       2       1       5
31  Hainan                               4       1       0       5
32  Tibet                                3       0       2       5
33  Qinghai                              0       3       1       4
34  Guizhou Lunji Cycling Club           1       0       1       2
35  Locomotive Sports Association        0       0       1       1
36  Macau                                0       0       1       1

Gold-medal table by delegation

Shandong tops the gold-medal table, followed by the joint team, then Guangdong, Jiangsu, Zhejiang and Fujian.

Almost all of the joint team's medals are gold (35 of 40), which is why netizens have mocked this year's joint-team arrangement for Olympic athletes!

Rank  Delegation                        Gold  Silver  Bronze   Total
1     Shandong                            35      38      35     108
2     Joint team                          35       4       1      40
3     Guangdong                           30      26      48     104
4     Jiangsu                             27      27      31      85
5     Zhejiang                            25      25      28      78
6     Fujian                              18      15      16      49
7     Shanghai                            16      19      23      58
8     Shaanxi                             16      13      20      49
9     Sichuan                             14      17      20      51
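The gold-medal table is the same medal data sorted by gold first, with silver and then bronze breaking ties; that is why Shandong ranks ahead of the joint team although both have 35 golds. A sketch on a toy frame holding a few rows copied from the medal table above (the lowercase column names are my own), dropping the `All` margin row before ranking:

```python
import pandas as pd

# A few rows copied from the medal table above, plus the pivot_table margin row
medals = pd.DataFrame({
    "Delegation": ["All", "Shandong", "Guangdong", "Joint team", "Shanghai", "Shaanxi"],
    "gold":   [331, 35, 30, 35, 16, 16],
    "silver": [331, 38, 26, 4, 19, 13],
    "bronze": [391, 35, 48, 1, 23, 20],
})

gold_table = (
    medals[medals["Delegation"] != "All"]            # drop the grand-total row
    .sort_values(["gold", "silver", "bronze"], ascending=False)
    .reset_index(drop=True)
)
gold_table.index = gold_table.index + 1              # 1-based ranking
print(gold_table)
```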

2.3. Athletes

In this section, we mainly look at the age distribution of the athletes, to get a sense of the strength of the new generation.

A total of 668 athletes won medals: 208 won gold, 215 silver and 270 bronze. These add up to more than 668 because some athletes won medals of more than one color.

(Note: team events are not considered in this section.)

Gender distribution of competitors

Of all competitors, 52.2% are men and 47.8% are women, a well-balanced split overall.
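The per-color counts add up to 208 + 215 + 270 = 693, more than the 668 medalists, because an athlete who won medals of different colors is counted once per color but only once overall. A toy illustration with sets; the athlete IDs are placeholders:

```python
# Placeholder athlete IDs; each set holds the individual medalists of one color
gold = {"A", "B", "C"}
silver = {"B", "D"}
bronze = {"C", "D", "E"}

per_color_total = len(gold) + len(silver) + len(bronze)  # B, C and D are counted twice
unique_medalists = len(gold | silver | bronze)           # each athlete counted once
print(per_color_total, unique_medalists)  # 8 5
```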

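# Gender distribution of competitors (each athlete counted once)
Gender_Num = AthleteList.groupby('GenderCode')['AthleteCode'].nunique().to_frame('number')
# Share of each gender
Gender_Num['share'] = (Gender_Num['number'] / Gender_Num['number'].sum()).round(3)
Gender_Num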

Athletes with the most medals

In individual events, five athletes share the highest medal count, with 3 medals each.

    Athlete           Gender      Gold  Silver  Bronze   Total
1   Wang Shun         M              3       0       0       3
2   Tzu-yang Chang    M              2       1       0       3
3   Chase Wang        W              2       1       0       3
4   Ji Xinjie         M              1       2       0       3
5   Qin Haiyang       M              1       1       1       3

# Individual events only: medals per athlete; margins=True adds an 'All' total row and column
Athlete_Medal_Num = df[df['Team_Event'] == 'N'].pivot_table(values='CHI_Description', margins=True,
                                                              index=['Name', 'Gender'], columns='MedalRank',
                                                              aggfunc='count').fillna(0)
# Drop the grand-total row, then sort by total medals, then gold, silver and bronze
Athlete_Medal_Num = Athlete_Medal_Num.drop('All', level=0)
Athlete_Medal_Num = Athlete_Medal_Num.sort_values(by=['All', 1, 2, 3], ascending=False).reset_index()
Athlete_Medal_Num.columns = ['sportsman', 'gender', 'gold', 'silver', 'bronze', 'Overall MEDALS']
Athlete_Medal_Num.convert_dtypes().head()

Which events did they win them in?

Athletics and swimming.

# Medal rows whose Name contains any athlete with 3 or more individual medals
nameList = Athlete_Medal_Num[Athlete_Medal_Num['Overall MEDALS'] >= 3]['sportsman']
df[df['Name'].str.contains('|'.join(nameList))].sort_values(by='Name')

In fact, counting team events as well, Wang Shun of Zhejiang, born in 1994, has already won 6 gold medals!

Athletes with the most gold medals

The leader in gold medals is the same Wang Shun.

Among women, the most gold medals went to Wang Chunyu, a runner from Anhui born in 1995.
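Finding the top gold medalist for each gender is a filter on gold medals followed by a count per athlete. A sketch, assuming the medal frame `df` from section 1.5 with `MedalRank` 1 meaning gold and `Team_Event` 'N' meaning an individual event; the rows below are invented, with placeholder names. Whether to include team events (as in Wang Shun's 6 golds) is a choice: drop the `Team_Event` condition to count them.

```python
import pandas as pd

# Invented rows shaped like the article's medal frame
df = pd.DataFrame({
    "Name":       ["P1", "P1", "P2", "P3", "P3", "P4"],
    "Gender":     ["M",  "M",  "M",  "W",  "W",  "W"],
    "MedalRank":  [1,    1,    1,    1,    2,    1],
    "Team_Event": ["N",  "N",  "N",  "N",  "N",  "N"],
})

golds = df[(df["MedalRank"] == 1) & (df["Team_Event"] == "N")]
gold_counts = golds.groupby(["Gender", "Name"]).size().rename("golds").reset_index()

# For each gender, keep the athlete(s) with the highest gold count (ties are kept)
best = gold_counts[gold_counts["golds"] == gold_counts.groupby("Gender")["golds"].transform("max")]
print(best)
```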

Birth-year distribution of medalists

Most medalists at these National Games were born between 1994 and 2000.

The oldest medalist was born in 1971 and is about 50 years old, while the youngest were born in 2008 and are just 13: the oldest is nearly four times the age of the youngest!

After a 12-year gap, Liu Yingzi, the veteran born in 1971, won the trap gold medal at the National Games.

There are three young medalists born in 2008.
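Ages quoted from birth years alone are approximate; the exact age on a given day depends on whether the birthday has passed. A standard-library sketch, using September 24, 2021 (the data cutoff) as the reference day and placeholder birth dates; the dataset's `BirthDate` strings would need parsing first, and their format is not shown in the article.

```python
from datetime import date


def age_on(birth: date, day: date) -> int:
    """Completed years of age on `day`: subtract 1 if the birthday has not come yet."""
    before_birthday = (day.month, day.day) < (birth.month, birth.day)
    return day.year - birth.year - int(before_birthday)


cutoff = date(2021, 9, 24)
print(age_on(date(1971, 3, 1), cutoff))   # 50
print(age_on(date(1971, 12, 1), cutoff))  # 49
print(age_on(date(2008, 1, 15), cutoff))  # 13
```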

Birth-year distribution of gold medalists

Most gold medalists were born in 1996 or 2000, four years apart.
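A birth-year distribution like this is a count of the year part of each birth date. A sketch with `collections.Counter`, assuming the dates are strings that start with a four-digit year (the real `BirthDate` format is not shown in the article) and using invented values; `None` stands for team entries, which have no single birth date and are skipped.

```python
from collections import Counter

# Invented birth dates, assumed to start with YYYY; None marks a team entry
birth_dates = ["1996-05-01", "2000-02-11", None, "1996-10-30", "2000-07-07", "1994-01-20", "1996-03-03"]

years = Counter(d[:4] for d in birth_dates if d)
print(years.most_common(2))  # [('1996', 3), ('2000', 2)]
```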

That's all for this time. If you're interested, reply "955" to the official account to get the case data and presentation files from the National Games folder.

There are many more dimensions and chart types that could give a fuller picture of these National Games and of China's strongest sporting provinces. I look forward to seeing your work!