Pandas in practice: cleaning, analyzing, and visualizing second-hand housing data

Data cleaning

Open the CSV file using pandas

import pandas as pd
data=pd.read_csv('data.csv')
print(data)

Step 1: Set the index column

import pandas as pd
data=pd.read_csv('data.csv',index_col=0)
print(data)

index_col=0 tells read_csv to use the first column of the file as the DataFrame index.
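
To see what index_col=0 changes, the sketch below reads the same small CSV twice from an in-memory string. The CSV text, its column names, and its values are made up for illustration and are not taken from data.csv.

```python
import io
import pandas as pd

# Hypothetical CSV text. The first column has no header and holds row numbers,
# which is what you get when a DataFrame is saved together with its index.
csv_text = ",regional,price\n0,A,100\n1,B,200\n"

# Without index_col, pandas keeps the first column as ordinary data
# and names it 'Unnamed: 0'.
plain = pd.read_csv(io.StringIO(csv_text))
print(plain.columns.tolist())

# With index_col=0, the first column becomes the row index instead.
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(indexed.columns.tolist())
```

If the file was not saved with an index column, index_col=0 would turn a real data column into the index, so it is worth checking the raw file first.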

Step 2: Strip the units (total price, floor area, and unit price fields)

Use map or apply together with a lambda expression to clean the data.

map (or apply) calls the function on every element of the column.

The lambda uses the string method replace to swap the unit text for '' (an empty string).

astype(float) then converts the cleaned strings to a numeric type.

import pandas as pd
data=pd.read_csv('data.csv',index_col=0)
# Remove the unit text from each field, then convert the column to float
data['total']=data['total'].map(lambda x:str(x).replace('万',''))
data['total']=data['total'].astype(float)
data['Floor area']=data['Floor area'].apply(lambda x:str(x).replace('square',''))
data['Floor area']=data['Floor area'].astype(float)
data['price']=data['price'].apply(lambda x:str(x).replace('Yuan/square meter',''))
data['price']=data['price'].astype(float)
print(data)
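
The same cleaning can also be written with pandas' vectorized string methods instead of a lambda. This is only a sketch of an alternative, not the approach used in this post: the column name total and the unit '万' follow the code above, but the sample rows are invented.

```python
import pandas as pd

# Invented sample rows that mimic the raw 'total' column.
data = pd.DataFrame({'total': ['85万', '120.5万', '66万']})

# Series.str.replace works on the whole column at once;
# regex=False treats '万' as a literal string rather than a pattern.
cleaned = data['total'].str.replace('万', '', regex=False)

# pd.to_numeric converts the strings to numbers. errors='coerce' turns
# anything that still is not a number into NaN instead of raising.
data['total'] = pd.to_numeric(cleaned, errors='coerce')
print(data['total'].tolist())
```

Compared with astype(float), errors='coerce' lets the script continue when a row contains unexpected text, at the cost of silently producing NaN values that should be checked afterwards.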

Data analysis

Average price by district

data.groupby('regional') groups the data by district.

You can use get_group with a group key, for example get_group('2'), to retrieve the rows of a single group:

df=data.groupby('regional')
print(df.get_group('2'))
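
As a minimal illustration of groupby and get_group, the sketch below uses a tiny invented DataFrame. Only the column names regional and price come from the code above; the district keys 'A' and 'B' and all values are placeholders.

```python
import pandas as pd

# Invented rows; 'A' and 'B' stand in for real district names.
data = pd.DataFrame({
    'regional': ['A', 'B', 'A', 'B', 'A'],
    'price': [100.0, 200.0, 110.0, 220.0, 120.0],
})

grouped = data.groupby('regional')

# groups maps each key to the row labels that belong to it.
print(list(grouped.groups))

# get_group returns the rows of one group as an ordinary DataFrame,
# keeping their original index labels.
group_a = grouped.get_group('A')
print(group_a)
```

get_group raises a KeyError when the key does not exist, so the key must match the values in the column exactly, including their type (the string '2' is not the integer 2).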

df['price'].mean().round(2)

This takes the mean unit price of each group and keeps 2 decimal places.

# Analysis of average price of each district
df=data.groupby('regional')
ave=df['price'].mean().round(2)
print(ave)
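
The result of mean() is a Series indexed by district, so it can be processed further like any other Series. The sketch below, on invented numbers, adds a sort so the most expensive district comes first; the sorting step is an addition for illustration and is not part of the original analysis.

```python
import pandas as pd

# Invented rows; only the column names follow the code above.
data = pd.DataFrame({
    'regional': ['A', 'B', 'A', 'B'],
    'price': [100.0, 200.0, 101.0, 205.5],
})

# mean() returns one value per group, indexed by the group key.
ave = data.groupby('regional')['price'].mean().round(2)

# sort_values orders the districts from most to least expensive.
ranked = ave.sort_values(ascending=False)
print(ranked)
```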

Share of listings by district

Use apply to iterate over the group sizes.

The lambda divides each count by the total and multiplies by 100 to get a percentage.

df=data.groupby('regional').size()
# Each district's count as a percentage of all listings
home=df.apply(lambda x:x/df.values.sum()*100)
print(home)
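
The percentage calculation does not strictly need apply. The sketch below compares plain Series arithmetic on the group sizes with value_counts(normalize=True), which returns the fractions directly. The data is invented; only the column name regional follows the code above.

```python
import pandas as pd

# Invented rows: three listings in district 'A', one in 'B'.
data = pd.DataFrame({'regional': ['A', 'B', 'A', 'A']})

# Group sizes divided by the total; arithmetic on a Series is element-wise.
sizes = data.groupby('regional').size()
home = sizes / sizes.sum() * 100

# Alternative: value_counts(normalize=True) returns each value's fraction.
share = data['regional'].value_counts(normalize=True) * 100

print(home.round(2))
print(share.round(2))
```

Both give the same percentages. value_counts orders the result by frequency, while groupby orders it by key.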

Degree of decoration

df=data.groupby('decoration').size()
print(df)

Data visualization

Average price by district

# Analysis of average price of each district
import matplotlib.pyplot as plt
df=data.groupby('regional')
ave=df['price'].mean().round(2)
# FangSong is a font that can render Chinese labels
plt.rcParams['font.sans-serif'] = ['FangSong']
plt.bar(ave.index,ave.values)
plt.title('District Average price')
plt.xlabel('regional')
plt.ylabel('average')
plt.savefig('District Average price')
plt.show()
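
The bar chart can also be produced without opening a window, for example when the script runs on a server. The following is a sketch under assumptions: the averages are invented stand-ins for the ave Series, and the output file name district_average_price.png and the use of the system temp directory are choices made for this example.

```python
import os
import tempfile

import matplotlib
matplotlib.use('Agg')  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

# Invented averages standing in for the 'ave' Series above.
ave = pd.Series([9500.5, 12000.0, 8700.25], index=['A', 'B', 'C'])

fig, ax = plt.subplots()
bars = ax.bar(ave.index, ave.values)
ax.set_title('District Average price')
ax.set_xlabel('regional')
ax.set_ylabel('average')

# An explicit extension makes the output format unambiguous.
out_path = os.path.join(tempfile.gettempdir(), 'district_average_price.png')
fig.savefig(out_path, dpi=150)
plt.close(fig)
```

When both are used, savefig should be called before plt.show(), as in the code above, because the figure may be empty once the window has been closed.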

Share of listings by district

# House ratio by district
df=data.groupby('regional').size()
home=df.apply(lambda x:x/df.values.sum()*100)
plt.rcParams['font.sans-serif'] = ['FangSong']
plt.title('House ratio by district')
plt.pie(home.values,labels=home.index)
plt.show()
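
A pie chart is easier to read when each wedge shows its percentage. The sketch below adds the autopct parameter of pie for that purpose. This is an optional addition, and the shares are invented stand-ins for the home Series.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

# Invented shares standing in for the 'home' Series above.
home = pd.Series([50.0, 30.0, 20.0], index=['A', 'B', 'C'])

fig, ax = plt.subplots()
# autopct formats each wedge's share of the total;
# '%.1f%%' prints one decimal place followed by a percent sign.
ax.pie(home.values, labels=list(home.index), autopct='%.1f%%')
ax.set_title('House ratio by district')

# Collect every text drawn on the axes (district labels and percentages).
shown = [t.get_text() for t in ax.texts]
print(shown)
plt.close(fig)
```

pie normalizes its input, so passing the raw group sizes instead of precomputed percentages would draw the same chart.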

Degree of decoration

df=data.groupby('decoration').size()
plt.rcParams['font.sans-serif'] = ['FangSong']
plt.title('Degree of decoration')
plt.bar(df.index,df.values)
plt.show()

Complete code

import pandas as pd
import matplotlib.pyplot as plt
data=pd.read_csv('data.csv',index_col=0)
# Data cleaning
data['total']=data['total'].map(lambda x:str(x).replace('万',''))
data['total']=data['total'].astype(float)
data['Floor area']=data['Floor area'].apply(lambda x:str(x).replace('square',''))
data['Floor area']=data['Floor area'].astype(float)
data['price']=data['price'].apply(lambda x:str(x).replace('Yuan/square meter',''))
data['price']=data['price'].astype(float)
# Data analysis
# Analysis of average price of each district
# df=data.groupby('regional')
# ave=df['price'].mean().round(2)
# plt.rcParams['font.sans-serif'] = ['FangSong']
# plt.bar(ave.index,ave.values)
# plt.title('District Average price')
# plt.xlabel('regional')
# plt.ylabel('average')
# plt.savefig('District Average price')
# plt.show()
# House ratio by district
# df=data.groupby('regional').size()
# home=df.apply(lambda x:x/df.values.sum()*100)
# plt.rcParams['font.sans-serif'] = ['FangSong']
# plt.title('House ratio by district')
# plt.pie(home.values,labels=home.index)
# plt.show()
# Degree of decoration
df=data.groupby('decoration').size()
plt.rcParams['font.sans-serif'] = ['FangSong']
plt.title('Degree of decoration')
plt.bar(df.index,df.values)
plt.show()

Resource file

CSV download link