Seaborn Visualization Notes: Video Game Sales Data in Python

This is day 22 of my participation in the August Writing Challenge.

This note is inspired by the R notebook at www.kaggle.com/umeshnaraya… . The goal is to reproduce the visualizations from that R notebook in Python as simply as possible, add a few extra plots, and include comments and explanations to help Seaborn/Python beginners with data visualization and customization. To keep things interesting, we also play with different color palettes.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

We use pandas to read in the dataset. Each row corresponds to a specific game and contains the game's name, the year it was released, and some categorical characteristics such as platform, genre, and publisher. Finally, each row also includes the cumulative sales that game achieved in each region and worldwide, in millions of units.

df = pd.read_csv("/home/kesci/input/Datasets6073/vgsales.csv")
df.head()

	Rank	Name	Platform	Year	Genre	Publisher	NA_Sales	EU_Sales	JP_Sales	Other_Sales	Global_Sales
0	1	Wii Sports	Wii	2006.0	Sports	Nintendo	41.49	29.02	3.77	8.46	82.74
1	2	Super Mario Bros.	NES	1985.0	Platform	Nintendo	29.08	3.58	6.81	0.77	40.24
2	3	Mario Kart Wii	Wii	2008.0	Racing	Nintendo	15.85	12.88	3.79	3.31	35.82
3	4	Wii Sports Resort	Wii	2009.0	Sports	Nintendo	15.75	11.01	3.28	2.96	33.00
4	5	Pokemon Red/Pokemon Blue	GB	1996.0	Role-Playing	Nintendo	11.27	8.89	10.22	1.00	31.37

Checking the maximum value in the Year column, we find 2020, which cannot be a real release date for this data.

year_data = df['Year']
print("Max Year Value: ", year_data.max())
Max Year Value:  2020.0

By looking up the name of the game with the wrong year, we can search the web for its actual release date and replace the incorrect value with it.

max_entry = year_data.idxmax()
print(max_entry)
max_entry = df.iloc[max_entry]
pd.DataFrame(max_entry).T
5957

	Rank	Name	Platform	Year	Genre	Publisher	NA_Sales	EU_Sales	JP_Sales	Other_Sales	Global_Sales
5957	5959	Imagine: Makeup Artist	DS	2020	Simulation	Ubisoft	0.27	0	0	0.02	0.29

# Replace the impossible year with the game's actual release year
df['Year'] = df['Year'].replace(2020.0, 2009.0)
print("Max Year Value: ", df['Year'].max())
Max Year Value:  2017.0
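Note that replace() changes every 2020.0 in the column, which works here because only one entry has that value. A more targeted alternative is to overwrite just the offending row by its index label with .loc. The sketch below uses a small made-up DataFrame (the rows are hypothetical, not taken from vgsales.csv) with a second 2020 entry, to show that only the row returned by idxmax() is changed.

```python
import pandas as pd

# Hypothetical miniature version of the sales data (not real rows from vgsales.csv)
df_demo = pd.DataFrame(
    {
        "Name": ["Game A", "Imagine: Makeup Artist", "Game B"],
        "Year": [2006.0, 2020.0, 2020.0],
    },
    index=[10, 5957, 20],
)

# Label of the row holding the maximum year (idxmax returns the first one on ties)
bad_label = df_demo["Year"].idxmax()

# Overwrite only that single cell; other rows with the same value are left alone
df_demo.loc[bad_label, "Year"] = 2009.0

print(df_demo)
```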

Next we look at the number of games (rows) and the number of unique publishers, platforms, and genres to get a sense of how the games in our dataset are distributed.

print("Number of games: ", len(df))
publishers = df['Publisher'].unique()
print("Number of publishers: ", len(publishers))
platforms = df['Platform'].unique()
print("Number of platforms: ", len(platforms))
genres = df['Genre'].unique()
print("Number of genres: ", len(genres))
Number of games:  16598
Number of publishers:  579
Number of platforms:  31
Number of genres:  12
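One subtlety: Series.unique() keeps missing values, so len(df['Publisher'].unique()) may count NaN as an extra "publisher" when the column has gaps (the null check below shows that Publisher does). Series.nunique() drops NaN by default. A small sketch on a made-up column:

```python
import numpy as np
import pandas as pd

# Made-up publisher column with one missing value
publisher = pd.Series(["Nintendo", "Ubisoft", np.nan, "Nintendo"])

with_nan = len(publisher.unique())  # unique() keeps NaN as a distinct value
without_nan = publisher.nunique()   # nunique() excludes NaN (dropna=True by default)

print(with_nan, without_nan)  # 3 2
```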

Next, a simple null check. We could search the web for all of the missing years and publishers, but for now we simply drop every game that is missing any data.

print(df.isnull().sum())
df = df.dropna()
Rank              0
Name              0
Platform          0
Year            271
Genre             0
Publisher        58
NA_Sales          0
EU_Sales          0
JP_Sales          0
Other_Sales       0
Global_Sales      0
dtype: int64
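Before dropping rows it can help to see which ones will go. A minimal sketch on a made-up frame: the mask isnull().any(axis=1) selects every row with at least one missing value, which are exactly the rows dropna() removes with its default how='any'.

```python
import numpy as np
import pandas as pd

# Made-up rows: one missing a year, one missing a publisher
demo = pd.DataFrame(
    {
        "Name": ["Game A", "Game B", "Game C"],
        "Year": [2006.0, np.nan, 2010.0],
        "Publisher": ["Nintendo", "Ubisoft", np.nan],
    }
)

# Rows with at least one missing value; dropna() (how='any') removes exactly these
incomplete = demo[demo.isnull().any(axis=1)]
print(incomplete)

cleaned = demo.dropna()
print(f"Dropped {len(demo) - len(cleaned)} of {len(demo)} rows")
```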

Let's create a simple bar chart of total worldwide video game sales per year. We take all of our sales data, group it by Year, and call .sum() on the Global_Sales column to get the total for each year. The result has the year as its index (row labels) and the total sales for that year as its values.

In the dataset, the year is stored as a floating-point number, such as 2006.0 rather than 2006, so we get our x values by converting the index to integers. Once the data is ready, we simply pass x and y to Seaborn's barplot function. We also set the axis labels and the title, and we rotate the x tick labels and change their font size.
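To make the grouping step concrete, here is a sketch on a few made-up rows (not from the real dataset). Selecting Global_Sales before calling .sum() keeps pandas from trying to sum the text columns, and astype(int) turns float year labels like 2006.0 into 2006.

```python
import pandas as pd

# Made-up rows; Year is a float, as in the real data
demo = pd.DataFrame(
    {
        "Name": ["A", "B", "C", "D"],
        "Year": [2006.0, 2006.0, 2007.0, 2008.0],
        "Global_Sales": [1.5, 2.0, 0.5, 3.25],
    }
)

# Total sales per year; the result is a Series indexed by the float year
yearly = demo.groupby("Year")["Global_Sales"].sum()

# Integer year labels for the x-axis
x = yearly.index.astype(int)

print(yearly)
print(list(x))  # [2006, 2007, 2008]
```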

# Total global sales per year (select the column first so text columns are not summed)
y = df.groupby(['Year'])['Global_Sales'].sum()
x = y.index.astype(int)
plt.figure(figsize=(12, 8))
ax = sns.barplot(y=y, x=x)
# Years are on the x-axis and sales on the y-axis
ax.set_xlabel(xlabel='Year', fontsize=16)
ax.set_xticklabels(labels=x, fontsize=12, rotation=50)
ax.set_ylabel(ylabel='Global Sales (Millions of Units)', fontsize=16)
ax.set_title(label='Global Game Sales Per Year (Millions of Units)', fontsize=20)
plt.show()

Next we create a bar chart of the total number of games released each year, with a slight twist: it is horizontal. The years, normally on the x-axis, are now on the y-axis, while the count of Global_Sales entries (that is, the number of releases), normally on the y-axis, is now on the x-axis.

# Number of releases (non-null Global_Sales entries) per year
x = df.groupby(['Year'])['Global_Sales'].count()
y = x.index.astype(int)
plt.figure(figsize=(12, 8))
colors = sns.color_palette("muted")
ax = sns.barplot(y=y, x=x, orient='h', palette=colors)
ax.set_xlabel(xlabel='Number of Releases', fontsize=16)
ax.set_ylabel(ylabel='Year', fontsize=16)
ax.set_title(label='Game Releases Per Year', fontsize=20)
plt.show()

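groupby(...).count() counts non-null values, so taking the Global_Sales column gives the number of releases per year once missing values have been dropped. An equivalent, arguably more direct, route is value_counts() on the Year column, sorted by year. A sketch on made-up data:

```python
import pandas as pd

# Made-up rows with no missing values (as after dropna())
demo = pd.DataFrame(
    {
        "Year": [2006.0, 2006.0, 2007.0, 2008.0, 2008.0, 2008.0],
        "Global_Sales": [1.0, 2.0, 0.5, 0.1, 0.2, 0.3],
    }
)

# Approach used in the note: count non-null Global_Sales entries per year
via_groupby = demo.groupby("Year")["Global_Sales"].count()

# Alternative: count how often each year occurs, then order by year
via_value_counts = demo["Year"].value_counts().sort_index()

print(via_groupby.tolist())       # [2, 1, 3]
print(via_value_counts.tolist())  # [2, 1, 3]
```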

Next we create a point plot showing, for each year, the publisher with the highest sales. Global sales are on the y-axis and years are on the x-axis, and we use the point plot's hue parameter to color each point by that year's top publisher.

We use a pivot table, which makes it easy to compute publishers, the name of the top-selling publisher for each year, and sales, the worldwide sales that publisher generated in that year.

Note that pivot_table accepts an aggfunc argument that sets the function used to combine values; besides 'sum', options include 'mean' (the default) and 'median'. The point plot takes a DataFrame, and you simply pass column names to x, y, and hue. We also customize the x tick labels by rotating and resizing them.
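Here is the pivot-table step on a handful of made-up rows (publisher names P1 to P3 are placeholders). idxmax() and max() on the pivoted frame work column by column, so each year yields the label of its top publisher and that publisher's total; NaN cells (no releases that year) are skipped.

```python
import pandas as pd

# Made-up sales rows; publisher names are placeholders
demo = pd.DataFrame(
    {
        "Publisher": ["P1", "P1", "P2", "P2", "P3"],
        "Year": [2006.0, 2006.0, 2006.0, 2007.0, 2007.0],
        "Global_Sales": [1.0, 2.0, 2.5, 4.0, 1.0],
    }
)

# Rows: publishers, columns: years, cells: total sales (NaN if no release that year)
table = demo.pivot_table("Global_Sales", index="Publisher", columns="Year", aggfunc="sum")

top_publisher = table.idxmax()  # per year: row label with the largest total
top_sales = table.max()         # per year: that largest total

data = pd.concat([top_publisher, top_sales], axis=1)
data.columns = ["Publisher", "Global Sales"]
print(data)
```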

# Total sales per publisher (rows) and year (columns)
table = df.pivot_table('Global_Sales', index='Publisher', columns='Year', aggfunc='sum')
# For each year, the top-selling publisher and its sales
publishers = table.idxmax()
sales = table.max()
years = table.columns.astype(int)
data = pd.concat([publishers, sales], axis=1)
data.columns = ['Publisher', 'Global Sales']
plt.figure(figsize=(12, 8))
ax = sns.pointplot(y='Global Sales', x=years, hue='Publisher', data=data)
ax.set_xlabel(xlabel='Year', fontsize=16)
ax.set_ylabel(ylabel='Global Sales Per Year', fontsize=16)
ax.set_title(label='Top Publisher Sales Per Year (Millions of Units)', fontsize=20)
ax.set_xticklabels(labels=years, fontsize=12, rotation=50)
plt.show()

Next we plot, for each year, the single game with the highest global sales, and we also print the underlying data for reference. You could map a different color to each game, but a legend with so many entries would make the plot cluttered.

Preparing the data for this chart is similar to the previous one, except that we do not use hue to represent categories. Instead, we create a palette and pass it the number of colors we want from it.

# Sales per game (rows) by year (columns); no aggfunc is given, so pandas uses the default mean
table = df.pivot_table('Global_Sales', index='Name', columns='Year')
table.columns = table.columns.astype(int)
games = table.idxmax()
sales = table.max()
years = table.columns
data = pd.concat([games, sales], axis=1)
data.columns = ['Game', 'Global Sales']
# One color per bar
colors = sns.color_palette("GnBu_d", len(data))
plt.figure(figsize=(12, 8))
ax = sns.barplot(y=years, x='Global Sales', data=data, orient='h', palette=colors)
ax.set_xlabel(xlabel='Global Sales Per Year', fontsize=16)
ax.set_ylabel(ylabel='Year', fontsize=16)
ax.set_title(label='Top-Selling Game Per Year (Millions of Units)', fontsize=20)
plt.show()
data

	Game	Global Sales
Year
1980	Asteroids	4.310
1981	Pitfall!	4.500
1982	Pac-Man	7.810
1983	Baseball	3.200
1984	Duck Hunt	28.310
1985	Super Mario Bros.	40.240
1986	The Legend of Zelda	6.510
1987	Zelda II: The Adventure of Link	4.380
1988	Super Mario Bros. 3	17.280
1989	Tetris	30.260
1990	Super Mario World	20.610
1991	The Legend of Zelda: A Link to the Past	4.610
1992	Super Mario Land 2: 6 Golden Coins	11.180
1993	Super Mario All-Stars	10.550
1994	Donkey Kong Country	9.300
1995	Donkey Kong Country 2: Diddy’s Kong Quest	5.150
1996	Pokemon Red/Pokemon Blue	31.370
1997	Gran Turismo	10.950
1998	Pokémon Yellow: Special Pikachu Edition	14.640
1999	Pokemon Gold/Pokemon Silver	23.100
2000	Pokémon Crystal Version	6.390
2001	Gran Turismo 3: A-Spec	14.980
2002	Grand Theft Auto: Vice City	16.150
2003	Mario Kart: Double Dash!!	6.950
2004	Grand Theft Auto: San Andreas	20.810
2005	Nintendogs	24.760
2006	Wii Sports	82.740
2007	Wii Fit	22.720
2008	Mario Kart Wii	35.820
2009	Wii Sports Resort	33.000
2010	Kinect Adventures!	21.820
2011	Mario Kart 7	12.210
2012	New Super Mario Bros. 2	9.820
2013	Grand Theft Auto V	18.890
2014	Pokemon Omega Ruby/Pokemon Alpha Sapphire	11.330
2015	Call of Duty: Black Ops 3	5.064
2016	Uncharted 4: A Thief’s End	4.200
2017	Phantasy Star Online 2 Episode 4: Deluxe Package	0.020
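Because the pivot_table call for this chart passes no aggfunc, pandas uses its default, the mean. For a game released on several platforms in the same year, the value above is therefore its average per-platform sales, not its combined total. If you want combined sales per title, pass aggfunc='sum'. A sketch on made-up rows showing that the choice can change which game comes out on top:

```python
import pandas as pd

# One made-up title released on two platforms in the same year, plus another title
demo = pd.DataFrame(
    {
        "Name": ["Title X", "Title X", "Title Y"],
        "Platform": ["PS3", "X360", "Wii"],
        "Year": [2013.0, 2013.0, 2013.0],
        "Global_Sales": [20.0, 16.0, 19.0],
    }
)

# Default aggfunc is the mean: Title X -> (20 + 16) / 2 = 18
by_mean = demo.pivot_table("Global_Sales", index="Name", columns="Year")

# Summing instead: Title X -> 20 + 16 = 36
by_sum = demo.pivot_table("Global_Sales", index="Name", columns="Year", aggfunc="sum")

print(by_mean[2013.0].idxmax(), by_mean[2013.0].max())  # Title Y 19.0
print(by_sum[2013.0].idxmax(), by_sum[2013.0].max())    # Title X 36.0
```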

# Number of releases per publisher, top 10
data = df.groupby(['Publisher']).count().iloc[:, 0]
data = pd.DataFrame(data.sort_values(ascending=False))[0:10]
publishers = data.index
data.columns = ['Releases']
colors = sns.color_palette("spring", len(data))
plt.figure(figsize=(12, 8))
ax = sns.barplot(y=publishers, x='Releases', data=data, orient='h', palette=colors)
ax.set_xlabel(xlabel='Number of Releases', fontsize=16)
ax.set_ylabel(ylabel='Publisher', fontsize=16)
ax.set_title(label='Top 10 Publishers by Games Released', fontsize=20)
ax.set_yticklabels(labels=publishers, fontsize=14)
plt.show()
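The count, sort, and slice chain used for the top-10 publishers chart can also be written with value_counts(), which counts and sorts in descending order in one step. A sketch on made-up publisher rows (the counts are not real):

```python
import pandas as pd

# Made-up rows: one entry per game
demo = pd.DataFrame({"Publisher": ["EA", "Nintendo", "EA", "Ubisoft", "EA", "Nintendo"]})

n = 2  # top-n cut-off (the note uses 10)

# Count rows per publisher, sort descending, keep the first n
via_groupby = demo.groupby("Publisher").size().sort_values(ascending=False).iloc[:n]

# Same result in one call: value_counts() already sorts descending
via_value_counts = demo["Publisher"].value_counts().head(n)

print(via_value_counts)
```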

# Total global sales per publisher, top 10
data = df.groupby(['Publisher'])['Global_Sales'].sum()
data = pd.DataFrame(data.sort_values(ascending=False))[0:10]
publishers = data.index
data.columns = ['Global Sales']
colors = sns.color_palette("cool", len(data))
plt.figure(figsize=(12, 8))
ax = sns.barplot(y=publishers, x='Global Sales', data=data, orient='h', palette=colors)
ax.set_xlabel(xlabel='Global Sales (Millions of Units)', fontsize=16)
ax.set_ylabel(ylabel='Publisher', fontsize=16)
ax.set_title(label='Top 10 Publishers by Total Global Sales', fontsize=20)
ax.set_yticklabels(labels=publishers, fontsize=14)
plt.show()
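For the top-10 by total sales, Series.nlargest(n) is a shorthand for sorting in descending order and taking the first n values. A sketch on made-up numbers:

```python
import pandas as pd

# Made-up per-game sales
demo = pd.DataFrame(
    {
        "Publisher": ["EA", "Nintendo", "EA", "Ubisoft", "Nintendo"],
        "Global_Sales": [1.0, 5.0, 2.0, 0.5, 4.0],
    }
)

# Total sales per publisher: EA 3.0, Nintendo 9.0, Ubisoft 0.5
totals = demo.groupby("Publisher")["Global_Sales"].sum()

top = totals.nlargest(2)                                   # shorthand
via_sort = totals.sort_values(ascending=False).head(2)     # explicit sort, then slice

print(top)
```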

# Number of releases per genre
rel = df.groupby(['Genre']).count().iloc[:, 0]
rel = pd.DataFrame(rel.sort_values(ascending=False))
genres = rel.index
rel.columns = ['Releases']
colors = sns.color_palette("summer", len(rel))
plt.figure(figsize=(12, 8))
ax = sns.barplot(y=genres, x='Releases', data=rel, orient='h', palette=colors)
ax.set_xlabel(xlabel='Number of Releases', fontsize=16)
ax.set_ylabel(ylabel='Genre', fontsize=16)
ax.set_title(label='Genres by Total Number of Games Released', fontsize=20)
ax.set_yticklabels(labels=genres, fontsize=14)
plt.show()

# Total global sales per genre
rev = df.groupby(['Genre'])['Global_Sales'].sum()
rev = pd.DataFrame(rev.sort_values(ascending=False))
genres = rev.index
rev.columns = ['Global Sales']
colors = sns.color_palette('Set3', len(rev))
plt.figure(figsize=(12, 8))
ax = sns.barplot(y=genres, x='Global Sales', data=rev, orient='h', palette=colors)
ax.set_xlabel(xlabel='Global Sales (Millions of Units)', fontsize=16)
ax.set_ylabel(ylabel='Genre', fontsize=16)
ax.set_title(label='Genres by Total Global Sales (Millions of Units)', fontsize=20)
ax.set_yticklabels(labels=genres, fontsize=14)
plt.show()
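The two genre charts rank genres by number of releases and by total sales separately. A possible follow-up, not part of the original note, is to compute both in one groupby with named aggregation and divide them to get average sales per release. A sketch on made-up rows:

```python
import pandas as pd

# Made-up per-game rows
demo = pd.DataFrame(
    {
        "Genre": ["Action", "Action", "Sports", "Puzzle", "Sports", "Action"],
        "Global_Sales": [1.0, 2.0, 6.0, 0.5, 2.0, 0.0],
    }
)

# Releases (non-null count) and total sales per genre in a single pass
summary = demo.groupby("Genre").agg(
    Releases=("Global_Sales", "count"),
    Sales=("Global_Sales", "sum"),
)

# Average sales per release, highest first
summary["Sales per Release"] = summary["Sales"] / summary["Releases"]
summary = summary.sort_values("Sales per Release", ascending=False)

print(summary)
```

Ranking by sales per release can differ from ranking by total sales: a genre with few but strong titles rises, while a genre with many modest releases falls.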

I am "White and White I", a programmer who likes to share knowledge ❤️

If you are new to programming or want to learn, feel free to leave me a message. Thank you very much for your likes, favorites, comments, and support.
