Preface

Following on from the previous article “Data Analysis of Python’s Magic Weapon Spectrum – Part 2: Data Processing”, today we will talk about the next part: Data Visualization.

Data collection and data processing are the part of the iceberg that lies below the waterline. However well that work is done, it still needs visualization above the waterline to present it. And whatever is shown above the surface rests on several times as much accumulated work below it.

So how do you do data visualization quickly?

Part TWO: Data visualization

Data visualization, as the name implies, is the transformation of data into charts that are easy to display and easy to understand. The most common chart types include pie charts, line charts, bar charts, radar charts, and box plots. Let's look at how Python's many drawing libraries draw these charts.

Matplotlib

Matplotlib is a well-known drawing library for Python. It was inspired by MATLAB, and both its usage and its interface closely resemble MATLAB's.

Charts drawn with Matplotlib look fairly conventional and are well suited to scientific computing. Matplotlib is generally imported as follows.

import matplotlib.pyplot as plt

There are several core concepts in Matplotlib.

  • Figure: the canvas. All plots live inside a Figure object, and each image has exactly one Figure object.
  • Subplot: one or more subplots (coordinate systems, called Axes in Matplotlib) are created under the Figure object to hold the plots.
  • Axis: a coordinate axis, that is, one of the axes (x or y) of a subplot or coordinate system.
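
As a minimal sketch of how these three concepts relate, the following creates one Figure holding two subplots and then configures an individual axis. The data, labels, and variable names are illustrative and not from the original article; the Agg backend is an assumption chosen so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

# One Figure containing a 1x2 grid of subplots (Axes objects)
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(8, 3))

# Each subplot has its own coordinate system and is drawn on independently
ax_left.plot([1, 2, 3], [1, 4, 9])        # illustrative data
ax_right.bar(["a", "b", "c"], [3, 1, 2])  # illustrative data

# Each subplot owns an x Axis and a y Axis that can be configured separately
ax_left.xaxis.set_ticks([1, 2, 3])
ax_left.yaxis.set_label_text("value")

n_subplots = len(fig.axes)      # number of subplots attached to the Figure
y_label = ax_left.get_ylabel()  # label stored on the left subplot's y Axis
```

Note the naming: an Axes object is a whole subplot, while an Axis is a single x or y axis inside it.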

Example of drawing a bar chart.

import matplotlib.pyplot as plt


labels = ['G1', 'G2', 'G3', 'G4', 'G5']
men_means = [20, 35, 30, 35, 27]
women_means = [25, 32, 34, 20, 25]
men_std = [2, 3, 4, 1, 2]
women_std = [3, 5, 2, 3, 3]
width = 0.35

fig, ax = plt.subplots()  # Returns the Figure and a single subplot (Axes) object

# Draw a bar chart
ax.bar(labels, men_means, width, yerr=men_std, label='Men')
ax.bar(labels, women_means, width, yerr=women_std, bottom=men_means, label='Women')

# Set the axis label, title, and legend
ax.set_ylabel('Scores')
ax.set_title('Scores by group and gender')
ax.legend()

# display chart
plt.show()

This displays a stacked bar chart with error bars.

Matplotlib configuration is straightforward and also supports Numpy arrays as input.

import numpy as np
import matplotlib.pyplot as plt

fig = plt.figure(2)  # Open a new figure window
ax1 = fig.add_subplot(1, 2, 1, polar=True)  # First polar subplot
theta = np.arange(0, 2 * np.pi, 0.02)  # Sequence of angle values
ax1.plot(theta, 2 * np.ones_like(theta), lw=2)  # Arguments: angle, radius, lw (line width)
ax1.plot(theta, theta / 6, linestyle='--', lw=2)  # Arguments: angle, radius, linestyle, lw (line width)

# Second polar subplot
ax2 = fig.add_subplot(1, 2, 2, polar=True)
ax2.plot(theta, np.cos(5 * theta), linestyle='--', lw=2)
ax2.plot(theta, 2 * np.cos(4 * theta), lw=2)

# Set the radial grid spacing and label angle, then the angular grid lines
ax2.set_rgrids(np.arange(0.2, 2, 0.2), angle=45)
ax2.set_thetagrids([0, 45, 90])

plt.show()

Matplotlib is a powerful Python drawing library, but it is low-level. It gives you fine-grained control over a drawing, at the cost of many configuration options.

Seaborn

Seaborn is a drawing library built on Matplotlib that offers a higher-level, more concise syntax. Its high-level API lets you draw charts quickly and comes with attractive default styles that need no configuration.

Let's start with scatter plots overlaid with linear regression lines.

import seaborn as sns
sns.set(style="ticks")

# Load the test dataset
df = sns.load_dataset("anscombe")

# Plot each dataset in its own panel and show its linear regression line
sns.lmplot(x="x", y="y", col="dataset", hue="dataset", data=df,
           col_wrap=2, ci=None, palette="muted", height=4,
           scatter_kws={"s": 50, "alpha": 1})

The drawing itself takes only one line of code, which is very convenient.

Seaborn's plotting functions share a common call pattern. The raw input data should be a Pandas DataFrame or NumPy arrays:

  • sns.<plot function>(x='x column name', y='y column name', data=df), where df is the original DataFrame
  • sns.<plot function>(x='x column name', y='y column name', hue='grouping column name', data=df)
  • sns.<plot function>(x=np.array, y=np.array[, ...])
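
The second pattern, with a hue grouping column, might look like the sketch below. The DataFrame contents and column names are made up for illustration, and the Agg backend is an assumption so that the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import pandas as pd
import seaborn as sns

# Illustrative data in "long" form: one row per observation
df = pd.DataFrame({
    "day":   ["Mon", "Mon", "Tue", "Tue"],
    "group": ["a", "b", "a", "b"],
    "value": [3, 5, 4, 6],
})

# x and y name columns of df; hue splits each x position into one bar per group
ax = sns.barplot(x="day", y="value", hue="group", data=df)

x_label = ax.get_xlabel()
legend_labels = sorted(t.get_text() for t in ax.get_legend().get_texts())
```

Seaborn takes the axis label and legend entries from the column names and values, so none of them need to be set by hand.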

Let’s draw a bar chart as well.

import seaborn as sns
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Generate data
x = np.arange(8)
y = np.array([1, 5, 3, 6, 2, 4, 5, 6])
df = pd.DataFrame({"x-axis": x, "y-axis": y})
# Draw a bar chart
sns.barplot(x="x-axis", y="y-axis", palette="RdBu_r", data=df)
# Call the underlying Matplotlib API
plt.xticks(rotation=90)
plt.show()

Seaborn provides a high-level, easy-to-use API that can be used in conjunction with Matplotlib, greatly simplifying Matplotlib configuration.

plotnine/ggplot

Both Plotnine and ggplot are inspired by ggplot2 from the R language, whose design philosophy is completely different from Matplotlib's: a chart is built by stacking layers one on top of another. Both libraries are still based on Matplotlib, however. The input data must be a Pandas DataFrame. The syntax of the two libraries is similar; ggplot has not been updated recently, while Plotnine is more actively maintained.

The following code from Plotnine draws a bar chart.

import pandas as pd
import numpy as np

from plotnine import *
from plotnine.data import *

ggplot(mpg) + geom_bar(aes(x='class'))

Layers are stacked with the + operator. A chart usually consists of a data layer, a geometry layer, and a styling (theme) layer.

Then add some color.

ggplot(mpg) + geom_bar(aes(x='class', fill='drv'))

Let’s do some more transformation.

(
    ggplot(mpg)
    + geom_bar(aes(x='class', fill='drv'))
    + coord_flip()
    + theme_classic()
)

The syntax of Plotnine/ggplot is very concise and ideal for people who like or are used to plotting in R.

Bokeh

Bokeh is a very powerful interactive chart library for Python. It renders interactive charts in the browser using JS code, which makes it a good fit for embedding in front-end applications. Bokeh's charts are attractive and highly interactive.

In the example chart, the selection boxes on the left change the data rendered in the middle in real time, and an interactive toolbar sits on the right side of the chart. The data points also respond to mouse events, for example by showing their values when the mouse moves over them.

The code for the diagram above is as follows:

import numpy as np
import pandas as pd

from bokeh.layouts import column, row
from bokeh.models import Select
from bokeh.palettes import Spectral5
from bokeh.plotting import curdoc, figure
from bokeh.sampledata.autompg import autompg_clean as df

df = df.copy()

SIZES = list(range(6, 22, 3))
COLORS = Spectral5
N_SIZES = len(SIZES)
N_COLORS = len(COLORS)

# Data cleanup
df.cyl = df.cyl.astype(str)
df.yr = df.yr.astype(str)
del df['name']

columns = sorted(df.columns)
discrete = [x for x in columns if df[x].dtype == object]
continuous = [x for x in columns if x not in discrete]

def create_figure():
    # Read the current selections from the Select widgets defined below
    xs = df[x.value].values
    ys = df[y.value].values
    x_title = x.value.title()
    y_title = y.value.title()

    kw = dict()
    if x.value in discrete:
        kw['x_range'] = sorted(set(xs))
    if y.value in discrete:
        kw['y_range'] = sorted(set(ys))
    kw['title'] = "%s vs %s" % (x_title, y_title)

    # Recent Bokeh releases use height/width (formerly plot_height/plot_width)
    p = figure(height=600, width=800, tools='pan,box_zoom,hover,reset', **kw)
    p.xaxis.axis_label = x_title
    p.yaxis.axis_label = y_title

    if x.value in discrete:
        p.xaxis.major_label_orientation = np.pi / 4

    # Marker size: fixed, or binned from the selected column
    sz = 9
    if size.value != 'None':
        if len(set(df[size.value])) > N_SIZES:
            groups = pd.qcut(df[size.value].values, N_SIZES, duplicates='drop')
        else:
            groups = pd.Categorical(df[size.value])
        sz = [SIZES[xx] for xx in groups.codes]

    # Marker color: fixed, or binned from the selected column
    c = "#31AADE"
    if color.value != 'None':
        if len(set(df[color.value])) > N_COLORS:
            groups = pd.qcut(df[color.value].values, N_COLORS, duplicates='drop')
        else:
            groups = pd.Categorical(df[color.value])
        c = [COLORS[xx] for xx in groups.codes]

    p.circle(x=xs, y=ys, color=c, size=sz, line_color="white", alpha=0.6, hover_color='white', hover_alpha=0.5)

    return p


def update(attr, old, new):
    layout.children[1] = create_figure()


x = Select(title='X-Axis', value='mpg', options=columns)
x.on_change('value', update)

y = Select(title='Y-Axis', value='hp', options=columns)
y.on_change('value', update)

size = Select(title='Size', value='None', options=['None'] + continuous)
size.on_change('value', update)

color = Select(title='Color', value='None', options=['None'] + continuous)
color.on_change('value', update)

controls = column(x, y, color, size, width=200)
layout = row(controls, create_figure())

curdoc().add_root(layout)
curdoc().title = "Crossfilter"

Because Bokeh generates HTML pages rather than image files, the chart cannot be viewed directly as an image. You can run and view it with the Bokeh server, or integrate it into your own front-end service.

bokeh serve --show crossfilter
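
For a plot that needs no Python callbacks, one possible route to front-end integration is to render it as a standalone HTML document. The sketch below uses made-up data and a hypothetical title; unlike the crossfilter example, it does not need the Bokeh server, because nothing in it calls back into Python.

```python
from bokeh.embed import file_html
from bokeh.plotting import figure
from bokeh.resources import CDN

# Illustrative data, not from the article
p = figure(title="Standalone example", tools="pan,box_zoom,hover,reset")
p.scatter([1, 2, 3, 4], [4, 7, 2, 5], size=10)

# Render the plot as a complete HTML document (a string) that loads
# the BokehJS scripts from the public CDN
html = file_html(p, resources=CDN, title="Standalone example")
```

The resulting string can be written to a file or returned from a web handler. Charts whose widgets trigger Python code, such as the Select callbacks above, still need the Bokeh server.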

Pygal

Pygal is a library for drawing SVG charts. SVG is a vector format that can also be embedded in front-end web pages for interactive display. Unlike Bokeh, however, the interaction is limited to the chart itself; there is no JS code that dynamically modifies the data or styles.

Charts drawn by Pygal respond to the mouse, and individual data series can be hidden or shown.

Draw a bar chart.

import pygal

bar_chart = pygal.Bar()
# Add data
bar_chart.add('Fibonacci', [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55])
# Generate the SVG file
bar_chart.render_to_file('bar_chart.svg')

Only a static screenshot can be shown here.

Here is a chart with multiple data series.

import pygal

# Configure the chart
line_chart = pygal.Bar()
line_chart.title = 'Browser usage evolution (in %)'
line_chart.x_labels = map(str, range(2002, 2013))
# Add data
line_chart.add('Firefox', [None, None, 0, 16.6, 25, 31, 36.4, 45.5, 46.3, 42.8, 37.1])
line_chart.add('Chrome', [None, None, None, None, None, None, 0, 3.9, 10.8, 23.8, 35.3])
line_chart.add('IE', [85.8, 84.6, 84.7, 74.5, 66, 58.6, 54.7, 44.8, 36.2, 26.6, 20.1])
line_chart.add('Others', [14.2, 15.4, 15.3, 8.9, 9, 10.4, 8.9, 5.8, 6.7, 6.8, 7.5])
line_chart.render()

Pygal has a simple syntax and can generate interactive SVG diagrams, making it suitable for scenarios with simple front-end interaction requirements.

Plotly

Plotly is a chart library for scientific computing and machine learning visualization, and its maker also provides the Dash framework for rapidly developing machine learning or data science applications. Plotly's charts are attractive and interactive in the browser. The library is a product of the company Plotly, but it is open source and free to use, with no Internet connection or account required; the company also offers enterprise versions.

Again, let me draw a bar chart.

import plotly.express as px
data_canada = px.data.gapminder().query("country == 'Canada'")
fig = px.bar(data_canada, x='year', y='pop')
fig.show()

Diagrams are also interactive and have a menu bar in the upper right corner.

A little extra code makes the chart more attractive.

import plotly.express as px
data = px.data.gapminder()

data_canada = data[data.country == 'Canada']
fig = px.bar(data_canada, x='year', y='pop',
             hover_data=['lifeExp', 'gdpPercap'], color='lifeExp',
             labels={'pop':'population of Canada'}, height=400)
fig.show()

Plotly is perfect for building interactive graphs on the front end, as well as quickly building analytics applications using the Dash framework, which is straightforward to use.

Altair

Altair is a declarative visualization library based on the Vega and Vega-Lite grammars. It lets you draw good-looking interactive charts with simple syntax.

Let’s draw the bar chart as usual.

import altair as alt
import pandas as pd

source = pd.DataFrame({
    'a': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'],
    'b': [28, 55, 43, 91, 81, 53, 19, 87, 52]
})

alt.Chart(source).mark_bar().encode(
    x='a',
    y='b'
)

Charts are also interactive.

import altair as alt
from vega_datasets import data

source = data.wheat()

bar = alt.Chart(source).mark_bar().encode(
    x='year:O',
    y='wheat:Q'
)

# Add a mean line
rule = alt.Chart(source).mark_rule(color='red').encode(
    y='mean(wheat):Q'
)

(bar + rule).properties(width=600)

Altair is also very simple to use and can draw powerful interactive charts, but getting the most out of it requires learning the Vega or Vega-Lite plotting grammar.

Afterword

Python has a huge ecosystem of drawing libraries, and many are not listed here; their omission does not mean they are hard to use. Which library to choose depends on the usage scenario: Matplotlib, Seaborn, Plotnine, and ggplot for static image files; Bokeh, Plotly, and Altair for interactive charts; and Pygal for SVG. Some libraries, such as Matplotlib, have a low-level API that takes more code but offers great flexibility. Others, such as Seaborn, have a high-level API that wraps more behavior and takes less code, but imposes more limitations.

This concludes the Python magic book. I will continue to share useful Python libraries and tools, and even do some in-depth practice in the future.

References

  • Matplotlib
  • Seaborn
  • ggplot
  • plotnine
  • Bokeh
  • Pygal
  • Plotly
  • Altair