This article is participating in Python Theme Month.

Introduction

Matplotlib is an important and convenient tool for visualizing data. This article explains in detail how to plot data in Python with matplotlib through the plotting API of pandas.

Basic plotting

To use matplotlib, import its pyplot module. The examples below also use pandas and NumPy:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

Generate random data for 365 days starting from January 1, 2020, and plot it:

ts = pd.Series(np.random.randn(365), index=pd.date_range("1/1/2020", periods=365))

ts.plot()

With a DataFrame, every column is drawn as its own line on the same axes:

df3 = pd.DataFrame(np.random.randn(365, 4), index=ts.index, columns=list("ABCD"))

df3 = df3.cumsum()

df3.plot()
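A minimal self-contained sketch of the same idea, assuming a headless environment (hence the Agg backend) and an arbitrary random seed: DataFrame.plot() returns the Matplotlib Axes, draws one line per column, and builds the legend from the column names.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; drop this line in an interactive session
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
index = pd.date_range("1/1/2020", periods=365)
frame = pd.DataFrame(rng.standard_normal((365, 4)), index=index, columns=list("ABCD")).cumsum()

ax = frame.plot()  # returns the Matplotlib Axes the lines were drawn on

line_count = len(ax.get_lines())
legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
print(line_count, legend_labels)  # 4 ['A', 'B', 'C', 'D']
```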

Use the x and y arguments to choose which columns are plotted against each other:

df3 = pd.DataFrame(np.random.randn(365, 2), columns=["B", "C"]).cumsum()

df3["A"] = pd.Series(list(range(len(df3))))

df3.plot(x="A", y="B");
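A sketch of what x and y do, assuming the Agg backend and a seeded generator: the x column supplies the horizontal positions (and its name becomes the x-axis label), and only the y column is drawn as a line.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for running the sketch without a display
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)  # arbitrary seed
frame = pd.DataFrame(rng.standard_normal((365, 2)), columns=["B", "C"]).cumsum()
frame["A"] = np.arange(len(frame))  # running day number used as the x column

ax = frame.plot(x="A", y="B")

# The single line's x positions come straight from column "A".
x_values = np.asarray(ax.get_lines()[0].get_xdata())
print(ax.get_xlabel(), len(ax.get_lines()))
```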

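Other plot types

Besides line plots, plot() supports many other kinds, including bar, barh, hist, box, kde (density), area, scatter, hexbin and pie. Choose one with the kind argument, as in plot(kind="bar"), or call the matching method such as plot.bar(). Examples of each follow.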

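Bar

A single row, here the sixth one, can be drawn as a bar chart:

df = pd.DataFrame(np.random.randn(365, 4), index=ts.index, columns=list("ABCD")).cumsum()

df.iloc[5].plot(kind="bar");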

For a DataFrame, each row becomes a group with one bar per column:

df2 = pd.DataFrame(np.random.rand(10, 4), columns=["a", "b", "c", "d"])

df2.plot.bar();

Stacked bar

Pass stacked=True to stack the columns on top of each other:

df2.plot.bar(stacked=True);
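To see what stacking means numerically, the sketch below (assuming non-negative random data like df2, a fixed seed and the Agg backend) reads the bar rectangles back from the Axes: each segment starts where the previous columns end.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for running the sketch without a display
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)  # arbitrary seed; non-negative data as in df2
df2 = pd.DataFrame(rng.random((10, 4)), columns=["a", "b", "c", "d"])

ax = df2.plot.bar(stacked=True)

# ax.containers holds one BarContainer per column, in column order.
bottoms_b = np.array([patch.get_y() for patch in ax.containers[1]])
tops_d = np.array([patch.get_y() + patch.get_height() for patch in ax.containers[3]])

# Column "b" sits on top of column "a", and the top of the last segment
# equals the row total.
print(np.allclose(bottoms_b, df2["a"]), np.allclose(tops_d, df2.sum(axis=1)))
```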

Barh

barh draws horizontal bars:

df2.plot.barh(stacked=True);

Histograms

Setting alpha=0.5 makes the overlapping histograms of the four columns semi-transparent:

df2.plot.hist(alpha=0.5);
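A sketch of the bins argument, assuming data of the same shape as df2, a fixed seed and the Agg backend: with bins=5, each of the four columns is drawn as five bars over one shared set of bin edges.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for running the sketch without a display
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)  # arbitrary seed
df2 = pd.DataFrame(rng.random((10, 4)), columns=["a", "b", "c", "d"])

# bins sets the number of bins; pandas computes the edges once from all columns,
# so the bars of different columns line up.
ax = df2.plot.hist(alpha=0.5, bins=5)

print(len(ax.patches))  # one bar per bin and column
```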

Box

A box plot summarizes the distribution of each column:

df.plot.box();

The colors of the boxes, whiskers, medians and caps can be customized with a dict; sym="r+" draws outliers as red plus signs:

color = {
    "boxes": "DarkGreen",
    "whiskers": "DarkOrange",
    "medians": "DarkBlue",
    "caps": "Gray",
}

df.plot.box(color=color, sym="r+");

Pass vert=False to draw the boxes horizontally:

df.plot.box(vert=False);

Besides plot.box(), box plots can also be drawn with DataFrame.boxplot():

df = pd.DataFrame(np.random.rand(10, 5))

bp = df.boxplot()
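A box marks the first and third quartiles of a column and the line inside it the median. The sketch below, assuming a fixed seed and the Agg backend, asks DataFrame.boxplot() for Matplotlib's artists via return_type="dict" and checks that the drawn medians match DataFrame.median():

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for running the sketch without a display
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)  # arbitrary seed
df = pd.DataFrame(rng.random((10, 5)))

# return_type="dict" returns Matplotlib's boxplot artists instead of the Axes.
artists = df.boxplot(return_type="dict")

# Each median line is horizontal, so its y-data repeats the median of its column.
plotted_medians = np.array([line.get_ydata()[0] for line in artists["medians"]])
print(np.allclose(plotted_medians, df.median()))
```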

boxplot() can also group the data with the by argument. Start with a two-column DataFrame:

df = pd.DataFrame(np.random.rand(10, 2), columns=["Col1", "Col2"])

df
Out[90]: 
       Col1      Col2
0  0.047633  0.150047
1  0.296385  0.212826
2  0.562141  0.136243
3  0.997786  0.224560
4  0.585457  0.178914
5  0.551201  0.867102
6  0.740142  0.003872
7  0.959130  0.581506
8  0.114489  0.534242
9  0.042882  0.314845

df.boxplot()

Now add a column X that assigns each row to group A or B:

df["X"] = pd.Series(["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"])

df
Out[92]: 
       Col1      Col2  X
0  0.047633  0.150047  A
1  0.296385  0.212826  A
2  0.562141  0.136243  A
3  0.997786  0.224560  A
4  0.585457  0.178914  A
5  0.551201  0.867102  B
6  0.740142  0.003872  B
7  0.959130  0.581506  B
8  0.114489  0.534242  B
9  0.042882  0.314845  B

bp = df.boxplot(by="X")
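Each box in the grouped plot summarizes one group of one column, which is the same split that groupby() makes. A sketch with made-up data of the same shape, assuming a fixed seed and the Agg backend:

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for running the sketch without a display
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)  # arbitrary seed
df = pd.DataFrame(rng.random((10, 2)), columns=["Col1", "Col2"])
df["X"] = ["A"] * 5 + ["B"] * 5

df.boxplot(by="X")  # one subplot per value column, one box per group

# The boxes summarize the same groups that groupby() forms, e.g. their medians:
group_medians = df.groupby("X")[["Col1", "Col2"]].median()
print(group_medians)
```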

Area

Area plots can be drawn with Series.plot.area() or DataFrame.plot.area(). The areas are stacked by default, so each column must be either all positive or all negative:

df = pd.DataFrame(np.random.rand(10, 4), columns=["a", "b", "c", "d"])

df.plot.area();

To draw the areas unstacked, pass stacked=False. The areas are then made semi-transparent (alpha=0.5) unless another alpha is given:

df.plot.area(stacked=False);
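The sign rule for stacked areas can be seen directly. In this sketch (hypothetical data, Agg backend), the default stacked plot rejects a column that mixes signs with a ValueError, while stacked=False accepts it:

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for running the sketch without a display
import pandas as pd

# Hypothetical data: column "a" mixes positive and negative values.
mixed = pd.DataFrame({"a": [1.0, -1.0, 2.0], "b": [1.0, 2.0, 3.0]})

try:
    mixed.plot.area()  # stacked=True is the default
    stacked_error = None
except ValueError as err:
    stacked_error = str(err)

print(stacked_error)

ax = mixed.plot.area(stacked=False)  # unstacked areas accept mixed signs
```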

Scatter

DataFrame.plot.scatter() creates a scatter plot of two columns:

df = pd.DataFrame(np.random.rand(50, 4), columns=["a", "b", "c", "d"])

df.plot.scatter(x="a", y="b");

A third column can be mapped to the point color with c (a colorbar is added), while s sets the marker size:

df.plot.scatter(x="a", y="b", c="c", s=50);

The third column can also control the marker size instead:

df.plot.scatter(x="a", y="b", s=df["c"] * 200);
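A sketch showing that s sets the marker area of each point, assuming a fixed seed and the Agg backend: the sizes stored on the scatter collection are exactly df["c"] * 200.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for running the sketch without a display
import numpy as np
import pandas as pd

rng = np.random.default_rng(6)  # arbitrary seed
df = pd.DataFrame(rng.random((50, 4)), columns=["a", "b", "c", "d"])

ax = df.plot.scatter(x="a", y="b", s=df["c"] * 200)

# All points live in one PathCollection; its sizes are the s values (area in points^2).
sizes = ax.collections[0].get_sizes()
print(np.allclose(sizes, df["c"] * 200))
```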

Hexagonal bins

DataFrame.plot.hexbin() creates a hexagonal binning plot, a useful alternative to a scatter plot when there are too many points to draw individually:

df = pd.DataFrame(np.random.randn(1000, 2), columns=["a", "b"])

df["b"] = df["b"] + np.arange(1000)

df.plot.hexbin(x="a", y="b", gridsize=25);

By default, the color of each hexagon shows how many points fall inside it. To color by something else, pass a column as C and a one-argument function as reduce_C_function; the function reduces all C values in a hexagon to a single number (for example np.mean, np.max, np.sum or np.std):

df = pd.DataFrame(np.random.randn(1000, 2), columns=["a", "b"])

df["b"] = df["b"] + np.arange(1000)

df["z"] = np.random.uniform(0, 3, 1000)

df.plot.hexbin(x="a", y="b", C="z", reduce_C_function=np.max, gridsize=25);
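When C is not given, the hexagon colors are point counts. A sketch using the same data recipe as above (seeded, Agg backend) reads the values back from the hexbin collection and sums them, which should give the number of rows:

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for running the sketch without a display
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)  # arbitrary seed
df = pd.DataFrame(rng.standard_normal((1000, 2)), columns=["a", "b"])
df["b"] = df["b"] + np.arange(1000)

ax = df.plot.hexbin(x="a", y="b", gridsize=25)

# Without C, each hexagon's value is the number of points inside it.
counts = ax.collections[0].get_array()
print(np.nansum(counts))
```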

Pie

Use DataFrame.plot.pie() or Series.plot.pie() to build the pie chart:

series = pd.Series(3 * np.random.rand(4), index=["a", "b", "c", "d"], name="series")

series.plot.pie(figsize=(6, 6));

For a DataFrame, pass subplots=True to draw one pie per column:

df = pd.DataFrame(
    3 * np.random.rand(4, 2), index=["a", "b", "c", "d"], columns=["x", "y"]
)

df.plot.pie(subplots=True, figsize=(8, 4));

Labels, colors, the percentage format and the font size can all be customized:

series.plot.pie(
    labels=["AA", "BB", "CC", "DD"],
    colors=["r", "g", "b", "c"],
    autopct="%.2f",
    fontsize=20,
    figsize=(6, 6),
);

What happens when the values sum to less than 1 depends on the Matplotlib version: older versions draw a partial pie with a gap, while recent versions rescale the values to a full pie unless normalize=False is passed to Matplotlib's pie():

series = pd.Series([0.1] * 4, index=["a", "b", "c", "d"], name="series2")

series.plot.pie(figsize=(6, 6));

Handling NaN values when plotting

By default, each plot type handles missing (NaN) values as follows:

Plot type         NaN handling
Line              Leave gaps at NaNs
Line (stacked)    Fill with 0
Bar               Fill with 0
Scatter           Drop NaNs
Histogram         Drop NaNs (column-wise)
Box               Drop NaNs (column-wise)
Area              Fill with 0
KDE               Drop NaNs (column-wise)
Hexbin            Drop NaNs
Pie               Fill with 0
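If these defaults are not what you want, one option is to handle missing values explicitly before plotting. A small sketch with a made-up Series, assuming the Agg backend:

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for running the sketch without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0])  # made-up data with gaps

filled = s.fillna(0)  # explicit version of "Fill with 0"
dropped = s.dropna()  # explicit version of "Drop NaNs"

fig, axes = plt.subplots(1, 2, figsize=(8, 3))
filled.plot.bar(ax=axes[0], title="fillna(0)")
dropped.plot(ax=axes[1], style="o-", title="dropna()")

print(filled.tolist())  # [1.0, 0.0, 3.0, 0.0, 5.0]
print(dropped.tolist())  # [1.0, 3.0, 5.0]
```

Note that the dropna() line connects the remaining points instead of leaving gaps.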

Other plotting tools

The pandas.plotting module provides several more specialized plots.

Scatter matrix

A scatter matrix of every pair of columns can be drawn with pandas.plotting.scatter_matrix. With diagonal="kde", the diagonal shows a density estimate of each column, which requires SciPy:

from pandas.plotting import scatter_matrix

df = pd.DataFrame(np.random.randn(1000, 4), columns=["a", "b", "c", "d"])

scatter_matrix(df, alpha=0.2, figsize=(6, 6), diagonal="kde");
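A variant that runs without SciPy, assuming the Agg backend and a fixed seed: diagonal="hist" draws histograms on the diagonal, and scatter_matrix returns the grid of Axes as a NumPy array.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for running the sketch without a display
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

rng = np.random.default_rng(8)  # arbitrary seed
df = pd.DataFrame(rng.standard_normal((1000, 4)), columns=["a", "b", "c", "d"])

# diagonal="hist" avoids the SciPy dependency that diagonal="kde" needs.
axes = scatter_matrix(df, alpha=0.2, figsize=(6, 6), diagonal="hist")

print(axes.shape)  # (4, 4): one panel per pair of columns
```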

Density plot

Series.plot.kde() and DataFrame.plot.kde() (also available as plot.density()) draw kernel density estimate plots. They require SciPy:

ser = pd.Series(np.random.randn(1000))

ser.plot.kde();

Andrews curves

Andrews curves plot multivariate data as a set of curves, using the attributes of each sample as the coefficients of a Fourier series. Coloring the curves by class makes clustering visible: curves of samples from the same class tend to lie close together and form larger structures.

from pandas.plotting import andrews_curves

data = pd.read_csv("data/iris.data")

plt.figure();

andrews_curves(data, "Name");
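For reference, the standard Andrews function maps a sample x = (x1, x2, x3, ...) to f(t) = x1/sqrt(2) + x2 sin(t) + x3 cos(t) + x4 sin(2t) + x5 cos(2t) + ... for t between -pi and pi. The helper andrews_curve below is a hypothetical NumPy sketch of that formula, not pandas' own code:

```python
import numpy as np


def andrews_curve(sample, t):
    """Evaluate the Andrews function of one sample at the points t (hypothetical helper).

    f(t) = x1/sqrt(2) + x2*sin(t) + x3*cos(t) + x4*sin(2t) + x5*cos(2t) + ...
    """
    sample = np.asarray(sample, dtype=float)
    t = np.asarray(t, dtype=float)
    result = np.full_like(t, sample[0] / np.sqrt(2.0))
    for k, coefficient in enumerate(sample[1:]):
        harmonic = k // 2 + 1  # frequencies 1, 1, 2, 2, 3, 3, ...
        wave = np.sin if k % 2 == 0 else np.cos  # alternate sin and cos
        result += coefficient * wave(harmonic * t)
    return result


t = np.linspace(-np.pi, np.pi, 200)
curve = andrews_curve([5.1, 3.5, 1.4, 0.2], t)  # one iris-like sample
print(curve.shape)  # (200,)
```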

Parallel coordinates

Parallel coordinates is a plotting technique for multivariate data. Each vertical line represents one attribute, and each data point is drawn as a set of connected line segments through its values on those lines. Points that form clusters show up as bundles of nearby lines, so clusters can be spotted and other statistics estimated visually.

from pandas.plotting import parallel_coordinates

data = pd.read_csv("data/iris.data")

plt.figure();

parallel_coordinates(data, "Name");
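If the iris file is not at hand, a self-contained sketch can use a few iris-like rows typed in directly. The column names below are assumptions for illustration; "Name" is again the class column, and the Agg backend is assumed:

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for running the sketch without a display
import pandas as pd
from pandas.plotting import parallel_coordinates

# A few iris-like rows; "Name" holds the class of each row.
frame = pd.DataFrame(
    {
        "SepalLength": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
        "SepalWidth": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
        "PetalLength": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
        "PetalWidth": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
        "Name": ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"],
    }
)

# Each row becomes one polyline across the four attribute axes, colored by class.
ax = parallel_coordinates(frame, "Name")
print(len(ax.get_lines()))
```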

Lag plot

A lag plot is a scatter plot of a time series against a lagged copy of itself; with the default lag=1 it plots y(t) against y(t + 1). It is used to check whether data are random: random data shows no structure, while a visible pattern indicates autocorrelation.

from pandas.plotting import lag_plot

plt.figure();

spacing = np.linspace(-99 * np.pi, 99 * np.pi, num=1000)

data = pd.Series(0.1 * np.random.rand(1000) + 0.9 * np.sin(spacing))

lag_plot(data);
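With lag=1, each point of the lag plot is a pair (y[t], y[t + 1]). The sketch below (seeded data built like the example above, Agg backend) checks this against the scatter offsets and also prints Series.autocorr(), which condenses the same relationship into a single correlation:

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for running the sketch without a display
import numpy as np
import pandas as pd
from pandas.plotting import lag_plot

rng = np.random.default_rng(9)  # arbitrary seed
spacing = np.linspace(-99 * np.pi, 99 * np.pi, num=1000)
data = pd.Series(0.1 * rng.random(1000) + 0.9 * np.sin(spacing))

ax = lag_plot(data, lag=1)

points = np.asarray(ax.collections[0].get_offsets())
values = data.to_numpy()
expected = np.column_stack([values[:-1], values[1:]])  # (y[t], y[t + 1]) pairs
print(points.shape, np.allclose(points, expected))
print(data.autocorr(lag=1))  # Pearson correlation between y[t] and y[t + 1]
```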

Autocorrelation plot

Autocorrelation plots are often used to check randomness in a time series. The x-axis shows the lag and the y-axis the autocorrelation coefficient at that lag. For random data, the autocorrelations should be near zero at every lag. The horizontal lines in the plot mark the 95% and 99% confidence bands (the dashed line is the 99% band).

from pandas.plotting import autocorrelation_plot

plt.figure();

spacing = np.linspace(-9 * np.pi, 9 * np.pi, num=1000)

data = pd.Series(0.7 * np.random.rand(1000) + 0.3 * np.sin(spacing))

autocorrelation_plot(data);
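The coefficient for each lag can also be computed by hand. The helper autocorrelation below is a hypothetical sketch: it centres the series on its mean and divides the lag-h covariance by the lag-0 variance, which to our understanding matches the normalization of pandas' autocorrelation_plot. The band widths use the standard normal quantiles for 95% and 99%:

```python
import numpy as np


def autocorrelation(values, lag):
    """Autocorrelation at `lag`: lag-h covariance over lag-0 variance (hypothetical helper)."""
    x = np.asarray(values, dtype=float)
    n = len(x)
    centred = x - x.mean()
    c0 = np.sum(centred**2) / n
    return np.sum(centred[: n - lag] * centred[lag:]) / n / c0


rng = np.random.default_rng(10)  # arbitrary seed
spacing = np.linspace(-9 * np.pi, 9 * np.pi, num=1000)
data = 0.7 * rng.random(1000) + 0.3 * np.sin(spacing)

coefficients = [autocorrelation(data, h) for h in range(1, 51)]

n = len(data)
band_95 = 1.959963984540054 / np.sqrt(n)  # two-sided 95% band
band_99 = 2.5758293035489004 / np.sqrt(n)  # two-sided 99% band
outside_99 = [h for h, r in zip(range(1, 51), coefficients) if abs(r) > band_99]
print(outside_99[:10])  # lags whose coefficient leaves the 99% band
```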

Bootstrap plot

A bootstrap plot visually assesses the uncertainty of a statistic such as the mean, median or midrange. A random subset of a given size is drawn from the data, the statistic is computed for that subset, and the process is repeated a given number of times. The resulting line plots and histograms together form the bootstrap plot.

from pandas.plotting import bootstrap_plot

data = pd.Series(np.random.rand(1000))

bootstrap_plot(data, size=50, samples=500, color="grey");
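The procedure behind the plot can be sketched with NumPy alone. This is a hypothetical illustration, not pandas' code: it draws subsets without replacement, following the description above (a classical bootstrap samples with replacement), and computes the three statistics for each subset:

```python
import numpy as np

rng = np.random.default_rng(11)  # arbitrary seed
data = rng.random(1000)
size, samples = 50, 500  # same settings as the bootstrap_plot call above

# Draw `samples` random subsets of `size` points each.
subsets = np.array([rng.choice(data, size=size, replace=False) for _ in range(samples)])

means = subsets.mean(axis=1)
medians = np.median(subsets, axis=1)
midranges = (subsets.min(axis=1) + subsets.max(axis=1)) / 2

# The spread of each statistic across subsets is what the histograms show.
for name, stat in [("mean", means), ("median", medians), ("midrange", midranges)]:
    low, high = np.percentile(stat, [2.5, 97.5])
    print(f"{name}: {low:.3f} to {high:.3f}")
```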

RadViz

RadViz is based on a simple spring tension minimization algorithm. Each feature of the data set is an anchor point, and the anchors are spaced evenly on the unit circle. Each sample is imagined to be attached to every anchor by a spring whose stiffness is the sample's value for that feature, normalized to the unit interval. The sample is drawn at the point where these spring forces balance.

from pandas.plotting import radviz

data = pd.read_csv("data/iris.data")

plt.figure();

radviz(data, "Name");
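The spring equilibrium has a closed form: with anchors a_i on the unit circle and normalized values w_i, the total force sum(w_i * (a_i - p)) vanishes at p = sum(w_i * a_i) / sum(w_i). The hypothetical helper radviz_positions below sketches this with NumPy, assuming min-max scaling per column:

```python
import numpy as np


def radviz_positions(values):
    """Place each row of a 2-D array inside the unit circle (hypothetical helper).

    Columns are min-max scaled to [0, 1]; anchor i sits at angle 2*pi*i/m on the
    unit circle; each row lands at the average of the anchors weighted by its
    scaled values. Constant columns or all-zero rows would divide by zero.
    """
    values = np.asarray(values, dtype=float)
    low, high = values.min(axis=0), values.max(axis=0)
    scaled = (values - low) / (high - low)
    m = values.shape[1]
    angles = 2 * np.pi * np.arange(m) / m
    anchors = np.column_stack([np.cos(angles), np.sin(angles)])
    weights = scaled.sum(axis=1, keepdims=True)
    return (scaled @ anchors) / weights


points = radviz_positions([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]])
print(np.round(points, 3))
```

A row dominated by one feature lands on that feature's anchor, while a row with equal values ends up at the centre.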

Plot styles

Since matplotlib 1.5, a whole set of default plot settings can be switched at once with matplotlib.style.use(my_plot_style).

matplotlib.style.available lists the available styles. The exact list depends on the Matplotlib version; for example, since Matplotlib 3.6 the seaborn styles are named seaborn-v0_8-*:

import matplotlib.pyplot as plt

plt.style.available
Out[128]: 
['seaborn-dark',
 'seaborn-darkgrid',
 'seaborn-ticks',
 'fivethirtyeight',
 'seaborn-whitegrid',
 'classic',
 '_classic_test',
 'fast',
 'seaborn-talk',
 'seaborn-dark-palette',
 'seaborn-bright',
 'seaborn-pastel',
 'grayscale',
 'seaborn-notebook',
 'ggplot',
 'seaborn-colorblind',
 'seaborn-muted',
 'seaborn',
 'Solarize_Light2',
 'seaborn-paper',
 'bmh',
 'seaborn-white',
 'dark_background',
 'seaborn-poster',
 'seaborn-deep']
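A style can also be applied only temporarily. A minimal sketch, assuming the Agg backend and the ggplot style (which ships with Matplotlib):

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for running the sketch without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

print("ggplot" in plt.style.available)  # True

series = pd.Series(np.arange(10.0))

# The style applies only inside the with-block; plt.style.use("ggplot")
# would instead change the defaults for the rest of the session.
with plt.style.context("ggplot"):
    ax = series.plot()
```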

Hiding the legend

By default, plotting a DataFrame adds a legend that identifies each column. Pass legend=False to hide it:

df = pd.DataFrame(np.random.randn(1000, 4), index=pd.date_range("1/1/2000", periods=1000), columns=list("ABCD"))

df = df.cumsum()

df.plot(legend=False);

Setting axis labels

Compare the default labels with labels set through xlabel and ylabel (available since pandas 1.1):

df.plot();

df.plot(xlabel="new x", ylabel="new y");

Log scale

When the data span several orders of magnitude, small values become almost invisible on a linear axis. Pass logy=True to use a logarithmic y-axis:

ts = pd.Series(np.random.randn(1000), index=pd.date_range("1/1/2000", periods=1000))

ts = np.exp(ts.cumsum())

ts.plot(logy=True);

Secondary y-axis

secondary_y plots columns against a second y-axis on the right. Pass True to put everything on it, or a list of column names:

plt.figure();

ax = df.plot(secondary_y=["A", "B"])

ax.set_ylabel("CD scale");

ax.right_ax.set_ylabel("AB scale");

By default, the legend labels of columns on the secondary axis are marked with "(right)". Set mark_right=False to turn this off:

plt.figure();

df.plot(secondary_y=["A", "B"], mark_right=False);

Suppressing tick resolution adjustment

For time series with a regular frequency, pandas automatically adjusts the tick resolution of the x-axis. Pass x_compat=True to suppress this adjustment, for example when the plot has to line up with axes whose frequency pandas cannot infer:

plt.figure();

df["A"].plot(x_compat=True);

To suppress it for several plots at once, use pd.plotting.plot_params.use in a with statement:

plt.figure();

with pd.plotting.plot_params.use("x_compat", True):
    df["A"].plot(color="r")
    df["B"].plot(color="g")
    df["C"].plot(color="b")

Subplots

Pass subplots=True to draw each column of a DataFrame in its own subplot:

df.plot(subplots=True, figsize=(6, 6));

layout sets the grid of subplots as (rows, columns); cells without a column stay empty:

df.plot(subplots=True, layout=(2, 3), figsize=(6, 6), sharex=False);

One of the two numbers can be -1, in which case it is inferred from the number of columns. With the four columns of df, layout=(2, -1) gives a 2 x 2 grid:

df.plot(subplots=True, layout=(2, -1), figsize=(6, 6), sharex=False);
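A sketch checking the two grid shapes, assuming random data with the same four columns as df, a fixed seed and the Agg backend. With layout, pandas returns the Axes as a NumPy array shaped like the grid:

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for running the sketch without a display
import numpy as np
import pandas as pd

rng = np.random.default_rng(12)  # arbitrary seed
df = pd.DataFrame(rng.standard_normal((100, 4)), columns=list("ABCD")).cumsum()

grid_fixed = df.plot(subplots=True, layout=(2, 3), figsize=(6, 6), sharex=False)
grid_inferred = df.plot(subplots=True, layout=(2, -1), figsize=(6, 6), sharex=False)

# (2, 3) keeps two empty cells; (2, -1) infers 2 columns for 4 series.
print(grid_fixed.shape, grid_inferred.shape)
```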

Existing axes can also be passed through ax as a list, which allows arbitrary arrangements. Here the four columns of df go on the main diagonal of a 4 x 4 grid and their negations on the anti-diagonal:

fig, axes = plt.subplots(4, 4, figsize=(9, 9))

plt.subplots_adjust(wspace=0.5, hspace=0.5)

target1 = [axes[0][0], axes[1][1], axes[2][2], axes[3][3]]

target2 = [axes[3][0], axes[2][1], axes[1][2], axes[0][3]]

df.plot(subplots=True, ax=target1, legend=False, sharex=False, sharey=False);

(-df).plot(subplots=True, ax=target2, legend=False, sharex=False, sharey=False);

Plotting tables

With table=True, the plotted data is also drawn as a table under the plot. Moving the x-axis ticks to the top keeps them clear of the table:

fig, ax = plt.subplots(1, 1, figsize=(7, 6.5))

df = pd.DataFrame(np.random.rand(5, 3), columns=["a", "b", "c"])

ax.xaxis.tick_top()  # Display x-axis ticks on top.

df.plot(table=True, ax=ax);

pandas.plotting.table() adds any DataFrame or Series, here rounded summary statistics, to existing axes at a chosen location:

from pandas.plotting import table

fig, ax = plt.subplots(1, 1)

table(ax, np.round(df.describe(), 2), loc="upper right", colWidths=[0.2, 0.2, 0.2]);

df.plot(ax=ax, ylim=(0, 2), legend=None);

Using Colormaps

When many columns are plotted, the lines can be hard to tell apart with the default colors. In that case, pass the name of a Matplotlib colormap through colormap:

df = pd.DataFrame(np.random.randn(1000, 10), index=ts.index)

df = df.cumsum()

plt.figure();

df.plot(colormap="cubehelix");
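A sketch confirming that each column gets its own color from the colormap, assuming data of the same shape, a fixed seed and the Agg backend:

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for running the sketch without a display
import numpy as np
import pandas as pd
from matplotlib.colors import to_rgba

rng = np.random.default_rng(13)  # arbitrary seed
index = pd.date_range("1/1/2000", periods=1000)
df = pd.DataFrame(rng.standard_normal((1000, 10)), index=index).cumsum()

ax = df.plot(colormap="cubehelix")

# One color is taken from the colormap for each of the ten columns.
line_colors = [to_rgba(line.get_color()) for line in ax.get_lines()]
print(len(line_colors), len(set(line_colors)))
```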

This article is available at www.flydean.com/09-python-p…
