How can recorded data be analyzed and plotted in Python?

This article introduces numpy, matplotlib, pandas, scipy, and related packages for data analysis and plotting.

Prepare the environment

The Anaconda distribution is recommended for Python environments.

  • Official: www.anaconda.com/products/in…
  • Tsinghua source: mirrors.tuna.tsinghua.edu.cn/anaconda/ar…

Anaconda is a Python distribution for scientific computing that already includes a number of popular Python packages for scientific computing and data analysis.

Running conda list shows the installed packages, including several of the packages covered in this article:

$ conda list | grep numpy
numpy        1.17.2   py37h99e6662_0
$ conda list | grep "matplot\|seaborn\|plotly"
matplotlib   3.1.1    py37h54f8f79_0
seaborn      0.9.0    py37_0
$ conda list | grep "pandas\|scipy"
pandas       0.25.1   py37h0a44026_0
scipy        1.3.1    py37h1410ff5_0

If you already have a Python environment, you can install the packages with pip:

pip install numpy matplotlib pandas scipy
# pypi mirror: https://mirrors.tuna.tsinghua.edu.cn/help/pypi/

The environment used in this article is Python 3.7.4 (Anaconda3-2019.10).

Prepare the data

This article assumes the data is stored in a file such as data0.txt, in the following format:

id, data, timestamp
0, 55, 1592207702.688805
1, 41, 1592207702.783134
2, 57, 1592207702.883619
3, 59, 1592207702.980597
4, 58, 1592207703.08313
5, 41, 1592207703.183011
6, 52, 1592207703.281802
...

This is CSV format: comma-separated, easy to read and write, and it can be opened in Excel.

With this data, we will work through the following goals:

  • Read the CSV data with numpy and compute statistics
  • Plot the data column with matplotlib
  • Interpolate the data column with scipy to form a smooth curve
  • Analyze the timestamp column: the differences between consecutive timestamps and the number of records per second

Numpy reads the data

Numpy can read CSV data directly with loadtxt.

import numpy as np

# id, (data), timestamp
datas = np.loadtxt(path, dtype=np.int32, delimiter=",", skiprows=1, usecols=1)

  • dtype=np.int32: the data type is np.int32
  • delimiter=",": the column delimiter is ","
  • skiprows=1: skip the first line (the header)
  • usecols=1: read only column 1 (zero-based), which is the data column
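
The parameters above can be tried without a file on disk. The following is a minimal sketch, assuming a small in-memory sample in the article's format; the SAMPLE text and the load_data_column helper are hypothetical names introduced only for illustration.

```python
import io

import numpy as np

# Hypothetical in-memory sample; the values are copied from the data excerpt.
SAMPLE = """id,data,timestamp
0,55,1592207702.688805
1,41,1592207702.783134
2,57,1592207702.883619
"""


def load_data_column(source):
    """Read only the integer data column (column 1), skipping the header row."""
    return np.loadtxt(source, dtype=np.int32, delimiter=",",
                      skiprows=1, usecols=1)


# loadtxt accepts a path or an open file-like object.
datas = load_data_column(io.StringIO(SAMPLE))
print(datas.tolist())  # [55, 41, 57]
```

Passing a path such as "data0.txt" instead of the StringIO object should behave the same way.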

To read multiple columns:

# id, (data, timestamp)
dtype = {'names': ('data', 'timestamp'), 'formats': ('i4', 'f8')}
datas = np.loadtxt(path, dtype=dtype, delimiter=",", skiprows=1, usecols=(1, 2))

Dtype: numpy.org/devdocs/ref…
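
With a structured dtype, loadtxt returns one record per row, and each column is then available by its field name. A short sketch under the same assumption of a hypothetical in-memory sample:

```python
import io

import numpy as np

# Hypothetical in-memory sample; the values are copied from the data excerpt.
SAMPLE = """id,data,timestamp
0,55,1592207702.688805
1,41,1592207702.783134
"""

dtype = {'names': ('data', 'timestamp'), 'formats': ('i4', 'f8')}
rows = np.loadtxt(io.StringIO(SAMPLE), dtype=dtype, delimiter=",",
                  skiprows=1, usecols=(1, 2))

values = rows['data']       # int32 array holding the data column
stamps = rows['timestamp']  # float64 array holding the timestamp column
print(values.tolist(), stamps.tolist())
```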

Numpy analyzes the data

Numpy computes the mean and the sample standard deviation:

# average
data_avg = np.mean(datas)
# data_avg = np.average(datas)

# standard deviation
# data_std = np.std(datas)
# sample standard deviation
data_std = np.std(datas, ddof=1)

print(" avg: {:.2f}, std: {:.2f}, sum: {}".format(
      data_avg, data_std, np.sum(datas)))
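
To make the role of ddof concrete, the sketch below computes both standard deviations by hand and compares them with np.std. It uses only the seven values shown in the data excerpt, so its results will differ from the statistics of the full file.

```python
import numpy as np

# The seven data values visible in the excerpt (not the full file).
datas = np.array([55, 41, 57, 59, 58, 41, 52], dtype=np.int32)

n = datas.size
mean = datas.sum() / n
squared_dev = ((datas - mean) ** 2).sum()

# ddof=0 (the default) divides by n; ddof=1 divides by n - 1.
population_std = np.sqrt(squared_dev / n)
sample_std = np.sqrt(squared_dev / (n - 1))

print(np.isclose(population_std, np.std(datas)))     # True
print(np.isclose(sample_std, np.std(datas, ddof=1)))  # True
```

The sample standard deviation is always the larger of the two, and the gap shrinks as the number of records grows.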

Matplotlib plots the data

Plotting takes only four lines:

import sys

import matplotlib.pyplot as plt
import numpy as np

def _plot(path):
  print("Load: {}".format(path))
  # id, (data), timestamp
  datas = np.loadtxt(path, dtype=np.int32, delimiter=",", skiprows=1, usecols=1)

  fig, ax = plt.subplots()
  ax.plot(range(len(datas)), datas, label=path)
  ax.legend()
  plt.show()

if __name__ == "__main__":
  if len(sys.argv) < 2:
    sys.exit("python data_plot.py *.txt")
  _plot(sys.argv[1])

In ax.plot(x, y, …), the x values are range(len(datas)), that is, the index of each record.

For the full code, see data_plot.py in the Gist linked at the end of this article. Running it produces the following output:

$ python data_plot.py data0.txt
Args
  nonzero: False
Load: data0.txt
  size: 20
  avg: 52.15, std: 8.57, sum: 1043

The script can also read multiple files and plot them together:

$ python data_plot.py data*.txt
Args
  nonzero: False
Load: data0.txt
  size: 20
  avg: 52.15, std: 8.57, sum: 1043
Load: ….txt
  size: 20
  avg: 53.35, std: 6.78, sum: 1067
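
Plotting several files on one chart comes down to calling ax.plot once per series on the same axes. The following is only a sketch and not the Gist's data_plot.py: the plot_series helper, the "other.txt" label, and the second series of values are hypothetical, and the Agg backend is an assumption made so that the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window is opened
import matplotlib.pyplot as plt
import numpy as np


def plot_series(series_by_label, out_path=None):
    """Plot each 1-D array against its record index on one shared axes."""
    fig, ax = plt.subplots()
    for label, datas in series_by_label.items():
        ax.plot(range(len(datas)), datas, label=label)
    ax.legend()
    if out_path is not None:
        fig.savefig(out_path)  # write the chart to an image file
    return fig, ax


# Hypothetical in-memory series standing in for the loaded files.
fig, ax = plot_series({
    "data0.txt": np.array([55, 41, 57, 59]),
    "other.txt": np.array([50, 48, 61, 53]),
})
print(len(ax.lines))  # 2
```

For interactive use, drop the matplotlib.use("Agg") line and call plt.show() after plot_series.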

Scipy interpolates data

Scipy can interpolate the x and y data to form a smooth curve:

from scipy import interpolate

xnew = np.arange(xvalues[0], xvalues[-1], 0.01)
# interp1d returns a function; call it with the new x values
ynew = interpolate.interp1d(xvalues, yvalues, kind='cubic')(xnew)
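
As a self-contained illustration, the sketch below interpolates the seven values from the data excerpt over their record indices. Using np.linspace instead of np.arange is a choice made here so that the new x values include the last original point while staying inside the original range, since interp1d by default rejects values outside that range.

```python
import numpy as np
from scipy import interpolate

xvalues = np.arange(7)  # record indices 0..6
yvalues = np.array([55, 41, 57, 59, 58, 41, 52], dtype=float)

# interp1d returns a callable that evaluates the cubic interpolant.
f = interpolate.interp1d(xvalues, yvalues, kind='cubic')

# 61 evenly spaced points from the first to the last original x value.
xnew = np.linspace(xvalues[0], xvalues[-1], 61)
ynew = f(xnew)

# Every 10th new point coincides with an original sample.
print(np.allclose(ynew[::10], yvalues))  # True
```

Plotting xnew against ynew with ax.plot then gives the smoothed curve, and the original points can be overlaid for comparison.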

For the full code, see data_interp.py in the Gist linked at the end of this article. Run it as follows:

python data_interp.py data0.txt

For how to configure, delay, and save the matplotlib figure, see the code and its comments.

Pandas analyzes the data

Here we need to read the timestamp column:

# id, data, (timestamp)
stamps = np.loadtxt(path, dtype=np.float64, delimiter=",", skiprows=1, usecols=(2))

Numpy computes the differences between consecutive timestamps:

stamps_diff = np.diff(stamps)
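
np.diff returns an array one element shorter than its input, where element i is stamps[i + 1] - stamps[i]. A small sketch using the first four timestamps from the data excerpt:

```python
import numpy as np

# The first four timestamps from the data excerpt, in seconds.
stamps = np.array([1592207702.688805, 1592207702.783134,
                   1592207702.883619, 1592207702.980597])

stamps_diff = np.diff(stamps)  # interval between consecutive records

print(stamps_diff.size)  # 3
print("min: {:.3f}s, max: {:.3f}s, mean: {:.3f}s".format(
    stamps_diff.min(), stamps_diff.max(), stamps_diff.mean()))
```

The minimum, maximum, and mean of the differences give a quick view of how regular the recording interval is.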

Pandas counts the number of records in each second:

stamps_int = np.array(stamps, dtype='int')
stamps_int = stamps_int - stamps_int[0]
import pandas as pd
stamps_s = pd.Series(data=stamps_int)
stamps_s = stamps_s.value_counts(sort=False)

The timestamps are truncated to whole seconds, and pandas then counts how many times each second occurs.

For the full code, see stamp_diff.py in the Gist linked at the end of this article. Run it as follows:
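
The counting step can be checked on the seven timestamps from the data excerpt. This sketch calls sort_index() after value_counts(), which is a substitution made here to get the seconds in chronological order explicitly; the article's code uses value_counts(sort=False) instead.

```python
import numpy as np
import pandas as pd

# The seven timestamps from the data excerpt, in seconds.
stamps = np.array([1592207702.688805, 1592207702.783134, 1592207702.883619,
                   1592207702.980597, 1592207703.08313, 1592207703.183011,
                   1592207703.281802])

stamps_int = stamps.astype(np.int64)     # truncate to whole seconds
stamps_int = stamps_int - stamps_int[0]  # seconds since the first record

# Count the records in each second, ordered by second.
per_second = pd.Series(stamps_int).value_counts().sort_index()
counts = {int(k): int(v) for k, v in per_second.items()}
print(counts)  # {0: 4, 1: 3}
```

Four records fall in the first second and three in the next, which matches the roughly 0.1 s spacing of the timestamps.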

python stamp_diff.py data0.txt

For how matplotlib displays multiple charts in one figure, see the code as well.

Conclusion

The code for this article is in this Gist: gist.github.com/ikuokuo/862…

