Python Data Analysis Toolkit: pandas

1. Brief introduction

Pandas is a high-performance, easy-to-use library of data structures and data analysis tools for the Python programming language. It is built on top of NumPy and integrates smoothly with the many third-party libraries of the scientific Python ecosystem. Pandas is widely used in finance, statistics, the social sciences, and many engineering fields to handle typical data analysis tasks.

2. Installation

Pandas can be installed with either conda or pip.

Conda installation:

conda install pandas

pip installation:

pip install pandas

At the time of writing, the latest official release is v0.25.1, released on August 22, 2019, a bug-fix release in the 0.25.x series. To upgrade an existing installation:

pip install --upgrade pandas
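To confirm which version ended up installed after an install or upgrade, one simple check is to print the package's `__version__` attribute from Python. A minimal sketch:

```python
import pandas as pd

# pd.__version__ is a version string such as "0.25.1";
# the exact value depends on what is installed on your machine
print(pd.__version__)
```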

3. Data structure

Pandas has two main data structures: Series (one-dimensional) and DataFrame (two-dimensional).

Both are described below. First, import pandas into your Python script or Jupyter Notebook; by convention it is abbreviated as pd.

import pandas as pd

3.1 Series

A Series is a one-dimensional labeled array that can hold any data type (integers, strings, floating-point numbers, Python objects, and so on). The axis labels are collectively referred to as the index.
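As a quick illustration of "any data type": homogeneous integers get a numeric dtype, while a mix of Python objects falls back to the generic `object` dtype. When no index is given, the labels default to 0, 1, 2, and so on. A small sketch:

```python
import pandas as pd

# Homogeneous integers are stored with an integer dtype
ints = pd.Series([1, 2, 3])
print(ints.dtype)

# Mixed Python objects fall back to the generic "object" dtype
mixed = pd.Series([1, 'two', 3.0])
print(mixed.dtype)  # object

# Without an explicit index, the labels default to 0..n-1
print(list(mixed.index))  # [0, 1, 2]
```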

3.1.1 Create a Series

Create from a list:

data = [1, 2, 3]
pd.Series(data)

0    1
1    2
2    3
dtype: int64

Create from a dictionary:

data = {'a': 1, 'b': 2, 'c': 3}
pd.Series(data)

a    1
b    2
c    3
dtype: int64

Set the index (label) with the index parameter:

data = [1, 2, 3]
index = ['a', 'b', 'c']
pd.Series(data, index=index)

a    1
b    2
c    3
dtype: int64

Create from a scalar: the same value is repeated once for every label in index:

data = 0
index = ['a', 'b', 'c']
pd.Series(data, index=index)

a    0
b    0
c    0
dtype: int64
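Two related behaviors are worth knowing. A scalar passed without an index produces a one-element Series. Index labels are also allowed to repeat, and looking up a repeated label returns every matching entry. A short sketch:

```python
import pandas as pd

# A scalar with no index produces a one-element Series
single = pd.Series(5)
print(len(single))  # 1

# Duplicate labels are permitted; is_unique reports whether any label repeats
dup = pd.Series(0, index=['a', 'a', 'b'])
print(dup.index.is_unique)  # False

# Looking up a repeated label returns a Series containing all matches
print(dup['a'])
```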

3.1.2 Access a Series

s = pd.Series([10, 100, 1000], index=['a', 'b', 'c'])
s

a      10
b     100
c    1000
dtype: int64

Access by position, like an array:

print(s[0], s[1], s[2])

10 100 1000

Access by label, like a dictionary:

print(s['a'], s['b'], s['c'])

10 100 1000
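Because a Series maps labels to values, it also supports some dictionary-style operations: `in` tests membership among the index labels (not the values), and `get()` returns a default instead of raising when a label is missing. For example:

```python
import pandas as pd

s = pd.Series([10, 100, 1000], index=['a', 'b', 'c'])

# `in` checks the index labels, just like checking keys in a dict
print('a' in s)  # True
print(10 in s)   # False: 10 is a value, not a label

# get() returns the default when the label does not exist
print(s.get('z', -1))  # -1
```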

Both access modes refer to the same elements:

print(s[0] == s['a'], s[1] == s['b'], s[2] == s['c'])

True True True
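Plain `[]` guesses whether you mean a position or a label from the type of the key, which can be ambiguous. The explicit accessors `iloc` (integer position) and `loc` (label) remove that ambiguity. In recent pandas releases (2.1 and later), using an integer with `[]` as a position on a label-indexed Series like this one is deprecated, so the explicit form is the safer habit. The same lookups written that way:

```python
import pandas as pd

s = pd.Series([10, 100, 1000], index=['a', 'b', 'c'])

# iloc: access by integer position (0-based), like an array
print(s.iloc[0], s.iloc[1], s.iloc[2])  # 10 100 1000

# loc: access by index label, like a dictionary
print(s.loc['a'], s.loc['b'], s.loc['c'])  # 10 100 1000
```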

3.2 DataFrame

A DataFrame is a two-dimensional labeled data structure whose columns can have different types. You can think of it as a spreadsheet, an SQL table, or a dictionary of Series objects. It is generally the most commonly used pandas object.
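To see the "different types of columns" and "dictionary of Series" ideas concretely, the sketch below builds a small frame with a text, a float, and an integer column, then inspects the per-column dtypes and the type of a single selected column (the column names here are made up for illustration):

```python
import pandas as pd

# Each column can have its own dtype
df = pd.DataFrame({
    'name': ['x', 'y', 'z'],
    'score': [1.5, 2.0, 3.5],
    'count': [1, 2, 3],
})
print(df.dtypes)

# Selecting one column returns a Series, which is why a DataFrame
# can be thought of as a dict of Series sharing one index
print(type(df['score']))
```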

3.2.1 Create a DataFrame

Create from a dictionary of lists:

data = {
    'col1': [1, 2, 3],
    'col2': [4, 5, 6],
    'col3': [7, 8, 9]
}

pd.DataFrame(data)

	col1	col2	col3
0	1	4	7
1	2	5	8
2	3	6	9
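After construction, the `shape`, `index`, and `columns` attributes describe the frame's layout. For the same data as above, with no index supplied so that rows are labeled 0 to n-1:

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6], 'col3': [7, 8, 9]})

print(df.shape)          # (3, 3), i.e. (rows, columns)
print(list(df.index))    # [0, 1, 2]
print(list(df.columns))  # ['col1', 'col2', 'col3']
```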

Create from a dictionary of Series. The row index is the union of the Series indexes, and missing entries are filled with NaN:

s1 = pd.Series([1, 2, 3], index=['row1', 'row2', 'row3'])
s2 = pd.Series([4, 5, 6], index=['row2', 'row3', 'row4'])
s3 = pd.Series([7, 8, 9], index=['row3', 'row4', 'row5'])

data = {
    'col1': s1,
    'col2': s2,
    'col3': s3
}

pd.DataFrame(data)

	col1	col2	col3
row1	1.0	NaN	NaN
row2	2.0	4.0	NaN
row3	3.0	5.0	7.0
row4	NaN	6.0	8.0
row5	NaN	NaN	9.0
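The values in the output above are floats (1.0 rather than 1) because NaN is a floating-point value, so pandas upcasts integer columns that contain missing entries. `isna()` and `dropna()` are one way to inspect or discard the gaps; a sketch reusing the same three Series:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=['row1', 'row2', 'row3'])
s2 = pd.Series([4, 5, 6], index=['row2', 'row3', 'row4'])
s3 = pd.Series([7, 8, 9], index=['row3', 'row4', 'row5'])
df = pd.DataFrame({'col1': s1, 'col2': s2, 'col3': s3})

# Count missing values per column (two in each column here)
print(df.isna().sum())

# dropna() keeps only rows with no missing values; here that is just row3
print(df.dropna())
```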

Create from a list of dictionaries:

data = [
    {'col1': 1, 'col2': 2, 'col3': 3},
    {'col1': 2, 'col2': 3, 'col3': 4},
    {'col1': 3, 'col2': 4, 'col3': 5}
]

pd.DataFrame(data, index=['row1', 'row2', 'row3'])

	col1	col2	col3
row1	1	2	3
row2	2	3	4
row3	3	4	5
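The dictionaries do not all need the same keys. Columns are taken from the keys that appear, and a key missing from one dictionary becomes NaN in that row. A small sketch with made-up data:

```python
import pandas as pd

# The second dict has no 'col2' key
data = [
    {'col1': 1, 'col2': 2},
    {'col1': 3},
]
df = pd.DataFrame(data)
print(df)
```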

Create from a two-dimensional list:

data = [
    [1, 2, 3],
    [2, 3, 4],
    [3, 4, 5]
]

pd.DataFrame(data, index=['row1', 'row2', 'row3'], columns=['col1', 'col2', 'col3'])

	col1	col2	col3
row1	1	2	3
row2	2	3	4
row3	3	4	5
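Since pandas is built on NumPy, a 2-D NumPy array can be passed in place of the nested list, and `to_numpy()` (available since pandas 0.24) converts the frame back to a plain array, dropping the labels. A sketch:

```python
import numpy as np
import pandas as pd

arr = np.array([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
df = pd.DataFrame(arr, index=['row1', 'row2', 'row3'], columns=['col1', 'col2', 'col3'])

# Convert back to a plain 2-D NumPy array (row and column labels are dropped)
print(df.to_numpy())
```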

3.2.2 Access a DataFrame

df = pd.DataFrame([[1, 4, 7], [2, 5, 8], [3, 6, 9]],
                  index=['row1', 'row2', 'row3'],
                  columns=['col1', 'col2', 'col3'])
df

	col1	col2	col3
row1	1	4	7
row2	2	5	8
row3	3	6	9

Access a column through a column label:

df['col1']

row1    1
row2    2
row3    3
Name: col1, dtype: int64
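Passing a list of labels instead of a single label returns a DataFrame holding just those columns, in the order given:

```python
import pandas as pd

df = pd.DataFrame([[1, 4, 7], [2, 5, 8], [3, 6, 9]],
                  index=['row1', 'row2', 'row3'],
                  columns=['col1', 'col2', 'col3'])

# A list of labels selects several columns and returns a DataFrame
subset = df[['col3', 'col1']]
print(subset)
```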

Access a row through a row label:

df.loc['row1']

col1    1
col2    4
col3    7
Name: row1, dtype: int64

Access a row by integer position:

df.iloc[0]

col1    1
col2    4
col3    7
Name: row1, dtype: int64
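Both accessors also accept a row and a column at once, which picks out a single cell: `loc` takes two labels and `iloc` takes two integer positions. Using the same `df`:

```python
import pandas as pd

df = pd.DataFrame([[1, 4, 7], [2, 5, 8], [3, 6, 9]],
                  index=['row1', 'row2', 'row3'],
                  columns=['col1', 'col2', 'col3'])

# loc takes [row_label, column_label]
print(df.loc['row2', 'col3'])  # 8

# iloc takes [row_position, column_position], both 0-based
print(df.iloc[1, 2])  # 8
```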

Select rows by slicing:

df[1:]

	col1	col2	col3
row2	2	5	8
row3	3	6	9
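Slicing with `df[1:]` is positional and, like ordinary Python slices, excludes the end point. Label-based slicing with `loc` behaves differently: the end label is included. A side-by-side sketch:

```python
import pandas as pd

df = pd.DataFrame([[1, 4, 7], [2, 5, 8], [3, 6, 9]],
                  index=['row1', 'row2', 'row3'],
                  columns=['col1', 'col2', 'col3'])

# Positional slice: position 2 (row3) is excluded
print(list(df.iloc[0:2].index))  # ['row1', 'row2']

# Label slice: the end label 'row3' is included
print(list(df.loc['row1':'row3'].index))  # ['row1', 'row2', 'row3']
```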

3.2.3 Transpose a DataFrame

The T attribute swaps rows and columns, just like transposing a matrix in linear algebra.

df.T

	row1	row2	row3
col1	1	2	3
col2	4	5	6
col3	7	8	9
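The example above is square, so the shape does not change. With a non-square frame the effect on the shape is easier to see, and transposing twice gives back the original. This sketch uses a frame whose columns share one dtype; for mixed-dtype frames the transposed columns generally end up as `object`:

```python
import pandas as pd

# A 2x3 frame becomes 3x2 after transposing
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], index=['r1', 'r2'], columns=['a', 'b', 'c'])
t = df.T
print(df.shape, t.shape)  # (2, 3) (3, 2)

# Transposing twice restores the original frame
print(t.T.equals(df))  # True
```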


Personal website: Kenblog.top
GitHub site: kenblikylee.github.io
