1. Brief introduction

Pandas is a high-performance, easy-to-use library of data structures and data analysis tools for the Python programming language. It is built on top of NumPy and integrates well with the many third-party libraries of the scientific Python ecosystem. Pandas is widely used in finance, statistics, the social sciences, and many engineering fields to handle typical data analysis tasks.

2. Installation

Pandas can be installed with either conda or pip.

Conda installation:

conda install pandas

pip installation:

pip install pandas

As of this writing, the latest official release is v0.25.1, released on August 22, 2019. It is a bug-fix release in the 0.25.x series. To upgrade an existing installation:

pip install --upgrade pandas
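To confirm which version an install or upgrade actually produced, a quick check from Python is enough. This is a minimal sketch; the version string you see depends on your environment, and the parsing line assumes a "major.minor..." format:

```python
import pandas as pd

# pd.__version__ holds the installed pandas version as a string
print(pd.__version__)

# Split out the major and minor numbers (assumes a "major.minor..." version string)
major, minor = (int(part) for part in pd.__version__.split(".")[:2])
print(major, minor)
```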

3. Data structures

Pandas has two main data structures: Series (one-dimensional) and DataFrame (two-dimensional).

Both are described below. First, import pandas into your Python script or Jupyter Notebook; by convention it is abbreviated to pd.

import pandas as pd

3.1 Series

A Series is a one-dimensional labeled array that can hold any data type (integers, strings, floating-point numbers, Python objects, etc.). The axis labels are collectively referred to as the index.
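Because a Series can hold any data type, pandas infers a dtype from the values passed in. A small sketch with illustrative values:

```python
import pandas as pd

# pandas infers the dtype from the values
ints = pd.Series([1, 2, 3])            # all integers -> int64
floats = pd.Series([1.0, 2.5, 3.0])    # all floats -> float64
mixed = pd.Series([1, "two", 3.0])     # mixed Python objects -> object

print(ints.dtype, floats.dtype, mixed.dtype)  # int64 float64 object
```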

3.1.1 Create a Series

Create from a list:

data = [1, 2, 3]
pd.Series(data)

0    1
1    2
2    3
dtype: int64

Create from a dictionary:

data = {'a': 1, 'b': 2, 'c': 3}
pd.Series(data)

a    1
b    2
c    3
dtype: int64
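When a dictionary and an index are both passed, the index decides which keys are kept and in what order. A short sketch of that behavior (the labels are illustrative):

```python
import pandas as pd

data = {'a': 1, 'b': 2, 'c': 3}

# The index selects and orders the dictionary keys; 'a' is dropped.
# 'd' has no matching key, so it gets NaN and the dtype becomes float64.
s = pd.Series(data, index=['b', 'c', 'd'])
print(s)
```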

Set the index (label) with the index parameter:

data = [1, 2, 3]
index = ['a', 'b', 'c']
pd.Series(data, index=index)

a    1
b    2
c    3
dtype: int64
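When data is a list, the index must supply exactly one label per value. A minimal sketch showing what happens on a mismatch (the flag name is just for illustration):

```python
import pandas as pd

data = [1, 2, 3]

# Three values but only two labels: pandas raises ValueError
length_mismatch_raised = False
try:
    pd.Series(data, index=['a', 'b'])
except ValueError as err:
    length_mismatch_raised = True
    print("ValueError:", err)

# One label per value works as expected
s = pd.Series(data, index=['a', 'b', 'c'])
print(list(s.index))  # ['a', 'b', 'c']
```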

Create from a scalar, which is repeated for every label in the index:

data = 0
index = ['a', 'b', 'c']
pd.Series(data, index=index)

a    0
b    0
c    0
dtype: int64
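Index labels do not actually have to be unique in pandas: duplicates are allowed, but a lookup on a duplicated label returns more than one value. A small sketch with illustrative labels:

```python
import pandas as pd

# The scalar 0 is repeated for every label, including the duplicated 'a'
s = pd.Series(0, index=['a', 'a', 'b'])

print(s.index.is_unique)  # False

# Looking up a duplicated label returns all matching entries as a Series
print(s['a'])
```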

3.1.2 Access a Series

s = pd.Series([10, 100, 1000], index=['a', 'b', 'c'])
s

a      10
b     100
c    1000
dtype: int64

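Array-style access by integer position (recent pandas versions deprecate this form on a Series with a non-integer index in favor of s.iloc[0]):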

print(s[0], s[1], s[2])

10 100 1000

Dictionary-style access by label:

print(s['a'], s['b'], s['c'])

10 100 1000

Comparing the two shows that each position corresponds to a label:

print(s[0] == s['a'], s[1] == s['b'], s[2] == s['c'])

True True True
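The explicit indexers make the choice between the two access modes unambiguous: iloc always takes integer positions and loc always takes labels. Like a dictionary, the in operator checks labels rather than values. A short sketch:

```python
import pandas as pd

s = pd.Series([10, 100, 1000], index=['a', 'b', 'c'])

# .iloc always interprets keys as integer positions, .loc always as labels
print(s.iloc[0], s.loc['a'])   # 10 10

# Like a dict, `in` checks the labels (the index), not the values
print('a' in s, 10 in s)       # True False
```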

3.2 DataFrame

A DataFrame is a two-dimensional labeled data structure whose columns can have different types. You can think of it as a spreadsheet, an SQL table, or a dictionary of Series objects. It is generally the most commonly used pandas object.
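Because a DataFrame behaves like a dictionary of Series, each column keeps its own dtype. A minimal sketch with made-up column names:

```python
import pandas as pd

# Each column is a separate Series with its own dtype
df = pd.DataFrame({
    'name': ['a', 'b', 'c'],
    'count': [1, 2, 3],
    'ratio': [0.5, 0.25, 0.125],
})

print(df.dtypes)
print(type(df['count']))   # <class 'pandas.core.series.Series'>
```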

3.2.1 Create a DataFrame

Create from a dictionary of lists (each key becomes a column):

data = {
    'col1': [1, 2, 3],
    'col2': [4, 5, 6],
    'col3': [7, 8, 9]
}

pd.DataFrame(data)

   col1  col2  col3
0     1     4     7
1     2     5     8
2     3     6     9
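Since every list becomes one column, all the lists must have the same length; the columns parameter can also select and reorder the keys. A minimal sketch (the flag name is illustrative):

```python
import pandas as pd

# Lists of different lengths cannot form aligned columns
mismatch_raised = False
try:
    pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5]})
except ValueError as err:
    mismatch_raised = True
    print("ValueError:", err)

# columns picks which keys to use and in what order
df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]}, columns=['col2', 'col1'])
print(df.columns.tolist())   # ['col2', 'col1']
```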

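Create from a dictionary of Series. The row index is the union of the Series indexes; labels missing from a Series are filled with NaN, which is why the columns are converted to floats: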

s1 = pd.Series([1, 2, 3], index=['row1', 'row2', 'row3'])
s2 = pd.Series([4, 5, 6], index=['row2', 'row3', 'row4'])
s3 = pd.Series([7, 8, 9], index=['row3', 'row4', 'row5'])

data = {
    'col1': s1,
    'col2': s2,
    'col3': s3
}

pd.DataFrame(data)

      col1  col2  col3
row1   1.0   NaN   NaN
row2   2.0   4.0   NaN
row3   3.0   5.0   7.0
row4   NaN   6.0   8.0
row5   NaN   NaN   9.0
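The NaN values produced by index alignment can be counted and dropped. A short sketch reusing the same three Series:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=['row1', 'row2', 'row3'])
s2 = pd.Series([4, 5, 6], index=['row2', 'row3', 'row4'])
s3 = pd.Series([7, 8, 9], index=['row3', 'row4', 'row5'])

df = pd.DataFrame({'col1': s1, 'col2': s2, 'col3': s3})

# Count the missing values in each column
print(df.isna().sum().tolist())   # [2, 2, 2]

# Keep only the rows where every column has a value (here just row3)
print(df.dropna())
```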

Create from a list of dictionaries (each dictionary becomes a row):

data = [
    {'col1': 1, 'col2': 2, 'col3': 3},
    {'col1': 2, 'col2': 3, 'col3': 4},
    {'col1': 3, 'col2': 4, 'col3': 5}
]

pd.DataFrame(data, index=['row1', 'row2', 'row3'])

      col1  col2  col3
row1     1     2     3
row2     2     3     4
row3     3     4     5
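The dictionaries do not need identical keys: the columns are the union of all keys, and a key missing from a dictionary becomes NaN in that row. A small sketch with illustrative data:

```python
import pandas as pd

# The first row lacks 'col3' and the second lacks 'col2'
data = [
    {'col1': 1, 'col2': 2},
    {'col1': 3, 'col3': 4},
]
df = pd.DataFrame(data)
print(df)
```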

Create from a two-dimensional list (each inner list becomes a row):

data = [
    [1, 2, 3],
    [2, 3, 4],
    [3, 4, 5]
]

pd.DataFrame(data, index=['row1', 'row2', 'row3'], columns=['col1', 'col2', 'col3'])

      col1  col2  col3
row1     1     2     3
row2     2     3     4
row3     3     4     5
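Since pandas is built on NumPy, a two-dimensional NumPy array can be passed in place of the nested list. A minimal sketch, assuming NumPy is available (it is installed as a pandas dependency):

```python
import numpy as np
import pandas as pd

# A 3x3 array holding 1..9 row by row
arr = np.arange(1, 10).reshape(3, 3)
df = pd.DataFrame(arr, index=['row1', 'row2', 'row3'], columns=['col1', 'col2', 'col3'])
print(df)

# to_numpy() converts the values back into an array
print((df.to_numpy() == arr).all())   # True
```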

3.2.2 Access a DataFrame

df = pd.DataFrame([[1, 4, 7], [2, 5, 8], [3, 6, 9]],
                  index=['row1', 'row2', 'row3'],
                  columns=['col1', 'col2', 'col3'])
df

      col1  col2  col3
row1     1     4     7
row2     2     5     8
row3     3     6     9

Access a column by its label (the result is a Series):

df['col1']

row1    1
row2    2
row3    3
Name: col1, dtype: int64
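Passing a list of labels instead of a single label selects several columns at once and returns a DataFrame rather than a Series. A short sketch on the same frame:

```python
import pandas as pd

df = pd.DataFrame([[1, 4, 7], [2, 5, 8], [3, 6, 9]],
                  index=['row1', 'row2', 'row3'],
                  columns=['col1', 'col2', 'col3'])

# A list of column labels returns a DataFrame with those columns, in that order
sub = df[['col1', 'col3']]
print(type(sub).__name__)   # DataFrame
print(sub)
```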

Access a row by its label with loc:

df.loc['row1']

col1    1
col2    4
col3    7
Name: row1, dtype: int64

Access a row by its integer position with iloc:

df.iloc[0]

col1    1
col2    4
col3    7
Name: row1, dtype: int64
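Both loc and iloc also accept a row key and a column key together, which selects a single value; at and iat are the scalar-only variants. A minimal sketch on the same frame:

```python
import pandas as pd

df = pd.DataFrame([[1, 4, 7], [2, 5, 8], [3, 6, 9]],
                  index=['row1', 'row2', 'row3'],
                  columns=['col1', 'col2', 'col3'])

# Row label plus column label
print(df.loc['row2', 'col3'])   # 8
# The same cell by integer positions (second row, third column)
print(df.iloc[1, 2])            # 8
# Scalar-only equivalents of loc and iloc
print(df.at['row2', 'col3'], df.iat[1, 2])   # 8 8
```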

Select rows by slicing (a slice inside [] selects rows, not columns):

df[1:]

      col1  col2  col3
row2     2     5     8
row3     3     6     9
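One detail worth knowing: slicing by position excludes the end point, as with Python lists, while slicing by label with loc includes it. A short sketch on the same frame:

```python
import pandas as pd

df = pd.DataFrame([[1, 4, 7], [2, 5, 8], [3, 6, 9]],
                  index=['row1', 'row2', 'row3'],
                  columns=['col1', 'col2', 'col3'])

# Positional slice: end position excluded
print(df[0:1].index.tolist())                 # ['row1']
# Label slice with loc: end label included
print(df.loc['row1':'row2'].index.tolist())   # ['row1', 'row2']
# loc can slice rows and select columns in one step
print(df.loc['row2':, ['col1', 'col3']])
```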

3.2.3 Transpose a DataFrame

Transposing swaps rows and columns, just like transposing a matrix in linear algebra.

df.T

      row1  row2  row3
col1     1     2     3
col2     4     5     6
col3     7     8     9
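The row and column labels swap along with the values, which is easier to see on a non-square frame. A small sketch with illustrative data:

```python
import pandas as pd

# 2 rows x 3 columns, so the shape change is visible
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]],
                  index=['row1', 'row2'],
                  columns=['col1', 'col2', 'col3'])

t = df.T  # shorthand for df.transpose()
print(df.shape, t.shape)      # (2, 3) (3, 2)
print(t.index.tolist())       # ['col1', 'col2', 'col3']

# Transposing twice gives back the original frame
print(t.T.equals(df))         # True
```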


  • Personal website: Kenblog.top
  • GitHub site: kenblikylee.github.io
