What is pandas, and what is it used for?

Pandas is an open source, BSD-licensed library that provides high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

How do I use pandas?

To use a library, we first need to import it into our project. We import pandas along with NumPy, a Python library that provides arrays and mathematical functions for array operations.

import numpy as np
import pandas as pd

Pandas’ primary data structures are Series and DataFrame. A Series is a one-dimensional labeled array, much like a single column of data. A DataFrame is a two-dimensional labeled table, similar to a table in a relational database.

Create a Series object:

s = pd.Series([1, 3, 5, np.nan, 6, 8])
s
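
The Series above contains a missing value, np.nan. As a small sketch of how that value behaves (the names missing_count and total are illustrative and not from the original), the following shows that np.nan makes the whole Series floating point and that sum() skips it by default:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 3, 5, np.nan, 6, 8])

# np.nan is a float, so the integers are stored as float64 as well
print(s.dtype)

# isna() flags missing entries; sum() ignores them by default
missing_count = int(s.isna().sum())
total = s.sum()
print(missing_count, total)
```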

Create a DataFrame from a NumPy array, with a datetime index and labeled columns:

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
df

Create a DataFrame from a dict of objects:

df2 = pd.DataFrame({'A': 1.,
                    'B': pd.Timestamp('20130102'),
                    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
                    'D': np.array([3] * 4, dtype='int32'),
                    'E': pd.Categorical(["test", "train", "test", "train"]),
                    'F': 'foo'})
df2
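
A DataFrame built from a dict can hold a different dtype in each column. The sketch below rebuilds the same frame and inspects it; it assumes nothing beyond the constructor shown above, and the exact dtype reported for the timestamp and string columns may vary between pandas versions:

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({'A': 1.,
                    'B': pd.Timestamp('20130102'),
                    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
                    'D': np.array([3] * 4, dtype='int32'),
                    'E': pd.Categorical(["test", "train", "test", "train"]),
                    'F': 'foo'})

# Scalars such as 1. and 'foo' are repeated for all four rows
print(df2.shape)

# Each column keeps its own dtype
print(df2.dtypes)
```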

Display the data

# display the first 5 entries of df
df.head()
# display the last three values of df
df.tail(3)
# show the index of df
df.index
# display all columns in df
df.columns
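
As a rough illustration of what these calls return (the variable names first_rows, last_rows, and column_names are made up for this sketch), head() defaults to five rows, tail(3) returns the last three, and columns holds the column labels:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))

first_rows = df.head()       # first 5 rows by default
last_rows = df.tail(3)       # last 3 rows
column_names = list(df.columns)

print(len(first_rows), len(last_rows), column_names)
```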

Get partial data

# Get the data in column A
df['A']
# Get the rows at positions 0, 1, and 2
df[0:3]
# Get the rows in a date range
df['20130102':'20130104']
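
The three forms above return different things. A minimal sketch, assuming the same df as earlier (the result names are illustrative): a single column label gives a Series, while both slices select rows and give a DataFrame.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))

col_a = df['A']                       # one column label -> Series
first_rows = df[0:3]                  # integer slice -> rows by position
by_date = df['20130102':'20130104']   # label slice -> rows by index label

print(type(col_a).__name__, first_rows.shape, by_date.shape)
```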

Get data based on the label

df.loc[:, ['A', 'B']]
df.loc['20130102':'20130104', ['A', 'B']]
df.loc['20130102', ['A', 'B']]
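
One detail worth seeing in action: label slices with loc include both endpoints. The sketch below assumes the same df as earlier; the names by_label and one_row are illustrative, and the single row is looked up with dates[1] (the Timestamp for 2013-01-02) rather than a string to keep the example unambiguous.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))

# Both '20130102' and '20130104' are included: three rows
by_label = df.loc['20130102':'20130104', ['A', 'B']]

# A single row label with a list of columns returns a Series
one_row = df.loc[dates[1], ['A', 'B']]

print(by_label.shape, one_row.shape)
```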

Get data based on integer position

df.iloc[3]
df.iloc[3:5, 0:2]
df.iloc[[1, 2, 4], [0, 2]]
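
Positional slices with iloc follow normal Python slicing, so the end position is excluded. A short sketch under the same assumptions as the earlier df (row, block, and picked are illustrative names):

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))

row = df.iloc[3]                      # fourth row, returned as a Series
block = df.iloc[3:5, 0:2]             # rows 3-4 and columns 0-1 (ends excluded)
picked = df.iloc[[1, 2, 4], [0, 2]]   # explicit lists of row and column positions

print(row.name, block.shape, list(picked.columns))
```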

Add a new column of data F

s1 = pd.Series([1, 2, 3, 4, 5, 6], index=pd.date_range('20130102', periods=6))
df['F'] = s1
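
Note that s1 starts one day later than df. Assigning a Series as a column aligns it on the index, so the sketch below (using the same df as earlier) ends up with NaN in the first row of F, and the value for 2013-01-07 is dropped because df has no such row:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))

# s1 covers 2013-01-02 .. 2013-01-07, df covers 2013-01-01 .. 2013-01-06
s1 = pd.Series([1, 2, 3, 4, 5, 6], index=pd.date_range('20130102', periods=6))
df['F'] = s1

# Values are matched by date, not by position
print(df['F'])
```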

For more complex operations, you can pass a lambda function to the apply method:

df.apply(lambda x: x.max() - x.min())
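
By default, apply calls the function once per column; passing axis=1 calls it once per row instead. A minimal sketch on a small made-up frame (the data and the names col_range and row_range are hypothetical, chosen so the results are easy to check by hand):

```python
import pandas as pd

# Hypothetical data, not the random df used elsewhere in this article
df = pd.DataFrame({'A': [1, 5, 3], 'B': [10, 20, 40]})

# Range (max - min) of each column
col_range = df.apply(lambda x: x.max() - x.min())

# Range (max - min) of each row
row_range = df.apply(lambda x: x.max() - x.min(), axis=1)

print(col_range.tolist(), row_range.tolist())
```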

Working with CSV files

# write data from df to foo.csv file
df.to_csv('foo.csv')
# Read data from foo.csv file
pd.read_csv('foo.csv')
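
When a frame is written with to_csv, its index becomes the first column of the file. The sketch below writes to a temporary directory (an assumption made only to keep the example self-contained) and compares a plain read_csv with one that uses index_col=0 and parse_dates=True to restore the datetime index:

```python
import os
import tempfile

import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'foo.csv')
    df.to_csv(path)

    # Plain read: the old index comes back as an ordinary, unnamed column
    plain = pd.read_csv(path)

    # Use the first column as the index and parse it as dates
    restored = pd.read_csv(path, index_col=0, parse_dates=True)

print(plain.shape, restored.shape)
```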

Working with Excel files

# Write the data in df to foo.xlsx (requires an Excel engine such as openpyxl)
df.to_excel('foo.xlsx', sheet_name='Sheet1')
# Read data from foo.xlsx
pd.read_excel('foo.xlsx', 'Sheet1', index_col=None, na_values=['NA'])

In this article, we introduced some basic ways to use pandas to handle data. For more information about the library’s APIs, see the pandas tutorial. We will cover more of the commonly used APIs later, in hands-on examples.

A powerful tool for running machine learning code in the browser is Google’s Colab, which requires no installation and is very simple to use. Address: colab.research.google.com

PS: Clear mountains and clear waters begin with dust, knowledge is more valuable than diligence. I have the wine. Do you have a story? WeChat official account: “Clean the dust chat”. You are welcome to come and chat about code.