Introduction to the

This article explains the two basic pandas data types, Series and DataFrame, and shows how to create and index each of them.

To use pandas, import the following libraries:

In [1]: import numpy as np

In [2]: import pandas as pd

Series

A Series is a one-dimensional labeled array. The axis labels are collectively called the index. A Series is created as follows:

>>> s = pd.Series(data, index=index)

Here data can be a Python dict, a NumPy ndarray, or a scalar.

index is a list of axis labels. Let's look at each way of creating a Series.

Creating from an ndarray

s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])

s
Out[67]: 
a   -1.300797
b   -2.044172
c   -1.170739
d   -0.445290
e    1.208784
dtype: float64

The labels are available through the index attribute:

s.index
Out[68]: Index(['a', 'b', 'c', 'd', 'e'], dtype='object')

Creating from a dict

d = {'b': 1, 'a': 0, 'c': 2}

pd.Series(d)
Out[70]: 
b    1
a    0
c    2
dtype: int64
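
As a hedged sketch of two details of dict input: in recent pandas versions (0.23 or later, on Python 3.6 or later) a Series built from a dict keeps the dict's insertion order, and passing index selects values by label, leaving NaN for labels that are not keys of the dict. The label 'x' is made up for illustration.

```python
import numpy as np
import pandas as pd

d = {'b': 1, 'a': 0, 'c': 2}

# Without an explicit index, the labels follow the dict's insertion order.
s_default = pd.Series(d)

# With an explicit index, values are looked up by label.
# 'x' (a hypothetical label) is not a key of d, so its value is NaN
# and the dtype becomes float64.
s_picked = pd.Series(d, index=['c', 'x', 'a'])

print(s_default.index.tolist())
print(s_picked)
```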

Creating from a scalar

pd.Series(5., index=['a', 'b', 'c', 'd', 'e'])
Out[71]: 
a    5.0
b    5.0
c    5.0
d    5.0
e    5.0
dtype: float64

Series and ndarray

A Series behaves much like an ndarray. Indexing a Series by position, by slice, or by Boolean mask works the same way as on an ndarray:

s[0]
Out[72]: -1.3007972194268396

s[:3]
Out[73]: 
a   -1.300797
b   -2.044172
c   -1.170739
dtype: float64

s[s > s.median()]
Out[74]: 
d   -0.445290
e    1.208784
dtype: float64

s[[4, 3, 1]]
Out[75]: 
e    1.208784
d   -0.445290
b   -2.044172
dtype: float64
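
A minimal sketch of the same positional operations written with .iloc, which always selects by position. Recent pandas releases deprecate treating an integer key such as s[0] as a position when the index holds labels, so .iloc is the more explicit spelling. Fixed values stand in for the random data used in the article.

```python
import pandas as pd

# Fixed illustrative values instead of np.random.randn(5).
s = pd.Series([-1.3, -2.0, -1.2, -0.4, 1.2], index=['a', 'b', 'c', 'd', 'e'])

first = s.iloc[0]            # scalar at position 0
head = s.iloc[:3]            # positions 0, 1, 2
picked = s.iloc[[4, 3, 1]]   # arbitrary positions, in the order given
above = s[s > s.median()]    # Boolean mask, as with an ndarray

print(first)
print(head.index.tolist(), picked.index.tolist(), above.index.tolist())
```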

Series and dict

If you access a Series using a label, it behaves like a dict:

s['a']
Out[80]: -1.3007972194268396

s['e'] = 12.

s
Out[82]: 
a    -1.300797
b    -2.044172
c    -1.170739
d    -0.445290
e    12.000000
dtype: float64
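
The dict analogy extends a little further. The sketch below, with made-up labels and values, shows membership tests, lookups with a default, and assignment to a new label.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0], index=['a', 'b', 'c'])

has_a = 'a' in s              # membership tests the index labels, not the values
has_z = 'z' in s

missing = s.get('z')          # returns None instead of raising KeyError
fallback = s.get('z', -1.0)   # or a caller-supplied default

s['d'] = 4.0                  # assigning to a new label adds an entry

print(has_a, has_z, missing, fallback)
print(s.index.tolist())
```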

Vectorization and label alignment

A Series supports vectorized operations, so you usually do not need to loop over its values:

s + s
Out[83]: 
a    -2.601594
b    -4.088344
c    -2.341477
d    -0.890581
e    24.000000
dtype: float64

s * 2
Out[84]: 
a    -2.601594
b    -4.088344
c    -2.341477
d    -0.890581
e    24.000000
dtype: float64

np.exp(s)
Out[85]: 
a         0.272315
b         0.129487
c         0.310138
d         0.640638
e    162754.791419
dtype: float64
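
The heading also mentions label alignment. As a small sketch with made-up values: when two Series with different labels are combined, pandas matches values by label rather than by position, and labels found on only one side produce NaN.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0], index=['a', 'b', 'c', 'd'])

doubled = s * 2          # element-wise, no explicit loop
roots = np.sqrt(s)       # NumPy ufuncs return a Series with the same index

# The operands share only 'b' and 'c'. The result covers the union of the
# labels, and labels present on one side only become NaN.
aligned = s.iloc[1:] + s.iloc[:-1]

print(aligned)
```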

The name attribute

A Series also has a name attribute, which can be set at creation time:

s = pd.Series(np.random.randn(5), name='something')

s
Out[88]: 
0    0.192272
1    0.110410
2    1.442358
3   -0.375792
4    1.228111
Name: something, dtype: float64

A Series also has a rename method, which returns a new Series with a different name:

s2 = s.rename("different")
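
A short sketch, using made-up values, confirming that rename with a scalar argument leaves the original Series untouched, and showing one place the name is used: it becomes the column label when the Series is turned into a DataFrame.

```python
import pandas as pd

s = pd.Series([1, 2, 3], name='something')

# With a scalar argument, rename returns a new Series carrying the new name.
s2 = s.rename('different')

# The name becomes the column label when converting to a DataFrame.
frame = s.to_frame()

print(s.name, s2.name, frame.columns.tolist())
```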

DataFrame

A DataFrame is a two-dimensional labeled data structure whose columns are Series. You can think of it as a spreadsheet, such as an Excel table. A DataFrame can be created from the following kinds of data:

  • A dict of one-dimensional ndarrays, lists, dicts, or Series
  • A structured array
  • A two-dimensional numpy.ndarray
  • Another DataFrame

Creating from Series

A DataFrame can be created from a dict of Series:

d = {'one': pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
     'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)

df
Out[92]: 
   one  two
a  1.0  1.0
b  2.0  2.0
c  3.0  3.0
d  NaN  4.0

Passing index selects and reorders the rows:

pd.DataFrame(d, index=['d', 'b', 'a'])
Out[93]: 
   one  two
d  NaN  4.0
b  2.0  2.0
a  1.0  1.0

Passing columns selects and reorders the columns. A column that does not exist in the data is filled with NaN:

pd.DataFrame(d, index=['d', 'b', 'a'], columns=['two', 'three'])
Out[94]: 
   two three
d  4.0   NaN
b  2.0   NaN
a  1.0   NaN
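
The following sketch repeats the dict-of-Series construction and checks the two behaviours described in this section: without an explicit index the row labels are the union of the Series indexes, and the index and columns arguments pick out (and reorder) a subset of the data.

```python
import numpy as np
import pandas as pd

d = {'one': pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
     'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}

# Row labels are the union of both indexes; 'one' has no value for 'd'.
df = pd.DataFrame(d)

# Keep three rows in the given order, keep 'two', and request a column
# ('three') that is absent from d, which comes back as all NaN.
subset = pd.DataFrame(d, index=['d', 'b', 'a'], columns=['two', 'three'])

print(df.index.tolist(), df.columns.tolist())
print(subset)
```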

Creating from ndarrays and lists

d = {'one': [1., 2., 3., 4.],'two': [4., 3., 2., 1.]}

pd.DataFrame(d)
Out[96]: 
   one  two
0  1.0  4.0
1  2.0  3.0
2  3.0  2.0
3  4.0  1.0

pd.DataFrame(d, index=['a', 'b', 'c', 'd'])
Out[97]: 
   one  two
a  1.0  4.0
b  2.0  3.0
c  3.0  2.0
d  4.0  1.0
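
The list of accepted inputs at the start of this section also includes a two-dimensional numpy.ndarray. A minimal sketch, assuming a small made-up array: each row of the array becomes a row of the DataFrame, and when no labels are passed both axes get default integer labels.

```python
import numpy as np
import pandas as pd

arr = np.arange(6).reshape(3, 2)   # [[0, 1], [2, 3], [4, 5]]

# Without labels, rows and columns are numbered from 0.
plain = pd.DataFrame(arr)

# Labels can be supplied for both axes.
labeled = pd.DataFrame(arr, index=['a', 'b', 'c'], columns=['one', 'two'])

print(plain.columns.tolist())
print(labeled)
```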

Creating from a structured array

A DataFrame can be created from a structured array:

In [47]: data = np.zeros((2, ), dtype=[('A', 'i4'), ('B', 'f4'), ('C', 'a10')])

In [48]: data[:] = [(1, 2., 'Hello'), (2, 3., "World")]

In [49]: pd.DataFrame(data)
Out[49]: 
   A    B         C
0  1  2.0  b'Hello'
1  2  3.0  b'World'

In [50]: pd.DataFrame(data, index=['first', 'second'])
Out[50]: 
        A    B         C
first   1  2.0  b'Hello'
second  2  3.0  b'World'

In [51]: pd.DataFrame(data, columns=['C', 'A', 'B'])
Out[51]: 
          C  A    B
0  b'Hello'  1  2.0
1  b'World'  2  3.0

Creating from a list of dicts

In [52]: data2 = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]

In [53]: pd.DataFrame(data2)
Out[53]: 
   a   b     c
0  1   2   NaN
1  5  10  20.0

In [54]: pd.DataFrame(data2, index=['first', 'second'])
Out[54]: 
        a   b     c
first   1   2   NaN
second  5  10  20.0

In [55]: pd.DataFrame(data2, columns=['a', 'b'])
Out[55]: 
   a   b
0  1   2
1  5  10

Creating from a dict of tuples

A DataFrame with multi-level row and column labels can be created from a dict whose keys are tuples:

In [56]: pd.DataFrame({('a', 'b'): {('A', 'B'): 1, ('A', 'C'): 2},
   ....:               ('a', 'a'): {('A', 'C'): 3, ('A', 'B'): 4},
   ....:               ('a', 'c'): {('A', 'B'): 5, ('A', 'C'): 6},
   ....:               ('b', 'a'): {('A', 'C'): 7, ('A', 'B'): 8},
   ....:               ('b', 'b'): {('A', 'D'): 9, ('A', 'B'): 10}})
   ....: 
Out[56]: 
       a              b      
       b    a    c    a     b
A B  1.0  4.0  5.0  8.0  10.0
  C  2.0  3.0  6.0  7.0   NaN
  D  NaN  NaN  NaN  NaN   9.0
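
To show what the tuple keys turn into, here is a hedged sketch on a reduced version of the data above (three columns, two rows). The tuples become the levels of a MultiIndex on each axis, which lets you select a whole group of columns by its outer label or a single cell by a pair of full tuples.

```python
import pandas as pd

df = pd.DataFrame({('a', 'b'): {('A', 'B'): 1, ('A', 'C'): 2},
                   ('a', 'a'): {('A', 'C'): 3, ('A', 'B'): 4},
                   ('b', 'a'): {('A', 'C'): 7, ('A', 'B'): 8}})

# Both axes carry a two-level MultiIndex built from the tuple keys.
print(type(df.columns).__name__, df.columns.nlevels)
print(type(df.index).__name__, df.index.nlevels)

# Selecting the outer column label 'a' returns the sub-frame below it.
sub = df['a']

# A (row tuple, column tuple) pair addresses a single cell.
cell = df.loc[('A', 'C'), ('a', 'a')]

print(sub)
print(cell)
```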

Column selection, addition, and deletion

A DataFrame can be treated like a dict of Series: columns are read, assigned, and deleted with dict-like syntax:

In [64]: df['one']
Out[64]: 
a    1.0
b    2.0
c    3.0
d    NaN
Name: one, dtype: float64

In [65]: df['three'] = df['one'] * df['two']

In [66]: df['flag'] = df['one'] > 2

In [67]: df
Out[67]: 
   one  two  three   flag
a  1.0  1.0    1.0  False
b  2.0  2.0    4.0  False
c  3.0  3.0    9.0   True
d  NaN  4.0    NaN  False
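
Row d in the output above is worth a closer look. The sketch below, on a smaller made-up frame, shows the two rules at work: NaN propagates through arithmetic, while a comparison against NaN evaluates to False.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'one': [1.0, 3.0, np.nan], 'two': [1.0, 3.0, 4.0]},
                  index=['a', 'b', 'c'])

# Arithmetic is element-wise; NaN in either operand gives NaN.
df['three'] = df['one'] * df['two']

# Comparisons are element-wise too; NaN > 2 is False, not NaN.
df['flag'] = df['one'] > 2

print(df)
```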

Columns can be deleted with del, or removed and returned with pop:

In [68]: del df['two']

In [69]: three = df.pop('three')

In [70]: df
Out[70]: 
   one   flag
a  1.0  False
b  2.0  False
c  3.0   True
d  NaN  False

When a scalar is assigned to a column, it is broadcast to fill the entire column:

In [71]: df['foo'] = 'bar'

In [72]: df
Out[72]: 
   one   flag  foo
a  1.0  False  bar
b  2.0  False  bar
c  3.0   True  bar
d  NaN  False  bar

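When a Series that does not cover the whole index is assigned, it is aligned to the DataFrame's index and the missing rows become NaN. By default a new column is appended as the last column. Use insert to place a column at a specific position:

In [73]: df['one_trunc'] = df['one'][:2]

In [75]: df.insert(1, 'bar', df['one'])

In [76]: df
Out[76]: 
   one  bar   flag  foo  one_trunc
a  1.0  1.0  False  bar        1.0
b  2.0  2.0  False  bar        2.0
c  3.0  3.0   True  bar        NaN
d  NaN  NaN  False  bar        NaN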
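
Two properties of insert are easy to miss, so here is a small sketch on a made-up frame: the call modifies the DataFrame in place (it returns None), and by default it refuses to insert a column whose name already exists.

```python
import pandas as pd

df = pd.DataFrame({'one': [1.0, 2.0, 3.0], 'flag': [False, False, True]})

# Insert a copy of 'one' at position 1; df itself is modified.
result = df.insert(1, 'bar', df['one'])

# Inserting a second column named 'bar' raises ValueError by default.
try:
    df.insert(0, 'bar', 0)
    duplicate_error = None
except ValueError as exc:
    duplicate_error = str(exc)

print(df.columns.tolist())
print(duplicate_error)
```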

Use assign to derive new columns from existing columns:

In [77]: iris = pd.read_csv('data/iris.data')

In [78]: iris.head()
Out[78]: 
   SepalLength  SepalWidth  PetalLength  PetalWidth         Name
0          5.1         3.5          1.4         0.2  Iris-setosa
1          4.9         3.0          1.4         0.2  Iris-setosa
2          4.7         3.2          1.3         0.2  Iris-setosa
3          4.6         3.1          1.5         0.2  Iris-setosa
4          5.0         3.6          1.4         0.2  Iris-setosa

In [79]: (iris.assign(sepal_ratio=iris['SepalWidth'] / iris['SepalLength'])
   ....:      .head())
   ....: 
Out[79]: 
   SepalLength  SepalWidth  PetalLength  PetalWidth         Name  sepal_ratio
0          5.1         3.5          1.4         0.2  Iris-setosa     0.686275
1          4.9         3.0          1.4         0.2  Iris-setosa     0.612245
2          4.7         3.2          1.3         0.2  Iris-setosa     0.680851
3          4.6         3.1          1.5         0.2  Iris-setosa     0.673913
4          5.0         3.6          1.4         0.2  Iris-setosa     0.720000

Note that assign returns a new DataFrame and leaves the original unchanged.
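
A self-contained sketch of the same idea that does not depend on data/iris.data: three rows are typed in by hand, and the new column is given as a callable, which assign evaluates against the DataFrame it is called on. The check at the end confirms that the original frame has no sepal_ratio column.

```python
import pandas as pd

# Hand-typed rows standing in for the CSV file used in the article.
iris = pd.DataFrame({'SepalLength': [5.1, 4.9, 4.7],
                     'SepalWidth': [3.5, 3.0, 3.2]})

# The callable receives the DataFrame, which is convenient in method chains.
with_ratio = iris.assign(sepal_ratio=lambda x: x['SepalWidth'] / x['SepalLength'])

print(with_ratio.columns.tolist())
print(iris.columns.tolist())   # the original is unchanged
```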

The following table summarizes indexing and selection on a DataFrame:

Operation                         Syntax          Result
Select a column                   df[col]         Series
Select a row by label             df.loc[label]   Series
Select a row by integer position  df.iloc[loc]    Series
Slice rows                        df[5:10]        DataFrame
Select rows by Boolean vector     df[bool_vec]    DataFrame
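
Each row of the table can be tried on a small frame. The sketch below uses made-up data and a shorter slice (1:3 rather than 5:10) so that it fits a four-row example.

```python
import pandas as pd

df = pd.DataFrame({'one': [1.0, 2.0, 3.0, 4.0], 'two': [4.0, 3.0, 2.0, 1.0]},
                  index=['a', 'b', 'c', 'd'])

col = df['one']               # column by name -> Series
row_label = df.loc['b']       # row by label -> Series
row_pos = df.iloc[2]          # row by integer position -> Series
rows = df[1:3]                # row slice -> DataFrame
masked = df[df['one'] > 2]    # Boolean vector -> DataFrame

for result in (col, row_label, row_pos, rows, masked):
    print(type(result).__name__)
```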

This article is available at www.flydean.com/03-python-p…
