Public account: You and the cabin | Author: Peter | Editor: Peter

Pandas series: DataFrame data filtering

Pandas has a wide variety of methods for filtering data. In this article, we will focus on the following methods:

  • Expression-based selection
  • query, eval
  • filter
  • where, mask

Further reading

For other articles in this pandas series, see:

1. DataFrame data filtering

2. 10 ways to create DataFrame data

3. Create Series type data

4. It all starts with the explode function

Simulated data

The simulated data below is used throughout the article. It has 7 fields: name, sex, age, math score, Chinese score, total score, and address.

import pandas as pd
import numpy as np

df = pd.DataFrame({
    "name": ['Ming', 'wang', 'zhang fei', 'GuanYu', 'Sun Xiaoxiao', 'Wang Jianguo', 'pei liu'],
    "sex": ['male', 'female', 'female', 'male', 'female', 'male', 'female'],
    "age": [20, 23, 18, 21, 25, 21, 24],
    "math": [120, 130, 118, 120, 102, 140, 134],
    "chinese": [100, 130, 140, 120, 149, 111, 118],
    "score": [590, 600, 550, 620, 610, 580, 634],
    "address": ["Nanshan District, Shenzhen City, Guangdong Province", "Haidian District, Beijing",
                "Yuhua District, Changsha City, Hunan Province", "Dongcheng District, Beijing",
                "Baiyun District, Guangzhou City, Guangdong Province", "Jiangxia District, Wuhan City, Hubei Province",
                "Longhua District, Shenzhen City, Guangdong Province"]
})

df

Here are the five selection methods:

  1. Expression-based selection
  2. query()
  3. eval()
  4. filter()
  5. where() / mask()

Expression-based selection

Expression-based selection means indexing the DataFrame with one or more boolean expressions that act as filter conditions.

1. Specify a mathematical expression

# 1. Mathematical expressions
df[df['math'] > 125]

2. Negation

Negation is written with the ~ operator.

# 2. Negation
df[~(df['sex'] == 'male')]  # Rows where sex is not 'male'

3. Match a column against a specific value

# 3. Match a specific value
df[df.sex == 'male']  # Same as df[df['sex'] == 'male']

4. Compare two columns

# 4. Compare two columns
df[df['math'] > df['chinese']]

5. Logical operators

# 5. Logical operators
df[(df['math'] > 120) & (df['chinese'] < 140)]
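Each condition inside the brackets evaluates to a boolean Series with one True/False value per row, and indexing with that Series keeps the True rows. The sketch below shows this on a reduced copy of the simulated data. The `|` (or) example and the variable names `mask` and `either` are illustrative additions, not part of the original examples.

```python
import pandas as pd

# Reduced copy of the simulated data (first four rows, fewer columns)
df = pd.DataFrame({
    "name": ["Ming", "wang", "zhang fei", "GuanYu"],
    "sex": ["male", "female", "female", "male"],
    "math": [120, 130, 118, 120],
    "chinese": [100, 130, 140, 120],
})

# A comparison on a column produces a boolean Series, one value per row
mask = df["math"] > 125
print(mask.tolist())

# Indexing with the boolean Series keeps only the rows marked True
print(df[mask])

# Conditions combine with & (and) and | (or); each comparison needs its
# own parentheses because & and | bind more tightly than > and <
either = df[(df["math"] > 125) | (df["chinese"] > 135)]
print(either)
```

Storing the mask in a variable first makes it easy to inspect or reuse a condition before applying it.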

The query() function

Directions for use

⚠️ Note: if a column name contains a space, enclose the name in backticks inside the query string.
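A minimal sketch of the backtick rule. The simulated data has no column name with a space, so the `total score` column and the `scores` frame here are hypothetical, and the example assumes a reasonably recent pandas version with backtick support in query.

```python
import pandas as pd

# Hypothetical frame with a space in one column name
scores = pd.DataFrame({
    "name": ["Ming", "wang", "zhang fei"],
    "total score": [590, 600, 550],
})

# Backticks let query() treat "total score" as a single column name
high = scores.query("`total score` >= 590")
print(high)
```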

Use case

1. Use numeric expressions

df.query('math > chinese > 110')

df.query('math + chinese > 255')

df.query('math == chinese')

df.query('math == chinese > 120')

df.query('(math > 110) and (chinese < 135)')  # Two inequalities

2. Use string expressions

df.query('sex != "female"')  # Not female, i.e. all male rows

df.query('sex not in ["female"]')  # Not in the list, i.e. all male rows

df.query('sex in ["male", "female"]')   # sex is male or female, i.e. every row

3. Pass in variables. Inside the expression, prefix a variable name with @.

# Set a variable
a = df.math.mean()
a

df.query('math > @a + 10')

df.query('math < (`chinese` + @a) / 2')

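The eval() function

eval accepts the same expression syntax as query. The difference is that it returns the value of the expression (a boolean Series for a comparison) rather than the filtered rows, so the result is used to index the DataFrame.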

1. Use numeric expressions

# 1. Numeric expressions
df.eval('math > 125')   # Returns a boolean Series

df[df.eval('math > 125')]

df[df.eval('math > 125 and chinese < 130')]

2. String expressions

# 2. String expressions
df[df.eval('sex in ["male"]')]

3. Use variables

# 3. Use variables
b = df.chinese.mean()  # Mean of the chinese column
df[df.eval('math < @b+5')]
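To make the relationship between the two functions concrete, the sketch below runs one expression through both eval and query on a reduced copy of the simulated data and checks that the selected rows agree. The variable names `expr`, `via_eval`, and `via_query` are illustrative.

```python
import pandas as pd

# Reduced copy of the simulated data (first four rows, fewer columns)
df = pd.DataFrame({
    "name": ["Ming", "wang", "zhang fei", "GuanYu"],
    "math": [120, 130, 118, 120],
    "chinese": [100, 130, 140, 120],
})

expr = "math > 125 and chinese < 135"

# eval returns the boolean Series produced by the expression
mask = df.eval(expr)
print(mask.tolist())

# Indexing with that Series and calling query directly select the same rows
via_eval = df[mask]
via_query = df.query(expr)
print(via_eval.equals(via_query))
```

In practice query is the shorter spelling when only the filtered rows are needed, while eval is useful when the mask itself is needed.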

The filter function

filter selects rows or columns by their labels (column names or index labels), not by the values they contain. The labels can be specified in three ways:

  • Exact labels (items)
  • Regular expression (regex)
  • Substring match (like)

axis=1 filters on column names; axis=0 filters on the row index.

Directions for use
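A short sketch of two points that are easy to miss: filter looks only at labels, never at cell values, and the items, like, and regex arguments cannot be combined in a single call. It uses a reduced copy of the simulated data, and the variable names are illustrative.

```python
import pandas as pd

# Reduced copy of the simulated data
df = pd.DataFrame({
    "name": ["Ming", "wang", "zhang fei"],
    "sex": ["male", "female", "female"],
    "score": [590, 600, 550],
})

# Matches column NAMES containing "s" (columns is the default axis for a DataFrame)
by_label = df.filter(like="s")
print(list(by_label.columns))

# "male" appears in the cell values but in no column name, so nothing matches
by_value = df.filter(like="male")
print(by_value.shape)

# Passing more than one of items / like / regex raises a TypeError
try:
    df.filter(items=["name"], like="s")
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

To filter on cell values, use the expression, query, or eval approaches described earlier.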

Use case

1. Specify column names directly

df.filter(items=["chinese", "score"])   # Select columns by name

Specify row indexes directly

df.filter(items=[2, 4], axis=0)   # Select rows by index label

2. Specify by regular expression

df.filter(regex='a', axis=1)  # Column names containing a

df.filter(regex='^s', axis=1)  # Column names starting with s

df.filter(regex='e$', axis=1)  # Column names ending with e

df.filter(regex='3$', axis=0)  # Row index labels ending with 3

3. Specify by substring (like)

df.filter(like='s', axis=1)   # Column names containing s

df.filter(like='2', axis=0)   # Row index labels containing 2

# Filter on column names and index labels together
df.filter(regex='^a', axis=1).filter(like='2', axis=0)

The where and mask functions

The where and mask functions are opposites and yield exactly opposite results:

  • where: keeps the values that meet the condition; values that do not meet it are shown as NaN
  • mask: keeps the values that do not meet the condition; values that meet it are shown as NaN

Both methods also accept a replacement value to use in place of NaN.
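The sketch below illustrates the opposite behaviour on the score values from the simulated data. The variable names `cond`, `kept`, and `hidden` are illustrative.

```python
import pandas as pd

# The score column from the simulated data
s = pd.Series([590, 600, 550, 620, 610, 580, 634], name="score")
cond = s >= 600

kept = s.where(cond)    # values that fail the condition become NaN
hidden = s.mask(cond)   # values that meet the condition become NaN
print(kept)
print(hidden)

# where(cond) gives the same result as mask with the negated condition
print(kept.equals(s.mask(~cond)))
```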

Using where

s = df["score"]
s

# where: values that do not meet the condition become NaN
s.where(s>=600)

We can also supply a replacement for the values that do not meet the condition:

# Replace the values that do not meet the condition
s.where(s >= 610, 600)  # Use 600 where the condition is not met

Compare the results of the two calls:
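One way to view the two results next to the original values is to collect them in a small DataFrame. This is a sketch only; the column labels `nan` and `filled` are chosen for illustration.

```python
import pandas as pd

# The score column from the simulated data
s = pd.Series([590, 600, 550, 620, 610, 580, 634], name="score")

comparison = pd.DataFrame({
    "score": s,
    "nan": s.where(s >= 600),          # failing values become NaN
    "filled": s.where(s >= 610, 600),  # failing values become 600
})
print(comparison)
```

Note that the column with NaN is converted to floating point, while the column with an integer replacement keeps its integer values.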

where can also take a combination of conditions:

# Rows that meet both conditions keep their values; all other rows become NaN
df.where((df.sex=='male') & (df.math > 125))

To select only the matching rows:

df[(df.where((df.sex=='male') & (df.math > 125)) == df).name]
# df[(df.where((df.sex=='male') & (df.math > 125)) == df).sex]
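Why this works: where turns every row that fails the condition into NaN, and NaN never compares equal to the original value, so comparing the result with df yields True only in the rows that were kept. Any single column of that comparison can then serve as a row mask. The sketch below, on a reduced copy of the simulated data, checks that this gives the same rows as indexing with the condition directly. It assumes the chosen column has no missing values in the matching rows; the variable names are illustrative.

```python
import pandas as pd

# Reduced copy of the simulated data
df = pd.DataFrame({
    "name": ["Ming", "wang", "Wang Jianguo", "pei liu"],
    "sex": ["male", "female", "male", "female"],
    "math": [120, 130, 140, 134],
})

cond = (df["sex"] == "male") & (df["math"] > 125)

# Rows failing cond become NaN, so the comparison is True only for kept rows
row_mask = (df.where(cond) == df)["name"]
roundabout = df[row_mask]

# Indexing with the condition itself selects the same rows
direct = df[cond]

print(direct)
print(roundabout.equals(direct))
```

Indexing with the condition directly is the shorter route when only the matching rows are needed.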

The mask function

mask returns the opposite of where.

s.mask(s >= 600)  # Opposite of s.where(s >= 600)

s.mask(s >= 610, 600)  # Use 600 where the condition is met

The mask function accepts multiple conditions:

# The selected rows are the opposite of the where version
df[(df.mask((df.sex=='male') & (df.math > 125)) == df).sex]

Conclusion

Pandas offers many ways to select data, and the same result can often be obtained in several different ways. This article covered expression-based selection and 5 functions (query, eval, filter, where, and mask). The next article will cover 3 pairs of functions for filtering data.