
Missing values are common in real-world data processing. This article covers how to detect and handle them in Python with pandas.

Detecting missing values

We start by creating a DataFrame with missing values.

import pandas as pd

df = pd.DataFrame(
    {'A': [None, 2, None, 4], 'B': [10, None, None, 40], 'C': [100, 200, None, 400], 'D': [None, 2000, 3000, None]})
df

Missing numeric values are shown as NaN (Not a Number) in pandas. Let's see how to find out which columns or rows contain missing values.

1. info()

In the output of info(), compare each column's non-null count with the number of entries reported on the RangeIndex line. A column whose non-null count is smaller than the number of entries contains missing values.
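As a small sketch of that comparison, the snippet below rebuilds the sample DataFrame so it runs on its own, calls info(), and then does the same check in code with count(). The variable names non_null_counts and has_missing are only illustrative.

```python
import pandas as pd

# Same sample data as above, rebuilt so this snippet is self-contained
df = pd.DataFrame({
    'A': [None, 2, None, 4],
    'B': [10, None, None, 40],
    'C': [100, 200, None, 400],
    'D': [None, 2000, 3000, None],
})

# Prints the RangeIndex size and the non-null count of every column
df.info()

# The same comparison done in code: non-null count vs. number of rows
non_null_counts = df.count()
has_missing = non_null_counts < len(df)
print(has_missing)
```

Every column in this sample has fewer non-null values than rows, so has_missing is True for all four.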

2. isnull()

isnull() returns a DataFrame with the same shape (number of rows and columns) as the original. Each cell is True if the corresponding value is missing and False otherwise.

df.isnull()

Use sum() to count the missing values in each column.

df.isnull().sum()

Transpose the DataFrame with .T first to count the missing values in each row.

df.isnull().T.sum()
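The counts above can also be sketched without the transpose. The snippet below is a minimal example on the same sample data; it uses sum(axis=1) for the per-row count and any(axis=1) to select the rows that contain at least one missing value. The variable names are only illustrative.

```python
import pandas as pd

# Same sample data as above, rebuilt so this snippet is self-contained
df = pd.DataFrame({
    'A': [None, 2, None, 4],
    'B': [10, None, None, 40],
    'C': [100, 200, None, 400],
    'D': [None, 2000, 3000, None],
})

# Missing values per column
missing_per_column = df.isnull().sum()

# Missing values per row; gives the same result as df.isnull().T.sum()
missing_per_row = df.isnull().sum(axis=1)

# Rows that contain at least one missing value
rows_with_missing = df[df.isnull().any(axis=1)]

print(missing_per_column)
print(missing_per_row)
```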

Missing value handling

Delete missing values

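If the rows or columns that contain missing values are not important, drop them with dropna().

# Signature with its default values, shown for reference.
# Assumption to verify for your version: recent pandas releases may reject
# a call that passes both how and thresh, so set only one of them in practice.
df.dropna(axis=0,
          how='any',
          thresh=None,
          subset=None,
          inplace=False)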

Parameters

  • axis: whether to drop rows or columns; 0 drops rows, 1 drops columns.
  • how: 'any' drops the row or column if it contains any NaN; 'all' drops it only if every value is NaN.
  • thresh: the minimum number of non-NaN values a row or column needs in order to be kept; anything with fewer is dropped.
  • subset: the labels to check for missing values. For example, when dropping rows, subset lists the columns to look at.
  • inplace: whether to modify the original data. True modifies the original DataFrame and returns None; False returns a new DataFrame and leaves the original unchanged.
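To make the inplace behavior concrete, here is a minimal sketch on the sample data from the start of the article (rebuilt here so it runs on its own). The names cleaned, work, and returned are only illustrative.

```python
import pandas as pd

# Same sample data as above, rebuilt so this snippet is self-contained
df = pd.DataFrame({
    'A': [None, 2, None, 4],
    'B': [10, None, None, 40],
    'C': [100, 200, None, 400],
    'D': [None, 2000, 3000, None],
})

# inplace=False (the default): a new DataFrame comes back, df is untouched
cleaned = df.dropna(thresh=3)
print(len(df), len(cleaned))

# inplace=True: the DataFrame itself is modified and the call returns None
work = df.copy()
returned = work.dropna(thresh=3, inplace=True)
print(returned)
print(work)
```

Assigning the result of a call with inplace=True is a common slip, because the variable then holds None rather than the cleaned data.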


Specify axis=1 to drop every column that contains a missing value.

df.dropna(axis=1, how='any')

Specify axis=0 (the default) to drop every row that contains a missing value.

df.dropna(axis=0, how='any')

Looking only at columns A, B, and C, drop the rows in which all three values are missing.

df.dropna(axis=0, subset=['A', 'B', 'C'], how='all')

Keep only the rows that have at least three non-NaN values.

df.dropna(axis=0, thresh=3)
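To see what each of these calls keeps, the sketch below runs them on the sample DataFrame (rebuilt so the snippet is self-contained) and prints the resulting shapes. The variable names are only illustrative.

```python
import pandas as pd

# Same sample data as above, rebuilt so this snippet is self-contained
df = pd.DataFrame({
    'A': [None, 2, None, 4],
    'B': [10, None, None, 40],
    'C': [100, 200, None, 400],
    'D': [None, 2000, 3000, None],
})

# Every column contains a NaN, so no column survives
no_missing_columns = df.dropna(axis=1, how='any')

# Every row contains a NaN, so no row survives
no_missing_rows = df.dropna(axis=0, how='any')

# Only the row where A, B and C are all NaN is dropped
abc_not_all_missing = df.dropna(axis=0, subset=['A', 'B', 'C'], how='all')

# Rows with fewer than three non-NaN values are dropped
at_least_three = df.dropna(axis=0, thresh=3)

for name, result in [('axis=1, any', no_missing_columns),
                     ('axis=0, any', no_missing_rows),
                     ('subset ABC, all', abc_not_all_missing),
                     ('thresh=3', at_least_three)]:
    print(name, result.shape)
```

On this particular sample, how='any' removes everything along either axis, which is a good reason to check the shape of the result before relying on it.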

Fill in missing values

Another common way to handle missing values is to use fillna() to fill in missing values.

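# Signature with its default values, shown for reference.
# Note: recent pandas releases deprecate method= here in favor of df.ffill() and df.bfill().
df.fillna(value=None,
          method=None,
          axis=0,
          inplace=False,
          limit=None)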

1. Fill with a specified value

df.fillna(666)

2. Fill with the value before or after the missing value

Fill with the previous value

Setting method to ffill or pad fills each missing value with the previous value.

With axis=0, a missing value is filled with the previous value in the same column; a missing value in the first row is left unfilled.

With axis=1, a missing value is filled with the previous value in the same row; a missing value in the first column is left unfilled.

df.fillna(axis=0, method='pad')

Fill with the next value

Setting method to backfill or bfill fills each missing value with the next value.

With axis=0, a missing value is filled with the next value in the same column; a missing value in the last row is left unfilled.

With axis=1, a missing value is filled with the next value in the same row; a missing value in the last column is left unfilled.

df.fillna(axis=0, method='bfill')
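Here is a minimal sketch of both directions on the sample data, rebuilt so the snippet runs on its own. It uses the dedicated ffill() and bfill() methods instead of fillna(method=...), on the assumption that your pandas version is one of the recent ones that prefers them; the variable names are only illustrative.

```python
import pandas as pd

# Same sample data as above, rebuilt so this snippet is self-contained
df = pd.DataFrame({
    'A': [None, 2, None, 4],
    'B': [10, None, None, 40],
    'C': [100, 200, None, 400],
    'D': [None, 2000, 3000, None],
})

# Previous value in the same column; NaNs in the first row stay NaN
forward = df.ffill()

# Previous value in the same row; NaNs in the first column stay NaN
forward_rows = df.ffill(axis=1)

# Next value in the same column; NaNs in the last row stay NaN
backward = df.bfill()

# Next value in the same row; NaNs in the last column stay NaN
backward_rows = df.bfill(axis=1)

print(forward)
print(backward)
```

Note that a single pass in one direction can leave values unfilled at the edges, for example column A after ffill() still starts with NaN.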

Fill with a computed value, such as each column's mean

df.fillna(df.mean())
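As a sketch of how the mean fill works, the snippet below computes the column means on the sample data and passes them to fillna(), which matches them to columns by label. It also shows a dict of per-column fill values; the values 0 and -1 are arbitrary choices for illustration, not recommendations.

```python
import pandas as pd

# Same sample data as above, rebuilt so this snippet is self-contained
df = pd.DataFrame({
    'A': [None, 2, None, 4],
    'B': [10, None, None, 40],
    'C': [100, 200, None, 400],
    'D': [None, 2000, 3000, None],
})

# mean() skips NaN by default, so each mean is over the known values only
column_means = df.mean()

# Each column's NaNs are replaced by that column's own mean
filled_with_mean = df.fillna(column_means)

# A dict gives a different fill value per column; unlisted columns are untouched
filled_per_column = df.fillna({'A': 0, 'D': -1})

print(column_means)
print(filled_with_mean)
```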

Use limit to cap the number of fills

In columns A, B, C, and D, only the first missing value in each column is filled.

df.fillna(value=666, limit=1)
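The effect of limit can be checked with a short sketch on the sample data, rebuilt so the snippet runs on its own. It leaves axis at its default, because how axis interacts with a scalar fill value may differ between pandas versions; the fill value 666 is just a visible placeholder.

```python
import pandas as pd

# Same sample data as above, rebuilt so this snippet is self-contained
df = pd.DataFrame({
    'A': [None, 2, None, 4],
    'B': [10, None, None, 40],
    'C': [100, 200, None, 400],
    'D': [None, 2000, 3000, None],
})

# Fill at most one missing value per column, starting from the top
limited = df.fillna(value=666, limit=1)
print(limited)

# Missing values that remain after the limited fill
remaining = limited.isnull().sum()
print(remaining)
```

Columns that had two missing values still have one left, while column C, which had only one, is now complete.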


