
SimpleImputer parameter description

class sklearn.impute.SimpleImputer(*, missing_values=nan, strategy='mean', fill_value=None, verbose=0, copy=True, add_indicator=False)

Parameter meaning

  • missing_values: int, float, str, np.nan or None; default np.nan. The placeholder that marks a value as missing.
  • strategy: the imputation strategy; one of mean (default), median, most_frequent or constant. mean fills each missing value with the mean of its column, median uses the column median, and most_frequent uses the column's most frequent value (the mode). constant fills missing values with a custom value, which is set through fill_value.
  • fill_value: str or numerical value; default None. When strategy == "constant", fill_value replaces every occurrence of missing_values. If fill_value is left as None, missing values are replaced with 0 for numerical data and with the string "missing_value" for string or object data.
  • verbose: int, default 0. Controls the verbosity of the imputer. Later scikit-learn releases deprecate this parameter (1.1) and remove it (1.3).
  • copy: boolean, default True. True processes a copy of the data; False modifies the data in place.
  • add_indicator: boolean, default False. True appends binary indicator columns after the imputed data, one for each feature that had missing values at fit time. In those columns, 0 marks a value that was not missing at that position and 1 marks a value that was missing.
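The difference between the strategies described above is easiest to see on a small array. The following is a minimal sketch; the array X_demo and the dictionary stats are illustrative names that do not appear in the original article. It reads the fitted attribute statistics_, which holds the fill value computed for each column.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Illustrative data (not from the article): two columns, one gap in each.
X_demo = np.array([[1.0, 2.0],
                   [1.0, np.nan],
                   [np.nan, 2.0],
                   [4.0, 5.0],
                   [9.0, 7.0]])

stats = {}
for strategy in ("mean", "median", "most_frequent"):
    imp = SimpleImputer(missing_values=np.nan, strategy=strategy)
    imp.fit(X_demo)
    # statistics_ holds the per-column fill value learned by fit()
    stats[strategy] = imp.statistics_
    print(strategy, imp.statistics_)
```

For this data the first column gets 3.75 (mean), 2.5 (median) or 1 (most frequent), so the choice of strategy visibly changes what ends up in the gaps.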


Commonly used methods

fit(X)

Computes the fill statistics from the matrix X (for example, the column means) and returns the fitted SimpleImputer instance. The fitted imputer can then be used to fill missing values in other matrices.


transform(X)

Fills in the missing values of X. The imputer normally has to be fitted with fit() before a matrix is transformed.

from sklearn.impute import SimpleImputer
import numpy as np

# Complete matrix used to learn the column means
X = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
# Matrix with missing values to be filled
X1 = np.array([[1, 2, np.nan],
               [4, np.nan, 6],
               [np.nan, 8, 9]])
imp = SimpleImputer(missing_values=np.nan, strategy='mean')
imp.fit(X)
print(imp.transform(X1))

Run result
[[1. 2. 6.]
 [4. 5. 6.]
 [4. 8. 9.]]

Because the imputer was fitted with fit(X) and strategy='mean', each missing value in X1 is filled with the mean of the corresponding column of X (4, 5 and 6).
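To confirm where the fill values come from, the fitted imputer can be inspected directly. The sketch below repeats the example with the hypothetical names X_train and X_test and prints the statistics_ attribute, which stores one learned value per column.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# X_train and X_test are illustrative names for the two matrices above.
X_train = np.array([[1, 2, 3],
                    [4, 5, 6],
                    [7, 8, 9]], dtype=float)
X_test = np.array([[1, 2, np.nan],
                   [4, np.nan, 6],
                   [np.nan, 8, 9]])

imp = SimpleImputer(missing_values=np.nan, strategy="mean")
imp.fit(X_train)

# Column means learned from X_train; these are the values used for filling.
print(imp.statistics_)

filled = imp.transform(X_test)
print(filled)
```

The gaps in X_test are filled from the statistics of X_train, not from X_test itself, which is the usual pattern when a training set and a test set must be treated consistently.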


fit_transform(X)

Equivalent to fit() + transform().

X1 = np.array([[1, 2, np.nan],
               [4, np.nan, 6],
               [np.nan, 8, 9]])
imp = SimpleImputer(missing_values=np.nan, strategy='mean')
print(imp.fit_transform(X1))

Run result
[[1.  2.  7.5]
 [4.  5.  6. ]
 [2.5 8.  9. ]]
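The equivalence between fit_transform() and fit() followed by transform() can be checked directly. This is a small sketch under the assumption that both calls see the same matrix; the variable names one_step, two_step and same are illustrative.

```python
import numpy as np
from sklearn.impute import SimpleImputer

X1 = np.array([[1, 2, np.nan],
               [4, np.nan, 6],
               [np.nan, 8, 9]])

# One call: learn the column means and fill the gaps.
one_step = SimpleImputer(missing_values=np.nan, strategy="mean").fit_transform(X1)

# Two calls on the same data: fit first, then transform.
imp = SimpleImputer(missing_values=np.nan, strategy="mean")
imp.fit(X1)
two_step = imp.transform(X1)

same = np.array_equal(one_step, two_step)
print(same)
```

The two differ only when the data passed to transform() is not the data passed to fit(), as in the earlier fit(X) and transform(X1) example.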


get_params()

Returns the parameters of the SimpleImputer as a dictionary.

imp = SimpleImputer(missing_values=np.nan, strategy='mean')
print(imp.get_params())

Run result
{'add_indicator': False, 'copy': True, 'fill_value': None, 'missing_values': nan, 'strategy': 'mean', 'verbose': 0}
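One practical use of the returned dictionary is to build a second imputer with the same configuration. The sketch below assumes nothing beyond get_params() and the constructor; the names params and imp_copy are illustrative.

```python
import numpy as np
from sklearn.impute import SimpleImputer

imp = SimpleImputer(missing_values=np.nan, strategy="median")

# get_params() returns a plain dict keyed by constructor argument name.
params = imp.get_params()
print(params["strategy"])

# The dict can be unpacked to create a new, unfitted imputer with the same settings.
imp_copy = SimpleImputer(**params)
print(imp_copy.get_params()["strategy"])
```

The exact set of keys depends on the installed scikit-learn version, so code that reads the dictionary should look up specific keys instead of relying on its full contents.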


inverse_transform(X)

Converts the data back to its original representation by inverting the transform that was applied to the array. This is only possible when the SimpleImputer was instantiated with add_indicator=True. Note: only features that have a binary missing-value indicator can be inverted. If a feature had no missing values at fit time, it has no indicator column, so values imputed into it at transform time are not reverted. Put simply, a value can only be restored to missing if its imputation was recorded.

X1 = np.array([[1, 2, np.nan],
               [4, np.nan, 6],
               [np.nan, 8, 9]])

imp = SimpleImputer(missing_values=np.nan, strategy='mean', add_indicator=True)
X1 = imp.fit_transform(X1)
print(X1)
print(imp.inverse_transform(X1))

Run result
[[1.  2.  7.5 0.  0.  1. ]
 [4.  5.  6.  0.  1.  0. ]
 [2.5 8.  9.  1.  0.  0. ]]
[[ 1.  2. nan]
 [ 4. nan  6.]
 [nan  8.  9.]]
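The note about features without an indicator column can be illustrated with a second matrix. In the sketch below, X_train and X_new are hypothetical data: only the first column of X_train has a missing value, so only that column gets an indicator, and a gap that appears later in the second column should not be restored.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical data: only column 0 has a missing value at fit time.
X_train = np.array([[1.0, 2.0],
                    [np.nan, 3.0],
                    [5.0, 4.0]])
# New data with gaps in both columns of the first row.
X_new = np.array([[np.nan, np.nan],
                  [7.0, 8.0]])

imp = SimpleImputer(missing_values=np.nan, strategy="mean", add_indicator=True)
imp.fit(X_train)

transformed = imp.transform(X_new)  # two data columns plus one indicator column
restored = imp.inverse_transform(transformed)

print(transformed)
print(restored)
```

In restored, the first column's gap comes back as nan, while the second column keeps the imputed mean because no indicator column records that it was filled.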


Custom value fill

fill_value set to a user-defined value.

X = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

# Treat the value 1 as missing and replace it with the constant 666
imp = SimpleImputer(missing_values=1, strategy='constant', fill_value=666)
print(imp.fit_transform(X))

Run result
[[666   2   3]
 [  4   5   6]
 [  7   8   9]]

fill_value left at its default value, None.

X = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

imp = SimpleImputer(missing_values=1, strategy='constant', fill_value=None)
print(imp.fit_transform(X))

Run result
[[0 2 3]
 [4 5 6]
 [7 8 9]]
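The parameter description also states that the default fill for string or object data is the string "missing_value". The following sketch shows that case; the array X_str is illustrative, and it assumes the data is stored with dtype=object so that strings and np.nan can share one array.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Illustrative object array mixing strings with one missing entry.
X_str = np.array([["a", "b"],
                  [np.nan, "c"]], dtype=object)

# strategy='constant' with fill_value left at None
imp = SimpleImputer(missing_values=np.nan, strategy="constant")
filled_str = imp.fit_transform(X_str)
print(filled_str)
```

Passing an explicit string as fill_value works the same way and is usually clearer to readers of the code than relying on the default.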
