This is the 20th day of my participation in the November Gwen Challenge. Check out the details: The last Gwen Challenge 2021

Data analysis – Data preprocessing

Handling duplicate values

duplicated() finds duplicate values

 import pandas as pd
 a=pd.DataFrame(data=[['A',19],['B',19],['C',20],['A',19],['C',20]],
                columns=['name','age'])
 print(a)
 print('--------------------------')
 a=a.duplicated()
 print(a)
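
By default duplicated() leaves the first occurrence of a row unmarked and flags only the later repeats. The sketch below reuses the same sample data to show the `keep` and `subset` parameters; the variable names (`df`, `first_kept`, and so on) are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    data=[['A', 19], ['B', 19], ['C', 20], ['A', 19], ['C', 20]],
    columns=['name', 'age'],
)

# Default (keep='first'): the first occurrence is False, later repeats are True.
first_kept = df.duplicated()

# keep=False flags every row that belongs to a duplicated group.
all_flagged = df.duplicated(keep=False)

# subset restricts the comparison to the listed columns.
by_age = df.duplicated(subset=['age'])

print(first_kept.tolist())
print(all_flagged.tolist())
print(by_age.tolist())
```

The result is a boolean Series, so it can be used directly as a mask, for example `df[df.duplicated()]` to look at the repeated rows.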

To check the data as a whole instead of row by row, use any()

any()

 import pandas as pd
 a=pd.DataFrame(data=[['A',19],['B',19],['C',20],['A',19],['C',20]],
                columns=['name','age'])
 print(a)
 print('--------------------------')
 a=any(a.duplicated())
 print(a)
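
The same check can also be written with the Series methods, which additionally makes it easy to count the repeats. This is a small sketch on the same sample data; `mask`, `has_duplicates` and `n_duplicates` are names chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    data=[['A', 19], ['B', 19], ['C', 20], ['A', 19], ['C', 20]],
    columns=['name', 'age'],
)

mask = df.duplicated()

# Series.any() answers "is there at least one duplicate row?"
has_duplicates = bool(mask.any())

# True counts as 1, so sum() gives the number of repeated rows.
n_duplicates = int(mask.sum())

print(has_duplicates, n_duplicates)
```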

drop_duplicates() deletes duplicate values

The inplace parameter controls whether the original data is modified

 import pandas as pd
 a=pd.DataFrame(data=[['A',19],['B',19],['C',20],['A',19],['C',20]],
                columns=['name','age'])
 print(a)
 print('--------------------------')
 b=a.drop_duplicates(inplace=False)
 a.drop_duplicates(inplace=True)
 print(a)
 print('--------------------------')
 print(b)
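
To make the effect of inplace easier to see, the sketch below compares row counts before and after, and also shows the `keep='last'` option. The names `raw`, `df`, `deduped` and `last_kept` are assumptions made for this example.

```python
import pandas as pd

raw = pd.DataFrame(
    data=[['A', 19], ['B', 19], ['C', 20], ['A', 19], ['C', 20]],
    columns=['name', 'age'],
)
df = raw.copy()

# inplace=False (the default) returns a new DataFrame; df is untouched.
deduped = df.drop_duplicates(inplace=False)
rows_before = len(df)
rows_in_copy = len(deduped)

# inplace=True modifies df itself, so nothing needs to be reassigned.
df.drop_duplicates(inplace=True)
rows_after = len(df)

# keep='last' keeps the final occurrence of each duplicate instead of the first.
last_kept = raw.drop_duplicates(keep='last')

print(rows_before, rows_in_copy, rows_after)
print(last_kept.index.tolist())
```

Note that the surviving rows keep their original index labels, which is why the last line prints non-consecutive labels.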

Handling missing values

NaN represents a missing value

 import pandas as pd
 a=pd.read_csv(r'text.csv')
 print(a)
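
The contents of text.csv are not shown in this post. As a stand-in, the sketch below reads a small made-up CSV from a string (the columns `name`, `age` and `score` are assumptions) to show how empty fields turn into NaN.

```python
import io

import pandas as pd

# Hypothetical data standing in for text.csv; empty fields are missing values.
csv_text = "name,age,score\nA,19,90\nB,,85\n,20,\n"

df = pd.read_csv(io.StringIO(csv_text))
print(df)

# Count the NaN values in each column.
missing_counts = df.isna().sum().to_dict()
print(missing_counts)
```

By default read_csv also treats strings such as `NA` and `NaN` as missing, not only empty fields.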

isnull() checks, position by position, whether each element is missing

 import pandas as pd
 a=pd.read_csv(r'text.csv')
 print(a.isnull())

any() checks whether each column (or, with axis=1, each row) contains a missing value

 import pandas as pd
 a=pd.read_csv(r'text.csv')
 print(a.isnull().any())
 print(a.isnull().any(axis=1))
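
Since text.csv is not included here, the following sketch builds a small frame by hand (the column names and values are made up) to show what the two any() calls return, plus sum() for counting.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'name' and 'age' each have one gap, 'city' has none.
df = pd.DataFrame({
    'name': ['A', 'B', None],
    'age': [19, np.nan, 20],
    'city': ['X', 'Y', 'Z'],
})

# One True/False per column: does the column contain any missing value?
col_has_missing = df.isnull().any()

# One True/False per row: does the row contain any missing value?
row_has_missing = df.isnull().any(axis=1)

# How many values are missing in each column.
missing_per_col = df.isnull().sum()

print(col_has_missing)
print(row_has_missing)
print(missing_per_col)
```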

del and dropna(): deleting data

 import pandas as pd
 a=pd.read_csv(r'text.csv')
 del a['name']
 print(a)

 import pandas as pd
 a=pd.read_csv(r'text.csv')
 b=a.dropna(axis=0)
 print(b)
 c=a.dropna(axis=1)
 print(c)

del deletes the specified column; dropna() deletes the rows (axis=0) or columns (axis=1) that contain missing values.
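
dropna() has a few more switches that are often useful. The sketch below, on made-up data, shows `how` and `subset`, and also `drop(columns=...)` as a way to remove a column without changing the original frame the way del does.

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps in both columns.
df = pd.DataFrame({
    'name': ['A', 'B', None, None],
    'age': [19, np.nan, 20, np.nan],
})

# Default how='any': drop a row if any of its values is missing.
any_missing = df.dropna()

# how='all': drop a row only if every value in it is missing.
all_missing = df.dropna(how='all')

# subset: only look at the listed columns when deciding.
need_age = df.dropna(subset=['age'])

# drop() returns a new frame without the column; df itself keeps 'name'.
without_name = df.drop(columns=['name'])

print(len(any_missing), len(all_missing), need_age.index.tolist())
print(without_name.columns.tolist(), df.columns.tolist())
```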

fillna() fills missing values

import pandas as pd
a=pd.read_csv(r'text.csv')
a=a.fillna('wu')
print(a)
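
Filling every gap with the same string also puts text into numeric columns. One alternative is to pass fillna() a dict with a separate value per column, sketched here on made-up data (the fill values `'unknown'` and `0` are arbitrary choices for the example).

```python
import numpy as np
import pandas as pd

# Hypothetical data: one text column and one numeric column.
df = pd.DataFrame({
    'name': ['A', None],
    'age': [19, np.nan],
})

# A dict maps each column name to the value used for its gaps.
filled = df.fillna({'name': 'unknown', 'age': 0})

print(filled)
```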

Filling from the previous (next) row

pad/ffill: fill from the previous row
backfill/bfill: fill from the next row

import pandas as pd
a=pd.read_csv(r'text.csv')
print(a)
print('---------------------')
# ffill() fills from the previous row (same result as method='pad')
b=a.ffill()
print(b)
print('---------------------')
# bfill() fills from the next row (same result as method='bfill')
c=a.bfill()
print(c)
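
Forward and backward filling cannot fill a gap that has no neighbor in the fill direction. The sketch below uses DataFrame.ffill() and DataFrame.bfill(), which fill in the same way as the pad and bfill methods, on a made-up column `v`; the `limit` argument caps how many consecutive gaps are filled.

```python
import numpy as np
import pandas as pd

# Hypothetical column with a leading gap, a double gap, and a trailing gap.
df = pd.DataFrame({'v': [np.nan, 1.0, np.nan, np.nan, 4.0, np.nan]})

# Forward fill: the leading NaN has no previous row, so it stays NaN.
forward = df.ffill()

# Backward fill: the trailing NaN has no next row, so it stays NaN.
backward = df.bfill()

# limit=1 fills at most one consecutive NaN after each valid value.
limited = df.ffill(limit=1)

print(forward['v'].tolist())
print(backward['v'].tolist())
print(limited['v'].tolist())
```

Chaining both, as in `df.ffill().bfill()`, is one way to cover the leading and trailing gaps as well.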

Filling numeric data

Mean: mean()

Missing values are filled with the mean of each column

import pandas as pd
a=pd.read_csv(r'text.csv')
print(a)
print('---------------------')
a=a.fillna(a.mean(numeric_only=True))  # numeric_only=True skips non-numeric columns
print(a)
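
A self-contained version of the mean fill, on made-up data, looks like the sketch below. It assumes the frame mixes text and numeric columns, which is why `numeric_only=True` is passed: recent pandas versions raise an error when asked for the mean of a text column.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'name' is text, 'age' is numeric with one gap.
df = pd.DataFrame({
    'name': ['A', 'B', 'C'],
    'age': [19.0, np.nan, 21.0],
})

# Series of column means, computed over numeric columns only.
means = df.mean(numeric_only=True)

# fillna() matches the Series index against the column names.
filled = df.fillna(means)

print(means)
print(filled)
```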

Median: median()

import pandas as pd
a=pd.read_csv(r'text.csv')
print(a)
print('---------------------')
a=a.fillna(a.median(numeric_only=True))  # numeric_only=True skips non-numeric columns
print(a)
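
One reason to pick the median over the mean is that it is much less affected by outliers. The sketch below uses a made-up Series with one extreme value to compare the two fill values.

```python
import numpy as np
import pandas as pd

# Hypothetical values: 100.0 is an outlier, the last entry is missing.
s = pd.Series([1.0, 2.0, 3.0, 100.0, np.nan])

mean_value = s.mean()      # pulled upward by the outlier
median_value = s.median()  # stays near the typical values

filled_with_mean = s.fillna(mean_value)
filled_with_median = s.fillna(median_value)

print(mean_value, median_value)
print(filled_with_median.tolist())
```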

Filling string data

Mode: mode()

import pandas as pd
a=pd.read_csv(r'text.csv')
print(a)
print('---------------------')
for i in a.columns:
    a[i] = a[i].fillna(a[i].mode()[0])
print(a)
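
mode() returns a Series, because several values can be tied for most frequent; `[0]` picks the first of them. For a column where every value is missing, mode() is empty and `[0]` fails. The helper below, `fill_with_mode`, is a hypothetical name for a sketch that guards against that case.

```python
import pandas as pd


def fill_with_mode(col):
    """Fill gaps with the column's first mode; leave all-missing columns as-is."""
    modes = col.mode()
    if modes.empty:
        return col
    return col.fillna(modes.iloc[0])


# Hypothetical columns: a tie, a clear winner, and an all-missing column.
tied = pd.Series(['x', 'y', 'y', 'x', None])
clear = pd.Series(['a', 'b', 'b', None])
all_missing = pd.Series([None, None], dtype=object)

print(tied.mode().tolist())  # tied modes come back sorted
print(fill_with_mode(tied).tolist())
print(fill_with_mode(clear).tolist())
print(fill_with_mode(all_missing).isna().all())
```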

Data transformation

map() converts data

import pandas as pd
data={'sex':[1,0,1,1,0]}
a=pd.DataFrame(data)
a['sex-T']=a['sex'].map({1:'male',0:'female'})
print(a)
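
One thing to watch with a dict passed to map(): any value that is not a key in the dict becomes NaN. The sketch below adds a made-up code `2` to show this, and fills the gap afterwards with an arbitrary label.

```python
import pandas as pd

# Hypothetical data: the code 2 has no entry in the mapping.
df = pd.DataFrame({'sex': [1, 0, 1, 2]})

mapped = df['sex'].map({1: 'male', 0: 'female'})

# Which rows were not covered by the mapping?
unmapped = mapped.isna()

# Give the uncovered rows an explicit label.
df['sex-T'] = mapped.fillna('unknown')

print(unmapped.tolist())
print(df)
```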

Dummy variable

import pandas as pd
data={'sex':['male','female','male','female','confidential']}
a=pd.DataFrame(data)
a=pd.get_dummies(a)
print(a)
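
get_dummies() creates one column per category, named with the original column name as a prefix. The sketch below shows the resulting column names, and the `dtype` and `drop_first` parameters; the variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'sex': ['male', 'female', 'male', 'female', 'confidential']})

# dtype=int gives 0/1 columns instead of booleans.
dummies = pd.get_dummies(df, columns=['sex'], dtype=int)

# drop_first=True omits the first category, which the others then imply.
reduced = pd.get_dummies(df, columns=['sex'], drop_first=True, dtype=int)

print(dummies.columns.tolist())
print(reduced.columns.tolist())
print(dummies)
```

Each row has exactly one 1 across the full set of dummy columns, so with drop_first=True a row of all zeros stands for the dropped category.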
