This is the 25th day of my participation in the August Genwen Challenge.

This post describes common pandas operations for cleaning up a DataFrame

Add a header to the table

If the table has no header, be sure to pass header=None when reading it; otherwise the first row will be used as the header

  1. Name the columns while reading the file

    df = pd.read_excel('file path', header=None, names=['column 1', 'column 2'])

  2. Name the columns after reading, where name is a list such as ['column 1', 'column 2']

    df.columns = name
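
Both ways of naming the columns can be sketched as follows. To keep the example self-contained it reads hypothetical in-memory CSV text with pd.read_csv instead of an Excel file; header and names are used the same way with pd.read_excel. The data and column names are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical headerless data standing in for a real file.
raw = io.StringIO("Alice,30\nBob,25\n")

# Option 1: name the columns while reading.
named_on_read = pd.read_csv(raw, header=None, names=["name", "age"])

# Option 2: read first (columns are 0, 1), then assign the names.
raw.seek(0)
named_after = pd.read_csv(raw, header=None)
named_after.columns = ["name", "age"]
```

Without header=None, the row "Alice,30" would be consumed as the header instead of being kept as data.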

Reset the index, overwriting the original data (drop=True discards the old index instead of keeping it as a column)

df.reset_index(drop = True, inplace = True)
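
As a small illustration, assume a hypothetical frame whose index has a gap, as it would after rows were filtered out:

```python
import pandas as pd

# Hypothetical frame whose index has a gap (0, 2, 3).
df = pd.DataFrame({"value": [10, 30, 40]}, index=[0, 2, 3])

# drop=True throws the old index away; inplace=True modifies df itself.
df.reset_index(drop=True, inplace=True)
```

After the call the index runs 0, 1, 2 and no extra "index" column is added.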

Delete a column

  1. del df['column name'] (changes the original data)
  2. df.drop('column name', axis=1) (returns a new DataFrame and does not change the original data)
  3. df.drop('column name', axis=1, inplace=True) (overwrites the original data)
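
The difference between the three forms shows up in a short sketch; the frame and its column names a, b, c are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# Returns a new frame without "b"; df itself still has all three columns.
without_b = df.drop("b", axis=1)

# Both of these modify df itself.
del df["c"]
df.drop("a", axis=1, inplace=True)
```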

Drop rows that are entirely empty

df.dropna(how='all', inplace=True)
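
A minimal sketch with made-up data shows that how='all' removes only rows in which every value is missing:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "x": [1.0, np.nan, np.nan],
    "y": ["a", np.nan, "c"],
})

# Row 1 is empty in every column and is dropped;
# row 2 has a value in "y" and is kept.
df.dropna(how="all", inplace=True)
```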

Handle missing values: delete them, or fill them with the mean or the most frequent value

  1. Mean: df['column name'] = df['column name'].fillna(df['column name'].mean())
  2. Most frequent value: df['column name'].value_counts().index[0] returns the value that occurs most often, which can be passed to fillna in the same way
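
Both fill strategies can be sketched on a hypothetical frame with one numeric and one text column. The result is assigned back to the column rather than using inplace=True on the selected column, since recent pandas versions warn about that pattern.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "score": [1.0, np.nan, 3.0, 2.0],
    "city": ["Paris", None, "Paris", "Rome"],
})

# Fill the numeric gap with the column mean (NaN is ignored by mean()).
df["score"] = df["score"].fillna(df["score"].mean())

# value_counts() sorts by frequency, so index[0] is the most common value.
most_common = df["city"].value_counts().index[0]
df["city"] = df["city"].fillna(most_common)
```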

Convert the format of some rows (weights given in lbs to kg)

  1. Find all the rows to convert by adding a flag column to the table

    df['rows_with_lbs'] = df['weight'].str.contains('lbs').fillna(False)

    df['weight'].str.contains('lbs') finds the rows whose weight column contains "lbs"; fillna(False) replaces NaN with False

  2. for i, lbs_row in df[df['rows_with_lbs']].iterrows():

  3. weight = int(float(lbs_row['weight'][:-3]) / 2.2) strips the trailing "lbs" and converts the number to kilograms

  4. df.at[i, 'weight'] = '{}kgs'.format(weight) writes the converted value back to row i
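
Put together, the four steps might look like the sketch below. The sample weights are invented, and na=False is passed to str.contains as a stand-in for the separate fillna(False) call; both turn missing values into False.

```python
import pandas as pd

# Hypothetical data mixing units, with one missing value.
df = pd.DataFrame({"weight": ["70kgs", "150lbs", None, "200lbs"]})

# Step 1: flag the rows whose weight is given in lbs.
df["rows_with_lbs"] = df["weight"].str.contains("lbs", na=False)

# Steps 2-4: convert each flagged row and write the result back.
for i, lbs_row in df[df["rows_with_lbs"]].iterrows():
    # Strip the trailing "lbs", then convert (about 2.2 lbs per kg).
    weight = int(float(lbs_row["weight"][:-3]) / 2.2)
    df.at[i, "weight"] = "{}kgs".format(weight)
```

The flag column is only a helper and can be removed afterwards with df.drop('rows_with_lbs', axis=1, inplace=True).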

Remove non-ASCII characters

df['first_name'] = df['first_name'].replace({r'[^\x00-\x7F]+': ''}, regex=True)

replace(old value, new value, regex=True) turns on regular-expression support. The r prefix makes a raw string, so backslashes reach the regular expression unchanged. [\x00-\x7F] matches the ASCII characters 0 to 127, so [^\x00-\x7F] matches every character that is not ASCII.
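
As a quick check of the pattern, here is a sketch on hypothetical names that contain accented letters:

```python
import pandas as pd

# "\u00e9" is e with an acute accent, "\u00eb" is e with a diaeresis.
df = pd.DataFrame({"first_name": ["Jos\u00e9", "Anna", "Zo\u00eb"]})

# Replace every run of non-ASCII characters with an empty string.
df["first_name"] = df["first_name"].replace(
    {r"[^\x00-\x7F]+": ""}, regex=True
)
```

Note that this deletes the characters rather than transliterating them, so an accented name loses its last letter here.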

Uniqueness: split a column that holds several values with str.split(expand=True); expand=True returns each piece as its own column

df[['first_name', 'last_name']] = df['name'].str.split(expand=True)
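
A minimal sketch, assuming every hypothetical name consists of exactly two words separated by whitespace:

```python
import pandas as pd

df = pd.DataFrame({"name": ["Ada Lovelace", "Alan Turing"]})

# With no separator given, split() breaks on whitespace;
# expand=True spreads the pieces over separate columns.
df[["first_name", "last_name"]] = df["name"].str.split(expand=True)
```

If some names had more than two parts, the split would yield more than two columns and the assignment would fail; passing n=1 to split limits it to a single split.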

Delete duplicate rows: df.duplicated() flags them and drop_duplicates() removes them

df.drop_duplicates(['first_name', 'last_name'], inplace=True)
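
The sketch below uses invented rows to show that only the listed columns decide what counts as a duplicate:

```python
import pandas as pd

df = pd.DataFrame({
    "first_name": ["Ada", "Ada", "Alan"],
    "last_name": ["Lovelace", "Lovelace", "Turing"],
    "age": [36, 37, 41],
})

# Rows 0 and 1 share first and last name, so row 1 is flagged
# even though the ages differ.
dupes = df.duplicated(["first_name", "last_name"])

# By default the first occurrence is kept and later ones are dropped.
df.drop_duplicates(["first_name", "last_name"], inplace=True)
```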