The assign function in Pandas

This article introduces a very useful function from the Pandas library: assign

assign is convenient when we need to compute a new column from an existing column for later use. The examples below illustrate how the function works.

Pandas articles

This is the 21st article in the Pandas series, which is divided into three parts:

Basic operations in Pandas (chapters 1 to 16): the basic and common operations, such as creating data, retrieving and querying data, ranking and sorting, and handling missing and duplicate values

Advanced operations in Pandas (chapter 17 onward)

Learning Pandas by comparing SQL operations with Pandas operations

Parameters

assign takes only one parameter: DataFrame.assign(**kwargs).

**kwargs: dict of {str: callable or Series}

A few notes on the parameters:

The column names are the keywords
If a value is callable, it is evaluated on the DataFrame and the result is assigned to the new column
If a value is not callable (for example, a Series, a scalar, or an array), it is assigned directly

Finally, the function returns a new DataFrame containing all the existing columns plus the newly generated ones
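
As a minimal sketch of the notes above, the following passes a callable, a Series, a scalar, and a NumPy array to assign in a single call. The frame `demo` and the column names `doubled`, `shifted`, `flag`, and `rank` are made up for this illustration and are not used elsewhere in the article.

```python
import numpy as np
import pandas as pd

# Hypothetical frame used only for this illustration
demo = pd.DataFrame({"col1": [12, 16, 18]})

result = demo.assign(
    doubled=lambda x: x["col1"] * 2,  # callable: evaluated on the DataFrame
    shifted=demo["col1"] + 1,         # Series: assigned directly
    flag=True,                        # scalar: repeated on every row
    rank=np.array([3, 2, 1]),         # array: assigned directly, length must match
)

print(result)
print(list(demo.columns))  # the original frame still has only col1
```

The keyword names become the column names, in the order they were passed.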

Import libraries

import pandas as pd
import numpy as np

# Sample data

df = pd.DataFrame({
    "col1": [12, 16, 18],
    "col2": ["xiaoming", "peter", "mike"]
})

df

	col1	col2
0	12	xiaoming
1	16	peter
2	18	mike

Examples

When the value is callable, it is evaluated directly on the DataFrame:

Method 1: Call on the DataFrame directly

# Method 1: evaluate on the DataFrame df
# Generate col3 from the col1 attribute of df

df.assign(col3=lambda x: x.col1 / 2 + 20)  

	col1	col2	col3
0	12	xiaoming	26.0
1	16	peter	28.0
2	18	mike	29.0

If we look at the original df, we see that it is unchanged

df  # the original DataFrame is unchanged

	col1	col2
0	12	xiaoming
1	16	peter
2	18	mike
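
Because assign leaves df untouched, the new column only survives if the returned DataFrame is bound to a name. The sketch below shows one way to do that; the name `df_with_col3` is an assumption chosen for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": [12, 16, 18],
    "col2": ["xiaoming", "peter", "mike"],
})

# assign returns a new DataFrame; binding it to a name keeps the new column
df_with_col3 = df.assign(col3=lambda x: x.col1 / 2 + 20)

print("col3" in df.columns)            # False: df itself was not modified
print("col3" in df_with_col3.columns)  # True
```

Rebinding the same name (df = df.assign(...)) works as well when the original frame is no longer needed.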

Manipulating string data:

df.assign(col3=df["col2"].str.upper())

Method 2: Use an existing Series

The same result can be achieved by referring directly to an existing Series:

# Method 2: compute from an existing Series

df.assign(col4=df["col1"] * 3 / 4 + 25)

df  # the original data remains unchanged

	col1	col2
0	12	xiaoming
1	16	peter
2	18	mike

In Python 3.6+, we can create multiple columns in the same assign call, and a column can depend on another column defined earlier in the same call. The intermediate columns can be used directly:

df.assign(
    col5=lambda x: x["col1"] / 2 + 10,
    col6=lambda x: x["col5"] * 5,        # col5 is used directly in the col6 calculation
    col7=lambda x: x.col2.str.upper(),
    col8=lambda x: x.col7.str.title()    # col7 is used in col8
)

df  # the original data remains unchanged

	col1	col2
0	12	xiaoming
1	16	peter
2	18	mike
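
The order of the keywords matters when one column depends on another, because the columns are computed one after the other. The following sketch, which uses a reduced version of the sample data, shows the working order and what happens when the dependent column is listed first; the names `ok` and `failed` are introduced only for this example.

```python
import pandas as pd

df = pd.DataFrame({"col1": [12, 16, 18]})

# Works: col5 is defined before col6 refers to it
ok = df.assign(
    col5=lambda x: x["col1"] / 2 + 10,
    col6=lambda x: x["col5"] * 5,
)

# Fails: col6 is evaluated first, when col5 does not exist yet
try:
    df.assign(
        col6=lambda x: x["col5"] * 5,
        col5=lambda x: x["col1"] / 2 + 10,
    )
    failed = False
except KeyError:
    failed = True

print(ok)
print(failed)  # True
```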

If we assign to an existing column name, the values of that column are overwritten in the returned DataFrame:

df.assign(col1=df["col1"] / 2)  # col1 is directly overwritten

	col1	col2
0	6.0	xiaoming
1	8.0	peter
2	9.0	mike
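
One detail worth illustrating when a column is overwritten: a callable passed later in the same call sees the overwritten values, while an expression built from df itself still uses the original values. This is a small sketch of that behavior; the column names `from_lambda` and `from_series` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"col1": [12, 16, 18]})

result = df.assign(
    col1=df["col1"] / 2,                  # overwrite col1 in the new frame
    from_lambda=lambda x: x["col1"] + 1,  # evaluated on the new frame: sees 6, 8, 9
    from_series=df["col1"] + 1,           # built from the original df: sees 12, 16, 18
)

print(result)
```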

Comparison with the apply function

We can also use the apply function in pandas to get the same result

df  # the original data

	col1	col2
0	12	xiaoming
1	16	peter
2	18	mike

We first create copies and operate directly on them:

df1 = df.copy()  # create a copy and operate directly on the copy
df2 = df.copy()

df1

	col1	col2
0	12	xiaoming
1	16	peter
2	18	mike

df1.assign(col3=lambda x: x.col1 / 2 + 20)  

	col1	col2	col3
0	12	xiaoming	26.0
1	16	peter	28.0
2	18	mike	29.0

df1  # df1 remains the same

	col1	col2
0	12	xiaoming
1	16	peter
2	18	mike

df1["col3"] = df1["col1"].apply(lambda x: x / 2 + 20)

df1  # df1 has changed

	col1	col2	col3
0	12	xiaoming	26.0
1	16	peter	28.0
2	18	mike	29.0

We find that with assign the original data remains the same, whereas assigning the result of apply to df1["col3"] changes df1 itself
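
To pin down where the change to df1 comes from, here is a minimal sketch that separates the two steps. It assumes a reduced version of the sample data; the names `halved`, `columns_after_apply`, and `columns_after_setitem` are introduced only for this illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"col1": [12, 16, 18]})

# Series.apply returns a new Series and leaves df1 untouched
halved = df1["col1"].apply(lambda x: x / 2 + 20)
columns_after_apply = list(df1.columns)

# The bracket assignment is the step that modifies df1 in place
df1["col3"] = halved
columns_after_setitem = list(df1.columns)

print(columns_after_apply)    # ['col1']
print(columns_after_setitem)  # ['col1', 'col3']
```

So the contrast is really between assign, which returns a new DataFrame, and the bracket assignment, which writes into the existing one.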

BMI

Finally, we use some sample data to calculate each person's BMI.

Body mass index, or BMI, is an internationally used measure of whether a person's weight is healthy for their height.

$\mathrm{BMI} = \frac{\mathrm{weight}}{\mathrm{height}^2}$

Weight is measured in kilograms (kg) and height in meters (m)
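
As a quick worked instance of the formula, the sketch below computes the BMI for a weight of 78 kg and a height of 1.82 m. The helper `bmi` is a hypothetical function written for this example, not part of pandas.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Return weight divided by height squared (kg / m^2)."""
    return weight_kg / height_m ** 2


value = bmi(78, 1.82)
print(round(value, 2))  # 23.55
```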

df2 = pd.DataFrame({
    "name": ["xiaoming", "xiaohong", "xiaosu"],
    "weight": [78, 65, 87],
    "height": [1.82, 1.75, 1.89]
})

df2

	name	weight	height
0	xiaoming	78	1.82
1	xiaohong	65	1.75
2	xiaosu	87	1.89

Using assign

df2.assign(BMI=df2["weight"] / (df2["height"] ** 2))

df2  # df2 stays the same

	name	weight	height
0	xiaoming	78	1.82
1	xiaohong	65	1.75
2	xiaosu	87	1.89

df2["BMI"] = df2["weight"] / (df2["height"] ** 2)

df2  # df2 generates a new column: BMI

Conclusion

From the examples above, we find that:

assign does not change the original DataFrame; the DataFrame it returns is a new one
assign can create multiple columns in one call, and intermediate columns can be used directly by the columns that follow them
The main difference between the assign approach and the apply approach shown above is that assign leaves the original data unchanged, while assigning the result of apply to a column (df1["col3"] = ...) adds the new column to the original data
