Public account: Youerhuts author: Peter editor: Peter

My name is Peter, and I’m here to illustrate two important functions in Pandas: stack and unstack.

stack and unstack are two ways to reshape the axes of a pandas object, and they are inverse operations of each other:

  • stack: rotates column labels into the row index
  • unstack: rotates row index labels into the columns
  • Both operate on the innermost level by default
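To make the inverse relationship concrete, here is a minimal sketch of a round trip. The DataFrame and its labels are made up for illustration and are not the article's own data.

```python
import pandas as pd

# Hypothetical sample data, not taken from the article
df = pd.DataFrame(
    [[1, 2], [3, 4]],
    index=["row1", "row2"],
    columns=["A", "B"],
)

stacked = df.stack()          # column labels A, B move into the innermost row index level
restored = stacked.unstack()  # that innermost row index level moves back to the columns

print(stacked)
print(restored)
```

With no missing values involved, the round trip gives back the same labels and values as the starting frame.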

Pandas article series

This is the 16th article in the Pandas series.

Here are some detailed examples of how to use both.

stack

The stack method turns the original columns into the innermost level of the row index, so the result has a multi-level index. The official documentation describes it as:

Stack the prescribed level(s) from columns to index.

Usage:

DataFrame.stack(level=-1, dropna=True)
  • level: the column level (or levels) to stack; the default -1 means the innermost level
  • dropna: whether to drop rows of the result whose values are all missing; the default is True

A diagram in the official documentation illustrates this: the column labels A and B become the row index labels A and B.

Stacking a single-level DataFrame

import pandas as pd 
import numpy as np

Take a look at the default:

We find that the index of df2 has become a multi-level index:

One more point: when we stack a DataFrame with single-level columns, the result is a Series:
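The following sketch shows both observations on a small frame. The data and the names df and df2 are assumptions chosen to match the names used in the text; the article's original values are not available.

```python
import pandas as pd

# Hypothetical data with single-level columns
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=["x", "y"])

df2 = df.stack()  # default: stack the innermost (here the only) column level

print(type(df2))            # the result is a Series, not a DataFrame
print(df2.index)            # the index now has two levels: row label, then column label
print(df2.loc[("x", "B")])  # a value is addressed by (row label, former column label)
```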

Stacking a multi-level DataFrame

First we build a multi-level column index:

Then we simulate data with multi-level columns:

Inspect the simulated data df3:

type(df3)
pandas.core.frame.DataFrame

df3.index
Index(['Ming', 'little red'], dtype='object')

df3.columns
MultiIndex([('information', 'sex'),
            ('information', 'weight')])

Look at the data after stacking:

Comparison

Compare the original data with the newly generated data:

1. Index comparison

2. Column attribute comparison

3. Data type comparison
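The three comparisons can be sketched as follows. The row and column labels follow the df3 output shown earlier in this section; the cell values and the names multicol1 and df4 are assumptions made for illustration.

```python
import pandas as pd

# Labels follow the df3 output in the article; the cell values are made up
multicol1 = pd.MultiIndex.from_tuples(
    [("information", "sex"), ("information", "weight")]
)
df3 = pd.DataFrame(
    [["male", 60], ["female", 45]],
    index=["Ming", "little red"],
    columns=multicol1,
)

df4 = df3.stack()  # moves the innermost column level ("sex", "weight") into the row index

print(df4)
print(df3.index.nlevels, df4.index.nlevels)      # 1. index: one level before, two after
print(df3.columns.nlevels, df4.columns.nlevels)  # 2. columns: two levels before, one after
print(type(df3), type(df4))                      # 3. type: still a DataFrame, since one column level remains
```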

Parameter level

level controls which column level or levels are stacked. A level can be given by its position or by its name.

Simulate data with multi-level columns:

multicol2 = pd.MultiIndex.from_tuples([('weight', 'kg'),  # Multi-level column labels
                                       ('height', 'm')],
                                      names=["col", "unit"])

data1 = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]],
                     index=['cat', 'dog'],
                     columns=multicol2
                    )

data1

We can see that the columns of data1 are multi-level:

data1.columns

# the results
MultiIndex([('weight', 'kg'),
            ('height', 'm')],
           names=['col', 'unit'])

We can also stack by the name of a column level:

Do the same for the other level, "col":

You can also stack more than one level at a time, specifying names or positions:
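A minimal sketch of these three cases, rebuilding data1 as described in this section. The result variable names are assumptions added for illustration.

```python
import pandas as pd

multicol2 = pd.MultiIndex.from_tuples(
    [("weight", "kg"), ("height", "m")], names=["col", "unit"]
)
data1 = pd.DataFrame(
    [[1.0, 2.0], [3.0, 4.0]], index=["cat", "dog"], columns=multicol2
)

by_unit = data1.stack("unit")        # stack the level named "unit" (the innermost one)
by_col = data1.stack("col")          # stack the level named "col"
by_position = data1.stack(0)         # position 0 refers to the same level as "col"
both = data1.stack(["col", "unit"])  # stack both levels at once; no column level is left

print(by_unit)
print(by_col)
print(both)
```

Stacking every column level leaves nothing to form columns, so the last result is a Series with a three-level index.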

Parameter dropna

What do we do if there are missing values in the original data? Let's simulate data with a missing value:

data2 = pd.DataFrame([[None, 2.0],  # Introduce a missing value
                      [4.0, 6.0]],
                     index=['cat', 'dog'],
                     columns=multicol2)
data2

dropna defaults to True, which removes rows of the result in which both values are missing:

If we set it to False, rows whose values are all NaN are kept:
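The handling of missing values is tied to the stack implementation, which recent pandas releases are revising, so the dropna argument may not behave the same in every installed version (worth checking against your own). The sketch below therefore swaps in a different technique: it stacks with default arguments and then drops all-missing rows explicitly with DataFrame.dropna(how="all"), which reproduces the effect described for dropna=True. The variable names are assumptions for illustration.

```python
import pandas as pd

multicol2 = pd.MultiIndex.from_tuples(
    [("weight", "kg"), ("height", "m")], names=["col", "unit"]
)
data2 = pd.DataFrame(
    [[None, 2.0], [4.0, 6.0]],  # the cat's weight is missing
    index=["cat", "dog"],
    columns=multicol2,
)

stacked = data2.stack()  # innermost level "unit" moves into the row index

# Swapped-in technique: drop rows where every value is missing,
# using DataFrame.dropna instead of the dropna argument of stack
without_empty_rows = stacked.dropna(how="all")

print(without_empty_rows)
```

The row for ("cat", "kg") has neither a weight nor a height, so it is the one that disappears.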

unstack

unstack turns the innermost level of the row index into columns: the row index labels A and B become the column labels A and B.

Usage

unstack is the inverse of stack:

DataFrame.unstack(level=-1, fill_value=None)
  • level: the row index level to unstack, given by position or by name
  • fill_value: the value used to fill any missing values that the operation produces

Parameter level

level defaults to -1, the innermost level. Another level can be selected by position, for example unstack(1), or by name.

We reuse the previously generated data and operate on df5.

1. Default unstack: by default the operation applies to the innermost level

2. Set level to 0 to unstack the outermost row index level; the level name can also be used as the argument value
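Both cases can be sketched on a small frame. The article's df5 is not reproduced in the text, so the data below is a hypothetical stand-in with a two-level row index, and the level names "outer" and "inner" are assumptions.

```python
import pandas as pd

# Hypothetical stand-in for df5: a DataFrame with a two-level row index
idx = pd.MultiIndex.from_product(
    [["one", "two"], ["a", "b"]], names=["outer", "inner"]
)
df5 = pd.DataFrame({"value": [1, 2, 3, 4]}, index=idx)

inner_to_cols = df5.unstack()   # default level=-1: "inner" moves to the columns
outer_to_cols = df5.unstack(0)  # level 0: "outer" moves to the columns
by_name = df5.unstack("outer")  # the same level, selected by name

print(inner_to_cols)
print(outer_to_cols)
```

The unstacked level always becomes the innermost column level, below the existing "value" label.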

Parameter fill_value

This parameter fills any missing values produced by unstack with the specified value.

We use the earlier df6 DataFrame to demonstrate:

With the default unstack, two null values are produced:

Fill in the resulting missing values:

  • The default is used
  • Use the name
  • Use index number
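A sketch of the three variants. The article's df6 is not reproduced in the text, so the frame below is a hypothetical stand-in whose row index lacks two label combinations, and the level names "key" and "sub" are assumptions.

```python
import pandas as pd

# Hypothetical stand-in for df6: combinations ("b", "y") and ("c", "x") are absent
idx = pd.MultiIndex.from_tuples(
    [("a", "x"), ("a", "y"), ("b", "x"), ("c", "y")], names=["key", "sub"]
)
df6 = pd.DataFrame({"value": [1, 2, 3, 4]}, index=idx)

with_nan = df6.unstack()                          # absent combinations become NaN
filled = df6.unstack(fill_value=0)                # default level, gaps filled with 0
filled_by_name = df6.unstack("sub", fill_value=0) # level chosen by name
filled_by_number = df6.unstack(1, fill_value=0)   # level chosen by position

print(with_nan)
print(filled)
```

Here "sub" is both the innermost level and position 1, so all three filled results are the same.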

Conclusion

stack and unstack reshape pandas data by moving levels between the columns and the row index: stack works on a DataFrame, and unstack works on a Series or a DataFrame. By default, both operate on the innermost level, and each is the inverse of the other.