Public account: You and the cabin by: Peter Editor: Peter

Hello, I’m Peter

In this article, there are four methods for converting a column and column in Pandas:

  • melt
  • Transpose T or transpose
  • wide_to_long
  • Explode (explode function)

Finally, answer a reader’s question about data processing.

Pandas Column conversion

There are a number of ways to do this:

Import libraries

import pandas as pd
import numpy as np
Copy the code

Function melt

Melt main parameters:

pandas.melt(frame, 
            id_vars=None, 
            value_vars=None, 
            var_name=None, 
            value_name='value',
            ignore_index=True,  
            col_level=None)
Copy the code

The following describes the meanings of parameters:

  • Frame: indicates the DataFrame of the DataFrame to be processed.

  • Id_vars: specifies the column names that do not need to be converted

  • Value_vars: specifies the name of the column to be converted. If all the remaining columns need to be converted, do not write it

  • Var_name and value_name: user-defined column names, equivalent to taking a new column name

  • Igonore_index: whether to ignore the original column name. The default value is True. The original index name is ignored and 0,1,2,3,4…. is generated Natural index of

  • Col_level: this parameter is used if the column is MultiIndex. Use this parameter sparingly

Simulated data

Data to be converted: frame
df = pd.DataFrame({"col1": [1.1.1.1.1]."col2": [3.3.3.3.3]."col3": ["a"."a"."a"."b"."b"]
                  })
df
Copy the code

id_vars

value_vars

Use both of the above parameters:

Convert multiple column attributes simultaneously:

Var_name and value_name

pd.melt(
    df,
    id_vars=["col1"].# stays the same
    value_vars=["col3"].# change
    var_name="col4".# New column name
    value_name="col5" # New column name for the corresponding value
)
Copy the code

ignore_index

The default is to generate a natural index:

We can change this to False and use the original index:

Transpose function

The T attribute in pandas or the transpose function is used to transpose rows and columns

Simple transpose

To simulate a piece of data, see the result of transpose:

Transpose using the transpose function:

There is another way to do this: transpose values and swap index and column names:

Finally, a simple example:

Wide_to_long function

Convert a dataset from a wide format to a long format

wide_to_long(
    df,
    stubnames,
    i,
    j,
    sep: str = "",
    suffix: str = "\\d+"
Copy the code

Specific explanation of parameters:

  • Df: indicates the data enclosure to be converted
  • Stubnames: The saved portion of a wide table with the same column name
  • I: The column to be used as the ID variable
  • J: Set columns for long “suffix” columns
  • Sep: Sets the separator to delete. For example, if columns are A-2020, specify sep=’-‘ to delete the delimiter. The default value is null.
  • Suffix: Obtain the suffix by setting the regular expression. The default ‘\d+’ takes a numeric suffix. No numeric “suffixes” can be obtained by ‘\D+’

Simulated data

The conversion process

Use functions to perform transformations:

Setting up multiple indexes

First simulate a piece of data:

If you are not used to multi-tier indexes, you can convert to the following format:

Sep and suffix

df5 = pd.DataFrame({
    'a': [1.1.2.2.3.3.3].'b': [1.2.2.3.1.2.3].'stu_one': [2.8.2.9.1.8.1.9.2.2.2.3.2.1].'stu_two': [3.4.3.8.2.8.2.4.3.3.3.4.2.9]
})
df5
Copy the code

pd.wide_to_long(
    df5, 
    stubnames='stu', 
    i=['a'.'b'], 
    j='number',
    sep='_'.# if a hyphen exists in the column name; The default is empty
    suffix=r'\w+')  # regular expression based suffixes; The default is number \d+; Let me change this to \w+ for letters
Copy the code

Explode function -explode

explode(column, ignore_index=False)
Copy the code

This function takes only two arguments:

  • Column: indicates the element to be exploded
  • Ignore_index: indicates whether to ignore the index. The default is False, keeping the original index

Simulated data

Single field explosion

Perform explosion procedure on single field, convert wide table to long table:

Parameter ignore_index

Multiple field explosion

The process of exploding multiple fields in succession:

Readers disambiguation

Answer a reader’s question here, using simulated data. There is such a data as follows, demand:

The proportion of each fruit in each shop

fruit = pd.DataFrame({
    "shop": ["shop1"."shop3"."shop2"."shop3"."shop2"."shop1"."shop3"."shop2"."shop3"."shop2"."shop3"."shop2"."shop1"]."fruit": [The word "apple"."Grapes"."Banana".The word "apple"."Grapes"."Orange"."Pear"."Cantaloupe"."Grapes"."Banana".The word "apple"."Grapes"."Orange"]."number": [100.200.340.150.200.300.90.80.340.150.200.300.90]})
fruit
Copy the code

First, we need to count the sales volume of each shop and each fruit

Method 1: Multi-step

Method 1 adopts multi-step solution:

1. Total sales volume of each shop

2. Add the shop_sum column

3. Proportion of generation

Method 2: Use the transform function