This is the seventh day of my participation in the August More Text Challenge. For details, see: August More Text Challenge

Data grouping splits the data into groups based on one or more keys (which can be functions, arrays, or DataFrame column names), applies a summary calculation to each group, and combines the results. The functions used for the summary calculations are called aggregate functions.
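The split, summarize, and combine steps can be sketched in plain Python before turning to pandas. This is only an illustration of the idea, not how pandas implements it; the sample rows and the helper names `split` and `apply_and_combine` are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample rows: (category, number)
rows = [("Fruit", 0), ("Fruit", 1), ("Food", 0), ("Food", 1), ("Novel", 8)]


def split(records, key):
    """Split: collect the records that share the same key value."""
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(record)
    return groups


def apply_and_combine(groups, aggregate):
    """Apply the aggregate function to each group, then combine the results."""
    return {k: aggregate(members) for k, members in sorted(groups.items())}


# The key here is a function that returns the category of a row.
groups = split(rows, key=lambda r: r[0])
totals = apply_and_combine(groups, lambda members: sum(n for _, n in members))
counts = apply_and_combine(groups, len)

print(totals)  # {'Food': 1, 'Fruit': 1, 'Novel': 8}
print(counts)  # {'Food': 2, 'Fruit': 2, 'Novel': 1}
```

Here `sum` and `len` play the role of aggregate functions: each one reduces a whole group to a single value.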

The table used in this article is shown below. Let's look at the data first:

    import pandas as pd

    # Load the sample table from the Excel workbook
    life_df = pd.read_excel(r'C:\Users\admin\Desktop\life_df.xlsx')
    print(life_df)

result:

                 category  number                      name
    0               Fruit       0                     Apple
    1               Fruit       1                    Orange
    2   Daily necessities       2                Toothbrush
    3   Daily necessities       3              Refrigerator
    4   Daily necessities       4                        TV
    5                Food       0                     Apple
    6                Food       1                    Orange
    7     Home appliances       3              Refrigerator
    8     Home appliances       4                        TV
    9         Large items       3              Refrigerator
    10        Large items       4                        TV
    11        Large items       5                 Tea table
    12  Daily necessities       7               Hand warmer
    13              Novel       8  Dream of the Red Chamber

The specific process of data grouping is as follows

1 Group by column

    # Same workbook as in the first example
    life_df = pd.read_excel(r'C:\Users\admin\Desktop\life_df.xlsx')
    print(life_df.groupby("category"))

result:

    <pandas.core.groupby.generic.DataFrameGroupBy object at 0x000001894A41C708>

As the result shows, passing only a column name to groupby() returns a DataFrameGroupBy object. This object holds the data of every group, but it does not display it directly. The grouped data must be summarized before it can be displayed:

    # Same workbook as in the first example
    life_df = pd.read_excel(r'C:\Users\admin\Desktop\life_df.xlsx')
    # Count the non-null values of every column within each category
    print(life_df.groupby("category").count())

result:

                       number  name
    category
    Daily necessities       4     4
    Food                    2     2
    Fruit                   2     2
    Home appliances         2     2
    Large items             3     3
    Novel                   1     1

The code above groups the rows by category, counts each group separately, and then merges the per-group results into one table.
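The split, count, and merge steps described above can be made visible by iterating over the GroupBy object. The sketch below rebuilds the sample table inline, which is an assumption made so the example runs without the Excel file; the variable names `manual` and `builtin` are illustrative only.

```python
import pandas as pd

# Assumption: rows rebuilt by hand from the article's sample table.
rows = [
    ("Fruit", 0, "Apple"), ("Fruit", 1, "Orange"),
    ("Daily necessities", 2, "Toothbrush"),
    ("Daily necessities", 3, "Refrigerator"),
    ("Daily necessities", 4, "TV"),
    ("Food", 0, "Apple"), ("Food", 1, "Orange"),
    ("Home appliances", 3, "Refrigerator"), ("Home appliances", 4, "TV"),
    ("Large items", 3, "Refrigerator"), ("Large items", 4, "TV"),
    ("Large items", 5, "Tea table"),
    ("Daily necessities", 7, "Hand warmer"),
    ("Novel", 8, "Dream of the Red Chamber"),
]
life_df = pd.DataFrame(rows, columns=["category", "number", "name"])

grouped = life_df.groupby("category")

# Split: the GroupBy object holds one sub-frame per distinct key.
print(grouped.ngroups)
for key, group in grouped:
    print(key, len(group))

# A single group can be pulled out as an ordinary DataFrame.
fruit = grouped.get_group("Fruit")
print(fruit)

# Apply and combine by hand, next to the built-in count() for comparison.
manual = pd.Series({key: group["name"].count() for key, group in grouped})
builtin = grouped["name"].count()
print(manual.to_dict() == builtin.to_dict())
```

The hand-written loop and `count()` should agree, which is a quick way to convince yourself what the aggregation does to each group.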

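Because count() works on any data type, every column other than the grouping key gets a result. Numeric operations such as sum(), however, are only meaningful for numeric columns (int, float). Older pandas versions dropped the non-numeric columns automatically; from pandas 2.0 on, pass numeric_only=True to get the same behavior.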

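    import pandas as pd

    life_df = pd.read_excel(r'C:\Users\admin\Desktop\life_df.xlsx')
    # numeric_only=True restricts the sum to numeric columns
    # (older pandas versions did this by default)
    print(life_df.groupby("category").sum(numeric_only=True))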

result:

                       number
    category
    Daily necessities      16
    Food                    1
    Fruit                   1
    Home appliances         7
    Large items            12
    Novel                   8
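The difference between counting and a numeric aggregation can be checked by comparing the columns of the two results. This is a minimal sketch on a hand-built copy of the sample table (an assumption, so that no Excel file is needed). `numeric_only=True` is passed explicitly because, as far as I know, pandas 2.0 and later no longer drop non-numeric columns on their own.

```python
import pandas as pd

# Assumption: rows rebuilt by hand from the article's sample table.
rows = [
    ("Fruit", 0, "Apple"), ("Fruit", 1, "Orange"),
    ("Daily necessities", 2, "Toothbrush"),
    ("Daily necessities", 3, "Refrigerator"),
    ("Daily necessities", 4, "TV"),
    ("Food", 0, "Apple"), ("Food", 1, "Orange"),
    ("Home appliances", 3, "Refrigerator"), ("Home appliances", 4, "TV"),
    ("Large items", 3, "Refrigerator"), ("Large items", 4, "TV"),
    ("Large items", 5, "Tea table"),
    ("Daily necessities", 7, "Hand warmer"),
    ("Novel", 8, "Dream of the Red Chamber"),
]
life_df = pd.DataFrame(rows, columns=["category", "number", "name"])

# count() produces a result for every non-key column, text included.
counted = life_df.groupby("category").count()

# sum(numeric_only=True) keeps only the numeric column.
summed = life_df.groupby("category").sum(numeric_only=True)

print(list(counted.columns))  # ['number', 'name']
print(list(summed.columns))   # ['number']
```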

The operation of summarizing the grouped data is called aggregation, and the functions used for it are called aggregate functions. The non-null count, sum, maximum, minimum, mean, median, mode, variance, standard deviation, and quantile covered earlier in this series are all aggregate functions.
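Several of the aggregate functions listed above can be applied in one call with `agg()`. The sketch below is one possible way to do it, again on a hand-built copy of the sample table (an assumption); the variable names `stats` and `q75` are illustrative.

```python
import pandas as pd

# Assumption: rows rebuilt by hand from the article's sample table.
rows = [
    ("Fruit", 0, "Apple"), ("Fruit", 1, "Orange"),
    ("Daily necessities", 2, "Toothbrush"),
    ("Daily necessities", 3, "Refrigerator"),
    ("Daily necessities", 4, "TV"),
    ("Food", 0, "Apple"), ("Food", 1, "Orange"),
    ("Home appliances", 3, "Refrigerator"), ("Home appliances", 4, "TV"),
    ("Large items", 3, "Refrigerator"), ("Large items", 4, "TV"),
    ("Large items", 5, "Tea table"),
    ("Daily necessities", 7, "Hand warmer"),
    ("Novel", 8, "Dream of the Red Chamber"),
]
life_df = pd.DataFrame(rows, columns=["category", "number", "name"])

numbers = life_df.groupby("category")["number"]

# One column per aggregate function, one row per category.
stats = numbers.agg(["count", "sum", "min", "max", "mean", "median", "std"])
print(stats)

# quantile() needs an argument, so it is called on its own here.
q75 = numbers.quantile(0.75)
print(q75)
```

A group with a single row, such as Novel, has no sample standard deviation, so its `std` entry is NaN.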

2 Group by multiple columns

Grouping by multiple columns works like grouping by a single column: pass the column names to groupby() as a list.

    # Same workbook as in the first example
    life_df = pd.read_excel(r'C:\Users\admin\Desktop\life_df.xlsx')
    # Group by category first, then by name within each category
    print(life_df.groupby(["category", "name"]).count())

result:

                                                number
    category          name
    Daily necessities Hand warmer                    1
                      Refrigerator                   1
                      TV                             1
                      Toothbrush                     1
    Food              Apple                          1
                      Orange                         1
    Fruit             Apple                          1
                      Orange                         1
    Home appliances   Refrigerator                   1
                      TV                             1
    Large items       Refrigerator                   1
                      TV                             1
                      Tea table                      1
    Novel             Dream of the Red Chamber       1

    # Same workbook as in the first example
    life_df = pd.read_excel(r'C:\Users\admin\Desktop\life_df.xlsx')
    # Sum the numeric column for each (category, name) pair
    print(life_df.groupby(["category", "name"]).sum())

result:

                                                number
    category          name
    Daily necessities Hand warmer                    7
                      Refrigerator                   3
                      TV                             4
                      Toothbrush                     2
    Food              Apple                          0
                      Orange                         1
    Fruit             Apple                          0
                      Orange                         1
    Home appliances   Refrigerator                   3
                      TV                             4
    Large items       Refrigerator                   3
                      TV                             4
                      Tea table                      5
    Novel             Dream of the Red Chamber       8
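A multi-column grouping returns a result indexed by both keys. The following sketch shows one way to look up values in such a result and to flatten it back into ordinary columns. The table is rebuilt by hand (an assumption), and `by_both` and `flat` are illustrative names.

```python
import pandas as pd

# Assumption: rows rebuilt by hand from the article's sample table.
rows = [
    ("Fruit", 0, "Apple"), ("Fruit", 1, "Orange"),
    ("Daily necessities", 2, "Toothbrush"),
    ("Daily necessities", 3, "Refrigerator"),
    ("Daily necessities", 4, "TV"),
    ("Food", 0, "Apple"), ("Food", 1, "Orange"),
    ("Home appliances", 3, "Refrigerator"), ("Home appliances", 4, "TV"),
    ("Large items", 3, "Refrigerator"), ("Large items", 4, "TV"),
    ("Large items", 5, "Tea table"),
    ("Daily necessities", 7, "Hand warmer"),
    ("Novel", 8, "Dream of the Red Chamber"),
]
life_df = pd.DataFrame(rows, columns=["category", "number", "name"])

# Series with a two-level index: (category, name)
by_both = life_df.groupby(["category", "name"])["number"].sum()

# Look up one (category, name) pair with a tuple.
print(by_both.loc[("Large items", "Tea table")])  # 5

# Look up every name inside one category with the first key only.
print(by_both.loc["Fruit"])

# reset_index() turns the two index levels back into regular columns.
flat = by_both.reset_index()
print(flat.head())
```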

3 Use different columns for grouping and aggregation

In both methods above (1 and 2), calling an aggregate function directly on the grouped data computes every column that can be computed, whether you group by one column or by several. Often you do not need all columns. In that case, select the column or columns to aggregate after grouping:

    # Same workbook as in the first example
    life_df = pd.read_excel(r'C:\Users\admin\Desktop\life_df.xlsx')
    # Group by category, then count only the name column
    print(life_df.groupby("category")["name"].count())

result:

    category
    Daily necessities    4
    Food                 2
    Fruit                2
    Home appliances      2
    Large items          3
    Novel                1
    Name: name, dtype: int64

Here the data is grouped by category, and then only the name column is counted.
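The same selection works for several columns, and `agg()` accepts a dictionary when each column needs its own aggregate function. The sketch below is a possible illustration on a hand-built copy of the sample table (an assumption); the result names are made up for the example.

```python
import pandas as pd

# Assumption: rows rebuilt by hand from the article's sample table.
rows = [
    ("Fruit", 0, "Apple"), ("Fruit", 1, "Orange"),
    ("Daily necessities", 2, "Toothbrush"),
    ("Daily necessities", 3, "Refrigerator"),
    ("Daily necessities", 4, "TV"),
    ("Food", 0, "Apple"), ("Food", 1, "Orange"),
    ("Home appliances", 3, "Refrigerator"), ("Home appliances", 4, "TV"),
    ("Large items", 3, "Refrigerator"), ("Large items", 4, "TV"),
    ("Large items", 5, "Tea table"),
    ("Daily necessities", 7, "Hand warmer"),
    ("Novel", 8, "Dream of the Red Chamber"),
]
life_df = pd.DataFrame(rows, columns=["category", "number", "name"])

# One column selected with a string: the result is a Series.
name_counts = life_df.groupby("category")["name"].count()

# Several columns selected with a list: the result is a DataFrame.
both_counts = life_df.groupby("category")[["number", "name"]].count()

# A different aggregate function per column, given as a dictionary.
mixed = life_df.groupby("category").agg({"number": "sum", "name": "count"})

print(type(name_counts).__name__)  # Series
print(type(both_counts).__name__)  # DataFrame
print(mixed)
```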