Public account: You and the cabin by: Peter Editor: Peter

Hello, I’m Peter

I have written many articles about the operations and functions used in Pandas. Today I created a simulation of fruit orders and sales (for study only) to understand how Pandas can be used flexibly and quickly to meet our requirements.

Pandas articles

recommended

Data interpretation

1. The first data of simulation has 5 fields: order number, ordering person, commodity, price and quantity

  • Order number: The order number of each order. One or more items exist in an order number
  • Placing a single person: a person may place one or more orders, such as Joe placing only one order and Joe placing multiple orders
  • Item: The same item may appear in multiple orders
  • Price: The price of each commodity in each order. In different orders, the price of the same commodity may be different. For example, the price of apple in SOD order is 10, but it is 9.8 in DFH order
  • Quantity: The quantity of each item sold per order

2. There are only two fields in the simulated second data: commodity and place of origin

At the same time, we can see that these two pieces of data are stored in different sheets and stored as XLSLX files without any missing value data.

Requirement 1: Different ways to read data

There are different sheets in the same Excel, we take different ways to read:

Method 1: Specify the name of the file and the name of the sheet

import pandas as pd  # Preload
Copy the code

Method 2: Specify the name of the file and the index number of the sheet. The index starts from 0

Requirement 2: Merger of two pieces of data

As you can see, the data in the two sheets is related by the commodity field. We use the merge function in PANDAS and preserve all the information in the first table (left).

The merge function is a very important function that provides flexibility in handling the data merge problem in Pandas.

The following requirements are processed against the combined data above

Demand 3: order quantity, customer quantity and commodity quantity

Order volume: How many orders were placed in total

The order number field has multiple unique and unique information. A total of seven orders

Same logic: how many users can you get, how many items are sold?

Requirement 4: Orders per user

Calculate the number of orders placed by each user: use groupby to group the number of orders placed by each user.

  • The groupby function is used first
  • Using the aggregate function nUNIQUE, count the number of each “order number” (de-count)
  • And then finally index reset

See Li Si placed 3 orders, is the most

Requirement 5: The total amount spent per user

1. Add a column first: total

2. Grouping and reaggregation in two different ways

Demand 6: order quantity, sales volume and total sales amount of different origin

Demand 7: The highest price item per order

Find the most expensive item in each order. For example, the most expensive item in the SOD order is grapes

Method 1: The first implementation is as follows:

  • First the whole is sorted in descending order
  • Then group them according to the order number and take out the first first data

Method 2: The implementation is as follows

1. First, each order number is arranged in descending order according to price

2, the mixed use of multiple functions, can be run separately to view the results of each step

df.groupby("Order Number").apply(lambda x: x.sort_values("Price",ascending=False)).reset_index(drop=True).groupby("Order Number").first().reset_index()
Copy the code

Method 3: Use the groupby_keys parameter when grouping groups

Demand 8: The top 2 highest prices per order

Take out the top two prices in each order, if only one take out one.

Above is the highest data after grouping, namely the first first. In this requirement we use the head function, which can fetch any n data: top-n

Demand 9: Unit price of pens per item (keep 2 decimal places)

Let’s break it down.

  • For each item: groupby=” item”
  • Pen unit price: the first total sales of each commodity, in the order number of each commodity, and finally divide

How to reserve two decimal places for the unit price of the commodity pen above? There are two ways to do this: