“This is the first day of my participation in the First Challenge 2022. For details: First Challenge 2022.”

Hello, I’m Ding Xiaojie. The source of the case in this paper is the book “Data Analysis Practice”, which uses R language. In the following period of time, I will use Python + Tableau to reproduce the case as much as possible for everyone to learn.

Scene description

The monthly sales of a small APP game operated by a certain company rose steadily, but suddenly dropped in July. No matter from the market environment or the environment of the game itself, the sales of this game still has room for continued growth. There may be two main factors affecting the sales of this game:

  • New on the game event
  • Change in commercial publicity

Come to light

  • The game activity has not changed much from last month’s activity
  • Publicity has declined because of budget constraints

Next, we will analyze the data to confirm the above conclusions and propose ways to recover sales.

Data description

Daily Active Users

139,112 lines of user data that visits at least once a day.

field type meaning
log_date str Access time
app_name str The application of
user_id numpy.int64 The user ID
DPU (Daily Payment User)

Data of users who spend at least 1 yen per day (0.056 yuan), 884 lines.

field type meaning
log_date str Access time
app_name str The application of
user_id numpy.int64 The user ID
payment numpy.int64 Consumption amount
Install

Log the first time each user logs into the game, 29,329 lines.

field type meaning
install_date str First Login Time
app_name str The application of
user_id numpy.int64 The user ID

The data analysis

Data is read

Read three data sets.

import pandas as pd

DAU_data = pd.read_csv('DAU.csv')
DPU_data = pd.read_csv('DPU.csv')
install_data = pd.read_csv('install.csv')
Copy the code

Display the top five rows of the DAU dataset.

DAU_data.head()
Copy the code

The first five lines of the DPU data set are displayed.

DPU_data.head()
Copy the code

Display the first five lines of the Install dataset.

install_data.head()
Copy the code

Data consolidation

Merge daily active user data DAU with user’s first login data Install, and use user_id and app_name as reference keys. This will give you the user’s first login time.

all_data = pd.merge(DAU_data,
                    install_data,
                    on=['user_id'.'app_name'])
all_data.head()
Copy the code

After obtaining the user’s first login time, it is merged with the daily consumption user data DPU. All data in ALL_data is retained in left-connection mode, and the missing value is NaN by default.

all_data = pd.merge(all_data,
                    DPU_data,
                    on=['log_date'.'app_name'.'user_id'],
                    how='left')
all_data.head()
Copy the code

The data processing

Fill the null value in the Payment column with 0.

all_data['payment'] = all_data['payment'].fillna('0')
all_data
Copy the code

Convert the units of the Payment column to units, leaving only months for log_date and install_date.

all_data['payment'] = all_data['payment'].astype(float)
all_data['payment'] = all_data['payment'] * 0.056
all_data['log_date'] = pd.to_datetime(all_data['log_date']).map(lambda x : x.strftime("%m") [1] + 'month')
all_data['install_date'] = pd.to_datetime(all_data['install_date']).map(lambda x : x.strftime("%m") [1] + 'month')
all_data.head()
Copy the code

New and old users are divided

Users whose login month > first login month are defined as old users, and others are defined as new users.

all_data['user'] = all_data['log_date'] > all_data['install_date']
all_data['user'] = all_data['user'].map({False: 'New user'.True: 'Regular users'})
all_data.head()
Copy the code

Payment is summed by log_date and user groups to calculate the monthly sales of old and new users.

user_category = all_data.groupby(['log_date'.'user'[])'payment'].sum().reset_index()
user_category.head()
Copy the code

It can be seen that sales from existing users in June and July are basically the same, but sales from new users in July are significantly less than that in June.

Divide sales into regions to see which tiers of users are spending less.

import numpy as np

sale_df = all_data.loc[all_data['payment'] > 0['log_date'.'payment']].copy()
bins = list(range(0.101.30)) + [np.inf]
labels = [str(n) + The '-' + str(n + 30) for n in bins[:-2]] + ['> 90']
sale_df['payment_cut'] = sale_df['payment'].apply(lambda x : pd.cut([x], bins=bins, labels=labels)[0])
sale_df.head()
Copy the code

As can be seen from the bar chart above, compared with June, the number of users spending less than 60 yuan decreased in July.

Here we can draw some conclusions.

conclusion

The number of new users who consume decreased, especially those who consume small amounts of money. As a result, the company needs to launch a commercial campaign again and return to the previous level in order to increase awareness of the company’s products among potential users and add new users.

Case reference [1] Data Analysis Practice [J] Akinamura Takashi Yohei/translated by Xiao Feng


For those who are new to Python or want to learn Python, you can search “Python New Horizons” on wechat to communicate and learn together. They are all beginners. Sometimes a simple question is stuck for a long time, but others may realize it at a touch. There are also nearly 1,000 resume templates and hundreds of e-books waiting for you to collect!