Retention analysis

The definition of retention is very simple: if a user used our platform yesterday and uses it again today, that user is retained. This is 1-day retention; if they come back the day after as well, it is 2-day retention. For example, suppose a user uses our product for the first time on May 1. If they use it again on May 2, that counts as 1-day retention; if they use it again on May 3, that counts as 2-day retention, and so on.
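As a minimal sketch of this definition, the Python snippet below works out for which day offsets a single user counts as retained. The function name `retained_days` and the sample dates are assumptions made for illustration only; they do not come from the platform described here.

```python
from datetime import date


def retained_days(first_use, active_dates):
    """Return the sorted day offsets N on which the user came back after first_use.

    An offset of N means the user counts towards N-day retention.
    """
    return sorted((d - first_use).days for d in active_dates if d > first_use)


# Hypothetical user: first use on May 1, back on May 2 and May 4.
first_use = date(2021, 5, 1)
active = {date(2021, 5, 1), date(2021, 5, 2), date(2021, 5, 4)}
print(retained_days(first_use, active))  # [1, 3]
```

The user is counted for 1-day and 3-day retention but not for 2-day retention, because they did not appear on May 3.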

First, why do retention analysis at all? Retention reflects the potential, or the health, of an application. If an application's retention is very low, users come once and do not return. For example, a company may add plenty of new users every day, but if retention is very low, it suggests that the users were bought, for instance through advertising or traffic acquisition, and that after registering they were not interested in the application or had a poor experience and never came back. Of course, another possibility is that the data is fake. I once worked at such a company: it reported around thirty thousand new users a day, but retention was poor. I left after less than a month, and later heard that the company had gone out of business.

Another point is that retention is generally a concept for new users: it describes how active users are N days after signing up on a given day, measured on a daily basis. The usual requirement is N-day retention of new users within a period of time, or of new users on a given day, but this is not always the case. Retention of active (existing) users is also important.

Of course, whether to look at a single day or a period of time depends on the company's business. For example, if the company has opened a new advertising channel for a period of time, it may make sense to focus on the retention of new users from that channel, or from that period, to better measure the channel's ROI.

Although retention analysis is important, it is not very complicated to calculate, so let's look at how it is done.

Calculations for new users

Earlier we said that retention analysis generally looks at new users, so the first thing to do is count new users. There are two ways to do that:

  1. Synchronize new users directly from the business database. The business system must have a user table, so we can sync it directly.
  2. Maintain them on the data platform. We record every user that appears on the platform; a user who appears for the first time is a new user.
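The second idea can be sketched in a few lines of Python: treat the earliest activity date of each user as the day that user became new. This is only an illustration under assumed inputs; `new_users_by_day` and the `(user_id, day)` event tuples are hypothetical, not part of the platform described here.

```python
from datetime import date


def new_users_by_day(events):
    """Group users by the first day they appear in the activity log.

    events is an iterable of (user_id, activity_date) pairs.
    Returns a dict mapping each date to the set of users first seen on it.
    """
    first_seen = {}
    for user_id, day in events:
        # Keep the earliest date on which each user was observed.
        if user_id not in first_seen or day < first_seen[user_id]:
            first_seen[user_id] = day

    result = {}
    for user_id, day in first_seen.items():
        result.setdefault(day, set()).add(user_id)
    return result


events = [
    ("u1", date(2021, 6, 1)),
    ("u2", date(2021, 6, 2)),
    ("u1", date(2021, 6, 2)),
]
print(new_users_by_day(events))
```

With this approach a user only becomes "new" once they actually show up in the log, which can differ from the registration date stored in the business database.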

Here we mainly cover the first approach. The second will be introduced separately later, together with a comparison of the two.

Active Retention

First, let's look at the definition of retention. If a user appears on the platform today and appears again tomorrow, we call it 1-day retention. In the same way, if they also appear on the platform 30 days later, we call it 30-day retention. It is important to note that continuous activity is not required.

Let's set new users aside for now; any user can be an active user. Say we have a user behavior log table, dwd_jyxd_patient_page_view_di, and we want to calculate retention over the last 30 days.

select
    a.ds,
    count(distinct a.union_id) as cnt,
    sum(if(datediff(to_date(a.ds, 'yyyymmdd'), to_date(b.ds, 'yyyymmdd')) = 1, 1, 0)) as cnt1,
    sum(if(datediff(to_date(a.ds, 'yyyymmdd'), to_date(b.ds, 'yyyymmdd')) = 2, 1, 0)) as cnt2,
    sum(if(datediff(to_date(a.ds, 'yyyymmdd'), to_date(b.ds, 'yyyymmdd')) = 3, 1, 0)) as cnt3,
    sum(if(datediff(to_date(a.ds, 'yyyymmdd'), to_date(b.ds, 'yyyymmdd')) = 11, 1, 0)) as cnt11,
    sum(if(datediff(to_date(a.ds, 'yyyymmdd'), to_date(b.ds, 'yyyymmdd')) = 14, 1, 0)) as cnt14,
    sum(if(datediff(to_date(a.ds, 'yyyymmdd'), to_date(b.ds, 'yyyymmdd')) = 15, 1, 0)) as cnt15,
    sum(if(datediff(to_date(a.ds, 'yyyymmdd'), to_date(b.ds, 'yyyymmdd')) = 21, 1, 0)) as cnt21,
    sum(if(datediff(to_date(a.ds, 'yyyymmdd'), to_date(b.ds, 'yyyymmdd')) = 28, 1, 0)) as cnt28,
    sum(if(datediff(to_date(a.ds, 'yyyymmdd'), to_date(b.ds, 'yyyymmdd')) = 30, 1, 0)) as cnt30
from (
    -- driver table: users active on the business date, one row per user
    select union_id, ds
    from dwd_jyxd_patient_page_view_di
    where ds = '${ds}'
    group by union_id, ds
) a
left join (
    -- activity over the past 30 days, one row per user per day
    select union_id, ds
    from dwd_jyxd_patient_page_view_di
    where ds >= to_char(date_sub(to_date('${ds}', 'yyyymmdd'), 30), 'yyyymmdd')
    group by union_id, ds
) b
on a.union_id = b.union_id and a.ds >= b.ds
group by a.ds;

Here’s a quick look at the code

  1. Our driver table is a. Note that we deduplicate it by user. ds is the business date, so each user appears once for that date.
  2. For the second table, b, we take the data of the past 30 days and deduplicate it by date and user.
  3. Then we join the first table with the second. If the ds difference between the two tables equals 1, it means 1-day retention. Similarly, if the ds difference equals 30, it means 30-day retention.
  4. One thing to be aware of: by our definition, 30-day retention means taking the users who were active 30 days ago and counting how many of them are active today. Strictly speaking, we should join the data from 30 days ago against today's data. However, because this is an equi-join on the user, joining today's data against the historical data gives the same matches.
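To make the join logic in these steps concrete, here is a small Python sketch that mirrors it on in-memory data. It is an illustration under assumptions, not the production query: `active_retention`, the `(user_id, day)` tuples and the sample users are all hypothetical.

```python
from datetime import date, timedelta


def active_retention(events, today, offsets=(1, 2, 3, 30)):
    """Count today's active users and how many of them were also active N days earlier."""
    # Deduplicate by (user, day), like the GROUP BY union_id, ds in both subqueries.
    days_by_user = {}
    for user_id, day in events:
        days_by_user.setdefault(user_id, set()).add(day)

    # Driver set: users active on the business date (table a).
    todays_users = {u for u, days in days_by_user.items() if today in days}

    counts = {"cnt": len(todays_users)}
    for n in offsets:
        earlier = today - timedelta(days=n)
        # A user matches when their history (table b) contains the day n days before today.
        counts[f"cnt{n}"] = sum(1 for u in todays_users if earlier in days_by_user[u])
    return counts


today = date(2021, 6, 21)
events = [
    ("u1", today), ("u1", today - timedelta(days=1)),
    ("u2", today), ("u2", today - timedelta(days=3)),
    ("u3", today - timedelta(days=1)),  # not active today, so not in the driver set
]
print(active_retention(events, today))
# {'cnt': 2, 'cnt1': 1, 'cnt2': 0, 'cnt3': 1, 'cnt30': 0}
```

Note that u3 is ignored because the driver set only contains users active on the business date, which is exactly the direction of the join discussed in step 4.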

Here are the results

+----------+------+------+------+------+-------+-------+-------+-------+-------+-------+
| ds       | cnt  | cnt1 | cnt2 | cnt3 | cnt11 | cnt14 | cnt15 | cnt21 | cnt28 | cnt30 |
+----------+------+------+------+------+-------+-------+-------+-------+-------+-------+
| 20210621 | 3933 | 1462 | 1416 | 1436 | 1262  | 1201  | 1166  | 0     | 0     | 0     |
+----------+------+------+------+------+-------+-------+-------+-------+-------+-------+

Retention of new users

Next we look at the logic for new users. As we said before, we can get the new users directly from the business database and then do the calculation.

select
    a.ds,
    count(distinct a.union_id) as cnt,
    sum(if(datediff(to_date(a.ds, 'yyyymmdd'), to_date(b.ds, 'yyyymmdd')) = 0, 1, 0)) as cnt0,
    sum(if(datediff(to_date(a.ds, 'yyyymmdd'), to_date(b.ds, 'yyyymmdd')) = 1, 1, 0)) as cnt1,
    sum(if(datediff(to_date(a.ds, 'yyyymmdd'), to_date(b.ds, 'yyyymmdd')) = 2, 1, 0)) as cnt2,
    sum(if(datediff(to_date(a.ds, 'yyyymmdd'), to_date(b.ds, 'yyyymmdd')) = 3, 1, 0)) as cnt3,
    sum(if(datediff(to_date(a.ds, 'yyyymmdd'), to_date(b.ds, 'yyyymmdd')) = 11, 1, 0)) as cnt11,
    sum(if(datediff(to_date(a.ds, 'yyyymmdd'), to_date(b.ds, 'yyyymmdd')) = 14, 1, 0)) as cnt14,
    sum(if(datediff(to_date(a.ds, 'yyyymmdd'), to_date(b.ds, 'yyyymmdd')) = 15, 1, 0)) as cnt15,
    sum(if(datediff(to_date(a.ds, 'yyyymmdd'), to_date(b.ds, 'yyyymmdd')) = 21, 1, 0)) as cnt21,
    sum(if(datediff(to_date(a.ds, 'yyyymmdd'), to_date(b.ds, 'yyyymmdd')) = 28, 1, 0)) as cnt28,
    sum(if(datediff(to_date(a.ds, 'yyyymmdd'), to_date(b.ds, 'yyyymmdd')) = 30, 1, 0)) as cnt30
from (
    -- new users from the business user table, registered on 2021-06-01
    select ds, union_id
    from ods_lo_applet_user_df
    where ds = '${ds}'
      and date(create_time) = '2021-06-01'
) a
left join (
    -- activity over the past 30 days, one row per user per day
    select union_id, ds
    from dwd_jyxd_patient_page_view_di
    where ds >= to_char(date_sub(to_date('${ds}', 'yyyymmdd'), 30), 'yyyymmdd')
    group by union_id, ds
) b
on a.union_id = b.union_id and a.ds >= b.ds
group by a.ds;

Here we take the users who registered on 2021-06-01 and calculate their retention over 30 days. Because the system has been online for fewer than 30 days, there is no 30-day retention yet. I also calculate cnt0 specifically, and you will find that it is not equal to the number of new users, which shows that not every registered user actually visited the platform.

+----------+-----+------+------+------+------+-------+-------+-------+-------+-------+-------+
| ds       | cnt | cnt0 | cnt1 | cnt2 | cnt3 | cnt11 | cnt14 | cnt15 | cnt21 | cnt28 | cnt30 |
+----------+-----+------+------+------+------+-------+-------+-------+-------+-------+-------+
| 20210621 | 792 | 21   | 23   | 25   | ?    | 30    | ?     | 48    | 0     | 0     | 0     |
+----------+-----+------+------+------+------+-------+-------+-------+-------+-------+-------+

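If you want the retention of new users who registered over a period of time rather than on a single day, change the filter on create_time to a range: date(create_time) between '2021-06-01' and '2021-06-10'.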
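Once the cohort spans several registration dates, it can help to measure each user's offset from their own registration day. The sketch below shows that variant in Python. Note that this is a different technique from the query above, which measures offsets backwards from ${ds}; the function `cohort_retention` and all sample data are hypothetical.

```python
from datetime import date, timedelta


def cohort_retention(registrations, events, start, end, offsets=(0, 1, 3)):
    """Count users registered in [start, end] who were active N days after registering.

    registrations maps user_id to registration date.
    events is an iterable of (user_id, activity_date) pairs.
    """
    cohort = {u: d for u, d in registrations.items() if start <= d <= end}
    active = set(events)  # deduplicate by (user, day)

    counts = {"cnt": len(cohort)}
    for n in offsets:
        counts[f"cnt{n}"] = sum(
            1 for u, reg in cohort.items() if (u, reg + timedelta(days=n)) in active
        )
    return counts


registrations = {
    "u1": date(2021, 6, 1),
    "u2": date(2021, 6, 5),
    "u3": date(2021, 6, 20),  # outside the registration window
}
events = [
    ("u1", date(2021, 6, 1)), ("u1", date(2021, 6, 2)),
    ("u2", date(2021, 6, 6)), ("u2", date(2021, 6, 8)),
    ("u3", date(2021, 6, 20)),
]
print(cohort_retention(registrations, events, date(2021, 6, 1), date(2021, 6, 10)))
# {'cnt': 2, 'cnt0': 1, 'cnt1': 2, 'cnt3': 1}
```

Here u2 registered on June 5 but first visited on June 6, so they count towards cnt1 and not cnt0.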

Retention rates

Retention rate is essentially no different from retention; what differs is the information each one conveys. Retention is just an absolute number. If I tell you that 3-day retention is 500, what can you learn from that? Nothing, because you have no reference point: you cannot tell how many users those 500 were retained from. That is why we introduce the concept of retention rate.

There is one issue, though. When we calculated new-user retention above, we noticed that not all new users come to the platform; some never log in after registering. At this point you should clarify with the requesting side what they actually want: should the retention rate be based on the users who were active on the first day, or calculated some other way? Here we calculate it based on registered users.

select
    a.ds,
    count(distinct a.union_id) as cnt,
    round(sum(if(datediff(to_date(a.ds, 'yyyymmdd'), to_date(b.ds, 'yyyymmdd')) = 0, 1, 0)) / count(distinct a.union_id), 2) as rate0,
    round(sum(if(datediff(to_date(a.ds, 'yyyymmdd'), to_date(b.ds, 'yyyymmdd')) = 1, 1, 0)) / count(distinct a.union_id), 2) as rate1,
    round(sum(if(datediff(to_date(a.ds, 'yyyymmdd'), to_date(b.ds, 'yyyymmdd')) = 2, 1, 0)) / count(distinct a.union_id), 2) as rate2,
    round(sum(if(datediff(to_date(a.ds, 'yyyymmdd'), to_date(b.ds, 'yyyymmdd')) = 3, 1, 0)) / count(distinct a.union_id), 2) as rate3,
    round(sum(if(datediff(to_date(a.ds, 'yyyymmdd'), to_date(b.ds, 'yyyymmdd')) = 11, 1, 0)) / count(distinct a.union_id), 2) as rate11,
    round(sum(if(datediff(to_date(a.ds, 'yyyymmdd'), to_date(b.ds, 'yyyymmdd')) = 14, 1, 0)) / count(distinct a.union_id), 2) as rate14,
    round(sum(if(datediff(to_date(a.ds, 'yyyymmdd'), to_date(b.ds, 'yyyymmdd')) = 15, 1, 0)) / count(distinct a.union_id), 2) as rate15,
    round(sum(if(datediff(to_date(a.ds, 'yyyymmdd'), to_date(b.ds, 'yyyymmdd')) = 21, 1, 0)) / count(distinct a.union_id), 2) as rate21,
    round(sum(if(datediff(to_date(a.ds, 'yyyymmdd'), to_date(b.ds, 'yyyymmdd')) = 28, 1, 0)) / count(distinct a.union_id), 2) as rate28,
    round(sum(if(datediff(to_date(a.ds, 'yyyymmdd'), to_date(b.ds, 'yyyymmdd')) = 30, 1, 0)) / count(distinct a.union_id), 2) as rate30
from (
    -- new users from the business user table, registered on 2021-06-01
    select ds, union_id
    from ods_lo_applet_user_df
    where ds = '${ds}'
      and date(create_time) = '2021-06-01'
) a
left join (
    -- activity over the past 30 days, one row per user per day
    select union_id, ds
    from dwd_jyxd_patient_page_view_di
    where ds >= to_char(date_sub(to_date('${ds}', 'yyyymmdd'), 30), 'yyyymmdd')
    group by union_id, ds
) b
on a.union_id = b.union_id and a.ds >= b.ds
group by a.ds;

+----------+-----+-------+-------+-------+-------+--------+--------+--------+--------+--------+--------+
| ds       | cnt | rate0 | rate1 | rate2 | rate3 | rate11 | rate14 | rate15 | rate21 | rate28 | rate30 |
+----------+-----+-------+-------+-------+-------+--------+--------+--------+--------+--------+--------+
| 20210621 | 792 | 0.03  | 0.03  | 0.03  | 0.02  | 0.04   | 0.05   | 0.06   | 0.0    | 0.0    | 0.0    |
+----------+-----+-------+-------+-------+-------+--------+--------+--------+--------+--------+--------+

Here we can see that 792 users registered on 2021-06-01 (the row is for ds 20210621), with a rate0 of 0.03 and a rate15 of 0.06. Expressed as rates, the numbers are much easier to interpret.
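The arithmetic behind these rates is a plain division by the number of registered users, rounded to two decimals. The Python sketch below reproduces it with the counts reported earlier for this cohort (792 registered users; cnt0 = 21, cnt1 = 23, cnt2 = 25, cnt11 = 30, cnt15 = 48). The helper `retention_rates` is a hypothetical name used only for this illustration.

```python
def retention_rates(base, retained):
    """Turn retained-user counts into rates relative to the base user count."""
    rates = {}
    for name, count in retained.items():
        rate_name = name.replace("cnt", "rate")
        # Guard against an empty cohort to avoid dividing by zero.
        rates[rate_name] = round(count / base, 2) if base else 0.0
    return rates


counts = {"cnt0": 21, "cnt1": 23, "cnt2": 25, "cnt11": 30, "cnt15": 48}
print(retention_rates(792, counts))
# {'rate0': 0.03, 'rate1': 0.03, 'rate2': 0.03, 'rate11': 0.04, 'rate15': 0.06}
```

Choosing registered users as the base keeps the denominator fixed for every offset, which matches the decision made above.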

Conclusion

  1. Understand the definition of retention and how it is counted.
  2. Distinguish between retention of new users and retention of existing (active) users.