Continuing from the previous intensive reading of "Top 15 LOD Expressions," this article summarizes scenarios 9 through 15.

9. Value of the last day in a period

How do we build a trend chart that compares a stock's average daily closing price with its closing price on the last day of each month?

As shown in the figure, the comparison is against the closing price on a single day (the last day of the month) rather than over a time period, so an LOD expression is required.

Imagine the original table as follows:

Date Ticker Adj Close
29/08/2013 SYMC $1
28/08/2013 SYMC $2
27/08/2013 SYMC $3

We aggregate by month on the horizontal axis and plot AVG([Adj Close]) on the vertical axis. But to calculate the comparison line we need a Max Date field, as follows:

Date Ticker Adj Close Max Date
29/08/2013 SYMC $1 29/08/2013
28/08/2013 SYMC $2 29/08/2013
27/08/2013 SYMC $3 29/08/2013

If we simply use the MAX([Date]) expression, the Max Date only appears after aggregation:

Month of Date Ticker Avg Adj Close Max Date
08/2013 SYMC $2 29/08/2013

The reason is that MAX([Date]) is an aggregate expression, which only works with GROUP BY aggregation in SQL. What we actually want to plot on the comparison axis is SUM([Close value on last day]), where:

[Close value on last day] = IF [Max Date] = [Date] THEN [Adj Close] ELSE 0 END

The problem is that this expression is evaluated at the day level of detail, and MAX([Date]) cannot be computed at the day granularity:

Date Ticker Adj Close Max Date
29/08/2013 SYMC $1
28/08/2013 SYMC $2
27/08/2013 SYMC $3

As mentioned above, an aggregate expression cannot appear at a non-aggregate level of detail. The fix is simple: wrap it in an LOD expression, {include : max([Date])}:

Date Ticker Adj Close { include : max([Date]) }
29/08/2013 SYMC $1 29/08/2013
28/08/2013 SYMC $2 29/08/2013
27/08/2013 SYMC $3 29/08/2013

Because {include : max([Date])} lists no dimensions, it is always evaluated at the view's current level of detail, and its result can be pushed down and evaluated on every row of the underlying table. We can then assemble [Close value on last day] exactly as before.

To extend this: if we aggregate the horizontal axis by year, the comparison baseline becomes the closing price on the last day of each year, and {include : max([Date])} adapts automatically:

Date Ticker Adj Close [Close value on last day]
31/12/2013 SYMC $1 $1
30/12/2013 SYMC $2 $0
… … … …
03/01/2013 SYMC $7 $0
02/01/2013 SYMC $8 $0
01/01/2013 SYMC $9 $0

The comparison line is then simply SUM([Close value on last day]).
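The mechanics above can be sketched in a few lines of Python. This is a hypothetical illustration, not Tableau's implementation: the sample rows are made up, and the dimension-less include LOD is simulated as a group-by at the view's level (month) followed by a join back to every row.

```python
from datetime import date

# Made-up sample of the table from the article (one ticker, one month).
rows = [
    {"date": date(2013, 8, 29), "ticker": "SYMC", "adj_close": 1},
    {"date": date(2013, 8, 28), "ticker": "SYMC", "adj_close": 2},
    {"date": date(2013, 8, 27), "ticker": "SYMC", "adj_close": 3},
]

# {include : max([Date])}: evaluated at the view's level of detail
# (here: month), then pushed down to every underlying row.
max_date_by_month = {}
for r in rows:
    key = (r["date"].year, r["date"].month)
    max_date_by_month[key] = max(max_date_by_month.get(key, r["date"]), r["date"])

for r in rows:
    r["max_date"] = max_date_by_month[(r["date"].year, r["date"].month)]
    # [Close value on last day] = IF [Max Date] = [Date] THEN [Adj Close] ELSE 0 END
    r["close_on_last_day"] = r["adj_close"] if r["date"] == r["max_date"] else 0

avg_close = sum(r["adj_close"] for r in rows) / len(rows)    # AVG([Adj Close])
last_day_close = sum(r["close_on_last_day"] for r in rows)   # SUM([Close value on last day])
print(avg_close, last_day_close)  # 2.0 1
```

The average daily close is $2, while the last-day close is $1 — exactly the two series the trend chart compares.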

10. Repeat purchase matrix

As shown in the figure below, we want to view a matrix of how many quarters elapsed between each customer's first and second purchase:

The key is how to calculate the number of quarters between the first and second purchase. The first purchase is easy:

[1st Purchase] = {fixed [Customer ID] : min([order date])}

How do we get the second purchase time? Here is a little trick:

[repeat purchase] = IIF([order date] > [1st Purchase], [order date], NULL)
[2nd Purchase] = {fixed [Customer ID] : min([repeat purchase])}

Finally, use the DATEDIFF function to get the number of quarters in between: [quarters repeat to purchase] = DATEDIFF('quarter', [1st Purchase], [2nd Purchase]).
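A minimal Python sketch of the trick, assuming the made-up orders below. The two FIXED LODs become two per-customer reductions, and `quarters_diff` imitates DATEDIFF('quarter', …), which counts calendar-quarter boundaries:

```python
from datetime import date

# Made-up sample orders: (customer id, order date)
orders = [
    ("C1", date(2021, 1, 15)),
    ("C1", date(2021, 8, 3)),
    ("C1", date(2022, 2, 1)),
    ("C2", date(2021, 5, 10)),
]

first = {}   # [1st Purchase] = {fixed [Customer ID] : min([order date])}
for cid, d in orders:
    first[cid] = min(first.get(cid, d), d)

second = {}  # [2nd Purchase]: min of dates strictly after the first purchase
for cid, d in orders:
    if d > first[cid]:
        second[cid] = min(second.get(cid, d), d)

def quarters_diff(a, b):
    # DATEDIFF('quarter', a, b): difference in calendar quarters
    return (b.year - a.year) * 4 + ((b.month - 1) // 3 - (a.month - 1) // 3)

for cid in first:
    if cid in second:
        print(cid, quarters_diff(first[cid], second[cid]))  # C1 2
```

Customer C2 only purchased once, so the IIF trick leaves [2nd Purchase] as NULL and the customer drops out of the matrix.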

11. Percent difference from the range average

As shown in the figure below, we want to plot, for each point on the trend chart, its percentage difference from the average of the selected range (between the two dotted lines) as a new line chart at the top.

The focus is the Y-axis field of the upper line chart: how is the percentage difference expressed? First we generate a close value that only exists inside the specified range:

[Close value in reference period] = IF [Date] >= [Start reference date] AND [Date] <= [End reference date] THEN [Adj Close] END

This expression only returns [Adj Close] when the date falls within the specified interval.

[Average daily close value between ref date] = {fixed [Ticker] : AVG([Close value in reference period])}

Finally, calculate the percentage difference: [percent different from ref period] = ([Adj Close] – [Average daily close value between ref date]) / [Average daily close value between ref date].

The upper chart is then drawn using the [percent different from ref period] field.
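The three steps can be sketched as follows. This is an illustrative Python simulation with made-up dates and prices; since there is a single ticker, the FIXED average collapses to one number:

```python
from datetime import date

# Made-up reference range and daily closes: (date, adj close)
start_ref, end_ref = date(2013, 8, 27), date(2013, 8, 28)
rows = [
    (date(2013, 8, 26), 4.0),
    (date(2013, 8, 27), 3.0),
    (date(2013, 8, 28), 1.0),
    (date(2013, 8, 29), 5.0),
]

# [Close value in reference period]: only dates inside the range count
ref_values = [close for d, close in rows if start_ref <= d <= end_ref]

# {fixed [Ticker] : AVG(...)}: one ticker, so a single average
ref_avg = sum(ref_values) / len(ref_values)

# [percent different from ref period] for every point on the chart
pct_diff = [(d, (close - ref_avg) / ref_avg) for d, close in rows]
print(ref_avg, pct_diff[-1])
```

Here the reference average is $2, so the last point ($5 on 29/08) sits 150% above the range average.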

12. Relative period filtering

If we want to compare data between two periods, we may run into errors caused by incomplete data. For example, it is unreasonable to compare a partial March of this year against the whole of March last year. We can solve this problem with LOD expressions:

The point of relative period filtering is that we cannot compare dates directly, because this year's dates are always later than last year's. Instead we compare positions within the year: for example, if this year's data only goes up to November 11, then all of last year's data after November 11 should be filtered out.

First, find the latest date in the entire data set with a table-scoped LOD:

[Max Date] = {max([Date])}

Then use the datepart function to calculate the current day of the year:

[day of year of max date] = DATEPART('dayofyear', [Max Date])
[day of year of order date] = DATEPART('dayofyear', [order date])

[day of year of max date] marks the cutoff: any row falling later in its year than that day should be filtered out. So we create a filter: [period filter] = [day of year of order date] <= [day of year of max date].

Use the [Period Filter] field as the filter criteria.
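The filter can be sketched in Python as below. The order dates are made up for illustration; `timetuple().tm_yday` plays the role of DATEPART('dayofyear', …):

```python
from datetime import date

# Made-up order dates spanning two years
orders = [date(2023, 3, 10), date(2023, 11, 11),
          date(2022, 3, 12), date(2022, 11, 30)]

max_date = max(orders)                   # [Max Date] = {max([Date])}
cutoff = max_date.timetuple().tm_yday    # DATEPART('dayofyear', [Max Date])

# [period filter]: keep rows whose day-of-year does not exceed the cutoff
kept = [d for d in orders if d.timetuple().tm_yday <= cutoff]
print(kept)
```

The latest date is 11 November 2023 (day 315 of the year), so 30 November 2022 (day 334) is filtered out while both March dates survive, making the two years comparable.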

13. User login frequency

How to plot the monthly login frequency of a user?

To calculate this metric, divide the total time a user is active by the total number of logins.

First calculate the total active time: use FIXED expressions to get each user's earliest and latest login dates:

  • [first login] = { fixed [user id] : min([login date]) }
  • [last login] = { fixed [user id] : max([login date]) }

Then calculate the month difference, which is the number of months the user has been active:

[total months user is active] = datediff("month", [first login], [last login])

The total number of logins is simpler: again FIXED on user ID, count the login dates:

[numbers of logins per user] = { fixed [user id] : count([login date]) }

Finally, we divide the two to get the login frequency:

[login frequency] = [total months user is active] / [numbers of logins per user]

To make the chart, place [login frequency] on the horizontal axis and COUNTD([user id]) on the vertical axis.
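A small Python sketch of the metric, using made-up login logs. The three FIXED LODs become three per-user reductions, and `months_diff` imitates DATEDIFF('month', …):

```python
from datetime import date

# Made-up login logs: (user id, login date)
logins = [
    ("U1", date(2023, 1, 5)), ("U1", date(2023, 3, 20)), ("U1", date(2023, 7, 1)),
    ("U2", date(2023, 6, 1)), ("U2", date(2023, 6, 15)),
]

first, last, count = {}, {}, {}
for uid, d in logins:
    first[uid] = min(first.get(uid, d), d)  # {fixed [user id] : min([login date])}
    last[uid] = max(last.get(uid, d), d)    # {fixed [user id] : max([login date])}
    count[uid] = count.get(uid, 0) + 1      # {fixed [user id] : count([login date])}

def months_diff(a, b):
    # DATEDIFF('month', a, b): difference in calendar months
    return (b.year - a.year) * 12 + (b.month - a.month)

# [login frequency] = active months / number of logins
frequency = {uid: months_diff(first[uid], last[uid]) / count[uid] for uid in count}
print(frequency)  # {'U1': 2.0, 'U2': 0.0}
```

U1 was active for 6 months across 3 logins (one login every 2 months on average), while U2's logins fall in the same month.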

14. Proportional brushing

This is the most common LOD scenario. For example, what share of each category's total sales does each country contribute?

SUM([Sales]) / SUM({fixed [Category] : SUM([Sales])})

The current level of detail is Category + Country. By fixing on Category, we obtain each category's total sales across all countries.
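The ratio can be sketched as below. The sales figures are made up; the FIXED LOD is simulated as a category-level group-by whose result is joined back to each (category, country) row:

```python
# Made-up sales at the view's level of detail: Category + Country
sales = {
    ("Furniture", "US"): 60, ("Furniture", "DE"): 40,
    ("Technology", "US"): 30, ("Technology", "DE"): 70,
}

# {fixed [Category] : SUM([Sales])}: totals per category across all countries
total_by_category = {}
for (cat, _), v in sales.items():
    total_by_category[cat] = total_by_category.get(cat, 0) + v

# SUM([Sales]) / SUM({fixed [Category] : SUM([Sales])}) for each view row
share = {k: v / total_by_category[k[0]] for k, v in sales.items()}
print(share[("Furniture", "US")])  # 0.6
```

The US contributes 60% of Furniture sales, and the shares within each category sum to 1 — which is what makes the brushing "proportional."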

15. Annual purchase frequency by customer cohort

How to prove that old customers are more loyal?

We can observe the annual purchase frequency distribution of customer cohorts (e.g. customers acquired in 2011 vs. 2012, shown as the legend) in the figure below.

As the figure shows, the earlier a customer's first purchase, the higher their share at every purchase frequency, which supports the conclusion that long-standing customers are more loyal. Note that we chart "at least N purchases," which makes the lines directly comparable; if we charted "exactly N purchases," one customer buying 9 times and another buying 10 would fall into different buckets, making direct comparison difficult.

[Cohort] = {fixed [Customer ID] : min(YEAR([order date]))}

Then, just as in our first example, we count how many orders each customer placed. The only difference is that we group not only by customer ID but also by cohort: {fixed [Customer ID], [Cohort] : count([Order ID])}.

This field goes on the X axis. The Y axis is similar to the first example, COUNT([Customer ID]), but since we want "at least N purchases," the count must be cumulative: at least 9 purchases = exactly 9 + exactly 10 + … up to the maximum. [Running Total] = WINDOW_SUM(COUNT([Customer ID]), 0, LAST()).

Finally, because the Y axis is actually a proportion, divide the "at least N purchases" value just calculated by the total number of customers in the cohort: [Running Total] / SUM({fixed [Cohort] : COUNT([Customer ID])}).
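The cumulative logic can be sketched in Python with a made-up, pre-aggregated sample. Each tuple stands for the result of the two FIXED LODs above: a customer, their cohort year, and their order count; `at_least` plays the role of the WINDOW_SUM running total divided by the cohort total:

```python
from collections import Counter

# Made-up per-customer results: (customer id, cohort year, number of orders)
orders_per_customer = [
    ("A", 2011, 3), ("B", 2011, 1), ("C", 2011, 2),
    ("D", 2012, 1), ("E", 2012, 2),
]

# Customers per (cohort, purchase count) and total customers per cohort
freq = Counter((cohort, n) for _, cohort, n in orders_per_customer)
cohort_size = Counter(cohort for _, cohort, _ in orders_per_customer)

def at_least(cohort, n):
    # "At least N purchases": sum of buckets from N upward,
    # divided by the cohort's total customer count.
    total = sum(c for (co, k), c in freq.items() if co == cohort and k >= n)
    return total / cohort_size[cohort]

print(at_least(2011, 2), at_least(2012, 2))
```

In this sample, 2 of 3 customers in the 2011 cohort made at least 2 purchases versus 1 of 2 in the 2012 cohort, so the 2011 line would sit higher.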

Conclusion

The examples above all build on the basic LOD keywords fixed, include, and exclude. In practice, the real difficulty lies not in the syntax of LOD expressions but in understanding the requirement precisely, breaking it down into reasonable computation steps, and applying LOD correctly in the steps that need it.

LOD expressions can look like magic. They become understandable once we see that different levels of detail simply represent different GROUP BY rules, and that behind every LOD expression is a join back to the original table.

The discussion address is: Intensive reading of "15 big LOD expressions – II" · Issue #370 · dt-fe/weekly

If you'd like to join the discussion, please click here. There is a new topic every week, released on weekends or Mondays. Front End Intensive Reading – helping you filter the right content.

Copyright Notice: Freely reproduced – Non-commercial – Non-derivative – Remain signed (Creative Commons 3.0 License)