Magic ETL

Magic ETL

Rolling Average for Each Category

I have a dataset containing total monthly units by month and by person. I'd like to calculate the 3 month rolling average for each person. How should I group the data so that the rolling average calculation goes by person and not by date?

Tagged:

Welcome!

It looks like you're new here. Members get access to exclusive content, events, rewards, and more. Sign in or register to get started.
Sign In

Answers

  • This is how my dataset looks so far

  • Hi @JR_S1L

    You'd need to keep your data as is to partition based on name to get the lag for the prior 3 months to get your rolling average. You can use a rank & window tile in Magic ETL to calculate the lag just make sure you're partitioning by the name and that you're sorting/ordering by the month

    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**
  • Hi @GrantSmith ....I am new to lag functions, can you explain a bit further? This is what I have so far when calculating the lag using Magic ETL...how would I then calculate the rolling average?


  • I've done a writeup of lag functions and rolling averages before utilizing Beast Modes but the idea at the end is the same concept. You just need to calculate the Lag for 1 month ago and 2 months ago into three separate fields. Then add all 3 together (current month, 1 month ago and 2 months ago) and divide the total by 3 using a formula tile. That will get your rolling 3 month average.

    This is depending on how you want to calculate the rolling three months. Is that the current month and the prior 2 months or the prior 3 months excluding the current month? If you want the prior 3 months just calculate the lag for 3 months ago and add that value instead of the current value.


    If you're interested here's the writeup using a beast mode (but if your rolling average won't need to have filters dynamically applied to it I'd recommend keeping it in the magic etl): https://dojo.domo.com/discussion/52679/domo-ideas-conference-beast-modes-rolling-averages#latest

    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**
  • +1 on Grant's solution using Lag(1) .. Lag(2), Lag(3) IF YOU DON"T HAVE GAPS IN YOUR DATA.

    Only thing I strongly dislike about this approach is that you have to create 3 functions (not efficient) AND it must be true that there are no gaps in data.


    Computationally, @GrantSmith i would say this is probably the only good use case for MySQL b/c it allows you to calculate LAG without the assumption of contigusous data (cc. @MarkSnodgrass


    NOTE THIS IS ALL UNTESTED CODE there are probably missing commas, and when you set variables (as opposed to display you must use := and variable names are case sensitive.


    if you write

    1. SELECT
    2. t.*
    3. , r.*
    4. FROM
    5. table t
    6. , (SELECT @person := '' , @date := null FROM ) r
    7. ORDER BY t.Date

    You are CROSS APPLYing a generated table with one row (r) to every row of table t where table r contains variables person and date which are insantiated as blank and null respectively


    from there if you write

    1. SELECT
    2. @date as prevDate,
    3. t.*,
    4. @date := t.date as _setDateVariable
    5.  
    6. FROM
    7. table t
    8. , (SELECT @person := '' , @date := null FROM ) r
    9.  
    10. ORDER BY t.Date

    when MySQL reads this function it will process each column of my SELECT statement in order (top to bottom). so when if i was on row 3 of my table...

    prevDate is the value of assigned to @date (which happens to be the value of t.date from row 2 -- i'll explain in a second), then it reads all the values for that row of table t, then it assigns @date the value of t.date for the current row (row 3)

    when i SELECT row 4...

    prevDate is the value of @date (row 3) , t.* is all the contents of row 4, then i assign ( using := ) the value of t.date (row 4) to @date.

    and so on.


    implement partitioning and test for previous day

    1. SELECT
    2. -- execute tests. testing @person implements my Partition clause
    3. -- testing @lag1Date ensures the no gaps rule (this is the key part that Magic doesn't give you)
    4.  
    5. case when t.person = @person and @lag1Date = date_add(t.date, interval -1 day) then @lag1Value end as prevDayValue,
    6. t.*,
    7.  
    8. -- paritioning assumes an ORDER BY clause
    9. -- OR reset my variables to null
    10. @lag1Value := case when t.person = @person then t.value else null end as _setLag1Value,
    11. @lag1Date := case when t.person = @person then t.date else null end as _setLag1Date,
    12. @person := t.person as _setPersionVariable
    13.  
    14. FROM
    15. table t
    16. , (SELECT @person := '' , @date := null , @lag1Value:=null, @lag1Date:=null, FROM ) r
    17.  
    18. ORDER BY t.person, t.Date


    Implement Cascading testing

    1. SELECT
    2. -- execute tests. testing @person implements my Partition clause
    3. -- testing @lag1Date ensures the no gaps rule (this is the key part that Magic doesn't give you)
    4.  
    5. case when t.person = @person and @lag1Date = date_add(t.date, interval -1 day) then @lag1Value end as prevDayValue,
    6. case when t.person = @person and @lag2Date = date_add(t.date, interval -2 day) then @lag2Value end as prevDayValue,
    7.  
    8. t.*,
    9.  
    10. -- cascade the values of lag1 to lag2 or reset to null
    11. @lag2Value := case when t.person = @person then @lag1Value else null end as _setLag2Value,
    12. @lag2Date := case when t.person = @person then @lag2Value else null end as _setLag2Date,
    13.  
    14. @lag1Value := case when t.person = @person then t.value else null end as _setLag1Value,
    15. @lag1Date := case when t.person = @person then t.date else null end as _setLag1Date,
    16. @person := t.person as _setPersionVariable
    17.  
    18. FROM
    19. table t
    20. , (SELECT @person := '' , @date := null , @lag1Value:=null, @lag1Date:=null, @lag2Value := null lag2Date:=null FROM ) r
    21.  
    22. ORDER BY t.person, t.Date


    Jae Wilson
    Check out my 🎥 Domo Training YouTube Channel 👨‍💻

    **Say "Thanks" by clicking the ❤️ in the post that helped you.
    **Please mark the post that solves your problem by clicking on "Accept as Solution"
  • Coach
    edited April 2022

    Thinking it through another set based way of approaching this without variables would be to do what SQL has to do under the covers (generate windows) for each row in your dataset we'll generate the 4 rows, then we'll assign the value of the transaction for those lagged rows if they exist, then we'll aggregate.

    this assumes we have a numbers table, sys_num, that contains integers between 1 and 3 in a column called n.


    NOTE this is untested code but i think it works.

    P.S. this code you could implement in Magic if you wanted to avoid writing SQL. it would perform faster since we didn't index MySQL.


    duplicate (using CROSS APPLY) the rows in t.table 4 times

    1. CREATE TABLE referenceTable as
    2.  
    3. SELECT distinct
    4. date_add(t.date, interval n day) as report_date
    5. t.person
    6. t.value as lag_value
    7. n.num as lag_num
    8. t.date as actual_transaction_date
    9. FROM
    10. table t
    11. , (SELECT num from sys_num where num beteen 0 and 3) n


    create your windowed data

    for each row in table t, you should have duplicate rows if table has a matching 0, 1, 2, or 3 day offset (report_date).

    1. CREATE TABLE blowOut as
    2.  
    3. SELECT
    4. t.person,
    5. , t.date,
    6. , r.lag_num
    7. , r.lag_value
    8. , case when lag_num = 1 then r.lag_value as lag1day end
    9. , case when lag_num =2 then r.lag_value as lag2day end
    10. , case when lag_num=3 then r.lag_value as lag3day end
    11. , case when lag_num=0 then r.lag_value as lag0day end
    12. FROM
    13. table t
    14. JOIN
    15. referenceTable r
    16.  
    17. on t.person = r.person
    18. and t.date = r.report_date

    deduplicate rows

    1. create table lag_table as
    2.  
    3. SELECT
    4. max(lag_1day) as lag1day,
    5. max(lag_2day) as lag2day,
    6. max(lag_3day) as lag3day,
    7. max(lag4_day as lag4day,
    8.  
    9. date,
    10. person
    11.  
    12. from blowOut
    13.  
    14. GROUP BY
    15. date,
    16. person

    then join in the data for that person and that row

    1. select
    2. t.*
    3. <lt.columns>
    4.  
    5. from table t
    6. inner join
    7. lag_table lt
    8. on t.person = lt.person
    9. and t.date = lt.date
    Jae Wilson
    Check out my 🎥 Domo Training YouTube Channel 👨‍💻

    **Say "Thanks" by clicking the ❤️ in the post that helped you.
    **Please mark the post that solves your problem by clicking on "Accept as Solution"
  • @GrantSmith , @jaeW_at_Onyx thank you both so much!!! I realized it's because we don't have window functions enabled in our domo instance that this was proving impossible for me...I've reached out to our rep to have them enabled so that I can apply the options that you provided! Again, thank you!!

  • @GrantSmith , @jaeW_at_Onyx just to confirm...in order to be able to dynamically filter by name, these calculations need to be done in beast mode correct?

Welcome!

It looks like you're new here. Members get access to exclusive content, events, rewards, and more. Sign in or register to get started.
Sign In