Lag() function for previous value over unpredictable rows counts of each partition

emmanuelfabre_1
edited May 14 in Magic ETL

Hi Everyone,

I have a dataset with a monthYear date field, a Quarter field with values 1-4 to represent quarters, and finally a quarterly_value column.

Here's where it gets fun.
Not all months and quarters are present in the dataset.
There will never be a monthYear with more than one row (i.e. never two rows of DEC 2023).


I use the Rank & Window LAG() function, using offset = 1.

I Order the functions on monthYearin DESC order

I partition over Quarter.

My results are that I can get the previous value on the first row in a quarter (ex: JUL 2023), but then it will still show the current Quarter's value for AUG and SEP 2023.


Results —>






How can I, instead, get the lag_results to propagate over all rows of that quarter?

Desired Results —>





Thank you again, DOMO community!

Best Answer

Answers

  • ColemenWilson
    edited May 14 Answer ✓

    My understanding is that you are looking for the final value with in a quarter to be the lagged value shown for each month of the following quarter. Wouldn't that mean the lag_result for 2023-05 and 2023-06 being NULL as well? Here is the solution I came up with: https://www.loom.com/share/d13350cfc1ca4f89bca203cffdfd1736?sid=272c8d1e-4b71-4db7-85f4-55071732b6a1

    If I solved your problem, please select "yes" above

  • I am so thankful for people like @ColemenWilson in this community!

    Your solution gets me so close to my end goal, and lets me think through the data in new [for me], enlightened way.

    There is one hiccup I am trying to wrap my head around.

    The dataset [shown below] has 5 unique quarterly values.
    The fifth value is for our current Q2 2024, and cannot be carried over yet (logical, no rows for the future, JUN 2024+).

    The lag function should work for 4 of those values.
    It works for the first two, skips the third, and works again for the fourth.

    Data below to make sense of all of this. Lag skips for Q1 2024 —>



    I am going to keep plugging away; I just wanted to thank you @ColemenWilson asap :)

  • I believe the issue lies in joining on the year.

    However, if we do not join on the year, we get duplicate rows.

  • I added the data for 2024 you provided above and the ETL I shared produces the correct lag_results values without nulls.

    If I solved your problem, please select "yes" above

  • @ColemenWilson I realize now my blunder in replicating your perfect solution.

    Apologies for the extra work there. I made a copy of the ETL from scratch using your solution, and it worked. I went back to the original and realized I had messed up the Rank and Window, by Ordering the function on Quarter instead of MonthYear .

    Thank you once again. I see you're in UT, I used to work for BEN Labs up there. Look them up if you ever want a change of scenery from Henry Schein (I looked you up, impressive stuff ;) ).

    I wish I could buy you a beer as a thank you!

  • @emmanuelfabre_1 you're welcome! Next time you are in Utah let me know and I'll take you up on that!

    If I solved your problem, please select "yes" above