Filling in data in ETL for missing dates

NathanDorsch
NathanDorsch Member
edited March 2022 in Magic ETL

I have a bunch of data which shows cumulative reviews on each product (on indirect ecommerce channels). However, if there were no reviews for a particular product on a specific date, my dataset will not show the cumulative review for that product.... so for example, I may have the following data for Product A - in which I'm missing data on 1/7/21 (because there were no reviews posted on that day for Product A). For 1/7, I would like to have a duplicate row from 1/6 for these 1/7.

Can you advise how I can do this in an ETL? I'm thinking I may need to use Rank & Window and reference the next actual value to populate the fields downward....

Date -- Product -- Cum. rating -- Cum. reviews -- Brand -- RPC

1/5/21 -- product A -- 3.93 -- 845 -- Ridgid -- B23398

1/5/21 - Product B -- 4.12 -- 423 -- Greenlee -- A4324

1/6/21 -- product A -- 3.95 -- 846 -- Ridgid -- B23398

1/7/21 - Product B -- 4.20 -- 428 -- Greenlee -- A4324

1/8/21 -- product A -- 3.99 -- 847 -- Ridgid -- B23398


I want to add in the following:

1/6/21 - Product B -- 4.12 -- 423 -- Greenlee -- A4324

1/7/21 -- product A -- 3.95 -- 846 -- Ridgid -- B23398

1/8/21 - Product B -- 4.20 -- 428 -- Greenlee -- A4324

Best Answers

  • MarkSnodgrass
    Answer ✓

    Yes, you can use the rank & window and the lag function to get the previous value for each product and then use the formula tile to place it in your main column list.

    However, first you are going to need to get a complete list of dates. You could get a distinct list of dates from your existing dataset if you know that there was activity by something on every date. Or you could bring in the date dimensions dataset. you would then join your complete list of dates to your main dataset with a left join so that you have a date for every day for each product. After that, you can do the rank & window with the lag. In your formula tile, you can utilize ifnull to populate your main columns with the lag value.

    Hope this all makes sense.

    **Check out my Domo Tips & Tricks Videos

    **Make sure to <3 any users posts that helped you.
    **Please mark as accepted the ones who solved your issue.
  • GrantSmith
    GrantSmith Coach
    Answer ✓

    With LAG you need to know how many records back you'd need to pull in order to properly forward fill that data that's missing. Instead I'd recommend utilizing two rand and window tiles to do a forward fill.


    1. Import the Domo Dimensions - Calendar Dataset
    2. Within your ETL:
    • Left join you data to this calendar dimensions dataset
    • Take this output from the join and feed it into a rank and window function
    • COUNT your RPC OVER / Partitioned by the product and ORDER BY Date - This will get us a group number - essentially tells us when there's a non-null value but increasing the count - call this new field grouper
    • Feed that into another Rank and window function
    • Take the MAX of your rating and review fields over / partitioned by the product and the new grouper field. This will be your forward filled value.


    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**

Answers

  • MarkSnodgrass
    Answer ✓

    Yes, you can use the rank & window and the lag function to get the previous value for each product and then use the formula tile to place it in your main column list.

    However, first you are going to need to get a complete list of dates. You could get a distinct list of dates from your existing dataset if you know that there was activity by something on every date. Or you could bring in the date dimensions dataset. you would then join your complete list of dates to your main dataset with a left join so that you have a date for every day for each product. After that, you can do the rank & window with the lag. In your formula tile, you can utilize ifnull to populate your main columns with the lag value.

    Hope this all makes sense.

    **Check out my Domo Tips & Tricks Videos

    **Make sure to <3 any users posts that helped you.
    **Please mark as accepted the ones who solved your issue.
  • GrantSmith
    GrantSmith Coach
    Answer ✓

    With LAG you need to know how many records back you'd need to pull in order to properly forward fill that data that's missing. Instead I'd recommend utilizing two rand and window tiles to do a forward fill.


    1. Import the Domo Dimensions - Calendar Dataset
    2. Within your ETL:
    • Left join you data to this calendar dimensions dataset
    • Take this output from the join and feed it into a rank and window function
    • COUNT your RPC OVER / Partitioned by the product and ORDER BY Date - This will get us a group number - essentially tells us when there's a non-null value but increasing the count - call this new field grouper
    • Feed that into another Rank and window function
    • Take the MAX of your rating and review fields over / partitioned by the product and the new grouper field. This will be your forward filled value.


    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**
  • Thanks guys!

  • Kunal
    Kunal Contributor

    @MarkSnodgrass In this solution, left joining the distinct date dataset with the main dataset won't date every day for each product. Do you have any other way to do it?

  • @Kunal If your dataset does not include every date, then use Calendar report from the Domo Dimensions Connector as both @MarkSnodgrass and @GrantSmith suggest