Filling in data in ETL for missing dates
I have a bunch of data which shows cumulative reviews on each product (on indirect ecommerce channels). However, if there were no reviews for a particular product on a specific date, my dataset will not show the cumulative review for that product.... so for example, I may have the following data for Product A - in which I'm missing data on 1/7/21 (because there were no reviews posted on that day for Product A). For 1/7, I would like to have a duplicate row from 1/6 for these 1/7.
Can you advise how I can do this in an ETL? I'm thinking I may need to use Rank & Window and reference the next actual value to populate the fields downward....
Date -- Product -- Cum. rating -- Cum. reviews -- Brand -- RPC
1/5/21 -- product A -- 3.93 -- 845 -- Ridgid -- B23398
1/5/21 - Product B -- 4.12 -- 423 -- Greenlee -- A4324
1/6/21 -- product A -- 3.95 -- 846 -- Ridgid -- B23398
1/7/21 - Product B -- 4.20 -- 428 -- Greenlee -- A4324
1/8/21 -- product A -- 3.99 -- 847 -- Ridgid -- B23398
I want to add in the following:
1/6/21 - Product B -- 4.12 -- 423 -- Greenlee -- A4324
1/7/21 -- product A -- 3.95 -- 846 -- Ridgid -- B23398
1/8/21 - Product B -- 4.20 -- 428 -- Greenlee -- A4324
Best Answers
-
Yes, you can use the rank & window and the lag function to get the previous value for each product and then use the formula tile to place it in your main column list.
However, first you are going to need to get a complete list of dates. You could get a distinct list of dates from your existing dataset if you know that there was activity by something on every date. Or you could bring in the date dimensions dataset. you would then join your complete list of dates to your main dataset with a left join so that you have a date for every day for each product. After that, you can do the rank & window with the lag. In your formula tile, you can utilize ifnull to populate your main columns with the lag value.
Hope this all makes sense.
**Check out my Domo Tips & Tricks Videos
**Make sure to any users posts that helped you.
**Please mark as accepted the ones who solved your issue.1 -
With LAG you need to know how many records back you'd need to pull in order to properly forward fill that data that's missing. Instead I'd recommend utilizing two rand and window tiles to do a forward fill.
- Import the Domo Dimensions - Calendar Dataset
- Within your ETL:
- Left join you data to this calendar dimensions dataset
- Take this output from the join and feed it into a rank and window function
- COUNT your RPC OVER / Partitioned by the product and ORDER BY Date - This will get us a group number - essentially tells us when there's a non-null value but increasing the count - call this new field grouper
- Feed that into another Rank and window function
- Take the MAX of your rating and review fields over / partitioned by the product and the new grouper field. This will be your forward filled value.
**Was this post helpful? Click Agree or Like below**
**Did this solve your problem? Accept it as a solution!**0
Answers
-
Yes, you can use the rank & window and the lag function to get the previous value for each product and then use the formula tile to place it in your main column list.
However, first you are going to need to get a complete list of dates. You could get a distinct list of dates from your existing dataset if you know that there was activity by something on every date. Or you could bring in the date dimensions dataset. you would then join your complete list of dates to your main dataset with a left join so that you have a date for every day for each product. After that, you can do the rank & window with the lag. In your formula tile, you can utilize ifnull to populate your main columns with the lag value.
Hope this all makes sense.
**Check out my Domo Tips & Tricks Videos
**Make sure to any users posts that helped you.
**Please mark as accepted the ones who solved your issue.1 -
With LAG you need to know how many records back you'd need to pull in order to properly forward fill that data that's missing. Instead I'd recommend utilizing two rand and window tiles to do a forward fill.
- Import the Domo Dimensions - Calendar Dataset
- Within your ETL:
- Left join you data to this calendar dimensions dataset
- Take this output from the join and feed it into a rank and window function
- COUNT your RPC OVER / Partitioned by the product and ORDER BY Date - This will get us a group number - essentially tells us when there's a non-null value but increasing the count - call this new field grouper
- Feed that into another Rank and window function
- Take the MAX of your rating and review fields over / partitioned by the product and the new grouper field. This will be your forward filled value.
**Was this post helpful? Click Agree or Like below**
**Did this solve your problem? Accept it as a solution!**0 -
Thanks guys!
0 -
@MarkSnodgrass In this solution, left joining the distinct date dataset with the main dataset won't date every day for each product. Do you have any other way to do it?
0 -
@Kunal If your dataset does not include every date, then use Calendar report from the Domo Dimensions Connector as both @MarkSnodgrass and @GrantSmith suggest
0
Categories
- All Categories
- 1.8K Product Ideas
- 1.8K Ideas Exchange
- 1.5K Connect
- 1.2K Connectors
- 300 Workbench
- 6 Cloud Amplifier
- 8 Federated
- 2.9K Transform
- 100 SQL DataFlows
- 616 Datasets
- 2.2K Magic ETL
- 3.8K Visualize
- 2.5K Charting
- 731 Beast Mode
- 55 App Studio
- 40 Variables
- 682 Automate
- 175 Apps
- 451 APIs & Domo Developer
- 46 Workflows
- 10 DomoAI
- 35 Predict
- 14 Jupyter Workspaces
- 21 R & Python Tiles
- 394 Distribute
- 113 Domo Everywhere
- 275 Scheduled Reports
- 6 Software Integrations
- 122 Manage
- 119 Governance & Security
- 8 Domo Community Gallery
- 38 Product Releases
- 10 Domo University
- 5.4K Community Forums
- 40 Getting Started
- 30 Community Member Introductions
- 107 Community Announcements
- 4.8K Archive