Adding data that matches a date to a date dimension table.
I have HR data that has Start Date and End Date (or null) for each employee. I am looking to be able to see the number of active employees at any given date. I have added the Domo Date Dimension table to my data set, but I can't figure out how to join so that for any date I can get the count of employees who's start date is before the date and ended after or is null.
I've been able to get start/end date counts per day added to the dimension table, but can't figure out how to get that 'active' record count added.
Any ideas?
Best Answer
-
You can utilize the Calendar dataset from the Domo Dimensions calendar to do a conditional join on your HR dataset based on the start date and end date. With Magic ETL 2.0 you can't do a straight conditional join but would need to do a cartesian join then do the filtering after the fact. Depending on your HR dataset size this may take a while to process.
With Magic 2.0:
- Input Datasets: Calendar and HR
- For each dataset add a constant "Join Column" with a value of 1
- Use a Add Formula Tile with "COALESCE(`End Date`, CURRENT_DATE())" to default any NULLs to today for the end date
- Left join your HR dataset onto the calendar dataset (have calendar on the left side) based on the Join Column
- Use a filter tile to filter your dataset based on `Start Date` <= `Calendar Date` and `Calendar Date` <= `End Date`
- Then feed the resulting dataset into a group by and group by the `Calendar Date` and count the number of employees you have employed at the given date.
Alternatively you could utilize a MySQL dataflow and do something similar except have your join clause be your filter clause (`Calendar Date` BETWEEN `Start Date` AND `End Date`) in your join clause and not worry about doing your cartesian join. This may be faster due to less data being generated.
SELECT c.`dt` as "Calendar Date", COUNT(*) FROM calendar as c LEFT JOIN `hr_dataset` as hr on c.`dt` BETWEEN hr.`Start Date` AND COALESCE(hr.`End Date`, CURRENT_DATE())
Note: This is all untested pseudo code and you may need to tweak based on your table and column names.
**Was this post helpful? Click Agree or Like below**
**Did this solve your problem? Accept it as a solution!**0
Answers
-
You can utilize the Calendar dataset from the Domo Dimensions calendar to do a conditional join on your HR dataset based on the start date and end date. With Magic ETL 2.0 you can't do a straight conditional join but would need to do a cartesian join then do the filtering after the fact. Depending on your HR dataset size this may take a while to process.
With Magic 2.0:
- Input Datasets: Calendar and HR
- For each dataset add a constant "Join Column" with a value of 1
- Use a Add Formula Tile with "COALESCE(`End Date`, CURRENT_DATE())" to default any NULLs to today for the end date
- Left join your HR dataset onto the calendar dataset (have calendar on the left side) based on the Join Column
- Use a filter tile to filter your dataset based on `Start Date` <= `Calendar Date` and `Calendar Date` <= `End Date`
- Then feed the resulting dataset into a group by and group by the `Calendar Date` and count the number of employees you have employed at the given date.
Alternatively you could utilize a MySQL dataflow and do something similar except have your join clause be your filter clause (`Calendar Date` BETWEEN `Start Date` AND `End Date`) in your join clause and not worry about doing your cartesian join. This may be faster due to less data being generated.
SELECT c.`dt` as "Calendar Date", COUNT(*) FROM calendar as c LEFT JOIN `hr_dataset` as hr on c.`dt` BETWEEN hr.`Start Date` AND COALESCE(hr.`End Date`, CURRENT_DATE())
Note: This is all untested pseudo code and you may need to tweak based on your table and column names.
**Was this post helpful? Click Agree or Like below**
**Did this solve your problem? Accept it as a solution!**0 -
I like the MySQL version more, but have never written SQL ETL, so I worked with the Magic ETL version. It does exactly what I needed. Thank you!
0 -
@mberkeley If you could accept my answer to your question so others can find it easier in the future I'd appreciate it. Thanks.
**Was this post helpful? Click Agree or Like below**
**Did this solve your problem? Accept it as a solution!**0 -
GrantSmith if you had to do this by month, then how would you do it?
0 -
I'd utilize the LAST_DAY function on the dates to return the last day of the month and then filter out the date dimension table to only have those specific records. Then you can use LAST_DAY for your dates as well when you're comparing against the dates.
**Was this post helpful? Click Agree or Like below**
**Did this solve your problem? Accept it as a solution!**0
Categories
- All Categories
- 1.8K Product Ideas
- 1.8K Ideas Exchange
- 1.5K Connect
- 1.2K Connectors
- 300 Workbench
- 6 Cloud Amplifier
- 8 Federated
- 2.9K Transform
- 100 SQL DataFlows
- 616 Datasets
- 2.2K Magic ETL
- 3.9K Visualize
- 2.5K Charting
- 738 Beast Mode
- 57 App Studio
- 40 Variables
- 685 Automate
- 176 Apps
- 452 APIs & Domo Developer
- 47 Workflows
- 10 DomoAI
- 36 Predict
- 15 Jupyter Workspaces
- 21 R & Python Tiles
- 394 Distribute
- 113 Domo Everywhere
- 275 Scheduled Reports
- 6 Software Integrations
- 124 Manage
- 121 Governance & Security
- 8 Domo Community Gallery
- 38 Product Releases
- 10 Domo University
- 5.4K Community Forums
- 40 Getting Started
- 30 Community Member Introductions
- 108 Community Announcements
- 4.8K Archive