Joining Datasets with different numbers of rows
We’re looking to analyze staff utilization data. There are two data sets:
Data Set 1: Raw data from a timekeeping system with daily time entries from each staff member, there are multiple rows per person per month.
Data Set 2: The total available hours per person per month, there is one row per person per month.
We’re looking to calculate utilization per month per person by summing all of their hours entries for a specific month from Data Set 1, and dividing by their corresponding available hours for that month in Data Set 2.
When we joined the two data sets, using their unique ID# and the identifier, the output includes the monthly available hours value for every row in which there is a raw data entry. We need to create a Beast Mode that is the sum of all raw data hours per month divided by just one instance of the available hours per month.
Any ideas?
Comments
-
Great question! As long as the card is reporting by month, the following will work:
SUM(`Hours_Field_From_Dataset_1`) / AVG(`Available_Hours_From_Dataset_2`)
You could also use MAX() or MIN() instead of average since there's just one value per rep per month. Should you wish to report by quarter or year, you will need to create quarterly and yearly availability fields in Dataset 1 before joining them and create beast modes for quarterly and yearly utilization.
1 -
I think I would approach this by bringing both datasets to the same date grain before the join. In Data Set 1 SUM up all the the hours in the month by employee so that you have just one row per employee per month. This would be my prefered method.
Alternatively, you might try something like this:
SUM(`Hours Logged by Employee`) / MAX(`Available Hours by Employee`)
As long as the Available Hours is never equal to zero this should work.
KurtF
**Say “Thanks” by clicking the “heart” in the post that helped you.
**Please mark the post that solves your problem by clicking on "Accept as Solution"1 -
Thanks!
What if the number of rows varies? IOW, a higher Dataset 2 value included five times will weight the average differently than a lower Dataset 2 value included five times.
We tried the Beast Mode and the numbers don't look quite right yet.
1 -
In that case, unless there is a reason you can't do it, then I would second Kurt's recommendation that you get both datasets on the same level of granularity.
My answer assumes that dataset 2 has data that looks like this:
Name Date Available_Hours
John Doe 01/01/2018 140
Jane Roe 01/01/2018 119
John Doe 02/01/2018 115
Jane Roe 02/01/2018 137
As long as the card is displaying by month and you are breaking it out by individual user, then that formula will work.
Another option would be to fix the data after the join. You would need to add row numbers and partition it by person by month. You would then update the data so that their available hours got set to 0 where the row number was greater than 1. Then you could use SUM()/SUM().
2 -
Where would we update the Available Hours to 0 in the dataflow? Not sure which function to use.
0
Categories
- All Categories
- 1.7K Product Ideas
- 1.7K Ideas Exchange
- 1.5K Connect
- 1.2K Connectors
- 292 Workbench
- 4 Cloud Amplifier
- 8 Federated
- 2.8K Transform
- 95 SQL DataFlows
- 602 Datasets
- 2.1K Magic ETL
- 3.7K Visualize
- 2.4K Charting
- 694 Beast Mode
- 43 App Studio
- 39 Variables
- 658 Automate
- 170 Apps
- 441 APIs & Domo Developer
- 42 Workflows
- 5 DomoAI
- 32 Predict
- 12 Jupyter Workspaces
- 20 R & Python Tiles
- 386 Distribute
- 111 Domo Everywhere
- 269 Scheduled Reports
- 6 Software Integrations
- 113 Manage
- 110 Governance & Security
- 8 Domo University
- 30 Product Releases
- Community Forums
- 39 Getting Started
- 29 Community Member Introductions
- 98 Community Announcements
- Domo Community Gallery
- 4.8K Archive