Combining data sets without duplication
I have 2 datasets:
- Contains individual transactions with an employee's name
Employee Name Sale Price Sale Date John Doe 100 11/2/20 John Doe 250 11/6/20 Jane Doe 500 11/8/20 Jane Doe 100 11/12/20 - Contains each employee and their number of consultations and new clients
Employee Name Start Date End Date Consultations New Clients John Doe 11/1/20 11/7/20 3 1 Jane Doe 11/1/20 11/7/20 2 2 John Doe 11/8/20 11/14/20 4 3 Jane Doe 11/8/20 11/14/20 5 4
I need to be able to filter both data sets such that dataset a) contains a single row for each employee with their total sales within a given date range and dataset b) contains a single row for each employee with their total consultations and new clients. Once I have these datasets, I will need to join them on the employee's name to create a single dataset containing total sales, consultations and new clients.
Employee Name | Consultations | Clients | Sale Price |
John Doe | 7 | 4 | 350 |
Jane Doe | 7 | 6 | 600 |
I was able to accomplish this, but I run into an issue where I have to do the filtering within the ETL, which forces me to edit the ETL each time I want to query a new date range. If I try to do the filtering in a card and remove the totals, I end up with duplicated data on that final table join.
Employee Name | Start Date | End Date | Consultations | New Clients | Sale Date | Sale Price |
John Doe | 11/1/20 | 11/7/20 | 3 | 1 | 11/2/20 | 100 |
John Doe | 11/1/20 | 11/7/20 | 3 | 1 | 11/6/20 | 250 |
Jane Doe | 11/1/20 | 11/7/20 | 2 | 2 | 11/8/20 | 500 |
Jane Doe | 11/1/20 | 11/7/20 | 2 | 2 | 11/12/20 | 100 |
John Doe | 11/8/20 | 11/14/20 | 4 | 3 | 11/2/20 | 100 |
John Doe | 11/8/20 | 11/14/20 | 4 | 3 | 11/6/20 | 250 |
Jane Doe | 11/8/20 | 11/14/20 | 5 | 4 | 11/8/20 | 500 |
Jane Doe | 11/8/20 | 11/14/20 | 5 | 4 | 11/12/20 | 100 |
Is there some way to make this work such that I can query for a date range after running the ETL and my data isn't duplicated?
Comments
-
Hi @user060355
Your data has two different time slice units (daily - sales and multiple days-consultations) which makes filtering your data difficult. What I'd recommend you do is reformat your data so that your sales and consultations are both sliced by day. This would allow you to easily join and slice your data on any date range and utilize the date slicers on the cards.
**Was this post helpful? Click Agree or Like below**
**Did this solve your problem? Accept it as a solution!**0 -
At a high-level @GrantSmith is right -- mixing date dimensions is always tricky (in this case you have single day vs. multi-day range). The best thing to do is normalize the dates as much as possible.
In your case I have a slightly different suggestion that sounds like it may solve your need. Can you try joining the "transaction/sale" table with the "consultant/info" table where the `Sale Date` is between the `Start` and `End` dates?
Something like:
SELECT
a."Employee Name"
, a."Start Date"
, a."End Date"
, a."Consultations"
, a."New Clients"
, b."Sale Price Total"
, b."Sale Units Total"
FROM
"Consultation Table" a
LEFT JOIN
(SELECT
"Employee Name"
, "Sale Date"
, SUM("Sale Price") as "Sale Price Total"
, COUNT("Sale Price") as "Sale Units Total"
FROM
"Sale Table"
GROUP BY
"Employee Name", "Sale Date"
) b ON a."Employee Name" = b."Employee Name" AND b."Sale Date" BETWEEN a."Start Date" and b."End Date"
You would some granular sale data (i.e. multiple sales on one day) but it seems like it would get you the sales quantity and totals per day which seems like it's what you're after. If this methodology doesn't work then you have to go down Grant's path of normalizing date range. My suggestion would be doing something like EXTRACT(WEEK FROM "Sale Date") so you can line it up with your "Consultant Date(s)".
0 -
https://www.youtube.com/watch?v=PVbOeLSae9o&t=748s
don't JOIN your data in ETL in any fashion.
Instead, STACK the data using APPEND as demonstrated in this tutorial. If you JOIN in ETL and aggregate you limit your ability to filter. If you go the opposite direction and duplicate rows, you have to do 'crazy math' to deduplicate your daily activity.
if you absolutely must consolidate down to a single customer table ... although i can't imagine why you'd need that since you can easily GROUP BY Customer in analyzer, then extend the customer dimension as a very last step based on the STACKED dataset.
Jae Wilson
Check out my 🎥 Domo Training YouTube Channel 👨💻
**Say "Thanks" by clicking the ❤️ in the post that helped you.
**Please mark the post that solves your problem by clicking on "Accept as Solution"0
Categories
- All Categories
- 1.7K Product Ideas
- 1.7K Ideas Exchange
- 1.5K Connect
- 1.2K Connectors
- 292 Workbench
- 4 Cloud Amplifier
- 8 Federated
- 2.8K Transform
- 95 SQL DataFlows
- 602 Datasets
- 2.1K Magic ETL
- 3.7K Visualize
- 2.4K Charting
- 693 Beast Mode
- 43 App Studio
- 39 Variables
- 658 Automate
- 170 Apps
- 441 APIs & Domo Developer
- 42 Workflows
- 5 DomoAI
- 32 Predict
- 12 Jupyter Workspaces
- 20 R & Python Tiles
- 386 Distribute
- 111 Domo Everywhere
- 269 Scheduled Reports
- 6 Software Integrations
- 113 Manage
- 110 Governance & Security
- 8 Domo University
- 30 Product Releases
- Community Forums
- 39 Getting Started
- 29 Community Member Introductions
- 98 Community Announcements
- Domo Community Gallery
- 4.8K Archive