Find missing records from 2 tables
Excuse me for not providing a screenshot as the data is sensitive but I have 2 sales order details datasets:
- A (sales orders for the year 2024 - 1.7M rows)
- B (all the historical sales order for all years - 14M rows)
- I want to see which orders are in A but not in B.
It's pretty straight forward to do a LEFT JOIN, A = left and B = right (matching keys: Product ID, order ID
, and Customer ID
) and then make a BeastMode:
CASE WHEN order ID
IS NULL THEN 'Missing' ELSE 'Not Missing' END —> Table Card Visual
But there are some problems with this:
1. The JOINED dataset should return the same number of rows as dataset A (since A is my left table and I'm doing a LEFT JOIN), but it's not, instead of 1.7M, it's 1.8M rows —> which probably because multiple rows in B can match to 1 row in A —> duplicates?
2. The total sales variance (A-B) is not equal to SUM(missing order sales), when it should be
Best Answer
-
In your ETL, before joining the two datasets, I would do a group by on dataset B and group by productid, orderid, and customerid. You can just choose count for the aggregate and pick any column. This will give you 1 row per productid, orderid, and customerid. Then do the left join as you already have in place.
Also, if you want your output dataset to only contain the rows where table A is not in table B, add a filter tile after the join and filter to where tableB.orderid is null. Otherwise, you can use your beast mode, assuming your order id is looking at the table B order id.
**Check out my Domo Tips & Tricks Videos
**Make sure to any users posts that helped you.
**Please mark as accepted the ones who solved your issue.0
Answers
-
In your ETL, before joining the two datasets, I would do a group by on dataset B and group by productid, orderid, and customerid. You can just choose count for the aggregate and pick any column. This will give you 1 row per productid, orderid, and customerid. Then do the left join as you already have in place.
Also, if you want your output dataset to only contain the rows where table A is not in table B, add a filter tile after the join and filter to where tableB.orderid is null. Otherwise, you can use your beast mode, assuming your order id is looking at the table B order id.
**Check out my Domo Tips & Tricks Videos
**Make sure to any users posts that helped you.
**Please mark as accepted the ones who solved your issue.0 -
Thanks for the quick response. It worked, no more duplicates for the joined dataset
0
Categories
- All Categories
- 1.8K Product Ideas
- 1.8K Ideas Exchange
- 1.6K Connect
- 1.2K Connectors
- 302 Workbench
- 6 Cloud Amplifier
- 9 Federated
- 2.9K Transform
- 104 SQL DataFlows
- 633 Datasets
- 2.2K Magic ETL
- 3.9K Visualize
- 2.5K Charting
- 760 Beast Mode
- 62 App Studio
- 42 Variables
- 699 Automate
- 181 Apps
- 457 APIs & Domo Developer
- 51 Workflows
- 10 DomoAI
- 38 Predict
- 16 Jupyter Workspaces
- 22 R & Python Tiles
- 401 Distribute
- 116 Domo Everywhere
- 277 Scheduled Reports
- 8 Software Integrations
- 130 Manage
- 127 Governance & Security
- 8 Domo Community Gallery
- 38 Product Releases
- 12 Domo University
- 5.4K Community Forums
- 40 Getting Started
- 30 Community Member Introductions
- 111 Community Announcements
- 4.8K Archive