Find missing records from 2 tables

Excuse me for not providing a screenshot as the data is sensitive but I have 2 sales order details datasets:
- A (sales orders for the year 2024 - 1.7M rows)
- B (all the historical sales order for all years - 14M rows)
- I want to see which orders are in A but not in B.
It's pretty straight forward to do a LEFT JOIN, A = left and B = right (matching keys: Product ID, order ID
, and Customer ID
) and then make a BeastMode:
CASE WHEN order ID
IS NULL THEN 'Missing' ELSE 'Not Missing' END —> Table Card Visual
But there are some problems with this:
1. The JOINED dataset should return the same number of rows as dataset A (since A is my left table and I'm doing a LEFT JOIN), but it's not, instead of 1.7M, it's 1.8M rows —> which probably because multiple rows in B can match to 1 row in A —> duplicates?
2. The total sales variance (A-B) is not equal to SUM(missing order sales), when it should be
Best Answer
-
In your ETL, before joining the two datasets, I would do a group by on dataset B and group by productid, orderid, and customerid. You can just choose count for the aggregate and pick any column. This will give you 1 row per productid, orderid, and customerid. Then do the left join as you already have in place.
Also, if you want your output dataset to only contain the rows where table A is not in table B, add a filter tile after the join and filter to where tableB.orderid is null. Otherwise, you can use your beast mode, assuming your order id is looking at the table B order id.
**Check out my Domo Tips & Tricks Videos
**Make sure toany users posts that helped you.
**Please mark as accepted the ones who solved your issue.0
Answers
-
In your ETL, before joining the two datasets, I would do a group by on dataset B and group by productid, orderid, and customerid. You can just choose count for the aggregate and pick any column. This will give you 1 row per productid, orderid, and customerid. Then do the left join as you already have in place.
Also, if you want your output dataset to only contain the rows where table A is not in table B, add a filter tile after the join and filter to where tableB.orderid is null. Otherwise, you can use your beast mode, assuming your order id is looking at the table B order id.
**Check out my Domo Tips & Tricks Videos
**Make sure toany users posts that helped you.
**Please mark as accepted the ones who solved your issue.0 -
Thanks for the quick response. It worked, no more duplicates for the joined dataset
0
Categories
- All Categories
- 1.9K Product Ideas
- 1.9K Ideas Exchange
- 1.6K Connect
- 1.3K Connectors
- 306 Workbench
- 6 Cloud Amplifier
- 9 Federated
- 3K Transform
- 112 SQL DataFlows
- 649 Datasets
- 2.2K Magic ETL
- 4K Visualize
- 2.5K Charting
- 788 Beast Mode
- 78 App Studio
- 43 Variables
- 745 Automate
- 187 Apps
- 475 APIs & Domo Developer
- 67 Workflows
- 16 DomoAI
- 40 Predict
- 17 Jupyter Workspaces
- 23 R & Python Tiles
- 406 Distribute
- 117 Domo Everywhere
- 279 Scheduled Reports
- 10 Software Integrations
- 139 Manage
- 136 Governance & Security
- 8 Domo Community Gallery
- 44 Product Releases
- 12 Domo University
- 5.4K Community Forums
- 41 Getting Started
- 31 Community Member Introductions
- 113 Community Announcements
- 4.8K Archive