Find missing records from 2 tables

Excuse me for not providing a screenshot as the data is sensitive but I have 2 sales order details datasets:
- A (sales orders for the year 2024 - 1.7M rows)
- B (all the historical sales order for all years - 14M rows)
- I want to see which orders are in A but not in B.

It's pretty straight forward to do a LEFT JOIN, A = left and B = right (matching keys: Product ID, order ID, and Customer ID) and then make a BeastMode:

CASE WHEN order ID IS NULL THEN 'Missing' ELSE 'Not Missing' END —> Table Card Visual

But there are some problems with this:
1. The JOINED dataset should return the same number of rows as dataset A (since A is my left table and I'm doing a LEFT JOIN), but it's not, instead of 1.7M, it's 1.8M rows —> which probably because multiple rows in B can match to 1 row in A —> duplicates?

2. The total sales variance (A-B) is not equal to SUM(missing order sales), when it should be

Best Answer

  • MarkSnodgrass
    Answer ✓

    In your ETL, before joining the two datasets, I would do a group by on dataset B and group by productid, orderid, and customerid. You can just choose count for the aggregate and pick any column. This will give you 1 row per productid, orderid, and customerid. Then do the left join as you already have in place.

    Also, if you want your output dataset to only contain the rows where table A is not in table B, add a filter tile after the join and filter to where tableB.orderid is null. Otherwise, you can use your beast mode, assuming your order id is looking at the table B order id.

    **Check out my Domo Tips & Tricks Videos

    **Make sure to <3 any users posts that helped you.
    **Please mark as accepted the ones who solved your issue.

Answers

  • MarkSnodgrass
    Answer ✓

    In your ETL, before joining the two datasets, I would do a group by on dataset B and group by productid, orderid, and customerid. You can just choose count for the aggregate and pick any column. This will give you 1 row per productid, orderid, and customerid. Then do the left join as you already have in place.

    Also, if you want your output dataset to only contain the rows where table A is not in table B, add a filter tile after the join and filter to where tableB.orderid is null. Otherwise, you can use your beast mode, assuming your order id is looking at the table B order id.

    **Check out my Domo Tips & Tricks Videos

    **Make sure to <3 any users posts that helped you.
    **Please mark as accepted the ones who solved your issue.
  • Thanks for the quick response. It worked, no more duplicates for the joined dataset