I need assistance with identifying transactions that net out to zero in a large dataset. Each row in the dataset represents a unique transaction with its own ID.
Objective: Identify pairs of transactions that meet the following conditions:
- Same client ID
- Same transaction type ID
- Same transaction date
- The sum of their amounts equals zero (e.g., one transaction for -$100 and another for +$100)
Current Approach:
- Group transactions by client ID, transaction type ID, and
- Use a formula to label transactions
- CASE WHEN SUM(amount) = 0 THEN 'Exclude' ELSE 'Include' END
- Join this grouped data back to the original dataset to add an 'Exclude/Include' column.
- I'm joining on client ID, transaction type ID, and transaction date
Issue: The current method incorrectly labels some transactions as 'Exclude' due to additional transactions on the same date with the same client ID and transaction type ID that do not net to zero.
Requirement: I need a solution that accurately labels only the pairs of transactions that net to zero as 'Exclude', without affecting other transactions.
I feel like I need to join back into itself with the transaction ID - but that doesn't work either. not sure what to do here. I'm open to doing this in MySQL, and joining that data set into this, but unsure how to write that query as well.
_________________
In the following screenshot you can see the two green rows are correctly labeled 'Exclude' - because they off set to zero - bu the other rows have the same transaction date, client ID and transaction type ID, but they don't net to zero, so they should be included.
The following is a screenshot of what I'm trying to do in my ETL: