Group By Issue

I need assistance with identifying transactions that net out to zero in a large dataset. Each row in the dataset represents a unique transaction with its own ID.

Objective: Identify pairs of transactions that meet the following conditions:

  1. Same client ID
  2. Same transaction type ID
  3. Same transaction date
  4. The sum of their amounts equals zero (e.g., one transaction for -$100 and another for +$100)

Current Approach:

  1. Group transactions by client ID, transaction type ID, and
  2. Use a formula to label transactions
    1. CASE WHEN SUM(amount) = 0 THEN 'Exclude' ELSE 'Include' END
  3. Join this grouped data back to the original dataset to add an 'Exclude/Include' column.
    1. I'm joining on client ID, transaction type ID, and transaction date

Issue: The current method incorrectly labels some transactions as 'Exclude' due to additional transactions on the same date with the same client ID and transaction type ID that do not net to zero.

Requirement: I need a solution that accurately labels only the pairs of transactions that net to zero as 'Exclude', without affecting other transactions.

I feel like I need to join back into itself with the transaction ID - but that doesn't work either. not sure what to do here. I'm open to doing this in MySQL, and joining that data set into this, but unsure how to write that query as well.

_________________

In the following screenshot you can see the two green rows are correctly labeled 'Exclude' - because they off set to zero - bu the other rows have the same transaction date, client ID and transaction type ID, but they don't net to zero, so they should be included.

The following is a screenshot of what I'm trying to do in my ETL:

Best Answer

  • rco
    rco Domo Employee
    Answer ✓

    The flow looks like this:

    Inverse Amounts configuration:

    Join configuration:

    Restate Include / Exclude configuration:

    Randall Oveson <randall.oveson@domo.com>

Answers

  • rco
    rco Domo Employee
    edited November 14

    I'd branch the Input DataSet tile into two Filter tiles, one with the rule amount > 0 and the other with the rule amount < 0 . Then, on one of the sides, use an Add Formula to create a column "match_amount" with the expression -amount . Then join the two branches on client ID, transaction type ID, transaction date, and amount = match_amount.

    If there are multiple transactions with the same values for all those columns, the join will produce repeats. If that's an issue you could start off with a Remove Duplicates using those same columns as the duplicate key.

    EDIT: I may not have read the requirements closely enough; it sounds like you don't want the paired transactions in a row together. To get the results you want, forget the filter tiles. Just negate amount in one of the branches and then do a left join on all those same columns. You can drop all columns from the right table except for one, let's say client ID; we'll rename that one to "join_match". Then use Add Formula to create your "Include / Exclude" column: case when join_match is not null then 'include' else 'exclude' end

    Randall Oveson <randall.oveson@domo.com>

  • @rco - Hey, i really appreciate you taking the time to respond. I feel like i followed your instruction correctly, but not sure the result is correct. Take a look at my screenshot above and please let me know what you think. Its very possible i did it wrong haha

    To clarify, the goal is to have one row per transaction, and on the off setting transactions (the -$100 and the +$100, both of those rows labeled 'exclude' while the other transactions for varying amounts, on the same day, within the same client id and transaction type id, to be labeled 'include'. The transaction only gets excluded if there are two transactions that net to zero, on the same day, with the same transaction type id and the same client id.

    So we're only excluding the two transactions that off sett and net to zero.

  • rco
    rco Domo Employee
    Answer ✓

    The flow looks like this:

    Inverse Amounts configuration:

    Join configuration:

    Restate Include / Exclude configuration:

    Randall Oveson <randall.oveson@domo.com>

  • Wow - i think that worked, i need to test this out a bit more, but its looking good. I was way off on my first attempt haha thank you for clarifying @rco. I wish I understood better what exactly is happening here, but I'll figure that out.

    I'll keep validating this, but I think its working!