Marking Duplicates So I Can Filter Them Out
Good afternoon,
I am trying to mark rows that match up with each other as duplicates so that I can filter them out and only have the truly unique records left.
Example
So after I mark and filter the only record left will be the third one that doesn't have a match.
What is the best way to do this?
Thank you in advance
Best Answer
-
Ah sorry I missed that you want both the original and the duplicate row gone when a duplicate exists. Interested in your use case - why you want both records gone when there is a duplicate and not just the duplicate? Is it that the duplicate transaction is to zero out the previous? If that is the case, and `retail` is the field that is being zero'd out then I agree with @ST_-Superman-_ 's approach. Here is how I did it using the data you provided with the correct output you are looking for:
1. Overall ETL:2. Group By step:
3. Filter Rows step
4. Output - Only unique rows
If I solved your problem, please select "yes" above
1
Answers
-
There is a tile called "Remove Duplicates" that you can use in Magic ETL to only keep unique rows.
If I solved your problem, please select "yes" above
1 -
No that will leave me with one of the duplicates. I need all the pairs to be gone.
0 -
Your solution would give me this:
I want this:
0 -
So how do you identify when something is a "duplicate"? If there is a set criteria then you can do a group by in ETL that does a count on how many times the "uniqueness" criteria shows in the dataset, then JOIN that back into the dataset and filter to leave only those were the count is 1. This if you want to go via the ETL approach. You could also go the Beast Mode approach with the FIXED function and use that as a filter, something like:
SUM(COUNT(`Date`) FIXED (BY [FIELDS THAT MAKE UP YOUR UNIQUENESS CRITERIA]))
And then filter based on this, hope one of this helps.
0 -
Ok so in the group by I have 1. Store, Line, Item, Change Qty Abs, and Retail Abs
but what do I put in 2 and 3?
0 -
Could you explain how the two rows you screenshot above are duplicates? Is there some additional criteria beyond duplicate values in the data itself?
If I solved your problem, please select "yes" above
0 -
It's more that they are offsets. So my way of trying to eliminate them is that they are duplicates based on Store, Line, Item, Change Qty Abs, and Retail Abs. There is much more data not shown but they are not really relevant like comments, people's usernames, and extra store information like divisions, regions, districts, etc. In total there are like 40 columns of data.
0 -
You can select which columns identify the duplicates using the remove duplicates title and selecting Store, Line, Item, Change Qty Abs, and Retail Abs as the fields that identify the duplicate.
If I solved your problem, please select "yes" above
0 -
Again that would leave me with one of the two matching transactions and I don't want that
0 -
Could you just SUM the Retail field by store and item and then filter your results to show items where the sum is not 0?
0 -
Would I do that with a formula field?
0 -
It would be a group by tile in the ETL. (Group by Store and Item and sum the Retail field)
You could probably do this with a calculated field as well. Something like:
SUM(SUM(`Retail`) FIXED (BY `Store`, `Item`))
0 -
I can't seem to get that to work either
0 -
Ah sorry I missed that you want both the original and the duplicate row gone when a duplicate exists. Interested in your use case - why you want both records gone when there is a duplicate and not just the duplicate? Is it that the duplicate transaction is to zero out the previous? If that is the case, and `retail` is the field that is being zero'd out then I agree with @ST_-Superman-_ 's approach. Here is how I did it using the data you provided with the correct output you are looking for:
1. Overall ETL:2. Group By step:
3. Filter Rows step
4. Output - Only unique rows
If I solved your problem, please select "yes" above
1
Categories
- All Categories
- 1.8K Product Ideas
- 1.8K Ideas Exchange
- 1.6K Connect
- 1.2K Connectors
- 300 Workbench
- 6 Cloud Amplifier
- 9 Federated
- 2.9K Transform
- 102 SQL DataFlows
- 626 Datasets
- 2.2K Magic ETL
- 3.9K Visualize
- 2.5K Charting
- 753 Beast Mode
- 61 App Studio
- 41 Variables
- 692 Automate
- 177 Apps
- 456 APIs & Domo Developer
- 49 Workflows
- 10 DomoAI
- 38 Predict
- 16 Jupyter Workspaces
- 22 R & Python Tiles
- 398 Distribute
- 115 Domo Everywhere
- 276 Scheduled Reports
- 7 Software Integrations
- 130 Manage
- 127 Governance & Security
- 8 Domo Community Gallery
- 38 Product Releases
- 11 Domo University
- 5.4K Community Forums
- 40 Getting Started
- 30 Community Member Introductions
- 110 Community Announcements
- 4.8K Archive