Compare rows and delete if one of the fields do not match
Hello,
I have data like this.
1, red
1, blue
2, purple
2, purple
3, yellow
3,red
I'd like to edit the data to show the data like this:
1, red
2, purple
2, purple
3, yellow
So I'd only like to only keep duplicates if the first and second field each have a match in the data. Any ideas? Assuming it is ETL somehow but can't wrap my head around it.
Best Answer
-
First, let me make sure I have my head wrapped around what you are trying to do.
Playing with your data, with a name for the columns:
Number Color
1 red
1 blue
2 purple
2 purple
3 yellow
3 red
3 blue
3 yellowIf I understand correctly, you don't want the `Number` column to be associated with more than one value for `Color`. And if there are multiple values associated with the `Number` then you only want to keep the first found value. Also, any duplicates that have the same value as the first one can also be kept?
Assuming that I am on the right track, you can get there. Here is the way I think I would do it using Magic ETL:
Use a [Rank and Window] action to add a "Row Number" column that restarts at 1 for each `Number value`. These settings should work:
- Functions, just add one:
- "Row"
- Row Number
- Sorting is required and may re-arrange your data. If you are worried about which comes first you will need an extra column that can be sorted in your data.
- Ascending
- `Number`
The data will look like this with the new column:
Number Color Row
1 red 1
1 blue 2
2 purple 1
2 purple 2
3 yellow 1
3 red 2
3 blue 3
3 yellow 4Then split to two different [Filter Rows] actions. One where `Row` = 1 and the other where `Row` Not = 1 and they will look like this:
[Filter 1]: Row = 1
Number Color Row
1 red 1
2 purple 1
3 yellow 1
[Filter 2]: Row Not = 1
Number Color Row
1 blue 2
2 purple 2
3 red 2
3 blue 3
3 yellow 4We will reference them as [Filter 1] and [Filter 2]
This gets a little harder to conceptualize, I'll try my best.
Concept: Remove the rows we don't want from the [Filter 2] set and then append it back into the [Filter 1] set. Sounds simple enough.
Find valid matches:
To identify them we will do a [Join Data] action with these settings:
- [Filter 2] is on the left ([Filter 1] goes on the right automatically)
- Inner Join
- Key Fields
- Number
- Color
The preview should look like this at that point:
Number Color Row Number_1 Color_1 Row_1
2 purple 2
3 yellow 4Now we want to add this data back into the results of [Filter 1] and we will use an [Append Rows] action. In order to ignore the new columns with an "_1" after them you want to choose the correct dataset in Step 1 of the append rows action. You want to pick the [Filter 1] input.
That should do it. Your final preview should look like:
Number Color Row
1 red 1
2 purple 1
3 yellow 1
2 purple 2
3 yellow 4You can use a [Select Columns] action to remove the `Row` column if it bugs you.
**Say "Thanks" by clicking the "heart" in the post that helped you.
**Please mark the post that solves your problem by clicking on "Accept as Solution"0 - Functions, just add one:
Answers
-
First, let me make sure I have my head wrapped around what you are trying to do.
Playing with your data, with a name for the columns:
Number Color
1 red
1 blue
2 purple
2 purple
3 yellow
3 red
3 blue
3 yellowIf I understand correctly, you don't want the `Number` column to be associated with more than one value for `Color`. And if there are multiple values associated with the `Number` then you only want to keep the first found value. Also, any duplicates that have the same value as the first one can also be kept?
Assuming that I am on the right track, you can get there. Here is the way I think I would do it using Magic ETL:
Use a [Rank and Window] action to add a "Row Number" column that restarts at 1 for each `Number value`. These settings should work:
- Functions, just add one:
- "Row"
- Row Number
- Sorting is required and may re-arrange your data. If you are worried about which comes first you will need an extra column that can be sorted in your data.
- Ascending
- `Number`
The data will look like this with the new column:
Number Color Row
1 red 1
1 blue 2
2 purple 1
2 purple 2
3 yellow 1
3 red 2
3 blue 3
3 yellow 4Then split to two different [Filter Rows] actions. One where `Row` = 1 and the other where `Row` Not = 1 and they will look like this:
[Filter 1]: Row = 1
Number Color Row
1 red 1
2 purple 1
3 yellow 1
[Filter 2]: Row Not = 1
Number Color Row
1 blue 2
2 purple 2
3 red 2
3 blue 3
3 yellow 4We will reference them as [Filter 1] and [Filter 2]
This gets a little harder to conceptualize, I'll try my best.
Concept: Remove the rows we don't want from the [Filter 2] set and then append it back into the [Filter 1] set. Sounds simple enough.
Find valid matches:
To identify them we will do a [Join Data] action with these settings:
- [Filter 2] is on the left ([Filter 1] goes on the right automatically)
- Inner Join
- Key Fields
- Number
- Color
The preview should look like this at that point:
Number Color Row Number_1 Color_1 Row_1
2 purple 2
3 yellow 4Now we want to add this data back into the results of [Filter 1] and we will use an [Append Rows] action. In order to ignore the new columns with an "_1" after them you want to choose the correct dataset in Step 1 of the append rows action. You want to pick the [Filter 1] input.
That should do it. Your final preview should look like:
Number Color Row
1 red 1
2 purple 1
3 yellow 1
2 purple 2
3 yellow 4You can use a [Select Columns] action to remove the `Row` column if it bugs you.
**Say "Thanks" by clicking the "heart" in the post that helped you.
**Please mark the post that solves your problem by clicking on "Accept as Solution"0 - Functions, just add one:
-
Fantastic Reply! thank you for the walkthrough and that should do it! Will let you knowif I have any questions but looks somewhat straightforward!
0
Categories
- All Categories
- 1.7K Product Ideas
- 1.7K Ideas Exchange
- 1.5K Connect
- 1.2K Connectors
- 292 Workbench
- 4 Cloud Amplifier
- 8 Federated
- 2.8K Transform
- 95 SQL DataFlows
- 600 Datasets
- 2.1K Magic ETL
- 3.7K Visualize
- 2.4K Charting
- 685 Beast Mode
- 43 App Studio
- 38 Variables
- 656 Automate
- 170 Apps
- 439 APIs & Domo Developer
- 42 Workflows
- 5 DomoAI
- 32 Predict
- 12 Jupyter Workspaces
- 20 R & Python Tiles
- 384 Distribute
- 110 Domo Everywhere
- 268 Scheduled Reports
- 6 Software Integrations
- 111 Manage
- 108 Governance & Security
- 8 Domo University
- 25 Product Releases
- Community Forums
- 39 Getting Started
- 29 Community Member Introductions
- 98 Community Announcements
- Domo Community Gallery
- 4.8K Archive