Compare rows and delete if one of the fields do not match
Hello,
I have data like this.
1, red
1, blue
2, purple
2, purple
3, yellow
3,red
I'd like to edit the data to show the data like this:
1, red
2, purple
2, purple
3, yellow
So I'd only like to only keep duplicates if the first and second field each have a match in the data. Any ideas? Assuming it is ETL somehow but can't wrap my head around it.
Best Answer
-
First, let me make sure I have my head wrapped around what you are trying to do.
Playing with your data, with a name for the columns:
Number Color
1 red
1 blue
2 purple
2 purple
3 yellow
3 red
3 blue
3 yellowIf I understand correctly, you don't want the `Number` column to be associated with more than one value for `Color`. And if there are multiple values associated with the `Number` then you only want to keep the first found value. Also, any duplicates that have the same value as the first one can also be kept?
Assuming that I am on the right track, you can get there. Here is the way I think I would do it using Magic ETL:
Use a [Rank and Window] action to add a "Row Number" column that restarts at 1 for each `Number value`. These settings should work:
- Functions, just add one:
- "Row"
- Row Number
- Sorting is required and may re-arrange your data. If you are worried about which comes first you will need an extra column that can be sorted in your data.
- Ascending
- `Number`
The data will look like this with the new column:
Number Color Row
1 red 1
1 blue 2
2 purple 1
2 purple 2
3 yellow 1
3 red 2
3 blue 3
3 yellow 4Then split to two different [Filter Rows] actions. One where `Row` = 1 and the other where `Row` Not = 1 and they will look like this:
[Filter 1]: Row = 1
Number Color Row
1 red 1
2 purple 1
3 yellow 1
[Filter 2]: Row Not = 1
Number Color Row
1 blue 2
2 purple 2
3 red 2
3 blue 3
3 yellow 4We will reference them as [Filter 1] and [Filter 2]
This gets a little harder to conceptualize, I'll try my best.
Concept: Remove the rows we don't want from the [Filter 2] set and then append it back into the [Filter 1] set. Sounds simple enough.
Find valid matches:
To identify them we will do a [Join Data] action with these settings:
- [Filter 2] is on the left ([Filter 1] goes on the right automatically)
- Inner Join
- Key Fields
- Number
- Color
The preview should look like this at that point:
Number Color Row Number_1 Color_1 Row_1
2 purple 2
3 yellow 4Now we want to add this data back into the results of [Filter 1] and we will use an [Append Rows] action. In order to ignore the new columns with an "_1" after them you want to choose the correct dataset in Step 1 of the append rows action. You want to pick the [Filter 1] input.
That should do it. Your final preview should look like:
Number Color Row
1 red 1
2 purple 1
3 yellow 1
2 purple 2
3 yellow 4You can use a [Select Columns] action to remove the `Row` column if it bugs you.
**Say "Thanks" by clicking the "heart" in the post that helped you.
**Please mark the post that solves your problem by clicking on "Accept as Solution"0 - Functions, just add one:
Answers
-
First, let me make sure I have my head wrapped around what you are trying to do.
Playing with your data, with a name for the columns:
Number Color
1 red
1 blue
2 purple
2 purple
3 yellow
3 red
3 blue
3 yellowIf I understand correctly, you don't want the `Number` column to be associated with more than one value for `Color`. And if there are multiple values associated with the `Number` then you only want to keep the first found value. Also, any duplicates that have the same value as the first one can also be kept?
Assuming that I am on the right track, you can get there. Here is the way I think I would do it using Magic ETL:
Use a [Rank and Window] action to add a "Row Number" column that restarts at 1 for each `Number value`. These settings should work:
- Functions, just add one:
- "Row"
- Row Number
- Sorting is required and may re-arrange your data. If you are worried about which comes first you will need an extra column that can be sorted in your data.
- Ascending
- `Number`
The data will look like this with the new column:
Number Color Row
1 red 1
1 blue 2
2 purple 1
2 purple 2
3 yellow 1
3 red 2
3 blue 3
3 yellow 4Then split to two different [Filter Rows] actions. One where `Row` = 1 and the other where `Row` Not = 1 and they will look like this:
[Filter 1]: Row = 1
Number Color Row
1 red 1
2 purple 1
3 yellow 1
[Filter 2]: Row Not = 1
Number Color Row
1 blue 2
2 purple 2
3 red 2
3 blue 3
3 yellow 4We will reference them as [Filter 1] and [Filter 2]
This gets a little harder to conceptualize, I'll try my best.
Concept: Remove the rows we don't want from the [Filter 2] set and then append it back into the [Filter 1] set. Sounds simple enough.
Find valid matches:
To identify them we will do a [Join Data] action with these settings:
- [Filter 2] is on the left ([Filter 1] goes on the right automatically)
- Inner Join
- Key Fields
- Number
- Color
The preview should look like this at that point:
Number Color Row Number_1 Color_1 Row_1
2 purple 2
3 yellow 4Now we want to add this data back into the results of [Filter 1] and we will use an [Append Rows] action. In order to ignore the new columns with an "_1" after them you want to choose the correct dataset in Step 1 of the append rows action. You want to pick the [Filter 1] input.
That should do it. Your final preview should look like:
Number Color Row
1 red 1
2 purple 1
3 yellow 1
2 purple 2
3 yellow 4You can use a [Select Columns] action to remove the `Row` column if it bugs you.
**Say "Thanks" by clicking the "heart" in the post that helped you.
**Please mark the post that solves your problem by clicking on "Accept as Solution"0 - Functions, just add one:
-
Fantastic Reply! thank you for the walkthrough and that should do it! Will let you knowif I have any questions but looks somewhat straightforward!
0
Categories
- 10.5K All Categories
- 8 Connect
- 918 Connectors
- 250 Workbench
- 470 Transform
- 1.7K Magic ETL
- 69 SQL DataFlows
- 477 Datasets
- 194 Visualize
- 253 Beast Mode
- 2.1K Charting
- 11 Variables
- 17 Automate
- 354 APIs & Domo Developer
- 89 Apps
- 3 Workflows
- 20 Predict
- 5 Jupyter Workspaces
- 15 R & Python Tiles
- 247 Distribute
- 63 Domo Everywhere
- 243 Scheduled Reports
- 21 Manage
- 42 Governance & Security
- 174 Product Ideas
- 1.2K Ideas Exchange
- 12 Community Forums
- 27 Getting Started
- 14 Community Member Introductions
- 55 Community News
- 4.5K Archive