Removing Duplicates with Condition

Hi All. My dataset have a lot of duplicate rows so I wanted to use "Remove Duplicates" in ETL based on the Shipment ID.

However, there are some products that are hazardous. I want to have the hazardous row left remaining when I remove the duplicates if there is a mix of Haz and non-Haz. Any function or use of "Rank & Window" I can use to have my desired output?


SAMPLE

Shipment ID vs Hazardous Tag

1 - Non-Haz

1 - Haz

1 - Non-Haz

2 - Non-Haz

2 - Non-Haz

3 - Haz

3 - Haz

4 - Non-Haz

4 - Non-Haz

4 - Haz


DESIRED OUTPUT

1 - Haz

2 - Non-Haz

3 - Haz

4 - Haz


Logic: When at least one is hazardous, return 1 line as hazardous

Best Answer

  • GrantSmith
    GrantSmith Coach
    Answer ✓

    You can take the minimum value of your Haz / Non-Haz text in your rank and window to determine if had any Hazardous.


    Alternatively, you can use a case statement to return 1 if the value is hazardous and 0 if it's not and then take the max of that value in your case statement and then convert it back to a text value using another formula tile.

    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**

Answers

  • GrantSmith
    GrantSmith Coach
    Answer ✓

    You can take the minimum value of your Haz / Non-Haz text in your rank and window to determine if had any Hazardous.


    Alternatively, you can use a case statement to return 1 if the value is hazardous and 0 if it's not and then take the max of that value in your case statement and then convert it back to a text value using another formula tile.

    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**
  • @GrantSmith thank you sir! it worked! Did the ranking to put HAZ on top and when I removed duplicates, it's the one getting selected. :)