Removing Duplicates with Condition

Question

Hi All. My dataset have a lot of duplicate rows so I wanted to use "Remove Duplicates" in ETL based on the Shipment ID.

However, there are some products that are hazardous. I want to have the hazardous row left remaining when I remove the duplicates if there is a mix of Haz and non-Haz. Any function or use of "Rank & Window" I can use to have my desired output?

SAMPLE

Shipment ID vs Hazardous Tag

1 - Non-Haz

1 - Haz

1 - Non-Haz

2 - Non-Haz

3 - Haz

4 - Non-Haz

4 - Haz

DESIRED OUTPUT

1 - Haz

2 - Non-Haz

3 - Haz

4 - Haz

Logic: When at least one is hazardous, return 1 line as hazardous

GrantSmith · Accepted Answer

You can take the minimum value of your Haz / Non-Haz text in your rank and window to determine if had any Hazardous.

Alternatively, you can use a case statement to return 1 if the value is hazardous and 0 if it's not and then take the max of that value in your case statement and then convert it back to a text value using another formula tile.