Removing Duplicates and making sure only the one with the most recent move in stays

I am working on an ETL where I am transforming some unit rental records. I am trying to only include the most recent rental for the unit not historical data but the input dataset includes historical data. When using the remove duplicates tile I do not have control over which ones are removed or added as they arent necassarily in historical order in the input data source.

How can I make it so that only the most recent rental for each unit is left after remove duplicate of the unit column is done.

Date column is named MoveDate and unit column is named unit. What I want is to remove duplicates in the unit column leaving only the entry with the most recent date in MoveDate

Best Answer

  • ggenovese
    ggenovese Contributor
    Answer ✓

    If you use a Rank & Window tile in your ETL you can apply the DENSE RANK function to rank on MoveDate descending and partition by the Unit Id. This should give you a Rank of 1 for the most recent date and then you can add a filter to filter for Rank = 1

Answers

  • ggenovese
    ggenovese Contributor
    Answer ✓

    If you use a Rank & Window tile in your ETL you can apply the DENSE RANK function to rank on MoveDate descending and partition by the Unit Id. This should give you a Rank of 1 for the most recent date and then you can add a filter to filter for Rank = 1