How to remove duplicates without affecting the blank cells?

Hii Team,

I have a dataset which is a join of a user name and phone number and their transactions ID however I need to remove duplicate transaction ID without removing the blank cells as if the user has no transaction ID against it. That means that user has not conducted any transactions.

Eg: Dataset

Sr No Name Transaction ID
1 John 123
2 John 123
3 Maria Blank Cell
4 James Blank Cell

So if I remove duplicates from transaction ID in the ETL that is removing the blank cell rows as well (Maria & James along with John). I want the blank cell to remain and want the duplicated John record to be removed as per transaction ID.

Requesting your help..please.

Best Answer

  • MichelleH
    MichelleH Coach
    Answer ✓

    @Vishz14 I would suggest creating two separate branches of your MagicETL: one filtered to rows where Transaction ID is null, and the other where Transaction ID is not null. You can use Remove Duplicates tile on the branch that contains Transaction ID, then merge it back with the rows without a Transaction ID using an Append Rows tile.

Answers

  • AnwarBham
    AnwarBham Contributor

    can you not use a window rank with row number include all the columns the and order by then create a filter to remove row number 1

    im assuming its magic ETl you are using

  • MichelleH
    MichelleH Coach
    Answer ✓

    @Vishz14 I would suggest creating two separate branches of your MagicETL: one filtered to rows where Transaction ID is null, and the other where Transaction ID is not null. You can use Remove Duplicates tile on the branch that contains Transaction ID, then merge it back with the rows without a Transaction ID using an Append Rows tile.

  • Hii @MichelleH, that worked..thank you so much.