It's easy to remove duplicates... How do I KEEP only the duplicates in a dataset?


I have a large dataset that I am constantly adding to, and the unique identifier for each item is a 17- or 22-character text string. It is critical for me to quickly identify whether I have duplicated a previous item when I add to the dataset.


ETL makes it simple to remove duplicates from a dataset, but is there a way to eliminate everything BUT the duplicates? Ideally, I'd like to either:


1.  Create an alert anytime a new duplicate value is added to the dataset, or


2.  Create an Output Dataset that consists ONLY of the rows that have a duplicate value in a specific column.
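To make the two options concrete, here is a minimal sketch in plain Python rather than in the ETL tool itself. It assumes the rows are available as dictionaries; the column name `item_id`, the sample IDs, and both function names are hypothetical and only illustrate the logic.

```python
from collections import Counter


def keep_only_duplicates(rows, key):
    """Option 2: return only the rows whose `key` value occurs more than once."""
    counts = Counter(row[key] for row in rows)
    # Keep every occurrence of a repeated value, in the original order.
    return [row for row in rows if counts[row[key]] > 1]


def new_duplicates(existing_rows, new_rows, key):
    """Option 1: return incoming rows whose `key` value is already present.

    A value repeated within the incoming batch is flagged from its
    second occurrence onward.
    """
    seen = {row[key] for row in existing_rows}
    flagged = []
    for row in new_rows:
        if row[key] in seen:
            flagged.append(row)
        seen.add(row[key])
    return flagged


# Hypothetical sample data with 17-character identifiers.
rows = [
    {"item_id": "ID000000000000001", "note": "first"},
    {"item_id": "ID000000000000002", "note": "second"},
    {"item_id": "ID000000000000001", "note": "repeat of first"},
]
incoming = [
    {"item_id": "ID000000000000002", "note": "already in dataset"},
    {"item_id": "ID000000000000003", "note": "genuinely new"},
]

dupes = keep_only_duplicates(rows, "item_id")
alerts = new_duplicates(rows, incoming, "item_id")
```

Counting occurrences per identifier and keeping the rows whose count is greater than one is roughly equivalent to a group-by with a count, a filter, and a join back to the original rows on the identifier column.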


Thanks in advance for any help.

Best Answer


  • coreyvsmith

    Thank you, that worked perfectly, and your instructions were perfectly clear and easy to implement!

  • colinr

    Worked a treat. Thanks!

  • leeloo_dallas

    I tried this but I keep getting duplicates.

    Which join did you use, and which columns from which dataset did you drop?

    My filter found no nulls.