Add Hints to Deduplicate Tile

jimsteph Contributor
edited August 2023 in Magic ETL Ideas

When I ingest the customer table from our POS, for example, and compare each customer record to the one from the last ingestion, I want to keep the older of the two records when selected fields show them to be duplicates of each other. The Deduplicate tile is the obvious choice, but it appears to choose at random which record to keep, so it won't work for me. I would like to add a section to the tile for hints: you select a field (like _BATCH_LAST_RUN_) or a formula, and then indicate whether you want to keep the record with the newest/greatest or the oldest/least hint value.
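To make the request concrete, here is a rough Python sketch of the behavior I have in mind. It is not how the Deduplicate tile works today, and none of it is Domo code: the function name `deduplicate_with_hint`, its parameters, and the sample columns `customer_id` and `email` are all made up for illustration. It assumes the hint values can be compared with `<` and `>` (here, ISO date strings).

```python
from typing import Any, Callable, Dict, Iterable, List, Sequence, Tuple

Row = Dict[str, Any]


def deduplicate_with_hint(
    rows: Iterable[Row],
    key_fields: Sequence[str],
    hint: Callable[[Row], Any],
    keep: str = "oldest",
) -> List[Row]:
    """Hypothetical semantics: keep one row per key, chosen by a hint value.

    key_fields -- the fields that decide whether two rows are duplicates
    hint       -- a field lookup or formula returning a comparable value
    keep       -- "oldest" keeps the least hint, "newest" keeps the greatest
    """
    if keep not in ("oldest", "newest"):
        raise ValueError("keep must be 'oldest' or 'newest'")

    kept: Dict[Tuple[Any, ...], Row] = {}
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        current = kept.get(key)
        if current is None:
            # First time we see this key: keep the row for now.
            kept[key] = row
        elif keep == "oldest" and hint(row) < hint(current):
            kept[key] = row
        elif keep == "newest" and hint(row) > hint(current):
            kept[key] = row
    return list(kept.values())


# Made-up sample data; only _BATCH_LAST_RUN_ comes from my actual use case.
customers = [
    {"customer_id": 1, "email": "a@example.com", "_BATCH_LAST_RUN_": "2023-08-01"},
    {"customer_id": 1, "email": "a@example.com", "_BATCH_LAST_RUN_": "2023-07-01"},
    {"customer_id": 2, "email": "b@example.com", "_BATCH_LAST_RUN_": "2023-08-01"},
]

oldest = deduplicate_with_hint(
    customers,
    key_fields=["customer_id", "email"],
    hint=lambda r: r["_BATCH_LAST_RUN_"],
    keep="oldest",
)
newest = deduplicate_with_hint(
    customers,
    key_fields=["customer_id", "email"],
    hint=lambda r: r["_BATCH_LAST_RUN_"],
    keep="newest",
)
```

With `keep="oldest"`, customer 1 survives with the 2023-07-01 batch value; with `keep="newest"`, the 2023-08-01 row is kept instead. The point is that the choice is deterministic rather than arbitrary.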

As an aside, I know I can simulate this with the Rank & Window tile by putting the fields that I evaluate for duplication in the Window section, sorting on the date field, and filtering on the row number, but that's a workaround, and it feels more expensive to use than the Deduplicate tile (I could be wrong about that, though). My suggestion would make the Deduplicate tile usable for more than just simple use cases.
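For anyone unfamiliar with the workaround, this is roughly what it does, sketched in plain Python rather than in Magic ETL. The function name `rank_and_filter` and the sample columns are hypothetical; the sketch only mirrors the logic (partition on the duplicate fields, sort on the date, number the rows, keep row 1) and says nothing about how Domo actually executes the Rank & Window tile or what it costs.

```python
from itertools import groupby
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]


def rank_and_filter(
    rows: Sequence[Row],
    window_fields: Sequence[str],
    order_field: str,
) -> List[Row]:
    """Emulate: row number over (partition by window_fields order by order_field),
    then keep only the rows where the row number is 1."""

    def partition(row: Row) -> Tuple[Any, ...]:
        return tuple(row[field] for field in window_fields)

    # Sort so each partition is contiguous and ordered oldest-first inside it.
    ordered = sorted(rows, key=lambda r: (partition(r), r[order_field]))

    survivors: List[Row] = []
    for _, group in groupby(ordered, key=partition):
        for row_number, row in enumerate(group, start=1):
            if row_number == 1:  # the "filter on the row number" step
                survivors.append(row)
    return survivors


# Made-up sample data for illustration.
pos_customers = [
    {"customer_id": 1, "email": "a@example.com", "_BATCH_LAST_RUN_": "2023-08-01"},
    {"customer_id": 1, "email": "a@example.com", "_BATCH_LAST_RUN_": "2023-07-01"},
    {"customer_id": 2, "email": "b@example.com", "_BATCH_LAST_RUN_": "2023-08-01"},
]

survivors = rank_and_filter(
    pos_customers,
    window_fields=["customer_id", "email"],
    order_field="_BATCH_LAST_RUN_",
)
```

This version has to sort every row and assign a row number before it can discard anything, which is part of why it feels heavier to me than a single deduplication step, though that is an impression from the sketch and not a measurement of the tiles themselves.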

2 votes
