The Challenge:
Currently, the Dataset via Email connector only supports two update methods:
- Replace: Wipes the entire dataset and replaces it with the new file.
- Append: Adds the new file's rows to the bottom of the existing dataset.
The Problem:
Many external systems and partners send data updates via email (CSV/Excel) that contain a mix of new records and updates to existing records.
- If we use Replace, we lose historical data that wasn't in the latest email.
- If we use Append, every existing record that is resent lands in the dataset again as a duplicate row.
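To make the two failure modes concrete, here is a minimal Python sketch. It is not connector code: the `id` and `status` columns, the sample rows, and the `replace`/`append` helpers are all made up for illustration.

```python
# Hypothetical data: "id" is assumed to be the unique key column.
existing = [
    {"id": 1, "status": "open"},
    {"id": 2, "status": "open"},
]
incoming = [
    {"id": 2, "status": "closed"},  # update to an existing record
    {"id": 3, "status": "open"},    # brand-new record
]


def replace(existing, incoming):
    """Replace: the dataset becomes exactly the new file."""
    return list(incoming)


def append(existing, incoming):
    """Append: new rows are added below the existing rows."""
    return existing + incoming


replaced = replace(existing, incoming)
appended = append(existing, incoming)

# Replace drops id 1 because it was not in the latest email.
print(sorted(row["id"] for row in replaced))
# Append keeps two versions of id 2.
print([row["id"] for row in appended])
```

Neither result is the three-row dataset we actually want (ids 1, 2, and 3, with id 2 showing its latest status).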
Current Workaround (Pain Point):
To handle this, we are forced to build "Landing Pad" architectures: we append everything to a raw dataset, then build complex Magic ETL DataFlows (using Rank & Window functions) just to deduplicate the data and keep the latest version of each row. This consumes extra storage rows and adds significant ETL compute time.
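For readers who have not built one, the dedup step in such a DataFlow is roughly equivalent to the Python sketch below. This is only an illustration of the logic, not Magic ETL itself: the `batch` column (standing in for a load timestamp or batch ID), the sample rows, and the `latest_per_key` helper are assumptions.

```python
def latest_per_key(rows, key, order_by):
    """Keep only the most recent row for each key value.

    Equivalent in spirit to ranking rows per key by `order_by`
    (descending) and keeping rank 1.
    """
    latest = {}
    for row in rows:
        current = latest.get(row[key])
        # Later (or equal) batches win over earlier ones.
        if current is None or row[order_by] >= current[order_by]:
            latest[row[key]] = row
    return list(latest.values())


# Hypothetical raw "landing pad" dataset after two appended emails.
raw = [
    {"id": 1, "status": "open", "batch": 1},
    {"id": 2, "status": "open", "batch": 1},
    {"id": 2, "status": "closed", "batch": 2},
    {"id": 3, "status": "open", "batch": 2},
]

clean = latest_per_key(raw, key="id", order_by="batch")
for row in clean:
    print(row)
```

The raw dataset keeps growing with every email, and this pass has to rescan all of it on each run, which is where the wasted rows and compute come from.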
The Proposed Solution:
Please add an "Upsert" option to the Dataset via Email connector settings, similar to what Workbench and other API connectors already offer.
- Allow users to define a "Constraint Key" (or unique identifier column) in the connector configuration.
- When a new email arrives:
  - If the Key matches an existing row → Update the record.
  - If the Key does not exist → Insert the new record.
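The behavior we are asking for can be sketched in a few lines of Python. This is only an illustration of the semantics, not a proposal for how the connector should implement it: the `upsert` helper, the `id` key, and the sample rows are hypothetical, and the sketch assumes an incoming row fully replaces the matching existing row.

```python
def upsert(existing, incoming, key):
    """Update rows whose key already exists, insert the rest."""
    by_key = {row[key]: row for row in existing}
    for row in incoming:
        # Key match: the row is overwritten (update).
        # No match: a new entry is created (insert).
        by_key[row[key]] = row
    return list(by_key.values())


# Hypothetical dataset and incoming email attachment.
existing = [
    {"id": 1, "status": "open"},
    {"id": 2, "status": "open"},
]
incoming = [
    {"id": 2, "status": "closed"},  # matches an existing key
    {"id": 3, "status": "open"},    # key not seen before
]

result = upsert(existing, incoming, key="id")
for row in result:
    print(row)
```

Record 1 is kept untouched, record 2 is updated in place, and record 3 is inserted, with no landing dataset or dedup DataFlow needed.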
Value:
This would drastically simplify data pipelines for users ingesting flat files, reduce ETL overhead, and keep datasets cleaner and more efficient.