Magic ETL

Magic ETL

Creating a marker and deduping in ETL

Hey guys,

Hope you all are doing well. I have an issue with data transformation in the ETL. It is best described on the screenshot below. I have multiple entries for duplicate IDs (one row for each product) and want to mark all entries for a given ID that contain Product 'c' as 1. That's the formula I used below to visualize. My next step after creating that is deduping based on the ID so that as a final product I have 1 entry for each ID and a 'Marker' indicating whether initially there was a Product 'c' for that ID. Is that possible to recreate in the ETL?

Thanks!

Welcome!

It looks like you're new here. Members get access to exclusive content, events, rewards, and more. Sign in or register to get started.
Sign In

Best Answers

  • Coach
    Answer ✓

    using:

    CASE
    WHEN SUM(CASE WHEN `Product`='c' THEN 1 ELSE 0 END) OVER (PARTITION BY `ID`) > 0 THEN 1
    ELSE 0
    END

    ** Was this post helpful? Click Agree or Like below. **
    ** Did this solve your problem? Accept it as a solution! **

  • Answer ✓

    Yes, I would create a column with this case statement in a formula tile:

    CASE WHEN `Product`='c' then `Column with values you want to add` else 0 END

    Then do a group-by tile that sums that column by ID.

    Looking at @ArborRose's solution, it looks like you could do it all within the group-by tile using a partition formula for the aggregation, but those feel like dark-magic to me, so personally I think it would be worth breaking it into two simpler steps that I know I understand.

    Please 💡/💖/👍/😊 this post if you read it and found it helpful.

    Please accept the answer if it solved your problem.

Answers

  • Yes, you could use a Group-by tile, select ID as your group. If your data is structured like this, where ID 1 is all 1s for the Marker column, then you could use "Average" as your aggregating function.

    On the other hand, the way I'd approach this is I would create the marker column with a simple CASE WHEN Product = 'c' then 1 else 0 end. Then you could aggregate with a sum and get the same result.

    Please 💡/💖/👍/😊 this post if you read it and found it helpful.

    Please accept the answer if it solved your problem.

  • @DavidChurchman Thank you, exactly what I was looking for. I'm thinking if it is possible to take it a step further, my dataset also includes quantities for those products. Please see the screenshot, I added more products for order ID 4. My dataset can include multiple entries for the same product for the same ID. Looking at ID 4. Is there a way to sum up both entries for product C, so that it is combined together and shown as 3?

  • Coach
    Answer ✓

    using:

    CASE
    WHEN SUM(CASE WHEN `Product`='c' THEN 1 ELSE 0 END) OVER (PARTITION BY `ID`) > 0 THEN 1
    ELSE 0
    END

    ** Was this post helpful? Click Agree or Like below. **
    ** Did this solve your problem? Accept it as a solution! **

  • Answer ✓

    Yes, I would create a column with this case statement in a formula tile:

    CASE WHEN `Product`='c' then `Column with values you want to add` else 0 END

    Then do a group-by tile that sums that column by ID.

    Looking at @ArborRose's solution, it looks like you could do it all within the group-by tile using a partition formula for the aggregation, but those feel like dark-magic to me, so personally I think it would be worth breaking it into two simpler steps that I know I understand.

    Please 💡/💖/👍/😊 this post if you read it and found it helpful.

    Please accept the answer if it solved your problem.

Welcome!

It looks like you're new here. Members get access to exclusive content, events, rewards, and more. Sign in or register to get started.
Sign In