Creating a marker and deduping in ETL
Hey guys,
Hope you all are doing well. I have an issue with data transformation in the ETL. It is best described on the screenshot below. I have multiple entries for duplicate IDs (one row for each product) and want to mark all entries for a given ID that contain Product 'c' as 1. That's the formula I used below to visualize. My next step after creating that is deduping based on the ID so that as a final product I have 1 entry for each ID and a 'Marker' indicating whether initially there was a Product 'c' for that ID. Is that possible to recreate in the ETL?
Thanks!
Best Answers
-
using:
CASE
WHEN SUM(CASE WHEN `Product`='c' THEN 1 ELSE 0 END) OVER (PARTITION BY `ID`) > 0 THEN 1
ELSE 0
END** Was this post helpful? Click Agree or Like below. **
** Did this solve your problem? Accept it as a solution! **1 -
Yes, I would create a column with this case statement in a formula tile:
CASE WHEN `Product`='c' then `Column with values you want to add` else 0 END
Then do a group-by tile that sums that column by ID.
Looking at @ArborRose's solution, it looks like you could do it all within the group-by tile using a partition formula for the aggregation, but those feel like dark-magic to me, so personally I think it would be worth breaking it into two simpler steps that I know I understand.
Please π‘/π/π/π this post if you read it and found it helpful.
Please accept the answer if it solved your problem.
1
Answers
-
Yes, you could use a Group-by tile, select ID as your group. If your data is structured like this, where ID 1 is all 1s for the Marker column, then you could use "Average" as your aggregating function.
On the other hand, the way I'd approach this is I would create the marker column with a simple CASE WHEN Product = 'c' then 1 else 0 end. Then you could aggregate with a sum and get the same result.
Please π‘/π/π/π this post if you read it and found it helpful.
Please accept the answer if it solved your problem.
1 -
@DavidChurchman Thank you, exactly what I was looking for. I'm thinking if it is possible to take it a step further, my dataset also includes quantities for those products. Please see the screenshot, I added more products for order ID 4. My dataset can include multiple entries for the same product for the same ID. Looking at ID 4. Is there a way to sum up both entries for product C, so that it is combined together and shown as 3?
0 -
using:
CASE
WHEN SUM(CASE WHEN `Product`='c' THEN 1 ELSE 0 END) OVER (PARTITION BY `ID`) > 0 THEN 1
ELSE 0
END** Was this post helpful? Click Agree or Like below. **
** Did this solve your problem? Accept it as a solution! **1 -
Yes, I would create a column with this case statement in a formula tile:
CASE WHEN `Product`='c' then `Column with values you want to add` else 0 END
Then do a group-by tile that sums that column by ID.
Looking at @ArborRose's solution, it looks like you could do it all within the group-by tile using a partition formula for the aggregation, but those feel like dark-magic to me, so personally I think it would be worth breaking it into two simpler steps that I know I understand.
Please π‘/π/π/π this post if you read it and found it helpful.
Please accept the answer if it solved your problem.
1
Categories
- All Categories
- 1.8K Product Ideas
- 1.8K Ideas Exchange
- 1.6K Connect
- 1.2K Connectors
- 300 Workbench
- 6 Cloud Amplifier
- 9 Federated
- 2.9K Transform
- 102 SQL DataFlows
- 626 Datasets
- 2.2K Magic ETL
- 3.9K Visualize
- 2.5K Charting
- 753 Beast Mode
- 61 App Studio
- 41 Variables
- 692 Automate
- 177 Apps
- 456 APIs & Domo Developer
- 49 Workflows
- 10 DomoAI
- 38 Predict
- 16 Jupyter Workspaces
- 22 R & Python Tiles
- 398 Distribute
- 115 Domo Everywhere
- 276 Scheduled Reports
- 7 Software Integrations
- 130 Manage
- 127 Governance & Security
- 8 Domo Community Gallery
- 38 Product Releases
- 11 Domo University
- 5.4K Community Forums
- 40 Getting Started
- 30 Community Member Introductions
- 110 Community Announcements
- 4.8K Archive