Creating a marker and deduping in ETL
Hey guys,
Hope you all are doing well. I have an issue with data transformation in the ETL. It is best described on the screenshot below. I have multiple entries for duplicate IDs (one row for each product) and want to mark all entries for a given ID that contain Product 'c' as 1. That's the formula I used below to visualize. My next step after creating that is deduping based on the ID so that as a final product I have 1 entry for each ID and a 'Marker' indicating whether initially there was a Product 'c' for that ID. Is that possible to recreate in the ETL?
Thanks!
Best Answers
-
using:
CASE
WHEN SUM(CASE WHEN `Product`='c' THEN 1 ELSE 0 END) OVER (PARTITION BY `ID`) > 0 THEN 1
ELSE 0
END** Was this post helpful? Click Agree or Like below. **
** Did this solve your problem? Accept it as a solution! **1 -
Yes, I would create a column with this case statement in a formula tile:
CASE WHEN `Product`='c' then `Column with values you want to add` else 0 END
Then do a group-by tile that sums that column by ID.
Looking at @ArborRose's solution, it looks like you could do it all within the group-by tile using a partition formula for the aggregation, but those feel like dark-magic to me, so personally I think it would be worth breaking it into two simpler steps that I know I understand.
Please π‘/π/π/π this post if you read it and found it helpful.
Please accept the answer if it solved your problem.
1
Answers
-
Yes, you could use a Group-by tile, select ID as your group. If your data is structured like this, where ID 1 is all 1s for the Marker column, then you could use "Average" as your aggregating function.
On the other hand, the way I'd approach this is I would create the marker column with a simple CASE WHEN Product = 'c' then 1 else 0 end. Then you could aggregate with a sum and get the same result.
Please π‘/π/π/π this post if you read it and found it helpful.
Please accept the answer if it solved your problem.
1 -
@DavidChurchman Thank you, exactly what I was looking for. I'm thinking if it is possible to take it a step further, my dataset also includes quantities for those products. Please see the screenshot, I added more products for order ID 4. My dataset can include multiple entries for the same product for the same ID. Looking at ID 4. Is there a way to sum up both entries for product C, so that it is combined together and shown as 3?
0 -
using:
CASE
WHEN SUM(CASE WHEN `Product`='c' THEN 1 ELSE 0 END) OVER (PARTITION BY `ID`) > 0 THEN 1
ELSE 0
END** Was this post helpful? Click Agree or Like below. **
** Did this solve your problem? Accept it as a solution! **1 -
Yes, I would create a column with this case statement in a formula tile:
CASE WHEN `Product`='c' then `Column with values you want to add` else 0 END
Then do a group-by tile that sums that column by ID.
Looking at @ArborRose's solution, it looks like you could do it all within the group-by tile using a partition formula for the aggregation, but those feel like dark-magic to me, so personally I think it would be worth breaking it into two simpler steps that I know I understand.
Please π‘/π/π/π this post if you read it and found it helpful.
Please accept the answer if it solved your problem.
1
Categories
- All Categories
- 1.8K Product Ideas
- 1.8K Ideas Exchange
- 1.5K Connect
- 1.2K Connectors
- 296 Workbench
- 6 Cloud Amplifier
- 8 Federated
- 2.9K Transform
- 100 SQL DataFlows
- 614 Datasets
- 2.2K Magic ETL
- 3.8K Visualize
- 2.5K Charting
- 729 Beast Mode
- 53 App Studio
- 40 Variables
- 677 Automate
- 173 Apps
- 451 APIs & Domo Developer
- 45 Workflows
- 8 DomoAI
- 34 Predict
- 14 Jupyter Workspaces
- 20 R & Python Tiles
- 394 Distribute
- 113 Domo Everywhere
- 275 Scheduled Reports
- 6 Software Integrations
- 121 Manage
- 118 Governance & Security
- Domo Community Gallery
- 32 Product Releases
- 10 Domo University
- 5.4K Community Forums
- 40 Getting Started
- 30 Community Member Introductions
- 108 Community Announcements
- 4.8K Archive