Trying to use Unpivot
I have a dataset.
1 order has 10 stages. each stage has a schedule date, completed date and status.
1 order has 1 row and 30 columns.
I am trying to unpivot to 1 order having 10 rows and 6 columns. I used 3 different unpivot tiles for it.
I built out the ETL but not sure why its taking forever to run.
Best Answers
-
@shreeyab - does the data look something like this…
** Was this post helpful? Click Agree or Like below. **
** Did this solve your problem? Accept it as a solution! **1 -
I follow. Currently, your columns are something like:
- Order Number
- Schedule Date 1
- Completed Date 1
- Status 1
- Schedule Date 2
- Completed Date 2
- Status 2
- ….
- Schedule Date 10
- Completed Date 10
- Status 10
You want to unpivot so your columns are:
- Order Number
- Stage Number
- Schedule Date
- Completed Date
- Status
If un-pivot is giving you trouble, it wouldn't be that hard to do this manually (and maybe more computationally efficient? Not sure):
- Select columns: Order Number, Schedule Date 1, Completed Date 1, Status 1; rename without 1 (if that's part of the column name)
- Add a constant with the stage number
- Do these two steps for each of the 10 groups of columns. You could select the two blocks and copy-paste, changing just which columns and stage.
- Append the 10 stages
Alternatively, you could do the dynamic unpivot once, selecting the Order ID as not pivoted, and something like "Metric" to store the column names and "Value" to store the values, so your three columns are:
- Order ID
- Metric (Schedule Date 1/Completed Date 1/Status 1…Schedule Date 10/Complete Date 10/Status 10)
- Value
Then you could use a formula tile or split tile, to split the Metric name and stage, so your four columns are:
- Order ID
- Stage (1, 2, …10)
- Metric (Schedule Date, Completed Date, Status)
- Value
Then you could split that into three datasets using filter blocks: filter for metric=Schedule Date, metric=Completed Date, metric=Status)
Then you could join those three filter tiles together using Order ID and Stage as keys, renaming the "Value" column to the metric (Schedule Date, Completed Date, Status)
I had a similar ETL for a dataset with dozens of columns in pairs like "Question 1 TEXT", "Question 2 VALUE", "Question 3 TEXT", "Question 3 VALUE". I used a dynamic unpivot, then filtered for just the rows with TEXT and the rows with VALUE, then rejoined them, so I basically three columns; ID, TEXT, VALUE. It didn't take very long to run on a pretty big dataset.
EDIT: Marcel's answer of using the Pivot after the dynamic un-pivot and split makes more sense than filter+join. I'd be curious to see if Marcel's method or the manual method I started this answer with runs faster. The advantage of the dynamic-unpivot vs. manual appends is that it would be dynamic to different numbers of stages. You could have 1 stage or 50, and the same method works.
Please 💡/💖/👍/😊 this post if you read it and found it helpful.
Please accept the answer if it solved your problem.
3 -
Perhaps I'm oversimplifying it, but you should be able to do this with just 4 tiles: a Dynamic Unpivot, a Formula, a Alter Columns and a Pivot one. As for performance I you'd need to test it out but in General this is what the ETL would look like:
Input Dataset (you could have any number of Stage # columns for each metric):
Output Dateset
Explanation
- You unpivot the current data so you'll get entries for every single possible value column, making sure to keep all those that are meant to be repeated each time (in my example
Id
andName
) - You take the column names (hopefully that follow an standard, else you'll need an additional alter tile before to rename your columns to a workable standard) and break them into the Stage being measured and the measure in this case, so my columns followed the format Stage
#
Measure
, so I just took the second word after a space (#
) and the third one (Measure
) and stored them as their own columns. - I removed the column that had the concatenated name of the original column and make sure that the stage is treated as a number.
- I pivot again based out of the common values (in my case
Id
,Name
and the newly createdStage Number
), use theMeasure
one as the labels for the columns and just created the mapping for the 3 possible values I care for (Scheduled, Completed and Status)
In my case each order has only 3 rows since I only had 3 stages in my source, but new will be added depending on the columns you have at the source.
Code (for copy paste so you only need to do small adjustments)
{"contentType":"domo/dataflow-actions","data":[{"name":"TEST | Unpivot","id":"357de108-dccf-4135-8eaa-3ff3944f3d3d","type":"LoadFromVault","gui":{"x":36,"y":180,"color":3238043},"dependsOn":[],"removeByDefault":false,"notes":[],"dataSourceId":"7bd8e7a0-821c-4f3d-8754-eb04cf1fde9a","executeFlowWhenUpdated":false,"onlyLoadNewVersions":false,"columnSettings":{},"visiblePartitionColumn":null,"versionWindow":null},{"name":"Dynamic Unpivot","id":"0b6636c1-b3bb-40c4-b9d1-48e554a1afb6","type":"NormalizeAll","gui":{"x":174,"y":180},"dependsOn":["357de108-dccf-4135-8eaa-3ff3944f3d3d"],"removeByDefault":false,"notes":[],"idFields":["Id","Name"],"keyField":"Column Name","valueField":"Value"},{"name":"Split Column Names","id":"ce463228-5895-4532-86cc-6e972eddbbf1","type":"ExpressionEvaluator","gui":{"x":288,"y":180},"dependsOn":["0b6636c1-b3bb-40c4-b9d1-48e554a1afb6"],"removeByDefault":false,"notes":[],"expressions":[{"fieldName":"Stage Number","expression":"SPLIT_PART(Column Name,' ',2)","settings":null},{"fieldName":"Measure","expression":"SPLIT_PART(Column Name,' ',3)","settings":null}]},{"name":"Pivot","id":"61fc10bf-a9bb-4d7c-9225-26dd73182000","type":"Denormaliser","gui":{"x":480,"y":180},"dependsOn":["d54e663d-44b0-49d8-8147-5d045d6d5c2f"],"removeByDefault":false,"notes":[],"keyField":"Measure","group":[{"name":"Id"},{"name":"Name"},{"name":"Stage Number"}],"partitionedAggregation":false,"fields":[{"fieldName":"Value","keyValue":"Scheduled","targetName":"Scheduled","targetType":"STRING"},{"fieldName":"Value","keyValue":"Completed","targetName":"Completed","targetType":"STRING"},{"fieldName":"Value","keyValue":"Status","targetName":"Status","targetType":"STRING"}]},{"name":"TEST | Re-Pivoted","id":"8259db5f-ec87-4733-857e-54e2035bcc59","type":"PublishToVault","gui":{"x":600,"y":180},"dependsOn":["61fc10bf-a9bb-4d7c-9225-26dd73182000"],"removeByDefault":false,"notes":[],"dataSource":{"name":"TEST | Re-Pivoted"},"partitionIdColumns":[],"upsertColumns":[],"retainPartitionExpression":""},{"name":"Alter Columns","id":"d54e663d-44b0-49d8-8147-5d045d6d5c2f","type":"Metadata","gui":{"x":384,"y":180},"dependsOn":["ce463228-5895-4532-86cc-6e972eddbbf1"],"removeByDefault":false,"notes":[],"fields":[{"name":"Column Name","remove":true},{"name":"Stage Number","type":"LONG","remove":false}]}]}
3 - You unpivot the current data so you'll get entries for every single possible value column, making sure to keep all those that are meant to be repeated each time (in my example
Answers
-
Time to run will depend a lot on the number of records you have on your original dataset and the operations being performed, while unpivot usually is usually a heavy operation, I'd like to know why 3 tiles? Can you share a sample (with mock data or sanitized) of what the Input looks like and what the output should be?
2 -
I dont know how to explain 🤐
0 -
Im unpivoting in parts
0 -
@shreeyab - does the data look something like this…
** Was this post helpful? Click Agree or Like below. **
** Did this solve your problem? Accept it as a solution! **1 -
I follow. Currently, your columns are something like:
- Order Number
- Schedule Date 1
- Completed Date 1
- Status 1
- Schedule Date 2
- Completed Date 2
- Status 2
- ….
- Schedule Date 10
- Completed Date 10
- Status 10
You want to unpivot so your columns are:
- Order Number
- Stage Number
- Schedule Date
- Completed Date
- Status
If un-pivot is giving you trouble, it wouldn't be that hard to do this manually (and maybe more computationally efficient? Not sure):
- Select columns: Order Number, Schedule Date 1, Completed Date 1, Status 1; rename without 1 (if that's part of the column name)
- Add a constant with the stage number
- Do these two steps for each of the 10 groups of columns. You could select the two blocks and copy-paste, changing just which columns and stage.
- Append the 10 stages
Alternatively, you could do the dynamic unpivot once, selecting the Order ID as not pivoted, and something like "Metric" to store the column names and "Value" to store the values, so your three columns are:
- Order ID
- Metric (Schedule Date 1/Completed Date 1/Status 1…Schedule Date 10/Complete Date 10/Status 10)
- Value
Then you could use a formula tile or split tile, to split the Metric name and stage, so your four columns are:
- Order ID
- Stage (1, 2, …10)
- Metric (Schedule Date, Completed Date, Status)
- Value
Then you could split that into three datasets using filter blocks: filter for metric=Schedule Date, metric=Completed Date, metric=Status)
Then you could join those three filter tiles together using Order ID and Stage as keys, renaming the "Value" column to the metric (Schedule Date, Completed Date, Status)
I had a similar ETL for a dataset with dozens of columns in pairs like "Question 1 TEXT", "Question 2 VALUE", "Question 3 TEXT", "Question 3 VALUE". I used a dynamic unpivot, then filtered for just the rows with TEXT and the rows with VALUE, then rejoined them, so I basically three columns; ID, TEXT, VALUE. It didn't take very long to run on a pretty big dataset.
EDIT: Marcel's answer of using the Pivot after the dynamic un-pivot and split makes more sense than filter+join. I'd be curious to see if Marcel's method or the manual method I started this answer with runs faster. The advantage of the dynamic-unpivot vs. manual appends is that it would be dynamic to different numbers of stages. You could have 1 stage or 50, and the same method works.
Please 💡/💖/👍/😊 this post if you read it and found it helpful.
Please accept the answer if it solved your problem.
3 -
Perhaps I'm oversimplifying it, but you should be able to do this with just 4 tiles: a Dynamic Unpivot, a Formula, a Alter Columns and a Pivot one. As for performance I you'd need to test it out but in General this is what the ETL would look like:
Input Dataset (you could have any number of Stage # columns for each metric):
Output Dateset
Explanation
- You unpivot the current data so you'll get entries for every single possible value column, making sure to keep all those that are meant to be repeated each time (in my example
Id
andName
) - You take the column names (hopefully that follow an standard, else you'll need an additional alter tile before to rename your columns to a workable standard) and break them into the Stage being measured and the measure in this case, so my columns followed the format Stage
#
Measure
, so I just took the second word after a space (#
) and the third one (Measure
) and stored them as their own columns. - I removed the column that had the concatenated name of the original column and make sure that the stage is treated as a number.
- I pivot again based out of the common values (in my case
Id
,Name
and the newly createdStage Number
), use theMeasure
one as the labels for the columns and just created the mapping for the 3 possible values I care for (Scheduled, Completed and Status)
In my case each order has only 3 rows since I only had 3 stages in my source, but new will be added depending on the columns you have at the source.
Code (for copy paste so you only need to do small adjustments)
{"contentType":"domo/dataflow-actions","data":[{"name":"TEST | Unpivot","id":"357de108-dccf-4135-8eaa-3ff3944f3d3d","type":"LoadFromVault","gui":{"x":36,"y":180,"color":3238043},"dependsOn":[],"removeByDefault":false,"notes":[],"dataSourceId":"7bd8e7a0-821c-4f3d-8754-eb04cf1fde9a","executeFlowWhenUpdated":false,"onlyLoadNewVersions":false,"columnSettings":{},"visiblePartitionColumn":null,"versionWindow":null},{"name":"Dynamic Unpivot","id":"0b6636c1-b3bb-40c4-b9d1-48e554a1afb6","type":"NormalizeAll","gui":{"x":174,"y":180},"dependsOn":["357de108-dccf-4135-8eaa-3ff3944f3d3d"],"removeByDefault":false,"notes":[],"idFields":["Id","Name"],"keyField":"Column Name","valueField":"Value"},{"name":"Split Column Names","id":"ce463228-5895-4532-86cc-6e972eddbbf1","type":"ExpressionEvaluator","gui":{"x":288,"y":180},"dependsOn":["0b6636c1-b3bb-40c4-b9d1-48e554a1afb6"],"removeByDefault":false,"notes":[],"expressions":[{"fieldName":"Stage Number","expression":"SPLIT_PART(Column Name,' ',2)","settings":null},{"fieldName":"Measure","expression":"SPLIT_PART(Column Name,' ',3)","settings":null}]},{"name":"Pivot","id":"61fc10bf-a9bb-4d7c-9225-26dd73182000","type":"Denormaliser","gui":{"x":480,"y":180},"dependsOn":["d54e663d-44b0-49d8-8147-5d045d6d5c2f"],"removeByDefault":false,"notes":[],"keyField":"Measure","group":[{"name":"Id"},{"name":"Name"},{"name":"Stage Number"}],"partitionedAggregation":false,"fields":[{"fieldName":"Value","keyValue":"Scheduled","targetName":"Scheduled","targetType":"STRING"},{"fieldName":"Value","keyValue":"Completed","targetName":"Completed","targetType":"STRING"},{"fieldName":"Value","keyValue":"Status","targetName":"Status","targetType":"STRING"}]},{"name":"TEST | Re-Pivoted","id":"8259db5f-ec87-4733-857e-54e2035bcc59","type":"PublishToVault","gui":{"x":600,"y":180},"dependsOn":["61fc10bf-a9bb-4d7c-9225-26dd73182000"],"removeByDefault":false,"notes":[],"dataSource":{"name":"TEST | Re-Pivoted"},"partitionIdColumns":[],"upsertColumns":[],"retainPartitionExpression":""},{"name":"Alter Columns","id":"d54e663d-44b0-49d8-8147-5d045d6d5c2f","type":"Metadata","gui":{"x":384,"y":180},"dependsOn":["ce463228-5895-4532-86cc-6e972eddbbf1"],"removeByDefault":false,"notes":[],"fields":[{"name":"Column Name","remove":true},{"name":"Stage Number","type":"LONG","remove":false}]}]}
3 - You unpivot the current data so you'll get entries for every single possible value column, making sure to keep all those that are meant to be repeated each time (in my example
-
thank you @DavidChurchman , @ArborRose , @marcel_luthi !
0
Categories
- All Categories
- 1.7K Product Ideas
- 1.7K Ideas Exchange
- 1.5K Connect
- 1.2K Connectors
- 292 Workbench
- 4 Cloud Amplifier
- 8 Federated
- 2.8K Transform
- 95 SQL DataFlows
- 600 Datasets
- 2.1K Magic ETL
- 3.7K Visualize
- 2.4K Charting
- 685 Beast Mode
- 43 App Studio
- 38 Variables
- 655 Automate
- 170 Apps
- 438 APIs & Domo Developer
- 42 Workflows
- 5 DomoAI
- 32 Predict
- 12 Jupyter Workspaces
- 20 R & Python Tiles
- 384 Distribute
- 110 Domo Everywhere
- 268 Scheduled Reports
- 6 Software Integrations
- 111 Manage
- 108 Governance & Security
- 8 Domo University
- 25 Product Releases
- Community Forums
- 39 Getting Started
- 29 Community Member Introductions
- 98 Community Announcements
- Domo Community Gallery
- 4.8K Archive