Trying to use Unpivot

shreeb · 2024-03-14T19:10:35+00:00

There was an error rendering this rich post.

shreeb

I have a dataset.

1 order has 10 stages. each stage has a schedule date, completed date and status.

1 order has 1 row and 30 columns.

I am trying to unpivot to 1 order having 10 rows and 6 columns. I used 3 different unpivot tiles for it.

I built out the ETL but not sure why its taking forever to run.

Find more posts tagged with

Accepted answers

ArborRose

@shreeyab - does the data look something like this…

DavidChurchman

I follow. Currently, your columns are something like:

Order Number
Schedule Date 1
Completed Date 1
Status 1
Schedule Date 2
Completed Date 2
Status 2
….
Schedule Date 10
Completed Date 10
Status 10

You want to unpivot so your columns are:

Order Number
Stage Number
Schedule Date
Completed Date
Status

If un-pivot is giving you trouble, it wouldn't be that hard to do this manually (and maybe more computationally efficient? Not sure):

Select columns: Order Number, Schedule Date 1, Completed Date 1, Status 1; rename without 1 (if that's part of the column name)
Add a constant with the stage number
Do these two steps for each of the 10 groups of columns. You could select the two blocks and copy-paste, changing just which columns and stage.
Append the 10 stages

Alternatively, you could do the dynamic unpivot once, selecting the Order ID as not pivoted, and something like "Metric" to store the column names and "Value" to store the values, so your three columns are:

Order ID
Metric (Schedule Date 1/Completed Date 1/Status 1…Schedule Date 10/Complete Date 10/Status 10)
Value

Then you could use a formula tile or split tile, to split the Metric name and stage, so your four columns are:

Order ID
Stage (1, 2, …10)
Metric (Schedule Date, Completed Date, Status)
Value

Then you could split that into three datasets using filter blocks: filter for metric=Schedule Date, metric=Completed Date, metric=Status)

Then you could join those three filter tiles together using Order ID and Stage as keys, renaming the "Value" column to the metric (Schedule Date, Completed Date, Status)

I had a similar ETL for a dataset with dozens of columns in pairs like "Question 1 TEXT", "Question 2 VALUE", "Question 3 TEXT", "Question 3 VALUE". I used a dynamic unpivot, then filtered for just the rows with TEXT and the rows with VALUE, then rejoined them, so I basically three columns; ID, TEXT, VALUE. It didn't take very long to run on a pretty big dataset.

EDIT: Marcel's answer of using the Pivot after the dynamic un-pivot and split makes more sense than filter+join. I'd be curious to see if Marcel's method or the manual method I started this answer with runs faster. The advantage of the dynamic-unpivot vs. manual appends is that it would be dynamic to different numbers of stages. You could have 1 stage or 50, and the same method works.

marcel_luthi

Perhaps I'm oversimplifying it, but you should be able to do this with just 4 tiles: a Dynamic Unpivot, a Formula, a Alter Columns and a Pivot one. As for performance I you'd need to test it out but in General this is what the ETL would look like:

Input Dataset (you could have any number of Stage # columns for each metric):

Output Dateset

Explanation

You unpivot the current data so you'll get entries for every single possible value column, making sure to keep all those that are meant to be repeated each time (in my example Id and Name)
You take the column names (hopefully that follow an standard, else you'll need an additional alter tile before to rename your columns to a workable standard) and break them into the Stage being measured and the measure in this case, so my columns followed the format Stage # Measure, so I just took the second word after a space (#) and the third one (Measure) and stored them as their own columns.
I removed the column that had the concatenated name of the original column and make sure that the stage is treated as a number.
I pivot again based out of the common values (in my case Id, Name and the newly created Stage Number), use the Measure one as the labels for the columns and just created the mapping for the 3 possible values I care for (Scheduled, Completed and Status)

In my case each order has only 3 rows since I only had 3 stages in my source, but new will be added depending on the columns you have at the source.

Code (for copy paste so you only need to do small adjustments)

{"contentType":"domo/dataflow-actions","data":[{"name":"TEST | Unpivot","id":"357de108-dccf-4135-8eaa-3ff3944f3d3d","type":"LoadFromVault","gui":{"x":36,"y":180,"color":3238043},"dependsOn":[],"removeByDefault":false,"notes":[],"dataSourceId":"7bd8e7a0-821c-4f3d-8754-eb04cf1fde9a","executeFlowWhenUpdated":false,"onlyLoadNewVersions":false,"columnSettings":{},"visiblePartitionColumn":null,"versionWindow":null},{"name":"Dynamic Unpivot","id":"0b6636c1-b3bb-40c4-b9d1-48e554a1afb6","type":"NormalizeAll","gui":{"x":174,"y":180},"dependsOn":["357de108-dccf-4135-8eaa-3ff3944f3d3d"],"removeByDefault":false,"notes":[],"idFields":["Id","Name"],"keyField":"Column Name","valueField":"Value"},{"name":"Split Column Names","id":"ce463228-5895-4532-86cc-6e972eddbbf1","type":"ExpressionEvaluator","gui":{"x":288,"y":180},"dependsOn":["0b6636c1-b3bb-40c4-b9d1-48e554a1afb6"],"removeByDefault":false,"notes":[],"expressions":[{"fieldName":"Stage Number","expression":"SPLIT_PART(Column Name,' ',2)","settings":null},{"fieldName":"Measure","expression":"SPLIT_PART(Column Name,' ',3)","settings":null}]},{"name":"Pivot","id":"61fc10bf-a9bb-4d7c-9225-26dd73182000","type":"Denormaliser","gui":{"x":480,"y":180},"dependsOn":["d54e663d-44b0-49d8-8147-5d045d6d5c2f"],"removeByDefault":false,"notes":[],"keyField":"Measure","group":[{"name":"Id"},{"name":"Name"},{"name":"Stage Number"}],"partitionedAggregation":false,"fields":[{"fieldName":"Value","keyValue":"Scheduled","targetName":"Scheduled","targetType":"STRING"},{"fieldName":"Value","keyValue":"Completed","targetName":"Completed","targetType":"STRING"},{"fieldName":"Value","keyValue":"Status","targetName":"Status","targetType":"STRING"}]},{"name":"TEST | Re-Pivoted","id":"8259db5f-ec87-4733-857e-54e2035bcc59","type":"PublishToVault","gui":{"x":600,"y":180},"dependsOn":["61fc10bf-a9bb-4d7c-9225-26dd73182000"],"removeByDefault":false,"notes":[],"dataSource":{"name":"TEST | Re-Pivoted"},"partitionIdColumns":[],"upsertColumns":[],"retainPartitionExpression":""},{"name":"Alter Columns","id":"d54e663d-44b0-49d8-8147-5d045d6d5c2f","type":"Metadata","gui":{"x":384,"y":180},"dependsOn":["ce463228-5895-4532-86cc-6e972eddbbf1"],"removeByDefault":false,"notes":[],"fields":[{"name":"Column Name","remove":true},{"name":"Stage Number","type":"LONG","remove":false}]}]}

All comments

marcel_luthi

Time to run will depend a lot on the number of records you have on your original dataset and the operations being performed, while unpivot usually is usually a heavy operation, I'd like to know why 3 tiles? Can you share a sample (with mock data or sanitized) of what the Input looks like and what the output should be?

shreeb

I dont know how to explain 🤐

shreeb

Im unpivoting in parts

ArborRose

@shreeyab - does the data look something like this…

DavidChurchman

I follow. Currently, your columns are something like:

Order Number
Schedule Date 1
Completed Date 1
Status 1
Schedule Date 2
Completed Date 2
Status 2
….
Schedule Date 10
Completed Date 10
Status 10

You want to unpivot so your columns are:

Order Number
Stage Number
Schedule Date
Completed Date
Status

If un-pivot is giving you trouble, it wouldn't be that hard to do this manually (and maybe more computationally efficient? Not sure):

Select columns: Order Number, Schedule Date 1, Completed Date 1, Status 1; rename without 1 (if that's part of the column name)
Add a constant with the stage number
Do these two steps for each of the 10 groups of columns. You could select the two blocks and copy-paste, changing just which columns and stage.
Append the 10 stages

Order ID
Metric (Schedule Date 1/Completed Date 1/Status 1…Schedule Date 10/Complete Date 10/Status 10)
Value

Then you could use a formula tile or split tile, to split the Metric name and stage, so your four columns are:

Order ID
Stage (1, 2, …10)
Metric (Schedule Date, Completed Date, Status)
Value

Then you could split that into three datasets using filter blocks: filter for metric=Schedule Date, metric=Completed Date, metric=Status)

Then you could join those three filter tiles together using Order ID and Stage as keys, renaming the "Value" column to the metric (Schedule Date, Completed Date, Status)

marcel_luthi

Input Dataset (you could have any number of Stage # columns for each metric):

Output Dateset

Explanation

You unpivot the current data so you'll get entries for every single possible value column, making sure to keep all those that are meant to be repeated each time (in my example Id and Name)
You take the column names (hopefully that follow an standard, else you'll need an additional alter tile before to rename your columns to a workable standard) and break them into the Stage being measured and the measure in this case, so my columns followed the format Stage # Measure, so I just took the second word after a space (#) and the third one (Measure) and stored them as their own columns.
I removed the column that had the concatenated name of the original column and make sure that the stage is treated as a number.
I pivot again based out of the common values (in my case Id, Name and the newly created Stage Number), use the Measure one as the labels for the columns and just created the mapping for the 3 possible values I care for (Scheduled, Completed and Status)

In my case each order has only 3 rows since I only had 3 stages in my source, but new will be added depending on the columns you have at the source.

Code (for copy paste so you only need to do small adjustments)

{"contentType":"domo/dataflow-actions","data":[{"name":"TEST | Unpivot","id":"357de108-dccf-4135-8eaa-3ff3944f3d3d","type":"LoadFromVault","gui":{"x":36,"y":180,"color":3238043},"dependsOn":[],"removeByDefault":false,"notes":[],"dataSourceId":"7bd8e7a0-821c-4f3d-8754-eb04cf1fde9a","executeFlowWhenUpdated":false,"onlyLoadNewVersions":false,"columnSettings":{},"visiblePartitionColumn":null,"versionWindow":null},{"name":"Dynamic Unpivot","id":"0b6636c1-b3bb-40c4-b9d1-48e554a1afb6","type":"NormalizeAll","gui":{"x":174,"y":180},"dependsOn":["357de108-dccf-4135-8eaa-3ff3944f3d3d"],"removeByDefault":false,"notes":[],"idFields":["Id","Name"],"keyField":"Column Name","valueField":"Value"},{"name":"Split Column Names","id":"ce463228-5895-4532-86cc-6e972eddbbf1","type":"ExpressionEvaluator","gui":{"x":288,"y":180},"dependsOn":["0b6636c1-b3bb-40c4-b9d1-48e554a1afb6"],"removeByDefault":false,"notes":[],"expressions":[{"fieldName":"Stage Number","expression":"SPLIT_PART(Column Name,' ',2)","settings":null},{"fieldName":"Measure","expression":"SPLIT_PART(Column Name,' ',3)","settings":null}]},{"name":"Pivot","id":"61fc10bf-a9bb-4d7c-9225-26dd73182000","type":"Denormaliser","gui":{"x":480,"y":180},"dependsOn":["d54e663d-44b0-49d8-8147-5d045d6d5c2f"],"removeByDefault":false,"notes":[],"keyField":"Measure","group":[{"name":"Id"},{"name":"Name"},{"name":"Stage Number"}],"partitionedAggregation":false,"fields":[{"fieldName":"Value","keyValue":"Scheduled","targetName":"Scheduled","targetType":"STRING"},{"fieldName":"Value","keyValue":"Completed","targetName":"Completed","targetType":"STRING"},{"fieldName":"Value","keyValue":"Status","targetName":"Status","targetType":"STRING"}]},{"name":"TEST | Re-Pivoted","id":"8259db5f-ec87-4733-857e-54e2035bcc59","type":"PublishToVault","gui":{"x":600,"y":180},"dependsOn":["61fc10bf-a9bb-4d7c-9225-26dd73182000"],"removeByDefault":false,"notes":[],"dataSource":{"name":"TEST | Re-Pivoted"},"partitionIdColumns":[],"upsertColumns":[],"retainPartitionExpression":""},{"name":"Alter Columns","id":"d54e663d-44b0-49d8-8147-5d045d6d5c2f","type":"Metadata","gui":{"x":384,"y":180},"dependsOn":["ce463228-5895-4532-86cc-6e972eddbbf1"],"removeByDefault":false,"notes":[],"fields":[{"name":"Column Name","remove":true},{"name":"Stage Number","type":"LONG","remove":false}]}]}

shreeb

thank you @DavidChurchman , @ArborRose , @marcel_luthi !