Trying to use Unpivot

shreeb
shreeb Member
edited March 2024 in Magic ETL

I have a dataset.

1 order has 10 stages. each stage has a schedule date, completed date and status.

1 order has 1 row and 30 columns.

I am trying to unpivot to 1 order having 10 rows and 6 columns. I used 3 different unpivot tiles for it.

I built out the ETL but not sure why its taking forever to run.

Best Answers

  • ArborRose
    ArborRose Coach
    Answer ✓

    @shreeyab - does the data look something like this…

    ** Was this post helpful? Click Agree or Like below. **
    ** Did this solve your problem? Accept it as a solution! **

  • DavidChurchman
    edited March 2024 Answer ✓

    I follow. Currently, your columns are something like:

    • Order Number
    • Schedule Date 1
    • Completed Date 1
    • Status 1
    • Schedule Date 2
    • Completed Date 2
    • Status 2
    • ….
    • Schedule Date 10
    • Completed Date 10
    • Status 10

    You want to unpivot so your columns are:

    • Order Number
    • Stage Number
    • Schedule Date
    • Completed Date
    • Status

    If un-pivot is giving you trouble, it wouldn't be that hard to do this manually (and maybe more computationally efficient? Not sure):

    • Select columns: Order Number, Schedule Date 1, Completed Date 1, Status 1; rename without 1 (if that's part of the column name)
    • Add a constant with the stage number
    • Do these two steps for each of the 10 groups of columns. You could select the two blocks and copy-paste, changing just which columns and stage.
    • Append the 10 stages

    Alternatively, you could do the dynamic unpivot once, selecting the Order ID as not pivoted, and something like "Metric" to store the column names and "Value" to store the values, so your three columns are:

    • Order ID
    • Metric (Schedule Date 1/Completed Date 1/Status 1…Schedule Date 10/Complete Date 10/Status 10)
    • Value

    Then you could use a formula tile or split tile, to split the Metric name and stage, so your four columns are:

    • Order ID
    • Stage (1, 2, …10)
    • Metric (Schedule Date, Completed Date, Status)
    • Value

    Then you could split that into three datasets using filter blocks: filter for metric=Schedule Date, metric=Completed Date, metric=Status)

    Then you could join those three filter tiles together using Order ID and Stage as keys, renaming the "Value" column to the metric (Schedule Date, Completed Date, Status)

    I had a similar ETL for a dataset with dozens of columns in pairs like "Question 1 TEXT", "Question 2 VALUE", "Question 3 TEXT", "Question 3 VALUE". I used a dynamic unpivot, then filtered for just the rows with TEXT and the rows with VALUE, then rejoined them, so I basically three columns; ID, TEXT, VALUE. It didn't take very long to run on a pretty big dataset.

    EDIT: Marcel's answer of using the Pivot after the dynamic un-pivot and split makes more sense than filter+join. I'd be curious to see if Marcel's method or the manual method I started this answer with runs faster. The advantage of the dynamic-unpivot vs. manual appends is that it would be dynamic to different numbers of stages. You could have 1 stage or 50, and the same method works.

    Please 💡/💖/👍/😊 this post if you read it and found it helpful.

    Please accept the answer if it solved your problem.

  • marcel_luthi
    marcel_luthi Coach
    edited March 2024 Answer ✓

    Perhaps I'm oversimplifying it, but you should be able to do this with just 4 tiles: a Dynamic Unpivot, a Formula, a Alter Columns and a Pivot one. As for performance I you'd need to test it out but in General this is what the ETL would look like:

    Input Dataset (you could have any number of Stage # columns for each metric):

    Output Dateset

    Explanation

    1. You unpivot the current data so you'll get entries for every single possible value column, making sure to keep all those that are meant to be repeated each time (in my example Id and Name)
    2. You take the column names (hopefully that follow an standard, else you'll need an additional alter tile before to rename your columns to a workable standard) and break them into the Stage being measured and the measure in this case, so my columns followed the format Stage # Measure, so I just took the second word after a space (#) and the third one (Measure) and stored them as their own columns.
    3. I removed the column that had the concatenated name of the original column and make sure that the stage is treated as a number.
    4. I pivot again based out of the common values (in my case Id, Name and the newly created Stage Number), use the Measure one as the labels for the columns and just created the mapping for the 3 possible values I care for (Scheduled, Completed and Status)

    In my case each order has only 3 rows since I only had 3 stages in my source, but new will be added depending on the columns you have at the source.

    Code (for copy paste so you only need to do small adjustments)

    {"contentType":"domo/dataflow-actions","data":[{"name":"TEST | Unpivot","id":"357de108-dccf-4135-8eaa-3ff3944f3d3d","type":"LoadFromVault","gui":{"x":36,"y":180,"color":3238043},"dependsOn":[],"removeByDefault":false,"notes":[],"dataSourceId":"7bd8e7a0-821c-4f3d-8754-eb04cf1fde9a","executeFlowWhenUpdated":false,"onlyLoadNewVersions":false,"columnSettings":{},"visiblePartitionColumn":null,"versionWindow":null},{"name":"Dynamic Unpivot","id":"0b6636c1-b3bb-40c4-b9d1-48e554a1afb6","type":"NormalizeAll","gui":{"x":174,"y":180},"dependsOn":["357de108-dccf-4135-8eaa-3ff3944f3d3d"],"removeByDefault":false,"notes":[],"idFields":["Id","Name"],"keyField":"Column Name","valueField":"Value"},{"name":"Split Column Names","id":"ce463228-5895-4532-86cc-6e972eddbbf1","type":"ExpressionEvaluator","gui":{"x":288,"y":180},"dependsOn":["0b6636c1-b3bb-40c4-b9d1-48e554a1afb6"],"removeByDefault":false,"notes":[],"expressions":[{"fieldName":"Stage Number","expression":"SPLIT_PART(Column Name,' ',2)","settings":null},{"fieldName":"Measure","expression":"SPLIT_PART(Column Name,' ',3)","settings":null}]},{"name":"Pivot","id":"61fc10bf-a9bb-4d7c-9225-26dd73182000","type":"Denormaliser","gui":{"x":480,"y":180},"dependsOn":["d54e663d-44b0-49d8-8147-5d045d6d5c2f"],"removeByDefault":false,"notes":[],"keyField":"Measure","group":[{"name":"Id"},{"name":"Name"},{"name":"Stage Number"}],"partitionedAggregation":false,"fields":[{"fieldName":"Value","keyValue":"Scheduled","targetName":"Scheduled","targetType":"STRING"},{"fieldName":"Value","keyValue":"Completed","targetName":"Completed","targetType":"STRING"},{"fieldName":"Value","keyValue":"Status","targetName":"Status","targetType":"STRING"}]},{"name":"TEST | Re-Pivoted","id":"8259db5f-ec87-4733-857e-54e2035bcc59","type":"PublishToVault","gui":{"x":600,"y":180},"dependsOn":["61fc10bf-a9bb-4d7c-9225-26dd73182000"],"removeByDefault":false,"notes":[],"dataSource":{"name":"TEST | Re-Pivoted"},"partitionIdColumns":[],"upsertColumns":[],"retainPartitionExpression":""},{"name":"Alter Columns","id":"d54e663d-44b0-49d8-8147-5d045d6d5c2f","type":"Metadata","gui":{"x":384,"y":180},"dependsOn":["ce463228-5895-4532-86cc-6e972eddbbf1"],"removeByDefault":false,"notes":[],"fields":[{"name":"Column Name","remove":true},{"name":"Stage Number","type":"LONG","remove":false}]}]}
    

Answers

  • marcel_luthi
    marcel_luthi Coach
    edited March 2024

    Time to run will depend a lot on the number of records you have on your original dataset and the operations being performed, while unpivot usually is usually a heavy operation, I'd like to know why 3 tiles? Can you share a sample (with mock data or sanitized) of what the Input looks like and what the output should be?

  • I dont know how to explain 🤐

  • Im unpivoting in parts

  • ArborRose
    ArborRose Coach
    Answer ✓

    @shreeyab - does the data look something like this…

    ** Was this post helpful? Click Agree or Like below. **
    ** Did this solve your problem? Accept it as a solution! **

  • DavidChurchman
    edited March 2024 Answer ✓

    I follow. Currently, your columns are something like:

    • Order Number
    • Schedule Date 1
    • Completed Date 1
    • Status 1
    • Schedule Date 2
    • Completed Date 2
    • Status 2
    • ….
    • Schedule Date 10
    • Completed Date 10
    • Status 10

    You want to unpivot so your columns are:

    • Order Number
    • Stage Number
    • Schedule Date
    • Completed Date
    • Status

    If un-pivot is giving you trouble, it wouldn't be that hard to do this manually (and maybe more computationally efficient? Not sure):

    • Select columns: Order Number, Schedule Date 1, Completed Date 1, Status 1; rename without 1 (if that's part of the column name)
    • Add a constant with the stage number
    • Do these two steps for each of the 10 groups of columns. You could select the two blocks and copy-paste, changing just which columns and stage.
    • Append the 10 stages

    Alternatively, you could do the dynamic unpivot once, selecting the Order ID as not pivoted, and something like "Metric" to store the column names and "Value" to store the values, so your three columns are:

    • Order ID
    • Metric (Schedule Date 1/Completed Date 1/Status 1…Schedule Date 10/Complete Date 10/Status 10)
    • Value

    Then you could use a formula tile or split tile, to split the Metric name and stage, so your four columns are:

    • Order ID
    • Stage (1, 2, …10)
    • Metric (Schedule Date, Completed Date, Status)
    • Value

    Then you could split that into three datasets using filter blocks: filter for metric=Schedule Date, metric=Completed Date, metric=Status)

    Then you could join those three filter tiles together using Order ID and Stage as keys, renaming the "Value" column to the metric (Schedule Date, Completed Date, Status)

    I had a similar ETL for a dataset with dozens of columns in pairs like "Question 1 TEXT", "Question 2 VALUE", "Question 3 TEXT", "Question 3 VALUE". I used a dynamic unpivot, then filtered for just the rows with TEXT and the rows with VALUE, then rejoined them, so I basically three columns; ID, TEXT, VALUE. It didn't take very long to run on a pretty big dataset.

    EDIT: Marcel's answer of using the Pivot after the dynamic un-pivot and split makes more sense than filter+join. I'd be curious to see if Marcel's method or the manual method I started this answer with runs faster. The advantage of the dynamic-unpivot vs. manual appends is that it would be dynamic to different numbers of stages. You could have 1 stage or 50, and the same method works.

    Please 💡/💖/👍/😊 this post if you read it and found it helpful.

    Please accept the answer if it solved your problem.

  • marcel_luthi
    marcel_luthi Coach
    edited March 2024 Answer ✓

    Perhaps I'm oversimplifying it, but you should be able to do this with just 4 tiles: a Dynamic Unpivot, a Formula, a Alter Columns and a Pivot one. As for performance I you'd need to test it out but in General this is what the ETL would look like:

    Input Dataset (you could have any number of Stage # columns for each metric):

    Output Dateset

    Explanation

    1. You unpivot the current data so you'll get entries for every single possible value column, making sure to keep all those that are meant to be repeated each time (in my example Id and Name)
    2. You take the column names (hopefully that follow an standard, else you'll need an additional alter tile before to rename your columns to a workable standard) and break them into the Stage being measured and the measure in this case, so my columns followed the format Stage # Measure, so I just took the second word after a space (#) and the third one (Measure) and stored them as their own columns.
    3. I removed the column that had the concatenated name of the original column and make sure that the stage is treated as a number.
    4. I pivot again based out of the common values (in my case Id, Name and the newly created Stage Number), use the Measure one as the labels for the columns and just created the mapping for the 3 possible values I care for (Scheduled, Completed and Status)

    In my case each order has only 3 rows since I only had 3 stages in my source, but new will be added depending on the columns you have at the source.

    Code (for copy paste so you only need to do small adjustments)

    {"contentType":"domo/dataflow-actions","data":[{"name":"TEST | Unpivot","id":"357de108-dccf-4135-8eaa-3ff3944f3d3d","type":"LoadFromVault","gui":{"x":36,"y":180,"color":3238043},"dependsOn":[],"removeByDefault":false,"notes":[],"dataSourceId":"7bd8e7a0-821c-4f3d-8754-eb04cf1fde9a","executeFlowWhenUpdated":false,"onlyLoadNewVersions":false,"columnSettings":{},"visiblePartitionColumn":null,"versionWindow":null},{"name":"Dynamic Unpivot","id":"0b6636c1-b3bb-40c4-b9d1-48e554a1afb6","type":"NormalizeAll","gui":{"x":174,"y":180},"dependsOn":["357de108-dccf-4135-8eaa-3ff3944f3d3d"],"removeByDefault":false,"notes":[],"idFields":["Id","Name"],"keyField":"Column Name","valueField":"Value"},{"name":"Split Column Names","id":"ce463228-5895-4532-86cc-6e972eddbbf1","type":"ExpressionEvaluator","gui":{"x":288,"y":180},"dependsOn":["0b6636c1-b3bb-40c4-b9d1-48e554a1afb6"],"removeByDefault":false,"notes":[],"expressions":[{"fieldName":"Stage Number","expression":"SPLIT_PART(Column Name,' ',2)","settings":null},{"fieldName":"Measure","expression":"SPLIT_PART(Column Name,' ',3)","settings":null}]},{"name":"Pivot","id":"61fc10bf-a9bb-4d7c-9225-26dd73182000","type":"Denormaliser","gui":{"x":480,"y":180},"dependsOn":["d54e663d-44b0-49d8-8147-5d045d6d5c2f"],"removeByDefault":false,"notes":[],"keyField":"Measure","group":[{"name":"Id"},{"name":"Name"},{"name":"Stage Number"}],"partitionedAggregation":false,"fields":[{"fieldName":"Value","keyValue":"Scheduled","targetName":"Scheduled","targetType":"STRING"},{"fieldName":"Value","keyValue":"Completed","targetName":"Completed","targetType":"STRING"},{"fieldName":"Value","keyValue":"Status","targetName":"Status","targetType":"STRING"}]},{"name":"TEST | Re-Pivoted","id":"8259db5f-ec87-4733-857e-54e2035bcc59","type":"PublishToVault","gui":{"x":600,"y":180},"dependsOn":["61fc10bf-a9bb-4d7c-9225-26dd73182000"],"removeByDefault":false,"notes":[],"dataSource":{"name":"TEST | Re-Pivoted"},"partitionIdColumns":[],"upsertColumns":[],"retainPartitionExpression":""},{"name":"Alter Columns","id":"d54e663d-44b0-49d8-8147-5d045d6d5c2f","type":"Metadata","gui":{"x":384,"y":180},"dependsOn":["ce463228-5895-4532-86cc-6e972eddbbf1"],"removeByDefault":false,"notes":[],"fields":[{"name":"Column Name","remove":true},{"name":"Stage Number","type":"LONG","remove":false}]}]}