How to Join two Group By without matching pair column

PJG
PJG Member

I think the answer to the title is you can't, but to get to the point, I'm trying to go from this input to this output:

A Group by on Requestor Organization, and another on Phase works as intended on the Preview, and I am then doing a Pivot on the Requestor Organization to turn it into columns, but I'm still not sure how to merge them back together in the output dataset format above.

Best Answers

  • rco
    rco Domo Employee
    edited August 15 Answer ✓

    Unfortunately there's currently no way to do this in a way that will dynamically adapt to the actual set of unique Requestor Organizations. If you can tolerate enumerating the Requestor Organizations explicitly, then this solution works:

    In the "upper" section, all we're doing is creating a cross-join table of all the possible combinations of Requestor Organization and Phase, with a fake Project Id of -1. We then append this fake data onto our real data. This ensures that in the Group By, where we group by Requestor Organization+Phase, we end up with a group for every combination, even those combinations that didn't actually appear in the input (e.g. Phase=Cancelled, Requestor Organization=DES).

    Here's what the preview data should look like at the Project ID Const tile:

    And, in case it's helpful, here is what the Cross Join configuration looks like:

    (As I'm writing this, it occurs to me that we could save a tile by just using the same constant for both the cross join key and the dummy Project Id.)

    The aggregation we use is Count Distinct on the Project ID column, and we call it Project Count. Then, to correct for our dummy -1 Project Id, we just subtract 1 from the Project Count using an Add Formula tile. Finally, we pivot.

    Here is the configuration for the Group By:

    And the Pivot:

    The pivot is where we have to enumerate all the Requestor Organizations, as I mentioned. This is unfortunate because any changes to the set of Requestor Organizations will require modifications to the flow.

    Randall Oveson <randall.oveson@domo.com>

  • rco
    rco Domo Employee
    edited August 16 Answer ✓

    The Add Constant that comes right before the Append should add a constant "Project Id" with value -1. The goal is to create a table that has three columns that all correspond to columns in the original table, Requestor Organization, Project Id, and Phase.

    You've got everything up to the Join correct, that should be giving you a table of every possible unique combination of Req Org and Phase. Now we just need to give it a Project Id, and we need it to be obviously fake (not collide with any real Project Id) so we use -1. You could even omit the Add Constant before the Append and just use one of the -1 constants that you used for the Join, but you would have to make sure it was called Project Id so it Appends properly to that column in the original table (remember, Magic ETL Appends work off of column names, not column positions). But the way I have done it in my original example is with that Add Constants tile called ProjectID Const. When you've got that in, you should be able to see the a preview of that tile's data that looks like the screenshot from my first comment:



    ProjectID Const configuration:


    Append Rows configuration:

    Note that yours may not say "No changes", because your real data might have more columns than just those three we've been talking about. The important thing is that it doesn't show any columns that will be dropped, or any columns added to the original input.

    Randall Oveson <randall.oveson@domo.com>

  • rco
    rco Domo Employee
    Answer ✓

    Also, since this is missing from my original comment, here is the configuration for the "Correct for Dummy" Add Formula tile:

    Randall Oveson <randall.oveson@domo.com>

  • PJG
    PJG Member
    edited August 16 Answer ✓

    Thank you! I did figure out the formula piece already, and have used this in a couple of cards!

    In one card, I need to group some of the phases, so wrote a BM for this. Small exert:

    when Phase_p = 'A2 - Assessment Backlog' then 'Planned'
    when Phase_p = 'A3 - Ready for IT PB' then 'Planned'
    when Phase_p = 'E0 - Pending Execution Start' then 'Started'
    when Phase_p = 'E1 - Preparation' then 'Started'

    However, in the card table, they still appear as separate rows, and in the column aggregation options, I only have None or SUM.

    This is something I've done many times for many attributes, but is it not possible here with the way the dataset is configured? Instead I'd have to do the grouping at the start of the ETL?

    Thanks!

  • rco
    rco Domo Employee
    Answer ✓

    You could add phase totals by adding another branch of the original input with its own Group By, grouping by only Phase and doing a Count Distinct on Project Id called Total Project Count, and then Join that branch back in to your final result using Phase on both sides as the join key.

    Randall Oveson <randall.oveson@domo.com>

Answers

  • rco
    rco Domo Employee
    edited August 15 Answer ✓

    Unfortunately there's currently no way to do this in a way that will dynamically adapt to the actual set of unique Requestor Organizations. If you can tolerate enumerating the Requestor Organizations explicitly, then this solution works:

    In the "upper" section, all we're doing is creating a cross-join table of all the possible combinations of Requestor Organization and Phase, with a fake Project Id of -1. We then append this fake data onto our real data. This ensures that in the Group By, where we group by Requestor Organization+Phase, we end up with a group for every combination, even those combinations that didn't actually appear in the input (e.g. Phase=Cancelled, Requestor Organization=DES).

    Here's what the preview data should look like at the Project ID Const tile:

    And, in case it's helpful, here is what the Cross Join configuration looks like:

    (As I'm writing this, it occurs to me that we could save a tile by just using the same constant for both the cross join key and the dummy Project Id.)

    The aggregation we use is Count Distinct on the Project ID column, and we call it Project Count. Then, to correct for our dummy -1 Project Id, we just subtract 1 from the Project Count using an Add Formula tile. Finally, we pivot.

    Here is the configuration for the Group By:

    And the Pivot:

    The pivot is where we have to enumerate all the Requestor Organizations, as I mentioned. This is unfortunate because any changes to the set of Requestor Organizations will require modifications to the flow.

    Randall Oveson <randall.oveson@domo.com>

  • PJG
    PJG Member

    Hi Randall,

    Thank you for taking the time to put together this incredibly detailed explanation.

    I think I've followed successfully up to the ProjectId Const, and I think I understand the configuration for all but two of the tiles. Could I trouble you to share the configuration for the ProjectId Const and for the Append please?

    Thanks!

  • PJG
    PJG Member

    Picture heavy, but step by step, here's what I have:

    (Same config for Org)

    Like I said, my Add Constants and Append…. and maybe formula are where I'm not sure I've configured them correctly.

    Thanks!

  • rco
    rco Domo Employee
    edited August 16 Answer ✓

    The Add Constant that comes right before the Append should add a constant "Project Id" with value -1. The goal is to create a table that has three columns that all correspond to columns in the original table, Requestor Organization, Project Id, and Phase.

    You've got everything up to the Join correct, that should be giving you a table of every possible unique combination of Req Org and Phase. Now we just need to give it a Project Id, and we need it to be obviously fake (not collide with any real Project Id) so we use -1. You could even omit the Add Constant before the Append and just use one of the -1 constants that you used for the Join, but you would have to make sure it was called Project Id so it Appends properly to that column in the original table (remember, Magic ETL Appends work off of column names, not column positions). But the way I have done it in my original example is with that Add Constants tile called ProjectID Const. When you've got that in, you should be able to see the a preview of that tile's data that looks like the screenshot from my first comment:



    ProjectID Const configuration:


    Append Rows configuration:

    Note that yours may not say "No changes", because your real data might have more columns than just those three we've been talking about. The important thing is that it doesn't show any columns that will be dropped, or any columns added to the original input.

    Randall Oveson <randall.oveson@domo.com>

  • rco
    rco Domo Employee
    Answer ✓

    Also, since this is missing from my original comment, here is the configuration for the "Correct for Dummy" Add Formula tile:

    Randall Oveson <randall.oveson@domo.com>

  • PJG
    PJG Member
    edited August 16 Answer ✓

    Thank you! I did figure out the formula piece already, and have used this in a couple of cards!

    In one card, I need to group some of the phases, so wrote a BM for this. Small exert:

    when Phase_p = 'A2 - Assessment Backlog' then 'Planned'
    when Phase_p = 'A3 - Ready for IT PB' then 'Planned'
    when Phase_p = 'E0 - Pending Execution Start' then 'Started'
    when Phase_p = 'E1 - Preparation' then 'Started'

    However, in the card table, they still appear as separate rows, and in the column aggregation options, I only have None or SUM.

    This is something I've done many times for many attributes, but is it not possible here with the way the dataset is configured? Instead I'd have to do the grouping at the start of the ETL?

    Thanks!

  • PJG
    PJG Member

    Did exactly what I was thinking with the ETL, and it works perfectly:

  • PJG
    PJG Member
    edited August 16

    Hi Randall,

    One more on this, as I think it's related to the format of the output dataset

    I was trying to add Project Count as the Summary. I did a simple Beast Mode:

    Biologics+CGT+CHI+China+Communication+Finance+HR+Group Operations+IT+Legal, Risk, and Responsibility+Quality+Small Molecules

    However, this was showing a 0. I could only get it to work if I excluded the columns for which there were no values. i.e. HR has no requestor organizations, so I had to remove the +HR piece. Is it possible to accommodate this without changing the Beast Mode?

    Maybe also a question on whether it's possible to add a total column to this with a similar approach.

    Thanks!

  • rco
    rco Domo Employee
    Answer ✓

    You could add phase totals by adding another branch of the original input with its own Group By, grouping by only Phase and doing a Count Distinct on Project Id called Total Project Count, and then Join that branch back in to your final result using Phase on both sides as the join key.

    Randall Oveson <randall.oveson@domo.com>