Can't get Group & Join to Work

PJG
PJG Member
edited August 15 in Magic ETL

I have a single dataset that includes Live (dynamic) data and Historical (fixed) data. For each ProjectId, I need to SUM the PlannedCost of the Historical data with a specific timestamp, and SUM the PlannedCost of the Live data, so that I can compare the two in Analyzer.

Below is an example of the Input dataset for a single ProjectId, and how I'd like the Output dataset configured. In the Input dataset, the FinancialCategoryName is irrelevant here, but I included it to show why there are multiple rows for each ProjectId.

Group By

Join

What it's currently doing is only giving me the total for some of the Live column, and nothing at all for the Nov 2023 column. Any ideas where I've gone wrong?

Thanks!

Answers

  • I think we would have to see what you have in the group by. The group by paths are going to aggregate the left and right sides. Assuming what I show below in yellow is the Nov 2023 path, you would be summing the planned cost and grouping by the other fields. On the green path, you would be summing the same way. Your tiles would determine the fields.


  • PJG
    PJG Member

    Oops, sorry, I thought I had that in there. Added to my original post.

From what I can see, your output is already coming from your Group By path (left side of the join), although I wouldn't write the condition the way you have it. I would compare the date only and drop the timestamp portion. Your bottom branch does not appear to be doing anything.


  • PJG
    PJG Member

I believe I need the timestamp, as there are other RecordType = Historic rows with different BatchTimestamps. In this case, I only care about the Nov 22, 2023 12:00:00 AM one.

Not following what you mean by bottom branch not doing anything… it's needed to add back in some extra columns I need that were removed by the Group By… or is there a better way to do this?

    Thanks!

  • "Not following what you mean by bottom branch not doing anything…"

    Sorry, I don't see any additional columns in your screenshots. Therefore, I couldn't see it doing anything beneficial.


  • PJG
    PJG Member

    Ah, understood. Yes, my output screenshot is just a simplified version of what I need. Adding in my extra columns is working as intended.

The issue is just that my output is only giving me the total for some of the Live column, and nothing for the Nov 2023 column. I'm not sure if it's an issue with my Beast Mode, the join, or something else.

  • rco
    rco Domo Employee
    Answer ✓

    The issue is the comparison with that string, 'Nov 22, 2023 12:00:00 AM'. That comparison will first convert the left hand side to a string according to the Timestamp Format setting or the default timestamp format, which would be '2023-11-22T00:00:00'. Then it will perform the comparison on the strings, and they won't be equal (unless your timestamp format setting does match that format you used in the string constant, but I assume it doesn't).

    This behavior is unique to Magic ETL. Most other SQL implementations will either produce an error when you attempt to compare a timestamp with a string, or convert the string to a timestamp for the comparison rather than the timestamp to a string.
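    To make that concrete, your condition effectively ends up comparing two strings, something like '2023-11-22T00:00:00' = 'Nov 22, 2023 12:00:00 AM', which is never true.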

    I think what you probably want to do is compare these as Dates, which would look like this:

    DATE(BatchTimestamp) = DATE('2023-11-22')

    Randall Oveson <randall.oveson@domo.com>

  • PJG
    PJG Member

    Hi @rco, thank you! Progress!

    The Nov 2023 Planned Cost column is now populated for all projects with a Historic timestamp of 2023-11-22, which it wasn't before! Thank you!

    However, it is not populating the Live Planned Cost column if the Nov 2023 Planned Cost column is already populated, i.e. only one of the two columns is populated. In the input dataset, many projects do have both Historical Nov 2023 PlannedCosts and Live PlannedCosts.

  • rco
    rco Domo Employee

    I think you're saying that the case statement matching in the first formula is affecting the second formula somehow. I'm not sure how that could happen; they should be independent of each other.

    However, I do notice that you don't have a SUM() function wrapping your formulas, and it seems like you would want one. Could that be the cause of unexpected results?

    Randall Oveson <randall.oveson@domo.com>

  • PJG
    PJG Member

    I did have SUM() initially, but I read elsewhere that it wasn't needed and the Group By would sum it anyway, so I took it out. However, I tried adding it back in, and got the same output unfortunately.

    To troubleshoot, I tried temporarily changing the Group By to only include the Live formula:

    I then did Run Preview and checked the dataset on the Group By. It does indeed show no values for the entire Live Planned Cost column.

    It should have values there. For example, in the first row from above, ProjectId 2751 has a PlannedCost of 50,000.00 in the Input dataset:

  • rco
    rco Domo Employee
    Answer ✓

    What's happening there is pretty weird, I'll try to explain it:

    When you write a Group By formula, Magic ETL allows you to reference columns both inside an aggregate function (like SUM()) and outside one. When you reference a column inside an aggregate, that reference is evaluated for every row to produce the input to that function. But when you reference a column outside an aggregate function, it is only evaluated once for every group, for the first value. What you have there is really this:

    CASE WHEN FIRST_VALUE(RecordType) = 'Live' THEN SUM(PlannedCost) END

    Because the first value of RecordType is actually Historical, this just evaluates to null.

    What you really want is this:

    SUM(CASE WHEN RecordType = 'Live' THEN PlannedCost END)
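    Applied to both of your output columns, the two Group By formulas would look roughly like this (the Nov 2023 condition is just a sketch combining this with the earlier date fix, and I'm assuming the exact RecordType value; yours may be 'Historic' rather than 'Historical'):

    SUM(CASE WHEN RecordType = 'Live' THEN PlannedCost END)
    SUM(CASE WHEN RecordType = 'Historical' AND DATE(BatchTimestamp) = DATE('2023-11-22') THEN PlannedCost END)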

    Randall Oveson <randall.oveson@domo.com>

  • PJG
    PJG Member

    Randall, thank you so much for all your help on this. Understood, and perfectly explained. I realized I needed to wrap my other formula like that too, and it is now working perfectly! Is there any way to force blanks to appear as 0 in the output dataset?

  • rco
    rco Domo Employee
    Answer ✓

    Sure, add an ELSE 0 to the case statement.
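    For example, something along these lines (column names assumed from earlier in the thread):

    SUM(CASE WHEN RecordType = 'Live' THEN PlannedCost ELSE 0 END)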

    Randall Oveson <randall.oveson@domo.com>

  • PJG
    PJG Member

    Beautiful, thank you so much again!

  • PJG
    PJG Member

    Hi Randall, please let me trouble you for one more question on this.

    I have another column, Yearly Total RUN Costs (CHF)_p. It's the same for every row, so I don't want to SUM it, but I do want it displayed in the output dataset, and I again need to distinguish between Live and Historic / 2023-11-22.

    Because I'm not wrapping it in SUM(), how do I avoid it looking only at the first row? I thought I could use MAX(), since all values are the same per ProjectId, but within Add Formula this does not seem to be possible:

  • rco
    rco Domo Employee

    To be clear, you just want the maximum (or only) value of "Yearly Total RUN Costs (CHF)_p" where BatchTimestamp is November 22nd 2023 to show up on every row of the dataset?

    You'll have to use a Group By instead of an Add Formula since it's an aggregate function. Depending on your Domo version, it may allow you to specify no grouping columns. If it forces you to specify a grouping column, you could add a constant zero and group by that. This will produce just one row with the desired maximum value. Then you join that row to your original dataset. Again, depending on your Domo version it may allow you to join without specifying any join keys, but if it requires join keys then simply add constant zeroes to both sides of the join and join on those.
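    As a rough sketch (assuming the column name from your post and the date condition from earlier in the thread, and quoting the column name however your formula editor expects), the Group By formula would be something like:

    MAX(CASE WHEN DATE(BatchTimestamp) = DATE('2023-11-22') THEN "Yearly Total RUN Costs (CHF)_p" END)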

    Randall Oveson <randall.oveson@domo.com>

  • PJG
    PJG Member
    edited August 15

    How can I check my Domo version? There are a few things I don't have access to, as it's a vendor-controlled layer over another system that we use, and I know some features are restricted (Buzz, for instance).

    Yes, for "Yearly Total RUN Costs (CHF)_p", I just need to show the only value per project for Live and only value per project where BatchTimestamp is November 22nd 2023.

    Could I please trouble you to elaborate on how to set this up: "If it forces you to specify a grouping column, you could add a constant zero and group by that"?

  • rco
    rco Domo Employee

    I'm referring to the first configuration step of the Group By, which in my Domo version looks like this:

    Here you really want to put nothing, since you just want to do the MAX() across the entire dataset. But it may not let you put nothing, so we put a constant value 0 instead, i.e. add an Add Constants tile before the Group By tile that adds a constant named "const" with value 0, and reference that here in the Group By tile. Then put your MAX() formula in for the second step of the Group By configuration, and finally do the join to the original source.

    With the join, there's a very similar problem. The join requires you to configure these join keys in step 2:

    But since you want to join your one Group By output row with every row on the other side of the join, you really want to put nothing at all, to do a "Cross Join". If it doesn't let you do that (it probably won't unless you're enrolled in a very recent beta), you have to add a constant 0 as above to both sides of the join, and join on that. You can drop the constant immediately from both sides in the same join tile.
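    So, as a rough outline, the flow would be:

    Add Constants (const = 0)
    Group By: group on const, formula MAX(...)
    Join: original rows (with const) joined to the one-row Group By output on const = const
    Drop const from both sides in the Join tile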

    Randall Oveson <randall.oveson@domo.com>

  • PJG
    PJG Member

    Maybe I was making this too complicated, or explained it poorly, but I tried just adding this to the existing Group By before getting to your reply, and it looks to be working.

  • rco
    rco Domo Employee

    This will give you the maximum/only matching value within each ProjectId. The other solution was for the max/only across the whole dataset. If the value you want is within each unique ProjectId then this is your solution.
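    In other words (a sketch assuming your column names, with quoting as your formula editor expects), you'd group on ProjectId with formulas along these lines:

    MAX(CASE WHEN RecordType = 'Live' THEN "Yearly Total RUN Costs (CHF)_p" END)
    MAX(CASE WHEN DATE(BatchTimestamp) = DATE('2023-11-22') THEN "Yearly Total RUN Costs (CHF)_p" END)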

    Randall Oveson <randall.oveson@domo.com>

  • PJG
    PJG Member

    Hi, yes, per ProjectId is what I'm looking for, and again, sorry if I explained that poorly! Each ProjectId could appear in the dataset dozens of times, but each ProjectId will have the same Yearly Total RUN Costs for Live and the same Yearly Total RUN Costs for Nov 2023, so I only needed to identify a value for each ProjectId and make a Nov 2023 column and a Live column.

    Everything you've helped me with today is enough knowledge gained to build more ETL, more datasets, and many more cards. Thank you!

    And I appreciate you replying to my other topic also; I will try to tackle that tomorrow, but I will no doubt have more questions, as there are many things there I haven't used before; I only started using ETL this month! :)