Can't get Group & Join to Work
I have a single dataset that includes Live dynamic data and Historical fixed data. For each ProjectId, I need to SUM the PlannedCost of the Historical data with a specific timestamp, and SUM the PlannedCost of the Live data, so that I can compare the two in Analyzer.
Below is an example of the Input dataset for a single ProjectId, and how I'd like the Output dataset configured. In the Input dataset, FinancialCategoryName is irrelevant to this, but I included it to show why there are multiple rows for each ProjectId.
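For illustration, here's roughly the shape (simplified, made-up values just to show the structure):

Input:
ProjectId | RecordType | BatchTimestamp           | FinancialCategoryName | PlannedCost
2751      | Historical | Nov 22, 2023 12:00:00 AM | Labour                | 30,000.00
2751      | Historical | Nov 22, 2023 12:00:00 AM | Hardware              | 20,000.00
2751      | Live       | Feb 1, 2024 12:00:00 AM  | Labour                | 35,000.00
2751      | Live       | Feb 1, 2024 12:00:00 AM  | Hardware              | 15,000.00

Output:
ProjectId | Nov 2023 Planned Cost | Live Planned Cost
2751      | 50,000.00             | 50,000.00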
Group By
Join
Currently it's only giving me the total for some of the Live column, and for none of the Nov 2023 column. Any ideas where I've gone wrong?
Thanks!
Answers
I think we would have to see what you have in the group by. The group by paths are going to aggregate the left and right sides. Assuming what I show below in yellow is the Nov 2023 path, you would be summing the planned cost and grouping by the other fields. On the green path, you would be summing the same way. Your tiles would determine the fields.
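In other words, roughly this (a sketch; your exact grouping fields and filters may differ):
-- Nov 2023 path: filter to the Nov 2023 Historical rows, then Group By ProjectId with
SUM(PlannedCost)
-- Live path: filter to RecordType = 'Live', then Group By ProjectId with
SUM(PlannedCost)
-- then Join the two outputs on ProjectId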
Oops, sorry, I thought I had that in there. Added to my original post.
From what I can see, your output is already coming from your Group By path (left side of the join), although I wouldn't write the condition the way you have it. I would compare the date only and drop the timestamp. Your bottom branch does not appear to be doing anything.
I believe I need the timestamp, as there are other RecordType = Historic rows with different BatchTimestamps. In this case, I only care about the Nov 22, 2023 12:00:00 AM one.
I'm not following what you mean by the bottom branch not doing anything… it's needed to add back in some extra columns that were removed by the Group By… or is there a better way to do this?
Thanks!
"Not following what you mean by bottom branch not doing anything…"
Sorry, I don't see any additional columns in your screenshots. Therefore, I couldn't see it doing anything beneficial.
Ah, understood. Yes, my output screenshot is just a simplified version of what I need. Adding in my extra columns is working as intended.
The issue is just that my output is only giving me the total for some of the Live column, and for none of the Nov 2023 column. Not sure if it's an issue with my Beast Mode, or if it's the join or something else.
The issue is the comparison with that string, 'Nov 22, 2023 12:00:00 AM'. That comparison will first convert the left hand side to a string according to the Timestamp Format setting or the default timestamp format, which would be '2023-11-22T00:00:00'. Then it will perform the comparison on the strings, and they won't be equal (unless your timestamp format setting does match that format you used in the string constant, but I assume it doesn't).
This behavior is unique to Magic ETL. Most other SQL implementations will either produce an error when you attempt to compare a timestamp with a string, or convert the string to a timestamp for the comparison rather than the timestamp to a string.
I think what you probably want to do is compare these as Dates, which would look like this:
DATE(BatchTimestamp) = DATE('2023-11-22')
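Applied to your flow, the full Nov 2023 condition might look something like this (assuming your historical rows are spelled 'Historical'; adjust if yours say 'Historic'):
RecordType = 'Historical' AND DATE(BatchTimestamp) = DATE('2023-11-22')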
Randall Oveson <randall.oveson@domo.com>
Hi @rco, thank you! Progress!
The Nov 2023 Planned Cost column is now populated for all projects with a Historic timestamp of 2023-11-22, which it wasn't before! Thank you!
However, it is not populating the Live Planned Cost column if the Nov 2023 Planned Cost column is already populated, i.e. only one of the two columns is populated. In the input dataset, many projects do have both Historical Nov 2023 PlannedCosts and Live PlannedCosts.
I think you're saying that the case statement matching in the first formula is affecting the second formula somehow. I'm not sure how that could happen; they should be independent of each other.
However, I do notice that you don't have a SUM() function wrapping your formulas, and it seems like you would want one. Could that be the cause of unexpected results?
Randall Oveson <randall.oveson@domo.com>
I did have SUM() initially, but I read elsewhere that it wasn't needed and Group By would SUM it without that, so I took it out. However, I tried adding it back in, and got the same output, unfortunately.
To troubleshoot, I tried temporarily changing the Group By to include only the Live formula:
I then did Run Preview and checked the dataset on the Group By. It does indeed show no values for the entire Live Planned Cost column.
It should have values there. For example, in the first row from above, ProjectId 2751 has a Planned Cost of 50,000.00 per the Input dataset:
What's happening there is pretty weird, I'll try to explain it:
When you write a Group By formula, Magic ETL allows you to reference columns both inside an aggregate function (like SUM()) and outside one. When you reference a column inside an aggregate, that reference is evaluated for every row to produce the input to that function. But when you reference a column outside an aggregate function, it is evaluated only once per group, using the group's first value. What you have there is really this:
CASE WHEN FIRST_VALUE(RecordType) = 'Live' THEN SUM(PlannedCost) END
Because the first value of RecordType is actually Historical, this just evaluates to null.
What you really want is this:
SUM(CASE WHEN RecordType = 'Live' THEN PlannedCost END)
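The Nov 2023 column follows the same pattern, something like this (again assuming the 'Historical' spelling):
SUM(CASE WHEN RecordType = 'Historical' AND DATE(BatchTimestamp) = DATE('2023-11-22') THEN PlannedCost END)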
Randall Oveson <randall.oveson@domo.com>
Randall, thank you so much for all your help on this. Understood, and perfectly explained. I realized I needed to wrap my other code like that too, and it is now working perfectly! Any way to force blanks to appear as 0 in the output dataset?
Sure, add an
ELSE 0
to the case statement.
Randall Oveson <randall.oveson@domo.com>
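So the Live formula, as a sketch, becomes:
SUM(CASE WHEN RecordType = 'Live' THEN PlannedCost ELSE 0 END)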
Beautiful, thank you so much again!
Hi Randall, please let me trouble you for one more question on this.
I have another column, Yearly Total RUN Costs (CHF)_p. It's the same for every row, so I don't want to SUM it, but I do want it displayed in the output dataset, and I again need to distinguish between Live and Historic (2023-11-22).
Because I'm not wrapping it in a SUM, how do I avoid it looking only at the first row? I thought I could use MAX(), since all values are the same per ProjectId, but within Add Formula this does not seem to be possible:
To be clear, you just want the maximum (or only) value of "Yearly Total RUN Costs (CHF)_p" where BatchTimestamp is November 22nd 2023 to show up on every row of the dataset?
You'll have to use a Group By instead of an Add Formula, since it's an aggregate function. Depending on your Domo version, it may allow you to specify no grouping columns. If it forces you to specify a grouping column, you could add a constant zero and group by that. This will produce just one row with the desired maximum value. Then you join that row to your original dataset. Again, depending on your Domo version, it may allow you to join without specifying any join keys; but if it requires join keys, then simply add constant zeroes to both sides of the join and join on those.
Randall Oveson <randall.oveson@domo.com>
How can I check my Domo version? There are a few things I don't have access to, as it's a vendor-controlled layer over another system that we use, and I know some features are restricted (Buzz, for instance).
Yes, for "Yearly Total RUN Costs (CHF)_p", I just need to show the only value per project for Live, and the only value per project where BatchTimestamp is November 22nd, 2023.
Could I trouble you to elaborate on how to set this up: "If it forces you to specify a grouping column, you could add a constant zero and group by that"?
I'm referring to the first configuration step of the Group By, which in my Domo version looks like this:
Here you really want to put nothing, since you just want to do the MAX() across the entire dataset. But it may not let you put nothing, so we put a constant value 0 instead; i.e., add an Add Constants tile before the Group By tile that adds a constant named "const" with value 0, and reference that here in the Group By tile. Then put your MAX() formula in for the second step of the Group By configuration, and finally do the join to the original source.
With the join, there's a very similar problem. The join requires you to configure these join keys in step 2. But since you want to join your one Group By output row with every row on the other side of the join, you really want to put nothing at all, to do a "Cross Join". If it doesn't let you do that (it probably won't unless you're enrolled in a very recent beta), you have to add a constant 0 as above to both sides of the join, and join on that. You can drop the constant immediately from both sides in the same join tile.
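If it helps to see the whole pattern at once, here's a rough SQL sketch of what that tile chain computes (illustrative only; in Magic ETL this is built with tiles rather than a query, and `source` / `MaxYearlyRun` / `const` are made-up names):
SELECT s.*, m.MaxYearlyRun
FROM (SELECT source.*, 0 AS const FROM source) s
JOIN (
    SELECT 0 AS const,
           MAX(`Yearly Total RUN Costs (CHF)_p`) AS MaxYearlyRun
    FROM source
    WHERE DATE(BatchTimestamp) = DATE('2023-11-22')
) m ON s.const = m.const  -- the constant key emulates a cross join; drop const afterwards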
Randall Oveson <randall.oveson@domo.com>
Maybe I was making this too complicated, or explained it poorly, but I tried just adding this to the existing Group By before getting to your reply, and it looks to be working.
This will give you the maximum/only matching value within each ProjectId. The other solution was for the max/only across the whole dataset. If the value you want is within each unique ProjectId then this is your solution.
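In formula terms, that's something like this inside your existing ProjectId Group By (column spellings assumed from the thread):
MAX(CASE WHEN RecordType = 'Live' THEN `Yearly Total RUN Costs (CHF)_p` END)
MAX(CASE WHEN RecordType = 'Historical' AND DATE(BatchTimestamp) = DATE('2023-11-22') THEN `Yearly Total RUN Costs (CHF)_p` END)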
Randall Oveson <randall.oveson@domo.com>
Hi, yes, per ProjectId is what I'm looking for, and again, sorry if I explained that poorly! Each ProjectId could appear in the dataset dozens of times, but each ProjectId will have the same Yearly Total RUN Costs for Live and the same Yearly Total RUN Costs for Nov 2023, so I only needed to identify a value for each ProjectId and make a Nov 2023 column and a Live column.
Everything you've helped me with today is enough knowledge gained to build more ETL, more datasets, and many more cards. Thank you!
And I appreciate you replying to my other topic also; I will try to tackle that tomorrow, but I will no doubt have more questions, as there are many things there I haven't used before; I only started using ETL this month! :)