Group By tile not working with large dataset
Hi, I have a dataset of about 4.7M rows. When I use the Group By tile in Magic ETL to sum values per ID and then join the result back on the ID, the new sum column is blank. However, when I filtered the dataset down to about 50K rows to see what was wrong, the example ID I was checking had the sum column populated. Is there a known issue with aggregations on large datasets? The test dataflow is identical to the original apart from one Filter tile that restricts the data to a single ID.
Best Answer
-
I have run Group By on similarly large datasets without a problem. Here are a couple of suggestions for debugging:
- Create an Output tile directly from the Group By tile in your full dataflow to see whether the sum is already blank before you join it back to the data. (You can delete it later.) Double-check that there's only one value for your test ID. My intuition is that there won't be (but happy to be proved wrong).
- Filtering down to a smaller size is a good idea, but use more than one ID in your test dataset. You won't be able to tell whether it's grouping correctly when the data contains only a single ID.
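The two checks above can be sketched outside Domo to make the logic concrete. This is only an illustration in plain Python, not Magic ETL or any Domo API; the column names `id`, `value`, and `id_sum` and the helper functions are hypothetical. It groups a small multi-ID sample, inspects the grouped output on its own, and then joins it back to see which IDs end up with a blank sum.

```python
from collections import Counter, defaultdict

def sum_by_id(rows):
    """Group rows by 'id' and sum 'value'; returns one output row per distinct id."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["id"]] += row["value"]
    return [{"id": key, "id_sum": total} for key, total in totals.items()]

def duplicate_ids(grouped):
    """Return ids that appear more than once in the grouped output (should be none)."""
    counts = Counter(row["id"] for row in grouped)
    return sorted(key for key, n in counts.items() if n > 1)

def join_back(rows, grouped):
    """Left-join the grouped sums onto the detail rows; unmatched ids get None."""
    lookup = {row["id"]: row["id_sum"] for row in grouped}
    return [{**row, "id_sum": lookup.get(row["id"])} for row in rows]

# Hypothetical sample with more than one ID, so the grouping itself is exercised.
rows = [
    {"id": "A", "value": 10.0},
    {"id": "A", "value": 5.0},
    {"id": "B", "value": 2.0},
    {"id": "B", "value": 3.0},
    {"id": "C", "value": 7.0},
]

grouped = sum_by_id(rows)               # check 1: look at this before joining
joined = join_back(rows, grouped)       # check 2: look for blanks after joining
blank_ids = sorted({r["id"] for r in joined if r["id_sum"] is None})

print("duplicate ids in grouped output:", duplicate_ids(grouped))
print("ids with a blank sum after the join:", blank_ids)
```

If the grouped output already shows a blank or more than one row for the test ID, the problem is in the Group By step; if it looks right there but the joined rows are blank, the join keys are the more likely suspect.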
Good luck!
Answers
-
Hi @jrtomici
Can you confirm whether you are seeing this behavior when you save and run the Magic ETL, or only when you preview? If it happens only in preview, I suspect the new sum column is empty because you're working with a large dataset and the most you can preview is 400K rows.
The fact that you don't see the behavior when the filter is applied is a good indication that, if you remove the filter and then save and run the ETL, the Group By and Join tiles will work as expected.
-
@ggenovese It's occurring when I save and run the dataflow. I created a copy of the dataflow and a new output dataset for testing purposes, specifically because of the preview limit.
-
That's interesting. Can you share some screenshots so that I can see how it's configured?
-
Since there's no known issue with this, I'm going to go back and work on it more. If the problem persists, I will provide screenshots. Thank you both!