ETL Slowed down by large Data Input
I'm currently using a dataset in a few dataflows that contains around 10M rows (historical extracts from the last day of each month, plus the previous day's extract). Any ETL that uses this dataset as an input takes a long time to run because all of the rows have to load before anything else can happen; the actual ETL steps run quickly. What are some best practices for isolating just the data I need at the input stage to reduce run time?
Use a Dataset View (DSV; a beta feature, ask your CSM) to isolate the subset of data that you actually need in the ETL.
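To make the idea concrete, here is a rough Python sketch of the kind of row filter a DSV could apply to the dataset described in the question. It is not Domo code: the `snapshot_date` column, the `needed_rows` helper, and the three-month window are all hypothetical, so swap in whatever subset your ETL actually needs.

```python
from datetime import date, timedelta

# Hypothetical row shape: each row is a dict with a "snapshot_date" (datetime.date).
# The column name and the default 3-month window are assumptions for illustration.

def is_month_end(d: date) -> bool:
    """True if d is the last calendar day of its month."""
    return (d + timedelta(days=1)).month != d.month

def needed_rows(rows, today: date, months_back: int = 3):
    """Yield only yesterday's extract plus month-end extracts from
    roughly the last `months_back` months (approximated as 31 days each)."""
    yesterday = today - timedelta(days=1)
    cutoff = today - timedelta(days=31 * months_back)
    for row in rows:
        d = row["snapshot_date"]
        if d == yesterday or (is_month_end(d) and d >= cutoff):
            yield row

rows = [
    {"snapshot_date": date(2024, 1, 31)},  # recent month-end: kept
    {"snapshot_date": date(2023, 6, 30)},  # old month-end: dropped
    {"snapshot_date": date(2024, 3, 14)},  # yesterday: kept
    {"snapshot_date": date(2024, 3, 10)},  # ordinary day: dropped
]
kept = [r["snapshot_date"] for r in needed_rows(rows, today=date(2024, 3, 15))]
print(kept)
```

The point is that the ETL only ever sees the rows that pass this predicate, so far fewer than 10M rows have to load.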
Are you running Magic ETL or SQL? Magic can start processing before all the data is loaded into the ETL environment; with SQL you have to wait until all the data is loaded into a table (and indexed, if you're using Redshift) before the dataflow can begin.
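As a loose analogy only (this is not how Domo's engines are implemented), the difference is like streaming rows through a generator versus materializing the whole input before the first step runs. The helper names below are hypothetical; the sketch just counts how many input rows are read before the first transformed row comes out.

```python
from typing import Callable, Iterable, Iterator, List

def load_rows(n: int, counter: List[int]) -> Iterator[int]:
    """Stand-in for reading n input rows one at a time; counts rows read."""
    for i in range(n):
        counter[0] += 1
        yield i

def streaming_transform(rows: Iterable[int]) -> Iterator[int]:
    """Magic-style analogy: each row is transformed as soon as it arrives."""
    for r in rows:
        yield r * 2

def batch_transform(rows: Iterable[int]) -> Iterator[int]:
    """SQL-style analogy: the full input is loaded into a 'table' first."""
    table = list(rows)  # every row must be read before any work starts
    return iter([r * 2 for r in table])

def rows_read_before_first_output(transform: Callable, n: int) -> int:
    counter = [0]
    next(transform(load_rows(n, counter)))  # ask for the first result only
    return counter[0]

print(rows_read_before_first_output(streaming_transform, 10_000))  # 1
print(rows_read_before_first_output(batch_transform, 10_000))      # 10000
```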
Also, SQL engines cannot leverage a DSV. If you're using SQL, your pipeline has to be:
1) A DSV that isolates the rows you need.
2) A Dataset Copy (connector) of your DSV.
3) Then your SQL transform.
If you have the Adrenaline dataflows feature, you may be able to leverage it for faster performance because all the processing happens in Adrenaline, but this is a premium feature.
If you work with your support team, you may be able to:
1) materialize your DSV (so you can avoid the Dataset Copy)
2) pull the materialized DSV into SQL. Make it clear that this is your intention, because I think there's additional backend work that has to happen to use a materialized view in a SQL dataflow.
SHORT VERSION OF THE STORY: you'll have a simpler data pipeline in Magic 2.0.
Jae Wilson
@jaeW_at_Onyx Thank you so much for the suggestions! I will be switching these ETLs over to Magic 2.0 and taking a look at Adrenaline.