Hello, Domosapiens,
I currently have a dataflow that is taking 14 minutes or more to run.
The design is as such:
- A MySQL connector first runs once to pull in all historical data (7.8M rows). Example full historical pull query:
select * from table;
- I then change the dataset's update method to merge and change the query to an incremental pull:
select * from table where updated_date > utc_timestamp() - INTERVAL 1 DAY;
As you would expect, this pulls in a much smaller subset of data than the full historical run.
- The connector feeds into a MySQL ETL dataflow that converts the date columns into each row's local timezone (based on that row's location) and writes an output dataset. This step is what takes the 14 minutes.
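For reference, the per-row conversion I'm describing looks roughly like this (table, column, and join names here are placeholders, not my actual schema; CONVERT_TZ with named zones also assumes the MySQL time zone tables are loaded):

```sql
-- Rough sketch of the conversion step (placeholder names).
-- Each row's UTC timestamps are shifted into that row's own timezone.
SELECT t.id,
       CONVERT_TZ(t.created_date, 'UTC', l.tz_name) AS created_local,
       CONVERT_TZ(t.updated_date, 'UTC', l.tz_name) AS updated_local
FROM table t
JOIN locations l ON l.location_id = t.location_id;
```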
Is my issue one or more of the following:
- Should I create a sub-table/transformation that queries my data connector for only the updated changes?
- Should I use something MySQL provides, such as REPLACE INTO or INSERT ... ON DUPLICATE KEY UPDATE, to upsert only the changed rows?
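To show what I mean by the upsert idea, something along these lines (target_table, staging_table, and the column names are illustrative only):

```sql
-- Option 1: REPLACE deletes any existing row with the same primary key, then inserts.
REPLACE INTO target_table (id, created_local, updated_local)
SELECT id, created_local, updated_local
FROM staging_table;

-- Option 2: upsert in place on a key collision instead of delete-and-insert.
INSERT INTO target_table (id, created_local, updated_local)
SELECT id, created_local, updated_local
FROM staging_table
ON DUPLICATE KEY UPDATE
  created_local = VALUES(created_local),
  updated_local = VALUES(updated_local);
```

The intent either way would be to touch only the rows changed in the last day rather than reprocessing all 7.8M rows.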
I am open to suggestions. Realistically, is it possible to decrease the dataflow run time from 14+ minutes down to as low as possible? If so, how would you do it?
Grateful!
Isaiah Melendez