Hello, Experts,
I am looking at creating a dataflow that will continuously update data from one dataset while keeping a historical dataset in play.
For this scenario, let's call:
- Historical Dataset (MySQL connector) - Dataset A
- Update Dataset (MySQL connector) - Dataset B
If you could, please let me know if I am on the right track here or if I need to pivot.
Plan:
- Create a MySQL data connector that runs a select * from table join second_table query to populate the full historical dataset, Dataset A
- Create a Magic ETL dataflow that writes that data to an output dataset called Dataset_A_Output
- Create a second MySQL data connector that runs the same join with a where lookback_date >= (3 weeks ago) filter, so Dataset B is constantly refreshed with recent data to be used for the recursion
- Create a Magic ETL dataflow that brings in Dataset_A_Output as an input > brings in Dataset B > aggregates Dataset B to produce two fields: the min and max of lookback_date
- Join Dataset_A_Output with those min/max fields (or should I?)
- Add a filter that removes historical rows falling inside the update window (lookback_date >= min date and <= max date), since Dataset B carries the corrected versions of those rows
- Append Dataset B to the filtered historical rows and output the result to Prod_dataset_final
The goal is to keep the historical data intact (since it will never change again) while layering in new data going forward, so the final dataset reflects any changes within the last 3 weeks.
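In case it helps to see the intended logic outside of the ETL tiles, here is a minimal pandas sketch of the filter-and-append step. The column names (lookback_date, value) and the sample rows are placeholders for illustration, not your actual schema:

```python
from datetime import date
import pandas as pd

# Stand-ins for the two inputs; column names are assumptions.
dataset_a = pd.DataFrame({  # Dataset_A_Output: full historical data
    "lookback_date": [date(2023, 1, 1), date(2023, 1, 10), date(2023, 1, 20)],
    "value": [100, 200, 300],
})
dataset_b = pd.DataFrame({  # Dataset B: recent window, may revise recent rows
    "lookback_date": [date(2023, 1, 10), date(2023, 1, 20), date(2023, 1, 25)],
    "value": [210, 310, 400],
})

# Step 1: derive the update window from Dataset B (min/max of lookback_date).
window_min = dataset_b["lookback_date"].min()
window_max = dataset_b["lookback_date"].max()

# Step 2: drop historical rows that fall inside the window, since Dataset B
# carries the corrected versions of those rows.
in_window = (dataset_a["lookback_date"] >= window_min) & (
    dataset_a["lookback_date"] <= window_max
)
historical_only = dataset_a[~in_window]

# Step 3: append Dataset B to the surviving historical rows.
prod_dataset_final = pd.concat([historical_only, dataset_b], ignore_index=True)
```

In Magic ETL terms, step 1 maps to a Group By tile (MIN/MAX with no grouping column), step 2 to a join plus a Filter tile, and step 3 to an Append Rows tile, so no explicit row-level join of A and B should be needed.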
Let me know if you can help. I greatly appreciate it. Attached is a proposed data flow diagram.
Isaiah