I have a dataflow with two input sources, and I have configured it to run whenever either input source is updated.
If one of the dataflow's input sources is updated while the dataflow is running, will that immediately trigger a rerun of the dataflow? If not, will it at least trigger a rerun once the current run has completed?
I was hoping it would abandon the current run of the dataflow and start a rerun immediately; otherwise the output dataset ends up with stale data. When I tested this scenario, the dataflow did not appear to be retriggered at all.
I tested this by adding a small, redundant dataset (dataset_A) as an extra input to a dataflow (dataflow_B) that takes more than 15 minutes to run, and then configured dataflow_B to rerun whenever dataset_A is updated. I manually triggered a run of dataflow_B, waited a few minutes, and refreshed dataset_A; the refresh completed within seconds. Refreshing dataset_A neither stopped nor otherwise affected the run of dataflow_B that was in progress, and it did not cause dataflow_B to rerun once that run finished. That is not ideal behaviour, because it leaves stale data in the output dataset.
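
To make the three possible behaviours concrete, here is a toy Python model of the test above. It is only a sketch and does not use the platform's API: the policy names ("ignore", "queue", "restart"), the `Run` class, and the timings are made up for illustration, and it assumes a run reads its inputs at the moment it starts.

```python
from dataclasses import dataclass


@dataclass
class Run:
    start: int          # minute the run started
    end: int            # minute the run finished or was abandoned
    input_version: int  # version of dataset_A read when the run started (assumed)
    completed: bool     # False if the run was abandoned part-way through


def simulate(policy, run_minutes=15, update_at=3):
    """Model one manual run of dataflow_B with dataset_A refreshed mid-run.

    dataset_A is at version 1 when the manual run starts at minute 0 and
    becomes version 2 at minute `update_at`. The policy names are hypothetical.
    """
    if policy == "ignore":
        # What I observed: the refresh has no effect on the run or afterwards.
        return [Run(0, run_minutes, 1, True)]
    if policy == "queue":
        # The run finishes, then a second run picks up the refreshed input.
        return [
            Run(0, run_minutes, 1, True),
            Run(run_minutes, 2 * run_minutes, 2, True),
        ]
    if policy == "restart":
        # What I was hoping for: abandon the run and start again immediately.
        return [
            Run(0, update_at, 1, False),
            Run(update_at, update_at + run_minutes, 2, True),
        ]
    raise ValueError(f"unknown policy: {policy}")


def output_is_stale(runs, latest_version=2):
    """The output is stale if the last completed run read an old input."""
    last_completed = [r for r in runs if r.completed][-1]
    return last_completed.input_version < latest_version


if __name__ == "__main__":
    for policy in ("ignore", "queue", "restart"):
        runs = simulate(policy)
        print(policy, "stale:", output_is_stale(runs), "done at minute:", runs[-1].end)
```

In this model only the "ignore" policy leaves the output stale, which matches what I saw in testing. Both "queue" and "restart" end with a run that read the refreshed dataset_A, with "restart" finishing sooner.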