Recursive ETL

Jones01 Contributor
edited August 2022 in Magic ETL

Hi guys,

I am pulling data from our db that looks like this:

Date|Name|Value

Date and Name together form the unique key.

I sort of understand recursive ETL: it appends new data and replaces old data with new data, but I am missing the point slightly. Should the new data be a complete pull from the db or a subset?

I am looking to get just the changes from our db since the last pull and merge those in, rather than pulling years' worth of data every time.

Having just watched a Domo Domopalooza, am I right in thinking my query to our db would just pull changes from, say, the last 10 days to make the set smaller?

Any help would be appreciated.

Best Answer

  • GrantSmith Coach
    Answer ✓

    Correct, you'd only pull in the changes that need to be applied, which keeps your data processing quicker (fewer records). You'd need an initial pull of all your data to establish your baseline, but after that you can just pull in the records that have changed since the last run.
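
The merge step described above can be sketched in plain Python. This is only an illustration of the append-and-replace logic, not Magic ETL itself or any Domo API: the function name `recursive_merge` and the sample rows are made up for the example, and the only thing taken from the thread is the Date|Name|Value layout with Date and Name as the key.

```python
from datetime import date

def recursive_merge(history, changes):
    """Upsert `changes` into `history`, keyed on (Date, Name).

    Rows whose key already exists are replaced by the newer row;
    rows with a new key are appended.
    """
    merged = {(row["Date"], row["Name"]): row for row in history}
    for row in changes:
        merged[(row["Date"], row["Name"])] = row  # replace or append
    return sorted(merged.values(), key=lambda r: (r["Date"], r["Name"]))

# Baseline from the initial full pull (illustrative data).
history = [
    {"Date": date(2022, 8, 1), "Name": "A", "Value": 10},
    {"Date": date(2022, 8, 1), "Name": "B", "Value": 20},
]

# Small incremental pull: one changed row and one new row.
changes = [
    {"Date": date(2022, 8, 1), "Name": "B", "Value": 25},
    {"Date": date(2022, 8, 2), "Name": "A", "Value": 30},
]

result = recursive_merge(history, changes)
```

The output of each run becomes the `history` input of the next one, which is what makes the dataflow "recursive".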


Answers

  • Jones01 Contributor

    @GrantSmith great thanks.

    Yes I believe I have this all working now. Pulling changes from the source every 30 mins and checking two keys on the records seems to be working.
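
For the source-side query, a rolling lookback window like the 10 days mentioned in the question might look roughly like the sketch below. It uses an in-memory SQLite database purely so the example runs on its own; the table name `source_table`, the helper `pull_recent`, and the sample rows are assumptions, and a real source db would need its own connector and SQL dialect. Note that filtering on Date only picks up rows dated inside the window, so edits to older rows would be missed unless the source also tracks a last-modified timestamp.

```python
import sqlite3
from datetime import date, timedelta

def pull_recent(conn, as_of, lookback_days=10):
    """Return rows whose Date falls within the lookback window."""
    cutoff = (as_of - timedelta(days=lookback_days)).isoformat()
    cur = conn.execute(
        "SELECT Date, Name, Value FROM source_table "
        "WHERE Date >= ? ORDER BY Date, Name",
        (cutoff,),  # ISO date strings sort chronologically
    )
    return cur.fetchall()

# Hypothetical stand-in for the real source db.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source_table (Date TEXT, Name TEXT, Value REAL)")
conn.executemany(
    "INSERT INTO source_table VALUES (?, ?, ?)",
    [
        ("2020-01-15", "A", 1.0),  # old row, outside the window
        ("2022-08-05", "A", 2.0),
        ("2022-08-09", "B", 3.0),
    ],
)

recent = pull_recent(conn, as_of=date(2022, 8, 10))
```

Because the recursive merge replaces rows by key, a window that overlaps the previous pull is harmless: re-pulled rows simply overwrite themselves.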

  • Jones01 Contributor

    My dataset has about 5.6 million records, and the recursive ETL to bring in new records takes 35 seconds.

    Does that sound reasonable?

  • Yeah, that sounds about reasonable. The one caveat with recursive dataflows is that they don't scale well: the larger the dataset grows, the longer the ETL takes to run (more data to transfer means more time).