MySQL Transform Only New Data

Hi,

 

I'm trying to use a MySQL dataflow to transform data as it arrives (every minute via the Stream API), but I only want to process the new data rather than re-run the transform over the entire dataset each time (2.9m rows and counting). Setting the append flag and the update frequency doesn't do this: I just get the same 2.9m rows appended again on every run. So how do I modify the job to include only the new data?

 

Sorry if this is basic stuff; I'm pretty new to Domo, but I know standard data warehousing, ETL, MySQL and MSSQL well from a previous role.

 

Thanks, Ash. 

 

Edit: I've seen references in other posts to the system fields _BATCH_ID_ and _BATCH_LAST_RUN_, but they don't appear in the field list, and I couldn't get them to work in a Beast Mode calculated field either. If it matters, my dataset is created by the Stream API, but I can see batches in Data Center, so they are being tracked somewhere.

Best Answer

  • AS
    AS Coach

    Domo doesn't handle delta (incremental) data especially well yet, but it recently released a beta dataflow option to help speed up dataflows. Instead of processing the full inputs from scratch, the new setting lets you process just the appended portion. On the backend, Domo assigns each append batch its own link in a data chain, and together those links make up the full append chain. With the new setting, Domo processes only the most recent link.

    That should save a lot of processing time. Ask the beta team about access if you aren't already enrolled.

    Aaron
    MajorDomo @ Merit Medical

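The data-chain idea in the answer can be modelled in a few lines. This is a toy sketch in Python, not how Domo implements it: it assumes each append batch is stored as one link and that the transform works row by row, and the names AppendChain, transform, full_refresh and process_latest are hypothetical. It checks that processing only the latest link and appending the result gives the same output as reprocessing the whole chain.

```python
from dataclasses import dataclass, field


@dataclass
class AppendChain:
    """Toy model (assumption): every append batch is stored as its own link."""

    links: list[list[dict]] = field(default_factory=list)

    def append(self, rows: list[dict]) -> None:
        self.links.append(list(rows))

    def all_rows(self) -> list[dict]:
        return [row for link in self.links for row in link]

    def latest_link(self) -> list[dict]:
        return self.links[-1] if self.links else []


def transform(rows: list[dict]) -> list[dict]:
    # Row-by-row transform: each output row depends on one input row only.
    return [{**row, "amount_cents": round(row["amount"] * 100)} for row in rows]


def full_refresh(chain: AppendChain) -> list[dict]:
    # Full mode: reprocess every link on every run.
    return transform(chain.all_rows())


def process_latest(chain: AppendChain, output: list[dict]) -> None:
    # Incremental mode: transform only the newest link and append the result.
    # Must run exactly once per append, or the latest link is duplicated.
    output.extend(transform(chain.latest_link()))


if __name__ == "__main__":
    chain, output = AppendChain(), []
    for batch in ([{"amount": 1.5}, {"amount": 2.0}], [{"amount": 3.25}]):
        chain.append(batch)
        process_latest(chain, output)
    print(output == full_refresh(chain))  # True
```

The equivalence holds only because the transform is row by row. A step that aggregates across batches, such as a GROUP BY over the whole dataset or a running total, would see just the newest link in this mode, so such steps would still need the full input.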