ETL question

I want to implement a function that shows me which rows are new every time the dataset is updated. What's the best solution? I am thinking about a recursive dataflow, but I'm not sure how to create one.

Best Answer

  • ArborRose
    ArborRose Coach
    Answer ✓

    Create a Unique Identifier:

    Ensure that your dataset has a unique identifier for each row. If it does not, you can build one by concatenating columns whose combination is unique for every row, for example:

    CONCAT(Column1, '-', Column2) 
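Before relying on a concatenated key, it may be worth checking that it really is unique. The sketch below runs that check in an in-memory SQLite database from Python. The table name source_data and the sample rows are hypothetical, and mysql_concat is a small shim registered only so the CONCAT call from the answer can run here (SQLite builds differ on whether CONCAT is built in). Treat it as an illustration of the check, not as Domo's own tooling.

```python
import sqlite3


def mysql_concat(*parts):
    # Shim that mimics MySQL CONCAT: the result is NULL if any argument is NULL.
    if any(p is None for p in parts):
        return None
    return "".join(str(p) for p in parts)


conn = sqlite3.connect(":memory:")
conn.create_function("CONCAT", -1, mysql_concat)

# Hypothetical sample data; the last row repeats the first key on purpose.
conn.execute("CREATE TABLE source_data (Column1 TEXT, Column2 TEXT)")
conn.executemany(
    "INSERT INTO source_data VALUES (?, ?)",
    [("A", "1"), ("A", "2"), ("B", "1"), ("A", "1")],
)

# List every composite key that appears on more than one row.
DUPLICATE_KEYS_SQL = """
SELECT CONCAT(Column1, '-', Column2) AS row_id, COUNT(*) AS row_count
FROM source_data
GROUP BY CONCAT(Column1, '-', Column2)
HAVING COUNT(*) > 1
"""

duplicate_keys = conn.execute(DUPLICATE_KEYS_SQL).fetchall()
print(duplicate_keys)
```

An empty result means the chosen columns identify each row uniquely. Any rows returned are keys that would make two different rows look like the same one when comparing runs.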
    

    Set Up the Initial Dataflow:

    Create a dataflow that will serve as the initial baseline for your dataset. This dataflow will include all the rows in your dataset at the start.

    Create the Recursive Dataflow:

    Create a recursive dataflow that compares the current dataset with the previous version (i.e., the output of the previous run of this dataflow) to identify new rows.
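The comparison step can be sketched as a LEFT JOIN from the current data to the previous output: any current row with no match in the history is new. The example below simulates two runs in an in-memory SQLite database from Python. The table names current_data and history, the columns row_id and amount, and the run_dataflow helper are all hypothetical, and in a real dataflow the platform handles loading the previous output, so this only illustrates the logic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE current_data (row_id TEXT PRIMARY KEY, amount INTEGER);
    CREATE TABLE history (row_id TEXT PRIMARY KEY, amount INTEGER);
    """
)

# A current row with no matching row_id in the history is treated as new.
FLAG_ROWS_SQL = """
SELECT c.row_id,
       c.amount,
       CASE WHEN h.row_id IS NULL THEN 'New Row' ELSE 'Existing Row' END AS row_status
FROM current_data AS c
LEFT JOIN history AS h ON h.row_id = c.row_id
ORDER BY c.row_id
"""


def run_dataflow(conn, incoming_rows):
    """Simulate one run: load the updated dataset, flag rows, then save history."""
    conn.execute("DELETE FROM current_data")
    conn.executemany("INSERT INTO current_data VALUES (?, ?)", incoming_rows)
    flagged = conn.execute(FLAG_ROWS_SQL).fetchall()
    # Recursive step: this run's rows become the history the next run compares against.
    conn.execute("DELETE FROM history")
    conn.execute("INSERT INTO history SELECT row_id, amount FROM current_data")
    return flagged


# The first run has an empty history, so it acts as the baseline.
first_run = run_dataflow(conn, [("A-1", 10), ("A-2", 20)])
# The second run sees one extra row.
second_run = run_dataflow(conn, [("A-1", 10), ("A-2", 20), ("B-1", 30)])
print(second_run)
```

In this sketch the history is replaced on every run, which assumes the source only ever gains rows. If rows can disappear and come back, appending to the history instead of replacing it would keep them from being flagged as new a second time.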

    ** Was this post helpful? Click Agree or Like below. **
    ** Did this solve your problem? Accept it as a solution! **

Answers

  • https://domo-support.domo.com/s/article/360057087393?language=en_US

    This article includes an example that shows how to set up recursion.


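  • You might be able to flag new rows with a timestamp, assuming the dataset has a column such as created_at that records when each row was added. Without one, you would probably have to keep a history of earlier runs to compare against.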

    CASE
    WHEN created_at > DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY) THEN 'New Row'
    ELSE 'Existing Row'
    END
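To make the cutoff in that CASE expression concrete, here is the same rule written as a small Python function. It assumes created_at holds a date and time, and it takes today as a parameter (rather than reading the clock) so the result is repeatable. The function name row_status and the sample dates are hypothetical.

```python
from datetime import date, datetime, time, timedelta


def row_status(created_at: datetime, today: date) -> str:
    # The cutoff is midnight at the start of yesterday, matching
    # DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY) compared against a datetime.
    cutoff = datetime.combine(today - timedelta(days=1), time.min)
    return "New Row" if created_at > cutoff else "Existing Row"


today = date(2024, 5, 10)
statuses = [
    row_status(datetime(2024, 5, 9, 8, 0), today),    # created yesterday morning
    row_status(datetime(2024, 5, 8, 23, 59), today),  # created two days ago
    row_status(datetime(2024, 5, 9, 0, 0), today),    # exactly on the cutoff
]
print(statuses)
```

Note that this rule measures age against today's date, not against the last update. A row added two days ago that first arrives in today's update would be labelled as existing, which is why comparing against a stored history is the more reliable way to detect rows that are new since the previous run.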
