How to append data so you can get a snapshot every time your dataset runs

Utz
Utz Member

Hi I have created a date field in my dataflow using this formula to give me the current date: DATE(TODAY())

I am trying to use this date field to get a snapshot of the data every time the dataset refreshes.

I have my dataset scheduled to refresh every Friday. The first Friday the dataset refreshed it gave me the correct date (5-12-23) for the current date field since it was the first time refreshing but the next Friday the dataset ran it overwrote the previous date (5-12-23) and the data for that week and gave me only the new current date (5-19-23). The (5-12-23) data was gone. I am trying to see a snapshot of the data both times the data refreshed for both weeks but it is only giving me the data for the latest refresh.

Tagged:

Answers

  • If you have your raw dataset (not the snapshot version) as an input and the snapshot dataset as an input you can just calculate your current date with your formula and then use an append tile to join the new version to the historical snapshot dataset and then output to the historical dataset.

    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**
  • Utz
    Utz Member

    Hi Grant, so I only need it to append the data when it refreshes is there a button or option along with the schedule frequency to choose to append the data instead of going down the path of creating a historical snapshot dataset in the dataflow? Something like this image I found this on google and was wondering how to get to this in DOMO.

  • You can tell the Magic ETL DataFlow to run only when certain input datasets have been updated.

    This document may be of use: https://domo-support.domo.com/s/article/360057087393?language=en_US

    Since you're not wanting to replace the old data you can bypass the join and filter sections of the ETL in the documentation.

    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**
  • Utz
    Utz Member

    Hi Grant, I am not trying to replace old data however trying to start a process so that each time moving forward from now that the data set I have is refreshed on its schedule appends the data, and not replaces it. This example I feel like is showing me more on how to append historical data as opposed to showing me how to append data upon a refresh of the data set

  • @Utz

    It sounds like you just want to take your old historical dataset, add your new data to the old dataset with the timestamp?

    In that case using a Magic ETL start by just having your original dataset as an input, feed it into a formula tile to add the snapshot timestamp and then output it to a new dataset.

    Save and run your dataflow and mark the option to only trigger when the input dataset is run

    Go back in and edit your dataflow again and then add your new output dataset as an input dataset and append it to your input dataset.

    This will then duplicate your original dataset and tack on a snapshot data.

    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**
  • Utz
    Utz Member

    @GrantSmith

    Hi Grant, thank you for your response I tried to follow the steps you sent but after initially running the data flow after adding the time stamp when I go to add in the dataset as an input it is giving me an error saying there is no data in this dataset. Here is the image of it

    Another question I had is you said this will duplicate my original dataset and tack on the snapshot but I just want a snapshot each time my dataset is refreshed and have that appended to what is already in the data for I do not want it duplicated.

  • @Utz

    Can you give me some sample data of your existing dataset, the dataset changed and what the final snapshot dataset should look like?

    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**
  • Utz
    Utz Member
    edited May 2023

    @GrantSmith

    Sure here is a sample of my dataset with not all the columns but with a few of them and the key one is the current date I need this current date to represent the snapshot date right now it is showing there is only 1 unique value of May 19 2023 but this dataset ran twice and overwrote the May 12 2023 data that was there the first time it ran. So to give an example the data set is set to run tomorrow and I want May 19 2023 value in the current date along with May 26 2023 for tomorrow after it runs:

    Here is my data flow and how I am getting the current date field using the formula tile:

    Here is the example I am working with from what you had shared to do and following your steps: