Recursive dataflow not storing historical data
Hello,
In our data team, we are trying to build a recursive SQL dataflow that saves historical data and appends new data as the original dataset is updated daily, following the instructions in this post: https://knowledge.domo.com/Prepare/DataFlow_Tips_and_Tricks/Creating_a_Recursive%2F%2FSnapshot_SQL_DataFlow. However, as it is set up right now, we keep the data from the day we set it up plus the data from the latest run; each new run erases and replaces the latter, so we end up with only two dates' worth of data and nothing in between.
These are the steps we followed to build this dataflow:
1. Run the following query on our original table to create the dataset we need (all within Domo). Let's call this dataset sampledataset
SELECT club_id AS 'Club', COUNT(member_id) AS 'Client id', NOW() as 'Date'
FROM member_table
WHERE inactive=0 AND is_employee=0
GROUP BY club_id
Update setting: run the dataflow when member_table updates
2. Create a SQL dataflow with sampledataset as input and the following query as output and run it. Let's call this output historicaldataset
SELECT *
FROM sampledataset
No update settings selected
3. Once step 2 is completed, add historicaldataset as an input in the same dataflow
4. Create the following transform:
SELECT *
FROM historicaldataset
WHERE `Date` NOT IN (SELECT `Date` FROM sampledataset)
Generate output table called historical_data
5. Add this additional transform:
SELECT * FROM sampledataset
UNION ALL
SELECT * FROM historical_data
Generate output table called append_new_data_to_historical
6. Generate output using:
SELECT *
FROM append_new_data_to_historical
Dataflow update settings: update only sampledataset.
Clearly we are not setting something up correctly, because the data is not being appended but replaced on the next run. Any help will be greatly appreciated.
Thanks!
Answers
-
Have you changed your new input to be the last output of your dataflow (append_new_data_to_historical)? That's key to the recursive nature of this method. The output has to be the new input.
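Something like this, reusing the dataset names from your steps (this sketch assumes that after the first successful run, append_new_data_to_historical has been re-added as an input in place of historicaldataset):
SELECT * FROM sampledataset
UNION ALL
SELECT *
FROM append_new_data_to_historical
WHERE `Date` NOT IN (SELECT `Date` FROM sampledataset)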
Aaron
MajorDomo @ Merit Medical
**Say "Thanks" by clicking the heart in the post that helped you.
**Please mark the post that solves your problem by clicking on "Accept as Solution"1 -
Hi, thanks for your reply. In the output section, we called the output dataset historicaldataset, just like the input, with the following query:
SELECT *
FROM append_new_data_to_historical
In the update settings, this output is not selected to update; only sampledataset is.
-
One thing I learned about building recursive datasets is that you MUST first create the source dataset from within the dataflow. That way you're actually consuming the dataset the dataflow produced.
Also, your step 5 may be a bit redundant; you should be able to drop the results of that union right into your final dataset.
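For example, the union could be written straight into the final output, as in this sketch reusing the dataset names from your steps:
SELECT * FROM sampledataset
UNION ALL
SELECT * FROM historical_data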
- Jon
-
Hi Jon,
I agree with step 5 being redundant, but I am a bit confused about why it would be necessary to create the source dataset within the dataflow, and how that would work.
-
Well, as I learned, when you consume a datasource in a dataflow, you have to make sure you're actually consuming the one that's the output of the dataflow. I think Domo is just particular about that.
I'll try to reconstruct what I did:
1. I created a new dataflow using my existing data, which I'll call "original_historical_data".
2. In the first pass, I just "select * from original_historical_data" into a new dataset I'll call "historical".
3. Then I go back, remove "original_historical_data" from the Input DataSets, and add "historical" (which I just created).
4. Add my new dataset "this_week" to the Input DataSets...
5. Do my union between the "historical" and "this_week" back into "historical"
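Something like this, assuming "historical" and "this_week" share the same columns:
SELECT * FROM historical
UNION ALL
SELECT * FROM this_week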
That way it's truly recursive.
It's odd, but Domo has to use the dataset it created for this to work. I banged my head on the whiteboard for a few days on this.
If that doesn't make sense, send me a message and I'll give you my number, we can chat.
- Jon
-
For context, what's the business question here?
For the recursion to work, you create an output dataset that becomes an input dataset to the dataflow.
Something like this:
Input: Historical transactions dataset, static
Input: Daily transactions, replaced every day
Process: Select everything from the historical transactions dataset, union it with the daily transactions dataset, and output that as the main dataset. Run the dataflow. Then open/edit the dataflow to use the main dataset as an input in place of the historical transactions dataset, which we don't need anymore. Set the dataflow to trigger whenever the daily transactions dataset updates. That feeds into itself and just builds over time.
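A rough sketch of that process, where historical_transactions and daily_transactions are illustrative names rather than actual datasets:
-- first run: static historical input unioned with the daily input
SELECT * FROM historical_transactions
UNION ALL
SELECT * FROM daily_transactions
-- after the first run, swap historical_transactions out of the inputs
-- and select from the main output dataset here instead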
Do you have two different dataflows here?
Aaron
MajorDomo @ Merit Medical
**Say "Thanks" by clicking the heart in the post that helped you.
**Please mark the post that solves your problem by clicking on "Accept as Solution"1 -
My challenge was that I actually have values that change in the "old" data from time to time (corrections, etc.), and the full dataset is too large to retrieve every time.
I have an intermediate step (not directly relevant to the question here) which removes any ID that's in the updated dataset before the union.
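For instance, a hypothetical version of that intermediate step, with "id", "historical", and "this_week" as placeholder names:
SELECT * FROM historical
WHERE id NOT IN (SELECT id FROM this_week)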
But in my case, the business question was "how do we append when we cannot use the append functionality in the data connector?"
- Jon
-
Well, in our case the business question is: How can we build a card to show the number of active clients over time when the original dataset (member_table) does not store this information over time?
Our idea is, of course, to build a recursive dataflow that produces a dataset updating every day and storing the COUNT(member_id) per day. What we have now, though, is a dataflow that saved the data from the historical dataset and now appends only the data from the latest run, saving nothing in between.
This is why we need the first step, with this query:
SELECT club_id, COUNT(member_id), NOW()
FROM member_table
to create our source dataset first. I am also wondering whether it would make a difference to use _BATCH_LAST_RUN instead of NOW().
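For example, one option we are considering is truncating the timestamp to a date, so that the `Date` NOT IN filter compares whole days rather than exact NOW() values (DATE() is standard MySQL):
SELECT club_id AS 'Club', COUNT(member_id) AS 'Client id', DATE(NOW()) AS 'Date'
FROM member_table
WHERE inactive=0 AND is_employee=0
GROUP BY club_id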
-
Hi AS,
I do have two dataflows, in the sense that the first one creates the data source from one of our tables, and its output then powers the recursive dataflow.
-
Hi Jon,
I've just tried this approach. It makes sense! Thanks.
-
Did any of the replies help you? If so, please click on "Accept as Solution" next to each one that solved the problem.
Thanks!
-
Hi Jon,
Just wanted to say thanks because your approach solved the issue. It is a small change, but definitely makes all the difference. Thanks!