Update Method option on dataflow
I was editing a Redshift dataflow this morning and noticed a new option on the output dataset named Update Method. It has a "replace" and an "append" option. I am not sure when this became available or how it should be used. My understanding is that all current datasets replace themselves, and that to append you normally need a recursive dataflow that feeds the base data back into itself. Could someone give me an example of a use case for the append option?
Thank you.
-----------------
Chris
Answers
Hi @cwolman, this is a pretty exciting change, but it is meant for specific situations. It likely will not replace your recursive dataflow.
It lets you take data that can simply be appended and transform it first. Say that you are loading sales transactions. Reprocessing hundreds of millions of rows is slow, so you just append each new load instead. That works, but any data prep, cleanup, filtering, etc. then has to be done in every card that uses the data. The new method lets you transform each load once, before it is appended to the dataset.
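To make the "transform first, then append" idea concrete, here is a minimal sketch in plain Python. It is not Domo code: the lists stand in for datasets, and `transform`, `run_append`, and the column names are hypothetical names chosen for illustration.

```python
# Illustrative only: plain Python lists stand in for Domo datasets.
# transform() and run_append() are hypothetical helpers, not a Domo API.

def transform(rows):
    """Clean and filter a batch of raw sales rows before they are stored."""
    cleaned = []
    for row in rows:
        if row["amount"] is None:  # drop incomplete transactions
            continue
        cleaned.append({
            "order_id": row["order_id"],
            "region": row["region"].strip().upper(),  # normalize once, here
            "amount": round(float(row["amount"]), 2),
        })
    return cleaned


def run_append(output, new_batch):
    """Append-style run: only the new batch is read and transformed."""
    output.extend(transform(new_batch))
    return output


output_dataset = []
run_append(output_dataset, [
    {"order_id": 1, "region": " west ", "amount": "19.999"},
    {"order_id": 2, "region": "east", "amount": None},
])
run_append(output_dataset, [
    {"order_id": 3, "region": "East", "amount": "5"},
])
```

The cleanup happens once per load, so anything reading `output_dataset` sees rows that are already prepared.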
Former Domo employee you can find me in the Dojo Community here @n8isjack
Would this new functionality work for the following scenario?
Basic recursive dataflow:
Dataset A - contains 25M rows (base data)
Dataset B - contains 1M rows (new data)
Transform Dataset B and union it to Dataset A for the final output. Dataset A now contains 26M rows. Rinse and repeat daily.
Could I edit this existing dataflow, remove Dataset A as an input, and simply transform Dataset B and have it append to the output dataset using this new feature?
This would eliminate the time required to load Dataset A first, which would decrease processing time.
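A rough sketch of the two approaches, scaled down to 25 base rows and 1 new row in place of 25M and 1M. This is plain Python with hypothetical function names, not how Domo runs dataflows internally. It only illustrates how many rows each style has to read per run.

```python
# Illustrative only: lists stand in for datasets, row counts are scaled down.

def transform(rows):
    """Placeholder transform applied to the new data (Dataset B)."""
    return [{"id": r["id"], "value": r["value"] * 2} for r in rows]


def recursive_run(previous_output, new_rows):
    """Recursive style: read the full previous output, union new rows, replace."""
    rows_read = len(previous_output) + len(new_rows)
    return previous_output + transform(new_rows), rows_read


def append_run(previous_output, new_rows):
    """Append style: read and transform only the new rows, then append them."""
    rows_read = len(new_rows)
    return previous_output + transform(new_rows), rows_read


dataset_a = [{"id": i, "value": i} for i in range(25)]  # stands in for 25M rows
dataset_b = [{"id": 25, "value": 25}]                   # stands in for 1M rows

recursive_output, recursive_rows_read = recursive_run(dataset_a, dataset_b)
append_output, append_rows_read = append_run(dataset_a, dataset_b)
```

Both runs end with the same 26 rows, but the recursive run reads all 26 as input while the append run reads only the 1 new row.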
-----------------
Chris
@cwolman, yes that is a great use case for it.
Key things to note:
- There must be no overlap between Dataset A and Dataset B. You cannot update records that were already loaded into Dataset A using this new method.
- Correcting errors is trickier. If a data load must be reloaded or was loaded twice, it is difficult to correct using the new method.
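Both caveats can be shown with a small plain-Python sketch. The `append` and `overlapping_keys` helpers and the `order_id` key are assumptions made up for this example, not part of Domo.

```python
# Illustrative only: a list stands in for the output dataset.

def append(output, batch):
    """Append never looks at existing rows; it only adds new ones."""
    output.extend(batch)


def overlapping_keys(output, batch, key="order_id"):
    """Return the keys in the batch that already exist in the output."""
    existing = {row[key] for row in output}
    return sorted(row[key] for row in batch if row[key] in existing)


output = [{"order_id": 1, "amount": 10.0}]
batch = [{"order_id": 2, "amount": 7.5}]

append(output, batch)
append(output, batch)  # the same load run twice by accident

ids = [row["order_id"] for row in output]  # order 2 now appears twice

# A "corrected" row for order 1 would be added next to the old one,
# not written over it, so overlap has to be caught before appending.
correction = [{"order_id": 1, "amount": 12.0}]
conflicts = overlapping_keys(output, correction)
```

A check like `overlapping_keys` would have to run upstream of the append, since the append itself does not deduplicate or update anything.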
Former Domo employee you can find me in the Dojo Community here @n8isjack