Best practice for dataset refreshing in DataFlow
Hi, I'm wondering whether there is a better way to refresh a dataset than the one I'm using.
I am dealing with an aggregated daily order dataset on DOMO which has more than a million records in total.
Let's call it "order_dataset".
The original data source is an external database, so I push data from it via Workbench on a daily basis.
Every day, new records are generated and many past records are updated.
I push those new and updated records to a DOMO dataset with "Replace Existing Datasource" mode.
Let's call this DOMO dataset "newly_uploaded_dataset".
My dataflow starts as soon as "newly_uploaded_dataset" is refreshed by Workbench.
Let me explain each step in the dataflow, although it's a bit lengthy.
I am using a MySQL-based dataflow, by the way.
Step #1: Define a stored procedure that drops and recreates indexes on both "order_dataset" and "newly_uploaded_dataset", then execute it.
The indexes will be used in a query later.
Since I don't know how to check whether an index already exists in Domo (I got an error when querying INFORMATION_SCHEMA), and I think starting from a brand-new index is not a bad idea, I drop and recreate the indexes every time.
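The drop/recreate idea in Step #1 can be sketched as below. This is only an illustration: it uses Python's built-in `sqlite3` module as a stand-in for the MySQL engine, so the syntax (`DROP INDEX IF EXISTS`, the `sqlite_master` catalog) is SQLite's and is not necessarily what a Domo MySQL dataflow accepts. The key column `order_id` and the helpers `recreate_index` and `index_names` are hypothetical names chosen for the example.

```python
import sqlite3


def recreate_index(conn: sqlite3.Connection, table: str, column: str) -> str:
    """Drop the index on table(column) if it exists, then create it again."""
    index_name = f"idx_{table}_{column}"
    # Identifiers cannot be bound as query parameters, so they are formatted
    # into the SQL text; only pass trusted table/column names here.
    conn.execute(f"DROP INDEX IF EXISTS {index_name}")
    conn.execute(f"CREATE INDEX {index_name} ON {table} ({column})")
    return index_name


def index_names(conn: sqlite3.Connection, table: str) -> list:
    """List index names on a table by reading SQLite's own catalog."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = ?",
        (table,),
    ).fetchall()
    return sorted(r[0] for r in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_dataset (order_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE newly_uploaded_dataset (order_id INTEGER, amount REAL)")

for table in ("order_dataset", "newly_uploaded_dataset"):
    recreate_index(conn, table, "order_id")
    recreate_index(conn, table, "order_id")  # second run: still exactly one index

print(index_names(conn, "order_dataset"))  # ['idx_order_dataset_order_id']
```

Running the helper twice leaves exactly one index per table, which is the property the drop/recreate approach relies on when there is no reliable way to check for an existing index first.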
Step #2: Delete the records in "order_dataset" whose keys match the records in "newly_uploaded_dataset".
The indexes will be used in this query. This deletion is to avoid record duplication.
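A minimal sketch of the Step #2 deletion, again using `sqlite3` as a stand-in and a hypothetical single key column `order_id` (the real dataset may use a composite key). SQLite has no multi-table `DELETE ... JOIN`, so the sketch uses a correlated `EXISTS`; in MySQL the same step is often written as `DELETE o FROM order_dataset o JOIN newly_uploaded_dataset n ON o.order_id = n.order_id`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_dataset (order_id INTEGER, amount REAL);
    CREATE TABLE newly_uploaded_dataset (order_id INTEGER, amount REAL);
    CREATE INDEX idx_new_order_id ON newly_uploaded_dataset (order_id);
    INSERT INTO order_dataset VALUES (1, 10.0), (2, 20.0), (3, 30.0);
    -- order 2 was updated upstream, order 4 is brand new
    INSERT INTO newly_uploaded_dataset VALUES (2, 25.0), (4, 40.0);
""")

# Remove every old row whose key also appears in the new upload, so that the
# later UNION ALL cannot produce duplicates. The index on
# newly_uploaded_dataset(order_id) lets each EXISTS probe be a key lookup.
deleted = conn.execute("""
    DELETE FROM order_dataset
    WHERE EXISTS (
        SELECT 1 FROM newly_uploaded_dataset n
        WHERE n.order_id = order_dataset.order_id
    )
""").rowcount

remaining = conn.execute(
    "SELECT order_id FROM order_dataset ORDER BY order_id"
).fetchall()
print(deleted, remaining)  # 1 [(1,), (3,)]
```

Only the updated order (2) is removed; the brand-new order (4) had nothing to delete and is added back in Step #3.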
Step #3: In the Output Dataset step of the dataflow, select "order_dataset" and "newly_uploaded_dataset" combined with UNION ALL and store the result set as "order_dataset".
At first, I tried INSERT INTO "order_dataset" SELECT * FROM "newly_uploaded_dataset";
However, I got stuck with a strange "java.sql.SQLException: Data truncated for column xyz" error, so I simply merged the two datasets in the Output Dataset step instead, and it worked.
This way, newly refreshed "order_dataset" has been created.
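Step #3 can be sketched as a single UNION ALL query over the state left by Step #2. As before, `sqlite3` stands in for MySQL and the columns `order_id`, `status`, `amount` are hypothetical. In MySQL, a "Data truncated for column" error means a value did not fit the target column's type; one way that can happen is `INSERT ... SELECT *`, which maps columns by position, so listing columns explicitly (as below) is a reasonable habit either way. The sketch also ends with a duplicate-key check, which is one cheap guard against the kind of future issues mentioned below.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_dataset (order_id INTEGER, status TEXT, amount REAL);
    CREATE TABLE newly_uploaded_dataset (order_id INTEGER, status TEXT, amount REAL);
    -- State after Step #2: order 2 has already been deleted from order_dataset.
    INSERT INTO order_dataset VALUES (1, 'shipped', 10.0), (3, 'open', 30.0);
    INSERT INTO newly_uploaded_dataset VALUES (2, 'shipped', 25.0), (4, 'open', 40.0);
""")

# Combine the surviving old rows with the new upload. Naming the columns in
# both branches keeps them aligned even if the tables order them differently.
refreshed = conn.execute("""
    SELECT order_id, status, amount FROM order_dataset
    UNION ALL
    SELECT order_id, status, amount FROM newly_uploaded_dataset
    ORDER BY order_id
""").fetchall()

print(refreshed)
# [(1, 'shipped', 10.0), (2, 'shipped', 25.0), (3, 'open', 30.0), (4, 'open', 40.0)]

# Sanity check before trusting the output: keys must still be unique.
keys = [row[0] for row in refreshed]
assert len(keys) == len(set(keys)), "duplicate order_id after refresh"
```

UNION ALL (rather than UNION) is the right choice here because Step #2 already guarantees there is no overlap, and UNION would add a needless de-duplication pass over a million-plus rows.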
This dataflow has been working properly, but I haven't been maintaining the dataset for very long, so I'm a bit concerned about issues that might come up in the future.