How to properly partition in Workbench while allowing older data to come in?
I have a SQL Server dataset of 50M rows. Workbench runs hourly with the APPEND update method. I use the table ID as the UPSERT key, and the processing query pulls the past 7 days of records by "Insertion Date". This re-sends older rows unnecessarily with identical data, but it protects against the job not running for up to 7 days.
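For context, the current processing query presumably looks something like the following sketch. The table name (dbo.Transactions) and bracketed column names are assumptions, since the original schema isn't shown:

```sql
-- Hypothetical processing query: re-pull everything inserted in the last 7 days.
-- dbo.Transactions and the column names are assumptions about the source schema.
SELECT t.*
FROM dbo.Transactions AS t
WHERE t.[Insertion Date] >= DATEADD(DAY, -7, GETDATE());
```

With the table ID as the UPSERT key, the re-sent rows simply overwrite themselves, which is safe but wasteful at 50M rows.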
-"Insertion Date" column [datetime in SQL Server] is NOT seen by customers
-"Transaction Date" [datetime in SQL Server] IS seen by customers and used for filtering.
Performance is terrible once we have several cards attached to this dataset, and I'm guessing not using PARTITIONING is part of the problem.
A "day" grain on "Transaction Date" would make sense, but I can't risk data arriving with an old "Transaction Date" wiping out existing data for that date. My source can receive records with an old "Transaction Date" but a recent "Insertion Date". For example, transactions from 6 months ago could get keyed in today.
What is a proper strategy for partitioning/maintaining this data set that would never lose data, and still update every hour?
Answers
-
What you might need to do is query your dataset to find which transaction dates had rows inserted during the last insertion-date window, and then re-pull all of the transactions for those transaction dates.
So you're partitioning based off the transaction date but keying off the insertion date if that makes sense.
This may not be as performant, though, depending on the number of transaction dates being updated in each job execution.
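A sketch of that two-step pull, assuming an hourly schedule (the table name, column names, and the 2-hour lookback safety margin are all assumptions):

```sql
-- Step 1 (inner query): find which transaction dates were touched during the
-- last insertion window.
-- Step 2 (outer query): re-pull EVERY row for those dates, so each affected
-- day-grain partition is sent complete.
-- dbo.Transactions and the column names are assumptions about the source schema.
SELECT t.*
FROM dbo.Transactions AS t
WHERE CAST(t.[Transaction Date] AS date) IN (
    SELECT DISTINCT CAST([Transaction Date] AS date)
    FROM dbo.Transactions
    WHERE [Insertion Date] >= DATEADD(HOUR, -2, GETDATE())  -- hourly job + margin
);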