How to properly partition in Workbench while allowing older data to come in?
I have a SQL Server dataset of 50M rows. Workbench runs the APPEND Update Method HOURLY. I use the table ID as the UPSERT key, and the processing query checks the past 7 days (by "Insertion Date") for new records. This re-sends older rows unnecessarily with the same data, but protects against the job not running for up to 7 days.
-"Insertion Date" column [datetime in SQL Server] is NOT seen by customers
-"Transaction Date" [datetime in SQL Server] IS seen by customers and used for filtering.
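The setup above (ID-keyed upsert with a 7-day "Insertion Date" lookback) can be sketched as follows. This is a hypothetical illustration using sqlite3 in place of SQL Server; the table name `transactions` and column names are assumed, not taken from the original post.

```python
# Hypothetical sketch of the current hourly processing query: pull every row
# whose Insertion Date falls within the last 7 days, keyed by table ID.
# Table/column names ("transactions", "insertion_date", ...) are assumptions.
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions "
    "(id INTEGER PRIMARY KEY, transaction_date TEXT, insertion_date TEXT, amount REAL)"
)
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?)", [
    (1, "2024-06-14", "2024-06-14", 10.0),  # recent transaction
    (2, "2023-12-01", "2024-06-15", 25.0),  # 6-month-old transaction keyed in today
    (3, "2024-05-01", "2024-05-01", 5.0),   # old row, outside the lookback window
])

now = datetime(2024, 6, 15)
cutoff = (now - timedelta(days=7)).strftime("%Y-%m-%d")

# Rows 1 and 2 get re-sent every hour for 7 days; row 3 is never touched.
recent = [r[0] for r in conn.execute(
    "SELECT id FROM transactions WHERE insertion_date >= ? ORDER BY id", (cutoff,)
)]
print(recent)  # → [1, 2]
```

Note how the late-keyed row (id 2) is picked up by "Insertion Date" even though its "Transaction Date" is six months old, which is exactly why a naive day-grain partition on "Transaction Date" is risky here.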
Performance is terrible once we have several cards attached to this dataset, and I'm guessing not using PARTITIONING is part of the problem.
A "day" grain on "Transaction Date" would make sense, but I can't risk data coming in with an old "Transaction Date" potentially removing any existing data from that date. My source data can receive records with an old "Transaction Date" but a recent "Insertion Date". For example, transactions from 6 months ago could get keyed in today.
What is a proper strategy for partitioning/maintaining this data set that would never lose data, and still update every hour?
Answers
What you might need to do is query your dataset to see which transaction dates were inserted or updated during the last insertion-date window, and then re-pull all of the transactions for those transaction dates.
So you're partitioning based off the transaction date but keying off the insertion date if that makes sense.
This may not be as performant, though, depending on how many transaction dates are updated on each job execution.
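The two steps above can be sketched as follows. This is a minimal illustration using sqlite3 rather than SQL Server, with assumed table and column names: step 1 finds which "Transaction Date" day-partitions were touched since the last run, and step 2 re-pulls every row for those days so each partition is replaced whole and no existing rows on those dates are lost.

```python
# Sketch of the suggested strategy (names are assumptions, not from the post):
# partition on transaction_date, but key the incremental pull off insertion_date.
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions "
    "(id INTEGER PRIMARY KEY, transaction_date TEXT, insertion_date TEXT, amount REAL)"
)
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?)", [
    (1, "2023-12-01", "2023-12-01", 40.0),  # original row on the old date
    (2, "2023-12-01", "2024-06-15", 25.0),  # late-arriving row for the same date
    (3, "2024-06-14", "2024-06-14", 10.0),  # ordinary recent row
    (4, "2024-05-01", "2024-05-01", 5.0),   # untouched date, not re-pulled
])

now = datetime(2024, 6, 15)
cutoff = (now - timedelta(days=1)).strftime("%Y-%m-%d")

# Step 1: which transaction dates received inserts/updates since the cutoff?
touched = sorted(r[0] for r in conn.execute(
    "SELECT DISTINCT transaction_date FROM transactions WHERE insertion_date >= ?",
    (cutoff,)
))

# Step 2: re-pull ALL rows for those dates, so each day-partition is refreshed
# in full -- including pre-existing row 1, which would otherwise be dropped.
placeholders = ",".join("?" * len(touched))
repull = sorted(r[0] for r in conn.execute(
    f"SELECT id FROM transactions WHERE transaction_date IN ({placeholders})",
    touched,
))
print(touched)  # → ['2023-12-01', '2024-06-14']
print(repull)   # → [1, 2, 3]
```

The key point: the re-pull for 2023-12-01 includes row 1 alongside the new row 2, so replacing that day's partition cannot lose the older record.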