Doing UPSERTS via Redshift
As my luck would have it, the genius architects back in IT do an awful lot of INSERTS and UPDATES. At least, it looks that way to me. Out of a 63M row DB, I find about 600K INSERTS and 1M UPDATES weekly.
What's a MajorDomo to do? The support for UPSERTS via MySQL and Domo isn't all that good. I tried but just could not make it work.
I'm having better luck with Redshift at the moment. With a dataflow, I basically encapsulate the INSERTS and REPLACEs in a transaction, as follows:
begin transaction;
--- delete all rows from the source table where the primary key (row_wid) matches those of the rows to be INSERTED.
DELETE from w_quote_f
USING w_quote_f_inserts
WHERE w_quote_f.row_wid = w_quote_f_inserts.row_wid;
--- do the inserts
--- first id the fields to be changed. Since the primary key doesn't match (this is an INSERT), primary key is listed in the field listing. This proved to be very helpful in troubleshooting the next issue, field ordering. Turns out that the fields need to be listed in the exact order so that the field type definitions (character, bigint, etc.) all line up correctly.
INSERT INTO w_quote_f
(
list of fields between parentheses
)
--- then get the replacement values
SELECT list of fields in the same order
FROM w_quote_f_inserts;
end transaction;
The really strange part is that this takes in about 29GB of data but spits out about 49GB. I have no idea what's going on behind the scenes. The row counts match.
Weird. Naturally, because I am lazy, I do the exact same thing for UPDATES.
Anyway, I thought this might help someone.
Comments
-
We used this method before in every upsert situations. Nowadays when the datasets origin from Workbench we are using it's native Upsert.
Now would be great if the DOMO API could also upsert a dataset.
Ricardo Granada
MajorDomo@Lusiaves
**If the post solves your problem, mark it by clicking on "Accept as Solution"
**You can say "Thank you" by clicking the thumbs up in the post that helped you.0 -
@mcoblentz I think you have to talk to your account manager in order to enable it.
Ricardo Granada
MajorDomo@Lusiaves
**If the post solves your problem, mark it by clicking on "Accept as Solution"
**You can say "Thank you" by clicking the thumbs up in the post that helped you.0 -
Aha!
0 -
1
-
In an attempt to measure the difference, I kept the same flow as above and created a second version like this:
begin transaction;
DELETE FROM w_quote_f
WHERE row_wid IN(SELECT row_wid FROM w_quote_f_inserts);
DELETE FROM w_quote_f
WHERE row_wid IN(SELECT row_wid FROM w_quote_f_updates);
insert into w_quote_f
(select * from w_quote_f_inserts);
insert into w_quote_f
(select * from w_quote_f_updates);
end transaction;
VACUUM w_quote_f to 100 percent;
ANALYZE w_quote_f;We will see if this runs better than the first version.
0 -
And I created a third version, which I thought would be really good but Domo is yelling at me that one of the columns (a text column no less) is different. Hmm.
DELETE FROM w_quote_f
WHERE row_wid IN(SELECT row_wid FROM w_quote_f_inserts);
DELETE FROM w_quote_f
WHERE row_wid IN(SELECT row_wid FROM w_quote_f_updates);
ALTER TABLE w_quote_f APPEND FROM w_quote_f_inserts;
ALTER TABLE w_quote_f APPEND FROM w_quote_f_updates;
VACUUM w_quote_f to 100 percent;
ANALYZE w_quote_f;I checked workbench - definitley uploaded as a string type. I checked Domo's version and all the ETL boxes say it's 'TEXT'. I have no idea what it is screaming at me about.
0 -
I tried using
COPY w_quote_f
FROM w_quote_f_INSERT;
COPY w_quote_f
FROM w_quote_f_UPDATE;instead of ALTER TABLE but I couldn't get Domo to run with that either.
0 -
Can you see considerable performance improvments in using "alter table append" vs "insert into"?
Ricardo Granada
MajorDomo@Lusiaves
**If the post solves your problem, mark it by clicking on "Accept as Solution"
**You can say "Thank you" by clicking the thumbs up in the post that helped you.0 -
The tests are running now. I will post as I get data.
the source dataset is about 53M rows. The UPDATES and INSERTS are about 500K each.
1 -
Ok, keep us posted on those results,
Thanks.
Ricardo Granada
MajorDomo@Lusiaves
**If the post solves your problem, mark it by clicking on "Accept as Solution"
**You can say "Thank you" by clicking the thumbs up in the post that helped you.1 -
Great tip! Thank you @mcoblentz
0
Categories
- All Categories
- 1.7K Product Ideas
- 1.7K Ideas Exchange
- 1.5K Connect
- 1.2K Connectors
- 293 Workbench
- 6 Cloud Amplifier
- 8 Federated
- 2.8K Transform
- 97 SQL DataFlows
- 607 Datasets
- 2.1K Magic ETL
- 3.8K Visualize
- 2.4K Charting
- 707 Beast Mode
- 49 App Studio
- 39 Variables
- 667 Automate
- 170 Apps
- 446 APIs & Domo Developer
- 44 Workflows
- 7 DomoAI
- 33 Predict
- 13 Jupyter Workspaces
- 20 R & Python Tiles
- 391 Distribute
- 111 Domo Everywhere
- 274 Scheduled Reports
- 6 Software Integrations
- 115 Manage
- 112 Governance & Security
- Domo Community Gallery
- 31 Product Releases
- 9 Domo University
- 5.3K Community Forums
- 40 Getting Started
- 30 Community Member Introductions
- 103 Community Announcements
- 4.8K Archive