Doing UPSERTS via Redshift

As my luck would have it, the genius architects back in IT do an awful lot of INSERTS and UPDATES.  At least, it looks that way to me.  Out of a 63M row DB, I find about 600K INSERTS and 1M UPDATES weekly.  

 

What's a MajorDomo to do?  The support for UPSERTS via MySQL and Domo isn't all that good.  I tried but just could not make it work.  

 

I'm having better luck with Redshift at the moment.  With a dataflow, I basically encapsulate the INSERTS and REPLACEs in a transaction, as follows:

 

begin transaction;

--- delete all rows from the source table where the primary key (row_wid) matches those of the rows to be INSERTED.  

DELETE from w_quote_f
USING w_quote_f_inserts
WHERE w_quote_f.row_wid = w_quote_f_inserts.row_wid;

--- do the inserts
--- first id the fields to be changed.  Since the primary key doesn't match (this is an INSERT), primary key is listed in the field listing.  This proved to be very helpful in troubleshooting the next issue, field ordering.  Turns out that the fields need to be listed in the exact order so that the field type definitions (character, bigint, etc.) all line up correctly.  

INSERT INTO w_quote_f
(
list of fields between parentheses
)

--- then get the replacement values
SELECT list of fields in the same order
FROM  w_quote_f_inserts;

end transaction;

The really strange part is that this takes in about 29GB of data but spits out about 49GB.  I have no idea what's going on behind the scenes.  The row counts match.   

 

Weird.  Naturally, because I am lazy, I do the exact same thing for UPDATES.  

 

Anyway, I thought this might help someone.  

Tagged:

Comments

  • @DaniBoyI think this might be better placed in Best Practices.  How does one move a post?  

  • RGranada
    RGranada Contributor

    We used this method before in every upsert situations. Nowadays when the datasets origin from Workbench we are using it's native Upsert.

     

    Now would be great if the DOMO API could also upsert a dataset. 

    Ricardo Granada 

    MajorDomo@Lusiaves

    **If the post solves your problem, mark it by clicking on "Accept as Solution"
    **You can say "Thank you" by clicking the thumbs up in the post that helped you.
  • @RGranada  Where is that in Workbench?  

  • RGranada
    RGranada Contributor

    @mcoblentz I think you have to talk to your account manager in order to enable it.

    Ricardo Granada 

    MajorDomo@Lusiaves

    **If the post solves your problem, mark it by clicking on "Accept as Solution"
    **You can say "Thank you" by clicking the thumbs up in the post that helped you.
  • Aha!

  • @mcoblentz

     

    I moved it for you!

    Regards,

    Dani

  • In an attempt to measure the difference, I kept the same flow as above and created a second version like this:

     

    begin transaction;

    DELETE FROM w_quote_f
    WHERE row_wid IN(SELECT row_wid FROM w_quote_f_inserts);

    DELETE FROM w_quote_f
    WHERE row_wid IN(SELECT row_wid FROM w_quote_f_updates);

    insert into w_quote_f
    (select * from w_quote_f_inserts);

    insert into w_quote_f
    (select * from w_quote_f_updates);

    end transaction;

    VACUUM w_quote_f to 100 percent;
    ANALYZE w_quote_f;

    We will see if this runs better than the first version.   

  • And I created a third version, which I thought would be really good but Domo is yelling at me that one of the columns (a text column no less) is different.  Hmm.  

     


    DELETE FROM w_quote_f
    WHERE row_wid IN(SELECT row_wid FROM w_quote_f_inserts);

    DELETE FROM w_quote_f
    WHERE row_wid IN(SELECT row_wid FROM w_quote_f_updates);


    ALTER TABLE w_quote_f APPEND FROM w_quote_f_inserts;

    ALTER TABLE w_quote_f APPEND FROM w_quote_f_updates;


    VACUUM w_quote_f to 100 percent;
    ANALYZE w_quote_f;

    I checked workbench - definitley uploaded as a string type.  I checked Domo's version and all the ETL boxes say it's 'TEXT'.  I have no idea what it is screaming at me about.   

  • I tried using 

     

    COPY w_quote_f
    FROM w_quote_f_INSERT;
    COPY w_quote_f
    FROM w_quote_f_UPDATE;

    instead of ALTER TABLE but I couldn't get Domo to run with that either.  :( 

  • RGranada
    RGranada Contributor

    Can you see considerable performance improvments in using "alter table append" vs "insert into"?

    Ricardo Granada 

    MajorDomo@Lusiaves

    **If the post solves your problem, mark it by clicking on "Accept as Solution"
    **You can say "Thank you" by clicking the thumbs up in the post that helped you.
  • The tests are running now.  I will post as I get data.  

     

    the source dataset is about 53M rows.  The UPDATES and INSERTS are about 500K each.  

  • RGranada
    RGranada Contributor

    Ok, keep us posted on those results,

     

    Thanks.

    Ricardo Granada 

    MajorDomo@Lusiaves

    **If the post solves your problem, mark it by clicking on "Accept as Solution"
    **You can say "Thank you" by clicking the thumbs up in the post that helped you.
  • Great tip! Thank you @mcoblentz 

This discussion has been closed.