Archive

Archive

Doing UPSERTS via Redshift

As my luck would have it, the genius architects back in IT do an awful lot of INSERTS and UPDATES.  At least, it looks that way to me.  Out of a 63M row DB, I find about 600K INSERTS and 1M UPDATES weekly.  

 

What's a MajorDomo to do?  The support for UPSERTS via MySQL and Domo isn't all that good.  I tried but just could not make it work.  

 

I'm having better luck with Redshift at the moment.  With a dataflow, I basically encapsulate the INSERTS and REPLACEs in a transaction, as follows:

 

  1. begin transaction;

    --- delete all rows from the source table where the primary key (row_wid) matches those of the rows to be INSERTED.  

    DELETE from w_quote_f
    USING w_quote_f_inserts
    WHERE w_quote_f.row_wid = w_quote_f_inserts.row_wid;

    --- do the inserts
    --- first id the fields to be changed.  Since the primary key doesn't match (this is an INSERT), primary key is listed in the field listing.  This proved to be very helpful in troubleshooting the next issue, field ordering.  Turns out that the fields need to be listed in the exact order so that the field type definitions (character, bigint, etc.) all line up correctly.  

    INSERT INTO w_quote_f
    (
    list of fields between parentheses
    )

    --- then get the replacement values
    SELECT list of fields in the same order
    FROM  w_quote_f_inserts;

    end transaction;

The really strange part is that this takes in about 29GB of data but spits out about 49GB.  I have no idea what's going on behind the scenes.  The row counts match.   

 

Weird.  Naturally, because I am lazy, I do the exact same thing for UPDATES.  

 

Anyway, I thought this might help someone.  

Tagged:

Welcome!

It looks like you're new here. Members get access to exclusive content, events, rewards, and more. Sign in or register to get started.
Sign In

Comments

  • @DaniBoyI think this might be better placed in Best Practices.  How does one move a post?  

  • Contributor

    We used this method before in every upsert situations. Nowadays when the datasets origin from Workbench we are using it's native Upsert.

     

    Now would be great if the DOMO API could also upsert a dataset. 

    Ricardo Granada 

    MajorDomo@Lusiaves

    **If the post solves your problem, mark it by clicking on "Accept as Solution"
    **You can say "Thank you" by clicking the thumbs up in the post that helped you.
  • @RGranada  Where is that in Workbench?  

  • Contributor

    @mcoblentz I think you have to talk to your account manager in order to enable it.

    Ricardo Granada 

    MajorDomo@Lusiaves

    **If the post solves your problem, mark it by clicking on "Accept as Solution"
    **You can say "Thank you" by clicking the thumbs up in the post that helped you.
  • Aha!

  • @mcoblentz

     

    I moved it for you!

    Regards,

    Dani

  • In an attempt to measure the difference, I kept the same flow as above and created a second version like this:

     

    1. begin transaction;

      DELETE FROM w_quote_f
      WHERE row_wid IN(SELECT row_wid FROM w_quote_f_inserts);

      DELETE FROM w_quote_f
      WHERE row_wid IN(SELECT row_wid FROM w_quote_f_updates);

      insert into w_quote_f
      (select * from w_quote_f_inserts);

      insert into w_quote_f
      (select * from w_quote_f_updates);

      end transaction;

      VACUUM w_quote_f to 100 percent;
      ANALYZE w_quote_f;

    We will see if this runs better than the first version.   

  • And I created a third version, which I thought would be really good but Domo is yelling at me that one of the columns (a text column no less) is different.  Hmm.  

     


    1. DELETE FROM w_quote_f
      WHERE row_wid IN(SELECT row_wid FROM w_quote_f_inserts);

      DELETE FROM w_quote_f
      WHERE row_wid IN(SELECT row_wid FROM w_quote_f_updates);


      ALTER TABLE w_quote_f APPEND FROM w_quote_f_inserts;

      ALTER TABLE w_quote_f APPEND FROM w_quote_f_updates;


      VACUUM w_quote_f to 100 percent;
      ANALYZE w_quote_f;

    I checked workbench - definitley uploaded as a string type.  I checked Domo's version and all the ETL boxes say it's 'TEXT'.  I have no idea what it is screaming at me about.   

  • I tried using 

     

    1. COPY w_quote_f
      FROM w_quote_f_INSERT;
      COPY w_quote_f
      FROM w_quote_f_UPDATE;

    instead of ALTER TABLE but I couldn't get Domo to run with that either.  :( 

  • Contributor

    Can you see considerable performance improvments in using "alter table append" vs "insert into"?

    Ricardo Granada 

    MajorDomo@Lusiaves

    **If the post solves your problem, mark it by clicking on "Accept as Solution"
    **You can say "Thank you" by clicking the thumbs up in the post that helped you.
  • The tests are running now.  I will post as I get data.  

     

    the source dataset is about 53M rows.  The UPDATES and INSERTS are about 500K each.  

  • Contributor

    Ok, keep us posted on those results,

     

    Thanks.

    Ricardo Granada 

    MajorDomo@Lusiaves

    **If the post solves your problem, mark it by clicking on "Accept as Solution"
    **You can say "Thank you" by clicking the thumbs up in the post that helped you.
  • Great tip! Thank you @mcoblentz 

This discussion has been closed.