Append only missing rows

Options

I have 2 datasets. I want to append the data togeather and keep all the lines from dataset A but only the lines from dataset B that are not represented in dataset A. I want to do this on a unique identifier. Any hints on how this would best be occoplished. See attached for an example. 

 

Screen Shot 2020-06-02 at 11.49.13 PM.png

 

Comments

  • jaeW_at_Onyx
    Options

    In the official Domo literature and training they'd recommend you use a 'Remove duplicates' tile and build a recursive dataflow.

    https://knowledge.domo.com/Prepare/DataFlow_Tips_and_Tricks/Creating_a_Recursive%2F%2FSnapshot_ETL_DataFlow

     

    Problem with 'remove duplicates' is there's implicit randomness to how 'duplicates' are defined because you cannot explicitly define that you want to keep data from the left.

     

    You have a few options.

     

    You can APPEND, and then use a Window & Rank tile to assign a row_number.  As part of that process you can apply an ORDER BY clause (date is a popular option) and then PARTITION BY the row_id column.  If you then filter where row_number = 1, you will deduplicate rows.

     

    Alternatively, if there is no reasonable column to ORDER BY, you can OUTER JOIN the data together.

    FILTER where your LEFT row_id is not null, then SELECT the appropriate columns from the left side.  and if your LEFT row_id is null then SELECT the appropriate columns from the right side.

     

    You can do this in Magic or MySQL ... you just have to figure out how to do a row_number with a partition clause in MySQL.

     

    Hope that helps!

     

     

    Jae Wilson
    Check out my 🎥 Domo Training YouTube Channel 👨‍💻

    **Say "Thanks" by clicking the ❤️ in the post that helped you.
    **Please mark the post that solves your problem by clicking on "Accept as Solution"