Option to Remove the 10,000 Duplicate Error for ETL

Because of how fast/easy ETL flows are, I prefer to use them for most flows. My problem is that sometimes I hit the following error, "Error joining data. The left input cannot include over 10,000 duplicates. Please switch your inputs, group your data, or remove duplicates before joining". The Dojo suggests that this error is only shown to stop users from accidentely making cartesian joins, but I know that's not what I'm doing. For example, if I want to join an item table onto a transaction table, then I have to do it in SQL because I'll get the error. This leads me to having multiple flows for the same end result.

 

I suggest you give admins the option to disable the error.

Tagged:

Best Answer

  • DataMaven
    DataMaven Coach
    Answer ✓

    The super secret workaround we all use...

    Right outer join instead of left.  

    This keeps people who don't know what they are doing from breaking stuff, but enables the rest of us to go on blissfully joining.  

    It'll seem awkward at first, but in a bit, you won't even have to think about it.  

    You want to set it up to pull all records from the table on the right, and only those which match from the left.  Simple reverse of what you're used to.  

     

     

    DataMaven
    Breaking Down Silos - Building Bridges
    **Say "Thanks" by clicking a reaction in the post that helped you.
    **Please mark the post that solves your problem by clicking on "Accept as Solution"

Answers

  • DataMaven
    DataMaven Coach
    Answer ✓

    The super secret workaround we all use...

    Right outer join instead of left.  

    This keeps people who don't know what they are doing from breaking stuff, but enables the rest of us to go on blissfully joining.  

    It'll seem awkward at first, but in a bit, you won't even have to think about it.  

    You want to set it up to pull all records from the table on the right, and only those which match from the left.  Simple reverse of what you're used to.  

     

     

    DataMaven
    Breaking Down Silos - Building Bridges
    **Say "Thanks" by clicking a reaction in the post that helped you.
    **Please mark the post that solves your problem by clicking on "Accept as Solution"
  • @DataMaven have you noticed this not working lately? I have a flow disabling because of this error even though it is a right join.

     
  • @Figauro  the question isn't whether you're using a LEFT or a RIGHT join.

     

    The question is which side of the JOIN has the duplicates?  (the many side of the 1:M relationship).  You want the M side to be on the right.

     

    If you think you already have that, then apply a GROUP BY  `SHIP FROM ID` on the LEFT side before the JOIN.  This guarantees non-duplicate values on the LEFT.

    Jae Wilson
    Check out my 🎥 Domo Training YouTube Channel 👨‍💻

    **Say "Thanks" by clicking the ❤️ in the post that helped you.
    **Please mark the post that solves your problem by clicking on "Accept as Solution"