Combine tables using ETL inner join

Options

Hi everyone,

I'm trying to combine two federated tables using the inner join in Magic ETL.

Both tables have over 9.5M rows but when I join them, the result is only 87K.

I tested the join in Snowflake and it resulted in 9.5M by the logic:

SELECT *
FROM 'A'
INNER JOIN 'B'
ON A.X = B.X

What could i be doing wrong?

*I'll need to join other tables after the join I mentioned above, in the same ETL

Answers

  • MarkSnodgrass
    Options

    Here are a couple common issues with joins in Magic ETL

    • It is case sensitive
    • Leading or trailing spaces

    An easy way to address both of these is to add a formula tile to each table prior to the join and use the following functions

    UPPER() - converts the text field to uppercase

    TRIM() - removes any leading or trailing spaces

    You can do this in a single function like this TRIM(UPPER(fieldname))

    Try doing this prior to your join and see if that works.

    **Check out my Domo Tips & Tricks Videos

    **Make sure to <3 any users posts that helped you.
    **Please mark as accepted the ones who solved your issue.
  • AdamC
    AdamC Member
    Options

    Second what Mark said.

    Also, for problem solving. I'd take a small sample of the ones that didn't join and compare them between the two tables. Will probably find what Mark said and there are some leading or trailing blank spaces or case sensitivity.

  • VictorLuiz
    Options

    Thank you for help.

    I tried to uso TRIM and UPPER and the matched rows jumped from 87K to 94k.

    I inclued the REPLACE(fieldname, ' ', '') as well, to remove all leading spaces, so jumped to 134k.

    It's far from the real match rows number, but looks like it is a format divergence.

    Checking the data in the table I didn't see any different format, as shown in the image below:

  • MarkSnodgrass
    Options

    Is it also possible that some of your rows don't have leading zeros and some do? If so, you can use the LPAD() function to add leading zeros if everything should be a certain number of characters.

    **Check out my Domo Tips & Tricks Videos

    **Make sure to <3 any users posts that helped you.
    **Please mark as accepted the ones who solved your issue.
  • VictorLuiz
    Options

    Have zeros completing the number, but they already come with.

    I tried with LPAD() and the matched rows were reduce to 112k.

    I don't know if it's related, but I brought the data from SAP by federated connection, so I created a view of this dataset.