Information missing after Join Data

I have a report just under 3,000 rows. (Left Table)

I have a master list with just over 20,000 rows. (Right Table)

I'm applying a filter to narrow down the master list and then using Left Outer Join Data to pull matching criteria using an email address as my join key. Unfortunately I'm not getting all data from the master list (Right Table). Spot checking both lists before the Join Data shows all data is visible. After the Join Data only about half the rows have the data from the master list and is showing null while all rows from the report have the expected data.

Any idea what might be going wrong? I've use the Join Data function a few times and haven't run into this issue. I've tried to delete all datasets and ETL and recreate and the same thing keeps happening.

Best Answer

  • GrantSmith
    GrantSmith Coach
    Answer ✓

    Does the email address field you're using to join the tables together have the same values in both fields? Can you look at one where the additional data is missing in your master dataset based on the email in the report dataset and see if you can find the same email in the master dataset?

    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**

Answers

  • This is because of using a LEFT join on your dataset. What you're saying is give me all the records in my report table and then give me related information int the master list table if it exists. If you want to get all the data from your master list and supplement it with your report list then you'd need to use a RIGHT join instead since your driving table is the right table (alternatively you can just swap the right and left tables and your left join should work).

    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**
  • You'll have to forgive me, I'm sure I'm not using the best of terms to explain things.

    So my "so called master list" is just my holy grail of information. The report is only on a small fraction of the master and I really only want the 3000 rows from the report. I need to pull data from the larger master list to fill in extra information the report does not grab.

    I did try to swap and all that does is now remove about 1/2 of the data from the other report now.

    I guess an example would be I have 3000 people in my report but I need to grab a height stat from the master list. So it's looking for a matching email address and then populating the height stats I need from the master list. It's only pulling heights for about 1/2 of the people. When I switch the tables as suggested, now I'm missing other data such as the names, stat 1, stat 2, stat 3, but now I have all 3000 heights.

  • Do you have some sample anonymized data you could post so I can get a better understanding of your data structure? Any perhaps what you want your output dataset to look like?

    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**
  • I hope it isn't too anonymized to get the idea here. When I join the data using a unique identifier and pull out these columns, I have their name columns, the TRUE is if my criteria met and then I want to pull the other information you see in the last two columns from the master sheet. The data just stops. Spot checking both datasets I can see everything is matching up as it should but the data just is blank after about 1/3 to 1/2 way through the list.

    If I swap from a left join to a right join, what you see below would be swapped. The last two columns would have all 900 rows of data but the names and TRUE rows would stop populating around the same area.



  • GrantSmith
    GrantSmith Coach
    Answer ✓

    Does the email address field you're using to join the tables together have the same values in both fields? Can you look at one where the additional data is missing in your master dataset based on the email in the report dataset and see if you can find the same email in the master dataset?

    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**
  • So I took a closer look at the email fields and found some inconsistency in regards to capitalization. I added some text formatting to force all lower case before the data join and all data seems to be populating.

    I wouldn't have thought that 1 mismatch on case sensitivity would stop the rest of the data from populating. One bad apple spoiled the bunch it seems. I would think if case sensitivity mattered for a data join it would appear as a checkbox like it does for some other functions. I guess that's why I never considered something so simple to be the problem.

    Not sure if maybe there is some easier way to fix this but I guess the solution is to transform all my text formatting before joins going forward.

    Thanks for pointing me in the right direction!

  • @anthdell Magic 2 JOINs are case sensitive ... which can be annoying if you're coming from a SQL impelmentation that had different collation rules.


    another gotcha can be hidden spaces (make sure to trim and lowercase before the join.)

    Jae Wilson
    Check out my 🎥 Domo Training YouTube Channel 👨‍💻

    **Say "Thanks" by clicking the ❤️ in the post that helped you.
    **Please mark the post that solves your problem by clicking on "Accept as Solution"