What is the point of a full join in Magic ETL?
So I get Left, right, and inner joins. But, what is the point of a full join if it returns null values for any columns that don't have a match? How is that different from an inner join?
Also, how can I make sure that the non-matched items are included in the resulting dataflow?
Best Answer
-
That's hard to say without seeing some sample data, but rather than doing a full join, you should consider doing an append and include all columns from both datasets. It is a commonly recommended approach instead of joining the data.
**Check out my Domo Tips & Tricks Videos
**Make sure to any users posts that helped you.
**Please mark as accepted the ones who solved your issue.0
Answers
-
A full join is going to give you all the records from both tables and join matching records where it can. An inner join would only give you the matching records and exclude all other rows.
This KB article has some good visuals that explain the various types along with a video.
**Check out my Domo Tips & Tricks Videos
**Make sure to any users posts that helped you.
**Please mark as accepted the ones who solved your issue.0 -
What are the cases where it can't return the data? because I'm banging my head against a wall trying to figure out why the ETL keeps dropping my data on a full join. Also, the linked video is blocked at my work due to being on box.com.
0 -
That's hard to say without seeing some sample data, but rather than doing a full join, you should consider doing an append and include all columns from both datasets. It is a commonly recommended approach instead of joining the data.
**Check out my Domo Tips & Tricks Videos
**Make sure to any users posts that helped you.
**Please mark as accepted the ones who solved your issue.0 -
@HowDoIDomo - I will often use the full outer join when creating a dataflow in ETL. I also will include an output dataset after each join. This helps me make sure that I am joining the data correctly and the number of rows are what I expect to find.
The advantage to the full outer join is exactly what you mentioned in your post. By including all rows of both datasets, I can quickly see that when a field from the left dataset is null then whatever I was joining on was not found in the left dataset. Same with the right dataset. This can be very helpful when you are expecting to find a match for all records, or when you expect that every record in the right dataset should have a match in the left dataset.
For example, let's say that my left dataset contains invoice data and my right dataset contains sales rep information.
My assumption would be that each invoice should have a corresponding sales rep, but not all sales reps will have an invoice. This assumption would indicate that I should do a right join. However, by using full outer join, I can get a quick validation that my assumption was correct. After the full outer join, I would add a filter tile to only include any rows where I have invoice data, but no Sales rep data (according to my assumption, this should never happen). I then create an output dataset after that filter tile so that I have a dataset that will only get populated with any invoice data that is NOT linked to a sales rep. I can set an alert on that dataset and subscribe the appropriate people to the alert so that anytime we have an unassigned sale, they can correct it.
Getting back to the data flow, I would then include a filter that would only show the invoices that were assigned and continue with whatever else I had planned to do after the "right join". This essentially mimics a right join, but with the added benefit of alerting you anytime you have unassigned sales.
“There is a superhero in all of us, we just need the courage to put on the cape.” -Superman1 -
So, I am trying to do the append rows, but then everything else becomes null except for the rows that came back null during my joins. I don't know what I'm doing wrong.
I tried doing select rows to get rid of anything that would be a duplicate, but then I started having null data issue again for 3 columns.
0 -
Hi @HowDoIDomo
Do you have some sample / anonymized data you could post to get an understanding of your data format?
**Was this post helpful? Click Agree or Like below**
**Did this solve your problem? Accept it as a solution!**0 -
1) the OUTER JOIN is doing what it's supposed to do. just keep in mind, if you're doing an
tableA.SalesRep
OUTER JOIN tableB.SalesRep
on A.SaleRep = B.SalesRep
50% of the time SalesRep will have data and 50% of the time the column will be NULL because THERE WAS NO MATCH IN TABLE A. In other words, when TableA.SalesRep is null then TableB.SalesRep will be populated.
Solution, use a formula tile to
coalesce(TableA.SalesRep,TableB.SalesRep)
Coalesce will take the first non null value. Apply this pattern to all the JOIN columns.
MORE IMPORTANTLY.
I have never encountered a use case in Domo where it was appropriate to do an OUTER JOIN.
You should probably APPEND your data.
Jae Wilson
Check out my 🎥 Domo Training YouTube Channel 👨💻
**Say "Thanks" by clicking the ❤️ in the post that helped you.
**Please mark the post that solves your problem by clicking on "Accept as Solution"0
Categories
- All Categories
- 1.8K Product Ideas
- 1.8K Ideas Exchange
- 1.5K Connect
- 1.2K Connectors
- 300 Workbench
- 6 Cloud Amplifier
- 8 Federated
- 2.9K Transform
- 100 SQL DataFlows
- 616 Datasets
- 2.2K Magic ETL
- 3.8K Visualize
- 2.5K Charting
- 731 Beast Mode
- 55 App Studio
- 40 Variables
- 682 Automate
- 175 Apps
- 451 APIs & Domo Developer
- 46 Workflows
- 10 DomoAI
- 35 Predict
- 14 Jupyter Workspaces
- 21 R & Python Tiles
- 394 Distribute
- 113 Domo Everywhere
- 275 Scheduled Reports
- 6 Software Integrations
- 122 Manage
- 119 Governance & Security
- 8 Domo Community Gallery
- 38 Product Releases
- 10 Domo University
- 5.4K Community Forums
- 40 Getting Started
- 30 Community Member Introductions
- 107 Community Announcements
- 4.8K Archive