Rows Gone Missing in ETL
I created an ETL. The final step, where the data flows to the output, suddenly causes ~20K rows to go missing. Any insights?
Thanks!
Best Answers
-
I just created a sample dataflow to confirm what I have been saying. See image below.
I have a dataset that has 15 rows. I added a filter tile that filters out 3 rows. Notice that it still says 15 rows processed next to Filter Rows. This is how many rows came into that tile. The filtering left 12 rows, which is what you see in the output dataset.
The number in your output dataset is the result of your Remove Duplicates tile. The "rows processed" figure is how many rows come into a tile, not how many come out of it.
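To make the difference between rows in and rows out concrete, here is a minimal Python sketch of the 15-row example above. The `run_tile` helper and the `id` column are hypothetical stand-ins for illustration only; they are not part of Domo or Magic ETL.

```python
# Hypothetical stand-in for a dataflow tile; this is not Domo's API.
def run_tile(name, rows, transform):
    """Apply transform to rows and report both the incoming and outgoing counts."""
    out = transform(rows)
    return out, {"tile": name, "rows_in": len(rows), "rows_out": len(out)}

# A 15-row input dataset, as in the sample dataflow.
rows = [{"id": i} for i in range(15)]

# A filter that removes 3 rows (ids 0, 1 and 2).
filtered, stats = run_tile(
    "Filter Rows", rows, lambda rs: [r for r in rs if r["id"] >= 3]
)

print(stats)  # {'tile': 'Filter Rows', 'rows_in': 15, 'rows_out': 12}
```

The "rows processed" figure in the dataflow details corresponds to `rows_in` here, while the output dataset shows `rows_out`.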
Hope that makes sense.
**Check out my Domo Tips & Tricks Videos
**Make sure to like any users' posts that helped you.
**Please mark as accepted the ones who solved your issue.
1 -
Thank you. It makes complete sense. So the Join Data tile is reading 700K+ rows. That means I need to find those missing rows elsewhere, as the output should be the higher number. Thank you!
1 -
OK, so after a lot of Domo assistance, we have finally discovered the issue. In Domo Beta, when you do a join, you can also choose what to do with duplicate columns. I did a right join, but mistakenly resolved the duplicate columns by dropping the ones from the right side. This created a slew of nulls, because some rows no longer had my joining element, and those rows were then lost on the next connect.
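The failure mode described above can be sketched in plain Python. This is only an illustration under assumptions: the `right_join` and `inner_join` helpers, the `id` key, and the sample tables are all hypothetical, and the sketch assumes the later step behaves like an inner join in which a null key matches nothing.

```python
def right_join(left, right, key, keep_key_from):
    """Right join on key; keep_key_from picks which side's key column survives."""
    left_by_key = {row[key]: row for row in left}
    joined = []
    for r in right:
        match = left_by_key.get(r[key])  # None when the left side has no match
        left_key = match[key] if match else None
        joined.append({
            key: r[key] if keep_key_from == "right" else left_key,
            "left_val": match["left_val"] if match else None,
            "right_val": r["right_val"],
        })
    return joined

def inner_join(rows, other, key):
    """Inner join; rows whose key is null or unmatched are dropped."""
    other_by_key = {row[key]: row for row in other}
    return [
        {**row, **other_by_key[row[key]]}
        for row in rows
        if row[key] is not None and row[key] in other_by_key
    ]

left = [{"id": 1, "left_val": "a"}, {"id": 2, "left_val": "b"}]
right = [{"id": 1, "right_val": "x"}, {"id": 2, "right_val": "y"},
         {"id": 3, "right_val": "z"}]
lookup = [{"id": 1, "region": "N"}, {"id": 2, "region": "S"},
          {"id": 3, "region": "E"}]

counts = {}
for side in ("right", "left"):
    step1 = right_join(left, right, "id", keep_key_from=side)
    step2 = inner_join(step1, lookup, "id")
    counts[side] = (len(step1), len(step2))

print(counts)  # {'right': (3, 3), 'left': (3, 2)}
```

Dropping the right-side key keeps the row count unchanged after the first join, so the loss only shows up one step later, which is why it is easy to look for it in the wrong tile.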
0
Answers
-
You mean other than "remove duplicates?"
Really not trying to be snide, it sounds like there's a disconnect between what you expect to happen after "remove duplicates" and what is actually happening... If you don't want to lose rows, can you just remove the "remove duplicates" tile?
Jae Wilson
Check out my 🎥 Domo Training YouTube Channel 👨💻
**Say "Thanks" by clicking the ❤️ in the post that helped you.
**Please mark the post that solves your problem by clicking on "Accept as Solution"
0 -
That was the previous step, when it went from 700k to 400k. This step is just passing that data to the output.
0 -
@user027926 I think those numbers are telling you how many rows came into that tile, not how many came out of it. So, there were 484,445 coming into the Remove Duplicates tile and after it completed, there were 468,410 rows.
It would be a nice enhancement if they showed both the incoming and outgoing numbers for each tile, but you can deduce it by following the steps.
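As a rough sketch of that deduction: for a linear chain of tiles, a tile's outgoing count is the next tile's incoming count, or the output dataset's row count for the last tile. The `rows_lost_per_tile` helper below is hypothetical and assumes each tile has a single input, so it does not cover joins.

```python
def rows_lost_per_tile(processed, output_rows):
    """processed: ordered (tile_name, rows_in) pairs for a linear chain of tiles."""
    report = []
    for i, (name, rows_in) in enumerate(processed):
        # Outgoing count = next tile's incoming count, or the output
        # dataset's row count when this is the last tile in the chain.
        rows_out = processed[i + 1][1] if i + 1 < len(processed) else output_rows
        report.append((name, rows_in, rows_out, rows_in - rows_out))
    return report

# Numbers quoted in this thread: 484,445 rows in, 468,410 rows in the output.
report = rows_lost_per_tile([("Remove Duplicates", 484_445)], output_rows=468_410)
for name, rows_in, rows_out, lost in report:
    print(f"{name}: in={rows_in:,} out={rows_out:,} lost={lost:,}")
# Remove Duplicates: in=484,445 out=468,410 lost=16,035
```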
1 -
The other number was accurate. It was 700K going in (or it should have been) and the 400K was the outgoing. That's the only way that makes sense (I did a two-sided join in the tile before and then removed the duplicates).
0 -
You could test with some sample data: make up a basic Excel file with 10 rows or so, run it through a few of the same steps as your current dataflow, and see how the numbers look. When I looked at the details of one of my dataflows, that is what I deduced: it is telling you how many rows it processed (looked at), not how many it output.
0 -
Interesting. So any idea why it is losing some? Here is a screenshot of a few steps back to give you a sense:
0