Data Flow help
Hello, Experts,
I am humbly reaching out for some help.
I have a vendor sending a snapshot of a database table every day and not sending me diffs. I am trying to determine the diffs. The connector type is SFTP CSV. The file they send daily is overwritten daily. I currently have the data connector appending the data from the previous day to the new file they sent over that day.
Is this the correct approach? Should I have the connector setup as Append vs Replace or vice versa?
What success should look like as an output dataset that appends updated records? Below, you will find an image of 6 records that are the same as an individual in the file.
As you can see, it's the same row the only difference is the row_id which is really when the file was run and appended per the connector. So for example, imagine that these 6 rows are John Smith, what we care to see is the original John Smith record and only any new/updated changes. So if there were three changes I would only see three new records.
Thank you so much in advance. We are happy to answer any questions.
Best Answer
-
@imelendez if the rows are exactly the same, does it matter which one it keeps? If there is another column that is a date entered type column, then the rows would become unique and then both would be kept.
If you have rows that are the same and the only difference is a date entered type column, you could use the group by tile and choose min or max on the date column to get the first or last record (whichever one you are wanting to keep) and then your columns that you are looking for to be unique in the select list. This would function the same as remove duplicates but allow you to decide if you want the latest or earliest record.
**Check out my Domo Tips & Tricks Videos
**Make sure to any users posts that helped you.
**Please mark as accepted the ones who solved your issue.0
Answers
-
Since you are using append in your connector, you can create a Magic ETL to give you the unique rows you are looking for. You can use the Remove Duplicates tile in Magic ETL to give you a distinct list of rows.
https://domohelp.domo.com/hc/en-us/articles/360043427373-Magic-ETL-Tiles-Edit-Data#3.
Another approach, but more involved, is to build a recursive dataflow. Here is the KB article for it.
Hope this helps
**Check out my Domo Tips & Tricks Videos
**Make sure to any users posts that helped you.
**Please mark as accepted the ones who solved your issue.1 -
Thanks for that Mark! I am currently removing duplicates and using a DOB column to remove duplicates. My question to you is, let’s say John Smith comes in a month from now through the daily pool, won’t it delete the original John Smith and leave the latest? How does remove duplicates decide? If it does, would this probably not be the best course of action?
0 -
@imelendez if the rows are exactly the same, does it matter which one it keeps? If there is another column that is a date entered type column, then the rows would become unique and then both would be kept.
If you have rows that are the same and the only difference is a date entered type column, you could use the group by tile and choose min or max on the date column to get the first or last record (whichever one you are wanting to keep) and then your columns that you are looking for to be unique in the select list. This would function the same as remove duplicates but allow you to decide if you want the latest or earliest record.
**Check out my Domo Tips & Tricks Videos
**Make sure to any users posts that helped you.
**Please mark as accepted the ones who solved your issue.0 -
We tested what you recommended (removing duplicates) by adding in a new value to the run and adding in more date constraints to the remove dupes object and it worked like a charm.
Thank you so much @MarkSnodgrass ! You are the GOAT!
1 -
@imelendez i'm late to the party, but this video has been there for ages: https://www.youtube.com/watch?v=JNQFZCj8JcQ
Jae Wilson
Check out my 🎥 Domo Training YouTube Channel 👨💻
**Say "Thanks" by clicking the ❤️ in the post that helped you.
**Please mark the post that solves your problem by clicking on "Accept as Solution"0
Categories
- All Categories
- 1.8K Product Ideas
- 1.8K Ideas Exchange
- 1.5K Connect
- 1.2K Connectors
- 296 Workbench
- 6 Cloud Amplifier
- 8 Federated
- 2.9K Transform
- 100 SQL DataFlows
- 614 Datasets
- 2.2K Magic ETL
- 3.8K Visualize
- 2.5K Charting
- 729 Beast Mode
- 53 App Studio
- 40 Variables
- 677 Automate
- 173 Apps
- 451 APIs & Domo Developer
- 45 Workflows
- 8 DomoAI
- 34 Predict
- 14 Jupyter Workspaces
- 20 R & Python Tiles
- 394 Distribute
- 113 Domo Everywhere
- 275 Scheduled Reports
- 6 Software Integrations
- 121 Manage
- 118 Governance & Security
- Domo Community Gallery
- 32 Product Releases
- 10 Domo University
- 5.4K Community Forums
- 40 Getting Started
- 30 Community Member Introductions
- 108 Community Announcements
- 4.8K Archive