Understanding My Data... preventing duplicates after DataFusion and creating a Table in Analyzer
Hello, I'm new to Domo and working with data in general.
I've managed to connect my data from Greenhouse (an applicant tracking system). I have a LOT of data and have used DataFusion to combine my large datasets. That has all worked fine. I'd like to create a few tables that update in real time (for example: current candidate pipelines for open requisitions, offer status, etc.). However, I noticed a lot of duplicate rows when working with my data in Analyzer. Questions:
- How do I get rid of the duplicates in Analyzer if my data comes from a DataFusion?
- Why are there duplicates? If my dataset updates every day, does it keep my historical data? How do I make sure I am getting the most up-to-date data?
- Do DataFusions update when the underlying datasets update, or do I have to update them manually?
Comments
Your duplicates are caused by the joins you set up. If the same value appears more than once in a column you are joining on, each of those rows matches every matching row in the other dataset, so you end up with duplicated rows. You'll need a column (or combination of columns) with unique values to join on.
Here's a better explanation of what's going on: Joins and Duplicates
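To make the join behavior concrete, here is a minimal sketch in Python with pandas rather than in Domo itself. The table and column names (applications, offers, candidate_id) are made up for illustration and are not taken from the Greenhouse connector. It shows how a repeated join key multiplies rows, and one possible fix: reducing one side to a single row per key before joining.

```python
import pandas as pd

# Hypothetical tables: both repeat candidate_id 1
applications = pd.DataFrame({
    "candidate_id": [1, 1, 2],
    "job": ["Engineer", "Analyst", "Engineer"],
})
offers = pd.DataFrame({
    "candidate_id": [1, 1, 2],
    "offer_status": ["sent", "accepted", "sent"],
})

# Candidate 1 has 2 rows on each side, so the join produces 2 x 2 = 4 rows for them
joined = applications.merge(offers, on="candidate_id", how="left")

# One possible fix: keep a single offer row per candidate before joining
# (here the last row per candidate, assumed to be the most recent)
latest_offers = offers.drop_duplicates(subset="candidate_id", keep="last")

# validate raises an error if the right side still repeats the key
deduped = applications.merge(
    latest_offers, on="candidate_id", how="left", validate="many_to_one"
)

print(len(joined), len(deduped))
```

In a DataFusion the same idea would apply: make sure at least one side of each join has one row per key value, or join on a combination of columns that is unique.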
Whether your dataset keeps historical data depends on how the data pull is set up. If it's set to return all data, it will include history. If it only returns something like the current or previous day, you'll need to set up an ETL process to append your new daily records onto a historical dataset. Here's when and how to do that: When to Append
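As a rough illustration of the append pattern (again in pandas, not an actual Domo ETL), the sketch below adds a daily pull onto a historical table and then picks the most recent row per candidate. The column names, including the batch_date column used to tell pulls apart, are assumptions for the example.

```python
import pandas as pd

# Hypothetical historical dataset built up from earlier daily pulls
history = pd.DataFrame({
    "candidate_id": [1, 2],
    "stage": ["Applied", "Applied"],
    "batch_date": pd.to_datetime(["2024-01-01", "2024-01-01"]),
})

# Hypothetical pull for the current day
today = pd.DataFrame({
    "candidate_id": [2, 3],
    "stage": ["Interview", "Applied"],
    "batch_date": pd.to_datetime(["2024-01-02", "2024-01-02"]),
})

# Append: the historical dataset keeps every daily record
history = pd.concat([history, today], ignore_index=True)

# Most up-to-date view: the latest row per candidate
current = (
    history.sort_values("batch_date")
    .drop_duplicates(subset="candidate_id", keep="last")
    .sort_values("candidate_id")
    .reset_index(drop=True)
)

print(current)
```

Keeping the full history and deriving a "current" view from it means a table of open pipelines can show only the latest stage per candidate, while the older rows stay available for trend reporting.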
DataFusions will update automatically as the datasets they're built from update.
Hope that helps,
Valiant1