Understanding My Data... preventing duplicates after DataFusion and creating a Table in Analyzer

user09909 Member
edited March 2023 in Datasets

Hello, I'm new to Domo and working with data in general. 

 

I've managed to connect my data from Greenhouse (an applicant tracking system). I have a LOT of data and have used DataFusion to combine my large datasets. That has all worked fine. I'd like to create a few tables that update in real time (for example: current candidate pipelines for open requisitions, offer status, etc.). I noticed that there are a lot of duplicate rows when I work with my data in Analyzer. My questions:

 

- How do I get rid of the duplicates in Analyzer if my data comes from a DataFusion?

- Why are there duplicates? If my dataset updates every day, does it keep my historical data? How do I make sure I am getting the most up-to-date data?

- Do DataFusions update when the underlying datasets update, or do I have to update them manually?

Comments

  • Your duplicates are caused by the joins you set up. If the same value appears more than once in a column you are joining on, every repeat produces another matching row, so you end up with duplicates. You'll need a column with unique values to join on.

    Here's a better explanation of what's going on: Joins and Duplicates
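To make the join behavior concrete, here is a small sketch in Python using the built-in sqlite3 module rather than Domo itself. The table and column names (candidates, applications, candidate_id, application_id) are made up for illustration and are not the real Greenhouse schema. It shows how a repeated join key multiplies rows, how you might check a key for repeats, and one possible way to get back to one row per candidate.

```python
import sqlite3

# Hypothetical tables for illustration only; not the actual Greenhouse schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE candidates (candidate_id INTEGER, name TEXT);
    CREATE TABLE applications (application_id INTEGER, candidate_id INTEGER, job TEXT);
    INSERT INTO candidates VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO applications VALUES
        (10, 1, 'Engineer'), (11, 1, 'Analyst'), (12, 2, 'Engineer');
""")

# candidate_id 1 appears twice in applications, so Ada comes back twice.
joined = conn.execute("""
    SELECT c.name, a.job
    FROM candidates c
    JOIN applications a ON a.candidate_id = c.candidate_id
    ORDER BY a.application_id
""").fetchall()

# Find join-key values that occur more than once on the applications side.
dup_keys = conn.execute("""
    SELECT candidate_id, COUNT(*)
    FROM applications
    GROUP BY candidate_id
    HAVING COUNT(*) > 1
""").fetchall()

# One option: keep only each candidate's newest application before joining.
# "Newest" is assumed here to mean the highest application_id.
one_per_candidate = conn.execute("""
    SELECT c.name, a.job
    FROM candidates c
    JOIN applications a ON a.candidate_id = c.candidate_id
    WHERE a.application_id = (
        SELECT MAX(application_id)
        FROM applications
        WHERE candidate_id = c.candidate_id
    )
    ORDER BY c.candidate_id
""").fetchall()

print(joined)
print(dup_keys)
print(one_per_candidate)
```

Whether you want one row per candidate or one row per application depends on the table you are building, so treat the "newest application" rule as just one possible choice.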

     

    The answer to your historical data question depends on how the data pull is set up. If it's set to return all data, then the dataset will have history. If it only returns something like the current or previous day, you'll need to set up an ETL process that appends each day's new records onto a historical dataset. Here's when and how to do that: When to Append
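The append idea can be sketched outside of Domo in plain Python. This is only an illustration of the logic, not how Domo's ETL is implemented: the function names, the batch_date field, and the application_id and status columns are all assumptions. Each daily pull is stamped with its date and added to the history, and a second step picks the most recent row per record when you only want the current state.

```python
def append_batch(history, daily_rows, batch_date):
    """Return the history with the day's rows added, each stamped with batch_date."""
    return history + [dict(row, batch_date=batch_date) for row in daily_rows]


def latest_per_key(rows, key):
    """Keep only the most recently loaded row for each value of `key`."""
    latest = {}
    for row in rows:
        current = latest.get(row[key])
        # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
        if current is None or row["batch_date"] >= current["batch_date"]:
            latest[row[key]] = row
    return list(latest.values())


# Hypothetical daily pulls that each return only that day's records.
history = []
history = append_batch(
    history, [{"application_id": 10, "status": "Interview"}], "2023-03-01"
)
history = append_batch(
    history,
    [
        {"application_id": 10, "status": "Offer"},
        {"application_id": 12, "status": "Applied"},
    ],
    "2023-03-02",
)

current = latest_per_key(history, "application_id")
print(len(history))  # every appended row is kept as history
print(current)       # one row per application, from the latest batch
```

The full history keeps every status an application has had, while the "latest per key" view is what you would typically chart for a current pipeline table.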

    DataFusions will update automatically as the datasets they're built from update.

     

    Hope that helps,
    Valiant