Identify a duplicate

damen Contributor

Is there a way to create a column that can help identify a duplicate from another column?

Below is a list of loan IDs that we have; some of them have co-borrowers and some don't. I don't need to remove the duplicates, but I would like to create a new column that quickly identifies which one is a co-borrower.

Is there a function or formula for that?



Answers

  • SUM(SUM(1)) OVER (PARTITION BY `loan_id`)
    

    This window function beast mode counts the total number of records that share the same loan_id value. If the result is greater than one, the loan has a co-borrower.

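To make the counting logic concrete, here is a minimal sketch in plain Python rather than Beast Mode syntax. It mimics what a count over a `loan_id` partition produces: every row receives the number of rows that share its `loan_id`. The sample rows, the `borrower` field, and the output column names `records_per_loan` and `has_coborrower` are hypothetical and only there for illustration.

```python
from collections import Counter

# Hypothetical sample data: one row per borrower on a loan.
rows = [
    {"loan_id": "A100", "borrower": "Ann"},
    {"loan_id": "A100", "borrower": "Bob"},
    {"loan_id": "B200", "borrower": "Cara"},
]


def add_partition_count(rows, key="loan_id"):
    """Attach to every row the number of rows sharing its key value."""
    counts = Counter(row[key] for row in rows)
    return [
        {
            **row,
            # Same value on every row of the partition, like a window count.
            "records_per_loan": counts[row[key]],
            # More than one record for the loan means a co-borrower exists.
            "has_coborrower": counts[row[key]] > 1,
        }
        for row in rows
    ]


flagged = add_partition_count(rows)
```

Note that no rows are removed: the count is repeated on each row of the group, which is what lets it serve as a flag column.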
  • damen Contributor

    @GrantSmith

    I've been searching for how to fix the syntax error and can't seem to find an answer.

  • Ah, you can't use it within a formula tile. That's just the format for a beast mode. If you're looking to use Magic ETL, feed your data into a Group By tile: group by loan_id and COUNT the loan_id to get the total number of records per loan. Then join that result back to your original dataset on loan_id. That gives you the number of records for each loan on every row; any loan with a count greater than one has a co-borrower.

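The two-step approach described above (aggregate, then join back) can be sketched in plain Python as follows. This is not Magic ETL configuration, only an illustration of the data flow under stated assumptions: the helper names `group_by_count` and `left_join`, the sample rows, and the output column `count_of_loan_id` are all hypothetical.

```python
# Hypothetical sample data: one row per borrower on a loan.
loans = [
    {"loan_id": "A100", "borrower": "Ann"},
    {"loan_id": "A100", "borrower": "Bob"},
    {"loan_id": "B200", "borrower": "Cara"},
]


def group_by_count(rows, key):
    """Step 1: one output row per distinct key, with its record count."""
    counts = {}
    for row in rows:
        counts[row[key]] = counts.get(row[key], 0) + 1
    return [{key: value, "count_of_loan_id": n} for value, n in counts.items()]


def left_join(left, right, key):
    """Step 2: keep every left row and copy in the matching right-side columns."""
    lookup = {row[key]: row for row in right}
    joined = []
    for row in left:
        match = lookup.get(row[key], {})
        extra = {k: v for k, v in match.items() if k != key}
        joined.append({**row, **extra})
    return joined


counts = group_by_count(loans, "loan_id")
result = left_join(loans, counts, "loan_id")
```

The original dataset is the left side, so its row count is unchanged, and the join key is the same `loan_id` column on both inputs. If the key is transformed on only one side before joining, values will not match and the counts will not attach as expected.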
  • Zel Member

    Hi @GrantSmith. When you said "then join that back to your original dataset based on the loan_id", could you show us how to do the join part? Apologies, I'm trying to replicate this solution, but I'm having a hard time with the join. 😅 I'm not sure what to put there. Thank you in advance.

  • Zel Member

    These are my Join tile settings. I have my Count of ID column from the Group By tile, as described above, but the count output is incorrect.

    The Split Column tile shows the new Global Device ID, since I have to omit the characters with a hyphen.