Archive

Remove duplicates from large dataset

Is there a faster way than ETL to remove duplicates from a very large dataset?

Comments

  • Domo Employee

    A few options you might try:

    1. Depending on the end view you're after on your cards, you could use a distinct aggregation in a calculated field. This does not remove the duplicate rows; it only keeps them from being counted more than once:

       COUNT(DISTINCT `fieldName`)

    2. Use either the R or Python plugin to pull down the data, run a de-duplication function, and then push the data back into Domo:

       R: unique(yourDataFrame)
       Python (pandas): yourDataFrame.drop_duplicates()
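
If the dataset is too large to load comfortably into a data frame, the same "keep the first occurrence of each row" logic can be sketched with a streaming pass over a CSV export. This is only an illustration using the Python standard library, not a Domo API: the function name `dedupe_rows`, the `key_columns` parameter, and the sample data are all hypothetical, and it assumes the set of distinct keys fits in memory even if the full dataset does not.

```python
import csv
import io


def dedupe_rows(reader, writer, key_columns=None):
    """Copy rows from reader to writer, keeping the first occurrence of each key.

    key_columns is a list of header names to compare on; None compares the
    whole row. Returns the number of data rows written (header excluded).
    """
    header = next(reader, None)
    if header is None:  # empty input: nothing to write
        return 0
    writer.writerow(header)

    if key_columns is None:
        key_idx = list(range(len(header)))
    else:
        key_idx = [header.index(name) for name in key_columns]

    seen = set()  # keys already written; grows with the number of distinct keys
    kept = 0
    for row in reader:
        key = tuple(row[i] for i in key_idx)
        if key in seen:
            continue  # duplicate of an earlier row, skip it
        seen.add(key)
        writer.writerow(row)
        kept += 1
    return kept


# Hypothetical sample standing in for an exported dataset.
sample = "id,name\n1,a\n2,b\n1,a\n3,c\n2,b\n"
out = io.StringIO()
kept = dedupe_rows(
    csv.reader(io.StringIO(sample)),
    csv.writer(out, lineterminator="\n"),
)
```

With real files you would pass `csv.reader` and `csv.writer` objects wrapped around open file handles instead of the in-memory strings. Passing `key_columns=["id"]` would treat rows as duplicates when only the `id` column matches.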

     

    Stack Exchange Reference:

    R Example

    Python Example

  • That's great, thanks. I'll give those a try.
