Remove duplicates from large dataset


Is there a faster way than ETL to remove duplicates from a very large dataset?


  • nicolasfeddern (Domo Employee)

    A few options you might try:

    1. If your cards only need distinct counts rather than a de-duplicated dataset, you can skip the clean-up step and use a distinct operation in a calculated field:

    count(distinct `fieldName`)

    2. Leverage either the R or Python plugins to pull down the data, run a de-duplication function, and then push the data back into Domo:

    R: unique(yourDataFrame)
    Python (pandas): yourDataFrame.drop_duplicates()
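
    To make the Python option a bit more concrete, here is a minimal sketch using pandas. The sample data, the column names, and the remove_duplicates helper are all made up for illustration, and the steps that pull the data from Domo and push it back are left out, since they depend on how your plugin is set up.

```python
import pandas as pd


def remove_duplicates(df, key_columns=None):
    """Return a copy of df without duplicate rows.

    key_columns is a hypothetical parameter for this sketch: when given,
    only those columns are compared and the first row per key is kept;
    when None, rows must match on every column to count as duplicates.
    """
    deduped = df.drop_duplicates(subset=key_columns, keep="first")
    # Renumber the index so it is contiguous again after rows were dropped.
    return deduped.reset_index(drop=True)


# Made-up sample data standing in for the dataset pulled from Domo.
orders = pd.DataFrame(
    {
        "order_id": [1, 1, 2, 3, 3],
        "customer": ["a", "a", "b", "c", "c"],
        "amount": [10.0, 10.0, 5.0, 7.5, 8.0],
    }
)

# Drop rows that are identical in every column.
deduped_all = remove_duplicates(orders)

# Keep one row per order_id, even if other columns differ.
deduped_by_id = remove_duplicates(orders, key_columns=["order_id"])

# Rough pandas analogue of count(distinct `fieldName`) from option 1.
distinct_customers = orders["customer"].nunique()
```

    Whether to compare whole rows or only a key column depends on what counts as a duplicate in your data. For a very large dataset, keep in mind that pandas holds the full frame in memory.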


    Stack Exchange Reference:

    R Example

    Python Example

  • jlazerus

    That's great, thanks. I'll give those a try.