Remove duplicates from large dataset

Is there a faster way than ETL to remove duplicates from a very large dataset?

Comments

  • nicolasfeddern (Domo Employee)

    A few options you might try:

    1. If your cards only need de-duplicated counts, you can use a distinct aggregation in a calculated field instead of removing the duplicate rows:

    count(distinct `fieldName`)

    2. Use the R or Python plugin to pull the data down, run a de-duplication function, and push the result back into Domo:

    R: unique(yourDataFrame)
    Python (pandas): yourDataFrame.drop_duplicates()
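
For the Python route, here is a minimal sketch of the de-duplication step on its own, assuming the data has already been pulled into a pandas DataFrame. The column names and sample rows are made up for illustration, and the pull from and push back to Domo are left out because they depend on how your plugin is set up.

```python
import pandas as pd

# Hypothetical sample standing in for data pulled down from Domo.
df = pd.DataFrame(
    {
        "order_id": [1, 1, 2, 3, 3],
        "customer": ["a", "a", "b", "c", "c"],
        "amount": [10.0, 10.0, 5.0, 7.5, 8.0],
    }
)

# Drop rows that are identical across every column (keeps the first occurrence).
exact = df.drop_duplicates()

# Drop rows that share a key column, keeping the last row seen for each key.
by_key = df.drop_duplicates(subset=["order_id"], keep="last")

print(len(df), len(exact), len(by_key))  # prints: 5 4 3
```

Which variant you want depends on what counts as a duplicate in your data: the first only removes fully identical rows, while the second treats any repeated `order_id` as a duplicate even if other columns differ.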

     

    Stack Exchange Reference:

    R Example

    Python Example

  • jlazerus

    That's great, thanks. I'll give those a try.