Remove duplicate unknown comma separated values within a single 'cell'

Hello good people of Domo

Very frustrated trying to remove duplicates WITHIN a comma seperated list. I have the following webform:

If I 'Group By' (Magic ETL) by 'Who' & 'Manufacturer' (combining 'Model' & 'Colour' with commas) I get the below. This has repeats in R5:C3 (2x 'TT'), R6:C3 (2x V70) and 'Black' is repeated in R5:C4 & R6:C4.

What I would like: The above without the duplicates in the cells mentioned.

Other things you might need to know:

  1. My real data has nearly 1,000,000 rows in it.
  2. I will not know (ahead of time) what the value will be
  3. There could be one value, two values or any number of values separated by the comma. I won't know.

I've tried daisy chaining multiple 'Group By' tiles together; I've tried pulling data out, dealing with it and joining it back later (gets messy).

I'm happy to do this in ETL (my preference) but could do it in beast mode if better.

Any ideas, please?

Thanks Domos

Charlie

Best Answer

  • GrantSmith
    GrantSmith Coach
    Answer ✓

    Feed your dataset into a select columns to pull the who manufacturer and model then put that into a remove duplicates based on the who and manufacturer. Then feed it to a group by and separate with commas. Repeat this process replacing model with color. Then join your two group bys based on who and manufacturer

    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**

Answers

  • GrantSmith
    GrantSmith Coach
    Answer ✓

    Feed your dataset into a select columns to pull the who manufacturer and model then put that into a remove duplicates based on the who and manufacturer. Then feed it to a group by and separate with commas. Repeat this process replacing model with color. Then join your two group bys based on who and manufacturer

    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**
  • jessdoe
    jessdoe Contributor

    You can also try concatenating all of your columns together into one column, deleting duplicates based off of that column, then splitting the columns back out based off your chosen delimiter.

  • Thanks both. @jessdoe I love this idea and it'll work for other problems I have but not this one. Thanks anyway.

    @GrantSmith I think I follow but could you give me a little more detail please? Thanks so much.

  • Here's a screenshot with an example (it's using bogus data but illustrates what I'm talking about):

    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**
  • Thanks @GrantSmith I've got that flow but am having trouble with the configuration somewhere. I'll continue to play around with it - I'm on the right track. Thanks very much indeed.