Remove duplicate unknown comma separated values within a single 'cell'

Options

Hello good people of Domo

Very frustrated trying to remove duplicates WITHIN a comma seperated list. I have the following webform:

If I 'Group By' (Magic ETL) by 'Who' & 'Manufacturer' (combining 'Model' & 'Colour' with commas) I get the below. This has repeats in R5:C3 (2x 'TT'), R6:C3 (2x V70) and 'Black' is repeated in R5:C4 & R6:C4.

What I would like: The above without the duplicates in the cells mentioned.

Other things you might need to know:

  1. My real data has nearly 1,000,000 rows in it.
  2. I will not know (ahead of time) what the value will be
  3. There could be one value, two values or any number of values separated by the comma. I won't know.

I've tried daisy chaining multiple 'Group By' tiles together; I've tried pulling data out, dealing with it and joining it back later (gets messy).

I'm happy to do this in ETL (my preference) but could do it in beast mode if better.

Any ideas, please?

Thanks Domos

Charlie

Best Answers

  • GrantSmith
    GrantSmith Coach
    Answer ✓
    Options

    Feed your dataset into a select columns to pull the who manufacturer and model then put that into a remove duplicates based on the who and manufacturer. Then feed it to a group by and separate with commas. Repeat this process replacing model with color. Then join your two group bys based on who and manufacturer

    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**
  • rco
    rco Contributor
    edited December 2023 Answer ✓
    Options

    @Charlie Bantoft It sounds like you want this formula:

    GROUP_CONCAT(DISTINCT Model SEPARATOR ', ')

    Use the "Add Formula" button instead of the "Add Column" button in Group By and use that formula in place of the "Combine strings separated by ," aggregations. Don't forget to replace Model with Colour for the other one.

    EDIT: For context, the "Combine strings separated by ," aggregation is identical to the formula (note that it does not include the DISTINCT keyword): GROUP_CONCAT(the_column SEPARATOR ', ')

    Randall Oveson <randall.oveson@domo.com>

Answers

  • GrantSmith
    GrantSmith Coach
    Answer ✓
    Options

    Feed your dataset into a select columns to pull the who manufacturer and model then put that into a remove duplicates based on the who and manufacturer. Then feed it to a group by and separate with commas. Repeat this process replacing model with color. Then join your two group bys based on who and manufacturer

    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**
  • jessdoe
    jessdoe Contributor
    Options

    You can also try concatenating all of your columns together into one column, deleting duplicates based off of that column, then splitting the columns back out based off your chosen delimiter.

  • Thanks both. @jessdoe I love this idea and it'll work for other problems I have but not this one. Thanks anyway.

    @GrantSmith I think I follow but could you give me a little more detail please? Thanks so much.

  • GrantSmith
    Options

    Here's a screenshot with an example (it's using bogus data but illustrates what I'm talking about):

    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**
  • Thanks @GrantSmith I've got that flow but am having trouble with the configuration somewhere. I'll continue to play around with it - I'm on the right track. Thanks very much indeed.

  • rco
    rco Contributor
    edited December 2023 Answer ✓
    Options

    @Charlie Bantoft It sounds like you want this formula:

    GROUP_CONCAT(DISTINCT Model SEPARATOR ', ')

    Use the "Add Formula" button instead of the "Add Column" button in Group By and use that formula in place of the "Combine strings separated by ," aggregations. Don't forget to replace Model with Colour for the other one.

    EDIT: For context, the "Combine strings separated by ," aggregation is identical to the formula (note that it does not include the DISTINCT keyword): GROUP_CONCAT(the_column SEPARATOR ', ')

    Randall Oveson <randall.oveson@domo.com>

  • Thank you! Once I have this column sorted @rco how would you advise filtering on its results with an in-dashboard filter?

    so, for example: if a row has the data string 'Volvo, Porsche, Austin' and I want to filter all rows that 'contain' 'Porsche' using a drop down that has all car manufacturers on it - how would I do that?

    The only solution I've found is to use a quick filter which means opening individual cards but I'd like the filter to control the whole dashboard.

    Many thanks.

  • rco
    rco Contributor
    Options

    Since there isn't a "string contains" operator in the dashboard filters, I've had to come up with a somewhat more tedious way of accomplishing this. It does involve opening and modifying all the cards individually, but only during the setup; once it's done you can use a single control on the dashboard to do your filtering.

    On the first card…

    1. Edit in Analyzer and Add Calculated Field. We'll call it "Model Match"
    2. In the Beast Mode editor, go to the Variables tab and Add Variable. Leave its default value empty, let it have type Text, and add a Control for it of type Textbox Input. We'll call the variable "Control_Model"
    3. Still in the Beast Mode editor, write this formula: CASE WHEN INSTR(LOWER(Model), LOWER(Control_Model)) > 0 THEN 1 ELSE 0 END
    4. Validate and save the "Model Match" calculated field.
    5. Add the "Model Match" field as a filter for the card, filtering to the case where it is equal to 1.
    6. Save the card and exit Analyzer.

    On every card on the dashboard, repeat steps 1 and 3 through 6, skipping 2 since we only have to create the variable once.

    Finally, on the dashboard, hit the wrench menu and Edit Dashboard. Add a Control where you choose, and make control the "Control_Model" variable.

    With that, you should be able to use the one control to filter the whole dashboard to specific models in the concatenated model list. When the control is left blank, it should show all data.

    Randall Oveson <randall.oveson@domo.com>

  • Thanks very much indeed @rco - going to try this shortly. Will let you know how I get on. Really appreciate it.

  • Charlie Bantoft
    Options

    This is been very useful by the way @rco. Thank you very much indeed.