How to count strings in a column containing a comma delimited list of strings in each row

Hello, I have a field in which each row contains a comma delimited list of strings. I'm trying to find a way to display how often each string appears in the dataset. For example, the column might be "City":

Row 1 | New York,Boston,San Francisco

Row 2 | Boston

Row 3 | New York,San Diego

Row 4 | New York,Topeka

I like to create some data display that shows New York:3, Boston:2, San Francisco:1, San Diego:1, Topeka:1.

A regular bar graph doesn't recognize the comma and treats each unique string including the commas as a separate entry.

I can split the field into multiple, City1, City2, City3, etc, but I don't know of a data display that looks into multiple columns.

Any advice is appreciated, thank you.


Best Answer

  • ArborRose
    ArborRose Contributor
    Answer ✓

    The way I did it in SQL, it doesn't matter how many commas there are. In ETL, I'd have to create something recusive to make sure you can get any number of elements typed into a string. If you know you won't exceed the number of delimited items you show, something like this will work.

    Using that flow, I then went out and had some website give me random city names which I typed into my webform.


  • ArborRose
    ArborRose Contributor

    You should be able to do a calculated field such as

    CASE when City like '%New York%' then 'New York'
    when City like '%San Diego%' then 'San Diego'
    when City like '%San Francisco%' then 'San Francisco'
    when City like '%Boston%' then 'Boston'
    when City like '%Topeka%' then 'Topeka'


    CASE when INSTR(City,'New York')>0 then 'New York'
    when INSTR(City,'San Diego')>0 then 'San Diego'
    when INSTR(City,'Boston')>0 then 'Boston'
    when INSTR(City,'San Francisco')>0 then 'San Francisco'
    when INSTR(City,'Topeka')>0 then 'Topeka'

    Then use the field in a bar chart

  • Thanks for that solution @ArborRose. It works well for this example.

    What if there are more than five possible strings, for example, what if any city in the U.S. could appear and the data set has tens of thousands of rows? It wouldn't be reasonable to make a INSTR for every potential. Any ideas on how to make this scalable?

  • ArborRose
    ArborRose Contributor

    I'd need to ponder a bit. First instinct - try to break up the string by the delimiter, in this case a comma. Feed the pieces to a table (aka a list) and then aggregate using count.

  • @tom_rotchadl@tom_rotchadl How many cities could be listed in a single row? I'd suggest using a combination of the Split Column and Dynamic Unpivot ETL tiles:

  • ArborRose
    ArborRose Contributor
    edited March 24

    As @MichelleH mentions, in the ETL I'd be looking to use split to get the pieces from the string. I don't have a chance right now to try it in Domo, but I did a quick mock up in SQL with code shown below.

    As I mentioned, I would split the strings to create a table as shown below. Then aggregate the table using count to get the source for the bar chart.