Approach to logically group data via inference of similar text strings

Hey all.

I'm not looking for a detailed solution here, as I know it will be complex, potentially require purchased add-ons, and most definitely require my dev team. But I'm hoping someone can guide me to general approaches and the Domo capabilities that would be leveraged, if this is even supportable.

We've got a large dataset that is missing solid referential data to relate common rows. We're building that. But in the meantime, we're looking to do analysis against the current dataset and hoping to find a way to apply logic that can group similar rows based on both text (string) and date fields.

Here is an example of the dataset:

In the sample data, we would want to look at Underwriter, Corporate Sponsor and Creation Date (within a range to be determined) and, using those columns, group the rows. The 4 rows above are one example of a logical grouping: a specific bank working with a specific corporate sponsor, where 4 products have been sold to support a single project.

Where would you guide us to explore Domo capabilities/add-ons to help in this interim analysis we're trying to do?

Appreciate your thoughts.

Andy

Answers

  • ColemenWilson
    edited October 2023

    If you have predefined groupings, you can use a Formula tile or a Beast Mode to create a new field that groups the data based on your criteria.

    You could start with something like this:

    CASE WHEN (LOWER(`Underwriter`) LIKE '%bankers trust%' OR LOWER(`Underwriter`) LIKE '%bankerstrust%')
      AND LOWER(`Corporate Sponsor`) LIKE '%acme%'
      AND `CreationDate in EST` >= '2020-09-08' AND `CreationDate in EST` <= '2020-11-24'
      THEN 'Group1'

    WHEN (etc…. keep adding additional groupings as needed)

    END

    Anything beyond this would require fuzzy matching and some data science work. But if you know the grouping logic, it is definitely doable as shown above.

    If I solved your problem, please select "yes" above

  • afieweger

    Thank you! The dataset is too large for that; having to define what to group on would be counterproductive. Yeah, this looks like fuzzy matching and data science. Are there specific Domo capabilities for that work?

  • ColemenWilson
    Answer ✓

    I'd suggest talking to your Account Executive and/or CSM about Python scripting tiles or Jupyter workspaces.

    From there you can find Python packages/libraries that help with fuzzy matching:

    https://medium.com/codex/best-libraries-for-fuzzy-matching-in-python-cbb3e0ef87dd

    If I solved your problem, please select "yes" above
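
For anyone landing here later, here is a rough sketch of what the fuzzy-matching route could look like in Python. It is not a Domo feature and not the approach anyone in this thread confirmed; it only illustrates the idea of grouping rows whose Underwriter and Corporate Sponsor strings are "close enough" and whose creation dates fall within a window. It uses only the standard library (`difflib`), and everything else is an assumption made for illustration: the row layout, the sample values, the 0.85 similarity threshold, and the 90-day window.

```python
import re
from datetime import date
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def similar(a, b, threshold=0.85):
    """True when the normalized strings are at least `threshold` similar."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold


def group_rows(rows, window_days=90, threshold=0.85):
    """Greedy grouping over rows sorted by creation date.

    A row joins the first group whose seed row matches on underwriter and
    sponsor and whose most recent date is within `window_days`; otherwise
    it starts a new group.
    """
    groups = []
    for row in sorted(rows, key=lambda r: r["created"]):
        for g in groups:
            seed = g["seed"]
            if (similar(row["underwriter"], seed["underwriter"], threshold)
                    and similar(row["sponsor"], seed["sponsor"], threshold)
                    and (row["created"] - g["last"]).days <= window_days):
                g["members"].append(row)
                g["last"] = row["created"]
                break
        else:
            groups.append({"seed": row, "last": row["created"], "members": [row]})
    return [g["members"] for g in groups]


# Hypothetical sample rows; the column names and values are made up.
rows = [
    {"underwriter": "Bankers Trust", "sponsor": "Acme Corp", "created": date(2020, 9, 8)},
    {"underwriter": "BankersTrust", "sponsor": "ACME Corp.", "created": date(2020, 10, 1)},
    {"underwriter": "First National", "sponsor": "Acme Corp", "created": date(2020, 10, 15)},
    {"underwriter": "bankers trust", "sponsor": "Acme Corp", "created": date(2020, 11, 24)},
    {"underwriter": "Bankers Trust", "sponsor": "Acme Corp", "created": date(2021, 6, 1)},
]

groups = group_rows(rows)
print([len(g) for g in groups])  # [3, 1, 1]
```

The greedy pass compares each row against every existing group, so it will get slow on a very large dataset; in practice you would likely block on a cheap key first (for example, the first few normalized characters of the underwriter) and tune the threshold and window against rows you already know belong together.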