Approach to logically group data via inference of similar text strings
Hey all.
I'm not looking for a detailed solution here, as I know it will be complex, may require purchased add-ons, and will almost certainly require my dev team. But I'm hoping someone can point me to general approaches and the Domo capabilities that would be leveraged, if this is even supportable.
We've got a large data set that is missing solid referential data to relate common rows. We're building that. But in the meantime, we're looking to do analysis against the current data set and hoping to find a way to apply logic that can group similar rows based on both text (string) and date fields.
Here is an example of the dataset:
In the sample data we would want to look at Underwriter, Corporate Sponsor and Creation Date (within a range to be determined) and using those columns, group the rows. These 4 rows above are one example of a logical grouping of a specific bank working with a specific corporate sponsor where 4 products have been sold to support a single project.
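As a rough illustration of the grouping described above, here is a minimal Python sketch. It assumes the name columns differ only in case, spacing and punctuation (true fuzzy matching is a separate step), and it uses a hypothetical `window_days` gap threshold because the actual date range is still to be determined. The sample rows and the helper names `normalize`, `pair_key` and `group_rows` are made up for illustration; they are not the original data set.

```python
from datetime import date
from itertools import groupby


def normalize(name: str) -> str:
    """Lower-case a name and keep only letters/digits, so 'Bankers Trust' == 'BankersTrust'."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def pair_key(row: dict) -> tuple:
    """Grouping key: normalized (Underwriter, Corporate Sponsor)."""
    return normalize(row["Underwriter"]), normalize(row["Corporate Sponsor"])


def group_rows(rows: list[dict], window_days: int = 30) -> list[dict]:
    """Return copies of the rows with a 'Group' id added.

    Rows share a group when they have the same normalized underwriter/sponsor pair
    and each row was created no more than `window_days` after the previous row of
    that pair (a gap-based rule, so one group can span longer than window_days).
    """
    ordered = sorted(rows, key=lambda r: (*pair_key(r), r["CreationDate"]))
    result, group_id = [], 0
    for _, pair_rows in groupby(ordered, key=pair_key):
        previous = None
        for row in pair_rows:
            # Start a new group for the first row of a pair or after a large gap.
            if previous is None or (row["CreationDate"] - previous).days > window_days:
                group_id += 1
            result.append({**row, "Group": group_id})
            previous = row["CreationDate"]
    return result


# Hypothetical sample rows (not the data from the original post).
rows = [
    {"Underwriter": "Bankers Trust", "Corporate Sponsor": "Acme", "CreationDate": date(2020, 9, 8)},
    {"Underwriter": "BankersTrust", "Corporate Sponsor": "ACME", "CreationDate": date(2020, 10, 2)},
    {"Underwriter": "Bankers Trust", "Corporate Sponsor": "Acme", "CreationDate": date(2021, 3, 1)},
    {"Underwriter": "First Bank", "Corporate Sponsor": "Acme", "CreationDate": date(2020, 9, 10)},
]

for r in group_rows(rows, window_days=30):
    print(r["Group"], r["Underwriter"], r["Corporate Sponsor"], r["CreationDate"])
```

Sorting by the pair and then by date is what makes the gap rule cheap: within each pair, rows are visited in date order, so only consecutive gaps need checking.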
Which Domo capabilities or add-ons would you suggest we explore to help with this interim analysis?
Appreciate your thoughts.
Andy
Best Answer
I'd suggest talking to your Account Executive and/or CSM about Python scripting tiles or Jupyter workspaces.
From there you can find Python packages/libraries that help with fuzzy matching:
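As one possible starting point (not necessarily one of the packages the answer had in mind), Python's standard-library `difflib.SequenceMatcher` can score how similar two names are, and variants whose score clears a threshold can be grouped together. The 0.85 threshold, the helper names and the sample names below are hypothetical; a dedicated fuzzy-matching package would typically be faster and offer more scoring options on a large data set.

```python
from difflib import SequenceMatcher


def _normalize(name: str) -> str:
    """Lower-case, drop periods/commas, and collapse whitespace."""
    cleaned = name.lower().replace(".", "").replace(",", "")
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized names."""
    return SequenceMatcher(None, _normalize(a), _normalize(b)).ratio()


def cluster_names(names: list[str], threshold: float = 0.85) -> list[list[str]]:
    """Greedy single-pass clustering.

    Each name joins the first existing cluster whose first member scores at or
    above `threshold`; otherwise it starts a new cluster. Results depend on the
    input order, which is acceptable for a quick exploratory pass.
    """
    clusters: list[list[str]] = []
    for name in names:
        for cluster in clusters:
            if similarity(name, cluster[0]) >= threshold:
                cluster.append(name)
                break
        else:  # no existing cluster was close enough
            clusters.append([name])
    return clusters


names = ["Bankers Trust", "BankersTrust", "Bankers Trust Co.", "First National Bank", "First Natl Bank"]
print(cluster_names(names))
```

Comparing each name against every cluster grows quickly with the number of distinct names, so on a large data set it is usually worth clustering the distinct Underwriter and Corporate Sponsor values once (not every row) and then joining the resulting cluster label back onto the rows.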
Answers
If you have predefined groupings, you can use a formula tile or Beast Mode to create a new field that groups the data based on your criteria.
You could start with something like this:
CASE WHEN (LOWER(`Underwriter`) LIKE '%bankers trust%' OR LOWER(`Underwriter`) LIKE '%bankerstrust%')
  AND LOWER(`Corporate Sponsor`) LIKE '%acme%'
  AND DATE(`CreationDate in EST`) >= '2020-09-08' AND DATE(`CreationDate in EST`) <= '2020-11-24' THEN 'Group1'
/* WHEN ... keep adding additional groupings as needed */
END
Anything beyond this would require fuzzy matching and some data science work. But if you know the grouping logic, it is definitely doable as shown above.
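If the grouping rules are known, the same logic can also live in a Python scripting tile as a small rule table instead of an ever-growing CASE statement. This is a minimal sketch: the rule values are taken from the Beast Mode example above, `LIKE '%pattern%'` is approximated with a lower-case substring test, and the `assign_group` name and the "Ungrouped" default label are hypothetical.

```python
from datetime import date

# Each rule mirrors one WHEN branch of the Beast Mode example; append more as needed.
RULES = [
    {
        "group": "Group1",
        "underwriter_patterns": ["bankers trust", "bankerstrust"],
        "sponsor_patterns": ["acme"],
        "start": date(2020, 9, 8),
        "end": date(2020, 11, 24),
    },
]


def assign_group(underwriter: str, sponsor: str, created: date, default: str = "Ungrouped") -> str:
    """Return the group of the first matching rule (like CASE), else `default`."""
    uw, sp = underwriter.lower(), sponsor.lower()
    for rule in RULES:
        # Substring tests stand in for LIKE '%pattern%'; the date range is inclusive.
        if (
            any(p in uw for p in rule["underwriter_patterns"])
            and any(p in sp for p in rule["sponsor_patterns"])
            and rule["start"] <= created <= rule["end"]
        ):
            return rule["group"]
    return default


print(assign_group("BankersTrust NY", "ACME Holdings", date(2020, 10, 1)))
print(assign_group("Bankers Trust", "Acme", date(2021, 1, 1)))
```

Keeping the rules as data means new groupings are added by appending to the list rather than editing a long expression, though it still assumes the groupings are known up front.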
Thank you! The dataset is so large that having to define what to group on would be counterproductive. Yeah, this looks like fuzzy matching and data science. Are there specific Domo capabilities for that work?