Similarity between two text columns
I have a dataset with two columns containing company names. I was wondering what the best way is to determine the similarity between them.
Perhaps I could pass 3 columns to an R/Python script and get back 4 columns, with the cosine similarity added. Can I do that? Starter code or an example would be great.
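A minimal sketch of the "3 columns in, 4 columns out" idea in plain Python, assuming (for illustration only) that the three input columns are a row ID plus the two company names. It computes cosine similarity over character trigram counts; the function names are hypothetical, and it deliberately avoids any Domo-specific tile API, which is not shown here.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams of a lowercased, trimmed string."""
    normalized = text.lower().strip()
    if not normalized:
        return Counter()
    # Pad with spaces so word starts/ends (and very short names) produce n-grams
    padded = f" {normalized} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine_similarity(a: str, b: str, n: int = 3) -> float:
    """Cosine of the angle between the n-gram count vectors of two strings (0..1)."""
    va, vb = char_ngrams(a, n), char_ngrams(b, n)
    dot = sum(count * vb[gram] for gram, count in va.items())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an empty name as dissimilar to everything
    return dot / (norm_a * norm_b)


def add_similarity_column(rows):
    """Map (row_id, name_a, name_b) rows to (row_id, name_a, name_b, similarity)."""
    return [(row_id, a, b, round(cosine_similarity(a, b), 3)) for row_id, a, b in rows]


if __name__ == "__main__":
    sample = [
        (1, "Acme Corp", "ACME Corporation"),
        (2, "Globex", "Initech"),
    ]
    for row in add_similarity_column(sample):
        print(row)
```

Character trigrams are one choice among several: they let "Corp" and "Corporation" share most of their features, whereas whole-word tokens would treat them as unrelated.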
Comments
https://stackoverflow.com/questions/560709/levenshtein-distance-in-t-sql
Levenshtein distance is a common way of measuring the similarity between two text values: it counts how many single-character edits (insertions, deletions, or substitutions) you would have to make before they are the same. For example, "cat" → "rat" = 1 and "John" → "Jon" = 1.
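For the R/Python route mentioned in the question, a minimal Python sketch of the same measure: the standard dynamic-programming Levenshtein distance, plus a hypothetical `levenshtein_similarity` helper that rescales it to the range 0 to 1 by dividing by the longer string's length (one common convention, not the only one). Note that it is case-sensitive as written.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn `a` into `b` (Wagner-Fischer dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter string as the row
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                current[j - 1] + 1,           # insert cb
                previous[j] + 1,              # delete ca
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Hypothetical helper: rescale the distance to 0..1 (1.0 = identical)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1 - levenshtein(a, b) / longest


print(levenshtein("cat", "rat"))               # 1
print(levenshtein("John", "Jon"))              # 1
print(levenshtein_similarity("John", "Jon"))   # 0.75
```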
The linked answer is written in T-SQL, so you'll have to rewrite it for MySQL, but it can be done. For a workflow like this, though, you'll want some sort of process where a user accepts or discards each recommended match, with the accepted ones accumulated (recursively?) in a lookup table of 'approved matches'.
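A minimal sketch of that accept/discard loop, assuming in-memory Python sets stand in for the 'approved matches' lookup table (in Domo this would be a dataset fed by a webform or app; that wiring is not shown). To stay self-contained it swaps in Python's standard-library `difflib.SequenceMatcher.ratio()` as the score, which is a different measure from Levenshtein. `suggest_matches`, `record_decision`, the separate rejected set, and the 0.8 threshold are all hypothetical.

```python
from difflib import SequenceMatcher


def _key(a: str, b: str) -> tuple:
    """Normalize a name pair so decisions are matched case-insensitively."""
    return (a.lower().strip(), b.lower().strip())


def suggest_matches(pairs, approved, rejected, threshold=0.8):
    """Split candidate (name_a, name_b) pairs into already-approved matches and
    a review queue of new suggestions scoring at or above `threshold`."""
    matched, review_queue = [], []
    for a, b in pairs:
        key = _key(a, b)
        if key in approved:
            matched.append((a, b))  # a reviewer already confirmed this pair
        elif key in rejected:
            continue  # a reviewer already discarded this pair
        else:
            score = SequenceMatcher(None, *key).ratio()
            if score >= threshold:
                review_queue.append((a, b, round(score, 3)))  # needs a human decision
    return matched, review_queue


def record_decision(a, b, accepted, approved, rejected):
    """Store a reviewer's accept/discard decision in the lookup sets."""
    (approved if accepted else rejected).add(_key(a, b))


if __name__ == "__main__":
    approved, rejected = set(), set()
    pairs = [("John", "Jon"), ("Acme Corp", "Acme Corp."), ("Globex", "Initech")]

    _, queue = suggest_matches(pairs, approved, rejected)
    print(queue)  # near-matches wait for review; nothing is auto-accepted

    # Simulate a reviewer: accept one suggestion, discard the other
    record_decision("Acme Corp", "Acme Corp.", True, approved, rejected)
    record_decision("John", "Jon", False, approved, rejected)

    print(suggest_matches(pairs, approved, rejected))
```

Because decisions persist between runs, each new batch only surfaces pairs nobody has reviewed yet, which is how the lookup table grows over time.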
In my example above, you may not want to automatically accept that John and Jon are the same entry, hence the need for a feedback loop. Domo can of course handle that with a simple webform or, more dynamically, with a custom app that has a polished user interface.
Jae Wilson