Automatic Record Matching - Customers
Howdy!
New Domo customer here - first time posting in the Dojo!
We're bringing in accounts from various systems. Sometimes an account from system A is the same account in system B. I'm wondering if there's a way to do some sort of smart match to automatically establish relationships between records. Something like, if the account name, city, and country are similar, then join.
Has anyone been able to successfully do this within Domo?
Thank you!
Best Answer
-
Hey kpmwinston - I recently had this same requirement and solved it via MagicETL and using the Dynamic Unpivot + Python Scriprting blocks. The idea goes something like this:
- Add dataset A as a datasouce in MagicETL
- Add dataset B as a 2nd datasource in MagicETL
- Drag two unpivot blocks into your flow, and connect one to Dataset A, one to Dataset B
- This will transform both datasets a 'vertical' stacked dataset (turning your rows into columns)
- So both datasets should now be laid out like this:
In your case, as part of setting up the unpivot block, the product column above would be your column names (Customer Name, Customer Address), and sales column above would be equivalent to their values.
- Connect both unpivot blocks to python scripting blocks and leverage the fuzzywuzzy python library to compare the 'values' from dataset A and dataset B to get a confidence score.
- This same process could be done on the column names if you wanted to do a fuzzy match to determine if the different column names between both datasets are related. If they are you could leverage a bit more python to conform the column names between both datasets automatically.
I won't go into a lot of detail on the step above, but the following link provides some sample MagicETL python scripting using the fuzzy wuzzy python library that outputs the matches and confidence scores that you can then connect to an output dataset. In the example link, this person is trying to match addresses from 2 different sources where the addresses are very similar but may not match 100% (so you cant leverage standard joins between them) which sounds nearly identical to what you're trying to achieve.
I know this isn't a detailed step by step, but it should get you close enough to an actual solution to your problem.
3
Answers
-
Hi @kpmwinston ,
Hmm I don't think a Domo ETL join will automatically pick up which columns to use to do the join. You'll have to map them independently as 3 separte joins in the same step, but once you do it you're done!
John Le
You're only one dashboard away.
More video solutions at: https://www.dashboarddudes.com/pantry
John Le
Let me make you the me of your organization for Domo
Click here for more video solutions:
0 -
Unfortunately, there isn't a single common identifier in any the source systems to use for a common join. Let's take this example:
Golden Record
Customer Name: Coca Cola
Customer Address: 123 Coke Lane, North Pole, H0H 0H0
Records From Other Systems
Record 1 Customer Name: Coca-Cola
Record 1 Customer Address: 123 Coke Ln, North Pole, H0H
Record 2 Customer Name: Coke
Record 2 Customer Name: 57 Coke Blvd, South Pole, 00000
In this case, the fields from record 1 aren't exactly the same as the golden record, but are similar and would come back with a high match confidence score.
0 -
@kpmwinston Unfortunately, since the Customer Name or Customer Address aren't exactly the same, you won't be able to join them together. In Magic ETL, you could use some formulas to modify names and addresses so they match, but I'd imagine you'd have a lot of records to change so it wouldn't be feasible.
**Was this post helpful? Click Agree or Like below**
**Did this solve your problem? Accept it as a solution!**
1 -
Hey kpmwinston - I recently had this same requirement and solved it via MagicETL and using the Dynamic Unpivot + Python Scriprting blocks. The idea goes something like this:
- Add dataset A as a datasouce in MagicETL
- Add dataset B as a 2nd datasource in MagicETL
- Drag two unpivot blocks into your flow, and connect one to Dataset A, one to Dataset B
- This will transform both datasets a 'vertical' stacked dataset (turning your rows into columns)
- So both datasets should now be laid out like this:
In your case, as part of setting up the unpivot block, the product column above would be your column names (Customer Name, Customer Address), and sales column above would be equivalent to their values.
- Connect both unpivot blocks to python scripting blocks and leverage the fuzzywuzzy python library to compare the 'values' from dataset A and dataset B to get a confidence score.
- This same process could be done on the column names if you wanted to do a fuzzy match to determine if the different column names between both datasets are related. If they are you could leverage a bit more python to conform the column names between both datasets automatically.
I won't go into a lot of detail on the step above, but the following link provides some sample MagicETL python scripting using the fuzzy wuzzy python library that outputs the matches and confidence scores that you can then connect to an output dataset. In the example link, this person is trying to match addresses from 2 different sources where the addresses are very similar but may not match 100% (so you cant leverage standard joins between them) which sounds nearly identical to what you're trying to achieve.
I know this isn't a detailed step by step, but it should get you close enough to an actual solution to your problem.
3 -
This is incredible. Thank you so much for taking the time to help, @bobbyragsdale ! I'm going to start looking into this.
0
Categories
- All Categories
- 1.8K Product Ideas
- 1.8K Ideas Exchange
- 1.6K Connect
- 1.2K Connectors
- 300 Workbench
- 6 Cloud Amplifier
- 9 Federated
- 2.9K Transform
- 103 SQL DataFlows
- 627 Datasets
- 2.2K Magic ETL
- 3.9K Visualize
- 2.5K Charting
- 755 Beast Mode
- 61 App Studio
- 41 Variables
- 693 Automate
- 178 Apps
- 456 APIs & Domo Developer
- 49 Workflows
- 10 DomoAI
- 38 Predict
- 16 Jupyter Workspaces
- 22 R & Python Tiles
- 398 Distribute
- 115 Domo Everywhere
- 276 Scheduled Reports
- 7 Software Integrations
- 130 Manage
- 127 Governance & Security
- 8 Domo Community Gallery
- 38 Product Releases
- 11 Domo University
- 5.4K Community Forums
- 40 Getting Started
- 30 Community Member Introductions
- 110 Community Announcements
- 4.8K Archive