Automatic Record Matching - Customers

Howdy!

New Domo customer here - first time posting in the Dojo!

We're bringing in accounts from various systems. Sometimes an account from system A is the same account in system B. I'm wondering if there's a way to do some sort of smart match to automatically establish relationships between records. Something like, if the account name, city, and country are similar, then join.

Has anyone been able to successfully do this within Domo?

Thank you!

Best Answer

  • DomoDork
    DomoDork Contributor
    Answer ✓

    Hey kpmwinston - I recently had this same requirement and solved it via MagicETL and using the Dynamic Unpivot + Python Scriprting blocks. The idea goes something like this:

    1. Add dataset A as a datasouce in MagicETL
    2. Add dataset B as a 2nd datasource in MagicETL
    3. Drag two unpivot blocks into your flow, and connect one to Dataset A, one to Dataset B
    4. This will transform both datasets a 'vertical' stacked dataset (turning your rows into columns)
    5. So both datasets should now be laid out like this:


    In your case, as part of setting up the unpivot block, the product column above would be your column names (Customer Name, Customer Address), and sales column above would be equivalent to their values.

    1. Connect both unpivot blocks to python scripting blocks and leverage the fuzzywuzzy python library to compare the 'values' from dataset A and dataset B to get a confidence score.
    2. This same process could be done on the column names if you wanted to do a fuzzy match to determine if the different column names between both datasets are related. If they are you could leverage a bit more python to conform the column names between both datasets automatically.

    I won't go into a lot of detail on the step above, but the following link provides some sample MagicETL python scripting using the fuzzy wuzzy python library that outputs the matches and confidence scores that you can then connect to an output dataset. In the example link, this person is trying to match addresses from 2 different sources where the addresses are very similar but may not match 100% (so you cant leverage standard joins between them) which sounds nearly identical to what you're trying to achieve.

    I know this isn't a detailed step by step, but it should get you close enough to an actual solution to your problem.

    Get Help | DataCrew (circle.so)

Answers

  • Hi @kpmwinston ,

    Hmm I don't think a Domo ETL join will automatically pick up which columns to use to do the join. You'll have to map them independently as 3 separte joins in the same step, but once you do it you're done!

    John Le

    You're only one dashboard away. 

    More video solutions at: https://www.dashboarddudes.com/pantry

    John Le

    You're only one dashboard away.

    Click here for more video solutions: https://www.dashboarddudes.com/pantry

  • Unfortunately, there isn't a single common identifier in any the source systems to use for a common join. Let's take this example:

    Golden Record

    Customer Name: Coca Cola

    Customer Address: 123 Coke Lane, North Pole, H0H 0H0


    Records From Other Systems

    Record 1 Customer Name: Coca-Cola

    Record 1 Customer Address: 123 Coke Ln, North Pole, H0H

    Record 2 Customer Name: Coke

    Record 2 Customer Name: 57 Coke Blvd, South Pole, 00000


    In this case, the fields from record 1 aren't exactly the same as the golden record, but are similar and would come back with a high match confidence score.

  • @kpmwinston Unfortunately, since the Customer Name or Customer Address aren't exactly the same, you won't be able to join them together. In Magic ETL, you could use some formulas to modify names and addresses so they match, but I'd imagine you'd have a lot of records to change so it wouldn't be feasible.

    **Was this post helpful? Click Agree or Like below**

    **Did this solve your problem? Accept it as a solution!**

  • DomoDork
    DomoDork Contributor
    Answer ✓

    Hey kpmwinston - I recently had this same requirement and solved it via MagicETL and using the Dynamic Unpivot + Python Scriprting blocks. The idea goes something like this:

    1. Add dataset A as a datasouce in MagicETL
    2. Add dataset B as a 2nd datasource in MagicETL
    3. Drag two unpivot blocks into your flow, and connect one to Dataset A, one to Dataset B
    4. This will transform both datasets a 'vertical' stacked dataset (turning your rows into columns)
    5. So both datasets should now be laid out like this:


    In your case, as part of setting up the unpivot block, the product column above would be your column names (Customer Name, Customer Address), and sales column above would be equivalent to their values.

    1. Connect both unpivot blocks to python scripting blocks and leverage the fuzzywuzzy python library to compare the 'values' from dataset A and dataset B to get a confidence score.
    2. This same process could be done on the column names if you wanted to do a fuzzy match to determine if the different column names between both datasets are related. If they are you could leverage a bit more python to conform the column names between both datasets automatically.

    I won't go into a lot of detail on the step above, but the following link provides some sample MagicETL python scripting using the fuzzy wuzzy python library that outputs the matches and confidence scores that you can then connect to an output dataset. In the example link, this person is trying to match addresses from 2 different sources where the addresses are very similar but may not match 100% (so you cant leverage standard joins between them) which sounds nearly identical to what you're trying to achieve.

    I know this isn't a detailed step by step, but it should get you close enough to an actual solution to your problem.

    Get Help | DataCrew (circle.so)

  • This is incredible. Thank you so much for taking the time to help, @bobbyragsdale ! I'm going to start looking into this.