Standardizing Names

Options

I need a way to apply a list of replacement values against incoming data. This is to standardize naming on insurance companies. If the data field says "BCBS of Tennesee", "BCBS of Tennessee", "BCBS of Tenn", Blue C Blue S", etc…the result needs to be in a new field with the result "Blue Cross Blue Shield".

Say I have an imported file maintained by another department.

[InsCo] [InsCo_Reclass]
Blue Cross Blue Cross Blue Shield
BCBS Blue Cross Blue Shield
Blue Cross of Tennessee Blue Cross Blue Shield
Blu Crss Blue Cross Blue Shield
Aetna of Tenn Aetna
Aetna of Texas Aetna


Can you think of a way to apply a regular expression or other method against the full table in one formula?

CASE WHEN REGEXP_replace(LOWER(`InsCo`), {some regular express}, '') =` InsCo` then `InsCo_Reclass` ELSE '' END

** Was this post helpful? Click Agree or Like below. **
** Did this solve your problem? Accept it as a solution! **

Comments

  • ArborRose
    Options

    Sorry…that list didn't paste well.

    ** Was this post helpful? Click Agree or Like below. **
    ** Did this solve your problem? Accept it as a solution! **

  • ColemenWilson
    Options

    If you have a table like the one you shared, you could join column B to your data on column A. Then you wouldn't need a formula at all and as new mapping is added by the other department the values in the reclass column will update dynamically.

    If I solved your problem, please select "yes" above

  • GrantSmith
    Options

    Why not just do a left join in an ETL on your lookup table based on the InsCo field to pull in the Reclass value? This way you don't have to keep tweaking a regex

    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**
  • ArborRose
    Options

    I'm looking for a more efficient method. This list will have hundreds of thousands of manually typed named. I don't want hundreds of thousands of entries in the replacement table. A left join must match exactly. I want something such as applying more than one regular expression against the incoming.

    CASE WHEN {reg exp dealing with blue cross blue shield}
    WHEN {reg exp dealing with aetna}
    WHEN {reg exp dealing with liberty}
    etc

    Especially if the imported list contained the regular expression to be used.

    [regular express] [reclassified name]
    xxxxx xxxxxxx

    ** Was this post helpful? Click Agree or Like below. **
    ** Did this solve your problem? Accept it as a solution! **

  • marcel_luthi
    Options

    Correct with a LEFT JOIN, not only will you be able to get what you want, but you could also surface a report that show unique entries that have no replacements to the department that manages the mapping file, so they can work on sorting those out and make sure the data is mapped the best way possible.

    You can build a MegaTable with a BeastMode that looks for nulls in the Reclass field:

    CASE WHEN Reclass IS NULL THEN 'true' ELSE 'false' END
    

    You apply a filter on where this beast mode evaluates to true and can even provide a COUNT on the entries so they'll know not only which entries have not been mapped but how much they're being used.

  • ArborRose
    Options

    Here's a first thought I had. I am still theorizing.

    ** Was this post helpful? Click Agree or Like below. **
    ** Did this solve your problem? Accept it as a solution! **

  • ArborRose
    Options

    Is there any way to put a formula or substring into value mapper? In essence, I want to search for a small string, and if found in the data field, replace the entire field with a static value.

    Case
    When lower(insurance_company_name) like '%blue%'
    or lower(insurance_company_name) like '%cross%'
    or lower(insurance_company_name) like '%shield%'
    or lower(insurance_company_name) like '%bcbs%' then 'Blue Cross Blue Shield'
    END

    Value Mapper seems to only work if the search value contains an entire match.

    ** Was this post helpful? Click Agree or Like below. **
    ** Did this solve your problem? Accept it as a solution! **

  • MichelleH
    Options

    @ArborRose Instead of Value Mapper you could just put that exact case statement into a Formula tile to give you more flexibility.

  • ArborRose
    Options

    Table associations and calculated formulas are inefficient when the data has continual additions, and the dataset will have many entries. I need scaled automation that doesn't require me holding its hand. The task is delegated to someone who won't have permissions to the dataset or ETL.

    Consider my CASE formula for Blue Cross Blue Shield, with someone other than me entering the following into Excel.

    I import that into Domo as table, Ins Mapping. Now I want "Ins Mapping" to feed a calculate formula or ETL value mapper. In the formula scenario, it would be something like

    CASE
    WHEN ins_company like [A2] then [B2]
    WHEN ins_company like [A3] then [B3]
    WHEN ins_company like [A4] then [B4]
    WHEN ins_company like [A5] then [B5]
    WHEN ins_company like [A6] then [B6]
    END

    In the example, BCBS is covered and Aetna is handled. The length of the Excel file will grow daily or weekly. I should never need to look at it myself. Nor should I have to touch the formula. I want the imported table to feed the formula. Note that I only need four entries to cover Blue Cross, even though there may be hundreds of versions typed in the data.

    So I guess my question is - how can a table feed a formula or value mapper?



    ** Was this post helpful? Click Agree or Like below. **
    ** Did this solve your problem? Accept it as a solution! **

  • GrantSmith
    Options

    You'd need to do a cartesian join on your dataset with your mapping dataset, apply the rule, and then reduce your dataset back to the original size.

    Start by adding a new column called Join Column with a value of 1 to each your input dataset and your mapping dataset.

    Join these two datasets together on the new join column.

    Use a formula tile to calculate the new grouping:

    CASE WHEN `insurance_company` LIKE `SearchString` THEN `Reclass` END
    

    Then feed that into a group by, grouping by your identifier field and selecting the max of the formula to get the actual mapping.

    Take that output and join it back to your original dataset.

    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**
  • ArborRose
    Options

    Thanks for all the responses ya'll. I'm very excited to try this Grant.

    ** Was this post helpful? Click Agree or Like below. **
    ** Did this solve your problem? Accept it as a solution! **