Inconsistent State Arbeviations in our Data
In SFDC_TechSupport dataflow, the state column holds inconsistent state acronyms. For example, California will be ca, CA, & cal. Is there a simple way to clean up the state data and get it to work with a United States map?
Link to SFDC_TechSupport dataset:https://rytecdoors.domo.com/datasources/dee055d6-c463-4a1e-8f70-2da011076bad/details/overview
Link to card using this data: https://rytecdoors.domo.com/page/157177905/kpis/details/339647928
Thanks!
Sarah
Best Answer
-
Hi Sarah, you will likely have to run it through a dataflow or use a calculated field (beast mode).
Dataflow/ETL:
An easy way to do it would be to get all the abbreviations from salesforce by creating a data table card from the existing salesforce dataset with the list of all the abbreviations and add an aggregated field (count or sum a field which will then show unique values for the abbreviations). Then export the table to excel (or copy paste into Google Sheets). Add the two letter state next to the abbreviation (format how you want - all uppercase for instance). Then you can either blend/fusion the excel file/google sheet (case sensitive join) to your salesforce dataset or put it through a dataflow. If you put it through a dataflow, you can use UPPER or LOWER to join so if someone puts another value in there that has different capitalization, the dataflow will catch it. Just make sure to get distinct values first (use distinct or group by in a transform if you convert state values to UPPER/LOWER, so that your data will not be duplicated... UPPER(Cal) is the same as CAL).
Calculated Field:
The other option is a calculated field. You could do a long case statement. Capitalization will matter with the specific values you are trying to match unless you use upper or lower on the field for state name. With this approach, you would really only need to identify the non-conforming state values like Cal or something spelled out.
CASE WHEN UPPER(state_name) IN ('CAL','CA','CALIFORNIA') THEN 'CA'
WHEN UPPER(state_name) IN ('OH','OHIO') THEN 'OH'
WHEN UPPER(state_name) IN ('NEW YORK','NYC') THEN 'NY'
ELSE state_name END
The other option is to have your salesforce team clean up their data, but if your company is anything like mine, it would take a long time for it to happen.
Thanks,
Josh2
Answers
-
You could certainly do a clean up via Magic ETL, thinking you would join to a static "Clean Up" Excel doc that would have all the different incorrect states
State S/B
CA CA
ca CA
cal CA
But the first thing I would do is look for other fields in your query that could help, that might be more reliable. Do you have zip code in there?
Domo Consultant
**Say "Thanks" by clicking the "heart" in the post that helped you.
**Please mark the post that solves your problem by clicking on "Accept as Solution"1 -
Hey @sdarmody
Unfoturnately, it looks like this data is being entered manually in SFDC and therefore there is some variation in how different users are entering the states. The only thing for it would be to clean up the data now that it's in Domo (or possibly have them all to go back and fix it, but I don't know if that's an option).
I would personally do this with a Beast Mode. Something like:
CASE WHEN LOWER(`State`) LIKE 'ca%' THEN 'CA'
WHEN LOWER(`State`) LIKE 'n%v%' THEN 'NV'
WHEN ...
It might be a bit on the heavy side, but it should at least help you locate the outliers and then you could move it into a DataFlow if you prefered (for consistency's sake)
Thanks,
Jake
**Say "Thanks" by clicking the "heart" in the post that helped you.
**Please mark the post that solves your problem by clicking on "Accept as Solution"1 -
Hi Sarah, you will likely have to run it through a dataflow or use a calculated field (beast mode).
Dataflow/ETL:
An easy way to do it would be to get all the abbreviations from salesforce by creating a data table card from the existing salesforce dataset with the list of all the abbreviations and add an aggregated field (count or sum a field which will then show unique values for the abbreviations). Then export the table to excel (or copy paste into Google Sheets). Add the two letter state next to the abbreviation (format how you want - all uppercase for instance). Then you can either blend/fusion the excel file/google sheet (case sensitive join) to your salesforce dataset or put it through a dataflow. If you put it through a dataflow, you can use UPPER or LOWER to join so if someone puts another value in there that has different capitalization, the dataflow will catch it. Just make sure to get distinct values first (use distinct or group by in a transform if you convert state values to UPPER/LOWER, so that your data will not be duplicated... UPPER(Cal) is the same as CAL).
Calculated Field:
The other option is a calculated field. You could do a long case statement. Capitalization will matter with the specific values you are trying to match unless you use upper or lower on the field for state name. With this approach, you would really only need to identify the non-conforming state values like Cal or something spelled out.
CASE WHEN UPPER(state_name) IN ('CAL','CA','CALIFORNIA') THEN 'CA'
WHEN UPPER(state_name) IN ('OH','OHIO') THEN 'OH'
WHEN UPPER(state_name) IN ('NEW YORK','NYC') THEN 'NY'
ELSE state_name END
The other option is to have your salesforce team clean up their data, but if your company is anything like mine, it would take a long time for it to happen.
Thanks,
Josh2 -
Good call, I'm currently looking for a good zip code field from SFDC. However, it looks like it's also inconsistent data. Still trying. If this doesn't work, I'll clean up the data by mapping current state abreviations to the desired abreviations. Thanks!
1
Categories
- All Categories
- 1.8K Product Ideas
- 1.8K Ideas Exchange
- 1.6K Connect
- 1.2K Connectors
- 300 Workbench
- 6 Cloud Amplifier
- 9 Federated
- 2.9K Transform
- 103 SQL DataFlows
- 627 Datasets
- 2.2K Magic ETL
- 3.9K Visualize
- 2.5K Charting
- 755 Beast Mode
- 61 App Studio
- 41 Variables
- 693 Automate
- 178 Apps
- 456 APIs & Domo Developer
- 49 Workflows
- 10 DomoAI
- 38 Predict
- 16 Jupyter Workspaces
- 22 R & Python Tiles
- 398 Distribute
- 115 Domo Everywhere
- 276 Scheduled Reports
- 7 Software Integrations
- 130 Manage
- 127 Governance & Security
- 8 Domo Community Gallery
- 38 Product Releases
- 11 Domo University
- 5.4K Community Forums
- 40 Getting Started
- 30 Community Member Introductions
- 110 Community Announcements
- 4.8K Archive