Duplicate value handling in ETL
I have a column with around 34 values that are duplicated on one line that is supposed to be for two people. See picture for example. Ideally, I'd like a single formula that can tell where the break should be, so I don't have to write 34 lines of code by hand and keep adding lines whenever a new duplicate value shows up. Right now my formula in the ETL tile looks like this: "...WHEN 'Parent Lead Source' = 'OtherMagazine' then 'Other' WHEN..." and so on. I only need the first part of each string; in my example, I only need 'Other'. I'm sure there has to be a better way...
Best Answers
-
You could use Magic ETL and a regular expression to mark the break in your data and then split on it.
SPLIT_PART(REGEXP_REPLACE(`Parent Lead Source`, '([a-z])([A-Z])', '$1::$2'), '::', 1)
This finds each place where a lowercase letter is immediately followed by an uppercase letter, injects '::' between them, and then uses SPLIT_PART to keep only the first piece.
Alternatively, you could do a pure regex solution like:
REGEXP_REPLACE(`Parent Lead Source`, '^(.*?[a-z])[A-Z].*$', '$1')
Here the capture group runs up to the first lowercase letter that is immediately followed by an uppercase letter, and everything after it is dropped. Values with no such break are returned unchanged.
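If you want to sanity-check the two approaches outside Domo before putting them in a formula tile, a rough Python sketch like the one below may help. It is not Domo code: Python's re module writes backreferences as \1 rather than $1, the function names are made up for illustration, and every sample value other than 'OtherMagazine' is hypothetical. The pure-regex variant here assumes a lazy `.*?` inside the capture group so the group can span the whole first part.

```python
import re

def first_part_split(value: str) -> str:
    """Mirror of the SPLIT_PART idea: mark each lowercase-to-uppercase
    boundary with '::' and keep only the first piece."""
    marked = re.sub(r'([a-z])([A-Z])', r'\1::\2', value)
    return marked.split('::')[0]

def first_part_regex(value: str) -> str:
    """Pure-regex idea: capture up to the first lowercase letter that is
    followed by an uppercase letter, and drop everything after it."""
    return re.sub(r'^(.*?[a-z])[A-Z].*$', r'\1', value)

# 'OtherMagazine' comes from the question; the rest are hypothetical.
samples = ['OtherMagazine', 'Web SearchInternet', 'ReferralReferral', 'Other']
for s in samples:
    print(s, '->', first_part_split(s), '|', first_part_regex(s))
```

Both functions return the value unchanged when there is no lowercase-to-uppercase boundary, so single-word values such as 'Other' pass through untouched.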
-
It doesn't appear that you have a lot of consistency to go off of since some of your first words have spaces and sometimes your first word is also the second word. Are the words at the end consistently the same? If so, you could look at the end of the string using the RIGHT() function and if it equals Referral or Internet, then you could use the REPLACE function to replace that word with nothing.
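The suffix idea can be sketched in Python to see how it behaves on edge cases. This is an illustration, not a Domo formula: the suffix list contains only the two words mentioned above, the function name is made up, and the sample values are hypothetical. One thing to watch is that a plain replace removes every occurrence of the word, which would blank out a value where the first word is also the second word, so this sketch trims only the trailing copy.

```python
# Only the two endings named in the thread; extend as needed (assumption).
KNOWN_SUFFIXES = ("Referral", "Internet")

def strip_known_suffix(value: str) -> str:
    """Remove one trailing known suffix, leaving the front of the string intact."""
    for suffix in KNOWN_SUFFIXES:
        # Trim only when something remains in front of the suffix,
        # so a value that is just 'Referral' is not emptied.
        if value.endswith(suffix) and len(value) > len(suffix):
            return value[: -len(suffix)]
    return value

# Hypothetical sample values for illustration.
for s in ["ReferralReferral", "Web SearchInternet", "Referral", "OtherMagazine"]:
    print(s, "->", strip_known_suffix(s))
```

Values that do not end in a known suffix, such as 'OtherMagazine', are left alone and would still need the regex approach or a manual rule.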
Answers
-
That's a good point; I could do that for the instances where that occurs. Thanks, Mark.