Separating words in a column where certain words have more spaces than others

I have a column in a dataset of production data. The column is formatted as follows:
Production |
---|
Cherry Tomato 12x1pt - Packed |
Beefsteak 15 LB - Restack |
Mini Sweet Peppers 14x4 FM - Packed |
I'm wondering if there is a way to pull the "12x1pt", "15 LB", and "14x4 FM" into a column for "Size". Given the varying amount of whitespace, it isn't an easy task with the 'Split Column' plate in the ETL. I feel like this may require RegEx, which I have no experience with.
Thank you in advance.
Answers
-
Will it always be one or more digits followed by x and then one or more digits followed by some extra characters, or might that format change?
REGEXP_REPLACE(`Production`, '^.*([0-9]+x[0-9]+[^ ]*).*$', '$1')
This regex follows that pattern: it looks for one or more digits preceding the x character, followed by one or more digits and, optionally, any number of non-space characters.
-
@GrantSmith Would I use a Formula plate for that, or would I place it into a "Custom" delimiter within the "Split Column" plate? Also, no, they won't always have an x. Some will have whitespace between a number and "LB", "PK", or "PT".
-
You'd use a formula tile to calculate a new field in Magic ETL.
If your end delimiter is ' - ' then we can tweak it to be something like:
REGEXP_REPLACE(`Production`, '^.*([0-9]+.*) - .*$', '$1')
This simplified version looks for digits followed by any number of characters until it finds the space-hyphen-space (' - ') delimiter.
Will there be other numbers in your product names or just the amounts like you have shown in your examples?
-
@GrantSmith This seems to be removing anything before the x.
"Mini Sweet Peppers 14x4 FM - Packed" comes out as just "4 FM"
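The truncation reported here is consistent with the leading `.*` being greedy: it consumes as much as it can, so the capture group ends up starting at the last digit before the ' - ' delimiter. A minimal sketch of that behavior, using Python's `re` module as a stand-in (an assumption; the regex engine behind the Domo formula may differ in edge cases, and `\1` here plays the role of `$1`). The lazy variant `^.*?` and the helper name `extract` are illustrative, not taken from the thread's formula at this point.

```python
import re

# Sample values from the original question.
SAMPLES = [
    "Cherry Tomato 12x1pt - Packed",
    "Beefsteak 15 LB - Restack",
    "Mini Sweet Peppers 14x4 FM - Packed",
]

# Pattern as posted: the leading .* is greedy.
GREEDY = r'^.*([0-9]+.*) - .*$'
# Illustrative variant: .*? is lazy, so the group starts at the first digit.
LAZY = r'^.*?([0-9]+.*) - .*$'


def extract(pattern, text):
    """Replace the whole string with capture group 1 (hypothetical helper)."""
    return re.sub(pattern, r'\1', text)


for sample in SAMPLES:
    print(f"{sample!r}: greedy={extract(GREEDY, sample)!r}, lazy={extract(LAZY, sample)!r}")
```

With the greedy pattern, the third sample yields "4 FM"; with the lazy one it yields "14x4 FM".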
-
REGEXP_REPLACE(`PRODUCT`, '^.*?([0-9]+[^ -]*\s.*?)(?= - ).*$', '$1')
This works; however, it then doesn't work for something like "Cocktail 16X14OZ - Packed". Looks like it's not going to be doable due to the variability of how the original column is formatted.
-
Sorry, trying to write regexes on mobile is not a good idea :D
REGEXP_REPLACE(`PRODUCT`, '^.*?([0-9]+[^ -]*\s?.*?)(?= - ).*$', '$1')
Adding a ? after your \s makes the space optional, so it then also works with your examples of "Cocktail 16X14OZ - Packed" and "Cherry Tomato 12x1pt - Packed".
Also, I use regex101.com to build and test my regular expressions, which may be helpful.
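For anyone who wants to check the two patterns side by side outside of Magic ETL, here is a small sketch using Python's `re` module as a stand-in (an assumption; the engine behind the Domo formula may behave differently in edge cases, and Python writes the replacement as `\1` rather than `$1`). The helper name `size_of` is made up for illustration.

```python
import re

# Pattern with a required space after the first token (the earlier attempt).
REQUIRED_SPACE = r'^.*?([0-9]+[^ -]*\s.*?)(?= - ).*$'
# Pattern with the space made optional via \s? (the version in this post).
OPTIONAL_SPACE = r'^.*?([0-9]+[^ -]*\s?.*?)(?= - ).*$'

SAMPLES = [
    "Cherry Tomato 12x1pt - Packed",
    "Beefsteak 15 LB - Restack",
    "Mini Sweet Peppers 14x4 FM - Packed",
    "Cocktail 16X14OZ - Packed",
]


def size_of(pattern, text):
    """Return capture group 1; re.sub leaves the text unchanged if nothing matches."""
    return re.sub(pattern, r'\1', text)


for sample in SAMPLES:
    print(f"{sample!r} -> {size_of(OPTIONAL_SPACE, sample)!r}")
```

In Python, the required-space pattern does not match "Cocktail 16X14OZ - Packed" at all, so the value comes back unchanged, while the optional-space pattern returns "16X14OZ". Note that both patterns assume the size starts at the first digit in the value, so a product name containing its own number would need a different approach.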
-
@GrantSmith Thank You again!