Separating words in a column where certain words have more spaces than others
I have a column in a dataset of production data. The column is formatted as follows:
| Production |
|---|
| Cherry Tomato 12x1pt - Packed |
| Beefsteak 15 LB - Restack |
| Mini Sweet Peppers 14x4 FM - Packed |
I'm wondering if there is a way to pull "12x1pt", "15 LB", and "14x4 FM" into a separate "Size" column. Because the amount of whitespace varies, this isn't easy to do with the 'Split Column' tile in the ETL. I suspect it requires RegEx, which I have no experience with.
Thank you in advance.
Answers
-
Will it always be one or more digits followed by an x, then one or more digits followed by some extra characters, or might that format change?
REGEXP_REPLACE(`Production`, '^.*([0-9]+x[0-9]+[^ ]*).*$', '$1')
This regex follows that pattern: it looks for one or more digits preceding the x character, followed by one or more digits and, optionally, any number of non-space characters.
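To see which rows have the digits-x-digits shape described above, here is a minimal sketch that runs only the inner capture pattern against the three sample rows from the question. It uses Python's `re` module as a stand-in for testing outside Domo; the helper name `find_x_size` is hypothetical, and Domo's regex engine may behave differently in edge cases.

```python
import re

# Core of the capture group only: digits, a lowercase "x", digits,
# then any run of non-space characters.
X_SIZE = re.compile(r'[0-9]+x[0-9]+[^ ]*')

def find_x_size(text):
    """Return the first digits-x-digits token in text, or None if absent."""
    match = X_SIZE.search(text)
    return match.group(0) if match else None

samples = [
    "Cherry Tomato 12x1pt - Packed",
    "Beefsteak 15 LB - Restack",
    "Mini Sweet Peppers 14x4 FM - Packed",
]

for sample in samples:
    print(f"{sample!r} -> {find_x_size(sample)!r}")
```

In this check, rows without an "x" between the digits (such as "Beefsteak 15 LB - Restack") produce no match at all, and the trailing unit after a space ("FM") is not part of the token.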
-
@GrantSmith Would I use a Formula tile for that, or would I place it into a "Custom" delimiter within the "Split Column" tile? Also, no, they won't always have an x. Some will have whitespace between a number and "LB", "PK", or "PT".
-
You'd use a formula tile to calculate a new field in Magic ETL.
If your end delimiter is ' - ' then we can tweak it to be something like:
REGEXP_REPLACE(`Production`, '^.*([0-9]+.*) - .*$', '$1')
This simplified version looks for digits followed by any number of characters until it finds the space-hyphen-space delimiter.
Will there be other numbers in your product names or just the amounts like you have shown in your examples?
-
@GrantSmith This seems to be removing anything before the x.
"Mini Sweet Peppers 14x4 FM - Packed" comes out as just "4 FM"
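The "4 FM" result is consistent with greedy matching: the leading `.*` consumes as much as it can, so the capture group is left with only the last digit before the delimiter. The sketch below reproduces this with Python's `re` module (an assumption: Domo's engine is not Python, and `\1` stands in for Domo's `$1`) and compares it with a lazy `.*?` prefix. The function names are hypothetical.

```python
import re

# Pattern as posted: the leading .* is greedy.
GREEDY = re.compile(r'^.*([0-9]+.*) - .*$')
# Same pattern with a lazy prefix, so the group starts at the first digit.
LAZY = re.compile(r'^.*?([0-9]+.*) - .*$')

def greedy_size(text):
    """Replace the whole string with group 1, using the greedy prefix."""
    return GREEDY.sub(r'\1', text)

def lazy_size(text):
    """Replace the whole string with group 1, using the lazy prefix."""
    return LAZY.sub(r'\1', text)

row = "Mini Sweet Peppers 14x4 FM - Packed"
print(greedy_size(row))  # group starts at the last digit
print(lazy_size(row))    # group starts at the first digit
```

A lazy prefix only helps if the first digit in the product name is always the start of the size, which is why the question about other numbers in the names matters.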
-
REGEXP_REPLACE(`PRODUCT`, '^.*?([0-9]+[^ -]*\s.*?)(?= - ).*$', '$1')
This works, however it then doesn't work for something like "Cocktail 16X14OZ - Packed". Looks like it's not going to be doable due to the variability of how the original column is formatted.
-
Sorry, trying to write regexes on mobile is not a good idea :D
REGEXP_REPLACE(`PRODUCT`, '^.*?([0-9]+[^ -]*\s?.*?)(?= - ).*$', '$1')
Adding a ? after your \s makes the space optional, so it then also works with your examples of "Cocktail 16X14OZ - Packed" and "Cherry Tomato 12x1pt - Packed".
I also use regex101.com to build and test my regular expressions, which may be helpful.
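As a quick way to check the pattern against all of the examples in this thread, here is a minimal sketch in Python. It assumes Python's `re` module is a close enough stand-in for the engine behind Domo's `REGEXP_REPLACE`, uses `\1` where the Domo formula uses `$1`, and the helper name `extract_size` is hypothetical.

```python
import re

# Pattern from the formula above: lazy prefix, digits, non-space/non-hyphen
# characters, an optional space, then as little as possible up to " - ".
SIZE_PATTERN = re.compile(r'^.*?([0-9]+[^ -]*\s?.*?)(?= - ).*$')

def extract_size(production):
    """Return the size portion; input that does not match is returned unchanged."""
    return SIZE_PATTERN.sub(r'\1', production)

samples = [
    "Cherry Tomato 12x1pt - Packed",
    "Beefsteak 15 LB - Restack",
    "Mini Sweet Peppers 14x4 FM - Packed",
    "Cocktail 16X14OZ - Packed",
]

for sample in samples:
    print(f"{sample!r} -> {extract_size(sample)!r}")
```

Rows that do not match the pattern come back unchanged in this sketch, so any value that still contains " - " after the replacement is worth inspecting by hand.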
-
@GrantSmith Thank You again!