Separating words in a column where certain words have more spaces than others
I have a column in a dataset of production data. The column is formatted as follows:
Production |
---|
Cherry Tomato 12x1pt - Packed |
Beefsteak 15 LB - Restack |
Mini Sweet Peppers 14x4 FM - Packed |
I'm wondering if there is a way to pull "12x1pt", "15 LB", and "14x4 FM" into a "Size" column. Because the amount of whitespace varies, this isn't easy to do with the 'Split Column' tile in the ETL. I suspect it may require RegEx, which I have no experience with.
Thank you in advance.
Answers
Will it always be one or more digits followed by an x, then one or more digits followed by some extra characters, or could that format change?
REGEXP_REPLACE(`Production`, '^.*([0-9]+x[0-9]+[^ ]*).*$', '$1')
This regex follows that pattern: it looks for one or more digits preceding the x character, followed by one or more digits and then, optionally, any number of non-space characters.
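To try a pattern like this outside Domo, here is a minimal sketch using Python's `re` module. It assumes Python's engine treats these patterns the same way Domo's does, and it writes the backreference as `\1` instead of `$1`. It also exposes a side effect of the greedy `^.*` prefix: it gives back only as many characters as the rest of the pattern needs, so the capture keeps just the last digit before the x. The lazy `^.*?` variant is an illustration, not something posted in the thread.

```python
import re

# Grant's first pattern; Python writes the backreference as \1 instead of $1.
GREEDY = r"^.*([0-9]+x[0-9]+[^ ]*).*$"
# Illustrative variant (not from the thread): a lazy prefix .*? stops at the first digit.
LAZY = r"^.*?([0-9]+x[0-9]+[^ ]*).*$"

rows = [
    "Cherry Tomato 12x1pt - Packed",
    "Beefsteak 15 LB - Restack",
    "Mini Sweet Peppers 14x4 FM - Packed",
]


def extract(pattern: str, row: str) -> str:
    # re.sub returns the row unchanged when the pattern does not match.
    return re.sub(pattern, r"\1", row)


for row in rows:
    print(f"{row} -> greedy: {extract(GREEDY, row)} | lazy: {extract(LAZY, row)}")
```

The greedy version turns "12x1pt" into "2x1pt" and "14x4 FM" into "4x4". Neither version handles "15 LB", because there is no x, so that row comes back unchanged. The lazy version also stops before " FM", because `[^ ]*` cannot cross a space.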
@GrantSmith Would I use a Formula tile for that, or would I put it into a "Custom" delimiter within the "Split Column" tile? Also, no, they won't always have an x. Some will have whitespace between a number and "LB", "PK", or "PT".
You'd use a formula tile to calculate a new field in Magic ETL.
If your end delimiter is ' - ', we can tweak it to something like:
REGEXP_REPLACE(`Production`, '^.*([0-9]+.*) - .*$', '$1')
This simplified version looks for a number followed by any characters up to the ' - ' (space-dash-space) delimiter.
Will there be other numbers in your product names, or only the amounts shown in your examples?
@GrantSmith This seems to be cutting off everything up to and including the x.
"Mini Sweet Peppers 14x4 FM - Packed" comes out as just "4 FM".
REGEXP_REPLACE(`PRODUCT`, '^.*?([0-9]+[^ -]*\s.*?)(?= - ).*$', '$1')
This works; however, it doesn't work for something like "Cocktail 16X14OZ - Packed". It looks like this may not be doable, given how variably the original column is formatted.
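The failing rows have something in common: a mandatory `\s` after the size token. In "Cocktail 16X14OZ - Packed" the only space after the size is the one that belongs to " - ". `\s` consumes it, so the lookahead `(?= - )` can no longer find its leading space, and the whole pattern fails. The sketch below uses Python's `re`, which also supports `(?=...)` lookahead, on the assumption that it behaves like Domo's engine for this pattern.

```python
import re

# Pattern from the post above: \s (no ?) demands a space right after the size token.
STRICT = r"^.*?([0-9]+[^ -]*\s.*?)(?= - ).*$"

rows = [
    "Beefsteak 15 LB - Restack",
    "Mini Sweet Peppers 14x4 FM - Packed",
    "Cherry Tomato 12x1pt - Packed",
    "Cocktail 16X14OZ - Packed",
]

# re.sub returns the row unchanged when the pattern finds no match.
results = {row: re.sub(STRICT, r"\1", row) for row in rows}
for row, size in results.items():
    print(f"{row} -> {size}")
```

Sizes with a space before the unit ("15 LB", "14x4 FM") are extracted. "Cherry Tomato 12x1pt - Packed" and "Cocktail 16X14OZ - Packed" come back unchanged, for the reason described above.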
Sorry, trying to write regexes on mobile is not a good idea :D
REGEXP_REPLACE(`PRODUCT`, '^.*?([0-9]+[^ -]*\s?.*?)(?= - ).*$', '$1')
Adding a ? after your \s makes the space optional, so the pattern also works for your examples "Cocktail 16X14OZ - Packed" and "Cherry Tomato 12x1pt - Packed".
I also use regex101.com to build and test my regular expressions, which you may find helpful.
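As a quick sanity check, the accepted pattern can be run against all four sample rows to build the "Size" values. This is a minimal sketch in Python's `re` (with `\1` in place of `$1`), assuming Python's engine matches the way Domo's Formula tile does. The helper name `size_of` is hypothetical.

```python
import re

# Accepted pattern from the thread; \s? makes the space before the unit optional.
SIZE = re.compile(r"^.*?([0-9]+[^ -]*\s?.*?)(?= - ).*$")


def size_of(production: str) -> str:
    """Return the size portion, or the input unchanged if nothing matches
    (the same fallback re.sub uses)."""
    return SIZE.sub(r"\1", production)


rows = [
    "Cherry Tomato 12x1pt - Packed",
    "Beefsteak 15 LB - Restack",
    "Mini Sweet Peppers 14x4 FM - Packed",
    "Cocktail 16X14OZ - Packed",
]
for row in rows:
    print(f"{row} -> {size_of(row)}")
```

The capture still starts at the first digit in the row. A product name that contains a number before the size would therefore pull part of the name into the result, which is why the earlier question about other numbers in product names matters.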
@GrantSmith Thank you again!