Magic ETL

Magic ETL

Separating words in a column where certain words have more spaces than others

I have a column in a dataset of production data. The column is formatted as follows:

Production

Cherry Tomato 12x1pt - Packed

Beefsteak 15 LB - Restack

Mini Sweet Peppers 14x4 FM - Packed

I'm wondering if there is a way to pull the "12x1pt" "15 LB" and "14x4 FM" into a column for "Size". Given the varying amount of white space, it isn't an easy task through the 'Split Column' plate in the ETL. I feel like this may require RegEx of which I have no experience with.

Thank you in advance.

Welcome!

It looks like you're new here. Members get access to exclusive content, events, rewards, and more. Sign in or register to get started.
Sign In

Best Answer

  • Coach
    Answer ✓

    Sorry, trying to write regexes on mobile is not a good idea :D

    1. REGEXP_REPLACE(`PRODUCT`, '^.*?([0-9]+[^ -]*\s?.*?)(?= - ).*$', '$1')

    Adding a ? after your \s will make the space optional so it then also works with your examples of Cocktail 16X14OZ - Packed or Cherry Tomato 12x1pt - Packed

    @ChrisGaffan

    Also I utilize a website called regex101.com to build and test my regular expressions which may be helpful.

    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**

Answers

  • Will it always be one more more number followed by x and then one or more numbers followed by some extra characters or will that format possibly change?

    1. REGEXP_REPLACE(`Production`, '^.*([0-9]+x[0-9]+[^ ]*).*$', '$1')

    This regex follows that pattern where it's looking for one or more digits preceeding the x character and then followed by one or more digits and optionally any number of non space characters.

    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**
  • Member
    edited July 2024

    @GrantSmith Would i use a Formula plate for that or would I place it into a "Custom" delimiter within the "Split Column" plate? Also no, they won't always have an x. Some will be whitespace between a # and "LB" "PK" "PT"

  • You'd use a formula tile to calculate a new field in Magic ETL.

    If your end delimiter is ' - ' then we can tweak it to be something like:

    1. REGEXP_REPLACE(`Production`, '^.*([0-9]+.*) - .*$', '$1')

    This simplified version looks for numbers followed by any number of characters until it finds a space-space delimiter.

    Will there be other numbers in your product names or just the amounts like you have shown in your examples?

    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**
  • @GrantSmith This seems to be removing anything before the x.

    "Mini Sweet Peppers 14x4 FM - Packed" comes out as just "4 FM"

  • Member
    edited July 2024
    1. REGEXP_REPLACE(`PRODUCT`, '^.*?([0-9]+[^ -]*\s.*?)(?= - ).*$', '$1')

    This works, however it then doesn't work for something like "Cocktail 16X14OZ - Packed". Looks like it's not going to be doable due to the variability of how the original column is formatted.

    @GrantSmith

  • Coach
    Answer ✓

    Sorry, trying to write regexes on mobile is not a good idea :D

    1. REGEXP_REPLACE(`PRODUCT`, '^.*?([0-9]+[^ -]*\s?.*?)(?= - ).*$', '$1')

    Adding a ? after your \s will make the space optional so it then also works with your examples of Cocktail 16X14OZ - Packed or Cherry Tomato 12x1pt - Packed

    @ChrisGaffan

    Also I utilize a website called regex101.com to build and test my regular expressions which may be helpful.

    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**
  • @GrantSmith Thank You again!

Welcome!

It looks like you're new here. Members get access to exclusive content, events, rewards, and more. Sign in or register to get started.
Sign In