Using regexp for pulling multiple values out of a text string

cmoreno
cmoreno Member
edited April 2024 in Magic ETL

Hello, I need some help pulling out email addresses from a text string.

My goal is to place the multiple emails listed in the 'Description' column in a separate column, where they'll be separated by ','. Is there a way for the search to end before the Sent Date info?

Thank you!

Best Answer

  • GrantSmith
    GrantSmith Coach
    Answer ✓

    Try something like this to make it a bit simpler:

    REPLACE(REGEXP_REPLACE(SPLIT_PART(SPLIT_PART(``, 'Sent:', 1), 'To: ', 2), '[^<]+<([^>]+>)', '$1'), '>', ',')
    
    SPLIT_PART(``, 'Sent:', 1)
    

    Gets everything in your string before 'Sent:' and drops everything after it

    SPLIT_PART(SPLIT_PART(``, 'Sent:', 1), 'To: ', 2)
    

    Gets the entire text in between the To: and Sent: values so you're left with just the email addresses and the aliases

    REGEXP_REPLACE(…, '[^<]+<([^>]+>)', '$1')
    

    Finds every alias followed by an email address in < > and replaces the whole match with just the captured part: the address plus its trailing >. The trailing > is kept on purpose so that the outer REPLACE can turn each one into a comma in the last step.
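
To see what each step of the formula above yields, here is a rough Python emulation. It is only a sketch, not Domo syntax: `split_part` and `extract_to_emails` are hypothetical helpers, the sample string is made up, and Python writes the backreference as `\1` where the formula uses `$1`. It assumes the 'To: ' and 'Sent:' labels each appear once in the text.

```python
import re


def split_part(text: str, delimiter: str, index: int) -> str:
    """Rough stand-in for SPLIT_PART: return the 1-based piece, or '' if missing."""
    parts = text.split(delimiter)
    return parts[index - 1] if 1 <= index <= len(parts) else ""


def extract_to_emails(description: str) -> str:
    # Step 1: keep everything before 'Sent:'.
    before_sent = split_part(description, "Sent:", 1)
    # Step 2: keep everything after 'To: ', leaving aliases and <addresses>.
    to_field = split_part(before_sent, "To: ", 2)
    # Step 3: same pattern as the formula; keeps 'address>' for each match.
    kept = re.sub(r"[^<]+<([^>]+>)", r"\1", to_field)
    # Step 4: turn each trailing '>' into a comma.
    return kept.replace(">", ",")


# Hypothetical sample value for the 'Description' column.
sample = (
    "From: Ann Lee <ann@example.com> "
    "To: Bob Ray <bob@example.com>; Cy Poe <cy@example.com> "
    "Sent: Monday"
)

result = extract_to_emails(sample)
print(repr(result))   # 'bob@example.com,cy@example.com, '

# Optional cleanup (an addition, not part of the formula above):
# drop the trailing comma and any surrounding spaces.
cleaned = result.strip(" ,")
print(cleaned)        # bob@example.com,cy@example.com
```

In this emulation the last address is also followed by a comma, because every > is replaced. If that matters for your output column, a trim step after the REPLACE may be worth adding.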

Answers

  • Are you wanting the From and To both to be included or just the To? Will the From and To fields only have a single email or would there be multiple emails?

  • @GrantSmith I'm needing just the emails that would be listed in the To field. The emails listed in To field can have multiple email addresses listed. Thank you for your help!

  • @GrantSmith It worked, thank you!