Isolate email addresses from a column using Magic ETL or BeastMode

blittle
blittle Member
edited March 21 in Magic ETL

Hello, I am trying to isolate email addresses from an Information column in a Dataset. A sample of the data from the column would be something like 'someone's name [email@email.com]'

I don't necessarily need the name part, but I do need the Email Addresses, which are always bracketed in this column by '[]'.

I am using a Split Column tile in Magic ETL, and in the Choose Delimiter to Split On section I have 'Custom' selected and using the following REGEX code that I'm sure is wrong:

REGEX_EXTRACT([Information], "\[(.*?)\]", 0)

The code compiles, but the newly created column is copying all the text from the Information column (Column that's being split) exactly.

Any ideas how I can only grab the email address that's inside the brackets? I'm sure I'm missing something.

Thank you!

Best Answer

  • marcel_luthi
    marcel_luthi Coach
    Answer ✓

    REGEX can be tricky, specially when it comes to replacing, so you might want to try a couple of different options until you find one that works, here's one you can try:

    REGEXP_REPLACE(`Information`,'.*([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+).*','$1')
    

Answers

  • GrantSmith
    GrantSmith Coach
    edited March 21

    it’s not matching anything so it’s returning the original string. You can use a formula tile and REGEXP_REPLACE

    REGEXP_REPLACE(`Information`, '^.*\[([^\]+).*$', '$1')
    

    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**
  • alternatively you can split part the split part

    SPLIT_PART(SPLIT_PART(`Information`, '[',2), ']', 1)
    

    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**
  • blittle
    blittle Member

    For some reason the Formula Tile gives me an Unknown Error Occurred when I try REGEX in the formula bar. That's where I tried it first.

  • Sorry @blittle - Appears it didn't get adequately submitted. I've updated the formula in the comment above.

    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**
  • blittle
    blittle Member

    Hi Grant, Thank you for the help, I appreciate it!
    I tried the amended formula you posted, and 'validate formula' seems to like it, but when you run the preview, I am getting a new error.

    Any thoughts?

    Thanks again!

  • marcel_luthi
    marcel_luthi Coach
    Answer ✓

    REGEX can be tricky, specially when it comes to replacing, so you might want to try a couple of different options until you find one that works, here's one you can try:

    REGEXP_REPLACE(`Information`,'.*([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+).*','$1')
    

  • blittle
    blittle Member

    Wow @marcel_luthi, that's definitely closer than I've been! That did semi work. I'm terrible with REGEX, so it's all trial and error for me.
    It seems to be grabbing the email address, but only returning the last character of the person's name, the @ symbol and domain (example: S@gmail.com). So it appears to be looking at what's inside the brackets, but only grabbing a portion.

    Is it only grabbing a character count of 1 to the left of the '@' symbol?

  • blittle
    blittle Member
    edited March 22

    I think I got the following to work…

    REGEXP_REPLACE(`Information`, '.?([a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+.[a-zA-Z0-9_-]+).', '$1')
    

    It seems to be returning the full email address after an initial test, but I need to open up my filter to look at more rows.