Using Magic ETL to extract a substring using Regex and excluding remaining strings

Options

Hello!

I know variations of this question have been asked in the forums but I haven't been able to find a solution that works for me.

I have a column Project Name that contains Projects out of Financial Force that may contain an At-Risk ticket number, which is formatted as: AR-00000 i.e. literally "AR-" or "AR - " followed by a 5 digit number. For example: AR-12345 Pinegrove Building Supply and Contractors with Consultation Hours. I want to preserve the AR number and delete the remaining strings. The Regex I've written and played with is \b(AR-\d{5})\b

I have tried using MagicETL's "Formula" and "Replace Text" (with the Use Regex flag enabled) tiles but have been unable to preserve the AR ticket number while also dropping the remaining text.

One of my configuration attempts is as follows:

Replace Text Tile
Configuration 1 (Column to search): Project Name
Config 2 (search term with Use Regex enabled): .*\b(AR-\d{5})\b
Config 3 (replace term): Empty string
Result (unchanged): AR-12345 Pinegrove Building Supply and Contractors with Consultation Hours

Can you please advise a solution?

Tagged:

Best Answer

  • GrantSmith
    GrantSmith Coach
    edited July 2023 Answer ✓
    Options

    You'd need to do it in a formula tile instead of the replace text tile with something like this:

    CASE WHEN REGEXP_REPLACE(`Project Name`,'^.*\b(AR-\d{5})\b.*$','$1') <> `Project Name` THEN REGEXP_REPLACE(`Project Name`,'^.*\b(AR-\d{5})\b.*$','$1') END
    

    or alternatively:

    CASE WHEN REGEXP_LIKE(`Project Name`,'^.*\b(AR-\d{5})\b.*$','$1') THEN REGEXP_REPLACE(`Project Name`,'^.*\b(AR-\d{5})\b.*$','$1') END
    

    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**

Answers

  • GrantSmith
    Options

    You'd need to do a replacement variable and tell it to ignore the rest of the string

    Try changing your regex to:

    ^.*\b(AR-\d{5})\b.*$
    

    Then in your replace term use $1 to replace it with your first match group.

    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**
  • DashboardDude
    edited July 2023
    Options

    John Le

    You're only one dashboard away.

    Click here for more video solutions: https://www.dashboarddudes.com/pantry

  • NDoyle
    NDoyle Member
    Options

    Thanks Grant. I updated the following:

    Replace Text tile:
    Config (Column to search): Project Name
    Config 2 (search term with Use Regex enabled): ^.\b(AR-\d{5})\b.$
    Config 3 (replace term): $1

    This results in:

    For rows that do contain an AR number, the result is just the AR number, which is expected. For rows that do not contain an AR number, the Project Name value is preserved but I want it to return null or blank or empty.

    Add Formula tile:
    Config (Name output column): AR num
    Config 2 (formula): REGEXP_REPLACE(`Project Name`,'^.\b(AR-\d{5})\b.*$','$1')

    This results in:

    A new AR num column, which is fine. For rows that do contain an AR number, the result is just the AR number, which is expected. For rows that do not contain an AR number, the Project Name value is preserved but I want it to return null or blank or empty.

    Is there a way to return null when the "parent string" doesn't contain a match?

  • GrantSmith
    GrantSmith Coach
    edited July 2023 Answer ✓
    Options

    You'd need to do it in a formula tile instead of the replace text tile with something like this:

    CASE WHEN REGEXP_REPLACE(`Project Name`,'^.*\b(AR-\d{5})\b.*$','$1') <> `Project Name` THEN REGEXP_REPLACE(`Project Name`,'^.*\b(AR-\d{5})\b.*$','$1') END
    

    or alternatively:

    CASE WHEN REGEXP_LIKE(`Project Name`,'^.*\b(AR-\d{5})\b.*$','$1') THEN REGEXP_REPLACE(`Project Name`,'^.*\b(AR-\d{5})\b.*$','$1') END
    

    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**
  • NDoyle
    NDoyle Member
    Options

    Thanks @DashboardDude for the workaround. This made total sense as an alternative solution! I got stuck in a box and, due to stubbornness, didn't consider a non-Regex solution haha.

  • DashboardDude
    Options

    @NDoyle ,

    Glad it helped! I know the stubbornness route well so wanted to save you a headache.

    Please mark my solution as helpful when you can

    John Le

    You're only one dashboard away.

    Click here for more video solutions: https://www.dashboarddudes.com/pantry