Does anyone know a Regex that only accepts a segment with a number before 'k' and nothing after?

Options
Anna_Otake
Anna_Otake Contributor

Does anyone know how to create a Regex that only accepts a segment with a number before a 'k' and nothing after? Or it can be a regex that replaces everything that doesn't have a number before 'k' and nothing after with an empty string.

I am looking to extract the circ value for our email campaigns but some rows have a circ value while some don't. I tried making a case statement to only take data if the segment has a number and the letter k but some campaigns are named 'back4back' which includes a number and a k.

Is there a way to create a regex formula to only take the segments that have a number right before the letter 'k' and nothing after?
example is: 20k, 5700k, etc.

Tagged:

Best Answer

  • michiko
    michiko Contributor
    Answer ✓
    Options

    Not too good with regex but how about this 2 step process?

Answers

  • GrantSmith
    Options
    '^(\d+)k$'
    

    ^ - Start of the string

    \d - Digits

    + - one more more

    k - The character 'k'

    $ - End of the string

    If you want to plug it into just the number value:

    REGEXP_REPLACE(`column`, '^(\d+)k$', '$1')
    

    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**
  • Anna_Otake
    Anna_Otake Contributor
    Options

    Hi @GrantSmith , thank you for your help! I appreciate it.
    Unfortunately, that formula didn't work for both the regex and the formula tile. Do you have any other recommendations?

  • GrantSmith
    Options

    To make sure I'm understanding your request - what should the output values be for your Circ Value column after applying the regex?

    What about the value of campgear1k? How should that be processed?

    Value 504?

    Value karaoke?

    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**
  • Anna_Otake
    Anna_Otake Contributor
    edited May 2023
    Options

    @GrantSmith this is a demo example that I made just to test things out but essentially, I want the output value to be something like 110,000, 20,000, etc. I want to extract the number + 'k' and then multiply or replace that k with 1000.

    This means I want to take out anything like 99kelly or 3karaokegirls or ask50people. I hope that makes sense!

  • GrantSmith
    Options

    Thanks for the clarification - how is the formula not working? Is it because the original values aren't null or are the fields you're expecting to be converted not being converted to numbers properly?

    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**
  • MarkSnodgrass
    Options

    Here is a non-regex solution you can do in Magic ETL with the formula tile.

    CASE when INSTR(fieldname,'k') > 1 
    

    THEN
    STR_DIGITS(LEFT(fieldname,INSTR(fieldname,'k')-1)) * 1000
    END

    The INSTR() function will find the character position of the letter k. If it finds it then it will do the work in the WHEN statement.

    In the WHEN statement, It will take the LEFT number of characters based on the INSTR function minus 1. The STR_DIGITS() function will remove all non-numeric characters from that string. It will then multiply that by 1000.

    You may need to further tweak this, but this should get you pretty close.

    **Check out my Domo Tips & Tricks Videos

    **Make sure to <3 any users posts that helped you.
    **Please mark as accepted the ones who solved your issue.
  • Anna_Otake
    Anna_Otake Contributor
    Options

    Thank you @GrantSmith and @MarkSnodgrass!

    @MarkSnodgrass, using your formula, I was able to get it pretty close but your formula is still catching anything that has a 'k' followed by a number. On top of this formula you provided, is there a solution to exclude all data that has any number of letters after the 'k'?

  • GrantSmith
    Options

    Try this in a formula tile:

    CASE WHEN REGEXP_LIKE(`EmailName`, '^.*(\d+)k$') THEN REGEXP_REPLACE(`EmailName`, '^.*(\d+)k$', '$1') * 1000 END
    

    It's surrounding it in a case statement to make sure the field will match your regular expression if it does then format it otherwise it'll return null.

    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**
  • Anna_Otake
    Anna_Otake Contributor
    Options

    Thanks @GrantSmith but unfortunately, only one row came back with a numerical value, everything else is null :(

  • GrantSmith
    Options

    Do you have any trailing whitespace in your email names? You could wrap your column name in a TRIM function in the formula.

    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**
  • Anna_Otake
    Anna_Otake Contributor
    edited May 2023
    Options

    Hmm seems like it didn't work. Before this formula tile, I have a Text Formatting tile to make everything into a lowercase, then a String Operations tile to trim some spaces, then a formula tile to identify the last end of the EmailName to name it 'Circ Value'. Do you have any idea why this formula is deleting some of the values in this new column 'Circ Value'? Without the TRIM(), the only value that shows up is that 1000.0 for EmailName '050223_en_we_uk_pendants_campgear1k'

  • michiko
    michiko Contributor
    Answer ✓
    Options

    Not too good with regex but how about this 2 step process?

  • Anna_Otake
    Anna_Otake Contributor
    Options

    Thank you so much for your help @michiko ! It worked :D

    I removed my

    • SPLIT_PART(EmailName,'',LENGTH(EmailName) - LENGTH(REPLACE(EmailName,'','')) + 1)

    formula and applied your two formulas to my tile and it worked perfectly! I appreciate your help and thank you @GrantSmith ad @MarkSnodgrass for joining your brain on this problem as well!