Does anyone know a Regex that only accepts a segment with a number before 'k' and nothing after?

Does anyone know how to create a Regex that only accepts a segment with a number before a 'k' and nothing after? Or it can be a regex that replaces everything that doesn't have a number before 'k' and nothing after with an empty string.

I am looking to extract the circ value for our email campaigns but some rows have a circ value while some don't. I tried making a case statement to only take data if the segment has a number and the letter k but some campaigns are named 'back4back' which includes a number and a k.

Is there a way to create a regex formula to only take the segments that have a number right before the letter 'k' and nothing after?
example is: 20k, 5700k, etc.

Tagged:

Best Answer

  • michiko
    michiko Contributor
    Answer ✓

    Not too good with regex but how about this 2 step process?

Answers

  • '^(\d+)k$'
    

    ^ - Start of the string

    \d - Digits

    + - one more more

    k - The character 'k'

    $ - End of the string

    If you want to plug it into just the number value:

    REGEXP_REPLACE(`column`, '^(\d+)k$', '$1')
    

    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**
  • Hi @GrantSmith , thank you for your help! I appreciate it.
    Unfortunately, that formula didn't work for both the regex and the formula tile. Do you have any other recommendations?

  • To make sure I'm understanding your request - what should the output values be for your Circ Value column after applying the regex?

    What about the value of campgear1k? How should that be processed?

    Value 504?

    Value karaoke?

    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**
  • Anna_Otake
    Anna_Otake Member
    edited May 2023

    @GrantSmith this is a demo example that I made just to test things out but essentially, I want the output value to be something like 110,000, 20,000, etc. I want to extract the number + 'k' and then multiply or replace that k with 1000.

    This means I want to take out anything like 99kelly or 3karaokegirls or ask50people. I hope that makes sense!

  • Thanks for the clarification - how is the formula not working? Is it because the original values aren't null or are the fields you're expecting to be converted not being converted to numbers properly?

    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**
  • Here is a non-regex solution you can do in Magic ETL with the formula tile.

    CASE when INSTR(fieldname,'k') > 1 
    

    THEN
    STR_DIGITS(LEFT(fieldname,INSTR(fieldname,'k')-1)) * 1000
    END

    The INSTR() function will find the character position of the letter k. If it finds it then it will do the work in the WHEN statement.

    In the WHEN statement, It will take the LEFT number of characters based on the INSTR function minus 1. The STR_DIGITS() function will remove all non-numeric characters from that string. It will then multiply that by 1000.

    You may need to further tweak this, but this should get you pretty close.

    **Check out my Domo Tips & Tricks Videos

    **Make sure to <3 any users posts that helped you.
    **Please mark as accepted the ones who solved your issue.
  • Thank you @GrantSmith and @MarkSnodgrass!

    @MarkSnodgrass, using your formula, I was able to get it pretty close but your formula is still catching anything that has a 'k' followed by a number. On top of this formula you provided, is there a solution to exclude all data that has any number of letters after the 'k'?

  • Try this in a formula tile:

    CASE WHEN REGEXP_LIKE(`EmailName`, '^.*(\d+)k$') THEN REGEXP_REPLACE(`EmailName`, '^.*(\d+)k$', '$1') * 1000 END
    

    It's surrounding it in a case statement to make sure the field will match your regular expression if it does then format it otherwise it'll return null.

    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**
  • Thanks @GrantSmith but unfortunately, only one row came back with a numerical value, everything else is null :(

  • Do you have any trailing whitespace in your email names? You could wrap your column name in a TRIM function in the formula.

    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**
  • Anna_Otake
    Anna_Otake Member
    edited May 2023

    Hmm seems like it didn't work. Before this formula tile, I have a Text Formatting tile to make everything into a lowercase, then a String Operations tile to trim some spaces, then a formula tile to identify the last end of the EmailName to name it 'Circ Value'. Do you have any idea why this formula is deleting some of the values in this new column 'Circ Value'? Without the TRIM(), the only value that shows up is that 1000.0 for EmailName '050223_en_we_uk_pendants_campgear1k'

  • michiko
    michiko Contributor
    Answer ✓

    Not too good with regex but how about this 2 step process?

  • Thank you so much for your help @michiko ! It worked :D

    I removed my

    • SPLIT_PART(EmailName,'',LENGTH(EmailName) - LENGTH(REPLACE(EmailName,'','')) + 1)

    formula and applied your two formulas to my tile and it worked perfectly! I appreciate your help and thank you @GrantSmith ad @MarkSnodgrass for joining your brain on this problem as well!