How do I take a portion of a column value and push it to a new column?

Options

I have a column for campaign names that looks roughly like this

A1234567| Campaign 1 | Google Ads | ……

What I need to be able to do is read/grab all the non-space characters before the pipe and put that value in a new column called Campaign ID. In the case above, the new column for Campaign ID would contain A1234567. If the value does not start with an 'A' (or any other letter) followed by at least 1 number then I want to write 'unassigned' in the Campaign ID column. How would I do this?

Tagged:

Best Answers

  • ColemenWilson
    edited July 2023 Answer ✓
    Options

    I would recommend using the split column tile in magic ETL. https://domo-support.domo.com/s/article/360045402873?language=en_US

    It would look something like this:

    If I solved your problem, please select "yes" above

  • MichelleH
    MichelleH Coach
    edited July 2023 Answer ✓
    Options

    @dkonig If you are using MagicETL you can use the Split Column tile to separate a column based on a particular delimiter, in this case the pipe character. You can also do this using the SPLIT_PART function in a formula tile.

    Edit: @colemenwilson beat me to it!

  • GrantSmith
    GrantSmith Coach
    Answer ✓
    Options
    CASE WHEN REGEXP_LIKE(`field`, '^(A\d+).*$') THEN 
      REGEXP_REPLACE(`field`, '^(A\d+).*$', '$1')
    ELSE
    'unassigned'
    END
    

    You can use a formula tile and a regular expression to pull out the values which start with A followed by some numbers.

    ^ - Start of the line

    () - Match group to store the results found - in this case the first match group which is stored in the variable $1

    A - the literal letter A

    \d+ - One or more digits

    .* 0 or more of anything

    $ - The end of the string

    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**
  • marcel_luthi
    marcel_luthi Coach
    Answer ✓
    Options

    With your description the Regex version would seem to be the best fit. Just keep in mind that you'd need to change this since you just mentioned the campaign code might be anywhere in the text, so something like this might do the trick: ^.*\b(A\d+).*$ in this case you don't care that the code starts. You can see the comparison on matches between the two versions here:

Answers

  • ColemenWilson
    edited July 2023 Answer ✓
    Options

    I would recommend using the split column tile in magic ETL. https://domo-support.domo.com/s/article/360045402873?language=en_US

    It would look something like this:

    If I solved your problem, please select "yes" above

  • MichelleH
    MichelleH Coach
    edited July 2023 Answer ✓
    Options

    @dkonig If you are using MagicETL you can use the Split Column tile to separate a column based on a particular delimiter, in this case the pipe character. You can also do this using the SPLIT_PART function in a formula tile.

    Edit: @colemenwilson beat me to it!

  • dkonig
    dkonig Member
    Options

    @colemenwilson I completely missed there was an option for 'custom' when choosing a delimiter. I swear looked for it. But this only solves part of my problem. If the first value does not begin with a letter followed by a number then those values would also be added to the new column. Is there a way to prevent that and simply write 'unassigned' or 'unknown' in that new column?

  • dkonig
    dkonig Member
    Options

    Hmm. I think, while the Split Column tile is simple, that in my case there is a little more complexity. I think this might be a case to use the Add Formula tile? I also found out that older campaign names sometimes have the A1234567 at some other location in the campaign name. Like this

    campaign1 - Google Ads - A1234567 - Other information (not at the beginning and a dash as a separator)

    or

    campaign1 Google Ads A1234567 Other information (not at the beginning and a space as a separator)

    Thoughts on how to solve for this? I'm thinking the forumla needs to look for all instances where a string starts with "A" an is followed by a series of numbers which could be 7 or more followed by a space or something like that.

  • GrantSmith
    GrantSmith Coach
    Answer ✓
    Options
    CASE WHEN REGEXP_LIKE(`field`, '^(A\d+).*$') THEN 
      REGEXP_REPLACE(`field`, '^(A\d+).*$', '$1')
    ELSE
    'unassigned'
    END
    

    You can use a formula tile and a regular expression to pull out the values which start with A followed by some numbers.

    ^ - Start of the line

    () - Match group to store the results found - in this case the first match group which is stored in the variable $1

    A - the literal letter A

    \d+ - One or more digits

    .* 0 or more of anything

    $ - The end of the string

    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**
  • marcel_luthi
    marcel_luthi Coach
    Answer ✓
    Options

    With your description the Regex version would seem to be the best fit. Just keep in mind that you'd need to change this since you just mentioned the campaign code might be anywhere in the text, so something like this might do the trick: ^.*\b(A\d+).*$ in this case you don't care that the code starts. You can see the comparison on matches between the two versions here:

  • dkonig
    dkonig Member
    edited July 2023
    Options

    @marcel_luthi Very helpful. I was just informed today that they plan on dropping the first letter. Of course they needed to complicate this more. Can this be modified to find the string of numbers regardless of whether it starts with an A or not? And tRegex would be the regex to use in the CASE statement @GrantSmith supplied?

  • dkonig
    dkonig Member
    Options

    @GrantSmith I"m not sure I completely understand your CASE statement. This will create a new columns and put the extracted value in it? Does 'field' need to be replaced with the column/field name?

  • GrantSmith
    Options

    This would create a new field with the extracted value if it matches the logic, otherwise it's got a value of Unassigned.

    `field` needs to be replaced with the source field / column which has your campaign name data.

    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**
  • dkonig
    dkonig Member
    edited July 2023
    Options

    I think this might suffice right for a RegEx that would capture any value that starts with and A or B followed by any number of digits or just a series of digits that is t least a minimum of 5?

    ^.\b(A\d+)|(B\d+)|(\d{5,}).$

    If the it is always a minimum of 5 digits then this might be even better or perhaps there is an even shorter version of this rule?

    ^.\b(A\{5,})|(B\{5,})|(\d{5,}).$

  • dkonig
    dkonig Member
    Options

    @GrantSmith This is the case statement I finalized on with some help from ChatGPT but this does not give me the expected result of returning just the match. Instead the entire campaign name is copied. What am I missing?

    CASE WHEN REGEXP_LIKE(`name`, '(M\d{5,}|C\d{5,}|\d{5,})') THEN

    REGEXP_REPLACE(`name`, '(M\d{5,}|C\d{5,}|\d{5,})', '$1')

    ELSE

    'unassigned'

    END

  • marcel_luthi
    Options

    Not sure if the version of MySQL currently used supports it, but you could try changing the REGEXP_REPLACE(`name`, '(M\d{5,}|C\d{5,}|\d{5,})', '$1') for REGEXP_SUBSTR(`name`, '(M\d{5,}|C\d{5,}|\d{5,})') (I know not all the MySQL REGEXP functions work, but it might be worth trying)

  • dkonig
    dkonig Member
    Options

    REGEXP_SUBSTR doesn't appear to be an available function

  • GrantSmith
    Options

    This is because your regex is searching for a mapping and then replacing that with the same thing it found. Because there is no REGEXP_SUBSTING you need to match the entire string and then it'll replace the entire string with your mapping.

    You can do this with the ^ (start of string) and .* (any number of characters) and $ (end of string)

    CASE WHEN REGEXP_LIKE(`name`, '^([MC]?\d{5,}).*$') THEN
    
    REGEXP_REPLACE(`name`, '^([MC]?\d{5,}).*$', '$1')
    
    ELSE
    
    'unassigned'
    
    END
    

    I've also simplified your regular expression

    [] - Match any character in this group, in this case M or C

    []? - Match 0 or 1 of these characters in the group. This will handle M, C and no prefixes

    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**
  • dkonig
    dkonig Member
    Options

    Ah. Yes. I had to update it further based on @marcel_luthi's earlier answer where it would find the value anywhere in the campaign name and not just at the start.

    Like this: ^.\b([MC]?\d{5,}).$