changing a name in ETL

I have an employee name in my raw data shown as >>>

last_name, First_name middle_initial

Doe, John H.

or sometimes shown as >>>

Doe, John Dorian H.

How can I make it so it would appear as >>>

John Doe

or

John Dorian Doe

in DOMO using ETL or beastmode?

Thanks

Best Answer

  • GrantSmith
    GrantSmith Coach
    Answer ✓

    Ah, your data has some other cases than the original one. It's not capturing some of them because it's not configured in the same format.

    Try something like this:

    REGEXP_REPLACE(`Name`, '^([^,]+), (\w+)(( \w+\.)?)|( \w+)$', '$2$5 $1')
    

    Breakdown:

    Group 1: Last Name

    ^([^,]+)
    

    This captures () one or more + characters which are not commas [^,] at the start of the string ^


    Group 2: First Name

    , (\w+)
    

    This captures () one or more + word characters \w following a comma and space


    Groups 3&4: Suffixes / Middle Initials - we don't use these but still need to capture them to tell it to ignore them.

    (( \w+\.)?)
    

    Captures space followed by 1 or more word characters followed by a period \. This can happen 0 or 1 times ? - This will capture the abbreviations at the end of the name


    Group 5: Additional middle name

    |( \w+)$
    

    OR | space followed by 1 or more word characters at the end of the string $

    $2$5 $1
    

    This tells the regexp_replace function to replace everything it matched with the text that was matched in groups 2, 5 and 1 - Group 5 captures the space if there's a second name, otherwise if nothing is captured it won't add an extra space which is why there isn't a space between $2 and $5.



    Regular expressions can be a bit difficult to master but are really powerful I recommend playing around with them on websites like regex101.com to learn how they work in more detail.

    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**

Answers

  • You could use a regular expression in a formula tile with something like this:

    REGEXP_REPLACE(`Name`, '^(\w+), (\w+) ((\w+) )?\w+.*$', '$2 $4 $1')
    


    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**
  • Hi GrantSmith


    Thank you for this.


    But its weird because, it works for some and not for others?

    and its also not consistently capturing the second name of the ones that have second names.


    Thanks

  • I would use the split_part function in a formula tile. It will look for the comma and split accordingly.

    Last Name

    SPLIT_PART(`employeename`, ',' ,1) 
    

    First Name and anything else

    SPLIT_PART(`employeename`, ',' , 2)
    
    **Check out my Domo Tips & Tricks Videos

    **Make sure to <3 any users posts that helped you.
    **Please mark as accepted the ones who solved your issue.
  • GrantSmith
    GrantSmith Coach
    Answer ✓

    Ah, your data has some other cases than the original one. It's not capturing some of them because it's not configured in the same format.

    Try something like this:

    REGEXP_REPLACE(`Name`, '^([^,]+), (\w+)(( \w+\.)?)|( \w+)$', '$2$5 $1')
    

    Breakdown:

    Group 1: Last Name

    ^([^,]+)
    

    This captures () one or more + characters which are not commas [^,] at the start of the string ^


    Group 2: First Name

    , (\w+)
    

    This captures () one or more + word characters \w following a comma and space


    Groups 3&4: Suffixes / Middle Initials - we don't use these but still need to capture them to tell it to ignore them.

    (( \w+\.)?)
    

    Captures space followed by 1 or more word characters followed by a period \. This can happen 0 or 1 times ? - This will capture the abbreviations at the end of the name


    Group 5: Additional middle name

    |( \w+)$
    

    OR | space followed by 1 or more word characters at the end of the string $

    $2$5 $1
    

    This tells the regexp_replace function to replace everything it matched with the text that was matched in groups 2, 5 and 1 - Group 5 captures the space if there's a second name, otherwise if nothing is captured it won't add an extra space which is why there isn't a space between $2 and $5.



    Regular expressions can be a bit difficult to master but are really powerful I recommend playing around with them on websites like regex101.com to learn how they work in more detail.

    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**