Replacing Text in ETL

In my dataset, data can be entered in various formats, leading to multiple representations for the same individual data point. I'm utilizing the 'Replace Text' feature to standardize these names under one common identifier in my recursive ETL. However, this isn't replacing the existing names as intended. Instead, it's adding the last word of the text and creating a new name in my dataset. I have attached a picture of what is happing.

What I want:

What it is formatted as in some cases:

Is there a specific setting I need to click in order for this not to happen? Or maybe move the Replace Text tile to a new location within my recursive ETL?

Best Answers

  • ColemenWilson
    Answer ✓

    I would recommend using the formula tile and using a case statement. Something like this:

    CASE WHEN `city` IN('hon, HI','HI','Hawaii','Whatever else') THEN 'Honolulu, HI' ELSE `city` END

    If I solved your problem, please select "yes" above

  • MichelleH
    MichelleH Coach
    Answer ✓

    @TC1199A You could also use the following in the formula tile to only keep the characters before the second comma, if there are multiple:

    IFNULL(left(`city`,SUBSTRING_INDEX(`city`,',',2)-1),`city`)
    

Answers

  • ColemenWilson
    Answer ✓

    I would recommend using the formula tile and using a case statement. Something like this:

    CASE WHEN `city` IN('hon, HI','HI','Hawaii','Whatever else') THEN 'Honolulu, HI' ELSE `city` END

    If I solved your problem, please select "yes" above

  • @ColemenWilson I will give this a try. would you recommend this for all text replacement? I have some other fields where i am seeing the same issue.

  • I generally use the formula tile over text replace in most applications. You have many more options for text replace because you have the full suite of MySQL text modification available to you. https://dev.mysql.com/doc/refman/8.0/en/string-functions.html

    If I solved your problem, please select "yes" above

  • Can you post some examples of your raw data before your replace text and also what your replace text tile settings look like? There may be a more programatic way of solving your problem with a regular expression.

    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**
  • MichelleH
    MichelleH Coach
    Answer ✓

    @TC1199A You could also use the following in the formula tile to only keep the characters before the second comma, if there are multiple:

    IFNULL(left(`city`,SUBSTRING_INDEX(`city`,',',2)-1),`city`)