Any Way to Use Regular Expressions to Change Case of Specific Words

I've got a requirement to change the appearance of articles and prepositions in words in a column to be displayed as lowercase after running a Title Case on the rest of the phrase. Right now, I have an extensive list of text replacements in Magic ETL to do this, but I was wondering if there might be a way to do this with a regular expression that just lists the words I'm looking for and only converts those to lowercase. Here's a screenshot of the mess I currently deal with:


Tagged:

Comments

  • DomoDork
    DomoDork Contributor

    Hi pstrauss,

    You can do this in a MagicETL. The best way I've found to do this is using the Python Scripting Block. In the example I put together for you, I wanted to make it simple to configure which words end up capitalized without having to reconfigure my ETL every time I want to add a new preposition. In the design pattern below, I just used a Domo Webform to store a list of my words so all you need to do is pop new values into it.

    1) There are 2 data sources in this ETL. The first is a table containing rows of articles text. The second is my dataset of prepositions.

    2) Connect both sources to a Python Scripting block and add the python code shown in the screenshot at the bottom of this post.

    3) Make sure you change any references of 'Articles' or 'Prepositions' to match your dataset names. Then, change 'ArticleText' in the code to whatever the column name is in your articles dataset.

    4) Once you make sure all dataset name and column name references match your existing datasets, run a preview and check the 'SCHEMA' tab on the Python Script block to ensure it's properly picking up the output schema.

    You'll notice a simple capitalise() function. All it does is take your prepositions dataset and convert it into a list. Then it takes your article dataset and converts the article text itself into a list. It then goes through your article text and checks to see each word is in your prepositions list. If it is, it capitalises it with the title() function. If not, it leaves that word unmodified.

    What you are left with is an output dataset where the article text has your prepositions capitalized because we've run a pandas apply() function to your article data and telling it to execute the capitalise() function on your data.

    Edit: I just realized you wanted to opposite of capitalizing, and instead lowercasing your prepositions. Thats easy enough to do by changing the "v.title() if v in list..." to "v.lower() if v in list..."

    Apologies for the mix up! This should still get you 99% of the way there. I hope this helps! :)

    Python Script

    Articles Dataset

    Prepositions Dataset

    Final ETL Output

    Notice all instances of and, to, from, the, with are now all capitalized


  • Thanks so much for the comprehensive suggestion @bobbyragsdale - I'll give this a try in our environment!

  • DomoDork
    DomoDork Contributor

    @pstrauss You're very welcome! If you try it out and get stuck or need any guidance please feel free to message me directly and I'll help where I can. If you implement it successfully, a quick follow-up here would be greatly appreciated as well. I'm just trying to be mindful of others who might find this technique useful and wonder if the solutions proposed are worth their time.