Regex formula works in ETL preview but doesn't show up in dataset output?

ccccc4 · 2024-10-25T15:30:59+00:00

There was an error rendering this rich post.

I'm using the Dataset from Email connector to push a weekly email into Domo. The body of the email contains an account balance, which I'm trying to extract into a new field. The body of the email will always contain "Amount: USD 123,456.00"

I created a regex to match what comes after the "USD " and before the next whitespace, and in the preview for the formula tile it shows the value I'm expecting. However, when I run the dataflow, the new field is blank.

case when REGEXP_LIKE(`body`,'^.*(?<=USD ).*?([^\s]+).*$') then REGEXP_REPLACE(`body`,'^.*(?<=USD ).*?([^\s]+).*$','$1') end

Find more posts tagged with

Accepted answers

All comments

ArborRose

The first part of that regular expression may not work with Domo SQL because part of that is a "lookbehind", which can be finicky. If you try to isolate the numeric part following the USD, you could try something like:

CASE 

    WHEN REGEXP_LIKE(`body`, 'USD\s+\d{1,3}(,\d{3})*(\.\d{2})?')

    THEN REGEXP_REPLACE(`body`, '.*USD\s+(\d{1,3}(,\d{3})*(\.\d{2})?).*', '$1')

END

ggenovese

Also, another thing to consider is whether any downstream operations are being performed on the Amount Paid field. In other words, maybe the regex is working but there is another step in the workflow that is causing it to become a NULL value.

ccccc4

I tried this and am getting the same thing. The preview renders, but the output is empty. I'm not doing any other operations after this in the dataflow.

CASE 

    WHEN REGEXP_LIKE(`body`, 'USD\s+\d{1,3}(,\d{3})*(\.\d{2})?')
    THEN REGEXP_REPLACE(`body`, '.*USD\s+(\d{1,3}(,\d{3})*(\.\d{2})?).*', '$1') end

ggenovese

I'd suggest adding an else so that you can determine whether the issue is that your WHEN isn't evaluating to true or whether the issue is with the replace

ArborRose

If the preview works but the output is empty, it might be there are characters unseen. Have you tried using trim/replace? I'm not sure if this formula is quite right as I'm not in an example.

CASE 

    WHEN REGEXP_LIKE(REGEXP_REPLACE(`body`, '^\s+|\s+$', ''), 'USD\s+\d{1,3}(,\d{3})*(\.\d{2})?')

    THEN REGEXP_REPLACE(REGEXP_REPLACE(`body`, '^\s+|\s+$', ''), '.*USD\s+(\d{1,3}(,\d{3})*(\.\d{2})?).*', '$1')

END

ccccc4

It's getting closer. Now, it starts with the dollar amount but also grabs everything that comes after that.

case 

when REGEXP_LIKE(trim(`body`),'^.*USD\s+(\d{1,3},\d{1,3}\.\d{1,2}).*') 

then REGEXP_REPLACE(trim(`body`),'^.*USD\s+(\d{1,3},\d{1,3}\.\d{1,2}).*','$1') 

else 0 end

MarkSnodgrass

I am not a fan of regex as I find it difficult to read, maintain and troubleshoot. I would suggest the following:

Take this example:

I use the formula tile and start by using the RIGHT() and INSTR() to find the starting position of Amount USD:

I then use the SPLIT_PART() function to look for the amount by getting the string after the third space and stops when it gets to the fourth space. This eliminates all the text after the amount.

Finally, I format it as a decimal by replacing the comma with nothing and then using TRY_CAST() function to safely convert it.

Hope this helps.

ArborRose

If you continue with regexp, you can try something like this…

CASE 

    WHEN REGEXP_LIKE(REGEXP_REPLACE(`body`, '^\s+|\s+$', ''), 'USD\s+\d{1,3}(,\d{3})*(\.\d{2})?')

    THEN REGEXP_REPLACE(REGEXP_REPLACE(`body`, '^\s+|\s+$', ''), '.*USD\s+(\d{1,3}(?:,\d{3})*(?:\.\d{2})?).*?', '$1')

END

rco

https://community-forums.domo.com/main/discussion/comment/99091#Comment_99091

I agree with Mark, for something like this you're better off without regex. I would note, however, that you're also slightly better off just using two SPLIT_PART invocations, like this:

SPLIT_PART(SPLIT_PART(my_string, 'My Prefix', 2), 'My Suffix', 1)

Paired SPLIT_PART invocations like this will give you the part of your string that appears between the two constant strings. So, for our specific case:

SPLIT_PART(SPLIT_PART(body, 'Amount: USD ', 2), ' ', 1)

I'd also recommend using SQUASH_WHITESPACE on the body first thing, so that subtle differences in the text don't cause the process to fail, like multiple spaces appearing in between 'Amount:' and 'USD' or 'USD' and the digits, or perhaps a newline appearing after the digits instead of a space. SQUASH_WHITESPACE will take care of all those cases:

SPLIT_PART(SPLIT_PART(SQUASH_WHITESPACE(body), 'Amount: USD ', 2), ' ', 1)

In Mark's example, this would replace the "amount" formula and the "StringWithAmount" would be unnecessary. Though you could of course split this formula into another named stage if you desired.

All of that said, I am still curious about what was happening that was causing the preview to produce a result while the execution was not. Unfortunately the screenshot does not include enough of the result row to determine whether the failure was happening for the same input value as had been working in the preview. There would be no difference in the regex behavior between the two, so there must have been a difference in the input data the two were working with. Could it be that the Input DataSet tile is set to load only new rows? Just a curiosity.