Text to Floating Decimal Issue

Hey there! I'm trying to convert a text field into a floating decimal, but I keep getting a "failed to parse" warning. For context, I'm uploading a CSV file that already has the field I need formatted as currency, but when I upload it into Domo it comes in as text. I need to aggregate (sum) this field, so it has to be a number instead of text.

I've tried replacing the '$' with an empty string and then converting the text to a floating decimal, but that didn't work. I've also tried the text formatting tile, but that doesn't seem to give me what I need either, since removing the decimal just produces incorrect numbers.

Does anyone know a workaround for this without having to change the CSV file every time it's uploaded?

*I've attached a screenshot of how the field looks in the CSV file prior to uploading into Domo.

Best Answer

  • rco Contributor
    edited April 16 Answer ✓

    I would recommend trying to understand the problem a little better before fixing it in that particular way. The error message you're getting back should include an example of a failing value. Something like "Failed to parse text '$1.23' as type…", or perhaps "Failed to parse text '#VALUE!'…". If we figure out exactly what's failing to parse, we can fix the problem in a more focused way.

    The expression CAST(REGEXP_REPLACE(x, '[^0-9.]', '') AS DECIMAL) will replace any character in x that isn't a digit or a period with the empty string (effectively deleting it) and then convert the resulting string to DECIMAL, which is called "Fixed Decimal" in other places. There are a few issues with this.

    For one, there are other characters that may be present in valid numbers coming from Excel which will be quietly deleted, resulting in incorrect values that may appear correct. Here are some valid floating-point number representations that will be successfully but wrongly processed by this expression:

    1. -5.5 will be parsed as 5.5

    2. 1.6E+3 will be parsed as 1.63
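
    Concretely, here's what that whitelist does to those two values before the cast ever runs:

    REGEXP_REPLACE('-5.5', '[^0-9.]', '')   returns '5.5'
    REGEXP_REPLACE('1.6E+3', '[^0-9.]', '') returns '1.63'

    The minus sign and the exponent marker get stripped right along with any dollar signs, so the cast succeeds, just on the wrong string.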

    If you know your data well, then these concerns may not apply. Perhaps there are not and never will be negative values, and you know the values are never represented in scientific notation (which, although surprising, is actually the standard way of representing floating-point numbers beyond a certain number of digits).

    That said, in most cases I would strongly recommend at least modifying the regular expression from [^0-9.] to [^-+0-9.eE] to cover all the valid floating point values (besides the NaN values, but it's less likely you have those in your data).
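
    As a rough sketch, that adjusted version would look like this (using the `Ad fees` column name from elsewhere in this thread):

    CAST(REGEXP_REPLACE(`Ad fees`, '[^-+0-9.eE]', '') AS DECIMAL)

    The hyphen is placed first inside the brackets on purpose: at that position it's treated as a literal '-' rather than as a range separator.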

    But if it were me, I'd figure out specifically why the error is occurring and fix it in a focused way:

    1. If the problem is that Excel error values like #VALUE! are in your data, you could get rid of all of them with an expression like CASE WHEN STR_STARTS_WITH(Ad_fees, '#') THEN NULL ELSE CAST(Ad_fees AS DOUBLE) END (use DOUBLE if you want a floating-point number and DECIMAL if you want a fixed-point decimal).

    2. If the problem is whitespace characters, you can use CAST(SQUASH_WHITESPACE(Ad_fees) AS DOUBLE).

    3. If you have dollar signs, as I'm sure you already know, you can use REPLACE(Ad_fees, '$', ''). Be careful with $ in a regex; it has a special meaning, so you need to escape it. REPLACE is always better than REGEXP_REPLACE if you just want to replace a single static string or character.

    4. If you can't figure out what in your data is causing the problem, you can run STR_TO_CHARNAMES(Ad_fees), export the result, and look through it for anything surprising. That will surface any weird, hard-to-spot things in your string, like special whitespace characters or Unicode characters that look the same as common characters but aren't.
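
    Putting a couple of those together, a focused version might look something like this. It's only a sketch: it assumes the column is named `Ad fees` and that stray whitespace, dollar signs, and Excel error values are the only things getting in the way of the cast.

    CASE
        WHEN STR_STARTS_WITH(`Ad fees`, '#') THEN NULL
        ELSE CAST(REPLACE(SQUASH_WHITESPACE(`Ad fees`), '$', '') AS DOUBLE)
    END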

    The advantage of taking the extra time to do it this way is that if a value you didn't expect does show up in your data, the system will tell you about it instead of quietly dealing with it. That's a tradeoff, of course; quietly dealing with it might be exactly what you want. But do keep in mind that if you use a simple character-whitelist regex like [^-+0-9.eE] and new data includes values like 1/5 or 5.000,00, it will not warn you before simply interpreting them as 15 and 5.00, respectively.

    Randall Oveson <randall.oveson@domo.com>

Answers

  • MichelleH

    @Mwatson It's possible that extra characters in your field, like commas, are still preventing your text from converting to a floating decimal. Try using this formula to remove all non-numeric characters except decimal points and cast the result to a decimal:

    CAST(REGEX_REPLACE(`Ad fees`,'[^0-9.]','') as DECIMAL())
    

  • Mwatson Member

    That's what it was. Thank you for thinking of that! The only difference from your formula is that I had to use the REGEXP_REPLACE function instead of REGEX_REPLACE. Here's my final formula:

    CAST(REGEXP_REPLACE(`Ad fees`,'[^0-9.]','') as DECIMAL())

    Thanks again!

  • Mwatson Member

    @MichelleH There is one more similar situation I'm having issues with that you might be able to help me with. I'm trying to change this "Date" field into a Date instead of Text, and it's also failing to parse.

    What's the best formula to fix that? I have attached a picture of what the Date string looks like.
