Why is the same record appearing multiple times with the wrong value

Hi gang! Happy Friday!

I have two attributes with values from 1 to 9, then the SUM of the PlannedCost for each record determines a third value from 1 to 4. I then SUM these three numbers to get a "Project Score"

Do you see anything wrong with this code to accomplish what I described above?

ROUND(IFNULL(`Business Value (high = 9)_p`,0)+IFNULL(`IT Complexity (low = 9)_p`,0)+
CASE
when SUM(`PlannedCost`) >= 0 AND SUM(`PlannedCost`)< 100000 then '4'
when SUM(`PlannedCost`) >= 100000 AND SUM(`PlannedCost`)< 200000 then '3'
when SUM(`PlannedCost`) >= 200000 AND SUM(`PlannedCost`)< 10000000 then '2'
when SUM(`PlannedCost`) >= 10000000 then '1'
else 'No Budget Defined'
END)

In this example, the same record is appearing three times. Each record should just appear once. The SUM of the PlannedCost is 315,552 for this record (confirmed in the dataset), so it should be assigned a 2 for Cost, but that's not even appearing in the table below.

I don't see any oddities in the dataset, so I think I've just messed up the Beast Mode?

Thanks!

Answers

  • Aggregation Context: If records are appearing multiple times, it might be due to the aggregation context. You need to ensure that the aggregation functions (SUM, ROUND, IFNULL, etc.) are correctly applied. This can often be a result of improper grouping in the dataset or the way the calculation is being performed.

    Data Type Mismatch: Ensure that all fields and calculations are of the correct data type. In the CASE statement, you’re returning strings (e.g., '4', '3', etc.) but if you’re doing arithmetic with these later, they should be integers.

    Handling Aggregate Functions: If SUM is being used inside the CASE statement, ensure that it's applied to the correct level of aggregation. Since you want each record to appear only once, the aggregation should be applied correctly across your dataset.

    Value Assignment: Make sure the PlannedCost value is properly assigned to the correct score in your CASE statement. It seems that you want to map the ranges to specific scores.

    Project Score:

    ROUND(
    IFNULL(`Business Value (high = 9)_p`, 0) +
    IFNULL(`IT Complexity (low = 9)_p`, 0) +
    CASE
    WHEN SUM(`PlannedCost`) >= 0 AND SUM(`PlannedCost`) < 100000 THEN 4
    WHEN SUM(`PlannedCost`) >= 100000 AND SUM(`PlannedCost`) < 200000 THEN 3
    WHEN SUM(`PlannedCost`) >= 200000 AND SUM(`PlannedCost`) < 10000000 THEN 2
    WHEN SUM(`PlannedCost`) >= 10000000 THEN 1
    ELSE 0 -- Default value if none of the cases match
    END
    )

    ** Was this post helpful? Click Agree or Like below. **
    ** Did this solve your problem? Accept it as a solution! **

  • Agree with everything ArborRose is saying here and would add to double-check your assumptions that all these values are ready for math. Are they actually numbers or are you converting a string value via beast mode? Do you have extra spaces after those values of 9/6 there?

  • Also….review use of Coalesce() when dealing with null values.

    ** Was this post helpful? Click Agree or Like below. **
    ** Did this solve your problem? Accept it as a solution! **

  • PJG
    PJG Member

    Hi both,

    Thank you for your replies. I was starting to lean toward the issue being the way the dataset was configured, so I have created a new dataset with an ETL, where I'm using a Group By to SUM the Planned Costs. This ensures there is just one row per project, and after using Arbor's beautifully cleaned up Beast Mode, the result is as expected with no more duplicates!

    Note that the source data for Business Value & Complexity were indeed "numbers formatted as text" (when exported to Excel), but this did not seem to impact the calculation, but I understand why I should use 1 rather than "1".

    Thank you!