Using aggregation in case statements


I have a data set of sales where a sale is either a "new sale" or an "add sale".  This value is stored in `calcSaleType`.  I would like to graph the ratio of new sales to add sales over time but keep turning up null values.


The value of a sale is stored in `total_cost`.


I've tried the following with no luck:


(case when `sale_type`='new sale' then sum(`total_cost`) end) /
(case when `sale_type`='add sale' then sum(`total_cost`) end)


Even when I try simply graphing the following, I get mostly null values even though every row contains a `calcSaleType` value.  What am I missing here?


  • GrantSmith

    Hi @user060355 


    You're getting NULL values because of your denominator case statement. SQL returns NULL if your denominator is NULL. You don't have an ELSE clause on your CASE statement so it's returning null because there isn't an add sale record.


    You can default your null values to 0 in your case statement and bring out your SUM aggregate to surround each case statement. For example:

    CASE WHEN SUM(case when `sale_type`='add sale' then `total_cost` else 0 end) = 0 THEN 0
      SUM(case when `sale_type`='new sale' then `total_cost` else 0 end) /
      SUM(case when `sale_type`='add sale' then `total_cost` else 0 end)

    Also added another case to check for a possible divide by 0 error and handle gracefully.

    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**
  • jaeW_at_Onyx

    while @GrantSmith 's statement that you can't divide by 0 is true, that's not the root cause of your problem. (i suspect).


    you can't write this.  even if you ignore the divide by 0 issue,  it's still logically wrong because your beast mode will only ever return either 0 or NULL.  Because after aggregation has been applied, the sale_type can only be either 'new sale' OR 'add sale'  so it's only possible for either the numerator OR the denominator to be populated.

    (case when `sale_type`='new sale' then sum(`total_cost`) end) /
    (case when `sale_type`='add sale' then sum(`total_cost`) end)


    it's impossible to have both a numerator AND A denominator because you're evaluating the case statement AFTER aggregation.


    so ... what if you DO want to evaluate the CASE statement after the metric?  then rewrite your function.


    WHEN 'Sale_Type' = 'option 1' then SUM(col1)

    WHEN 'Sale_type' = 'option 2' then SUM(col1) / SUM(col2)

    ELSE ( SUM(col1) + sum(col2) ) / sum(col3)


    you can of course swap out col1 with a CASE statement ... which we'll see in a second, but the point is, in the above example the metric present is selected AFTER aggregation. 


    This is an edge case and usually this isn't what people want.



    As @GrantSmith  described, run your CASE statement INSIDE the Aggregation, so that you evaluate each row BEFORE you aggregate.


    In it's root form, you probably want this:


    SUM(case when `sale_type`='new sale' then `total_cost` end) /
    SUM(case when `sale_type`='add sale' then `total_cost` end)


    The rest of Grant's case statement is error checking to avoid dividing by 0 or NULL which we know is mathematically impossible.



    @GrantSmith 's ELSE 0 clause is related to confusion around how SQL aggregates NULLS.   This is how normal SQL works... i'm told it's counter-intuitive :P.


    When you add values at the row-level you can't sum NULLs.

    ex. sum( 5+ 2 + null )  will yield null.

    WHY?  ... well assum NULL means 'i don't know.'  If you took 5 + 2 + "i don't know" you can't assume you'll get 7 right?  


    You can however SUM() a COLUMN that contains nulls.

    ex. SELECT SUM (col1) where col1 has rows containing the values 5, 4, 0, and NULL will yield 9.


    NULL and 0 aren't the same.  Assume my above example was weekly temperatures from a sensor.  And in the last week, there was a sensor failure so the temperature wasn't measured.

    If you're reporting average temperature would you expect the result to be 9 / 3 or 9/4?


    Remember, if you say 9/4 then you assume that on the last week the temperature was 0 degrees.... and that's just not intuitively what we'd expect from an average measurement.  



    In the main part of the calculation, the ELSE 0 clause in both the numerator and denominator are unnecessary because you can SUM(colContainingNulls) but you cannot SUM(col1+col2ContainsNulls)


    CASE WHEN SUM(case when `sale_type`='add sale' then `total_cost` else 0 end) = 0 THEN 0
      SUM(case when `sale_type`='new sale' then `total_cost` ) /
      SUM(case when `sale_type`='add sale' then `total_cost` )



    Grant leads with a CASE statement to check the denominator for zero or NULL (to short circuit the beast mode if it would produce a divide by 0 error)


    You have two choices:

    SUM(case when `sale_type`='add sale' then `total_cost` else 0 end ) = 0


    IFNULL(SUM(case when `sale_type`='add sale' then `total_cost` end ),0) = 0


    I like the second method because it means my check is exactly the same as the denominator in the main beast mode and therefore doesn't look like a typo.  I don't like the second method, because now i'm introducing the IFNULL() function (i.e. if my denominator is null replace with 0)


    WHEN SUM(case when `sale_type`='add sale' then `total_cost` else 0 end ) = 0 THEN 0
      SUM(case when `sale_type`='new sale' then `total_cost`) /
      SUM(case when `sale_type`='add sale' then `total_cost`)



    Hope that helps.


    Jae Wilson
    Check out my 🎥 Domo Training YouTube Channel 👨‍💻

    **Say "Thanks" by clicking the ❤️ in the post that helped you.
    **Please mark the post that solves your problem by clicking on "Accept as Solution"