Group by when using case statements?

I'm trying to group the column 'opzegreden' into one line if the sum is below 5.

I'm using the following beastmode which gives me not the desired result (see screenshot)
case when sum(Aantal) <= 5 then 'Why' else Opzegreden end


If I use the following beastmode then it does it work and does group..

case when Opzegreden like '%no time%' then 'why'
when Opzegreden like '%price%' then 'why'
else Opzegreden end

I'm not using any filters or sorting.

Answers

  • What is the structure of your data? What type of visualization are you utilizing? What do you want the output data or visualization to look like?

    You may need to utilize a window function to calculate the category.

    CASE WHEN SUM(SUM(`Aantal`)) OVER (PARTITION BY `Your Grouping Columns`) <= 5 THEN 'Why' ELSE `Opzegreden` END
    

    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**
  • Thanks for your reply @GrantSmith

    Sadly that doesn't seem to work. Probably because my explination of the case was lacking.

    I've a list of cancellation reasons ('opzegreden').

    • reason 1 : 15 people
    • reason 2 : 10 people
    • reason 3 : 7 people
    • reason 4 : 3 people
    • reason 5: 2 people

    But I want to group the reasons that have less than 5 people because it clutters my data.
    So instead of reason 4 & 5 I want a row with grouped reasons.

    • reason 1 : 15 people
    • reason 2 : 10 people
    • reason 3 : 7 people
    • grouped reasons 4 : 5 people

    But instead it gives me:

    • reason 1 : 15 people
    • reason 2 : 10 people
    • reason 3 : 7 people
    • grouped reasons 4 : 3 people
    • grouped reasons 4: 2 people

    In this case I only have this beastmode as a column and one which counts the amount of people that cancelled in that category. (I'm using a donut graph if that matters anything)

    Hopefully it's clear now what I'm trying to accomplish.

  • @GrantSmith Is this possible in the case I described above? :)

  • This won't work in a beast mode as you can't aggregate (sum of total per reason) an aggregate (total per reason). You'd need to pre-aggregate the data in an ETL first and then do your final grouping in the card.

    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**
  • @user046467 can you post a sample of what the dataset you have looks like? As Grant said, you can do a lot with Windows functions (or the Fixed function) but there are limitations for which pre-aggregating needs to happen at a ETL level, but without fully understanding what your data looks like, we're mostly guessing in the dark. (I fear there might be an easy answer we might be overlooking due to lack of insights on the data)

  • 3 columns: (could even be two as I just use E-mail as a count column)

    Cancellation date - Emailadress - Reason of cancellation

    1 may - email1 - Too expensive
    10 may - email2 - Too expensive
    15 may - email3 - Missing features
    20 may - email4 - Lacking skills
    1 june - email5 - Missing features
    2 june - email6 - Too expensive
    3 june - email7 - Slow delivery

    And now I want to group 'lacking skills' and 'slow delivery' under 'other'

  • ST_-Superman-_
    edited June 2023

    @user046467

    One solution could be to adjust the chart properties. For example, if I use the Maximum slices before 'Other' property from the data you provided I can change the graph to show the desired result:

    The advantage to this, is that you can mouse over the "Other" pie and see what it contains as well:


    “There is a superhero in all of us, we just need the courage to put on the cape.” -Superman
  • marcel_luthi
    marcel_luthi Coach
    edited June 2023

    @ST_-Superman-_ suggestion sounds like the most scalable and clean solution, so that way the system will always show the top N categories and group the rest of them under other but still allow you to see how that other is made up when interacting with the card. This makes it dynamic, so that in the future or for the the specific subset of entries you're looking at you have different top contributors you'll always get the top N for what is currently shown and then other.

    On the other hand if you want to specifically group specific ones under other you can do this via ETL and do the segment based on this ETL, there would be 2 approaches:

    Leave specific ones ungroupped and label all others as Other

    CASE WHEN Reason of cancellationIN ('Too expensive','Missing features') THEN Reason of cancellation ELSE 'Other' END
    

    Group specific ones as Other and leave the rest ungrupped

    CASE WHEN Reason of cancellationIN ('Lacking skills','Slow delivery') THEN 'Other' ELSE Reason of cancellation END
    

    The downside on these two options, is that as new reasons are added, you'll need to update your BeastMode to decide how to handle those new reasons accordingly, while the Chart approach takes care of that on it's own.