Calculating Percent of Total, and Getting Correct Totals

I have this table where I'm using count(distinct()) to get counts of various attributes, but when I add a "Percent of total" column to each one, the percentages add up to more than 100%. In addition, my totals at the bottom of the counts are incorrect.

I've attached a screenshot of the issue along with an example of the formula being used:

Underlying formula for the "Current" column:
COUNT(DISTINCT
(
CASE when `caregiver.action.status` = 'current' then `caregiver.action.ownerId`
END
)
)

Underlying formula for the "Primary" column:
COUNT(DISTINCT
(
CASE when `caregiver.action.status` = 'primary' then `caregiver.action.ownerId`
END
)
)

Tagged:

Best Answers

  • david_cunningham
    Answer ✓

    @pstrauss when you say that it returns zero values, do you mean that the table isn't showing any rows? Or that the values are 0 for each row? Please include a screenshot of what you're seeing so that I can help troubleshoot.

    Also, I noticed that your case statement changed. In your initial post, all of your column names started with "caregiver.". Looks like in this latest one you sent over, you are mixing "caregiver" and "care_recipient". Is that intended?

    It looks like your table is based on caregiver.action.slug

    David Cunningham

    ** Was this post helpful? Click Agree 😀, Like 👍️, or Awesome ❤️ below **
    ** Did this solve your problem? Accept it as a solution! ✔️**

  • DavidChurchman
    edited May 24 Answer ✓

    For your second question, I believe you could just wrap your case statement in the other David C's formula:

    SUM(COUNT(DISTINCT [YOUR CASE STAMENENT HERE])) over (partition by group)

    But for your first question, I'm not sure I understand what you want for the totals of your % column. You have OwnerIDs that fall across multiple actions. You have 220 distinct OwnerIDs, but 456 distinct OwnerIDs when grouped by actions (sum of your "Current" column). So if you have 50 in the "insurance" action, is the desired percent 50/220 OwnerIDs or 50/456 OwnerIDs grouped by action? For your row-level values, I would think 50/220 is more intuitive/meaningful. 50/220 of your OwnerIDs have the insurance action. Personally, I would keep your % column as is. Sure, if you added up those percentages, you'd get more than 100%, but what would that value represent? Would it mean anything to you? I think I would just turn off the totals for the % columns.

    Please 💡/💖/👍/😊 this post if you read it and found it helpful.

    Please accept the answer if it solved your problem.

Answers

  • david_cunningham
    edited May 24

    @pstrauss

    It's because the total is calculated based on the overall distinct count, but the rows are the distinct count within the group. Take this example dataset.

    To do this with a distinct count, you would use the following beast mode

    SUM(COUNT(DISTINCT id)) over (partition by group)
    

    You can see that this results in a total of the distinct counts by group. You'll have to decide whether or not that is actually what you want to evaluate against.

    David Cunningham

    ** Was this post helpful? Click Agree 😀, Like 👍️, or Awesome ❤️ below **
    ** Did this solve your problem? Accept it as a solution! ✔️**

  • pstrauss
    pstrauss Member
    edited May 24

    @david_cunningham thanks for the reply. That makes sense. However, how would I modify the example beast mode function to handle a case statement on the distinct count? For instance, I only want to count records that satisfy this criteria:

    CASE when care_recipient.action.status = 'primary' then care_recipient.action.ownerId
    END

    I tried this and it returns zero values:

    SUM(COUNT(DISTINCT CASE when care_recipient.action.status = 'primary' then care_recipient.action.ownerId
    END)) over (partition by caregiver.action.slug)

  • david_cunningham
    Answer ✓

    @pstrauss when you say that it returns zero values, do you mean that the table isn't showing any rows? Or that the values are 0 for each row? Please include a screenshot of what you're seeing so that I can help troubleshoot.

    Also, I noticed that your case statement changed. In your initial post, all of your column names started with "caregiver.". Looks like in this latest one you sent over, you are mixing "caregiver" and "care_recipient". Is that intended?

    It looks like your table is based on caregiver.action.slug

    David Cunningham

    ** Was this post helpful? Click Agree 😀, Like 👍️, or Awesome ❤️ below **
    ** Did this solve your problem? Accept it as a solution! ✔️**

  • DavidChurchman
    edited May 24 Answer ✓

    For your second question, I believe you could just wrap your case statement in the other David C's formula:

    SUM(COUNT(DISTINCT [YOUR CASE STAMENENT HERE])) over (partition by group)

    But for your first question, I'm not sure I understand what you want for the totals of your % column. You have OwnerIDs that fall across multiple actions. You have 220 distinct OwnerIDs, but 456 distinct OwnerIDs when grouped by actions (sum of your "Current" column). So if you have 50 in the "insurance" action, is the desired percent 50/220 OwnerIDs or 50/456 OwnerIDs grouped by action? For your row-level values, I would think 50/220 is more intuitive/meaningful. 50/220 of your OwnerIDs have the insurance action. Personally, I would keep your % column as is. Sure, if you added up those percentages, you'd get more than 100%, but what would that value represent? Would it mean anything to you? I think I would just turn off the totals for the % columns.

    Please 💡/💖/👍/😊 this post if you read it and found it helpful.

    Please accept the answer if it solved your problem.

  • pstrauss
    pstrauss Member

    @david_cunningham

    Whoopsie. Thanks for the catch on caregiver vs. care_recipient. I'm now getting the expected results from the following function:

    SUM(COUNT (DISTINCT CASE when caregiver.action.status = 'current' then caregiver.action.ownerId
    END)) over (partition by caregiver.action.slug)

    Thanks to you and @DavidChurchman