Generate an Average ignoring duplicate rows

mberkeley
mberkeley Contributor

I am trying to generate an average value for a single value card, The data has duplicates of this value due to other data within the set. In the example below, I am looking for the average number of seconds per customer.

I need to be able to dedupe Customer 1, then average the remaining rows. Can this be done with a Beast Mode calculation, if so, how?

In the example above the average should be 72.5, not 61.


Thanks

Tagged:

Answers

  • Hi @mberkeley

    Is it possible for a customer to have different times or are the times always the same? You'll likely need to remove the duplicates via a data flow or data view before brining it into your card to calculate your average.

    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**
  • mberkeley
    mberkeley Contributor

    A customer could have different values at different times. One of the other values that would be used to uniquely identify a row would be a timestamp.

  • mberkeley
    mberkeley Contributor

    I am really looking for AVG('time') for each distinct (or avg) time per customer/timestamp)

  • @mberkeley if i had to guess you have times duplicated because you JOIN'ed two datasets together.

    If that's the case, instead of JOINing them APPEND them. That way your times don't Duplicate and you don't have to do avg(DISTINCT)


    https://www.youtube.com/watch?v=PVbOeLSae9o&t=250s

    Jae Wilson
    Check out my 🎥 Domo Training YouTube Channel 👨‍💻

    **Say "Thanks" by clicking the ❤️ in the post that helped you.
    **Please mark the post that solves your problem by clicking on "Accept as Solution"
  • mberkeley
    mberkeley Contributor

    Unfortunately not, these are actually web analytics where there are several values for page categories and the value for ‘Time On Page’ that i am trying to average. So I am looking for the average time on page, ignoring all of the extra rows with category info.

  • @mberkeley then split your data upstream into the two separate columns / sets of data.

    create a stream of data that includes excludes / dedupes category info add a constant "Activity Type" = "page_stats".

    create a second stream that includes category info but doesn't include the metrics for "time in seconds" "ActtivityType" = "category stats" then append them.

    now you have a dataset where you don't have to do complex math on the page_stats columns.

    Jae Wilson
    Check out my 🎥 Domo Training YouTube Channel 👨‍💻

    **Say "Thanks" by clicking the ❤️ in the post that helped you.
    **Please mark the post that solves your problem by clicking on "Accept as Solution"