Average probability for every usersessionid

Hello team,

I'm trying to write a Beast Mode calculation, or build some other logic, for the topic below.

There's a field called "usersessionid" that can hold a unique value or repeat more than once, and right next to it there is another column called "probability". The calculation I need to build is a new column that populates with the average of the "probability" field for each unique "usersessionid" in the dataset. Below is an example of what I'm trying to achieve; the "Average probability" field is the new column that needs to be created.

 

Usersessionid    Probability    Average probability
123              5              8
123              10             8
123              9              8
456              20             14
456              8              14
789              3              3

 

Thank you in advance.

 

Best Answer

  • rahul93 (Contributor)
    Answer ✓

    Hi, 

    I don't think this is possible in Beast Mode (though it can be done for each user session individually).

    It can be done easily in a dataflow; please see below:

    1) select avg(probability) as "average probability", usersessionid
    from table
    group by usersessionid
    (transform name: average_prob)


    2) select a.*, b."average probablity"
    from table a
    left join average_prob b
    on a.usersessionid = b.usersessionid

    Write the output of the second query to the output dataset and use that to create a card.

     

    Thanks,
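
    For reference, the two steps can also be collapsed into one query with a derived table, which avoids the intermediate transform. This is a sketch using the same placeholder table and column names as above:

    -- aggregate once in a derived table, then join it back to the detail rows
    select a.*, b."average probability"
    from table a
    left join (
        select usersessionid, avg(probability) as "average probability"
        from table
        group by usersessionid
    ) b
    on a.usersessionid = b.usersessionid

    And if your dataflow engine supports window functions (an assumption; older MySQL-based dataflow engines do not), the join can be skipped entirely:

    -- compute the per-session average in a single pass, no join needed
    select t.*,
           avg(probability) over (partition by usersessionid) as "average probability"
    from table t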

Answers


  • Thank you very much, Rahul.

     

    I tried these steps, and it looks like the left join is taking more than 14 hours. It definitely ran well when I ran the queries individually. I might have to tweak the dataflow a little; let me do that first, and I'll accept this as the solution once it runs.

     

    Best!

  • rahul93 (Contributor)

    How many rows do you have? Is it in the millions?

  • Yes.

     

    4.6M rows.

  • rahul93 (Contributor)

    I would identify the smallest and the largest usersessionid values and use 2-3 transforms to break out your data. Basically you do the same join 2-3 times on less data each time, so each run takes less time. A sketch follows below.
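
    Here is a sketch of that split, keeping the average_prob transform from above. It assumes usersessionid is numeric, and the 500000 boundary is a hypothetical midpoint; pick boundaries that divide your rows roughly evenly:

    -- transform join_low: join only the lower half of the id range
    select a.*, b."average probability"
    from table a
    left join average_prob b
    on a.usersessionid = b.usersessionid
    where a.usersessionid < 500000

    -- transform join_high: the same join for the upper half
    select a.*, b."average probability"
    from table a
    left join average_prob b
    on a.usersessionid = b.usersessionid
    where a.usersessionid >= 500000

    -- output dataset: stitch the halves back together
    select * from join_low
    union all
    select * from join_high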

  • Hello Rahul,

     

    I used the same logic on a smaller dataset and then built out my historical dataset. I used three unique identifiers to make the join.

     

    Thank you very much!
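
    For reference, a join on three identifiers follows the same pattern as above, with the aggregate grouped by, and the join made on, all three keys. In this sketch, session_date and source_id are hypothetical stand-ins for the other two identifiers:

    -- transform average_prob: group by the full composite key
    select usersessionid, session_date, source_id,
           avg(probability) as "average probability"
    from table
    group by usersessionid, session_date, source_id

    -- join back on all three identifiers
    select a.*, b."average probability"
    from table a
    left join average_prob b
    on a.usersessionid = b.usersessionid
    and a.session_date = b.session_date
    and a.source_id = b.source_id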