Average Probability for every userid
Hello team,
I'm trying to build a Beast Mode calculation (or some other logic) for the following:
There's a field called “usersessionid” that can be a unique value or can repeat more than once, and another column called “probability” right next to it. The calculation I need to build is a new column that populates with the average of the “probability” field for each unique “usersessionid” in the dataset. Below is an example of what I'm trying to achieve; the “Average probability” field is the new column that needs to be created.
Usersessionid | Probability | Average probability |
123 | 5 | 8 |
123 | 10 | 8 |
123 | 9 | 8 |
456 | 20 | 14 |
456 | 8 | 14 |
789 | 3 | 3 |
Thank you in advance.
Best Answer
-
Hi,
I don't think this is possible in Beast Mode (it can be done for each user session individually).
It can be done easily in a dataflow; please see below:

1) select avg(probability) as "average probability", usersessionid
   from table
   group by usersessionid
   (transform name: average_prob)

2) select a.*, b."average probability"
   from table a
   left join average_prob b
   on a.usersessionid = b.usersessionid

Put the output of the 2nd transform into the output dataset and use that to create a card.
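If you prefer to keep it in a single transform, the same logic could probably be written as one query by joining to an aggregated subquery (just a sketch, still assuming your input dataset is called "table"):

select a.*, b."average probability"
from table a
left join (
    select usersessionid, avg(probability) as "average probability"
    from table
    group by usersessionid
) b
on a.usersessionid = b.usersessionid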
Thanks,
0
Answers
-
Thank you very much, Rahul.
I tried these steps, and it looks like the left join is taking more than 14 hours. It ran fine when I ran the queries individually, so I might have to tweak the dataflow a little. Let me do that first, and then I'll accept this as the solution once it runs.
Best!
0 -
How many rows do you have? Is it in the millions?
0 -
Yes.
4.6M rows.
0 -
I would say identify the smallest and largest usersessionid numbers and use maybe 2-3 transforms to break out your data. Basically, you are doing the same thing 2-3 times on less data each time, so it takes less time. For example, something along the lines of the sketch below.
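Just a rough sketch of that idea; the 500000 split point is made up (pick a value near the middle of your actual usersessionid range), and it assumes usersessionid is numeric:

1) select usersessionid, avg(probability) as "average probability"
   from table
   where usersessionid <= 500000
   group by usersessionid
   (transform name: average_prob_low)

2) select usersessionid, avg(probability) as "average probability"
   from table
   where usersessionid > 500000
   group by usersessionid
   (transform name: average_prob_high)

3) select a.*, b."average probability"
   from table a
   left join average_prob_low b
   on a.usersessionid = b.usersessionid
   where a.usersessionid <= 500000
   union all
   select a.*, b."average probability"
   from table a
   left join average_prob_high b
   on a.usersessionid = b.usersessionid
   where a.usersessionid > 500000

Each join then only has to work through part of the 4.6M rows, which should bring the run time down.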
0 -
Hello Rahul,
I have used the same logic on a smaller dataset and then built my historical dataset. I used three unique identifiers to make the join.
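In case it helps anyone else, the dataflow ended up looking roughly like this (the userid and sessiondate column names are just placeholders for my actual identifiers):

1) select usersessionid, userid, sessiondate, avg(probability) as "average probability"
   from table
   group by usersessionid, userid, sessiondate
   (transform name: average_prob)

2) select a.*, b."average probability"
   from table a
   left join average_prob b
   on a.usersessionid = b.usersessionid
   and a.userid = b.userid
   and a.sessiondate = b.sessiondate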
Thank you very much!
1