Average Probability for every userid
Hello team,
I'm trying to make a beast-mode calculation or build a logic around the below topic
there’s a field called “usersessionid” that can be a unique value or it could repeat more than once. There is then another column called “probability” that’s right next to it. The calculation i need to build it, a new column that populates with the average of the “probability” field for each unique “usersessionid” in the dataset. Below is an example of what I’m trying to achieve. The Average probability field is the new column that needs to get created.
Usersessionid | Probability | Average probability |
123 | 5 | 8 |
123 | 10 | 8 |
123 | 9 | 8 |
456 | 20 | 14 |
456 | 8 | 14 |
789 | 3 | 3 |
Thank you in advance.
Best Answer
-
Hi,
I dont think this is possible in beast mode (it can be done for each user sesssion individually).
It can be done easily in a dataflow, please see below :1) select avg(probability) as "average probability", usersessionid
from table
group by usersessionid.
(transform name : average_prob)
2) select a.*, b."average probablity"
from table a
left join average_prob b
on a.usersessionid = b.usersessionidPut the output of the 2nd point in the output dataset and use that to create a card.
Thanks,
0
Answers
-
Hi,
I dont think this is possible in beast mode (it can be done for each user sesssion individually).
It can be done easily in a dataflow, please see below :1) select avg(probability) as "average probability", usersessionid
from table
group by usersessionid.
(transform name : average_prob)
2) select a.*, b."average probablity"
from table a
left join average_prob b
on a.usersessionid = b.usersessionidPut the output of the 2nd point in the output dataset and use that to create a card.
Thanks,
0 -
Thank you very much, Rahul.
I tried doing these steps and looks like the left join is taking more than 14hours of time. It definitely ran well when I individually ran the queries. I might have to tweak the dataflow a little. Let me do that first and then, will accept this as a solution once it runs.
Best!
0 -
How many rows do you have? is it in millions?
0 -
Yes.
4.6M rows.
0 -
I would say identify the smallest and the largest usersessionid numbers and maybe use 2 - 3 transforms and break out your data. so basically you are doing the same thing 2- 3 times using less data so it takes lesser time. Let me know if you want a code example
0 -
Hello Rahul,
I have used the same logic on a smaller dataset and then built my historical dataset. I used three unique identifiers to make the join.
Thank you very much!
1
Categories
- All Categories
- 1.7K Product Ideas
- 1.7K Ideas Exchange
- 1.5K Connect
- 1.2K Connectors
- 292 Workbench
- 4 Cloud Amplifier
- 8 Federated
- 2.8K Transform
- 95 SQL DataFlows
- 601 Datasets
- 2.1K Magic ETL
- 3.7K Visualize
- 2.4K Charting
- 689 Beast Mode
- 43 App Studio
- 39 Variables
- 658 Automate
- 170 Apps
- 441 APIs & Domo Developer
- 42 Workflows
- 5 DomoAI
- 32 Predict
- 12 Jupyter Workspaces
- 20 R & Python Tiles
- 385 Distribute
- 110 Domo Everywhere
- 269 Scheduled Reports
- 6 Software Integrations
- 112 Manage
- 109 Governance & Security
- 8 Domo University
- 25 Product Releases
- Community Forums
- 39 Getting Started
- 29 Community Member Introductions
- 98 Community Announcements
- Domo Community Gallery
- 4.8K Archive