Average Probability for every userid
Hello team,
I'm trying to build a Beast Mode calculation (or some other logic) for the following:
There's a field called “usersessionid” that can be a unique value or can repeat more than once, and another column called “probability” right next to it. The calculation I need to build is a new column that populates with the average of the “probability” field for each unique “usersessionid” in the dataset. Below is an example of what I'm trying to achieve; the “Average probability” field is the new column that needs to be created.
Usersessionid | Probability | Average probability |
123 | 5 | 8 |
123 | 10 | 8 |
123 | 9 | 8 |
456 | 20 | 14 |
456 | 8 | 14 |
789 | 3 | 3 |
Thank you in advance.
Best Answer
-
Hi,
I don't think this is possible in Beast Mode (it can be done for each user session individually).
It can be done easily in a dataflow; please see below:

1) select avg(probability) as "average probability", usersessionid
   from table
   group by usersessionid
   (transform name: average_prob)

2) select a.*, b."average probability"
   from table a
   left join average_prob b
   on a.usersessionid = b.usersessionid

Put the output of the 2nd transform into the output dataset and use that to create a card.
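If you prefer to keep it in a single transform, the same logic could probably be written as one query by joining to an aggregated subquery (just a sketch, still assuming your input dataset is called "table"):

select a.*, b."average probability"
from table a
left join (
    select usersessionid, avg(probability) as "average probability"
    from table
    group by usersessionid
) b
on a.usersessionid = b.usersessionid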
Thanks,
0
Answers
-
Thank you very much, Rahul.
I tried these steps, and it looks like the left join is taking more than 14 hours. It ran fine when I ran the queries individually, so I might have to tweak the dataflow a little. Let me do that first, and then I'll accept this as the solution once it runs.
Best!
0 -
How many rows do you have? Is it in the millions?
0 -
Yes.
4.6M rows.
0 -
I would say identify the smallest and largest usersessionid numbers and use maybe 2-3 transforms to break out your data. Basically, you are doing the same thing 2-3 times on less data each time, so it takes less time. For example, something along the lines of the sketch below.
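Just a rough sketch of that idea; the 500000 split point is made up (pick a value near the middle of your actual usersessionid range), and it assumes usersessionid is numeric:

1) select usersessionid, avg(probability) as "average probability"
   from table
   where usersessionid <= 500000
   group by usersessionid
   (transform name: average_prob_low)

2) select usersessionid, avg(probability) as "average probability"
   from table
   where usersessionid > 500000
   group by usersessionid
   (transform name: average_prob_high)

3) select a.*, b."average probability"
   from table a
   left join average_prob_low b
   on a.usersessionid = b.usersessionid
   where a.usersessionid <= 500000
   union all
   select a.*, b."average probability"
   from table a
   left join average_prob_high b
   on a.usersessionid = b.usersessionid
   where a.usersessionid > 500000

Each join then only has to work through part of the 4.6M rows, which should bring the run time down.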
0 -
Hello Rahul,
I have used the same logic on a smaller dataset and then built my historical dataset. I used three unique identifiers to make the join.
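In case it helps anyone else, the dataflow ended up looking roughly like this (the userid and sessiondate column names are just placeholders for my actual identifiers):

1) select usersessionid, userid, sessiondate, avg(probability) as "average probability"
   from table
   group by usersessionid, userid, sessiondate
   (transform name: average_prob)

2) select a.*, b."average probability"
   from table a
   left join average_prob b
   on a.usersessionid = b.usersessionid
   and a.userid = b.userid
   and a.sessiondate = b.sessiondate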
Thank you very much!
1