Correlation Coefficient in Beastmode
Has anyone successfully calculated a Pearson Correlation Coefficient in Beastmode? My idea is to create a scatter plot that graphically displays the relationship between two fields in a dataset, and display the Correlation Coefficient for those two datasets as the Summary Number.
For example, if the card is filtered for a date range of March 1 - March 31, the key fields in the underlying dataset might look like the following table, which would result in a Correlation Coefficient of 0.5669 (this is the value that I would want to display as the Summary Number for the card). Is this possible?
Thanks!
Revenue | Duration |
$603 | 185 |
$262 | 123 |
$906 | 287 |
$429 | 179 |
$872 | 266 |
$302 | 160 |
$185 | 126 |
$596 | 293 |
$481 | 234 |
$173 | 104 |
$628 | 11 |
$296 | 194 |
$272 | 46 |
Best Answers
-
I also cannot get the corr() function to work. However I did get it to work by manually creating the sql math in a beastmode.
Use this code:
(count(x) * sum(x * y) - sum(x) * sum(y)) / (sqrt(count(x) * sum(x * x) - sum(x) * sum(x)) * sqrt(count(x) * sum(y * y) - sum(y) * sum(y)))
Then replace x and y with your measure names.
4 -
How to build a correlation plot to compare multiple variables!
Jae Wilson
Check out my 🎥 Domo Training YouTube Channel 👨💻
**Say "Thanks" by clicking the ❤️ in the post that helped you.
**Please mark the post that solves your problem by clicking on "Accept as Solution"2
Answers
-
You can do it in a data transform, but couldn't in a beastmode as far as my testing.
You'll need to create a data transform that calculates the Pearson R using the following logic:
(Avg(`Revenue` * `Duration`) – (Avg(`Revenue`) * Avg(`Duration`))) / (STDDEV_POP(`Revenue`) * STDDEV_POP(`Duration`))
You have to break it into 2 selects, the first:
SELECT Avg(`Revenue` * `Duration`) AS a,
(Avg(`Revenue`) * Avg(`Duration`)) AS b,
(STDDEV_POP(`Revenue`) * STDDEV_POP(`Duration`)) as c
FROM testingAnd the 2nd:
SELECT (`a`-`b`)/`c` as 'Pearson r'
FROm transform_data_1As you can see, this will give you the correct result:
From there you can basically do the following
SELECT *, (SELECT `Pearson R` from PearsonCalculation) as `Pearson`
FROM datasetThen use a beastmode for MAX(`Pearson R`) as your summary number. It won't be dynamic where you can change the date range, but maybe someone else can figure out a way to get the beastmode portion to work.
Sincerely,
ValiantSpur
**Please mark "Accept as Solution" if this post solves your problem
**Say "Thanks" by clicking the "heart" in the post that helped you.1 -
Second option is to use the Corr function (which does a Pearson Correlation for you)
corr(`Duration`,`Revenue`)
You can use that as a beastmode but it doesn't let you put it in the summary number.
2 -
Thanks for your ideas! I'll try the first one later on. I did try using the Corr() function in beastmode, but regardless of which fields I use I keep receiving the error "An issue has occurred during processing. We are unable to complete the request at this time." I noticed that Corr does not appear in the Functions list - perhaps it is in beta version?
0 -
I don't believe so, we aren't running any beta versions in our production environment and I was able to get it to work. Perhaps make sure both fields are integers?
1 -
I've tried multiple times to make the corr() function work, even creating a very simple test dataset that matches the one I provided earlier. No matter what I try, the formula validates, but I always receive the error
"An issue has occurred during processing. We are unable to complete the request at this time."Have you gotten the corr() function to work?
Thanks again for all your help
0 -
Yes, I was actually able to use your test dataset, create a beastmode with the corr function and then it worked fine. It sounds like it may be worth reaching out to Domo Support at this point. There may be a quirk going on with your instance.
1 -
I also cannot get the corr() function to work. However I did get it to work by manually creating the sql math in a beastmode.
Use this code:
(count(x) * sum(x * y) - sum(x) * sum(y)) / (sqrt(count(x) * sum(x * x) - sum(x) * sum(x)) * sqrt(count(x) * sum(y * y) - sum(y) * sum(y)))
Then replace x and y with your measure names.
4 -
Smart workaround!
0 -
How to build a correlation plot to compare multiple variables!
Jae Wilson
Check out my 🎥 Domo Training YouTube Channel 👨💻
**Say "Thanks" by clicking the ❤️ in the post that helped you.
**Please mark the post that solves your problem by clicking on "Accept as Solution"2 -
@jaeW_at_Onyx I cleaned up your post.
Going forward use the word editor click on video and drop the youtube url in there "from the web".
Keep these coming.
Cheers!
0
Categories
- All Categories
- 1.8K Product Ideas
- 1.8K Ideas Exchange
- 1.6K Connect
- 1.2K Connectors
- 300 Workbench
- 6 Cloud Amplifier
- 9 Federated
- 2.9K Transform
- 102 SQL DataFlows
- 626 Datasets
- 2.2K Magic ETL
- 3.9K Visualize
- 2.5K Charting
- 755 Beast Mode
- 61 App Studio
- 41 Variables
- 693 Automate
- 178 Apps
- 456 APIs & Domo Developer
- 49 Workflows
- 10 DomoAI
- 38 Predict
- 16 Jupyter Workspaces
- 22 R & Python Tiles
- 398 Distribute
- 115 Domo Everywhere
- 276 Scheduled Reports
- 7 Software Integrations
- 130 Manage
- 127 Governance & Security
- 8 Domo Community Gallery
- 38 Product Releases
- 11 Domo University
- 5.4K Community Forums
- 40 Getting Started
- 30 Community Member Introductions
- 110 Community Announcements
- 4.8K Archive