Correlation Coefficient in Beastmode

Has anyone successfully calculated a Pearson Correlation Coefficient in Beastmode?  My idea is to create a scatter plot that graphically displays the relationship between two fields in a dataset, and display the Correlation Coefficient for those two datasets as the Summary Number.

 

For example, if the card is filtered for a date range of March 1 - March 31, the key fields in the underlying dataset might look like the following table, which would result in a Correlation Coefficient of 0.5669 (this is the value that I would want to display as the Summary Number for the card).  Is this possible?

Thanks!

RevenueDuration
$603185
$262123
$906287
$429179
$872266
$302160
$185126
$596293
$481234
$173104
$62811
$296194
$27246

Best Answers

  • Dean_Wangerin
    Dean_Wangerin Contributor
    Answer ✓

    I also cannot get the corr() function to work. However I did get it to work by manually creating the sql math in a beastmode.

    Use this code:

    (count(x) * sum(x * y) - sum(x) * sum(y)) / (sqrt(count(x) * sum(x * x) - sum(x) * sum(x)) * sqrt(count(x) * sum(y * y) - sum(y) * sum(y)))

    Then replace x and y with your measure names.

  • jaeW_at_Onyx
    jaeW_at_Onyx Coach
    Answer ✓

    How to build a correlation plot to compare multiple variables!

     

     

    Jae Wilson
    Check out my 🎥 Domo Training YouTube Channel 👨‍💻

    **Say "Thanks" by clicking the ❤️ in the post that helped you.
    **Please mark the post that solves your problem by clicking on "Accept as Solution"

Answers

  • BlueRooster
    BlueRooster Domo Employee

    You can do it in a data transform, but couldn't in a beastmode as far as my testing.

     

    You'll need to create a data transform that calculates the Pearson R using the following logic:

     (Avg(`Revenue` * `Duration`) – (Avg(`Revenue`) * Avg(`Duration`))) / (STDDEV_POP(`Revenue`) * STDDEV_POP(`Duration`))

     

    You have to break it into 2 selects, the first:

    SELECT 	 Avg(`Revenue` * `Duration`) AS a,
    (Avg(`Revenue`) * Avg(`Duration`)) AS b,
    (STDDEV_POP(`Revenue`) * STDDEV_POP(`Duration`)) as c
    FROM testing

    And the 2nd:

    SELECT (`a`-`b`)/`c` as 'Pearson r'
    FROm transform_data_1

    As you can see, this will give you the correct result:

    image.png

    From there you can basically do the following

    SELECT *, (SELECT `Pearson R` from PearsonCalculation) as `Pearson`
    FROM dataset

    Then use a beastmode for MAX(`Pearson R`) as your summary number. It won't be dynamic where you can change the date range, but maybe someone else can figure out a way to get the beastmode portion to work.

     

    Sincerely,

    ValiantSpur

     

    **Please mark "Accept as Solution" if this post solves your problem
    **Say "Thanks" by clicking the "heart" in the post that helped you.

  • BlueRooster
    BlueRooster Domo Employee

    Second option is to use the Corr function (which does a Pearson Correlation for you)

    corr(`Duration`,`Revenue`)

    You can use that as a beastmode but it doesn't let you put it in the summary number.

     

  • nitot
    nitot Member

    Thanks for your ideas!  I'll try the first one later on.  I did try using the Corr() function in beastmode, but regardless of which fields I use I keep receiving the error "An issue has occurred during processing. We are unable to complete the request at this time."  I noticed that Corr does not appear in the Functions list - perhaps it is in beta version?

  • BlueRooster
    BlueRooster Domo Employee

    I don't believe so, we aren't running any beta versions in our production environment and I was able to get it to work. Perhaps make sure both fields are integers?

  • nitot
    nitot Member

    I've tried multiple times to make the corr() function work, even creating a very simple test dataset that matches the one I provided earlier.  No matter what I try, the formula validates, but I always receive the error

    "An issue has occurred during processing. We are unable to complete the request at this time."

    Have you gotten the corr() function to work?

     

    Thanks again for all your help

  • BlueRooster
    BlueRooster Domo Employee

    Yes, I was actually able to use your test dataset, create a beastmode with the corr function and then it worked fine. It sounds like it may be worth reaching out to Domo Support at this point. There may be a quirk going on with your instance.

  • Ritwik
    Ritwik Contributor

    @nitot @Valiant 

     

    I'm also struggling with getting the corr() beastmode to manifest in a card without throwing back that error. Any thoughts on why the error occurs as I reach out to support in the meantime? 

  • Dean_Wangerin
    Dean_Wangerin Contributor
    Answer ✓

    I also cannot get the corr() function to work. However I did get it to work by manually creating the sql math in a beastmode.

    Use this code:

    (count(x) * sum(x * y) - sum(x) * sum(y)) / (sqrt(count(x) * sum(x * x) - sum(x) * sum(x)) * sqrt(count(x) * sum(y * y) - sum(y) * sum(y)))

    Then replace x and y with your measure names.

  • nitot
    nitot Member

    Smart workaround!

  • jaeW_at_Onyx
    jaeW_at_Onyx Coach
    Answer ✓

    How to build a correlation plot to compare multiple variables!

     

     

    Jae Wilson
    Check out my 🎥 Domo Training YouTube Channel 👨‍💻

    **Say "Thanks" by clicking the ❤️ in the post that helped you.
    **Please mark the post that solves your problem by clicking on "Accept as Solution"
  • @jaeW_at_Onyx I cleaned up your post.

     

    Going forward use the word editor click on video and drop the youtube url in there "from the web".

     

    Keep these coming.

    Cheers!

This discussion has been closed.