How do you handle averages skewed by outliers?

Hello -

 

We are attempting to calculate the average days to close for our sales opportunities. The idea is to be able to drill down to the lead source in order to determine whether or not certain lead sources take fewer days to close. The issue that we are running into is that some of the averages are skewed with some opportunities taking 500 - 600 days to close. 

 

Has anyone come up with a way to calculate an average days to close that factors in these types of anomolies and produces an average that's not skewed?

 

Thanks!

Best Answer

  • BlueRooster
    BlueRooster Domo Employee
    Answer ✓

    You could try this:

     

    Step 1: Apply a "Percentile" column against your time to close date range. If you use a data transform to create a column that calculates your time to close, you could then use the follwoing sample code to build a 'Percentile' column for the values you find.

    SELECT 
    a.*,
    ROUND(100.0 * (SELECT COUNT(*) FROM test_table AS b WHERE b.`Values` <= a.`Values` ) / total.cnt, 1 )
    AS percentile FROM test_table AS a
    CROSS JOIN (
    SELECT COUNT(*) AS cnt
    FROM test_table) AS total
    ORDER BY percentile DESC

    Just replace the test_table with your dataset name and `Values` with your date difference column.

     

    Step 2: Create a BeastMode that filters out the 99th percentile (or adjust it fit your data). You could do something like this:

    CASE WHEN `Percentile` < 99 THEN 1 ELSE 0 END

    Just put that BeastMode in your Filters section and set the filter to "IS 1". 

     

    That would allow the averages in the rest of your card to be done without those outliers. 

     

    Let me know if you have any questions. Hope this helps.


    Sincerely,
    ValiantSpur

     

    **Please mark "Accept as Solution" if this post solves your problem
    **Say "Thanks" by clicking the "heart" in the post that helped you.

Answers

  • BlueRooster
    BlueRooster Domo Employee
    Answer ✓

    You could try this:

     

    Step 1: Apply a "Percentile" column against your time to close date range. If you use a data transform to create a column that calculates your time to close, you could then use the follwoing sample code to build a 'Percentile' column for the values you find.

    SELECT 
    a.*,
    ROUND(100.0 * (SELECT COUNT(*) FROM test_table AS b WHERE b.`Values` <= a.`Values` ) / total.cnt, 1 )
    AS percentile FROM test_table AS a
    CROSS JOIN (
    SELECT COUNT(*) AS cnt
    FROM test_table) AS total
    ORDER BY percentile DESC

    Just replace the test_table with your dataset name and `Values` with your date difference column.

     

    Step 2: Create a BeastMode that filters out the 99th percentile (or adjust it fit your data). You could do something like this:

    CASE WHEN `Percentile` < 99 THEN 1 ELSE 0 END

    Just put that BeastMode in your Filters section and set the filter to "IS 1". 

     

    That would allow the averages in the rest of your card to be done without those outliers. 

     

    Let me know if you have any questions. Hope this helps.


    Sincerely,
    ValiantSpur

     

    **Please mark "Accept as Solution" if this post solves your problem
    **Say "Thanks" by clicking the "heart" in the post that helped you.

  • Sounds like a possible solution....only hiccup is that no one on my team has any experience with mySQL and I'd need to find an extra kidney to bribe anyone in our IT department to take a look at it.

     

    Any idea if the same might be possible usignmagic ETL? I was looking at some of the function options but I'm not seeing anything indicating the ability to add a percentile column. 

     

     

  • Just for the amusement of those more familiar with MySQL...I gave it a shot but appear to be failing miserably. :(

     

    In my transform I have:

     

    SELECT
    a.*, ROUND(100.0 * (SELECT COUNT(*) FROM mscrm_opportunities_sales_cycle_duration AS b WHERE b. `ETL_Days_to_Close` <= a.`ETL_Days_to_Close` ) / total.cnt, 1 )
    AS percentile FROM mscrm_opportunities_sales_cycle_duration AS a CROSS JOIN ( SELECT COUNT(*) AS cnt FROM mscrm_opportunities_sales_cycle_duration) AS total ORDER BY percentile DESC

     

    But I'm recieving an error " The database reported a syntax error. Unknown column 'ETL_Days_to_Close' in 'where clause'"

     

    `ETL_Days_To_Close` is defintely a column in the dataset....Smiley Frustrated

     

     

     

     

This discussion has been closed.