How do you handle averages skewed by outliers?
Hello -
We are attempting to calculate the average days to close for our sales opportunities. The idea is to be able to drill down to the lead source in order to determine whether or not certain lead sources take fewer days to close. The issue that we are running into is that some of the averages are skewed with some opportunities taking 500 - 600 days to close.
Has anyone come up with a way to calculate an average days to close that factors in these types of anomolies and produces an average that's not skewed?
Thanks!
Best Answer
-
You could try this:
Step 1: Apply a "Percentile" column against your time to close date range. If you use a data transform to create a column that calculates your time to close, you could then use the follwoing sample code to build a 'Percentile' column for the values you find.
SELECT
a.*,
ROUND(100.0 * (SELECT COUNT(*) FROM test_table AS b WHERE b.`Values` <= a.`Values` ) / total.cnt, 1 )
AS percentile FROM test_table AS a
CROSS JOIN (
SELECT COUNT(*) AS cnt
FROM test_table) AS total
ORDER BY percentile DESCJust replace the test_table with your dataset name and `Values` with your date difference column.
Step 2: Create a BeastMode that filters out the 99th percentile (or adjust it fit your data). You could do something like this:
CASE WHEN `Percentile` < 99 THEN 1 ELSE 0 END
Just put that BeastMode in your Filters section and set the filter to "IS 1".
That would allow the averages in the rest of your card to be done without those outliers.
Let me know if you have any questions. Hope this helps.
Sincerely,
ValiantSpur**Please mark "Accept as Solution" if this post solves your problem
**Say "Thanks" by clicking the "heart" in the post that helped you.0
Answers
-
You could try this:
Step 1: Apply a "Percentile" column against your time to close date range. If you use a data transform to create a column that calculates your time to close, you could then use the follwoing sample code to build a 'Percentile' column for the values you find.
SELECT
a.*,
ROUND(100.0 * (SELECT COUNT(*) FROM test_table AS b WHERE b.`Values` <= a.`Values` ) / total.cnt, 1 )
AS percentile FROM test_table AS a
CROSS JOIN (
SELECT COUNT(*) AS cnt
FROM test_table) AS total
ORDER BY percentile DESCJust replace the test_table with your dataset name and `Values` with your date difference column.
Step 2: Create a BeastMode that filters out the 99th percentile (or adjust it fit your data). You could do something like this:
CASE WHEN `Percentile` < 99 THEN 1 ELSE 0 END
Just put that BeastMode in your Filters section and set the filter to "IS 1".
That would allow the averages in the rest of your card to be done without those outliers.
Let me know if you have any questions. Hope this helps.
Sincerely,
ValiantSpur**Please mark "Accept as Solution" if this post solves your problem
**Say "Thanks" by clicking the "heart" in the post that helped you.0 -
Sounds like a possible solution....only hiccup is that no one on my team has any experience with mySQL and I'd need to find an extra kidney to bribe anyone in our IT department to take a look at it.
Any idea if the same might be possible usignmagic ETL? I was looking at some of the function options but I'm not seeing anything indicating the ability to add a percentile column.
0 -
Just for the amusement of those more familiar with MySQL...I gave it a shot but appear to be failing miserably.
In my transform I have:
SELECT
a.*, ROUND(100.0 * (SELECT COUNT(*) FROM mscrm_opportunities_sales_cycle_duration AS b WHERE b. `ETL_Days_to_Close` <= a.`ETL_Days_to_Close` ) / total.cnt, 1 )
AS percentile FROM mscrm_opportunities_sales_cycle_duration AS a CROSS JOIN ( SELECT COUNT(*) AS cnt FROM mscrm_opportunities_sales_cycle_duration) AS total ORDER BY percentile DESCBut I'm recieving an error " The database reported a syntax error. Unknown column 'ETL_Days_to_Close' in 'where clause'"
`ETL_Days_To_Close` is defintely a column in the dataset....
0
Categories
- All Categories
- 1.8K Product Ideas
- 1.8K Ideas Exchange
- 1.5K Connect
- 1.2K Connectors
- 297 Workbench
- 6 Cloud Amplifier
- 8 Federated
- 2.9K Transform
- 100 SQL DataFlows
- 614 Datasets
- 2.2K Magic ETL
- 3.8K Visualize
- 2.5K Charting
- 729 Beast Mode
- 53 App Studio
- 40 Variables
- 677 Automate
- 173 Apps
- 451 APIs & Domo Developer
- 45 Workflows
- 8 DomoAI
- 34 Predict
- 14 Jupyter Workspaces
- 20 R & Python Tiles
- 394 Distribute
- 113 Domo Everywhere
- 275 Scheduled Reports
- 6 Software Integrations
- 121 Manage
- 118 Governance & Security
- Domo Community Gallery
- 32 Product Releases
- 10 Domo University
- 5.4K Community Forums
- 40 Getting Started
- 30 Community Member Introductions
- 108 Community Announcements
- 4.8K Archive