How do you handle averages skewed by outliers?
Hello -
We are attempting to calculate the average days to close for our sales opportunities. The idea is to be able to drill down to the lead source in order to determine whether or not certain lead sources take fewer days to close. The issue that we are running into is that some of the averages are skewed with some opportunities taking 500 - 600 days to close.
Has anyone come up with a way to calculate an average days to close that factors in these types of anomolies and produces an average that's not skewed?
Thanks!
Best Answer
-
You could try this:
Step 1: Apply a "Percentile" column against your time to close date range. If you use a data transform to create a column that calculates your time to close, you could then use the follwoing sample code to build a 'Percentile' column for the values you find.
SELECT
a.*,
ROUND(100.0 * (SELECT COUNT(*) FROM test_table AS b WHERE b.`Values` <= a.`Values` ) / total.cnt, 1 )
AS percentile FROM test_table AS a
CROSS JOIN (
SELECT COUNT(*) AS cnt
FROM test_table) AS total
ORDER BY percentile DESCJust replace the test_table with your dataset name and `Values` with your date difference column.
Step 2: Create a BeastMode that filters out the 99th percentile (or adjust it fit your data). You could do something like this:
CASE WHEN `Percentile` < 99 THEN 1 ELSE 0 END
Just put that BeastMode in your Filters section and set the filter to "IS 1".
That would allow the averages in the rest of your card to be done without those outliers.
Let me know if you have any questions. Hope this helps.
Sincerely,
ValiantSpur**Please mark "Accept as Solution" if this post solves your problem
**Say "Thanks" by clicking the "heart" in the post that helped you.0
Answers
-
You could try this:
Step 1: Apply a "Percentile" column against your time to close date range. If you use a data transform to create a column that calculates your time to close, you could then use the follwoing sample code to build a 'Percentile' column for the values you find.
SELECT
a.*,
ROUND(100.0 * (SELECT COUNT(*) FROM test_table AS b WHERE b.`Values` <= a.`Values` ) / total.cnt, 1 )
AS percentile FROM test_table AS a
CROSS JOIN (
SELECT COUNT(*) AS cnt
FROM test_table) AS total
ORDER BY percentile DESCJust replace the test_table with your dataset name and `Values` with your date difference column.
Step 2: Create a BeastMode that filters out the 99th percentile (or adjust it fit your data). You could do something like this:
CASE WHEN `Percentile` < 99 THEN 1 ELSE 0 END
Just put that BeastMode in your Filters section and set the filter to "IS 1".
That would allow the averages in the rest of your card to be done without those outliers.
Let me know if you have any questions. Hope this helps.
Sincerely,
ValiantSpur**Please mark "Accept as Solution" if this post solves your problem
**Say "Thanks" by clicking the "heart" in the post that helped you.0 -
Sounds like a possible solution....only hiccup is that no one on my team has any experience with mySQL and I'd need to find an extra kidney to bribe anyone in our IT department to take a look at it.
Any idea if the same might be possible usignmagic ETL? I was looking at some of the function options but I'm not seeing anything indicating the ability to add a percentile column.
0 -
Just for the amusement of those more familiar with MySQL...I gave it a shot but appear to be failing miserably.
In my transform I have:
SELECT
a.*, ROUND(100.0 * (SELECT COUNT(*) FROM mscrm_opportunities_sales_cycle_duration AS b WHERE b. `ETL_Days_to_Close` <= a.`ETL_Days_to_Close` ) / total.cnt, 1 )
AS percentile FROM mscrm_opportunities_sales_cycle_duration AS a CROSS JOIN ( SELECT COUNT(*) AS cnt FROM mscrm_opportunities_sales_cycle_duration) AS total ORDER BY percentile DESCBut I'm recieving an error " The database reported a syntax error. Unknown column 'ETL_Days_to_Close' in 'where clause'"
`ETL_Days_To_Close` is defintely a column in the dataset....
0
Categories
- All Categories
- 1.7K Product Ideas
- 1.7K Ideas Exchange
- 1.5K Connect
- 1.2K Connectors
- 292 Workbench
- 4 Cloud Amplifier
- 8 Federated
- 2.8K Transform
- 95 SQL DataFlows
- 602 Datasets
- 2.1K Magic ETL
- 3.7K Visualize
- 2.4K Charting
- 695 Beast Mode
- 43 App Studio
- 39 Variables
- 658 Automate
- 170 Apps
- 441 APIs & Domo Developer
- 42 Workflows
- 5 DomoAI
- 32 Predict
- 12 Jupyter Workspaces
- 20 R & Python Tiles
- 386 Distribute
- 111 Domo Everywhere
- 269 Scheduled Reports
- 6 Software Integrations
- 113 Manage
- 110 Governance & Security
- 8 Domo University
- 30 Product Releases
- Community Forums
- 39 Getting Started
- 29 Community Member Introductions
- 98 Community Announcements
- Domo Community Gallery
- 4.8K Archive