Filtering by percentile in Beast Mode/ETL


I recently needed to filter a list of pages by pageviews. I only want pages whose pageviews are at or above the 80th percentile.

This has to be done on a weekly basis, and I don't want to use a fixed cutoff (e.g. pageviews > 1,000), since more than 1,000 pageviews might be good one week but only average the next (consider Black Friday week, for example).


I'd appreciate any help or advice on this. I don't mind whether the calculation is done in Beast Mode or within the ETL.




  • Valiant

    Luckily I had to do the same thing a couple of weeks ago. Here's the query I used in a SQL transform to generate a 'Percentile' column on the dataset.


    You should be able to edit this to fit your needs:

    SELECT a.`Values`,
      ROUND(100.0 * (SELECT COUNT(*) FROM inputDataset AS b WHERE b.`Values` <= a.`Values`) / total.cnt, 1) AS percentile
    FROM inputDataset AS a
    CROSS JOIN (SELECT COUNT(*) AS cnt FROM inputDataset) AS total
    ORDER BY percentile DESC

    You'll just need to replace the `Values` field with your pageviews field and `inputDataset` with the name of your initial table.


    Once you have your new output, you can filter on your percentile column >= 80.
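
    Since the filter has to run per week, the percentile probably needs to be computed within each week rather than over the whole table. Here is a minimal sketch of that variant, assuming the dataset has `week`, `page` and `pageviews` columns (hypothetical names, as is the sample data). It runs the SQL against an in-memory SQLite database only so it can be tried outside Domo; the SQL transform's dialect may need small adjustments.

```python
import sqlite3

# Hypothetical sample data: (week, page, pageviews). Names and numbers are illustrative only.
rows = [
    ("week-1", "/home", 900), ("week-1", "/blog", 400), ("week-1", "/docs", 250),
    ("week-1", "/faq", 100), ("week-1", "/shop", 1500),
    ("week-2", "/home", 5000), ("week-2", "/blog", 1200), ("week-2", "/docs", 1100),
    ("week-2", "/faq", 300), ("week-2", "/shop", 9000),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inputDataset (week TEXT, page TEXT, pageviews INTEGER)")
conn.executemany("INSERT INTO inputDataset VALUES (?, ?, ?)", rows)

# Same idea as the query above, but both the count of rows at or below the
# current row and the total row count are restricted to the row's own week.
query = """
SELECT week, page, pageviews, percentile
FROM (
    SELECT a.week, a.page, a.pageviews,
           ROUND(100.0 * (SELECT COUNT(*) FROM inputDataset AS b
                          WHERE b.week = a.week
                            AND b.pageviews <= a.pageviews) / total.cnt, 1) AS percentile
    FROM inputDataset AS a
    JOIN (SELECT week, COUNT(*) AS cnt
          FROM inputDataset GROUP BY week) AS total
      ON total.week = a.week
) AS ranked
WHERE percentile >= 80
ORDER BY week, percentile DESC
"""

top_pages = conn.execute(query).fetchall()
for row in top_pages:
    print(row)
```

    In this sample, `/blog` has 1,200 pageviews in the second week but is still filtered out, because the cutoff moves with that week's traffic. Note that with `<=`, pages tied on pageviews share the higher percentile.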


    Let me know if you have any other questions.




