Filtering by percentile in Beast Mode/ETL

I recently needed to filter a list of pages by pageviews. I only want pages whose pageviews are at or above the 80th percentile.

This has to be done weekly, and I don't want to use a fixed cutoff (e.g. keeping only pages with more than 1,000 pageviews). 1,000 pageviews might be good one week but only average the next (consider Black Friday week, for example).

 

I'd appreciate any help or advice on this. I don't mind whether the calculation is done in Beast Mode or in ETL.

 

Marv

Comments

  • Luckily, I had to do the same thing a couple of weeks ago. Here's the query I used in a SQL transform to generate a 'Percentile' column on the dataset.

     

    You should be able to edit this to fit your needs:

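    SELECT
        a.*,
        -- percentile = share of rows (as a %) whose value is <= this row's value
        ROUND(100.0 * (SELECT COUNT(*)
                       FROM inputDataset AS b
                       WHERE b.`Values` <= a.`Values`) / total.cnt, 1) AS percentile
    FROM inputDataset AS a
    -- one-row derived table holding the total row count
    CROSS JOIN (SELECT COUNT(*) AS cnt FROM inputDataset) AS total
    ORDER BY percentile DESC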

    You'll just need to replace the `Values` field with your pageviews field, and inputDataset with the name of your input dataset.
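If you'd like to sanity-check the logic before running it in Domo, here's a minimal sketch that runs the same correlated-subquery query against Python's built-in sqlite3 module. SQLite stands in for Domo's SQL transform engine here, and the table contents (page names and pageview counts) are made up for illustration.

```python
import sqlite3

# Hypothetical sample data: (page, pageviews) for a single week.
rows = [("/home", 5000), ("/pricing", 1200), ("/blog/a", 800),
        ("/blog/b", 300), ("/about", 150)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inputDataset (page TEXT, pageviews INTEGER)")
conn.executemany("INSERT INTO inputDataset VALUES (?, ?)", rows)

# Same shape as the transform above: percentile = % of rows with
# pageviews <= this row's pageviews, rounded to one decimal place.
query = """
SELECT a.page,
       a.pageviews,
       ROUND(100.0 * (SELECT COUNT(*) FROM inputDataset AS b
                      WHERE b.pageviews <= a.pageviews) / total.cnt, 1) AS percentile
FROM inputDataset AS a
CROSS JOIN (SELECT COUNT(*) AS cnt FROM inputDataset) AS total
ORDER BY percentile DESC
"""
result = conn.execute(query).fetchall()
for page, views, pct in result:
    print(page, views, pct)
```

With five rows, the top page scores 100.0 and the second 80.0, so a `percentile >= 80` filter keeps the top two. Pages with equal pageviews get the same percentile, because the subquery counts every row whose value is less than or equal to the current one.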

     

    Once you have your new output, you can filter for rows where percentile >= 80.
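Since you need this weekly, keep in mind that the query above compares every row against every other row in the dataset. If your dataset holds several weeks of data, you'd probably want each page ranked only against pages from the same week. Here's a rough sketch of that variant, again in sqlite3. The `week` column and the sample rows are hypothetical; in Domo you'd match on whatever date or week field your dataset actually has.

```python
import sqlite3

# Hypothetical rows: (week, page, pageviews). The second week is a much
# busier week, so the same raw count means something different there.
rows = [
    ("2023-W46", "/a", 1000), ("2023-W46", "/b", 400), ("2023-W46", "/c", 200),
    ("2023-W46", "/d", 100), ("2023-W46", "/e", 50),
    ("2023-W47", "/a", 9000), ("2023-W47", "/b", 6000), ("2023-W47", "/c", 3000),
    ("2023-W47", "/d", 1500), ("2023-W47", "/e", 1000),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inputDataset (week TEXT, page TEXT, pageviews INTEGER)")
conn.executemany("INSERT INTO inputDataset VALUES (?, ?, ?)", rows)

# Both the count in the subquery and the total are restricted to the row's
# own week, so the cutoff moves with each week's traffic.
query = """
SELECT a.week, a.page, a.pageviews,
       ROUND(100.0 * (SELECT COUNT(*) FROM inputDataset AS b
                      WHERE b.week = a.week AND b.pageviews <= a.pageviews)
             / total.cnt, 1) AS percentile
FROM inputDataset AS a
JOIN (SELECT week, COUNT(*) AS cnt FROM inputDataset GROUP BY week) AS total
  ON total.week = a.week
"""

# Keep only pages at or above the 80th percentile of their own week.
top = conn.execute(
    f"SELECT week, page, pageviews FROM ({query}) AS ranked "
    "WHERE percentile >= 80 ORDER BY week, pageviews DESC"
).fetchall()
for row in top:
    print(row)
```

In this made-up data, 1,000 pageviews puts a page at the 100th percentile in the first week but only the 20th in the second, which is exactly the situation a fixed cutoff can't handle.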

     

    Let me know if you have any other questions.

     

    Sincerely,

    ValiantSpur

     
