How to show monthly user engagement scores for 12 months preceding customer churn

I have a data set that has a row for each customer each day with their rolling 30 day average user engagement score. So columns would look something like customer_id, customer_name, engagement_score, date. In a separate table I have customer churn data, with columns along the lines of customer_id, customer_name, churn_date. Does anyone know how I would join these two data sets to create a table that has a row for each customer, their churn date and columns with engagement scores for each of the 12 months preceding their churn date?

Best Answers

  • JasonAltenburg
    edited October 2023 Answer ✓

    You could likely do this within a MySQL dataflow

    I think you could do it with Magic as well.

    Here's a high-level outline:

    1. Join the Tables: Join the engagement table with the churn table on customer_id.
    2. Date Range Filtering: Filter the joined rows such that the engagement score dates are within 12 months preceding the churn date.
    3. Pivot the Data: Manually pivot the filtered data to get engagement scores as columns for each of the 12 months preceding churn.

    CODE:

    SELECT
    e.customer_id,
    e.customer_name,
    c.churn_date,
    MAX(CASE WHEN TIMESTAMPDIFF(MONTH, e.date, c.churn_date) = 0 THEN e.30_day_avg_engagement END) AS month_0,
    MAX(CASE WHEN TIMESTAMPDIFF(MONTH, e.date, c.churn_date) = 1 THEN e.30_day_avg_engagement END) AS month_1,
    MAX(CASE WHEN TIMESTAMPDIFF(MONTH, e.date, c.churn_date) = 2 THEN e.30_day_avg_engagement END) AS month_2,
    MAX(CASE WHEN TIMESTAMPDIFF(MONTH, e.date, c.churn_date) = 3 THEN e.30_day_avg_engagement END) AS month_3,
    MAX(CASE WHEN TIMESTAMPDIFF(MONTH, e.date, c.churn_date) = 4 THEN e.30_day_avg_engagement END) AS month_4,
    MAX(CASE WHEN TIMESTAMPDIFF(MONTH, e.date, c.churn_date) = 5 THEN e.30_day_avg_engagement END) AS month_5,
    MAX(CASE WHEN TIMESTAMPDIFF(MONTH, e.date, c.churn_date) = 6 THEN e.30_day_avg_engagement END) AS month_6,
    MAX(CASE WHEN TIMESTAMPDIFF(MONTH, e.date, c.churn_date) = 7 THEN e.30_day_avg_engagement END) AS month_7,
    MAX(CASE WHEN TIMESTAMPDIFF(MONTH, e.date, c.churn_date) = 8 THEN e.30_day_avg_engagement END) AS month_8,
    MAX(CASE WHEN TIMESTAMPDIFF(MONTH, e.date, c.churn_date) = 9 THEN e.30_day_avg_engagement END) AS month_9,
    MAX(CASE WHEN TIMESTAMPDIFF(MONTH, e.date, c.churn_date) = 10 THEN e.30_day_avg_engagement END) AS month_10,
    MAX(CASE WHEN TIMESTAMPDIFF(MONTH, e.date, c.churn_date) = 11 THEN e.30_day_avg_engagement END) AS month_11
    FROM
    dev_engagement e
    JOIN
    dev_churn c ON e.customer_id = c.customer_id
    WHERE
    c.churn_date >= e.date AND TIMESTAMPDIFF(MONTH, e.date, c.churn_date) BETWEEN 0 AND 11
    GROUP BY
    e.customer_id, e.customer_name, c.churn_date;

    You end up with an output like:

    From sources (sample data, not intended to be representative of your real dataset, as I think it's likely you wouldn't have engagement scores from churned customers)

  • marcel_luthi
    marcel_luthi Coach
    Answer ✓

    You can do this with ETL, but since your Score in Table 1 is daily and you want to show it monthly, you'll need to decide how that should be handled, would if be an average of the average (which is not often recommended), the max score in the period or the last score for the month, depending on this, the way to build the ETL will chance slightly. Something like:

    1. Do a Tile that will aggregate Table 1 per customer Id and Month, and give you a single value for each month. Output would be 3 columns Score Month (date), CustomerId (String), Score (Number).
    2. Then you'll do a Full Outer Join of this new table with your Table 2 where you have the Churn Dates on CustomerId.
    3. Do a Formula tile that Calculates the Month Difference between the Score Month and the Churn Date, rounded.
    4. Filter the outcome to include only those where the difference is between 0 and 11 (or 1 and 12).

    This should get you what you need, the most tricky part is clearly defining how the monthly entries should be summarized to provide a single monthly number.

    Let us know if this helps.

Answers

  • JasonAltenburg
    edited October 2023 Answer ✓

    You could likely do this within a MySQL dataflow

    I think you could do it with Magic as well.

    Here's a high-level outline:

    1. Join the Tables: Join the engagement table with the churn table on customer_id.
    2. Date Range Filtering: Filter the joined rows such that the engagement score dates are within 12 months preceding the churn date.
    3. Pivot the Data: Manually pivot the filtered data to get engagement scores as columns for each of the 12 months preceding churn.

    CODE:

    SELECT
    e.customer_id,
    e.customer_name,
    c.churn_date,
    MAX(CASE WHEN TIMESTAMPDIFF(MONTH, e.date, c.churn_date) = 0 THEN e.30_day_avg_engagement END) AS month_0,
    MAX(CASE WHEN TIMESTAMPDIFF(MONTH, e.date, c.churn_date) = 1 THEN e.30_day_avg_engagement END) AS month_1,
    MAX(CASE WHEN TIMESTAMPDIFF(MONTH, e.date, c.churn_date) = 2 THEN e.30_day_avg_engagement END) AS month_2,
    MAX(CASE WHEN TIMESTAMPDIFF(MONTH, e.date, c.churn_date) = 3 THEN e.30_day_avg_engagement END) AS month_3,
    MAX(CASE WHEN TIMESTAMPDIFF(MONTH, e.date, c.churn_date) = 4 THEN e.30_day_avg_engagement END) AS month_4,
    MAX(CASE WHEN TIMESTAMPDIFF(MONTH, e.date, c.churn_date) = 5 THEN e.30_day_avg_engagement END) AS month_5,
    MAX(CASE WHEN TIMESTAMPDIFF(MONTH, e.date, c.churn_date) = 6 THEN e.30_day_avg_engagement END) AS month_6,
    MAX(CASE WHEN TIMESTAMPDIFF(MONTH, e.date, c.churn_date) = 7 THEN e.30_day_avg_engagement END) AS month_7,
    MAX(CASE WHEN TIMESTAMPDIFF(MONTH, e.date, c.churn_date) = 8 THEN e.30_day_avg_engagement END) AS month_8,
    MAX(CASE WHEN TIMESTAMPDIFF(MONTH, e.date, c.churn_date) = 9 THEN e.30_day_avg_engagement END) AS month_9,
    MAX(CASE WHEN TIMESTAMPDIFF(MONTH, e.date, c.churn_date) = 10 THEN e.30_day_avg_engagement END) AS month_10,
    MAX(CASE WHEN TIMESTAMPDIFF(MONTH, e.date, c.churn_date) = 11 THEN e.30_day_avg_engagement END) AS month_11
    FROM
    dev_engagement e
    JOIN
    dev_churn c ON e.customer_id = c.customer_id
    WHERE
    c.churn_date >= e.date AND TIMESTAMPDIFF(MONTH, e.date, c.churn_date) BETWEEN 0 AND 11
    GROUP BY
    e.customer_id, e.customer_name, c.churn_date;

    You end up with an output like:

    From sources (sample data, not intended to be representative of your real dataset, as I think it's likely you wouldn't have engagement scores from churned customers)

  • marcel_luthi
    marcel_luthi Coach
    Answer ✓

    You can do this with ETL, but since your Score in Table 1 is daily and you want to show it monthly, you'll need to decide how that should be handled, would if be an average of the average (which is not often recommended), the max score in the period or the last score for the month, depending on this, the way to build the ETL will chance slightly. Something like:

    1. Do a Tile that will aggregate Table 1 per customer Id and Month, and give you a single value for each month. Output would be 3 columns Score Month (date), CustomerId (String), Score (Number).
    2. Then you'll do a Full Outer Join of this new table with your Table 2 where you have the Churn Dates on CustomerId.
    3. Do a Formula tile that Calculates the Month Difference between the Score Month and the Churn Date, rounded.
    4. Filter the outcome to include only those where the difference is between 0 and 11 (or 1 and 12).

    This should get you what you need, the most tricky part is clearly defining how the monthly entries should be summarized to provide a single monthly number.

    Let us know if this helps.