window function

I have the following table and i would like to understand how to get the difference between dates:

id | status | date

1 | disconnect | april 4

1 | reconnect | april 7

2 | disconnect | june 4

I would like to understand the average days that persons reconnect in the entire dataset. Can a partition function work here? The result here should see customer 1 it takes 3 days to reconnect.

Any help would be appreciated. Thanks!

Tagged:

Best Answers

  • marcel_luthi
    marcel_luthi Coach
    edited November 2023 Answer ✓

    Yes, a Lag function of 1 (Window and Rank Tile), partitioned by customer and sorted by date ASC, should be able to add next to each row the date of the previous entry, so your example would become:

    id | status | date | previous date

    1 | disconnect | april 4 |

    1 | reconnect | april 7 | april 4

    2 | disconnect | june 4 |

    From here, you can calculate the time from the previous date to the date and that will tell you the number of days between changes. Do a card that AVGs this time by the customer (id), filtering only to Status of Reconnect and you should get what you're looking for. (You can change the aggregation for different answers)

  • DavidChurchman
    Answer ✓

    Is it possible to have two disconnects in a row for an ID or does it always alternate disconnect/reconnect? If it always alternates, you could rank by id/status, split the dataset into disconnects and reconnects, and then join on id and rank. So first disconnect joins with first reconnect.

    If the pattern breaks, that doesn't work, though.

    This feels a bit hacked together, but I played with this solution: use filters to split your dataset into Disconnects and Reconnects, do a full join to multiple every disconnect by ever reconnect. Then, filter for just reconnects with dates greater than disconnects (or no reconnnects). Then rank disconnects by reconnect date and filter for just the 1st reconnect after the disconnect. Something like this:

    The formula tile:

    Kind of inefficient since it multiplies the dataset, so I'd be curious if someone has something more elegant for that case.

    Please 💡/💖/👍/😊 this post if you read it and found it helpful.

    Please accept the answer if it solved your problem.

Answers

  • marcel_luthi
    marcel_luthi Coach
    edited November 2023 Answer ✓

    Yes, a Lag function of 1 (Window and Rank Tile), partitioned by customer and sorted by date ASC, should be able to add next to each row the date of the previous entry, so your example would become:

    id | status | date | previous date

    1 | disconnect | april 4 |

    1 | reconnect | april 7 | april 4

    2 | disconnect | june 4 |

    From here, you can calculate the time from the previous date to the date and that will tell you the number of days between changes. Do a card that AVGs this time by the customer (id), filtering only to Status of Reconnect and you should get what you're looking for. (You can change the aggregation for different answers)

  • thank you for this, very helpful. So, what if the data looked like this? If I did this with the partition, it would pick up the last disconnect date? I would only want to see the difference between the dates between disconnect and reconnect.

    id | status | date

    1 | disconnect | april 4

    1 | reconnect | april 7

    2 | disconnect | june 4

    1 | disconnect | may 5th

  • DavidChurchman
    Answer ✓

    Is it possible to have two disconnects in a row for an ID or does it always alternate disconnect/reconnect? If it always alternates, you could rank by id/status, split the dataset into disconnects and reconnects, and then join on id and rank. So first disconnect joins with first reconnect.

    If the pattern breaks, that doesn't work, though.

    This feels a bit hacked together, but I played with this solution: use filters to split your dataset into Disconnects and Reconnects, do a full join to multiple every disconnect by ever reconnect. Then, filter for just reconnects with dates greater than disconnects (or no reconnnects). Then rank disconnects by reconnect date and filter for just the 1st reconnect after the disconnect. Something like this:

    The formula tile:

    Kind of inefficient since it multiplies the dataset, so I'd be curious if someone has something more elegant for that case.

    Please 💡/💖/👍/😊 this post if you read it and found it helpful.

    Please accept the answer if it solved your problem.