I want to find a way to use a window function to find the duration a temperature variable was above a threshold, the timestamp at which the the temperature was above a threshold, and the range of temperatures when they were above the threshold.
The data is organized as an identifier column, the timestamp column, and the temperature column. Below is an example of the data that I am working with.
Identifier | Timestamp | Temperature |
---|
1 | 2022-07-08 10:00:00 | 47 |
1 | 2022-07-08 10:30:00 | 53 |
1 | 2022-07-08 11:00:00 | 57 |
1 | 2022-07-08 11:30:00 | 59 |
1 | 2022-07-08 12:00:00 | 44 |
2 | 2022-07-08 10:00:00 | 48 |
| . . . | |
Let's say that my threshold temperature is 50. I would thus want to find the time until the temperature goes back below 50, find the timestamp when the temperature was first above the threshold, find the range of these temperatures as the max and min, and find the Identifier that corresponds to the temperature going above the threshold. For example in the above table, I would want the following:
'Identifier: 1, Timestamp of first excursion: 2022-07-08 10:30:00, Temperature Range: 53-59, Excursion Duration: 1 hour 30 minutes'.
This dataset has over 400 million rows so I believe that the windowing function will be much more helpful than Magic ETL.
I am open to any suggestions and please feel free to send me any documentation that you think may be helpful.
Thanks! 😀