In Google BigQuery, I have a list of events in a single support session which are tagged by event names. Each support issue resolved_time
is the timestamp of the resolved
event. Each issue start_time
is the first occurrence of either of the events message
, pending
, or unresolved
either at the absolute beginning of the session or following a resolved tag.
In other words, the state of the conversation is either open or resolved. The state becomes open with the first occurrence of a message
, pending
, or unresolved
event, and is closed with a resolved
event.
Currently I have the table with timestamp
and event_name
only. Would like to join in start_time
and resolved_time
.
In the example below, this results in 4 separate support issues. Three being resolved with resolved
events and the last being unresolved since it is not closed with a resolved
event.
timestamp | event_name | start_time | resolved_time |
---|---|---|---|
2021-01-15 20:27:59 UTC | unresolved | 2021-01-15 20:27:59 UTC | 2021-01-20 19:31:37 UTC |
2021-01-16 03:02:46 UTC | message | 2021-01-15 20:27:59 UTC | 2021-01-20 19:31:37 UTC |
2021-01-20 19:31:37 UTC | resolved | 2021-01-15 20:27:59 UTC | 2021-01-20 19:31:37 UTC |
2021-01-21 00:13:43 UTC | pending | 2021-01-21 00:13:43 UTC | 2021-01-23 23:38:46 UTC |
2021-01-23 23:38:46 UTC | resolved | 2021-01-21 00:13:43 UTC | 2021-01-23 23:38:46 UTC |
2021-01-24 00:38:17 UTC | message | 2021-01-24 00:38:17 UTC | 2021-01-24 02:19:44 UTC |
2021-01-24 00:42:31 UTC | unresolved | 2021-01-24 00:38:17 UTC | 2021-01-24 02:19:44 UTC |
2021-01-24 02:19:44 UTC | resolved | 2021-01-24 00:38:17 UTC | 2021-01-24 02:19:44 UTC |
2021-01-25 15:55:50 UTC | message | 2021-01-25 15:55:50 UTC | NULL |
2021-01-25 15:59:55 UTC | unresolved | 2021-01-25 15:55:50 UTC | NULL |
WITH sample_table AS ( SELECT TIMESTAMP("2021-01-15 20:27:59") `timestamp` , "unresolved" event_name UNION ALL SELECT TIMESTAMP("2021-01-16 03:02:46") , "message" UNION ALL SELECT TIMESTAMP("2021-01-20 19:31:37") , "resolved" UNION ALL SELECT TIMESTAMP("2021-01-21 00:13:43") , "pending" UNION ALL SELECT TIMESTAMP("2021-01-23 23:38:46") , "resolved" UNION ALL SELECT TIMESTAMP("2021-01-24 00:38:17") , "message" UNION ALL SELECT TIMESTAMP("2021-01-24 00:42:31") , "unresolved" UNION ALL SELECT TIMESTAMP("2021-01-24 02:19:44") , "resolved" UNION ALL SELECT TIMESTAMP("2021-01-25 15:55:50") , "message" UNION ALL SELECT TIMESTAMP("2021-01-25 15:59:55") , "unresolved" ) SELECT * FROM sample_table ORDER BY timestamp ASC
Thank you for any help in advance.
Advertisement
Answer
Moving MAX
and MIN
could be a good fit here:
WITH sample_table AS ( SELECT TIMESTAMP("2021-01-15 20:27:59") timestamp, "unresolved" event_name UNION ALL SELECT TIMESTAMP("2021-01-16 03:02:46"), "message" UNION ALL SELECT TIMESTAMP("2021-01-20 19:31:37"), "resolved" UNION ALL SELECT TIMESTAMP("2021-01-21 00:13:43"), "pending" UNION ALL SELECT TIMESTAMP("2021-01-23 23:38:46"), "resolved" UNION ALL SELECT TIMESTAMP("2021-01-24 00:38:17"), "message" UNION ALL SELECT TIMESTAMP("2021-01-24 00:42:31"), "unresolved" UNION ALL SELECT TIMESTAMP("2021-01-24 02:19:44"), "resolved" UNION ALL SELECT TIMESTAMP("2021-01-25 15:55:50"), "message" UNION ALL SELECT TIMESTAMP("2021-01-25 15:59:55"), "unresolved" ) SELECT *, MAX(IF(prev_event='resolved' OR prev_event IS NULL, timestamp, NULL)) OVER (ORDER BY timestamp ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS start_time, MIN(IF(event_name='resolved', timestamp, NULL)) OVER (ORDER BY timestamp ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS resolved_time, FROM ( SELECT *, LAG(event_name) OVER (ORDER BY timestamp) as prev_event, FROM sample_table )