Skip to content
Advertisement

Bigquery split rows by timestamp of event

In Google BigQuery, I have a list of events in a single support session which are tagged by event names. Each support issue resolved_time is the timestamp of the resolved event. Each issue start_time is the first occurrence of either of the events message, pending, or unresolved either at the absolute beginning of the session or following a resolved tag.

In other words, the state of the conversation is either open or resolved. The state becomes open with the first occurrence of a message, pending, or unresolved event, and is closed with a resolved event.

Currently I have the table with timestamp and event_name only. Would like to join in start_time and resolved_time.

In the example below, this results in 4 separate support issues. Three being resolved with resolved events and the last being unresolved since it is not closed with a resolved event.

timestamp event_name start_time resolved_time
2021-01-15 20:27:59 UTC unresolved 2021-01-15 20:27:59 UTC 2021-01-20 19:31:37 UTC
2021-01-16 03:02:46 UTC message 2021-01-15 20:27:59 UTC 2021-01-20 19:31:37 UTC
2021-01-20 19:31:37 UTC resolved 2021-01-15 20:27:59 UTC 2021-01-20 19:31:37 UTC
2021-01-21 00:13:43 UTC pending 2021-01-21 00:13:43 UTC 2021-01-23 23:38:46 UTC
2021-01-23 23:38:46 UTC resolved 2021-01-21 00:13:43 UTC 2021-01-23 23:38:46 UTC
2021-01-24 00:38:17 UTC message 2021-01-24 00:38:17 UTC 2021-01-24 02:19:44 UTC
2021-01-24 00:42:31 UTC unresolved 2021-01-24 00:38:17 UTC 2021-01-24 02:19:44 UTC
2021-01-24 02:19:44 UTC resolved 2021-01-24 00:38:17 UTC 2021-01-24 02:19:44 UTC
2021-01-25 15:55:50 UTC message 2021-01-25 15:55:50 UTC NULL
2021-01-25 15:59:55 UTC unresolved 2021-01-25 15:55:50 UTC NULL
WITH sample_table AS (
    SELECT  TIMESTAMP("2021-01-15 20:27:59") `timestamp` , "unresolved" event_name UNION ALL
    SELECT  TIMESTAMP("2021-01-16 03:02:46") , "message"  UNION ALL
    SELECT  TIMESTAMP("2021-01-20 19:31:37") , "resolved"  UNION ALL
    SELECT  TIMESTAMP("2021-01-21 00:13:43") , "pending"  UNION ALL
    SELECT  TIMESTAMP("2021-01-23 23:38:46") , "resolved" UNION ALL
    SELECT  TIMESTAMP("2021-01-24 00:38:17") , "message" UNION ALL
    SELECT  TIMESTAMP("2021-01-24 00:42:31") , "unresolved" UNION ALL
    SELECT  TIMESTAMP("2021-01-24 02:19:44") , "resolved" UNION ALL
    SELECT  TIMESTAMP("2021-01-25 15:55:50") , "message" UNION ALL
    SELECT  TIMESTAMP("2021-01-25 15:59:55") , "unresolved" )

SELECT * FROM sample_table
ORDER BY timestamp ASC

Thank you for any help in advance.

Advertisement

Answer

Moving MAX and MIN could be a good fit here:

WITH sample_table AS (
    SELECT  TIMESTAMP("2021-01-15 20:27:59") timestamp, "unresolved" event_name UNION ALL
    SELECT  TIMESTAMP("2021-01-16 03:02:46"), "message" UNION ALL
    SELECT  TIMESTAMP("2021-01-20 19:31:37"), "resolved" UNION ALL
    SELECT  TIMESTAMP("2021-01-21 00:13:43"), "pending" UNION ALL
    SELECT  TIMESTAMP("2021-01-23 23:38:46"), "resolved" UNION ALL
    SELECT  TIMESTAMP("2021-01-24 00:38:17"), "message" UNION ALL
    SELECT  TIMESTAMP("2021-01-24 00:42:31"), "unresolved" UNION ALL
    SELECT  TIMESTAMP("2021-01-24 02:19:44"), "resolved" UNION ALL
    SELECT  TIMESTAMP("2021-01-25 15:55:50"), "message" UNION ALL
    SELECT  TIMESTAMP("2021-01-25 15:59:55"), "unresolved"
)
SELECT 
  *, 
  MAX(IF(prev_event='resolved' OR prev_event IS NULL, timestamp, NULL)) OVER (ORDER BY timestamp ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS start_time,
  MIN(IF(event_name='resolved', timestamp, NULL)) OVER (ORDER BY timestamp ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS resolved_time,
FROM (
  SELECT 
    *,
    LAG(event_name) OVER (ORDER BY timestamp) as prev_event,
  FROM sample_table
)

enter image description here

User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement