I have a table in which sensor events are recorded. And a random log is selected for testing every hour at a random time. I want to find N events which are within T seconds of each hourly picking.
Data looks like this
eventID eventTs picked_for_testing
123 2021-04-01 13:21:45 0
3532 2020-12-16 08:31:12 1
987341 2021-04-01 13:21:41 1
Event 1 and 3 both happened within 4 seconds of each other. I want to find all such events that happened within say 10 seconds.
I was able to get it by doing a cross join of all the selected events with non-selected and then did a time diff between the eventTs and then filtered it. But it is taking too long because of the number of events in the data.
Is there better way to do it?
Advertisement
Answer
For that requirements BigQuery analytic functions seems to be the way to go. In particular navigation functions.
It is quite hard to give you a proper query without knowing the exact outcome you are looking for. I’ll assume you only want a table with the following schema:
eventID, next_eventID_within_10
So, this query will do the job:
WITH data as (
SELECT 123 as eventID, timestamp("2021-04-01 13:21:45") as eventTs, 0 as picked_for_testing
UNION ALL
SELECT 3532 as eventID, timestamp("2020-12-16 08:31:12") as eventTs, 1 as picked_for_testing
UNION ALL
SELECT 987341 as eventID, timestamp("2021-04-01 13:21:41") as eventTs, 1 as picked_for_testing
UNION ALL
SELECT 87345874 as eventID, timestamp("2021-04-01 13:21:49") as eventTs, 1 as picked_for_testing)
, tmp as (
SELECT
eventID,
picked_for_testing,
eventTs,
IF(TIMESTAMP_DIFF(LEAD(eventTs) OVER (ORDER BY eventTs), eventTs, SECOND) <= 10, LEAD(eventID) OVER (ORDER BY eventTs), Null) as next_eventID_within_10
FROM data)
SELECT eventID, next_eventID_within_10
FROM tmp
WHERE next_eventID_within_10 IS NOT NULL