I have a table in which sensor events are recorded. And a random log is selected for testing every hour at a random time. I want to find N events which are within T seconds of each hourly picking.
Data looks like this
eventID eventTs picked_for_testing 123 2021-04-01 13:21:45 0 3532 2020-12-16 08:31:12 1 987341 2021-04-01 13:21:41 1
Event 1 and 3 both happened within 4 seconds of each other. I want to find all such events that happened within say 10 seconds.
I was able to get it by doing a cross join of all the selected events with non-selected and then did a time diff between the eventTs and then filtered it. But it is taking too long because of the number of events in the data.
Is there better way to do it?
Advertisement
Answer
For that requirements BigQuery analytic functions seems to be the way to go. In particular navigation functions.
It is quite hard to give you a proper query without knowing the exact outcome you are looking for. I’ll assume you only want a table with the following schema:
eventID, next_eventID_within_10
So, this query will do the job:
WITH data as ( SELECT 123 as eventID, timestamp("2021-04-01 13:21:45") as eventTs, 0 as picked_for_testing UNION ALL SELECT 3532 as eventID, timestamp("2020-12-16 08:31:12") as eventTs, 1 as picked_for_testing UNION ALL SELECT 987341 as eventID, timestamp("2021-04-01 13:21:41") as eventTs, 1 as picked_for_testing UNION ALL SELECT 87345874 as eventID, timestamp("2021-04-01 13:21:49") as eventTs, 1 as picked_for_testing) , tmp as ( SELECT eventID, picked_for_testing, eventTs, IF(TIMESTAMP_DIFF(LEAD(eventTs) OVER (ORDER BY eventTs), eventTs, SECOND) <= 10, LEAD(eventID) OVER (ORDER BY eventTs), Null) as next_eventID_within_10 FROM data) SELECT eventID, next_eventID_within_10 FROM tmp WHERE next_eventID_within_10 IS NOT NULL