Skip to content
Advertisement

Count actions that occur only after a certain time from the previous action

I have a dataset in BigQuery where I want to count all non-duplicate actions. A duplicate action is one that occurs within a certain timeframe or ‘countdown’ from prior actions. Once the countdown reaches 0, the timer resets and the very next action is no longer considered to be duplicate.

In my example, the countdown timer starts at 10 seconds.

  • Sample code: http://sqlfiddle.com/#!17/2dd69/1
    • HitTime: Timestamp (seconds) for each row
    • DesiredOutput: What I am trying to create. Does not exist in my dataset
    • DesiredOutputExplation: An explanation to help explain the logic. Does not exist in my dataset

enter image description here

Advertisement

Answer

So I took the advice from shawnt00 and produced the following output in BigQuery.

DECLARE counter INT64 DEFAULT 1;
DECLARE timerStart INT64 DEFAULT 10;
DECLARE maxRows INT64;
DECLARE countdown INT64;


CREATE TEMP TABLE myData
(HitTime INT64);

INSERT INTO myData VALUES
(0),(21),(24),(30),(32);

CREATE TEMP TABLE myData2 AS
SELECT 
  *,
  ROW_NUMBER() OVER(ORDER BY HitTime) AS rowNum,
  HitTime - LAG(HitTime, 1) OVER(ORDER BY HitTime) AS timeBetween,
  0 AS Output
FROM myData;

SET maxRows = (SELECT COUNT(*) FROM myData);
SET countdown = timerStart;

WHILE counter<=maxRows DO

  IF 
    (SELECT timeBetween FROM  myData2 WHERE rowNum = counter) IS NULL OR
    (SELECT timeBetween FROM  myData2 WHERE rowNum = counter) >= countdown THEN 
    UPDATE myData2 SET Output = 1 WHERE rowNum = counter;
    SET countdown = timerStart;
  ELSE
    UPDATE myData2 SET Output = 0 WHERE rowNum = counter;
    SET countdown = countdown - (SELECT timeBetween FROM  myData2 WHERE rowNum = counter);
  END IF;
  
  SET counter=counter+1;

END WHILE;

SELECT HitTime, Output FROM myData2 
ORDER BY HitTime;

While this does produce my desired output, I am unsure if there is a more effective way to do this (I’m pretty new to T-SQL). Especially considering my production dataset has ~800M rows in it.

User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement