Output number of non-consecutive failures from historical data in Bigquery

Question

this is related to my previous scenario. I have a dataset like this: Aside from outputting the timestamp in which a user first committed a failure, and consecutively commits a failure status every day, leading up to today (2022-04-29), I also want to output the non-consecutive block of days in which Karl or A…

Accepted Answer

Take a look at below query although it&#8217;s not refined yet. Hoping help you find some clue to approach your problem.failure_blocks is for figuring out each consecutive failing days.last_blocks is for finding last failing block to identify started_recently_failing_timestampMain query generates the expected output from previous CTEs.WITH failure_table AS (  SELECT 'Andrea' AS name, 'Failure' AS status, TIMESTAMP '2022-04-28 4:00:00' AS dt   UNION ALL SELECT 'Karl', 'Failure', '2022-04-28 4:00:00'   UNION ALL SELECT 'Andrea', 'Failure', '2022-04-27 4:00:00'   UNION ALL SELECT 'Karl', 'Failure', '2022-04-27 4:00:00'   UNION ALL SELECT 'Andrea', 'Failure', '2022-04-26 4:00:00'   UNION ALL SELECT 'Andrea', 'Failure', '2022-04-25 4:00:00'   UNION ALL SELECT 'Andrea', 'Failure', '2022-03-30 4:00:00'   UNION ALL SELECT 'Andrea', 'Failure', '2022-03-29 4:00:00'   UNION ALL SELECT 'Andrea', 'Failure', '2022-03-28 4:00:00'   UNION ALL SELECT 'Karl', 'Failure', '2022-03-28 4:00:00'   UNION ALL SELECT 'Andrea', 'Failure', '2022-03-15 4:00:00'),failure_blocks AS (  SELECT *,         COUNTIF(diff <> 1) OVER (PARTITION BY name) AS total_failure_blocks,         COUNT(*) OVER (PARTITION BY name) AS total_days_failing,         SUM(diff - 1) OVER (PARTITION BY name ORDER BY dt) AS block,    FROM (      SELECT name, status, dt, IFNULL(DATE_DIFF(dt, LAG(dt) OVER (PARTITION BY name ORDER BY dt), DAY), 0) AS diff        FROM failure_table    )),last_blocks AS (SELECT * EXCEPT(diff, block),        COUNT(*) OVER (PARTITION BY name, block) AS recent_days_failing,       FIRST_VALUE(dt) OVER (PARTITION BY name, block ORDER BY dt) AS block_start_dt  FROM failure_blocks)SELECT name, status,        MAX(block_start_dt) OVER (PARTITION BY name) AS started_recently_failing_timestamp,       recent_days_failing,       total_days_failing,       total_failure_blocks,  FROM last_blocks  WHERE TRUE QUALIFY dt = started_recently_failing_timestamp;

name	status	started recently failing timestamp	recent days failing	total days failing	total failure blocks
Andrea	Failure	2022-04-25 4:00:00	4	8	3
Karl	Failure	2022-04-27 4:00:00	2	3	2

Advertisement

Answer