Given a table that has sometimes repeated statuses within a group (in this case “vehicles”), I want to consolidate those statuses into a single row and aggregate status_seconds. The data looks like this (I’ll include some TSQL below to select dummy data into a temp table to make it easy to work with this example)
I want to, for example, consolidate the first three rows of this table for vehicle T101 into a single row with status_seconds = 1+1+2 (4 seconds). For the dummy data, these are the vehicles with consecutive status rows that need to be consolidated.
Note that in rows 5-7, the travelling status for T101 is broken up by a T102 status.
This seems like a problem for a recursive CTE to me, but I’m finding it difficult to solve.
So far, I’ve been able to identify the anchor nodes for the above. I.e. for each vehicle, I can identify the last occurrence of a status for a vehicle. Here’s the dummy data and a CTE that identifies Anchor nodes.
CREATE TABLE ##vehiclesAndStates ( id INT, vehicle_name VARCHAR(30), vehicle_status VARCHAR(30), status_end_time DATETIME, status_seconds INT ) INSERT INTO ##vehiclesAndStates VALUES (100, 'T101', 'STOPPED', '2020-12-04 09:43:18.000', 1) ,(801, 'T101', 'STOPPED', '2020-12-04 09:43:19.000', 1) ,(745, 'T101', 'STOPPED', '2020-12-04 09:43:20.000', 2) ,(925, 'T101', 'TURNING', '2020-12-04 09:43:22.000', 1) ,(626, 'T101', 'TRAVELLING', '2020-12-04 09:43:23.000', 10) ,(401, 'T102', 'STOPPED', '2020-12-04 09:43:23.000', 10) ,(201, 'T101', 'TRAVELLING', '2020-12-04 09:43:33.000', 1) ,(808, 'T102', 'STOPPPED', '2020-12-04 09:43:33.000', 3) ,(707, 'T102', 'STOPPPED', '2020-12-04 09:43:35.000', 7) ,(888, 'T101', 'TURNING', '2020-12-04 09:43:34.000', 1) ,(42, 'T101', 'STOPPED', '2020-12-04 09:43:35.000', 3) ,(2, 'T102', 'PARKED', '2020-12-04 09:43:35.000', 10) ,(911, 'T101', 'TRAVELLING', '2020-12-04 09:43:35.000', 1) SELECT * FROM ##vehiclesAndStates -- identify anchor nodes: rows where the previous status for a vehicle was different ;with cte_AnchorNodes as ( SELECT i.* FROM ( SELECT a.ID ,a.vehicle_name ,a.vehicle_status ,a.status_end_time ,a.status_seconds ,previous_vehicle_status = LAG(a.vehicle_status,1) OVER ( ORDER BY a.vehicle_name, a.status_end_time ) ,previous_ID = LAG(a.ID,1) OVER ( ORDER BY a.vehicle_name, a.status_end_time ) FROM ##vehiclesAndStates a ) i WHERE i.vehicle_status <> IsNull(i.previous_vehicle_status, 'Handle Nulls') )
RESULT
However, I’m struggling to make the recursive CTE work:
–Select * From cte_AnchorNodes a order by a.vehicle_name, a.status_end_time
,cteRecursiveStatuses (Id, VehicleName, VehicleStatus, StatusEndTime, recursionDepth) AS ( SELECT a.ID ,a.vehicle_name ,a.vehicle_status ,a.status_end_time ,0 recursionDepth FROM cte_AnchorNodes a UNION ALL SELECT ?? FROM ##vehiclesAndStates b JOIN cteRecursiveStatuses r ON r.Id = ?? AND b.vehicle_status = r.VehicleStatus ) Select * From cteRecursiveStatuses DROP TABLE ##vehiclesAndStates
Advertisement
Answer
This is a typical gaps-and-islands problem, where you want to group together “adjacent” rows that share the same vehicle and status (the islands).
You don’t need a recursive query for this: window functions can get this done. Here, the simplest approach probably is to use the difference between row numbers to identify the groups.
select vehicle_name, vehicle_status, min(status_end_time) as min_status_end_time, max(status_end_time) as max_status_end_time, sum(status_seconds) as sum_status_seconds from ( select vs.*, row_number() over(partition by vehicle_name order by status_end_time) rn1, row_number() over(partition by vehicle_name, vehicle_status order by status_end_time) rn2 from ##vehiclesAndStates vs ) t group by vehicle_name, vehicle_status, rn1 - rn2 order by vehicle_name, min(status_end_time)
You can run the subquery separately and look how the row numbers change to understand more.
For your sample data, the query returns:
vehicle_name | vehicle_status | min_status_end_time | max_status_end_time | sum_status_seconds :----------- | :------------- | :---------------------- | :---------------------- | -----------------: T101 | STOPPED | 2020-12-04 09:43:18.000 | 2020-12-04 09:43:20.000 | 4 T101 | TURNING | 2020-12-04 09:43:22.000 | 2020-12-04 09:43:22.000 | 1 T101 | TRAVELLING | 2020-12-04 09:43:23.000 | 2020-12-04 09:43:33.000 | 11 T101 | TURNING | 2020-12-04 09:43:34.000 | 2020-12-04 09:43:34.000 | 1 T101 | TRAVELLING | 2020-12-04 09:43:35.000 | 2020-12-04 09:43:35.000 | 1 T101 | STOPPED | 2020-12-04 09:43:35.000 | 2020-12-04 09:43:35.000 | 3 T102 | STOPPED | 2020-12-04 09:43:23.000 | 2020-12-04 09:43:23.000 | 10 T102 | STOPPPED | 2020-12-04 09:43:33.000 | 2020-12-04 09:43:35.000 | 10 T102 | PARKED | 2020-12-04 09:43:35.000 | 2020-12-04 09:43:35.000 | 10