I’m trying to get the latest date of starting event before it occurs again.
Data I have:
+-----+------------------+--------+--------+--+ | ID | DATE | REGION | STATUS | | +-----+------------------+--------+--------+--+ | 200 | 2020-04-10 6:00 | USA | START | | | 200 | 2020-04-10 7:00 | USA | ACTIVE | | | 200 | 2020-04-10 13:00 | USA | STOP | | | 200 | 2020-04-10 15:00 | USA | START | | | 200 | 2020-04-10 16:00 | USA | ACTIVE | | | 200 | 2020-04-10 19:00 | USA | ACTIVE | | | 200 | 2020-04-10 20:00 | USA | STOP | | +-----+------------------+--------+--------+--+
What I want to make:
+-----+------------------+--------+--------+------------------+ | ID | DATE | REGION | STATUS | LAST_START_DATE | +-----+------------------+--------+--------+------------------+ | 200 | 2020-04-10 6:00 | USA | START | 2020-04-10 6:00 | | 200 | 2020-04-10 7:00 | USA | ACTIVE | 2020-04-10 6:00 | | 200 | 2020-04-10 13:00 | USA | STOP | 2020-04-10 6:00 | | 200 | 2020-04-10 15:00 | USA | START | 2020-04-10 15:00 | | 200 | 2020-04-10 16:00 | USA | ACTIVE | 2020-04-10 15:00 | | 200 | 2020-04-10 20:00 | USA | STOP | 2020-04-10 15:00 | | 200 | 2020-04-10 20:00 | USA | STOP | 2020-04-10 15:00 | +-----+------------------+--------+--------+------------------+
I want to create another column with the latest start time and record it, then update it whenever it happens, and keep that value from there until start appears again.
What I tried:
SELECT id, date, region, status, last_start_date FROM (SELECT t1.*, MAX(CASE WHEN status = 'START' THEN date END) OVER (PARTITION BY id,region) as last_start_date FROM table1 t1 ) t1 GROUP BY id,region,status
This didn’t work since MAX only returns the latest one showing up, which isn’t what I was trying to do:
+-----+------------------+--------+--------+------------------+ | ID | DATE | REGION | STATUS | LAST_START_DATE | +-----+------------------+--------+--------+------------------+ | 200 | 2020-04-10 6:00 | USA | START | 2020-04-10 15:00 | | 200 | 2020-04-10 7:00 | USA | ACTIVE | 2020-04-10 15:00 | | 200 | 2020-04-10 13:00 | USA | STOP | 2020-04-10 15:00 | | 200 | 2020-04-10 15:00 | USA | START | 2020-04-10 15:00 | | 200 | 2020-04-10 16:00 | USA | ACTIVE | 2020-04-10 15:00 | | 200 | 2020-04-10 20:00 | USA | STOP | 2020-04-10 15:00 | +-----+------------------+--------+--------+------------------+
What would be the right approach here to achieve my goal?
Advertisement
Answer
You almost did it – you just need to add order by
clause:
DECLARE @DataSource TABLE ( [ID] INT ,[DATE] DATETIME2(0) ,[REGION] CHAR(3) ,[STATUS] VARCHAR(12) ); INSERT INTO @DataSource ([ID], [DATE], [REGION], [STATUS]) VALUES ('200', '2020-04-10 6:00', 'USA', 'START') ,('200', '2020-04-10 7:00', 'USA', 'ACTIVE') ,('200', '2020-04-10 13:00', 'USA', 'STOP') ,('200', '2020-04-10 15:00', 'USA', 'START') ,('200', '2020-04-10 16:00', 'USA', 'ACTIVE') ,('200', '2020-04-10 19:00', 'USA', 'ACTIVE') ,('200', '2020-04-10 20:00', 'USA', 'STOP'); SELECT * ,MAX(CASE WHEN status = 'START' THEN date END) OVER (PARTITION BY id,region ORDER BY [DATE]) FROM @DataSource;