I’m trying to get the latest date of starting event before it occurs again.
Data I have:
x
+-----+------------------+--------+--------+--+
| ID | DATE | REGION | STATUS | |
+-----+------------------+--------+--------+--+
| 200 | 2020-04-10 6:00 | USA | START | |
| 200 | 2020-04-10 7:00 | USA | ACTIVE | |
| 200 | 2020-04-10 13:00 | USA | STOP | |
| 200 | 2020-04-10 15:00 | USA | START | |
| 200 | 2020-04-10 16:00 | USA | ACTIVE | |
| 200 | 2020-04-10 19:00 | USA | ACTIVE | |
| 200 | 2020-04-10 20:00 | USA | STOP | |
+-----+------------------+--------+--------+--+
What I want to make:
+-----+------------------+--------+--------+------------------+
| ID | DATE | REGION | STATUS | LAST_START_DATE |
+-----+------------------+--------+--------+------------------+
| 200 | 2020-04-10 6:00 | USA | START | 2020-04-10 6:00 |
| 200 | 2020-04-10 7:00 | USA | ACTIVE | 2020-04-10 6:00 |
| 200 | 2020-04-10 13:00 | USA | STOP | 2020-04-10 6:00 |
| 200 | 2020-04-10 15:00 | USA | START | 2020-04-10 15:00 |
| 200 | 2020-04-10 16:00 | USA | ACTIVE | 2020-04-10 15:00 |
| 200 | 2020-04-10 20:00 | USA | STOP | 2020-04-10 15:00 |
| 200 | 2020-04-10 20:00 | USA | STOP | 2020-04-10 15:00 |
+-----+------------------+--------+--------+------------------+
I want to create another column with the latest start time and record it, then update it whenever it happens, and keep that value from there until start appears again.
What I tried:
SELECT id, date, region, status, last_start_date
FROM (SELECT t1.*,
MAX(CASE WHEN status = 'START' THEN date END) OVER (PARTITION BY id,region) as last_start_date
FROM table1 t1
) t1
GROUP BY id,region,status
This didn’t work since MAX only returns the latest one showing up, which isn’t what I was trying to do:
+-----+------------------+--------+--------+------------------+
| ID | DATE | REGION | STATUS | LAST_START_DATE |
+-----+------------------+--------+--------+------------------+
| 200 | 2020-04-10 6:00 | USA | START | 2020-04-10 15:00 |
| 200 | 2020-04-10 7:00 | USA | ACTIVE | 2020-04-10 15:00 |
| 200 | 2020-04-10 13:00 | USA | STOP | 2020-04-10 15:00 |
| 200 | 2020-04-10 15:00 | USA | START | 2020-04-10 15:00 |
| 200 | 2020-04-10 16:00 | USA | ACTIVE | 2020-04-10 15:00 |
| 200 | 2020-04-10 20:00 | USA | STOP | 2020-04-10 15:00 |
+-----+------------------+--------+--------+------------------+
What would be the right approach here to achieve my goal?
Advertisement
Answer
You almost did it – you just need to add order by
clause:
DECLARE @DataSource TABLE
(
[ID] INT
,[DATE] DATETIME2(0)
,[REGION] CHAR(3)
,[STATUS] VARCHAR(12)
);
INSERT INTO @DataSource ([ID], [DATE], [REGION], [STATUS])
VALUES ('200', '2020-04-10 6:00', 'USA', 'START')
,('200', '2020-04-10 7:00', 'USA', 'ACTIVE')
,('200', '2020-04-10 13:00', 'USA', 'STOP')
,('200', '2020-04-10 15:00', 'USA', 'START')
,('200', '2020-04-10 16:00', 'USA', 'ACTIVE')
,('200', '2020-04-10 19:00', 'USA', 'ACTIVE')
,('200', '2020-04-10 20:00', 'USA', 'STOP');
SELECT *
,MAX(CASE WHEN status = 'START' THEN date END) OVER (PARTITION BY id,region ORDER BY [DATE])
FROM @DataSource;