Can you please help with the following grouping: Name can have states that can be repeated several times and it is essential in output to have all the state changes over time and to have the starting date of the state
Here is the input data:
| name | state | date ------------------------------ | A | X | 01.03.2021 | A | X | 02.03.2021 | A | X | 03.03.2021 | A | Y | 04.03.2021 | A | Y | 05.03.2021 | A | X | 06.03.2021 | A | X | 07.03.2021 | B | S | 01.03.2021 | B | S | 02.03.2021 | B | T | 03.03.2021 | B | T | 04.03.2021 | B | T | 05.03.2021 | B | T | 06.03.2021 | B | S | 07.03.2021
Desired output
| name | state | date ------------------------------ | A | X | 01.03.2021 | A | Y | 04.03.2021 | A | X | 06.03.2021 | B | S | 01.03.2021 | B | T | 03.03.2021 | B | S | 07.03.2021
Advertisement
Answer
Assuming you want to find the start date per consecutive sequence of state per name, this is one solution:
WITH cte1 AS ( SELECT name , ROW_NUMBER() OVER (PARTITION BY name ORDER BY date) - ROW_NUMBER() OVER (PARTITION BY name, state ORDER BY date) AS n , state , date FROM logs ) SELECT name , MIN(state) AS state , MIN(date) AS min_date FROM cte1 GROUP BY name, n ORDER BY name, min_date ;
Result:
+------+-------+------------+ | name | state | min_date | +------+-------+------------+ | A | X | 2021-03-01 | | A | Y | 2021-03-04 | | A | X | 2021-03-06 | | B | S | 2021-03-01 | | B | T | 2021-03-03 | | B | S | 2021-03-07 | +------+-------+------------+
The basic form works with most databases, PG, MySQL 8.0, MariaDB 10.2.2+, SQL Server, Oracle (with some adjustments in “date” quoting, type (VARCHAR2
) and INSERT
form).