SQL or LINQ: How to categorize rows into batches when the category changes

I have the following data:

Timestamp           Min     MAX     Category
2019-03-22 08:10    12.00   5.10    AAAA
2019-03-22 08:11    10.00   5.20    AAAA
2019-03-22 08:12    11.00   4.90    AAAA
2019-03-22 08:13    12.00   5.30    BBBB
2019-03-22 08:14    14.00   5.20    BBBB
2019-03-22 08:15    9.40    5.20    CCCC
2019-03-22 08:16    7.40    5.10    CCCC
2019-03-22 08:17    14.00   5.10    AAAA
2019-03-22 08:17    14.00   5.10    AAAA

I am looking for a SQL (or LINQ) query that groups this data into batches each time the Category changes, so that I can see the start and end of each category run over time.

Answer

If I followed you correctly, you are dealing with a gaps-and-islands problem.

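In pure SQL, this is usually solved using window functions and aggregation. Take two row numbers, one computed over the whole table and one computed per category: their difference stays constant while the category is unchanged and shifts once another category intervenes, so it identifies the group each record belongs to: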

SELECT
    category,
    MIN(timestamp) AS start_timestamp,
    MAX(timestamp) AS end_timestamp
FROM (
    SELECT
        t.*,
        -- position of the row in the whole table
        ROW_NUMBER() OVER(ORDER BY timestamp) AS rn1,
        -- position of the row within its own category
        ROW_NUMBER() OVER(PARTITION BY category ORDER BY timestamp) AS rn2
    FROM mytable t
) x
-- rn1 - rn2 is constant within each consecutive run of a category
GROUP BY category, rn1 - rn2
ORDER BY start_timestamp

In this DB Fiddle with your sample data, this returns:

| category | start_timestamp     | end_timestamp       |
| -------- | ------------------- | ------------------- |
| AAAA     | 2019-03-22 08:10:00 | 2019-03-22 08:12:00 |
| BBBB     | 2019-03-22 08:13:00 | 2019-03-22 08:14:00 |
| CCCC     | 2019-03-22 08:15:00 | 2019-03-22 08:16:00 |
| AAAA     | 2019-03-22 08:17:00 | 2019-03-22 08:17:00 |

NB: this is a MySQL 8.0 DB Fiddle; however, the query is standard ANSI SQL that should work on any platform that supports window functions.
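
To see why the difference of the two row numbers identifies each group, here is a small Python sketch that replays the same computation by hand on the sample data from the question. It is only an illustration, not part of the SQL answer: the function name `find_islands` and the `rows` list are made up for this example, and the timestamps are kept as the minute-resolution strings shown in the question (they sort chronologically as plain text).

```python
from collections import defaultdict

# Sample rows from the question as (timestamp, category) pairs.
rows = [
    ("2019-03-22 08:10", "AAAA"),
    ("2019-03-22 08:11", "AAAA"),
    ("2019-03-22 08:12", "AAAA"),
    ("2019-03-22 08:13", "BBBB"),
    ("2019-03-22 08:14", "BBBB"),
    ("2019-03-22 08:15", "CCCC"),
    ("2019-03-22 08:16", "CCCC"),
    ("2019-03-22 08:17", "AAAA"),
    ("2019-03-22 08:17", "AAAA"),
]


def find_islands(rows):
    """Return (category, start, end) for each consecutive run of a category.

    Hypothetical helper mirroring the SQL: rn1 numbers all rows by timestamp,
    rn2 numbers rows within their category, and rows sharing the same
    (category, rn1 - rn2) pair belong to the same run.
    """
    ordered = sorted(rows, key=lambda r: r[0])  # like ORDER BY timestamp
    seen_per_category = defaultdict(int)
    islands = {}  # (category, rn1 - rn2) -> (start, end)

    for rn1, (ts, category) in enumerate(ordered, start=1):
        seen_per_category[category] += 1
        rn2 = seen_per_category[category]
        key = (category, rn1 - rn2)
        if key in islands:
            start, _ = islands[key]
            islands[key] = (start, ts)  # rows arrive in order, so ts is the new end
        else:
            islands[key] = (ts, ts)

    # Like ORDER BY start_timestamp in the SQL version.
    return sorted(
        ((category, start, end) for (category, _), (start, end) in islands.items()),
        key=lambda island: island[1],
    )


for category, start, end in find_islands(rows):
    print(category, start, end)
```

On the sample data, the differences come out as 0 for the first AAAA run, 3 for BBBB, 5 for CCCC and 4 for the second AAAA run, which is why the two AAAA runs end up in separate groups even though they share a category.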

User contributions licensed under: CC BY-SA