Get distinct values within time chunks

I have a table of species occurrences (with timestamps) from a field survey:

species | timestamp
A       | 2021-04-01T12:35
A       | 2021-04-01T12:36
B       | 2021-04-01T12:36
A       | 2021-04-01T12:37
A       | 2021-04-01T12:39
C       | 2021-04-01T12:40
B       | 2021-04-01T12:42
C       | 2021-04-01T12:44
C       | 2021-04-01T12:44
B       | 2021-04-01T12:47

Now I want to count them, not in total but in 10-minute chunks, counting each species only once per chunk. That means between 12:31 and 12:40 there are species A, B and C, and between 12:41 and 12:50 there are species B and C.
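The chunk rule can be stated as arithmetic on the minute m: a chunk starts (m - 1) mod 10 minutes before the timestamp and ends 9 minutes after its own start. Here is a minimal Python sketch, assuming every timestamp uses the `YYYY-MM-DDTHH:MM` format shown (the function name `chunk_bounds` is hypothetical):

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%dT%H:%M"


def chunk_bounds(ts: str) -> tuple[str, str]:
    """Return (start, end) of the 10-minute chunk containing ts.

    Chunks run from minute x1 through minute x0 inclusive, so 12:40
    belongs to the chunk 12:31..12:40 and 12:41 starts a new one.
    """
    t = datetime.strptime(ts, FMT)
    # Shifting by one minute maps x1..x0 onto 0..9 steps back from the start.
    start = t - timedelta(minutes=(t.minute - 1) % 10)
    end = start + timedelta(minutes=9)
    return start.strftime(FMT), end.strftime(FMT)


print(chunk_bounds("2021-04-01T12:47"))  # ('2021-04-01T12:41', '2021-04-01T12:50')
```

Working on a full `datetime` rather than on the minute digits means a value such as 13:00 lands in the 12:51 to 13:00 chunk, with the hour rollover handled automatically.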

For a single 10-minute chunk I can do something like:

SELECT DISTINCT species
FROM tablename
WHERE timestamp BETWEEN '2021-04-01T12:31' AND '2021-04-01T12:40';
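A runnable version of a single-chunk query, driven from Python's built-in `sqlite3` module against an in-memory copy of the sample data. The table name `tablename` is an assumption (`table` itself is a reserved word in SQLite), and the chunk limits are hard-coded for illustration:

```python
import sqlite3

# Sample rows from the question: (species, timestamp).
SAMPLE = [
    ("A", "2021-04-01T12:35"), ("A", "2021-04-01T12:36"),
    ("B", "2021-04-01T12:36"), ("A", "2021-04-01T12:37"),
    ("A", "2021-04-01T12:39"), ("C", "2021-04-01T12:40"),
    ("B", "2021-04-01T12:42"), ("C", "2021-04-01T12:44"),
    ("C", "2021-04-01T12:44"), ("B", "2021-04-01T12:47"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (species TEXT, timestamp TEXT)")
conn.executemany("INSERT INTO tablename VALUES (?, ?)", SAMPLE)

# ISO-8601 strings sort chronologically, so BETWEEN can compare the text directly.
rows = conn.execute(
    "SELECT DISTINCT species FROM tablename "
    "WHERE timestamp BETWEEN ? AND ? ORDER BY species",
    ("2021-04-01T12:31", "2021-04-01T12:40"),
).fetchall()
single_chunk = [r[0] for r in rows]
print(single_chunk)  # ['A', 'B', 'C']
```

This only answers one chunk at a time; producing every chunk in one query needs the chunk boundaries to be computed per row.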

But what I need is something like:

chunk_start_time | chunk_end_time   | species 
2021-04-01T12:31 | 2021-04-01T12:40 | A       
2021-04-01T12:31 | 2021-04-01T12:40 | B       
2021-04-01T12:31 | 2021-04-01T12:40 | C       
2021-04-01T12:41 | 2021-04-01T12:50 | B       
2021-04-01T12:41 | 2021-04-01T12:50 | C

My timestamps have no seconds, which is why each chunk runs from minute x1 to minute x0. How can I do the math in SQLite, or would I be better off using Python pandas?
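On the pandas question: the grouping itself does not strictly need pandas. A minimal sketch using only the Python standard library, assuming the rows fit in memory as (species, timestamp) string pairs (the helper name `chunk_of` is hypothetical):

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%dT%H:%M"

# Sample rows from the question: (species, timestamp).
rows = [
    ("A", "2021-04-01T12:35"), ("A", "2021-04-01T12:36"),
    ("B", "2021-04-01T12:36"), ("A", "2021-04-01T12:37"),
    ("A", "2021-04-01T12:39"), ("C", "2021-04-01T12:40"),
    ("B", "2021-04-01T12:42"), ("C", "2021-04-01T12:44"),
    ("C", "2021-04-01T12:44"), ("B", "2021-04-01T12:47"),
]


def chunk_of(ts):
    """Map a timestamp to its (start, end) chunk, minute x1 through x0."""
    t = datetime.strptime(ts, FMT)
    start = t - timedelta(minutes=(t.minute - 1) % 10)
    return start.strftime(FMT), (start + timedelta(minutes=9)).strftime(FMT)


# A set keeps one entry per (chunk, species), dropping repeat sightings.
result = sorted({(*chunk_of(ts), sp) for sp, ts in rows})
for start, end, sp in result:
    print(start, end, sp, sep=" | ")
```

This prints the five rows of the desired table. Whether this or SQL is preferable mostly depends on where the data already lives.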

Answer

Use the date/time function strftime() to compute each row's chunk boundaries:

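SELECT DISTINCT
       -- chunk start: go back (last minute digit - 1) minutes, or 9 minutes if the minute ends in 0
       strftime(
         '%Y-%m-%dT%H:%M',
         timestamp,
         '-' || CASE WHEN timestamp LIKE '%0' THEN 9 ELSE SUBSTR(timestamp, -1) - 1 END || ' minute'
       ) chunk_start_time,
       -- chunk end: go forward (10 - last minute digit) minutes, or 0 minutes if the minute ends in 0
       strftime(
         '%Y-%m-%dT%H:%M',
         timestamp,
         '+' || CASE WHEN timestamp LIKE '%0' THEN 0 ELSE 10 - SUBSTR(timestamp, -1) END || ' minute'
       ) chunk_end_time,
       species
FROM tablename
ORDER BY chunk_start_time, species;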
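To check this query against the sample data from the question, here is a sketch that loads the rows into an in-memory SQLite database with Python's `sqlite3` module and runs the query logic unchanged (`tablename` is taken from the answer):

```python
import sqlite3

# Sample rows from the question: (species, timestamp).
SAMPLE = [
    ("A", "2021-04-01T12:35"), ("A", "2021-04-01T12:36"),
    ("B", "2021-04-01T12:36"), ("A", "2021-04-01T12:37"),
    ("A", "2021-04-01T12:39"), ("C", "2021-04-01T12:40"),
    ("B", "2021-04-01T12:42"), ("C", "2021-04-01T12:44"),
    ("C", "2021-04-01T12:44"), ("B", "2021-04-01T12:47"),
]

# The query from the answer, embedded as a Python string.
QUERY = """
SELECT DISTINCT
       strftime(
         '%Y-%m-%dT%H:%M',
         timestamp,
         '-' || CASE WHEN timestamp LIKE '%0' THEN 9 ELSE SUBSTR(timestamp, -1) - 1 END || ' minute'
       ) chunk_start_time,
       strftime(
         '%Y-%m-%dT%H:%M',
         timestamp,
         '+' || CASE WHEN timestamp LIKE '%0' THEN 0 ELSE 10 - SUBSTR(timestamp, -1) END || ' minute'
       ) chunk_end_time,
       species
FROM tablename
ORDER BY chunk_start_time, species;
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (species TEXT, timestamp TEXT)")
conn.executemany("INSERT INTO tablename VALUES (?, ?)", SAMPLE)
answer_rows = conn.execute(QUERY).fetchall()
for row in answer_rows:
    print(" | ".join(row))
```

Note that the CASE expressions read the last character of the string, so this relies on every timestamp ending in its minute digit (no seconds), as stated in the question.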

See the demo.
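An alternative, not the approach above: convert each timestamp to whole minutes since the Unix epoch with `strftime('%s', ...)`, floor to the chunk start with integer arithmetic, and convert back with the `'unixepoch'` modifier. Because 60 is a multiple of 10, epoch minutes modulo 10 equal the clock minute modulo 10, so the boundaries should match the answer's. Since it does not inspect the last character, it should also tolerate values that carry seconds. A sketch, again assuming a table named `tablename` and driven from Python's `sqlite3`:

```python
import sqlite3

# Sample rows from the question: (species, timestamp).
SAMPLE = [
    ("A", "2021-04-01T12:35"), ("A", "2021-04-01T12:36"),
    ("B", "2021-04-01T12:36"), ("A", "2021-04-01T12:37"),
    ("A", "2021-04-01T12:39"), ("C", "2021-04-01T12:40"),
    ("B", "2021-04-01T12:42"), ("C", "2021-04-01T12:44"),
    ("C", "2021-04-01T12:44"), ("B", "2021-04-01T12:47"),
]

QUERY = """
WITH m AS (
  -- whole minutes since 1970-01-01 00:00 UTC
  SELECT species, CAST(strftime('%s', timestamp) AS INTEGER) / 60 AS minute
  FROM tablename
),
c AS (
  -- step back to minute x1 of the chunk (x0 belongs to the previous x1)
  SELECT species, minute - (minute - 1) % 10 AS start_minute
  FROM m
)
SELECT DISTINCT
  strftime('%Y-%m-%dT%H:%M', start_minute * 60, 'unixepoch') AS chunk_start_time,
  strftime('%Y-%m-%dT%H:%M', (start_minute + 9) * 60, 'unixepoch') AS chunk_end_time,
  species
FROM c
ORDER BY chunk_start_time, species;
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (species TEXT, timestamp TEXT)")
conn.executemany("INSERT INTO tablename VALUES (?, ?)", SAMPLE)
epoch_rows = conn.execute(QUERY).fetchall()
for row in epoch_rows:
    print(" | ".join(row))
```

SQLite treats timestamps without a time zone as UTC, so the round trip through epoch seconds returns the same wall-clock values.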

Answer