Skip to content
Advertisement

SQL (BigQuery) Grouping Runtime Per Day

I have the following data which I want to group into seconds per day in BigQuery.
Source Table:

+--------------+---------------------+---------------------+
| ComputerName | StartDatetime       | EndDatetime         |
+--------------+---------------------+---------------------+
| Computer1    | 2020-06-10T21:01:28 | 2020-06-10T21:20:19 |
+--------------+---------------------+---------------------+
| Computer1    | 2020-06-10T22:54:01 | 2020-06-11T05:21:48 |
+--------------+---------------------+---------------------+
| Computer2    | 2020-06-08T09:11:54 | 2020-06-10T11:36:27 |
+--------------+---------------------+---------------------+

I want to be able to visualise the data in the following way

+------------+--------------+------------------+
| Date       | ComputerName | Runtime(Seconds) |
+------------+--------------+------------------+
| 2020-10-10 | Computer1    | 5089             |
+------------+--------------+------------------+
| 2020-10-11 | Computer1    | 19308            |
+------------+--------------+------------------+
| 2020-10-08 | Computer2    | 53285            |
+------------+--------------+------------------+
| 2020-10-09 | Computer2    | 86400            |
+------------+--------------+------------------+
| 2020-10-10 | Computer2    | 41787            |
+------------+--------------+------------------+

I am not too sure of the way I should approach this. Some input would be greatly appreciated.

Advertisement

Answer

This is an interval overlap problem. You can solve this by splitting each time period into separate days and then looking at the overlap for each day:

with t as (
      select 'Computer1' as computername, datetime '2020-06-10T21:01:28' as startdatetime, datetime '2020-06-10T21:20:19' as enddatetime union all
      select 'Computer1' as computername, datetime '2020-06-10T22:54:01' as startdatetime, datetime '2020-06-11T05:21:48' as enddatetime union all
      select 'Computer2' as computername, datetime '2020-06-08T09:11:54' as startdatetime, datetime '2020-06-10T11:36:27' as enddatetime 
     )
select dte, t.computername,
       sum(case when enddatetime >= dte and
                     startdatetime < date_add(dte, interval 1 day)
                then datetime_diff(least(date_add(dte, interval 1 day), enddatetime), 
                                   greatest(dte, startdatetime),
                                   second)
           end) as runtime_seconds
from (select t.*, 
             generate_date_array(date(t.startdatetime), date(t.enddatetime), interval 1 day) gda
      from t 
     ) t cross join
     unnest(gda) dte
group by dte, t.computername;
User contributions licensed under: CC BY-SA
3 People found this is helpful
Advertisement