I have the following data which I want to group into seconds per day in BigQuery.
Source Table:
+--------------+---------------------+---------------------+ | ComputerName | StartDatetime | EndDatetime | +--------------+---------------------+---------------------+ | Computer1 | 2020-06-10T21:01:28 | 2020-06-10T21:20:19 | +--------------+---------------------+---------------------+ | Computer1 | 2020-06-10T22:54:01 | 2020-06-11T05:21:48 | +--------------+---------------------+---------------------+ | Computer2 | 2020-06-08T09:11:54 | 2020-06-10T11:36:27 | +--------------+---------------------+---------------------+
I want to be able to visualise the data in the following way
+------------+--------------+------------------+ | Date | ComputerName | Runtime(Seconds) | +------------+--------------+------------------+ | 2020-10-10 | Computer1 | 5089 | +------------+--------------+------------------+ | 2020-10-11 | Computer1 | 19308 | +------------+--------------+------------------+ | 2020-10-08 | Computer2 | 53285 | +------------+--------------+------------------+ | 2020-10-09 | Computer2 | 86400 | +------------+--------------+------------------+ | 2020-10-10 | Computer2 | 41787 | +------------+--------------+------------------+
I am not too sure of the way I should approach this. Some input would be greatly appreciated.
Advertisement
Answer
This is an interval overlap problem. You can solve this by splitting each time period into separate days and then looking at the overlap for each day:
x
with t as (
select 'Computer1' as computername, datetime '2020-06-10T21:01:28' as startdatetime, datetime '2020-06-10T21:20:19' as enddatetime union all
select 'Computer1' as computername, datetime '2020-06-10T22:54:01' as startdatetime, datetime '2020-06-11T05:21:48' as enddatetime union all
select 'Computer2' as computername, datetime '2020-06-08T09:11:54' as startdatetime, datetime '2020-06-10T11:36:27' as enddatetime
)
select dte, t.computername,
sum(case when enddatetime >= dte and
startdatetime < date_add(dte, interval 1 day)
then datetime_diff(least(date_add(dte, interval 1 day), enddatetime),
greatest(dte, startdatetime),
second)
end) as runtime_seconds
from (select t.*,
generate_date_array(date(t.startdatetime), date(t.enddatetime), interval 1 day) gda
from t
) t cross join
unnest(gda) dte
group by dte, t.computername;