Skip to content
Advertisement

Query that counts total records per day and total records with same time timestamp and id per day in Bigquery

I have timeseries data like this:

time id value
2018-04-25 22:00:00 UTC A 1
2018-04-25 23:00:00 UTC A 2
2018-04-25 23:00:00 UTC A 2.1
2018-04-25 23:00:00 UTC B 1
2018-04-26 23:00:00 UTC B 1.3

How do i write a query to produce an output table with these columns:

  • date: the truncated time
  • records: the number of records during this date
  • records_conflicting_time_id: the number of records during this date where the combination of time, id are not unique. In the example data above the two records with id==A at 2018-04-25 23:00:00 UTC would be counted for date 2018-04-25

So the output of our query should be:

date records records_conflicting_time_id
2018-04-25 4 2
2018-04-26 1 0

Getting records is easy, i just truncate the time to get date and then group by date. But i’m really struggling to produce a column that counts the number of records where id + time is not unique over that date…

Advertisement

Answer

Consider below approach

select date(time) date, 
  sum(cnt) records, 
  sum(if(cnt > 1, cnt, 0)) conflicting_records
from (
  select time, id, count(*) cnt
  from your_table
  group by time, id
)
group by date              

if applied to sample data in your question – output is

enter image description here

User contributions licensed under: CC BY-SA
4 People found this is helpful
Advertisement