I have timeseries data like this:
time | id | value |
---|---|---|
2018-04-25 22:00:00 UTC | A | 1 |
2018-04-25 23:00:00 UTC | A | 2 |
2018-04-25 23:00:00 UTC | A | 2.1 |
2018-04-25 23:00:00 UTC | B | 1 |
2018-04-26 23:00:00 UTC | B | 1.3 |
How do i write a query to produce an output table with these columns:
- date: the truncated time
- records: the number of records during this date
- records_conflicting_time_id: the number of records during this date where the combination of
time
,id
are not unique. In the example data above the two records with id==A at 2018-04-25 23:00:00 UTC would be counted for date 2018-04-25
So the output of our query should be:
date | records | records_conflicting_time_id |
---|---|---|
2018-04-25 | 4 | 2 |
2018-04-26 | 1 | 0 |
Getting records
is easy, i just truncate the time
to get date and then group by date
. But i’m really struggling to produce a column that counts the number of records where id
+ time
is not unique over that date…
Advertisement
Answer
Consider below approach
select date(time) date, sum(cnt) records, sum(if(cnt > 1, cnt, 0)) conflicting_records from ( select time, id, count(*) cnt from your_table group by time, id ) group by date
if applied to sample data in your question – output is