I have timeseries data like this:
| time | id | value |
|---|---|---|
| 2018-04-25 22:00:00 UTC | A | 1 |
| 2018-04-25 23:00:00 UTC | A | 2 |
| 2018-04-25 23:00:00 UTC | A | 2.1 |
| 2018-04-25 23:00:00 UTC | B | 1 |
| 2018-04-26 23:00:00 UTC | B | 1.3 |
How do i write a query to produce an output table with these columns:
- date: the truncated time
- records: the number of records during this date
- records_conflicting_time_id: the number of records during this date where the combination of
time,idare not unique. In the example data above the two records with id==A at 2018-04-25 23:00:00 UTC would be counted for date 2018-04-25
So the output of our query should be:
| date | records | records_conflicting_time_id |
|---|---|---|
| 2018-04-25 | 4 | 2 |
| 2018-04-26 | 1 | 0 |
Getting records is easy, i just truncate the time to get date and then group by date. But i’m really struggling to produce a column that counts the number of records where id + time is not unique over that date…
Advertisement
Answer
Consider below approach
select date(time) date, sum(cnt) records, sum(if(cnt > 1, cnt, 0)) conflicting_records from ( select time, id, count(*) cnt from your_table group by time, id ) group by date
if applied to sample data in your question – output is
