I currently have the following UsersData table which gives aggregated historical data at a given particular date:
Date | UserID | Name | isActive |
---|---|---|---|
2021-10-01 | 1 | Sam | 1 |
2021-10-01 | 2 | Dan | 1 |
2021-10-08 | 1 | Sam | 0 |
2021-10-08 | 2 | Dan | 1 |
Requirement
My requirement is to create another aggregated data which will show active vs inactive record for a the above given dates – in a single row. So something like below:
Date | Active | Inactive | Total |
---|---|---|---|
2021-10-01 | 2 | 0 | 2 |
2021-10-08 | 1 | 1 | 2 |
My SQL queries so far
Now when I try the following individual queries it works fine:
select date, count(distinct userid) AS ActiveCount from User where isActive= 1 group by date select date, count(distinct userid) AS InactiveCount from User where isActive= 0 group by date
But since I need to display the statistics in a single row for each date, I tried the following query but there’s something which I seem to be doing wrong here:
select date, (select count(distinct userid) from User where isActive= 1 group by date) AS Active, (select count(distinct userid) from User where isActive= 0 group by date) AS Inactive, count(distinct userid) AS total from userdata group by date order by date
With this I get the output for inactive and active records as the sum of both results – Active = 3 (2 from first date + 1 from second date) and ‘Inactive’ = 2 (0 from first date +1 from second date) Whereas ‘TotalCount’ value is accurate.
Here’s the output which I get with the above query:
Date | Active | Inactive | Total |
---|---|---|---|
2021-10-01 | 3 | 1 | 2 |
2021-10-08 | 3 | 1 | 2 |
What am I doing wrong here? What would be the correct query? I’m executing these currently in Databricks Delta Lake SQL.
Advertisement
Answer
select date , count(distinct userid) filter (where isActive= 1) AS ActiveCount , count(distinct userid) filter (where isActive= 0) AS InactiveCount , count(distinct userid) Total from User group by date