Skip to content
Advertisement

Return active vs inactive records count against a given date in a single column

I currently have the following UsersData table which gives aggregated historical data at a given particular date:

Date UserID Name isActive
2021-10-01 1 Sam 1
2021-10-01 2 Dan 1
2021-10-08 1 Sam 0
2021-10-08 2 Dan 1

Requirement

My requirement is to create another aggregated data which will show active vs inactive record for a the above given dates – in a single row. So something like below:

Date Active Inactive Total
2021-10-01 2 0 2
2021-10-08 1 1 2

My SQL queries so far

Now when I try the following individual queries it works fine:

select date, count(distinct userid) AS ActiveCount from User where isActive= 1 group by date
select date, count(distinct userid) AS InactiveCount from User where isActive= 0 group by date

But since I need to display the statistics in a single row for each date, I tried the following query but there’s something which I seem to be doing wrong here:

select
date,
(select count(distinct userid) from User where isActive= 1 group by date) AS Active,
(select count(distinct userid) from User where isActive= 0 group by date) AS Inactive,
count(distinct userid) AS total 
from userdata
group by date
order by date

With this I get the output for inactive and active records as the sum of both results – Active = 3 (2 from first date + 1 from second date) and ‘Inactive’ = 2 (0 from first date +1 from second date) Whereas ‘TotalCount’ value is accurate.

Here’s the output which I get with the above query:

Date Active Inactive Total
2021-10-01 3 1 2
2021-10-08 3 1 2

What am I doing wrong here? What would be the correct query? I’m executing these currently in Databricks Delta Lake SQL.

Answer

select date
    , count(distinct userid) filter (where isActive= 1) AS ActiveCount 
    , count(distinct userid) filter (where isActive= 0) AS InactiveCount 
    , count(distinct userid) Total
from User 
group by date