Date distinct count over week

Tags: , , , ,



Im trying to get a distinct count of user ids logs per day with every week as a partition for the distinct identification. e.g. if one user logs on Friday/Saturday of week 1, and on Monday/Friday of week 2, i want to have a column that counts 1 in Friday and 0 in Saturday week 1, and 1 in Monday and 0 in Friday week 2.

If i have the following data:

https://i.stack.imgur.com/COf69.png

I expect this result:

https://i.stack.imgur.com/kTRdu.png

Im working with Presto, and have tried with dense_rank() solutions but with no avail.

Answer

You can use window functions and aggregation:

select date, iso_week, sum(case when rn = 1 then 1 else 0 end) cnt
from (
    select t.*, 
        row_number() over(partition by userid, iso_week order by date) min_date
    from mytable t
) t
group by date, iso_week

Better yet, using the standard where clause to aggregate functions, which Presto supports:

select date, iso_week, count(*) filter(where rn = 1) cnt
from (
    select t.*, 
        row_number() over(partition by userid, iso_week order by date) min_date
    from mytable t
) t
group by date, iso_week


Source: stackoverflow