Skip to content
Advertisement

Daily active counts of THING from rows detailing change of THING

Say I have a table of people who have used my service on a day N and a table describing what theme the users have changed to. There’s no table available telling me what theme they are currently using. What I would like to do is to see that on a daily basis.

So let’s say the changes table looks like this.

| user_ID |   date   | theme |
|---------|----------|-------|
|  user1  | 1.1.2021 | Dark  |
|  user1  | 4.1.2021 | Light |
|  user2  | 2.1.2021 | Dark  |
|  user2  | 6.1.2021 | Light |

The activity table has just the user_ID and a date they accessed the service reported.

| user_ID |   date   |
|---------|----------|
|  user1  | 1.1.2021 |
|  user1  | 2.1.2021 |
|  user1  | 3.1.2021 |
|  user1  | 4.1.2021 |
|  user1  | 5.1.2021 |
|  user1  | 6.1.2021 |
|  user2  | 2.1.2021 |
|  user2  | 3.1.2021 |
|  user2  | 4.1.2021 |
|  user2  | 5.1.2021 |
|  user2  | 6.1.2021 |

Now what I would like to do is join the first table to the second one so the theme they are using on an active date would be listed there.

| user_ID |   date   | theme |
|---------|----------|-------|
|  user1  | 1.1.2021 | Dark  |
|  user1  | 2.1.2021 | Dark  |
|  user1  | 3.1.2021 | Dark  |
|  user1  | 4.1.2021 | Light |
|  user1  | 5.1.2021 | Light |
|  user1  | 6.1.2021 | Light |
|  user2  | 2.1.2021 | Dark  |
|  user2  | 3.1.2021 | Dark  |
|  user2  | 4.1.2021 | Dark  |
|  user2  | 5.1.2021 | Dark  |
|  user2  | 6.1.2021 | Light |

How do I achieve this? Assume there can be an unlimited amount of themes.

Advertisement

Answer

One method is a correlated subquery, but I’m not sure if Presto supports this:

select a.*,
       (select c.theme
        from changes c
        where c.user_id = a.user_id and
              c.date <= a.date
        order by c.date desc
        limit 1
       ) as theme
from activity a;

A perhaps more efficient method is to use left join but to calculate the “end” date for each change:

select a.*, c.theme
from activity a left join
     (select c.*,
             lead(date) over (partition by user_id order by date) as next_date
      from changes c
     ) c
     on a.user_id = c.user_id and
        a.date >= c.date and
        (a.date < c.next_date or c.next_date is null);
User contributions licensed under: CC BY-SA
1 People found this is helpful
Advertisement