Skip to content
Advertisement

What SQL query can be used to limit continious periods by parameter value, and then to calculate datediff inside them?

I have a table of phone calls consisting of user_id, call_date, city, where city can be either A or B. It looks like this:

user_id call_date city
1 2021-01-01 A
1 2021-01-02 B
1 2021-01-03 B
1 2021-01-05 B
1 2021-01-10 A
1 2021-01-12 B
1 2021-01-16 A
2 2021-01-17 A
2 2021-01-20 B
2 2021-01-22 B
2 2021-01-23 A
2 2021-01-24 B
2 2021-01-26 B
2 2021-01-30 A

For this table, we need to select for each user all the periods when he was in city B. These periods are counted in days and start when the first call is made from city B, and end as soon as the next call is made from city A. So for user_id = 1 fist period starts on 2021-01-02 and ands on 2021-01-10. There can be several such periods for each user.

The result should be the following table:

user_id period_1 period_2
1 8 4
2 3 6

Can you please tell me how I can limit the periods according to the condition of the problem, and then calculate the datediff within each period? Thank you

Answer

This is a typical gaps and islands problem. You need to group consecutive rows first, then find the first call_date of the next group. Sample code for Postgres is below, the same may be adapted to another DBMS by applying appropriate function to calculate the difference in days.

with a (user_id, call_date, city)
as (
  select *
  from ( values
    ('1', date '2021-01-01', 'A'),
    ('1', date '2021-01-02', 'B'),
    ('1', date '2021-01-03', 'B'),
    ('1', date '2021-01-05', 'B'),
    ('1', date '2021-01-10', 'A'),
    ('1', date '2021-01-12', 'B'),
    ('1', date '2021-01-16', 'A'),
    ('2', date '2021-01-17', 'A'),
    ('2', date '2021-01-20', 'B'),
    ('2', date '2021-01-22', 'B'),
    ('2', date '2021-01-23', 'A'),
    ('2', date '2021-01-24', 'B'),
    ('2', date '2021-01-26', 'B'),
    ('2', date '2021-01-30', 'A')
  ) as t
)
, grp as (
  /*Identify groups*/
  select a.*,
    /*This is a grouping of consecutive rows:
      they will have the same difference between
      two row_numbers while the more detailed
      row_number changes, which means the attribute had changed.

    */
    dense_rank() over(
      partition by user_id
      order by call_date asc
    ) - 
    dense_rank() over(
      partition by user_id, city
      order by call_date asc
    ) as grp,
    /*Get next call date*/
    lead(call_date, 1, call_date)
      over(
        partition by user_id
        order by call_date asc
      ) as next_dt
  from a
)
select
  user_id,
  city,
  min(call_date) as dt_from,
  max(next_dt) as dt_to,
  max(next_dt) - min(call_date) as diff
from grp
where city = 'B'
group by user_id, grp, city
order by 1, 3
user_id | city | dt_from    | dt_to      | diff
:------ | :--- | :--------- | :--------- | ---:
1       | B    | 2021-01-02 | 2021-01-10 |    8
1       | B    | 2021-01-12 | 2021-01-16 |    4
2       | B    | 2021-01-20 | 2021-01-23 |    3
2       | B    | 2021-01-24 | 2021-01-30 |    6

db<>fiddle here