What SQL query can be used to limit continuous periods by parameter value, and then to calculate datediff inside them?

Question

I have a table of phone calls consisting of user_id, call_date, city, where city can be either A or B. It looks like this:

user_id	call_date	city
1	2021-01-01	A
1	2021-01-02	B
1	2021-01-03	B
1	2021-01-05	B
1	2021-01-10	A
1	2021-01-12	B
1	2021-01-16	A
2	2021-01-17	A
2	2021-01-20	B
2	2021-01-22	B
2	2021-01-23	A

Accepted Answer

This is a typical gaps and islands problem. You need to group consecutive rows first, then find the first call_date of the next group. Sample code for Postgres is below; it can be adapted to another DBMS by using the appropriate function to calculate the difference in days.

    with a (user_id, call_date, city) as (
      select *
      from ( values
        ('1', date '2021-01-01', 'A'),
        ('1', date '2021-01-02', 'B'),
        ('1', date '2021-01-03', 'B'),
        ('1', date '2021-01-05', 'B'),
        ('1', date '2021-01-10', 'A'),
        ('1', date '2021-01-12', 'B'),
        ('1', date '2021-01-16', 'A'),
        ('2', date '2021-01-17', 'A'),
        ('2', date '2021-01-20', 'B'),
        ('2', date '2021-01-22', 'B'),
        ('2', date '2021-01-23', 'A'),
        ('2', date '2021-01-24', 'B'),
        ('2', date '2021-01-26', 'B'),
        ('2', date '2021-01-30', 'A')
      ) as t
    ),
    grp as (
      /* Identify groups */
      select a.*,
        /* This is a grouping of consecutive rows:
           within a run of rows with the same city both
           rankings advance together, so their difference
           stays the same; the difference changes when
           the city changes.
        */
        dense_rank() over(
          partition by user_id
          order by call_date asc
        ) -
        dense_rank() over(
          partition by user_id, city
          order by call_date asc
        ) as grp,
        /* Get next call date */
        lead(call_date, 1, call_date)
          over(
            partition by user_id
            order by call_date asc
          ) as next_dt
      from a
    )
    select
      user_id,
      city,
      min(call_date) as dt_from,
      max(next_dt) as dt_to,
      max(next_dt) - min(call_date) as diff
    from grp
    where city = 'B'
    group by user_id, grp, city
    order by 1, 3

The query returns:

user_id | city | dt_from    | dt_to      | diff
:------ | :--- | :--------- | :--------- | ---:
1       | B    | 2021-01-02 | 2021-01-10 |    8
1       | B    | 2021-01-12 | 2021-01-16 |    4
2       | B    | 2021-01-20 | 2021-01-23 |    3
2       | B    | 2021-01-24 | 2021-01-30 |    6
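The answer notes that the query can be adapted to another DBMS by swapping the date-difference function. As a rough sketch of what such a port might look like, the snippet below runs the same gaps-and-islands logic on SQLite through Python's standard `sqlite3` module. Several things here are assumptions rather than part of the original answer: the table name `calls`, the CTE name `marked`, the column name `island`, the helper `b_periods`, storing dates as ISO text, and an SQLite build recent enough (3.25 or later) to support window functions. The only substantive change is that `julianday()` replaces Postgres date subtraction.

```python
import sqlite3

# Sample data from the answer; dates are kept as ISO-8601 text (an assumption
# made for SQLite, which has no dedicated date type).
ROWS = [
    (1, "2021-01-01", "A"), (1, "2021-01-02", "B"), (1, "2021-01-03", "B"),
    (1, "2021-01-05", "B"), (1, "2021-01-10", "A"), (1, "2021-01-12", "B"),
    (1, "2021-01-16", "A"), (2, "2021-01-17", "A"), (2, "2021-01-20", "B"),
    (2, "2021-01-22", "B"), (2, "2021-01-23", "A"), (2, "2021-01-24", "B"),
    (2, "2021-01-26", "B"), (2, "2021-01-30", "A"),
]

QUERY = """
with marked as (
  select
    user_id, call_date, city,
    -- same difference-of-rankings trick as in the Postgres query
    dense_rank() over (partition by user_id order by call_date)
      - dense_rank() over (partition by user_id, city order by call_date)
      as island,
    -- next call date for the same user (falls back to the row's own date)
    lead(call_date, 1, call_date)
      over (partition by user_id order by call_date) as next_dt
  from calls
)
select
  user_id,
  min(call_date) as dt_from,
  max(next_dt) as dt_to,
  -- julianday() stands in for Postgres date subtraction
  cast(julianday(max(next_dt)) - julianday(min(call_date)) as integer) as diff
from marked
where city = 'B'
group by user_id, island
order by user_id, dt_from
"""


def b_periods(rows):
    """Load rows into an in-memory SQLite table and return the B periods."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table calls (user_id integer, call_date text, city text)"
        )
        conn.executemany("insert into calls values (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in b_periods(ROWS):
        print(row)
```

On the sample data this should give the same four periods as the Postgres result (lengths 8 and 4 for user 1, 3 and 6 for user 2). Other engines would need their own date-difference function in place of `julianday()`.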

user_id	call_date	city
1	2021-01-01	A
1	2021-01-02	B
1	2021-01-03	B
1	2021-01-05	B
1	2021-01-10	A
1	2021-01-12	B
1	2021-01-16	A
2	2021-01-17	A
2	2021-01-20	B
2	2021-01-22	B
2	2021-01-23	A
2	2021-01-24	B
2	2021-01-26	B
2	2021-01-30	A

user_id	period_1	period_2
1	8	4
2	3	6
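
The table above lists one row per user with the period lengths side by side, whereas the accepted answer's query returns one row per period. As a minimal sketch, assuming the `(user_id, diff)` pairs have already been fetched in date order, the per-period rows could be reshaped on the client side as follows. The function name `pivot_periods` and the `period_N` header naming are hypothetical and only mirror the column names shown above.

```python
from collections import defaultdict


def pivot_periods(rows):
    """Turn (user_id, diff) pairs into one row per user: [user_id, p1, p2, ...].

    Rows are assumed to arrive ordered by period start date within each user.
    Users with fewer periods than the widest user are padded with None.
    """
    per_user = defaultdict(list)
    for user_id, diff in rows:
        per_user[user_id].append(diff)

    width = max((len(diffs) for diffs in per_user.values()), default=0)
    header = ["user_id"] + [f"period_{i}" for i in range(1, width + 1)]
    body = [
        [user_id] + diffs + [None] * (width - len(diffs))
        for user_id, diffs in sorted(per_user.items())
    ]
    return header, body


if __name__ == "__main__":
    # Period lengths taken from the accepted answer's result table.
    header, body = pivot_periods([(1, 8), (1, 4), (2, 3), (2, 6)])
    print(header)
    for line in body:
        print(line)
```

Doing the reshaping outside SQL sidesteps the need to know the maximum number of periods per user in advance, which a fixed-column SQL pivot would require.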

