Skip to content
Advertisement

Overlap the intervals using mysql

    +------+------------+------------+
    | id   | start_date | end_date   |
    +------+------------+------------+
    |    1 | 2019-01-01 | 2019-01-12 |
    |    1 | 2019-01-10 | 2019-01-27 |
    |    1 | 2019-01-13 | 2019-01-15 |
    |    1 | 2019-01-18 | 2019-01-25 |
    |    1 | 2019-02-10 | 2019-02-15 |
    |    2 | 2019-01-10 | 2019-01-15 |
    +------+------------+------------+

how to merge the overlapping intervals and get the below result in mysql(8.X)?

    +------+------------+------------+
    | id   | start_date | end_date   |
    +------+------------+------------+
    |    1 | 2019-01-01 | 2019-01-27 |
    |    1 | 2019-02-10 | 2019-02-15 |
    |    2 | 2019-01-10 | 2019-01-15 |
    +------+------------+------------+

Below are the commands that can be used in mysql to create the table =>

    insert into interval_dates(id, start_date, end_date) values(1, '2019-01-01', '2019-01-12');
    insert into interval_dates(id, start_date, end_date) values(1, '2019-01-10', '2019-01-27');
    insert into interval_dates(id, start_date, end_date) values(1, '2019-01-13', '2019-01-15');
    insert into interval_dates(id, start_date, end_date) values(1, '2019-01-18', '2019-01-25');
    insert into interval_dates(id, start_date, end_date) values(1, '2019-02-10', '2019-02-15');
    insert into interval_dates(id, start_date, end_date) values(2, '2019-01-10', '2019-01-15');

can you please share any elegant solution without inserting into intermediate table in mysql(8.X)?

Advertisement

Answer

Please check:

SELECT id, start_date, MAX(end_date) end_date
FROM ( SELECT id,
              @p_start := CASE WHEN (start_date > @p_end) OR (@p_id < id)
                               THEN start_date
                               ELSE @p_start
                               END start_date,
              @p_end := CASE WHEN (end_date > @p_end) OR (@p_id < id)
                             THEN end_date
                             ELSE @p_end
                             END end_date,
              @p_id := id
       FROM mytable, ( SELECT @p_id := MIN(id)-1, 
                              @p_start := MIN(start_date) - INTERVAL 1 DAY, 
                              @p_end := MIN(start_date) - INTERVAL 1 DAY
                       FROM mytable ) variables
       ORDER BY id, start_date, end_date ) subquery
GROUP BY id, start_date;

fiddle (applicable even in 5.6).

I have not yet been able to find the source data that produces the wrong result.

User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement