I am looking for better solutions to a fairly generic problem with temporal tables.
say we have
table_a (some_value int, date_from date, date_to date)
and a series of similar tables table_b
, table_c
, …
the actual system is an HR system that tracks persons, contracts, assignments, salaries, … all they are all dated in the way described above.
I need to join such tables (e.g. imagine getting all rows with the same some_value
) and return the period for which such join is valid (that is all the rows overlap over such period).
with two tables it is easy (ignore NULL
values for the time being)
select a.some_value, greatest(a.date_from, b.date_from) date_from, least(a.date_to, b.date_to) date_to from table_a a join table_b b on a.some_value = b.some_value where a.date_from < b.date_to and b.date_from < a.date_to
this become quadratically harder with more tables, because with three table (A, B, C) you need to check the overlap between A and B, B and C, C and A. with N tables this grows as N^2.
so I have written a pl/sql pipelined function (call it dated_join
) that given two intervals it returns one row with the overlapping period or nothing if it doesn’t overlap
so I can have for three tables
select a.some_value, period_b.date_from, period_b.date_to from table_a a join table_b b on a.some_value = b.some_value join table(dated_join(a.date_from, a.date_to, b.date_from, b.date_to)) period_a join table_c c on a.some_value = c.some_value join table(dated_join(period_a.date_from, period_a.date_to, c.date_from, c.date_to)) period_b
this scales linearly to N values, because each period is joined only with the previous one and it carries forward the overlapping period.
question: is it possible to make this strategy work with OUTER JOIN
s? I cannot find any half-decent solution.
is there anything in the SQL:2011 temporal extensions that would help with this?
thanks for your help
Advertisement
Answer
If you want to join multiple table so that they all overlap the same period then you can use GREATEST
and LEAST
:
SELECT t1.date_from AS t1_from, t2.date_from AS t2_from, t3.date_from AS t3_from, t4.date_from AS t4_from, t1.date_to AS t1_to, t2.date_to AS t2_to, t3.date_to AS t3_to, t4.date_to AS t4_to FROM table1 t1 INNER JOIN table2 t2 ON ( t1.date_to > t2.date_from AND t1.date_from < t2.date_to ) INNER JOIN table3 t3 ON ( LEAST( t1.date_to, t2.date_to ) > t3.date_from AND GREATEST( t1.date_from, t2.date_from ) < t3.date_to ) INNER JOIN table4 t4 ON ( LEAST( t1.date_to, t2.date_to, t3.date_to ) > t4.date_from AND GREATEST( t1.date_from, t2.date_from, t3.date_from ) < t4.date_to );
Which, for the sample data:
CREATE TABLE table1 ( date_from, date_to ) AS SELECT DATE '2020-01-01', DATE '2020-01-10' FROM DUAL UNION ALL SELECT DATE '2020-02-10', DATE '2020-02-15' FROM DUAL UNION ALL SELECT DATE '2020-03-15', DATE '2020-03-18' FROM DUAL; CREATE TABLE table2 ( date_from, date_to ) AS SELECT DATE '2020-01-05', DATE '2020-01-15' FROM DUAL UNION ALL SELECT DATE '2020-02-09', DATE '2020-02-16' FROM DUAL UNION ALL SELECT DATE '2020-03-16', DATE '2020-03-18' FROM DUAL; CREATE TABLE table3 ( date_from, date_to ) AS SELECT DATE '2020-01-01', DATE '2020-01-02' FROM DUAL UNION ALL SELECT DATE '2020-01-09', DATE '2020-01-16' FROM DUAL UNION ALL SELECT DATE '2020-02-08', DATE '2020-02-17' FROM DUAL UNION ALL SELECT DATE '2020-03-15', DATE '2020-03-17' FROM DUAL; CREATE TABLE table4 ( date_from, date_to ) AS SELECT DATE '2020-01-02', DATE '2020-01-12' FROM DUAL UNION ALL SELECT DATE '2020-02-08', DATE '2020-02-17' FROM DUAL UNION ALL SELECT DATE '2020-03-16', DATE '2020-03-19' FROM DUAL;
Outputs:
T1_FROM | T2_FROM | T3_FROM | T4_FROM | T1_TO | T2_TO | T3_TO | T4_TO :-------- | :-------- | :-------- | :-------- | :-------- | :-------- | :-------- | :-------- 15-MAR-20 | 16-MAR-20 | 15-MAR-20 | 16-MAR-20 | 18-MAR-20 | 18-MAR-20 | 17-MAR-20 | 19-MAR-20 10-FEB-20 | 09-FEB-20 | 08-FEB-20 | 08-FEB-20 | 15-FEB-20 | 16-FEB-20 | 17-FEB-20 | 17-FEB-20 01-JAN-20 | 05-JAN-20 | 09-JAN-20 | 02-JAN-20 | 10-JAN-20 | 15-JAN-20 | 16-JAN-20 | 12-JAN-20
and, if you want it so that there is an overlap with any part of the ranges, then swap the GREATEST
and LEAST
:
SELECT t1.date_from AS t1_from, t2.date_from AS t2_from, t3.date_from AS t3_from, t4.date_from AS t4_from, t1.date_to AS t1_to, t2.date_to AS t2_to, t3.date_to AS t3_to, t4.date_to AS t4_to FROM table1 t1 INNER JOIN table2 t2 ON ( t1.date_to > t2.date_from AND t1.date_from < t2.date_to ) INNER JOIN table3 t3 ON ( GREATEST( t1.date_to, t2.date_to ) > t3.date_from AND LEAST( t1.date_from, t2.date_from ) < t3.date_to ) INNER JOIN table4 t4 ON ( GREATEST( t1.date_to, t2.date_to, t3.date_to ) > t4.date_from AND LEAST( t1.date_from, t2.date_from, t3.date_from ) < t4.date_to );
Which outputs:
T1_FROM | T2_FROM | T3_FROM | T4_FROM | T1_TO | T2_TO | T3_TO | T4_TO :-------- | :-------- | :-------- | :-------- | :-------- | :-------- | :-------- | :-------- 15-MAR-20 | 16-MAR-20 | 15-MAR-20 | 16-MAR-20 | 18-MAR-20 | 18-MAR-20 | 17-MAR-20 | 19-MAR-20 10-FEB-20 | 09-FEB-20 | 08-FEB-20 | 08-FEB-20 | 15-FEB-20 | 16-FEB-20 | 17-FEB-20 | 17-FEB-20 01-JAN-20 | 05-JAN-20 | 01-JAN-20 | 02-JAN-20 | 10-JAN-20 | 15-JAN-20 | 02-JAN-20 | 12-JAN-20 01-JAN-20 | 05-JAN-20 | 09-JAN-20 | 02-JAN-20 | 10-JAN-20 | 15-JAN-20 | 16-JAN-20 | 12-JAN-20
db<>fiddle here