I want to join two tables by ‘EmployeeId’ and ‘CalendarMonthId’, but problem is that I want to connect record with the bigest CalednarMonthId if not exist equal. I use Azure Synapse SQL pool. Example data and code
create table emp_contr ( EmployeeId int, CalendarMonthId int, IsDeleted int --login varchar(20) ) create table contr ( ContractId int, EmployeeId int, CalendarMonthId int, value int ) insert into emp_contr values (1, 202201, 0) insert into emp_contr values (1, 202202, 0) insert into emp_contr values (1, 202205, 0) insert into emp_contr values (1, 202206, 0) insert into emp_contr values (2, 202202, 0) insert into emp_contr values (2, 202203, 0) insert into contr values (1, 1, 202201, 5) insert into contr values (2, 1, 202202, 2) insert into contr values (40, 2, 202202, 2) insert into contr values (50, 2, 202203, 0)
Base on this data I have problem with connect table: emp_contr, row: EmployeeId:1, CalendarMonthId:202205 with row from table contr from the same user and maximum CalendarMonthId:202202. I have query like this, but it doesn’t work
select * from emp_contr ec join contr c on ec.EmployeeId = c.EmployeeId and (ec.CalendarMonthId = c.CalendarMonthId or ec.CalendarMonthId > max(c.CalendarMonthId)) order by ec.EmployeeId
I expect result like this:
EmployeeId | Emp_Contr_CalendarMonthId | Contr_CalendarMonthId | Value | ContractId |
---|---|---|---|---|
1 | 202206 | 202202 | 2 | 2 |
1 | 202205 | 202202 | 2 | 2 |
1 | 202202 | 202202 | 2 | 2 |
1 | 202201 | 202201 | 5 | 1 |
2 | 202203 | 202203 | 0 | 50 |
2 | 202202 | 202202 | 2 | 40 |
I don’t know what I should do. Maybe should I add row in temporary table based on contr table with the same values like bigest one but different CalendarMonthId
Advertisement
Answer
TL;DR:
select ec.EmployeeId, ec.CalendarMonthId Emp_Contr_CalendarMonthId, coalesce(c.CalendarMonthId,c2.CalendarMonthId) Contr_CalendarMonthId, coalesce(c.value,c2.value) value, coalesce(c.ContractId,c2.ContractId) ContractId, -- additional columns for debugging c.CalendarMonthId, c.value, c.ContractId, c2.CalendarMonthId, c2.value, c2.ContractId from emp_contr ec left join contr c on ec.EmployeeId = c.EmployeeId and ec.CalendarMonthId = c.CalendarMonthId join ( select * from ( select EmployeeId, CalendarMonthId, value, ContractId, row_number() over (partition by EmployeeId order by CalendarMonthId desc) rn from contr ) x where rn = 1 ) c2 on ec.EmployeeId = c2.EmployeeId order by ec.EmployeeId, ec.CalendarMonthId desc
The coalesce
assumes all relevant columns are NOT NULL
. If this is not the case you’ll have to replace them with CASE-expressions in which you check if c.CalendarMonthId is null
.
Explanation
The statement joins contr
twice.
First using an outer join to obtain the matching results when present, without loosing rows without a match.
The second join is an inner join.
This assumes there is at least one contr
row per emp_contr
The join does join an inline view which
uses the window function row_number()
to filter only the last row per employee when ordered by CalendarMonthId
The last step is to combine the results, and pick the one from the first join if present and the one from the second join if not.