I have a table where I account records are loaded daily. If an account comes in to the table and it already exists I keep both records in the database but need to update the record with older record_date setting its end_date to today. Account_ID is the the unique account identifier. Secondly, if the record_date’s are the same, I keep the newer entry which can be identified by the table_ID Identity field.
eg
TABLE_ID,account_ID,end_date,record_date 2 28576800 NULL 2020-10-20 23 28576800 NULL 2020-10-20 20 32477400 NULL 2020-11-09 22 32477400 NULL 2020-11-13
This is how the table looks when the data is loaded.
- I need to update end_date in row table_id=20 because row table_id=22 has a newer record_date.
- I need to update end_date in row table_id=2 because as the record_dates are the same we end the smaller table_ID
I’ve tried;
UPDATE dbo.accounts SET END_DATE = GETUTCDATE() where END_DATE is null and Record_date not in (SELECT MAX(Record_date ) as mxrptDate FROM dbo.accounts GROUP BY account_ID)
This doesn’t work as expected as it doesn’t look specifically for the max(record_date) of the particular account_iD I want to update.
For when the recrod_date’s are the same, I use the same kind of query, I think it works because the table_id’s are unique. But feel there must be a better way.
UPDATE dbo.accounts SET END_DATE = GETUTCDATE() where END_DATE is null and table_id not in (SELECT MAX(table_id ) as mxtblid FROM dbo.accounts GROUP BY account_ID)
Advertisement
Answer
Your second query is almost there, you just need to correlate the subquery:
UPDATE ac SET END_DATE = GETUTCDATE() FROM dbo.accounts ac where ac.END_DATE is null and ac.table_id <> (SELECT MAX(table_id) as mxtblid FROM dbo.accounts ac2 WHERE ac2.account_ID = ac.account_ID)
But we can also use window functions to solve this without resorting to a self-join. This may be more or less performant, do test.
We can select the correct rows with a subquery, and directly update those rows:
UPDATE ac SET END_DATE = GETUTCDATE() FROM ( SELECT *, rn = ROW_NUMBER() OVER (PARTITION BY account_ID ORDER BY TABLE_ID DESC) FROM dbo.accounts ac where ac.END_DATE is null ) ac WHERE ac.rn > 1;
EDIT
It seems you also want ordering by date
The final solution which will either find the newest record_date or if there is more than one record with that date, take the highest table_id:
UPDATE ac SET END_DATE = GETUTCDATE() FROM ( SELECT *, rn = ROW_NUMBER() OVER (PARTITION BY account_ID ORDER BY record_date desc, TABLE_ID DESC) FROM dbo.accounts ac where ac.END_DATE is null ) ac WHERE ac.rn > 1;