Skip to content
Advertisement

SQL Server Update Row where date is not the max date when records are grouped by some ID

I have a table where I account records are loaded daily. If an account comes in to the table and it already exists I keep both records in the database but need to update the record with older record_date setting its end_date to today. Account_ID is the the unique account identifier. Secondly, if the record_date’s are the same, I keep the newer entry which can be identified by the table_ID Identity field.

eg

TABLE_ID,account_ID,end_date,record_date
2   28576800    NULL    2020-10-20
23  28576800    NULL    2020-10-20
20  32477400    NULL    2020-11-09
22  32477400    NULL    2020-11-13

screenshot of table before any updates

This is how the table looks when the data is loaded.

  1. I need to update end_date in row table_id=20 because row table_id=22 has a newer record_date.
  2. I need to update end_date in row table_id=2 because as the record_dates are the same we end the smaller table_ID

I’ve tried;

UPDATE dbo.accounts
SET END_DATE = GETUTCDATE()
where END_DATE is null
and  Record_date not in
(SELECT MAX(Record_date ) as mxrptDate
         FROM dbo.accounts
         GROUP BY account_ID)  

This doesn’t work as expected as it doesn’t look specifically for the max(record_date) of the particular account_iD I want to update.

For when the recrod_date’s are the same, I use the same kind of query, I think it works because the table_id’s are unique. But feel there must be a better way.

    UPDATE dbo.accounts
    SET END_DATE = GETUTCDATE()
    where END_DATE is null
    and  table_id not in
    (SELECT MAX(table_id ) as mxtblid
  FROM dbo.accounts
GROUP BY account_ID)  

Advertisement

Answer

Your second query is almost there, you just need to correlate the subquery:

UPDATE ac
SET END_DATE = GETUTCDATE()
FROM dbo.accounts ac
where ac.END_DATE is null
  and ac.table_id <>
    (SELECT MAX(table_id) as mxtblid
      FROM dbo.accounts ac2
    WHERE ac2.account_ID = ac.account_ID)  


But we can also use window functions to solve this without resorting to a self-join. This may be more or less performant, do test.

We can select the correct rows with a subquery, and directly update those rows:

UPDATE ac
SET END_DATE = GETUTCDATE()
FROM (
    SELECT *,
        rn = ROW_NUMBER() OVER (PARTITION BY account_ID ORDER BY TABLE_ID DESC)
    FROM dbo.accounts ac
    where ac.END_DATE is null
) ac
WHERE ac.rn > 1;


EDIT

It seems you also want ordering by date

The final solution which will either find the newest record_date or if there is more than one record with that date, take the highest table_id:

UPDATE ac
SET END_DATE = GETUTCDATE()
FROM (
    SELECT *,
        rn = ROW_NUMBER() OVER (PARTITION BY account_ID ORDER BY record_date desc, TABLE_ID DESC)
    FROM dbo.accounts ac
    where ac.END_DATE is null
) ac
WHERE ac.rn > 1;

User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement