I’ve read just about every question on here that I can find that is referencing getting the latest record from a subquery, but I just can’t work out how to make it work in my situation.
I’m creating an SSRS report for use on SQL Server 2008.
In the database is a table of contacts and DBSdata. I want to pull up a list of contacts and the latest record (many of the fields from that row) from the DBSdata table (expiry date furthest in the future)
Contacts ======== PKContactID ContactName ----------- ----------- 1 JONES Chris 2 SMITH Mary 3 GREY Jean DBSdata ======= Ordinal FKContactID ExpiryDate IssueDate DBSType ------- ----------- ---------- --------- ------- 3 1 2021-09-01 2019-09-01 Internal 2 1 2019-08-31 2017-08-31 External 1 1 2017-07-01 2015-07-01 Internal 2 2 2021-04-15 2019-04-15 Internal 1 2 2019-05-05 2017-05-06 External 1 3 2018-01-03 2016-03-02 External
And the result I’d like is:
Latest DBS ========== PKContactID ContactName ExpiryDate IssueDate DBSType ------------------------------------------------------------------- 3 GREY Jean 2018-01-03 2016-03-02 External 1 JONES Chris 2021-09-01 2019-09-01 Internal 2 SMITH Mary 2021-04-15 2019-04-15 Internal
[The DBSData table doesn’t have it’s own Primary Key field – that’s not something I have control over, unfortunately… And the ordinal increases per contact, so FKContactID+Ordinal is unique….]
This is the code I’ve kind of got to, but it isn’t working. The system I’m uploading the SSRS to doesn’t give me any useful error message at all, so I can’t be more specific about what isn’t working I’m afraid. I get none of the SSRS report displayed, just an error saying the dataset source isn’t working.
SELECT c.PKContactID, c.ContactName, d.ExpiryDate, d.IssueDate, d.DBSType FROM Contacts c LEFT JOIN ( SELECT TOP 1 FKContactID, ExpiryDate, IssueDate, DBSType FROM DBSData WHERE FKContactID = c.PKContactID ORDER BY ExpiryDate DESC ) d ON c.PKContactID = d.FKContactID ORDER BY c.ContactName
I suspect it’s something to do with that WHERE
in the subquery, but if I don’t have that, that whole table is using the WHOLE table and returning 1 row, not the top 1 for that contact.
Advertisement
Answer
Your method would work using APPLY
, instead of JOIN
:
SELECT c.PKContactID, c.ContactName, d.ExpiryDate, d.IssueDate, d.DBSType FROM Contacts c OUTER APPLY (SELECT TOP 1 d.* FROM DBSData d WHERE d.FKContactID = c.PKContactID ORDER BY d.ExpiryDate DESC ) d ORDER BY c.ContactName;
Technically APPLY
implements something called a lateral join. This is like a correlated subquery, but it can return multiple rows and multiple columns. Lateral joins are very powerful, and this is a good example for using them.
For performance, you want indexes on DBSData(FKContactID, ExpiryDate DESC)
(perhaps including the other columns you want as well) and Contacts(ContactName)
.
With the right indexes, I would expect this to have performance at least as good as other methods.
An alternative that also typically has good performance is using a correlated subquery for filtering:
SELECT c.PKContactID, c.ContactName, d.ExpiryDate, d.IssueDate, d.DBSType FROM Contacts c LEFT JOIN DBSData d ON d.FKContactID = c.PKContactID AND d.ExpiryDate = (SELECT MAX(d2.ExpiryDate) FROM DBSData d WHERE d2.FKContactID = d.FKContactID );
Note that to match the LEFT JOIN
, the correlation condition needs to be in the ON
clause, not the WHERE
clause.
Finally, if you do use window functions, I would recommend a subquery for getting the first row:
SELECT c.PKContactID, c.ContactName, d.ExpiryDate, d.IssueDate, d.DBSType FROM Contacts c LEFT JOIN (SELECT d.*, ROW_NUMBER() OVER (PARTITION BY d.FKContactID ORDER BY d.PKContactID DESC) as seqnum FROM DBSData d ) d ON d.FKContactID = c.PKContactID AND d.seqnum = 1;
Doing the subquery before the JOIN
gives more opportunities for the optimizer to produce a better execution plan.