Skip to content
Advertisement

SQL subquery with latest record

I’ve read just about every question on here that I can find that is referencing getting the latest record from a subquery, but I just can’t work out how to make it work in my situation.

I’m creating an SSRS report for use on SQL Server 2008.

In the database is a table of contacts and DBSdata. I want to pull up a list of contacts and the latest record (many of the fields from that row) from the DBSdata table (expiry date furthest in the future)

Contacts
========
PKContactID      ContactName
-----------      -----------
1                JONES Chris
2                SMITH Mary
3                GREY Jean


DBSdata
=======
Ordinal   FKContactID   ExpiryDate   IssueDate     DBSType
-------   -----------   ----------   ---------     -------
3         1             2021-09-01   2019-09-01    Internal
2         1             2019-08-31   2017-08-31    External
1         1             2017-07-01   2015-07-01    Internal
2         2             2021-04-15   2019-04-15    Internal
1         2             2019-05-05   2017-05-06    External
1         3             2018-01-03   2016-03-02    External

And the result I’d like is:

Latest DBS
==========
PKContactID   ContactName      ExpiryDate    IssueDate     DBSType
-------------------------------------------------------------------
3             GREY Jean        2018-01-03    2016-03-02    External
1             JONES Chris      2021-09-01    2019-09-01    Internal
2             SMITH Mary       2021-04-15    2019-04-15    Internal

[The DBSData table doesn’t have it’s own Primary Key field – that’s not something I have control over, unfortunately… And the ordinal increases per contact, so FKContactID+Ordinal is unique….]

This is the code I’ve kind of got to, but it isn’t working. The system I’m uploading the SSRS to doesn’t give me any useful error message at all, so I can’t be more specific about what isn’t working I’m afraid. I get none of the SSRS report displayed, just an error saying the dataset source isn’t working.

    SELECT
        c.PKContactID, c.ContactName, d.ExpiryDate, d.IssueDate, d.DBSType
    FROM
        Contacts c
        LEFT JOIN (
                      SELECT TOP 1 FKContactID, ExpiryDate, IssueDate, DBSType
                      FROM DBSData
                      WHERE FKContactID = c.PKContactID
                      ORDER BY ExpiryDate DESC
                  ) d ON c.PKContactID = d.FKContactID
    ORDER BY
       c.ContactName

I suspect it’s something to do with that WHERE in the subquery, but if I don’t have that, that whole table is using the WHOLE table and returning 1 row, not the top 1 for that contact.

Advertisement

Answer

Your method would work using APPLY, instead of JOIN:

SELECT c.PKContactID, c.ContactName,
       d.ExpiryDate, d.IssueDate, d.DBSType
FROM Contacts c OUTER APPLY
     (SELECT TOP 1 d.*
      FROM DBSData d
      WHERE d.FKContactID = c.PKContactID
      ORDER BY d.ExpiryDate DESC
     ) d
ORDER BY c.ContactName;

Technically APPLY implements something called a lateral join. This is like a correlated subquery, but it can return multiple rows and multiple columns. Lateral joins are very powerful, and this is a good example for using them.

For performance, you want indexes on DBSData(FKContactID, ExpiryDate DESC) (perhaps including the other columns you want as well) and Contacts(ContactName).

With the right indexes, I would expect this to have performance at least as good as other methods.

An alternative that also typically has good performance is using a correlated subquery for filtering:

SELECT c.PKContactID, c.ContactName,
       d.ExpiryDate, d.IssueDate, d.DBSType
FROM Contacts c LEFT JOIN
     DBSData d
     ON d.FKContactID = c.PKContactID AND
        d.ExpiryDate = (SELECT MAX(d2.ExpiryDate)
                        FROM DBSData d
                        WHERE d2.FKContactID = d.FKContactID
                       );

Note that to match the LEFT JOIN, the correlation condition needs to be in the ON clause, not the WHERE clause.

Finally, if you do use window functions, I would recommend a subquery for getting the first row:

SELECT c.PKContactID, c.ContactName,
       d.ExpiryDate, d.IssueDate, d.DBSType
FROM Contacts c LEFT JOIN
     (SELECT d.*,
             ROW_NUMBER() OVER (PARTITION BY d.FKContactID ORDER BY d.PKContactID DESC) as seqnum
      FROM DBSData d
     ) d
     ON d.FKContactID = c.PKContactID AND
        d.seqnum = 1;

Doing the subquery before the JOIN gives more opportunities for the optimizer to produce a better execution plan.

User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement