Skip to content
Advertisement

Cannot exclude previous non-duplicate rows

In a nutshell, here it is:

  • I have 1000(ish) employees who have multiple recurrent annual training requirements
  • I need to be able to sort the employees by County, Facility, Employee, and Type of Training (and also allow for sorted lists at each level)
  • I want to display only the most recent date the Employee took the training

What I’ve tried so far:

  • I’ve been successful when dealing with only one Employee’s record:

    DECLARE @Skill int
    SET @Skill = 81
    
    SELECT TOP 1
        P.lastname+', '+P.firstname AS Employee,    
        P.external_id,
        PC.job_title,
        SD.name,
        SV.schedule_days as ExpireInterval,
        PO.course_startdate,
        DATEADD(DD,SV.schedule_days,PO.course_startdate) as ExpireDate
    
    FROM portfolio PO
        INNER JOIN person P ON PO.person_id=P.person_id
        INNER JOIN e_component EC ON PO.component_id=EC.component_id
        JOIN skill_value SV ON EC.component_id=SV.object_id
        JOIN skill_description SD ON SV.skill_id=SD.skill_id
        JOIN person_custom PC ON P.person_id=PC.person_id
    
    GROUP BY 
        PO.person_id, 
        PO.course_startdate, 
        SV.skill_id, 
        P.lastname, 
        P.firstname, 
        P.external_id, 
        PC.job_title, 
        SD.name, 
        SV.schedule_days, 
        SD.language_id
    
    HAVING SD.language_id=26
        AND PO.person_id=123456
        AND SV.skill_id= @Skill
    
    ORDER BY Employee, PO.course_startdate DESC
    

NOTE: The excessive JOINS are due to the lack of FK relationships in the host database. Our vendor designed it to rely mostly on code built into their front end so I’m working with what I’ve got.

The previously listed code returns the following result:

Most Recent Record for Employee #123456

  • When I try to pull the most recent record from a list of employees however:

    DECLARE @Skill int
    SET @Skill = 81
    
    SELECT
        P.lastname+', '+P.firstname AS Employee,    
        P.external_id,
        PC.job_title,
        SD.name,
        SV.schedule_days as ExpireInterval,
        PO.course_startdate,
        DATEADD(DD,SV.schedule_days,PO.course_startdate) as ExpireDate
    
    FROM portfolio PO
        INNER JOIN person P ON PO.person_id=P.person_id
        INNER JOIN e_component EC ON PO.component_id=EC.component_id
        JOIN skill_value SV ON EC.component_id=SV.object_id
        JOIN skill_description SD ON SV.skill_id=SD.skill_id
        JOIN person_custom PC ON P.person_id=PC.person_id
    
    GROUP BY 
        PO.person_id, 
        PO.course_startdate, 
        SV.skill_id, 
        P.lastname, 
        P.firstname, 
        P.external_id, 
        PC.job_title, 
        SD.name, 
        SV.schedule_days, 
        SD.language_id
    
    HAVING SD.language_id=26
        AND PO.person_id IN (SELECT DISTINCT person_id FROM portfolio)
        AND SV.skill_id= @Skill
    
    ORDER BY Employee, PO.course_startdate DESC
    

I get multiple entries for the same Employee (e.g.different times that employee has taken the training with the same skill_id).

What I want to do is something like this:

IF count(SV.skill_id)>1
    THEN SELECT TOP 1 component_id --for each individual
        FROM portfolio

I just can’t figure out where to put the condition to have it give me one record per person. I’ve tried assigning local variables, moving the SELECT subquery around to various columns, adding and removing constraints… etc. Nothing has worked so far.

I’m using the following software:

  • SQL Server Management Studio 2014 & 2017 (the live DB is on 2014 and I have a static one on 2017 for development purposes)
  • Report Builder 3.0 (my company hasn’t upgraded to the latest and greatest yet)

P.S. If there is a method of sorting the records on the report form itself using Regular Expressions, please let me know!

Advertisement

Answer

A couple of observations, then an answer.

In SQL Server, INNER JOIN and JOIN mean the same thing.

As @DaleBurrell notes, unless you’re filtering by an aggregated value, use a WHERE clause rather than a HAVING clause. The WHERE is applied earlier in the query processing and you should see modestly better performance putting your filtering there. Also, it’s more “standard”, if you will.

Finally, I removed your filtering sub-query for person_id because it’s a self-join to portfolio that I couldn’t see a good reason for. If there are additional criteria in there that make it useful, go ahead and put it back.

With that said, your second attempt was really close. If you RANK your results using your existing ORDER BY clause, then apply TOP (1) WITH TIES, it will return the #1 ranked result for each employee, ordered by date.

DECLARE @Skill int
SET @Skill = 81

SELECT TOP (1) WITH TIES
    P.lastname+', '+P.firstname AS Employee,    
    P.external_id,
    PC.job_title,
    SD.name,
    SV.schedule_days as ExpireInterval,
    PO.course_startdate,
    DATEADD(DD,SV.schedule_days,PO.course_startdate) as ExpireDate
FROM portfolio PO
    JOIN person P ON PO.person_id=P.person_id
    JOIN e_component EC ON PO.component_id=EC.component_id
    JOIN skill_value SV ON EC.component_id=SV.object_id
    JOIN skill_description SD ON SV.skill_id=SD.skill_id
    JOIN person_custom PC ON P.person_id=PC.person_id
    JOIN portfolio PF ON PO.person_id = PF.person_id

WHERE SD.language_id=26
    AND SV.skill_id= @Skill

GROUP BY 
    PO.person_id, 
    PO.course_startdate, 
    SV.skill_id, 
    P.lastname, 
    P.firstname, 
    P.external_id, 
    PC.job_title, 
    SD.name, 
    SV.schedule_days, 
    SD.language_id

ORDER BY RANK() OVER (PARTITION BY Employee ORDER BY PO.course_startdate DESC)
User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement