Skip to content
Advertisement

Is there a difference between Oracle SQL ‘KEEP’ for multiple columns and ‘KEEP’ for one and GROUP BY for the rest?

I’m just now learning about KEEP in Oracle SQL, but I cannot seem to find documentation that explains why their examples use KEEP in all columns that are not indexed.

I have a table with 5 columns

PERSON_ID | BRANCH | YEAR | STATUS | TIMESTAMP  
123456    | 0001   | 2017 | 1      | 1-1-2017   (ROW 1)  
123456    | 0001   | 2017 | 2      | 2-1-2017   (ROW 2)  
123456    | 0002   | 2017 | 3      | 3-1-2017   (ROW 3)  
123456    | 0001   | 2017 | 2      | 4-1-2017   (ROW 4)  
123456    | 0001   | 2018 | 2      | 1-1-2018   (ROW 5)  
123456    | 0001   | 2018 | 3      | 2-1-2018   (ROW 6)  

I want to return the row of the most recent timestamp by person, branch, and year, so rows 3, 4, and 6.

RESULTS    
PERSON_ID | BRANCH | YEAR | STATUS | TIME_STAMP  
123456    | 0002   | 2017 | 3      | 3-1-2017   (ROW 3)  
123456    | 0001   | 2017 | 2      | 4-1-2017   (ROW 4)  
123456    | 0001   | 2018 | 3      | 2-1-2018   (ROW 6)  

To get the entire row, I would normally I would write something like this:

SELECT * 
FROM STATUS_TABLE a
WHERE a.TIME_STAMP = 
    (
    SELECT MAX(sub.TIME_STAMP) 
    FROM STATUS_TABLE sub
    WHERE a.PERSON_ID = sub.PERSON_ID
        AND a.YEAR = sub.YEAR
        AND a.BRANCH = sub.BRANCH
    )

But I’m learning I can write this:

SELECT 
    a.PERSON_ID, 
    a.YEAR, 
    a.BRANCH,
    MAX(a.STATUS) KEEP (DENSE_RANK FIRST ORDER BY TIME_STAMP DESC)
FROM STATUS_TABLE a
GROUP BY a.PERSON_ID, a.YEAR, a.BRANCH;

My concern is that a lot of the documentation and example I’m finding doesn’t put all the group-by columns in GROUP BY, but rather they write a KEEP statement for many columns.

Like this:

SELECT 
    a.PERSON_ID, 
    MAX(a.YEAR) KEEP (DENSE_RANK FIRST ORDER BY TIME_STAMP DESC), 
    MAX(a.BRANCH) KEEP (DENSE_RANK FIRST ORDER BY TIME_STAMP DESC),
    MAX(a.STATUS) KEEP (DENSE_RANK FIRST ORDER BY TIME_STAMP DESC)
FROM STATUS_TABLE a
GROUP BY a.PERSON_ID;

QUESTION
If I know that there will never be duplicates on TIME_STAMP for an ID, YEAR, and BRANCH, can I write it the first way or do I still need to write it the 2nd way. Using the first way, I get the results I’m expecting, but I can’t seem to find any explanation of this method and what the differences may be.

Are there any?

Advertisement

Answer

Your aggregation queries are different. When you have:

GROUP BY a.PERSON_ID, a.YEAR, a.BRANCH

Your result set will have one row in the result set for each combination of the three columns.

If you specify:

GROUP BY a.PERSON_ID

Then there is one row only for each PERSON_ID. Under some circumstances, this is the same as the above version. But only when there is one YEAR and BRANCH per PERSON_ID. That is not true in your data.

These versions are functionally equivalent for most practical purposes to your version with the correlated subquery. One difference is what happens if any of the grouping/correlation columns are NULL. The GROUP BY keeps these groupings. The correlated subquery filters them out.

User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement