I’m just now learning about KEEP in Oracle SQL, but I cannot seem to find documentation that explains why their examples use KEEP in all columns that are not indexed.
I have a table with 5 columns
PERSON_ID | BRANCH | YEAR | STATUS | TIMESTAMP 123456 | 0001 | 2017 | 1 | 1-1-2017 (ROW 1) 123456 | 0001 | 2017 | 2 | 2-1-2017 (ROW 2) 123456 | 0002 | 2017 | 3 | 3-1-2017 (ROW 3) 123456 | 0001 | 2017 | 2 | 4-1-2017 (ROW 4) 123456 | 0001 | 2018 | 2 | 1-1-2018 (ROW 5) 123456 | 0001 | 2018 | 3 | 2-1-2018 (ROW 6)
I want to return the row of the most recent timestamp by person, branch, and year, so rows 3, 4, and 6.
RESULTS PERSON_ID | BRANCH | YEAR | STATUS | TIME_STAMP 123456 | 0002 | 2017 | 3 | 3-1-2017 (ROW 3) 123456 | 0001 | 2017 | 2 | 4-1-2017 (ROW 4) 123456 | 0001 | 2018 | 3 | 2-1-2018 (ROW 6)
To get the entire row, I would normally I would write something like this:
SELECT * FROM STATUS_TABLE a WHERE a.TIME_STAMP = ( SELECT MAX(sub.TIME_STAMP) FROM STATUS_TABLE sub WHERE a.PERSON_ID = sub.PERSON_ID AND a.YEAR = sub.YEAR AND a.BRANCH = sub.BRANCH )
But I’m learning I can write this:
SELECT a.PERSON_ID, a.YEAR, a.BRANCH, MAX(a.STATUS) KEEP (DENSE_RANK FIRST ORDER BY TIME_STAMP DESC) FROM STATUS_TABLE a GROUP BY a.PERSON_ID, a.YEAR, a.BRANCH;
My concern is that a lot of the documentation and example I’m finding doesn’t put all the group-by columns in GROUP BY, but rather they write a KEEP statement for many columns.
Like this:
SELECT a.PERSON_ID, MAX(a.YEAR) KEEP (DENSE_RANK FIRST ORDER BY TIME_STAMP DESC), MAX(a.BRANCH) KEEP (DENSE_RANK FIRST ORDER BY TIME_STAMP DESC), MAX(a.STATUS) KEEP (DENSE_RANK FIRST ORDER BY TIME_STAMP DESC) FROM STATUS_TABLE a GROUP BY a.PERSON_ID;
QUESTION
If I know that there will never be duplicates on TIME_STAMP for an ID, YEAR, and BRANCH, can I write it the first way or do I still need to write it the 2nd way. Using the first way, I get the results I’m expecting, but I can’t seem to find any explanation of this method and what the differences may be.
Are there any?
Advertisement
Answer
Your aggregation queries are different. When you have:
GROUP BY a.PERSON_ID, a.YEAR, a.BRANCH
Your result set will have one row in the result set for each combination of the three columns.
If you specify:
GROUP BY a.PERSON_ID
Then there is one row only for each PERSON_ID
. Under some circumstances, this is the same as the above version. But only when there is one YEAR
and BRANCH
per PERSON_ID
. That is not true in your data.
These versions are functionally equivalent for most practical purposes to your version with the correlated subquery. One difference is what happens if any of the grouping/correlation columns are NULL
. The GROUP BY
keeps these groupings. The correlated subquery filters them out.