Let’s assume I have a table T with the columns X, Y, Z where Z is some comparable value (I have to make it work on a PostgreSQL database but a solution that additionally supports oracle would be even nicer).
What I want to achieve is best described by the following, sadly invalid query:
SELECT X, Y, MAX(Z) FROM T OVER (PARTITION BY Z) as max_z GROUP BY Y WHERE Z < z_threshold AND Z = max_z AND X = some_x
This is invalid since window function result cannot be accessed in a where clause as I understood from the (PostgreSQL) documentation.
In other words, for each group of rows with the same y-value I want to select the row which has the maximum Z
in that group.
Currently my solution is to just select all rows where X = some_x
and Z < z_threshold
and then “post-process” the data outside of sql, which is not very efficient with large datasets.
Is it possible to achieve this result with plain sql or is there any other more efficient solution to this?
Advertisement
Answer
You just need a subquery:
SELECT t.* FROM (SELECT X, Y, Z, MAX(Z) OVER (PARTITION BY Y) as max_z FROM T ) t WHERE z = max_z;
If there are duplicates maximuma for a given y
, this returns all the duplicates. Often ROW_NUMBER()
would be used for this purpose, but you started with MAX()
so this also works.
Note: In Postgres, I would recommend DISTINCT ON
:
select distinct on (y) t.* from t order by y, z desc;