Skip to content
Advertisement

Select all rows which match a condition dependent on over rows

Let’s assume I have a table T with the columns X, Y, Z where Z is some comparable value (I have to make it work on a PostgreSQL database but a solution that additionally supports oracle would be even nicer).

What I want to achieve is best described by the following, sadly invalid query:

SELECT X, Y, MAX(Z) 
FROM T OVER (PARTITION BY Z) as max_z 
GROUP BY Y 
WHERE Z < z_threshold AND Z = max_z AND X = some_x

This is invalid since window function result cannot be accessed in a where clause as I understood from the (PostgreSQL) documentation.

In other words, for each group of rows with the same y-value I want to select the row which has the maximum Z in that group.

Currently my solution is to just select all rows where X = some_x and Z < z_threshold and then “post-process” the data outside of sql, which is not very efficient with large datasets.

Is it possible to achieve this result with plain sql or is there any other more efficient solution to this?

Advertisement

Answer

You just need a subquery:

SELECT t.*
FROM (SELECT X, Y, Z, MAX(Z) OVER (PARTITION BY Y) as max_z
      FROM T
     ) t
WHERE z = max_z;

If there are duplicates maximuma for a given y, this returns all the duplicates. Often ROW_NUMBER() would be used for this purpose, but you started with MAX() so this also works.

Note: In Postgres, I would recommend DISTINCT ON:

select distinct on (y) t.*
from t
order by y, z desc;
User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement