Skip to content
Advertisement

SQL group by a field and only return one joined row for each grouping

Table data

+-----+----------------+--------+----------------+
| ID  |  Required_by   |  Name  |  Another_Field |
+-----+----------------+--------+----------------+
| 1   |  7 August      |  cat   |  X             |
| 2   |  7 August      |  cat   |  Y             |
| 3   |  10 August     |  cat   |  Z             |
| 4   |  11 August     |  dog   |  A             |
+-----+----------------+--------+----------------+

What I want to do is group by the name, then for each group choose one of the rows with the earliest required by date.

For this data set, I would like to end up with either rows 1 and 4, or rows 2 and 4.

Expected result:

+-----+----------------+--------+----------------+
| ID  |  Required_by   |  Name  |  Another_Field |
+-----+----------------+--------+----------------+
| 1   |  7 August      |  cat   |  X             |
| 4   |  11 August     |  dog   |  A             |
+-----+----------------+--------+----------------+

OR

+-----+----------------+--------+----------------+
| ID  |  Required_by   |  Name  |  Another_Field |
+-----+----------------+--------+----------------+
| 2   |  7 August      |  cat   |  Y             |
| 4   |  11 August     |  dog   |  A             |
+-----+----------------+--------+----------------+

I have something that returns 1,2 and 4 but I’m not sure how to only pick one from the first group to get the desired result. I’m joining the grouping with the data table so that I can get the ID and another_field back after the grouping.

SELECT d.id, d.name, d.required_by, d.another_field
FROM 
(
  SELECT min(required_by) as min_date, name
  FROM data
  GROUP BY name
) agg
INNER JOIN 
data d
on d.required_by = agg.min_date AND d.name = agg.name

Advertisement

Answer

This is typically solved using window functions:

select d.id, d.name, d.required_by, d.another_field
from (
  select id, name, required_by, another_field, 
         row_number() over (partition by name order by required_by) as rn
  from data
) d
where d.rn = 1;

In Postgres using distinct on() is typically faster:

select distinct on (name) *
from data
order by name, required_by

Online example

User contributions licensed under: CC BY-SA
5 People found this is helpful
Advertisement