SQL group by a field and only return one joined row for each grouping

Table data

+-----+----------------+--------+----------------+
| ID  |  Required_by   |  Name  |  Another_Field |
+-----+----------------+--------+----------------+
| 1   |  7 August      |  cat   |  X             |
| 2   |  7 August      |  cat   |  Y             |
| 3   |  10 August     |  cat   |  Z             |
| 4   |  11 August     |  dog   |  A             |
+-----+----------------+--------+----------------+

​x
 
+-----+----------------+--------+----------------+| ID  |  Required_by   |  Name  |  Another_Field |+-----+----------------+--------+----------------+| 1   |  7 August      |  cat   |  X             || 2   |  7 August      |  cat   |  Y             || 3   |  10 August     |  cat   |  Z             || 4   |  11 August     |  dog   |  A             |+-----+----------------+--------+----------------+​

What I want to do is group by the name, then for each group choose one of the rows with the earliest required by date.

For this data set, I would like to end up with either rows 1 and 4, or rows 2 and 4.

Expected result:

+-----+----------------+--------+----------------+
| ID  |  Required_by   |  Name  |  Another_Field |
+-----+----------------+--------+----------------+
| 1   |  7 August      |  cat   |  X             |
| 4   |  11 August     |  dog   |  A             |
+-----+----------------+--------+----------------+

 
+-----+----------------+--------+----------------+| ID  |  Required_by   |  Name  |  Another_Field |+-----+----------------+--------+----------------+| 1   |  7 August      |  cat   |  X             || 4   |  11 August     |  dog   |  A             |+-----+----------------+--------+----------------+​

+-----+----------------+--------+----------------+
| ID  |  Required_by   |  Name  |  Another_Field |
+-----+----------------+--------+----------------+
| 2   |  7 August      |  cat   |  Y             |
| 4   |  11 August     |  dog   |  A             |
+-----+----------------+--------+----------------+

 
+-----+----------------+--------+----------------+| ID  |  Required_by   |  Name  |  Another_Field |+-----+----------------+--------+----------------+| 2   |  7 August      |  cat   |  Y             || 4   |  11 August     |  dog   |  A             |+-----+----------------+--------+----------------+​

I have something that returns 1,2 and 4 but I’m not sure how to only pick one from the first group to get the desired result. I’m joining the grouping with the data table so that I can get the ID and another_field back after the grouping.

SELECT d.id, d.name, d.required_by, d.another_field
FROM 
(
  SELECT min(required_by) as min_date, name
  FROM data
  GROUP BY name
) agg
INNER JOIN 
data d
on d.required_by = agg.min_date AND d.name = agg.name

 
SELECT d.id, d.name, d.required_by, d.another_fieldFROM (  SELECT min(required_by) as min_date, name  FROM data  GROUP BY name) aggINNER JOIN data don d.required_by = agg.min_date AND d.name = agg.name​

Answer

This is typically solved using window functions:

select d.id, d.name, d.required_by, d.another_field
from (
  select id, name, required_by, another_field, 
         row_number() over (partition by name order by required_by) as rn
  from data
) d
where d.rn = 1;

 
select d.id, d.name, d.required_by, d.another_fieldfrom (  select id, name, required_by, another_field,          row_number() over (partition by name order by required_by) as rn  from data) dwhere d.rn = 1;​

In Postgres using distinct on() is typically faster:

select distinct on (name) *
from data
order by name, required_by

 
select distinct on (name) *from dataorder by name, required_by​

Online example

Advertisement

Answer