Table data
+-----+----------------+--------+----------------+ | ID | Required_by | Name | Another_Field | +-----+----------------+--------+----------------+ | 1 | 7 August | cat | X | | 2 | 7 August | cat | Y | | 3 | 10 August | cat | Z | | 4 | 11 August | dog | A | +-----+----------------+--------+----------------+
What I want to do is group by the name, then for each group choose one of the rows with the earliest required by date.
For this data set, I would like to end up with either rows 1 and 4, or rows 2 and 4.
Expected result:
+-----+----------------+--------+----------------+ | ID | Required_by | Name | Another_Field | +-----+----------------+--------+----------------+ | 1 | 7 August | cat | X | | 4 | 11 August | dog | A | +-----+----------------+--------+----------------+
OR
+-----+----------------+--------+----------------+ | ID | Required_by | Name | Another_Field | +-----+----------------+--------+----------------+ | 2 | 7 August | cat | Y | | 4 | 11 August | dog | A | +-----+----------------+--------+----------------+
I have something that returns 1,2 and 4 but I’m not sure how to only pick one from the first group to get the desired result. I’m joining the grouping with the data
table so that I can get the ID
and another_field
back after the grouping.
SELECT d.id, d.name, d.required_by, d.another_field FROM ( SELECT min(required_by) as min_date, name FROM data GROUP BY name ) agg INNER JOIN data d on d.required_by = agg.min_date AND d.name = agg.name
Advertisement
Answer
This is typically solved using window functions:
select d.id, d.name, d.required_by, d.another_field from ( select id, name, required_by, another_field, row_number() over (partition by name order by required_by) as rn from data ) d where d.rn = 1;
In Postgres using distinct on()
is typically faster:
select distinct on (name) * from data order by name, required_by