I have following data in my PostgreSQL 11 database count_tbl
(Windows 10 x64 machine).
grp id value
1 1 19.7
1 2 19.7
1 3 19.7
1 4 19.7
1 5 19.7
1 6 19.7
1 7 18.8
1 8 18.8
1 9 18.8
1 10 18.8
1 11 18.8
1 12 18.8
2 1 18.6
2 2 18.6
2 3 18.6
2 4 18.6
2 5 18.6
2 6 0.0
2 7 0.0
2 8 0.0
2 9 21.4
2 10 21.4
2 11 0.0
2 12 0.0
The following query, for each group (grp
), finds the most frequent value:
Select Distinct on (grp)
grp,
case
when freq > 1
then value
else 0.0
end as freq_val
From
(
Select
grp,
value,
count(id) as freq
From count_tbl
Group by grp, value
Order by grp
) s1
Order by grp, freq desc, value desc;
The output of above query is:
grp freq_val
1 19.7
2 18.6
Now, I would like to associate the most frequent value to the corresponding IDs (for example, for grp = 1
, most frequent value 19.7
have ids 1, 2, 3, 4, 5, 6), let say, as an integer array
. My expected output is:
grp freq_val ids
1 19.7 {1,2,3,4,5,6}
2 18.6 {1,2,3,4,5}
Does someone care to reflect on this or suggest how this can be achieved?
Advertisement
Answer
You can use window functions and aggregation:
select
grp,
value freq_val,
array_agg(id) ids
from (
select
c.*,
rank() over(partition by grp order by value desc) rn
from count_tbl
) t
where rn = 1
group by grp, value
The inner query ranks records having the same grp
by decreasing value
. Then, the outer query filter on the top rows per froup and aggregates.
Edit: if you want the most occuring value per group (instead of the highest value) and its associated ids, then you can aggregate in the subquery instead:
select *
from (
select
grp,
value freq_val,
array_agg(id) ids
rank() over(partition by grp order by count(*) desc) rn
from count_tbl
group by grp, value
) t
where rn = 1