I have following data in my PostgreSQL 11 database count_tbl (Windows 10 x64 machine).
grp id value 1 1 19.7 1 2 19.7 1 3 19.7 1 4 19.7 1 5 19.7 1 6 19.7 1 7 18.8 1 8 18.8 1 9 18.8 1 10 18.8 1 11 18.8 1 12 18.8 2 1 18.6 2 2 18.6 2 3 18.6 2 4 18.6 2 5 18.6 2 6 0.0 2 7 0.0 2 8 0.0 2 9 21.4 2 10 21.4 2 11 0.0 2 12 0.0
The following query, for each group (grp), finds the most frequent value:
Select Distinct on (grp)
grp,
case
when freq > 1
then value
else 0.0
end as freq_val
From
(
Select
grp,
value,
count(id) as freq
From count_tbl
Group by grp, value
Order by grp
) s1
Order by grp, freq desc, value desc;
The output of above query is:
grp freq_val 1 19.7 2 18.6
Now, I would like to associate the most frequent value to the corresponding IDs (for example, for grp = 1, most frequent value 19.7 have ids 1, 2, 3, 4, 5, 6), let say, as an integer array. My expected output is:
grp freq_val ids
1 19.7 {1,2,3,4,5,6}
2 18.6 {1,2,3,4,5}
Does someone care to reflect on this or suggest how this can be achieved?
Advertisement
Answer
You can use window functions and aggregation:
select
grp,
value freq_val,
array_agg(id) ids
from (
select
c.*,
rank() over(partition by grp order by value desc) rn
from count_tbl
) t
where rn = 1
group by grp, value
The inner query ranks records having the same grp by decreasing value. Then, the outer query filter on the top rows per froup and aggregates.
Edit: if you want the most occuring value per group (instead of the highest value) and its associated ids, then you can aggregate in the subquery instead:
select *
from (
select
grp,
value freq_val,
array_agg(id) ids
rank() over(partition by grp order by count(*) desc) rn
from count_tbl
group by grp, value
) t
where rn = 1