I have following data in my PostgreSQL 11 database count_tbl
(Windows 10 x64 machine).
grp id value 1 1 19.7 1 2 19.7 1 3 19.7 1 4 19.7 1 5 19.7 1 6 19.7 1 7 18.8 1 8 18.8 1 9 18.8 1 10 18.8 1 11 18.8 1 12 18.8 2 1 18.6 2 2 18.6 2 3 18.6 2 4 18.6 2 5 18.6 2 6 0.0 2 7 0.0 2 8 0.0 2 9 21.4 2 10 21.4 2 11 0.0 2 12 0.0
The following query, for each group (grp
), finds the most frequent value:
Select Distinct on (grp) grp, case when freq > 1 then value else 0.0 end as freq_val From ( Select grp, value, count(id) as freq From count_tbl Group by grp, value Order by grp ) s1 Order by grp, freq desc, value desc;
The output of above query is:
grp freq_val 1 19.7 2 18.6
Now, I would like to associate the most frequent value to the corresponding IDs (for example, for grp = 1
, most frequent value 19.7
have ids 1, 2, 3, 4, 5, 6), let say, as an integer array
. My expected output is:
grp freq_val ids 1 19.7 {1,2,3,4,5,6} 2 18.6 {1,2,3,4,5}
Does someone care to reflect on this or suggest how this can be achieved?
Advertisement
Answer
You can use window functions and aggregation:
select grp, value freq_val, array_agg(id) ids from ( select c.*, rank() over(partition by grp order by value desc) rn from count_tbl ) t where rn = 1 group by grp, value
The inner query ranks records having the same grp
by decreasing value
. Then, the outer query filter on the top rows per froup and aggregates.
Edit: if you want the most occuring value per group (instead of the highest value) and its associated ids, then you can aggregate in the subquery instead:
select * from ( select grp, value freq_val, array_agg(id) ids rank() over(partition by grp order by count(*) desc) rn from count_tbl group by grp, value ) t where rn = 1