I have following data in my PostgreSQL 11 database count_tbl (Windows 10 x64 machine).
grp id value 1 1 19.7 1 2 19.7 1 3 19.7 1 4 19.7 1 5 19.7 1 6 19.7 1 7 18.8 1 8 18.8 1 9 18.8 1 10 18.8 1 11 18.8 1 12 18.8 2 1 18.6 2 2 18.6 2 3 18.6 2 4 18.6 2 5 18.6 2 6 0.0 2 7 0.0 2 8 0.0 2 9 21.4 2 10 21.4 2 11 0.0 2 12 0.0
The following query, for each group (grp), finds the most frequent value:
Select Distinct on (grp)
    grp,
    case
        when freq > 1
        then value
        else 0.0
    end as freq_val
From
(
    Select
        grp,
        value,
        count(id) as freq
    From count_tbl
    Group by grp, value
    Order by grp
) s1
Order by grp, freq desc, value desc;
The output of above query is:
grp freq_val 1 19.7 2 18.6
Now, I would like to associate the most frequent value to the corresponding IDs (for example, for grp = 1, most frequent value 19.7 have ids 1, 2, 3, 4, 5, 6), let say, as an integer array. My expected output is:
grp    freq_val     ids
1      19.7         {1,2,3,4,5,6}
2      18.6         {1,2,3,4,5}
Does someone care to reflect on this or suggest how this can be achieved?
Advertisement
Answer
You can use window functions and aggregation:
select
    grp,
    value freq_val,
    array_agg(id) ids
from (
    select
        c.*,
        rank() over(partition by grp order by value desc) rn
    from count_tbl
) t
where rn = 1
group by grp, value
The inner query ranks records having the same grp by decreasing value. Then, the outer query filter on the top rows per froup and aggregates.
Edit: if you want the most occuring value per group (instead of the highest value) and its associated ids, then you can aggregate in the subquery instead:
select *
from (
    select
        grp,
        value freq_val,
        array_agg(id) ids
        rank() over(partition by grp order by count(*) desc) rn
    from count_tbl
    group by grp, value
) t
where rn = 1