Skip to content
Advertisement

How do I get IDs associated with the most frequent value in PostgreSQL?

I have following data in my PostgreSQL 11 database count_tbl (Windows 10 x64 machine).

grp     id     value
1       1      19.7
1       2      19.7
1       3      19.7
1       4      19.7
1       5      19.7
1       6      19.7
1       7      18.8
1       8      18.8
1       9      18.8
1       10     18.8
1       11     18.8
1       12     18.8
2       1      18.6
2       2      18.6
2       3      18.6
2       4      18.6
2       5      18.6
2       6      0.0
2       7      0.0
2       8      0.0
2       9      21.4
2       10     21.4
2       11     0.0
2       12     0.0

The following query, for each group (grp), finds the most frequent value:

Select Distinct on (grp)
    grp,
    case
        when freq > 1
        then value
        else 0.0
    end as freq_val
From
(
    Select
        grp,
        value,
        count(id) as freq
    From count_tbl
    Group by grp, value
    Order by grp
) s1
Order by grp, freq desc, value desc;

The output of above query is:

grp    freq_val
1      19.7
2      18.6

Now, I would like to associate the most frequent value to the corresponding IDs (for example, for grp = 1, most frequent value 19.7 have ids 1, 2, 3, 4, 5, 6), let say, as an integer array. My expected output is:

grp    freq_val     ids
1      19.7         {1,2,3,4,5,6}
2      18.6         {1,2,3,4,5}

Does someone care to reflect on this or suggest how this can be achieved?

Advertisement

Answer

You can use window functions and aggregation:

select
    grp,
    value freq_val,
    array_agg(id) ids
from (
    select
        c.*,
        rank() over(partition by grp order by value desc) rn
    from count_tbl
) t
where rn = 1
group by grp, value

The inner query ranks records having the same grp by decreasing value. Then, the outer query filter on the top rows per froup and aggregates.


Edit: if you want the most occuring value per group (instead of the highest value) and its associated ids, then you can aggregate in the subquery instead:

select *
from (
    select
        grp,
        value freq_val,
        array_agg(id) ids
        rank() over(partition by grp order by count(*) desc) rn
    from count_tbl
    group by grp, value
) t
where rn = 1
User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement