Skip to content
Advertisement

Group by multiple columns and limit per group – Postgres

I’m creating a messaging app as a side project and I’m trying to query a user’s conversations efficiently.

The messages table structure is basic right now with some dummy data:

| id   | sender_id | receiver_id | message | created_at |
|------|-----------|-------------|---------|------------|
| 1    | 1         | 2           | text    | time       |
| 2    | 2         | 1           | text    | time       |
| 3    | 1         | 2           | text    | time       |
| 4    | 1         | 3           | text    | time       |
| 5    | 3         | 2           | text    | time       |
| 6    | 3         | 1           | text    | time       |
| 7    | 2         | 1           | text    | time       |

I’d like to be able to query the DB and group by “conversation” – A.K.A any rows that have the same sender_id or receiver_id in either column – rows (1, 2, 3, 7), (4, 6), (5). I’d like to be able to limit each “group” to n rows and order them by the created_at column. It would ideally look like (created_at values are arbitrary numbers to show descending values):

| id   | sender_id | receiver_id | message | created_at |
|------|-----------|-------------|---------|------------|
| 1    | 1         | 2           | text    | 400        |
| 2    | 2         | 1           | text    | 300        |
| 3    | 1         | 2           | text    | 200        |
| 7    | 2         | 1           | text    | 100        |
| 4    | 1         | 3           | text    | 700        |
| 6    | 3         | 1           | text    | 500        |
| 5    | 3         | 2           | text    | 300        |

Ideally there would be an additional column added that would number each group (easy to create multi-dimensional array.

So far I’ve been able to “group” by sender/receiver ids, order by created_at, and limit the number per group. However, It’s not quite right. Here’s the query:

SELECT
    filter.id, filter.sender_id, filter.receiver_id, filter.message, filter.created_at
FROM (
    SELECT messages.*,
        rank() OVER (
            PARTITION BY sender_id
            ORDER BY created_at DESC
        )
    FROM messages
    WHERE messages.sender_id = 1 or messages.receiver_id = 1
) filter WHERE rank <= 50;

My result set looks like this:

| id   | sender_id | receiver_id | message | created_at |
|------|-----------|-------------|---------|------------|
| 1    | 1         | 2           | text    | 400        |
| 3    | 1         | 2           | text    | 300        |
| 4    | 1         | 3           | text    | 700        |
| 2    | 2         | 1           | text    | 300        |
| 7    | 2         | 1           | text    | 100        |
| 6    | 3         | 1           | text    | 500        |
| 5    | 3         | 2           | text    | 300        |

You can see that row 3 and 6 should be grouped but aren’t.

Advertisement

Answer

You can use rank(). To limit the number of records per conversation (ie sender/receiver or receiver/sender tuple), you can use a partition like least(sender_id, receiver_id), greatest(sender_id, receiver_id):

select filter.id, filter.sender_id, filter.receiver_id, filter.message, filter.created_at
from (
    select
        t.*,
        rank() over(
            partition by least(sender_id, receiver_id), greatest(sender_id, receiver_id)
            order by created_at desc
        ) rn
    from mytable t
) t
where rn <= 50
order by least(sender_id, receiver_id), greatest(sender_id, receiver_id), rn
User contributions licensed under: CC BY-SA
5 People found this is helpful
Advertisement