I'm creating a messaging app as a side project and I'm trying to query a user's conversations efficiently. The messages table structure is basic right now with some dummy data: I'd like to be able to query the DB and group by "conversation" - A.K.A any rows that have the same sender_id or receiver_id in either column - rows (1,

Group by multiple columns and limit per group – Postgres

I’m creating a messaging app as a side project and I’m trying to query a user’s conversations efficiently.

The messages table structure is basic right now with some dummy data:

| id   | sender_id | receiver_id | message | created_at |
|------|-----------|-------------|---------|------------|
| 1    | 1         | 2           | text    | time       |
| 2    | 2         | 1           | text    | time       |
| 3    | 1         | 2           | text    | time       |
| 4    | 1         | 3           | text    | time       |
| 5    | 3         | 2           | text    | time       |
| 6    | 3         | 1           | text    | time       |
| 7    | 2         | 1           | text    | time       |

​x
 
| id   | sender_id | receiver_id | message | created_at ||------|-----------|-------------|---------|------------|| 1    | 1         | 2           | text    | time       || 2    | 2         | 1           | text    | time       || 3    | 1         | 2           | text    | time       || 4    | 1         | 3           | text    | time       || 5    | 3         | 2           | text    | time       || 6    | 3         | 1           | text    | time       || 7    | 2         | 1           | text    | time       |​

I’d like to be able to query the DB and group by “conversation” – A.K.A any rows that have the same sender_id or receiver_id in either column – rows (1, 2, 3, 7), (4, 6), (5). I’d like to be able to limit each “group” to n rows and order them by the created_at column. It would ideally look like (created_at values are arbitrary numbers to show descending values):

| id   | sender_id | receiver_id | message | created_at |
|------|-----------|-------------|---------|------------|
| 1    | 1         | 2           | text    | 400        |
| 2    | 2         | 1           | text    | 300        |
| 3    | 1         | 2           | text    | 200        |
| 7    | 2         | 1           | text    | 100        |
| 4    | 1         | 3           | text    | 700        |
| 6    | 3         | 1           | text    | 500        |
| 5    | 3         | 2           | text    | 300        |

 
| id   | sender_id | receiver_id | message | created_at ||------|-----------|-------------|---------|------------|| 1    | 1         | 2           | text    | 400        || 2    | 2         | 1           | text    | 300        || 3    | 1         | 2           | text    | 200        || 7    | 2         | 1           | text    | 100        || 4    | 1         | 3           | text    | 700        || 6    | 3         | 1           | text    | 500        || 5    | 3         | 2           | text    | 300        |​

Ideally there would be an additional column added that would number each group (easy to create multi-dimensional array.

So far I’ve been able to “group” by sender/receiver ids, order by created_at, and limit the number per group. However, It’s not quite right. Here’s the query:

SELECT
    filter.id, filter.sender_id, filter.receiver_id, filter.message, filter.created_at
FROM (
    SELECT messages.*,
        rank() OVER (
            PARTITION BY sender_id
            ORDER BY created_at DESC
        )
    FROM messages
    WHERE messages.sender_id = 1 or messages.receiver_id = 1
) filter WHERE rank <= 50;

 
SELECT    filter.id, filter.sender_id, filter.receiver_id, filter.message, filter.created_atFROM (    SELECT messages.*,        rank() OVER (            PARTITION BY sender_id            ORDER BY created_at DESC        )    FROM messages    WHERE messages.sender_id = 1 or messages.receiver_id = 1) filter WHERE rank <= 50;​​

My result set looks like this:

| id   | sender_id | receiver_id | message | created_at |
|------|-----------|-------------|---------|------------|
| 1    | 1         | 2           | text    | 400        |
| 3    | 1         | 2           | text    | 300        |
| 4    | 1         | 3           | text    | 700        |
| 2    | 2         | 1           | text    | 300        |
| 7    | 2         | 1           | text    | 100        |
| 6    | 3         | 1           | text    | 500        |
| 5    | 3         | 2           | text    | 300        |

 
| id   | sender_id | receiver_id | message | created_at ||------|-----------|-------------|---------|------------|| 1    | 1         | 2           | text    | 400        || 3    | 1         | 2           | text    | 300        || 4    | 1         | 3           | text    | 700        || 2    | 2         | 1           | text    | 300        || 7    | 2         | 1           | text    | 100        || 6    | 3         | 1           | text    | 500        || 5    | 3         | 2           | text    | 300        |​

You can see that row 3 and 6 should be grouped but aren’t.

Answer

You can use rank(). To limit the number of records per conversation (ie sender/receiver or receiver/sender tuple), you can use a partition like least(sender_id, receiver_id), greatest(sender_id, receiver_id):

select filter.id, filter.sender_id, filter.receiver_id, filter.message, filter.created_at
from (
    select
        t.*,
        rank() over(
            partition by least(sender_id, receiver_id), greatest(sender_id, receiver_id)
            order by created_at desc
        ) rn
    from mytable t
) t
where rn <= 50
order by least(sender_id, receiver_id), greatest(sender_id, receiver_id), rn

 
select filter.id, filter.sender_id, filter.receiver_id, filter.message, filter.created_atfrom (    select        t.*,        rank() over(            partition by least(sender_id, receiver_id), greatest(sender_id, receiver_id)            order by created_at desc        ) rn    from mytable t) twhere rn <= 50order by least(sender_id, receiver_id), greatest(sender_id, receiver_id), rn​

Advertisement

Answer