I’m creating a messaging app as a side project and I’m trying to query a user’s conversations efficiently.
The messages
table structure is basic right now with some dummy data:
| id | sender_id | receiver_id | message | created_at | |------|-----------|-------------|---------|------------| | 1 | 1 | 2 | text | time | | 2 | 2 | 1 | text | time | | 3 | 1 | 2 | text | time | | 4 | 1 | 3 | text | time | | 5 | 3 | 2 | text | time | | 6 | 3 | 1 | text | time | | 7 | 2 | 1 | text | time |
I’d like to be able to query the DB and group by “conversation” – A.K.A any rows that have the same sender_id
or receiver_id
in either column – rows (1, 2, 3, 7), (4, 6), (5). I’d like to be able to limit each “group” to n rows and order them by the created_at
column. It would ideally look like (created_at values are arbitrary numbers to show descending values):
| id | sender_id | receiver_id | message | created_at | |------|-----------|-------------|---------|------------| | 1 | 1 | 2 | text | 400 | | 2 | 2 | 1 | text | 300 | | 3 | 1 | 2 | text | 200 | | 7 | 2 | 1 | text | 100 | | 4 | 1 | 3 | text | 700 | | 6 | 3 | 1 | text | 500 | | 5 | 3 | 2 | text | 300 |
Ideally there would be an additional column added that would number each group (easy to create multi-dimensional array.
So far I’ve been able to “group” by sender/receiver ids, order by created_at
, and limit the number per group. However, It’s not quite right. Here’s the query:
SELECT filter.id, filter.sender_id, filter.receiver_id, filter.message, filter.created_at FROM ( SELECT messages.*, rank() OVER ( PARTITION BY sender_id ORDER BY created_at DESC ) FROM messages WHERE messages.sender_id = 1 or messages.receiver_id = 1 ) filter WHERE rank <= 50;
My result set looks like this:
| id | sender_id | receiver_id | message | created_at | |------|-----------|-------------|---------|------------| | 1 | 1 | 2 | text | 400 | | 3 | 1 | 2 | text | 300 | | 4 | 1 | 3 | text | 700 | | 2 | 2 | 1 | text | 300 | | 7 | 2 | 1 | text | 100 | | 6 | 3 | 1 | text | 500 | | 5 | 3 | 2 | text | 300 |
You can see that row 3 and 6 should be grouped but aren’t.
Advertisement
Answer
You can use rank()
. To limit the number of records per conversation (ie sender/receiver or receiver/sender tuple), you can use a partition like least(sender_id, receiver_id), greatest(sender_id, receiver_id)
:
select filter.id, filter.sender_id, filter.receiver_id, filter.message, filter.created_at from ( select t.*, rank() over( partition by least(sender_id, receiver_id), greatest(sender_id, receiver_id) order by created_at desc ) rn from mytable t ) t where rn <= 50 order by least(sender_id, receiver_id), greatest(sender_id, receiver_id), rn