I’m creating a messaging app as a side project and I’m trying to query a user’s conversations efficiently.
The messages
table structure is basic right now with some dummy data:
| id | sender_id | receiver_id | message | created_at |
|------|-----------|-------------|---------|------------|
| 1 | 1 | 2 | text | time |
| 2 | 2 | 1 | text | time |
| 3 | 1 | 2 | text | time |
| 4 | 1 | 3 | text | time |
| 5 | 3 | 2 | text | time |
| 6 | 3 | 1 | text | time |
| 7 | 2 | 1 | text | time |
I’d like to be able to query the DB and group by “conversation” – A.K.A any rows that have the same sender_id
or receiver_id
in either column – rows (1, 2, 3, 7), (4, 6), (5). I’d like to be able to limit each “group” to n rows and order them by the created_at
column. It would ideally look like (created_at values are arbitrary numbers to show descending values):
| id | sender_id | receiver_id | message | created_at |
|------|-----------|-------------|---------|------------|
| 1 | 1 | 2 | text | 400 |
| 2 | 2 | 1 | text | 300 |
| 3 | 1 | 2 | text | 200 |
| 7 | 2 | 1 | text | 100 |
| 4 | 1 | 3 | text | 700 |
| 6 | 3 | 1 | text | 500 |
| 5 | 3 | 2 | text | 300 |
Ideally there would be an additional column added that would number each group (easy to create multi-dimensional array.
So far I’ve been able to “group” by sender/receiver ids, order by created_at
, and limit the number per group. However, It’s not quite right. Here’s the query:
SELECT
filter.id, filter.sender_id, filter.receiver_id, filter.message, filter.created_at
FROM (
SELECT messages.*,
rank() OVER (
PARTITION BY sender_id
ORDER BY created_at DESC
)
FROM messages
WHERE messages.sender_id = 1 or messages.receiver_id = 1
) filter WHERE rank <= 50;
My result set looks like this:
| id | sender_id | receiver_id | message | created_at |
|------|-----------|-------------|---------|------------|
| 1 | 1 | 2 | text | 400 |
| 3 | 1 | 2 | text | 300 |
| 4 | 1 | 3 | text | 700 |
| 2 | 2 | 1 | text | 300 |
| 7 | 2 | 1 | text | 100 |
| 6 | 3 | 1 | text | 500 |
| 5 | 3 | 2 | text | 300 |
You can see that row 3 and 6 should be grouped but aren’t.
Advertisement
Answer
You can use rank()
. To limit the number of records per conversation (ie sender/receiver or receiver/sender tuple), you can use a partition like least(sender_id, receiver_id), greatest(sender_id, receiver_id)
:
select filter.id, filter.sender_id, filter.receiver_id, filter.message, filter.created_at
from (
select
t.*,
rank() over(
partition by least(sender_id, receiver_id), greatest(sender_id, receiver_id)
order by created_at desc
) rn
from mytable t
) t
where rn <= 50
order by least(sender_id, receiver_id), greatest(sender_id, receiver_id), rn