Skip to content
Advertisement

ORDER BY DESC with LIMIT doesn’t return the newest rows, unless I ORDER BY twice

I am writing a chat app and want to get a list of recent contacts to show in the sidebar.

My table has the standard chat fields (sender, receiver, msg, date).

I’m currently getting a list of recent contacts by getting a list of messages with distinct (sender, receiver) pair:

select * from (
  select distinct on (sender, receiver) *
  from (
    select *
    from chats
    where sender = login or receiver = login
    order by date desc
  ) as Q1
  limit 30
) as Q2
order by date desc

The result returned is wrong as it doesn’t contain the newest messages.

I found that I could fix it by flipping the first order by to order by date asc. I only need to flip the first asc and not the second.

Does anyone know why it works?

My only guess is that DISTINCT ON is filtering from the bottom of the list rather than from the top.

Advertisement

Answer

This would achieve your objective:

SELECT *
FROM  (
   SELECT DISTINCT ON (sender, receiver) *
   FROM   chats
   WHERE  login IN (sender, receiver)
   ORDER  BY sender, receiver, date DESC  -- plus tiebreaker column?
   ) sub
ORDER  BY date DESC, sender, receiver
LIMIT  30

See:

This picks the latest row per (sender, receiver) in the subquery sub. The sort order has to agree with DISTINCT ON, so the result is sorted by sender, receiver first and the latest rows overall are not necessarily on top.
Hence we need a 2nd ORDER BY in the outer SELECT. To get deterministic results in case of multiple pairs having the same latest date, append sender, receiver as tiebreaker (which are guaranteed to be unique at this stage of the query).

Remaining possible problems:

  • If login can be NULL, the query breaks.

  • If there can be multiple rows with the same latest date per (sender, receiver), an arbitrary row from this set is selected. Add another column to the inner ORDER BY as deterministic tiebreaker. Like the PK.

  • If date can be NULL, the descending sort order breaks. Fix with NULLS LAST in both instances. See:

Depending on undisclosed data distribution, there may be (much) faster query styles. See:

User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement