Let’s say you’re making a site like Reddit and you want infinitely scrolling posts (25 at a time, with 25 more loaded when you reach the bottom).
The naive solution uses LIMIT and OFFSET, but those are undesirable both for performance reasons and because, if a new post is added to the page you’re on, the next page will contain a duplicate.
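For concreteness, here’s a sketch of that naive approach, assuming a hypothetical posts table sorted by a score column:

```sql
-- Page 1
SELECT id, title, score
FROM posts
ORDER BY score DESC
LIMIT 25 OFFSET 0;

-- Page 2: the database still has to scan and discard the first 25 rows,
-- and if a new post was inserted meanwhile, every row shifts down by one,
-- so the last post of page 1 reappears at the top of page 2.
SELECT id, title, score
FROM posts
ORDER BY score DESC
LIMIT 25 OFFSET 25;
```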
Solutions instead recommend using a WHERE clause and sorting by some discrete, unique value. When loading more pages, you pass along the last value you saw, and the database knows where to resume.
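A sketch of that approach (often called keyset or seek pagination), assuming posts are sorted by a unique id and the client passes back the last id it received:

```sql
-- The index on id lets the database seek straight to the boundary
-- instead of scanning past all of the previous pages.
SELECT id, title
FROM posts
WHERE id < :last_seen_id   -- last id from the previous page
ORDER BY id DESC
LIMIT 25;
```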
I would use that, except in my situation those values are not unique: one of the sorting options for posts orders by a value that many posts share. So that approach wouldn’t work either, because the results could still contain duplicates.
One idea I had is to just return 1,000 or so post IDs on the initial page load. The client would then send the first 25 IDs to the server to retrieve those posts; when the user scrolls to the bottom, it sends the next 25 to get the post data for those, and so on.
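As a sketch of one batch fetch (PostgreSQL-flavored, and :batch_ids is a hypothetical array parameter holding the 25 IDs the client sent):

```sql
-- Fetch one batch of 25 posts by id, preserving the original ranking:
-- array_position orders rows by their position in the id list.
SELECT id, title, score
FROM posts
WHERE id = ANY(:batch_ids)
ORDER BY array_position(:batch_ids, id);
```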
The only issue is that this isn’t perfectly efficient either: if the user never scrolls, sending those 1,000 post IDs was wasted work.
Is there a proper solution to this? How does one efficiently handle pagination that eliminates duplicates when the sorting option involves non-unique values?
Answer
Just add a unique column as the last sort column. If the user wants to sort their books by “author, title”, change it to “author, title, book_id”.
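A sketch of the resulting query, assuming a books table and that the client passes back the values from the last row of the previous page. The row-value comparison shown works in PostgreSQL, MySQL, and recent SQLite; in other databases you’d expand it into nested AND/OR conditions:

```sql
-- Keyset pagination with book_id as the unique tiebreaker.
-- :last_author, :last_title, :last_id come from the final row
-- of the page the client already has.
SELECT book_id, author, title
FROM books
WHERE (author, title, book_id) > (:last_author, :last_title, :last_id)
ORDER BY author, title, book_id
LIMIT 25;
```

The expanded equivalent, for databases without row-value comparison, is: author > :last_author OR (author = :last_author AND (title > :last_title OR (title = :last_title AND book_id > :last_id))). Because book_id is unique, the full sort order is now a strict total order, so every row falls unambiguously before or after the boundary and no page can skip or repeat a row.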