Skip to content
Advertisement

PostgreSQL limit by variable value in a CTE

I am trying to fetch some results from the database using two SQL select queries, I am using CTE, but since I have millions of rows I need to limit the results, I want to limit results to 20, what I tried is:

Aggregate functions don’t seem to work on limit, moreover, even if they do, I would have a problem if 20-count() returns a negative number, the Database engine will throw an exception so I have to use max(0,20-count()) or something like that. Could somebody help with this?

You may argue I could use a single select query, I think I can’t because one query fetched the new events that were added since I started processing them, so to avoid overcomplicating the client(consumer), the consumer only keeps track of two integers the max and min id processed, that means the results I get every time should have contiguous Ids so there is no gap and missing rows, for that matter I need to make the first query to order by id ascending order, and for the second query since I am interested in the earliest events, I need to order it by id descending order.

Update

This is the sample data:

I assume the following starting point, I started when there was only five rows, the first time the client fetched the rows with:

he got events with id 5,3. While processing a producer pushed more events, now the client wants to get 3 events that have an id either > 5 or < 4, so he does:

which fails.

Problem aside

For the curious, to understand why the gap may happen if we use a single query, add few more entries:

Then query with one single select statement:

We will get now, events with id 10,9,8.

what happens? well, the client would have processed now the following ranges:

  • Events with id from 5 to 3.
  • Events with id from 10 to 8

so if we have a client very simple(as I want it to be), since it keeps track of only two integers, the max id and the min id processed, it would think he processed all ids from 10 to 3, which is not the case! we would have missed events with id 7 and 6. To overcome this either I have to make a complex client remembering ranges rather than two simple integers, or use what I said.

Update2

Somebody known as jer_s in the postgresql slack channel gave me the solution, I just have to add another CTE query that would only contain the row count, then use that value for the limit:

This would return events with id 7,6,2 so the events being processed would be contiguous, from 7 to 2!

Advertisement

Answer

I really don’t follow what you are trying to do and why simpler version do not work.

However, I can see the superficial problem you are having with this code:

The simple solution is to use window functions:

Note that you need to also list all the columns explicitly to handle the seqnum column.

User contributions licensed under: CC BY-SA
5 People found this is helpful
Advertisement