
PostgreSQL limit by variable value in a CTE

I am trying to fetch results from the database using two SQL SELECT queries combined in a CTE. Since the table has millions of rows I need to limit the results, capping the combined output at 20 rows. This is what I tried:

with fetch_new_events as (
    select * from events e
    where id > @las_max_id_processed
    order by e.id
    limit 20
),
fetch_old_events as (
    select * from events e
    where id < @las_min_id_processed
    order by e.id desc
    limit 20 - count(fetch_new_events.id)
)
select * from fetch_new_events
union
select * from fetch_old_events

Aggregate functions don’t seem to be allowed in LIMIT. Moreover, even if they were, I would have a problem whenever 20 - count() returned a negative number: the database engine would throw an error, so I would have to clamp it with something like greatest(0, 20 - count()). Could somebody help with this?
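As an aside, the negative-limit worry is easy to clamp: PostgreSQL has greatest(), and the same guard can live on the client instead. A minimal Python sketch of the clamp (the helper name is mine, not part of the question):

```python
def remaining_quota(batch_size, already_fetched):
    """Clamp the LIMIT for the second query so it can never go negative.

    Mirrors greatest(0, batch_size - count(...)) in PostgreSQL.
    """
    return max(0, batch_size - already_fetched)

print(remaining_quota(20, 5))   # 15 rows of quota left after 5 new events
print(remaining_quota(20, 25))  # more new events than the batch size: 0
```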

You may argue I could use a single SELECT query. I don’t think I can, because one of the queries fetches the new events that were added after I started processing. To keep the client (consumer) simple, it only tracks two integers: the maximum and minimum id processed. That means every batch of results must have contiguous ids, with no gaps or missing rows. For that, the first query must order by id in ascending order, and the second query, which fetches the earliest unprocessed events, must order by id in descending order.

Update

This is the sample data:

create table events
(
    id serial not null
        constraint events_pk
            primary key,
    name varchar
);
insert into public.events (id, name) values (7, 'event7');
insert into public.events (id, name) values (6, 'event6');
insert into public.events (id, name) values (5, 'event5');
insert into public.events (id, name) values (4, 'event4');
insert into public.events (id, name) values (3, 'event3');
insert into public.events (id, name) values (2, 'event2');
insert into public.events (id, name) values (1, 'event1');


I assume the following starting point: I started when there were only five rows, and the first time the client fetched rows with:

select * from events e
    order by e.id desc
    limit 3

he got the events with ids 5, 4 and 3. While he was processing, a producer pushed more events; now the client wants to get 3 events whose id is either > 5 or < 3, so he does:

with fetch_new_events as (
    select * from events e
    where id > 5
    order by e.id
    limit 3
),
fetch_old_events as (
    select * from events e
    where id < 3
    order by e.id desc
    limit 3 - count(fetch_new_events.id)
)
select * from fetch_new_events
union
select * from fetch_old_events;

which fails.

Problem aside

For the curious, to understand why a gap can appear with a single query, add a few more entries:

insert into public.events (id, name) values (8, 'event8');
insert into public.events (id, name) values (9, 'event9');
insert into public.events (id, name) values (10, 'event10');

Then query with one single select statement:

select * from events e
where e.id > 5 or e.id < 3
order by e.id desc
limit 3

We now get the events with ids 10, 9 and 8.

What happened? Well, the client has now processed the following ranges:

  • Events with ids 5 to 3.
  • Events with ids 10 to 8.

So a very simple client (as I want it to be), which keeps track of only two integers, the maximum and minimum id processed, would believe it has processed every id from 10 down to 3, which is not the case: it has missed events 7 and 6. To overcome this I would either have to build a more complex client that remembers ranges instead of two simple integers, or do what I described above.
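The gap can be reproduced end to end. The sketch below uses Python’s built-in sqlite3 instead of PostgreSQL purely so it is self-contained; the single-query behaviour is the same:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table events (id integer primary key, name text)")
# The sample data: events 1..7, plus the three late arrivals 8..10.
conn.executemany(
    "insert into events (id, name) values (?, ?)",
    [(i, f"event{i}") for i in range(1, 11)],
)

# Single-query approach: one LIMIT spanning both halves of the predicate.
rows = conn.execute(
    "select id from events where id > 5 or id < 3 order by id desc limit 3"
).fetchall()
ids = [r[0] for r in rows]
print(ids)  # [10, 9, 8] -- events 7 and 6 never show up in this batch
```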

Update2

Somebody known as jer_s on the PostgreSQL Slack channel gave me the solution: I just have to add another CTE that contains only the row count, then use that value in the limit:

with new_events as (
    select * from events e
    where e.id > 5
    order by e.id
    limit 3
),
new_row_count as (
    select count(*) as row_count from new_events
),
old_events as (
    select * from events e
    where e.id < 3
    order by e.id desc
    limit 3 - (select row_count from new_row_count)
)
select * from new_events
union
select * from old_events;

This returns the events with ids 7, 6 and 2, so the processed events stay contiguous, from 7 down to 2!
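The same arithmetic can also be done on the client by running the count first and parameterizing the second LIMIT; the max(0, …) clamp then also removes the negative-limit risk. A sketch (again against sqlite3 only so it runs standalone):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table events (id integer primary key, name text)")
conn.executemany(
    "insert into events (id, name) values (?, ?)",
    [(i, f"event{i}") for i in range(1, 8)],  # the original seven rows
)

batch = 3
# First query: newest unprocessed events, ascending so ids stay contiguous.
new_events = conn.execute(
    "select id from events where id > ? order by id limit ?", (5, batch)
).fetchall()
# Clamp so the second LIMIT can never go negative.
remaining = max(0, batch - len(new_events))
# Second query: earliest unprocessed events, descending.
old_events = conn.execute(
    "select id from events where id < ? order by id desc limit ?",
    (3, remaining),
).fetchall()

ids = [r[0] for r in new_events + old_events]
print(ids)  # [6, 7, 2] -- contiguous with the already-processed 3..5
```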


Answer

I really don’t follow what you are trying to do and why a simpler version does not work.

However, I can see the superficial problem you are having with this code:

limit 20-count(fetch_new_events.id)

The simple solution is to use window functions:

with fetch_new_events as (
      select *
      from events e
      where id > @las_max_id_processed 
      order by e.id
      limit 20
     ), 
     fetch_old_events as (
      select e.*
      from (select e.*, row_number() over (order by e.id desc) as seqnum
            from events e
            where id < @las_min_id_processed 
           ) e
      where seqnum <= 20 - (select count(*) from fetch_new_events)
     )
select . . .   -- list the columns you want here
from fetch_new_events
union
select . . .   -- list the columns you want here
from fetch_old_events;

Note that you also need to list the columns explicitly, so that the extra seqnum column does not leak into the union.
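For a quick sanity check, here is the same shape run through Python’s sqlite3 (SQLite ≥ 3.25 also supports row_number()), with the `. . .` placeholders filled in with explicit columns:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table events (id integer primary key, name text)")
conn.executemany(
    "insert into events (id, name) values (?, ?)",
    [(i, f"event{i}") for i in range(1, 8)],
)

rows = conn.execute("""
    with fetch_new_events as (
        select id, name from events
        where id > 5
        order by id
        limit 3
    ),
    fetch_old_events as (
        select id, name
        from (select e.id, e.name,
                     row_number() over (order by e.id desc) as seqnum
              from events e
              where e.id < 3) numbered
        where seqnum <= 3 - (select count(*) from fetch_new_events)
    )
    select id, name from fetch_new_events
    union
    select id, name from fetch_old_events
""").fetchall()
print(sorted(r[0] for r in rows))  # [2, 6, 7]
```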

User contributions licensed under: CC BY-SA