Skip to content
Advertisement

Strange behavior from lag(), skipping over certain rows

I have a table we’ll call service, toy version is:

The last five system fields are for database bookkeeping, and to enable soft (front end) deletion.

There’s another table service_log which, together with service allows us to view a full revision history for each record in service. Fields of service_log are the same as service, plus a few other fields that facilitate the revision history. On update or insert to service, the state of the data is recorded as an insert to service_log. Obviously service_log.service_id is not a primary key nor is it unique, since each service record may have multiple revisions.

Our database has security features preventing hard (database-level) deletion of records. However, while investigating a bug I wanted to rule out the possibility that there may have been records hard-deleted from service but with evidence still present in service_log. The following query looking for service_log records without service_id present in service returns ~800 records (about 0.5% of all services since 2020):

However, the anomaly goes away when I restructure the query to use GENERATE_SERIES(). And the “missing” rows from service are actually there when I query them individually.

All this leads me to believe that lag(service_id) OVER (ORDER BY service_id) is skipping some records in the table. Which makes me think maybe the index postgres created for the primary key is corrupt? Is this the likely culprit, and if so what is the best way to fix it? Is there potentially a different reason lag() is missing some primary keys?

Advertisement

Answer

What is reason for the WHERE statement and what is the service_date for the records returned? Your WHERE statement can easily create a gap.

Your requirement can also be more easily satisfied by a simple LEFT JOIN.

User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement