I have a table we’ll call
service, toy version is:
CREATE TABLE service ( service_id SERIAL PRIMARY KEY, client_id INTEGER NOT NULL REFERENCES client (client_id), service_date TIMESTAMP(0) NOT NULL CHECK (service_date::DATE <= CURRENT_DATE), -- system fields added_by INTEGER NOT NULL REFERENCES staff (staff_id), added_at TIMESTAMP(0) NOT NULL DEFAULT CURRENT_TIMESTAMP, changed_by INTEGER NOT NULL REFERENCES staff (staff_id), changed_at TIMESTAMP(0) NOT NULL DEFAULT CURRENT_TIMESTAMP, is_deleted BOOLEAN NOT NULL DEFAULT FALSE )
The last five system fields are for database bookkeeping, and to enable soft (front end) deletion.
There’s another table
service_log which, together with
service allows us to view a full revision history for each record in
service. Fields of
service_log are the same as
service, plus a few other fields that facilitate the revision history. On update or insert to
service, the state of the data is recorded as an insert to
service_log.service_id is not a primary key nor is it unique, since each service record may have multiple revisions.
Our database has security features preventing hard (database-level) deletion of records. However, while investigating a bug I wanted to rule out the possibility that there may have been records hard-deleted from
service but with evidence still present in
service_log. The following query looking for
service_log records without
service_id present in
service returns ~800 records (about 0.5% of all services since 2020):
WITH services_in_context AS ( SELECT service_id, lag(service_id) OVER (ORDER BY service_id) AS prev_id FROM service WHERE service_date::DATE >= '1/1/2020'::DATE ) SELECT DISTINCT log.service_id FROM service_log log INNER JOIN services_in_context s ON log.service_id BETWEEN s.prev_id + 1 AND s.service_id - 1 ORDER BY log.service_id DESC;
However, the anomaly goes away when I restructure the query to use
GENERATE_SERIES(). And the “missing” rows from
service are actually there when I query them individually.
All this leads me to believe that
lag(service_id) OVER (ORDER BY service_id) is skipping some records in the table. Which makes me think maybe the index postgres created for the primary key is corrupt? Is this the likely culprit, and if so what is the best way to fix it? Is there potentially a different reason
lag() is missing some primary keys?
What is reason for the
WHERE statement and what is the
service_date for the records returned? Your
WHERE statement can easily create a gap.
WITH services_in_context AS ( SELECT service_id, lag(service_id) OVER (ORDER BY service_id) AS prev_id FROM service ) SELECT DISTINCT log.service_id FROM service_log log INNER JOIN services_in_context s ON log.service_id BETWEEN s.prev_id + 1 AND s.service_id - 1 ORDER BY log.service_id DESC;
Your requirement can also be more easily satisfied by a simple
SELECT sl.* FROM service_log sl LEFT JOIN service s ON s.service_id = sl.service_id WHERE s.service_id IS NULL