Skip to content
Advertisement

PostgreSQL query which return metrics joining different tables need to be more efficient and fast

I have a query that can be seen on this fiddle and I’m relatively new to PSQL, having a few months of experience.

DB Fiddle

The link to the third party is to show the result of it otherwise would be difficult to visualize it here.

This query collects and returns some metrics joining 3 tables about candidates being in a specific status during a medical study process.

The query in the real life is very slow and needs some adjustments to be written better.

The indexes are already applied on the original DB but still would be useful to find a better way.

The section of the query which I’m not convinced at this point is lines 29 to 42 from the DB fiddle link above.

I was wondering if by any chance could be changed to use a left join or another method to make a better performance.

It is returning per every candidate something as for example for one candidate the table below

id studyId siteId statusType statusFrom statusTo first_contact
1 Study1 Site1 INCOMPLETE 2021-07-20 09:30:52.101055+00 2021-07-20 09:31:53.568346+00 NULL
1 Study1 Site1 PENDING_CALLCENTER 2021-07-20 09:31:53.568346+00 2021-07-20 09:35:34.171876+00 2021-07-20 09:31:55.849+00
1 Study1 Site1 PENDING_SITE 2021-07-20 09:35:34.171876+00 2021-07-20 09:52:42.185163+00 2021-07-20 09:35:56.642+00
1 Study1 Site1 REJECTED_SITE 2021-07-20 09:53:08.874271+00 NULL NULL

To explain, the same candidate, in a range of time can be in different statuses and for example, Candidate 1 (for simplicity C1) in the table above, was in 4 statuses.

C1 stayed in incomplete status from/to and on this status period was not contacted so that why we have null on last column first_contact.

C1 was contacted in the next 2 statuses and we determine the first_contact checking from the activities table the minimum activity timestamps of the types PHONE, EMAIL, and SMS in the range of time C1 was in that specific status from/to. The first_contact is the first time a candidate was contacted by EMAIL PHONE or SMS.

On the last status, we have statusTo = null so also firstContact is null.

The result I’m getting now is correct but seems not efficient to me and probably can get better but have no confidence in the way to manage the change in this situation.

Advertisement

Answer

First using LEAD to eliminate an extra join. Plus a cte to precompute max id can probably also help

db<>fiddle , with comparing the results.

User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement