PostgreSQL query that returns metrics by joining different tables needs to be more efficient and faster

Question

I have a query that can be seen on this fiddle (DB Fiddle), and I'm relatively new to PSQL, with a few months of experience. The link to the third-party site is there to show the result, which would otherwise be difficult to visualize here. This query collects and returns some metrics, joining 3 tables about candi…

Accepted Answer

First, use LEAD to eliminate an extra join. A CTE that precomputes each candidate's latest status (the row with the max id) can probably also help.

    WITH sa AS (
        SELECT candidate_id, type
        FROM (
            SELECT candidate_id, type,
                   row_number() OVER (PARTITION BY candidate_id ORDER BY id DESC) rn
            FROM public.statuses
        ) t
        WHERE rn = 1 AND type != 'ANONYMISED'
    ),
    statuses_flow AS (
        SELECT
            c.id,
            c.study_id,
            c.site_id,
            s.type AS status_type,
            s.timestamp AS status_from,
            sa.type,
            lead(s.timestamp) OVER (PARTITION BY c.id ORDER BY s.candidate_id, s.timestamp) AS statusTo
        FROM
            public.candidates c
            JOIN public.statuses s ON s.candidate_id = c.id
            JOIN sa ON sa.candidate_id = c.id
        WHERE
            c.study_id IN ('INIT1')
            AND c.site_id IN ('Test1')
            AND sa.type != 'ANONYMISED'
    )
    SELECT
        statuses_flow.id,
        statuses_flow.study_id AS "studyId",
        statuses_flow.site_id AS "siteId",
        statuses_flow.status_type AS "statusType",
        statuses_flow.status_from AS "statusFrom",
        statuses_flow.statusTo,
        CASE WHEN statuses_flow.statusTo IS NULL THEN
            NULL
        ELSE
            (
                SELECT created_at AS first_contact
                FROM public.activities
                WHERE candidate_id = statuses_flow.id
                    AND type IN ('PHONE', 'SMS', 'EMAIL')
                    AND created_at BETWEEN statuses_flow.status_from AND statuses_flow.statusTo
                ORDER BY created_at
                FETCH FIRST 1 ROWS ONLY
            )
        END AS "first_contact"
    FROM
        statuses_flow
    WHERE
        statuses_flow.status_type IN ('PENDING_SITE', 'PENDING_CALLCENTER', 'INCOMPLETE', 'REJECTED_CALLCENTER', 'REJECTED_SITE', 'CONSENTED')
    ORDER BY
        statuses_flow.id,
        statuses_flow.status_from

db<>fiddle, comparing the results.

id	studyId	siteId	statusType	statusFrom	statusTo	first_contact
1	Study1	Site1	INCOMPLETE	2021-07-20 09:30:52.101055+00	2021-07-20 09:31:53.568346+00	NULL
1	Study1	Site1	PENDING_CALLCENTER	2021-07-20 09:31:53.568346+00	2021-07-20 09:35:34.171876+00	2021-07-20 09:31:55.849+00
1	Study1	Site1	PENDING_SITE	2021-07-20 09:35:34.171876+00	2021-07-20 09:52:42.185163+00	2021-07-20 09:35:56.642+00
1	Study1	Site1	REJECTED_SITE	2021-07-20 09:53:08.874271+00	NULL	NULL
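
To see the two ideas in isolation, here is a minimal sketch that runs on its own in PostgreSQL. The inline VALUES rows are made-up sample data standing in for public.statuses (they are not taken from the fiddle), and the names latest and status_to are only illustrative. It shows row_number() picking each candidate's newest status and LEAD supplying the end of each status interval without a self-join.

```sql
-- Hypothetical sample rows standing in for public.statuses (not from the fiddle).
WITH statuses (id, candidate_id, type, "timestamp") AS (
    VALUES
        (1, 1, 'INCOMPLETE',         TIMESTAMPTZ '2021-07-20 09:30:00+00'),
        (2, 1, 'PENDING_CALLCENTER', TIMESTAMPTZ '2021-07-20 09:31:00+00'),
        (3, 1, 'PENDING_SITE',       TIMESTAMPTZ '2021-07-20 09:35:00+00'),
        (4, 2, 'INCOMPLETE',         TIMESTAMPTZ '2021-07-20 10:00:00+00'),
        (5, 2, 'ANONYMISED',         TIMESTAMPTZ '2021-07-20 10:05:00+00')
),
latest AS (
    -- rn = 1 marks the newest status row (highest id) per candidate.
    SELECT candidate_id, type,
           row_number() OVER (PARTITION BY candidate_id ORDER BY id DESC) AS rn
    FROM statuses
)
SELECT s.candidate_id,
       s.type        AS status_type,
       s."timestamp" AS status_from,
       -- LEAD reads the next row's timestamp within the same candidate,
       -- so no self-join is needed to find where a status ends.
       lead(s."timestamp") OVER (PARTITION BY s.candidate_id
                                 ORDER BY s."timestamp") AS status_to
FROM statuses s
JOIN latest l ON l.candidate_id = s.candidate_id
WHERE l.rn = 1
  AND l.type <> 'ANONYMISED'   -- drop candidates whose latest status is ANONYMISED
ORDER BY s.candidate_id, status_from;
```

With this sample data, candidate 2 is dropped entirely because its latest status is ANONYMISED, and candidate 1 comes back as three rows where status_to is the next row's status_from and the last row's status_to is NULL.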

Answer