I have a query:
SELECT study."id" FROM study JOIN report ON (report."studyId" = study."id") WHERE study.facts->'patientName'->>'value' = 'HELLO WORLD' OR report.variables->'patientName'->>'value' = 'HELLO WORLD'
All tables have indexes.
Why is this query taking 4.5s across 6000 rows? Explain analyze output below:
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | QUERY PLAN | |--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | Hash Join (cost=383.69..1403.67 rows=39 width=52) (actual time=2734.257..2734.260 rows=0 loops=1) | | Hash Cond: ((study.id)::text = (report."studyId")::text) | | Join Filter: ((((study.facts -> 'patientName'::text) ->> 'value'::text) = 'HELLO WORLD'::text) OR (((report.variables -> 'patientName'::text) ->> 'value'::text) = 'HELLO WORLD'::text)) | | Rows Removed by Join Filter: 7453 | | -> Seq Scan on study (cost=0.00..1000.23 rows=7523 width=70) (actual time=0.020..13.548 rows=7523 loops=1) | | -> Hash (cost=290.53..290.53 rows=7453 width=70) (actual time=5.052..5.053 rows=7453 loops=1) | | Buckets: 8192 Batches: 1 Memory Usage: 808kB | | -> Seq Scan on report (cost=0.00..290.53 rows=7453 width=70) (actual time=0.014..3.235 rows=7453 loops=1) | | Planning Time: 0.896 ms | | Execution Time: 2734.323 ms | +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
I have a UNION query that does the same thing, but is much quicker (0.001s). I want to know more about why my JOIN query is so much slower:
SELECT id::text FROM study WHERE study.facts->'patientName'->>'value' = 'HELLO WORLD' UNION SELECT report."studyId"::text FROM report WHERE report.variables->'patientName'->>'value' = 'HELLO WORLD';
+-------------------------------------------------------------------------------------------------------------------------------------------------+ | QUERY PLAN | |-------------------------------------------------------------------------------------------------------------------------------------------------| | HashAggregate (cost=143.12..143.51 rows=39 width=32) (actual time=0.040..0.041 rows=0 loops=1) | | Group Key: ((study.id)::text) | | -> Append (cost=4.58..143.02 rows=39 width=32) (actual time=0.038..0.039 rows=0 loops=1) | | -> Bitmap Heap Scan on study (cost=4.58..134.14 rows=38 width=32) (actual time=0.026..0.026 rows=0 loops=1) | | Recheck Cond: (((facts -> 'patientName'::text) ->> 'value'::text) = 'HELLO WORLD'::text) | | -> Bitmap Index Scan on "IDX_facts_patientName" (cost=0.00..4.57 rows=38 width=0) (actual time=0.023..0.023 rows=0 loops=1) | | Index Cond: (((facts -> 'patientName'::text) ->> 'value'::text) = 'HELLO WORLD'::text) | | -> Index Scan using "IDX_variables_patientName" on report (cost=0.28..8.30 rows=1 width=32) (actual time=0.012..0.012 rows=0 loops=1) | | Index Cond: (((variables -> 'patientName'::text) ->> 'value'::text) = 'HELLO WORLD'::text) | | Planning Time: 0.560 ms | | Execution Time: 0.103 ms | +-------------------------------------------------------------------------------------------------------------------------------------------------+
Advertisement
Answer
OR
is always difficult for the optimiser(s). Since you dont need any fields from the report
table, you could tuck that away into an EXISTS() subquery, this will probably result in a bitmap-index scan.:
SELECT s."id" FROM study s WHERE s.facts->'patientName'->>'value' = 'HELLO WORLD' OR EXISTS ( SELECT * FROM report r WHERE r."studyId" = s."id" AND r.variables->'patientName'->>'value' = 'HELLO WORLD' );