I have a table where I store rows with external ids. Quite often I need to select the latest timestamp for a given set of external ids, and this query has become a bottleneck in my app.
Query:
SELECT DISTINCT ON ("T1"."external_id") "T1"."external_id", "T1"."timestamp"
FROM "T1"
WHERE "T1"."external_id" IN ('825889935', '825904511')
ORDER BY "T1"."external_id" ASC, "T1"."timestamp" DESC
Explain:
Unique (cost=169123.13..169123.19 rows=12 width=18) (actual time=1327.443..1334.118 rows=2 loops=1)
  -> Sort (cost=169123.13..169123.16 rows=12 width=18) (actual time=1327.441..1334.112 rows=2 loops=1)
     Sort Key: external_id, timestamp DESC
     Sort Method: quicksort  Memory: 25kB
     -> Gather (cost=1000.00..169122.91 rows=12 width=18) (actual time=752.577..1334.056 rows=2 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        -> Parallel Seq Scan on T1 (cost=0.00..168121.71 rows=5 width=18) (actual time=921.649..1300.556 rows=1 loops=3)
           Filter: ((external_id)::text = ANY ('{825889935,825904511}'::text[]))
           Rows Removed by Filter: 1168882
Planning Time: 0.592 ms
Execution Time: 1334.159 ms
What could I do to make this query faster? Or should I use a completely different query?
UPDATE:
Added a new query plan as requested by @jahrl. This query looks faster, but the previous plan was captured under load; right now both queries run in similar time.
Finalize GroupAggregate (cost=169121.80..169123.21 rows=12 width=18) (actual time=321.009..322.410 rows=2 loops=1)
  Group Key: external_id
  -> Gather Merge (cost=169121.80..169123.04 rows=10 width=18) (actual time=321.003..322.403 rows=2 loops=1)
     Workers Planned: 2
     Workers Launched: 2
     -> Partial GroupAggregate (cost=168121.77..168121.86 rows=5 width=18) (actual time=318.671..318.672 rows=1 loops=3)
        Group Key: external_id
        -> Sort (cost=168121.77..168121.78 rows=5 width=18) (actual time=318.664..318.665 rows=1 loops=3)
           Sort Key: external_id
           Sort Method: quicksort  Memory: 25kB
           Worker 0: Sort Method: quicksort  Memory: 25kB
           Worker 1: Sort Method: quicksort  Memory: 25kB
           -> Parallel Seq Scan on T1 (cost=0.00..168121.71 rows=5 width=18) (actual time=144.338..318.611 rows=1 loops=3)
              Filter: ((external_id)::text = ANY ('{825889935,825904511}'::text[]))
              Rows Removed by Filter: 1170827
Planning Time: 0.093 ms
Execution Time: 322.441 ms
Answer
Perhaps a basic GROUP BY query will perform better?
SELECT "T1"."external_id", MAX("T1"."timestamp") AS "timestamp"
FROM "T1"
WHERE "T1"."external_id" IN ('825889935', '825904511')
GROUP BY "T1"."external_id"
ORDER BY "T1"."external_id" ASC
And, as @melcher said, don't forget an ("external_id", "timestamp") index!
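A minimal sketch of that index (the index name is made up; table and column names are taken from the question). A composite B-tree index on ("external_id", "timestamp") lets PostgreSQL find the latest timestamp per external_id with an index scan instead of the Parallel Seq Scan shown in the plans above:

```sql
-- DESC on "timestamp" matches the latest-first access pattern
-- (a plain ASC index also works, since B-tree indexes can be scanned backwards).
-- CONCURRENTLY avoids blocking writes while the index is built.
CREATE INDEX CONCURRENTLY "T1_external_id_timestamp_idx"
    ON "T1" ("external_id", "timestamp" DESC);
```

After creating the index (and running ANALYZE if statistics are stale), the plan should switch to an index scan, and with only those two columns referenced it may qualify for an index-only scan.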