Skip to content
Advertisement

Select distinct very slow

I have a table where I store rows with external ids. Quite often I need to select latest timestamp for given external ids. Now it is a bottleneck for my app

Query:

SELECT DISTINCT ON ("T1"."external_id") "T1"."external_id", "T1"."timestamp" 
FROM "T1" 
WHERE "T1"."external_id" IN ('825889935', '825904511')
ORDER BY "T1"."external_id" ASC, "T1"."timestamp" DESC

Explain:

Unique  (cost=169123.13..169123.19 rows=12 width=18) (actual time=1327.443..1334.118 rows=2 loops=1)
   ->  Sort  (cost=169123.13..169123.16 rows=12 width=18) (actual time=1327.441..1334.112 rows=2 loops=1)
         Sort Key: external_id, timestamp DESC
         Sort Method: quicksort  Memory: 25kB
         ->  Gather  (cost=1000.00..169122.91 rows=12 width=18) (actual time=752.577..1334.056 rows=2 loops=1)
               Workers Planned: 2
               Workers Launched: 2
               ->  Parallel Seq Scan on T1  (cost=0.00..168121.71 rows=5 width=18) (actual time=921.649..1300.556 rows=1 loops=3)
                     Filter: ((external_id)::text = ANY ('{825889935,825904511}'::text[]))
                     Rows Removed by Filter: 1168882
 Planning Time: 0.592 ms
 Execution Time: 1334.159 ms

What could I do to make this query faster? Or probably should I use completely different query?

UPDATE:

Added new query plan as asked @jahrl. It looks like query is faster but previous query plan was made under the load and now it works similar time

Finalize GroupAggregate  (cost=169121.80..169123.21 rows=12 width=18) (actual time=321.009..322.410 rows=2 loops=1)
   Group Key: external_id
   ->  Gather Merge  (cost=169121.80..169123.04 rows=10 width=18) (actual time=321.003..322.403 rows=2 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial GroupAggregate  (cost=168121.77..168121.86 rows=5 width=18) (actual time=318.671..318.672 rows=1 loops=3)
               Group Key: external_id
               ->  Sort  (cost=168121.77..168121.78 rows=5 width=18) (actual time=318.664..318.665 rows=1 loops=3)
                     Sort Key: external_id
                     Sort Method: quicksort  Memory: 25kB
                     Worker 0:  Sort Method: quicksort  Memory: 25kB
                     Worker 1:  Sort Method: quicksort  Memory: 25kB
                     ->  Parallel Seq Scan on T1  (cost=0.00..168121.71 rows=5 width=18) (actual time=144.338..318.611 rows=1 loops=3)
                           Filter: ((external_id)::text = ANY ('{825889935,825904511}'::text[]))
                           Rows Removed by Filter: 1170827
 Planning Time: 0.093 ms
 Execution Time: 322.441 ms

Advertisement

Answer

Perhaps a basic GROUP BY query will perform better?

SELECT "T1"."external_id", MAX("T1"."timestamp") as "timestamp"
FROM "T1" 
WHERE "T1"."external_id" IN ('825889935', '825904511')
GROUP BY "T1"."external_id"
ORDER BY "T1"."external_id" ASC

And, as @melcher said, don’t forget an (“external_id”, “timestamp”) index!

User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement