I have the following query (Postgres) which gives the correct results, but it is very slow. Any ideas how I can make it faster?
SELECT DISTINCT(vesselimo), vesselname, (SELECT COUNT(portid) FROM schedules s2 WHERE s1.vesselimo = s2.vesselimo AND status = 'Deleted') AS noDeleted, (SELECT COUNT(portid) FROM schedules s2 WHERE s1.vesselimo = s2.vesselimo AND status = 'Added') AS noAdded FROM schedules s1 ORDER BY vesselname;
Result:
9303807 "ABIDJAN EXPRESS" "34" "19" 9732319 "AL MASHRAB" "28" "11" 9461867 "APL CHONGQING" "36" "9" 9597549 "APL MIAMI" "92" "70" 9314935 "AS CAROLINA" "75" "46" 9387425 "EMPIRE" "77" "68" 9757187 "MILANO BRIDGE" "16" "0" 9769271 "MOL TRIUMPH" "25" "8" 9337626 "NYK CONSTELLATION" "45" "27" 9388340 "ONE COSMOS" "67" "41" 9485007 "YM MASCULINITY" "46" "23"
Schema:
CREATE TABLE public.schedules ( portcallid bigint NOT NULL DEFAULT nextval('schedules_seq'::regclass), batchid bigint NOT NULL, vesselimo integer, vesselname text COLLATE pg_catalog."default", service text COLLATE pg_catalog."default", serviceseq integer, portid text COLLATE pg_catalog."default", portname text COLLATE pg_catalog."default", arrival timestamp with time zone, departure timestamp with time zone, status text COLLATE pg_catalog."default", last_update timestamp with time zone, CONSTRAINT schedules_pkey PRIMARY KEY (portcallid) )
Thanks, appreciate any advise.
Advertisement
Answer
I would recommend aggregation:
select s.vesselimo, s.vesselname, count(*) filter (where status = 'Deleted') as num_deleted, count(*) filter (where status = 'Added') as num_added from schedules s group by s.vesselimo, s.vesselname; order by vesselname;
Note that the parentheses after DISTINCT
around (vesselimo)
are quite misleading. SELECT DISTINCT
applies to the entire row, not just to some parenthesized expression.
If you are using an old version of Postgres, then it might not support filter
. You can phrase that as:
sum( (status = 'Deleted')::int ) as num_deleted, sum( (status = 'Added')::int ) as num_added