I have this below query which takes about 15-20 secs to run.
with cte0 as (
SELECT
label,
date,
CASE
WHEN
Lead(label || date || "number") OVER (PARTITION BY label || date || "number" ORDER BY "label", "date", "number", "time") IS NULL
THEN
'1'::numeric
ELSE
'0'::numeric
END As "unique"
FROM table_data
LEFT JOIN table_mapper ON
table_mapper."type" = table_data."type"
WHERE Date BETWEEN date_trunc('month', current_date - 1) and current_date - 1
)
SELECT 'MTD' as "label", round(sum("unique") / count("unique") *100,1) as "value" FROM cte0 WHERE "date" BETWEEN date_trunc('month', current_date - 1) AND current_date -1
UNION ALL
SELECT 'Week' as "label", round(sum("unique") / count("unique") *100,1) as "value" FROM cte0 WHERE "date" BETWEEN date_trunc('week', current_date - 1) AND current_date -1
UNION ALL
SELECT 'FTD' as "label", round(sum("unique") / count("unique") *100,1) as "value" FROM cte0 WHERE "date" = current_date -1
In the table table_data I have a index on date column.
CREATE INDEX ix_cli_date ON table_data USING btree (date);
Table Definition (d table_data)
Table "public.table_data"
Column | Type | Modifiers
------------------+------------------------+-----------
date | date | not null
number | bigint | not null
time | time without time zone | not null
end time | time without time zone | not null
duration | integer | not null
time1 | integer | not null
time2 | integer | not null
time3 | integer | not null
time4 | integer | not null
time5 | integer | not null
time6 | integer | not null
time7 | integer | not null
type | text | not null
name | text | not null
id1 | integer | not null
id2 | integer | not null
key | integer | not null
status | text | not null
Indexes:
"ix_cli_date" btree (date)
Table Definition (d table_mapper)
Table "public.table_mapper" Column | Type | Modifiers ------------+------+----------- type | text | not null label | text | not null label2 | text | not null label3 | text | not null label4 | text | not null label5 | text | not null
EXPLAIN ANALYZE of the query
Result (cost=184342.66..230332.86 rows=3 width=64) (actual time=23377.923..25695.478 rows=3 loops=1)"
CTE cte0"
-> WindowAgg (cost=121516.06..156751.65 rows=612793 width=23) (actual time=14578.000..18985.958 rows=696157 loops=1)"
-> Sort (cost=121516.06..123048.04 rows=612793 width=23) (actual time=14577.975..17084.405 rows=696157 loops=1)"
Sort Key: (((table_mapper.label || (table_data.date)::text) || (table_data."number")::text)), table_mapper.label, table_data.date, table_data."number", table_data."time""
Sort Method: external merge Disk: 39480kB"
-> Hash Left Join (cost=11.96..37474.21 rows=612793 width=23) (actual time=1.449..3308.718 rows=696157 loops=1)"
Hash Cond: (table_data."type" = table_mapper."type")"
-> Index Scan using ix_cli_date on table_data (cost=0.02..29036.36 rows=612793 width=38) (actual time=0.141..946.648 rows=696157 loops=1)"
Index Cond: ((date >= date_trunc('month'::text, ((('now'::text)::date - 1))::timestamp with time zone)) AND (date Hash (cost=7.53..7.53 rows=353 width=25) (actual time=1.275..1.275 rows=336 loops=1)"
Buckets: 1024 Batches: 1 Memory Usage: 15kB"
-> Seq Scan on table_mapper (cost=0.00..7.53 rows=353 width=25) (actual time=0.020..0.589 rows=336 loops=1)"
-> Append (cost=27591.00..73581.21 rows=3 width=64) (actual time=23377.920..25695.467 rows=3 loops=1)"
-> Aggregate (cost=27591.00..27591.02 rows=1 width=32) (actual time=23377.917..23377.918 rows=1 loops=1)"
-> CTE Scan on cte0 (cost=0.00..27575.68 rows=3064 width=32) (actual time=14578.052..22335.236 rows=696157 loops=1)"
Filter: ((date = date_trunc('month'::text, ((('now'::text)::date - 1))::timestamp with time zone)))"
-> Aggregate (cost=27591.00..27591.02 rows=1 width=32) (actual time=1741.509..1741.510 rows=1 loops=1)"
-> CTE Scan on cte0 (cost=0.00..27575.68 rows=3064 width=32) (actual time=20.009..1522.352 rows=168261 loops=1)"
Filter: ((date = date_trunc('week'::text, ((('now'::text)::date - 1))::timestamp with time zone)))"
-> Aggregate (cost=18399.11..18399.13 rows=1 width=32) (actual time=576.029..576.030 rows=1 loops=1)"
-> CTE Scan on cte0 (cost=0.00..18383.79 rows=3064 width=32) (actual time=9.308..546.735 rows=23486 loops=1)"
Filter: (date = (('now'::text)::date - 1))"
Total runtime: 25710.506 ms"
Description :
I’m taking the unique count and repeated count from the table_data and this where LEAD helped me out where I give the value 0 for the last repeated value of a column.
Suppose I have 3 x in a column. I give 1 value to the first 2 x and the 3rd x is given 0.
Actually through a cte I’m taking the entire rows from the table table_data and doing some calculation using the lead and concatinating the strings for a defined date range where each row 1 and 0 value is defined as per the criteria.
If the lead is null it’ll be counted as 1 and if it is not null then 0.
And the I return 3 rows MTD, Current Week and FTD respectively with a calculation on taking the sum() I got from the lead and the count(*) entire rows.
For MTD I have the sum and count for the current month.
For Week – It’s the current week and FTD is for yesterday.
Advertisement
Answer
WITH cte AS (
SELECT d.thedate
, lead(m.label) OVER (PARTITION BY m.label, d.thedate, d.number
ORDER BY d.thetime) AS leader
FROM table_data d
LEFT JOIN table_mapper m USING (type)
WHERE thedate BETWEEN date_trunc('month', current_date - 1)
AND current_date - 1
)
SELECT 'MTD' AS label, round(count(leader)::numeric / count(*) * 100, 1) AS val
FROM cte
UNION ALL
SELECT 'Week', round(count(leader)::numeric / count(*) * 100, 1)
FROM cte
WHERE thedate BETWEEN date_trunc('week', current_date - 1) AND current_date - 1
UNION ALL
SELECT 'FTD', round(count(leader)::numeric / count(*) * 100, 1)
FROM cte
WHERE thedate = current_date - 1;
The CTE makes sense for big tables, so you only scan it once. For smaller tables it may be faster without …
Using thedate instead of reserved word date (in standard SQL).
thetime, uni instead of time, unique. Etc.
Simplified the lead() call. You get a value or NULL for the leading row. That seems the be the only relevant information.
It’s a pointless waste to repeat columns from the PARTITION clause in the ORDER BY clause of a window function.
Building on that, count(leader) / count(*) instead of sum(uni) / count(uni) is a bit faster. count(column) only counts non-null values, while count(*) counts all rows.
The condition for the first term of the UNION query was redundant.
More advice and links about data definition in the comments to the question.
Table design / Indexes
You should have primary keys. I suggest serial or IDENTITY column as surrogate PK for table_data:
ALTER TABLE table_data ADD COLUMN table_data_id serial PRIMARY KEY;
See:
Make type the primary key of table_mapper (also needed for the following FK constraint):
ALTER TABLE table_mapper ADD CONSTRAINT table_mapper_pkey (type);
Add a foreign key constraint for type to enforce referential integrity. Something like:
ALTER TABLE table_data ADD CONSTRAINT table_data_type_fkey FOREIGN KEY (type) REFERENCES table_mapper (type) ON UPDATE CASCADE ON DELETE NO ACTION;
For ultimate read performance (at some cost for writes), add a multi-column index to possibly allow index-only scans for above query:
CREATE INDEX table_data_foo_idx ON table_data (thedate, number, thetime);