I have a dates
CTE in my below query where I am using limit
clause which I don’t want to use it. I am trying to understand on how to rewrite my dates
CTE so that I can avoid using limit 8
query.
WITH dates AS ( SELECT (date_trunc('week', getdate() + INTERVAL '1 day')::date - 7 * (row_number() over (order by true) - 1) - INTERVAL '1 day')::date AS week_column FROM dimensions.customer LIMIT 8 ) SELECT dates.week_column, 'W' || ceiling(date_part('week', dates.week_column + INTERVAL '1 day')) AS week_number, COUNT(DISTINCT features.client_id) AS total FROM dimensions.program features JOIN dates ON features.last_update <= dates.week_column WHERE features.type = 'capacity' AND features.status = 'CURRENT' GROUP BY dates.week_column ORDER by dates.week_column DESC
Below is the output I get from my inner dates
CTE query:
SELECT (date_trunc('week', getdate() + INTERVAL '1 day')::date - 7 * (row_number() over (order by true) - 1) - INTERVAL '1 day')::date AS week_column FROM dimensions.customer LIMIT 8
Output from dates
CTE :
2021-01-10 2021-01-03 2020-12-27 2020-12-20 2020-12-13 2020-12-06 2020-11-29 2020-11-22
Is there any way to avoid using limit 8
in my CTE query and still get same output? Our platform doesn’t allow us to run queries if it has limit
clause in it so trying to see if I can rewrite it differently in sql redshift?
If I modify my dates
CTE query like this, then it gives me error as window function is not allowed in where clause
.
WITH dates AS ( SELECT (date_trunc('week', getdate() + INTERVAL '1 day')::date - 7 * (row_number() over (order by true) - 1) - INTERVAL '1 day')::date AS week_column, ROW_NUMBER() OVER () as seqnum FROM dimensions.customer WHERE seqnum <= 8; ) ....
Update
Something like this you mean?
WITH dates AS ( SELECT (date_trunc('week', getdate() + INTERVAL '1 day')::date - 7 * (row_number() over (order by true) - 1) - INTERVAL '1 day')::date AS week_column, ROW_NUMBER() OVER () as seqnum FROM dimensions.customer ) SELECT dates.week_column, 'W' || ceiling(date_part('week', dates.week_column + INTERVAL '1 day')) AS week_number, COUNT(DISTINCT features.client_id) AS total FROM dimensions.program features JOIN dates ON features.last_update <= dates.week_column WHERE dates.seqnum <= 8 AND features.type = 'capacity' AND features.status = 'CURRENT' GROUP BY dates.week_column ORDER by dates.week_column DESC
Advertisement
Answer
Just move your WHERE clause to the outer SELECT. Seqnum doesn’t exists until the CTE runs but does exist when the result of the CTE is consumed.
UPDATE …
After moving the where clause AndyP got a correlated subquery error coming from a WHERE clause not included in the posted query. As shown in this somewhat modified query:
WITH dates AS ( SELECT (DATE_TRUNC('week',getdate () +INTERVAL '1 day')::DATE- 7*(ROW_NUMBER() OVER (ORDER BY TRUE) - 1) -INTERVAL '1 day')::DATE AS week_of FROM (SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X) ) SELECT dates.week_of, 'W' || CEILING(DATE_PART('week',dates.week_of +INTERVAL '1 day')) AS week_number, COUNT(DISTINCT features.id) AS total FROM dimensions.program features JOIN dates ON features.last_update <= dates.week_of WHERE features.version = (SELECT MAX(version) FROM headers f2 WHERE features.id = f2.id AND features.type = f2.type AND f2.last_update <= dates.week_of) AND features.type = 'type' AND features.status = 'live' GROUP BY dates.week_of ORDER BY dates.week_of DESC;
This was an interesting replacement of a correlated query with a join due to the inequality in the correlated sub query. We thought others might be helped by posting the final solution. This works:
WITH dates AS ( SELECT (DATE_TRUNC('week',getdate () +INTERVAL '1 day')::DATE- 7*(ROW_NUMBER() OVER (ORDER BY TRUE) - 1) -INTERVAL '1 day')::DATE AS week_of FROM (SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X) ) SELECT dates.week_of, 'W' || CEILING(DATE_PART('week',dates.week_of +INTERVAL '1 day')) AS week_number, COUNT(DISTINCT features.carrier_id) AS total FROM dimensions.program features JOIN dates ON features.last_update <= dates.week_of JOIN (SELECT MAX(MAX(version)) OVER(Partition by id, type Order by dates.weeks_of rows unbounded preceding) AS feature_version, f2.id, f2.type, dates.week_of FROM dimensions.headers f2 JOIN dates ON f2.last_update <= dates.week_of GROUP BY f2.id, f2.type, dates.week_of) f2 ON features.id = f2.id AND features.type = f2.type AND f2.week_of = dates.week_of AND features.version = f2.version WHERE features.type = 'type' AND features.status = 'live' GROUP BY dates.week_of ORDER BY dates.week_of DESC;
Needing to make a data segment that had all the possible Max(version) for all possible week_of values was the key. Hopefully having both of these queries posted will help other fix correlated subquery errors.