I have a dates
CTE in my below query where I am using limit
clause which I don’t want to use it. I am trying to understand on how to rewrite my dates
CTE so that I can avoid using limit 8
query.
WITH dates AS (
SELECT (date_trunc('week', getdate() + INTERVAL '1 day')::date - 7 * (row_number() over (order by true) - 1) - INTERVAL '1 day')::date AS week_column
FROM dimensions.customer LIMIT 8
)
SELECT
dates.week_column,
'W' || ceiling(date_part('week', dates.week_column + INTERVAL '1 day')) AS week_number,
COUNT(DISTINCT features.client_id) AS total
FROM dimensions.program features
JOIN dates ON features.last_update <= dates.week_column
WHERE features.type = 'capacity'
AND features.status = 'CURRENT'
GROUP BY dates.week_column
ORDER by dates.week_column DESC
Below is the output I get from my inner dates
CTE query:
SELECT (date_trunc('week', getdate() + INTERVAL '1 day')::date - 7 * (row_number() over (order by true) - 1) - INTERVAL '1 day')::date AS week_column
FROM dimensions.customer LIMIT 8
Output from dates
CTE :
2021-01-10
2021-01-03
2020-12-27
2020-12-20
2020-12-13
2020-12-06
2020-11-29
2020-11-22
Is there any way to avoid using limit 8
in my CTE query and still get same output? Our platform doesn’t allow us to run queries if it has limit
clause in it so trying to see if I can rewrite it differently in sql redshift?
If I modify my dates
CTE query like this, then it gives me error as window function is not allowed in where clause
.
WITH dates AS (
SELECT (date_trunc('week', getdate() + INTERVAL '1 day')::date - 7 * (row_number() over (order by true) - 1) - INTERVAL '1 day')::date AS week_column,
ROW_NUMBER() OVER () as seqnum
FROM dimensions.customer
WHERE seqnum <= 8;
)
.
Update
Something like this you mean?
WITH dates AS (
SELECT (date_trunc('week', getdate() + INTERVAL '1 day')::date - 7 * (row_number() over (order by true) - 1) - INTERVAL '1 day')::date AS week_column,
ROW_NUMBER() OVER () as seqnum
FROM dimensions.customer
)
SELECT
dates.week_column,
'W' || ceiling(date_part('week', dates.week_column + INTERVAL '1 day')) AS week_number,
COUNT(DISTINCT features.client_id) AS total
FROM dimensions.program features
JOIN dates ON features.last_update <= dates.week_column
WHERE dates.seqnum <= 8
AND features.type = 'capacity'
AND features.status = 'CURRENT'
GROUP BY dates.week_column
ORDER by dates.week_column DESC
Advertisement
Answer
Just move your WHERE clause to the outer SELECT. Seqnum doesn’t exists until the CTE runs but does exist when the result of the CTE is consumed.
UPDATE …
After moving the where clause AndyP got a correlated subquery error coming from a WHERE clause not included in the posted query. As shown in this somewhat modified query:
WITH dates AS
(
SELECT (DATE_TRUNC('week',getdate () +INTERVAL '1 day')::DATE- 7*(ROW_NUMBER() OVER (ORDER BY TRUE) - 1) -INTERVAL '1 day')::DATE AS week_of
FROM (SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X)
)
SELECT dates.week_of,
'W' || CEILING(DATE_PART('week',dates.week_of +INTERVAL '1 day')) AS week_number,
COUNT(DISTINCT features.id) AS total
FROM dimensions.program features
JOIN dates ON features.last_update <= dates.week_of
WHERE features.version = (SELECT MAX(version)
FROM headers f2
WHERE features.id = f2.id
AND features.type = f2.type
AND f2.last_update <= dates.week_of)
AND features.type = 'type'
AND features.status = 'live'
GROUP BY dates.week_of
ORDER BY dates.week_of DESC;
This was an interesting replacement of a correlated query with a join due to the inequality in the correlated sub query. We thought others might be helped by posting the final solution. This works:
WITH dates AS
(
SELECT (DATE_TRUNC('week',getdate () +INTERVAL '1 day')::DATE- 7*(ROW_NUMBER() OVER (ORDER BY TRUE) - 1) -INTERVAL '1 day')::DATE AS week_of
FROM (SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X)
)
SELECT dates.week_of,
'W' || CEILING(DATE_PART('week',dates.week_of +INTERVAL '1 day')) AS week_number,
COUNT(DISTINCT features.carrier_id) AS total
FROM dimensions.program features
JOIN dates ON features.last_update <= dates.week_of
JOIN (SELECT MAX(MAX(version)) OVER(Partition by id, type Order by dates.weeks_of rows unbounded preceding) AS feature_version,
f2.id,
f2.type,
dates.week_of
FROM dimensions.headers f2
JOIN dates ON f2.last_update <= dates.week_of
GROUP BY f2.id,
f2.type,
dates.week_of) f2
ON features.id = f2.id
AND features.type = f2.type
AND f2.week_of = dates.week_of
AND features.version = f2.version
WHERE features.type = 'type'
AND features.status = 'live'
GROUP BY dates.week_of
ORDER BY dates.week_of DESC;
Needing to make a data segment that had all the possible Max(version) for all possible week_of values was the key. Hopefully having both of these queries posted will help other fix correlated subquery errors.