Skip to content
Advertisement

Window function is not allowed in where clause redshift

I have a dates CTE in my below query where I am using limit clause which I don’t want to use it. I am trying to understand on how to rewrite my dates CTE so that I can avoid using limit 8 query.

WITH dates AS (
    SELECT (date_trunc('week', getdate() + INTERVAL '1 day')::date - 7 * (row_number() over (order by true) - 1) - INTERVAL '1 day')::date AS week_column
    FROM dimensions.customer LIMIT 8
)
SELECT
dates.week_column, 
'W' || ceiling(date_part('week', dates.week_column + INTERVAL '1 day')) AS week_number,
COUNT(DISTINCT features.client_id) AS total
FROM dimensions.program features 
JOIN dates ON features.last_update <= dates.week_column
WHERE features.type = 'capacity'
AND features.status = 'CURRENT'
GROUP BY dates.week_column
ORDER by dates.week_column DESC

Below is the output I get from my inner dates CTE query:

SELECT (date_trunc('week', getdate() + INTERVAL '1 day')::date - 7 * (row_number() over (order by true) - 1) - INTERVAL '1 day')::date AS week_column
FROM dimensions.customer LIMIT 8

Output from dates CTE :

2021-01-10
2021-01-03
2020-12-27
2020-12-20
2020-12-13
2020-12-06
2020-11-29
2020-11-22

Is there any way to avoid using limit 8 in my CTE query and still get same output? Our platform doesn’t allow us to run queries if it has limit clause in it so trying to see if I can rewrite it differently in sql redshift?

If I modify my dates CTE query like this, then it gives me error as window function is not allowed in where clause.

WITH dates AS (
    SELECT (date_trunc('week', getdate() + INTERVAL '1 day')::date - 7 * (row_number() over (order by true) - 1) - INTERVAL '1 day')::date AS week_column,
    ROW_NUMBER() OVER () as seqnum
    FROM dimensions.customer
    WHERE seqnum <= 8;
)
....

Update

Something like this you mean?

WITH dates AS (
    SELECT (date_trunc('week', getdate() + INTERVAL '1 day')::date - 7 * (row_number() over (order by true) - 1) - INTERVAL '1 day')::date AS week_column,
    ROW_NUMBER() OVER () as seqnum
    FROM dimensions.customer
)
SELECT
dates.week_column, 
'W' || ceiling(date_part('week', dates.week_column + INTERVAL '1 day')) AS week_number,
COUNT(DISTINCT features.client_id) AS total
FROM dimensions.program features 
JOIN dates ON features.last_update <= dates.week_column
WHERE dates.seqnum <= 8 
AND features.type = 'capacity'
AND features.status = 'CURRENT'
GROUP BY dates.week_column
ORDER by dates.week_column DESC

Advertisement

Answer

Just move your WHERE clause to the outer SELECT. Seqnum doesn’t exists until the CTE runs but does exist when the result of the CTE is consumed.

UPDATE …

After moving the where clause AndyP got a correlated subquery error coming from a WHERE clause not included in the posted query. As shown in this somewhat modified query:

WITH dates AS
(
  SELECT (DATE_TRUNC('week',getdate () +INTERVAL '1 day')::DATE- 7*(ROW_NUMBER() OVER (ORDER BY TRUE) - 1) -INTERVAL '1 day')::DATE AS week_of
  FROM (SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X)
)
SELECT dates.week_of,
       'W' || CEILING(DATE_PART('week',dates.week_of +INTERVAL '1 day')) AS week_number,
       COUNT(DISTINCT features.id) AS total
FROM dimensions.program features
  JOIN dates ON features.last_update <= dates.week_of
WHERE features.version = (SELECT MAX(version)
                                  FROM headers f2
                                  WHERE features.id = f2.id
                                  AND   features.type = f2.type
                                  AND   f2.last_update <= dates.week_of)
AND   features.type = 'type'
AND   features.status = 'live'
GROUP BY dates.week_of
ORDER BY dates.week_of DESC;

This was an interesting replacement of a correlated query with a join due to the inequality in the correlated sub query. We thought others might be helped by posting the final solution. This works:

WITH dates AS
(
  SELECT (DATE_TRUNC('week',getdate () +INTERVAL '1 day')::DATE- 7*(ROW_NUMBER() OVER (ORDER BY TRUE) - 1) -INTERVAL '1 day')::DATE AS week_of
  FROM (SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X)
)
SELECT dates.week_of,
       'W' || CEILING(DATE_PART('week',dates.week_of +INTERVAL '1 day')) AS week_number,
       COUNT(DISTINCT features.carrier_id) AS total
FROM dimensions.program features
  JOIN dates ON features.last_update <= dates.week_of
  JOIN (SELECT MAX(MAX(version)) OVER(Partition by id, type Order by dates.weeks_of rows unbounded preceding) AS feature_version,
               f2.id,
               f2.type,
               dates.week_of
        FROM dimensions.headers f2
        JOIN dates ON f2.last_update <= dates.week_of
        GROUP BY f2.id,
                 f2.type,
                 dates.week_of) f2
    ON features.id = f2.id
   AND features.type = f2.type
   AND f2.week_of = dates.week_of
   AND features.version = f2.version
WHERE features.type = 'type'
AND   features.status = 'live'
GROUP BY dates.week_of
ORDER BY dates.week_of DESC;

Needing to make a data segment that had all the possible Max(version) for all possible week_of values was the key. Hopefully having both of these queries posted will help other fix correlated subquery errors.

User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement