I have some data I would like to pull from a database, I’m using RStudio for my query. What I intend to do is write:
The first CTE statement to pull all my necessary information.
The second CTE statement will add two new columns for two row numbers, which are partitioned by different groups. Two additional columns will be added for Lead and Lag values.
The third CTE will produce two more columns where the two columns use nested case_when statements to give me NewOpen and NewClosed dates.
What I have so far:
q5<- sqlQuery(ch,paste(" ;with CTE AS ( select oz.id as AccountID ,ac.PROD_TYPE_CDE as ProductTypeCode ,CASE WHEN ac.OPEN_DTE='0001-01-01' then null else ac.OPEN_DTE END as OpenDate ,CASE WHEN ac.CLOS_DTE = '0001-01-01' then null else ac.CLOS_DTE END as ClosedDate ,df.proc_dte as FullDate FROM dbs.tb_dbs_acct_fact df inner join dbs.tb_acct_details ac on df.dw_serv_id = ac.dw_serv_id left outer join dbs.tb_oz_id oz on df.proc_dte = oz.proc_dte ), cte1 as ( select * ,row_nbr = row_number() over( partition by AccountID order by AccountID, FullDate asc ) ,row_nbr2 = row_number() over( partition by AccountID,ProductTypeCode order by AccountID, FullDate asc ) ,lag(ProductTypeCode) over(partition by AccountID order by FullDate asc ) as Lagging ,LEAD(ProductTypeCode) over(partition by AccountID order order by FullDate asc ) as Leading FROM CTE ), cte2 as (select * ,case when cte1.row_nbr = 1 & cte1.Lagging=cte1.ProductTypeCode then cte1.OpenDate else case when cte1.Lagging<>cte1.ProductTypeCode then cte1.FullDate else NULL END END as NewOpen ,case when cte1.ClosedDate IS NOT NULL then cte1.ClosedDate else case when cte1.Leading <> cte1.ProductTypeCode then cte1.FullDate else NULL END END as NewClosed FROM cte1 );"))
This code, however won’t run.
Advertisement
Answer
As mentioned, WITH
is a statement to define CTEs to be used in a final query. Your query only contains CTE definitions but never actually use any in a final statement. Additionally, you can combine the first two CTEs since window functions can run at any level. Possibly the last CTE can serve as your final SELECT
statement.
sql <- "WITH CTE AS (SELECT oz.id AS AccountID , ac.PROD_TYPE_CDE as ProductTypeCode , CASE WHEN ac.OPEN_DTE='0001-01-01' THEN NULL ELSE ac.OPEN_DTE END AS OpenDate , CASE WHEN ac.CLOS_DTE = '0001-01-01' THEN NULL ELSE ac.CLOS_DTE END AS ClosedDate , df.proc_dte AS FullDate , ROW_NUMBER() OVER (PARTITION BY oz.id ORDER BY oz.id, df.proc_dte) AS row_nbr , ROW_NUMBER() OVER (PARTITION BY oz.id, ac.PROD_TYPE_CDE ORDER BY oz.id, df.proc_dte) AS row_nbr2 , LAG(ac.PROD_TYPE_CDE) OVER (PARTITION BY oz.id ORDER BY df.proc_dte) AS Lagging , LEAD(ac.PROD_TYPE_CDE) OVER (PARTITION BY oz.id ORDER BY df.proc_dte) AS Leading FROM dbs.tb_dbs_acct_fact df INNER JOIN dbs.tb_acct_details ac ON df.dw_serv_id = ac.dw_serv_id LEFT OUTER JOIN dbs.tb_oz_id oz ON df.proc_dte = oz.proc_dte ) SELECT * , CASE WHEN row_nbr = 1 & Lagging = ProductTypeCode THEN OpenDate ELSE CASE WHEN Lagging <> ProductTypeCode THEN FullDate ELSE NULL END END AS NewOpen , CASE WHEN ClosedDate IS NOT NULL THEN ClosedDate ELSE CASE WHEN Leading <> ProductTypeCode THEN FullDate ELSE NULL END END AS NewClosed FROM CTE;" q5 <- sqlQuery(ch, sql)