I have some data I would like to pull from a database, I’m using RStudio for my query. What I intend to do is write:
The first CTE statement to pull all my necessary information.
The second CTE statement will add two new columns for two row numbers, which are partitioned by different groups. Two additional columns will be added for Lead and Lag values.
The third CTE will produce two more columns where the two columns use nested case_when statements to give me NewOpen and NewClosed dates.
What I have so far:
q5<- sqlQuery(ch,paste("
;with CTE AS
(
select
oz.id as AccountID
,ac.PROD_TYPE_CDE as ProductTypeCode
,CASE WHEN ac.OPEN_DTE='0001-01-01' then null else ac.OPEN_DTE END as OpenDate
,CASE WHEN ac.CLOS_DTE = '0001-01-01' then null else ac.CLOS_DTE END as ClosedDate
,df.proc_dte as FullDate
FROM
dbs.tb_dbs_acct_fact df
inner join
dbs.tb_acct_details ac on df.dw_serv_id = ac.dw_serv_id
left outer join
dbs.tb_oz_id oz on df.proc_dte = oz.proc_dte
),
cte1 as
(
select *
,row_nbr = row_number() over( partition by AccountID order by AccountID, FullDate asc )
,row_nbr2 = row_number() over( partition by AccountID,ProductTypeCode order by AccountID, FullDate asc )
,lag(ProductTypeCode) over(partition by AccountID order by FullDate asc ) as Lagging
,LEAD(ProductTypeCode) over(partition by AccountID order order by FullDate asc ) as Leading
FROM CTE
),
cte2 as (select *
,case when cte1.row_nbr = 1 & cte1.Lagging=cte1.ProductTypeCode then cte1.OpenDate else
case when cte1.Lagging<>cte1.ProductTypeCode then cte1.FullDate else NULL END END as NewOpen
,case when cte1.ClosedDate IS NOT NULL then cte1.ClosedDate else
case when cte1.Leading <> cte1.ProductTypeCode then cte1.FullDate else NULL END END as NewClosed
FROM cte1
);"))
This code, however won’t run.
Advertisement
Answer
As mentioned, WITH
is a statement to define CTEs to be used in a final query. Your query only contains CTE definitions but never actually use any in a final statement. Additionally, you can combine the first two CTEs since window functions can run at any level. Possibly the last CTE can serve as your final SELECT
statement.
sql <- "WITH CTE AS
(SELECT
oz.id AS AccountID
, ac.PROD_TYPE_CDE as ProductTypeCode
, CASE
WHEN ac.OPEN_DTE='0001-01-01'
THEN NULL
ELSE ac.OPEN_DTE
END AS OpenDate
, CASE
WHEN ac.CLOS_DTE = '0001-01-01'
THEN NULL
ELSE ac.CLOS_DTE
END AS ClosedDate
, df.proc_dte AS FullDate
, ROW_NUMBER() OVER (PARTITION BY oz.id
ORDER BY oz.id, df.proc_dte) AS row_nbr
, ROW_NUMBER() OVER (PARTITION BY oz.id, ac.PROD_TYPE_CDE
ORDER BY oz.id, df.proc_dte) AS row_nbr2
, LAG(ac.PROD_TYPE_CDE) OVER (PARTITION BY oz.id
ORDER BY df.proc_dte) AS Lagging
, LEAD(ac.PROD_TYPE_CDE) OVER (PARTITION BY oz.id
ORDER BY df.proc_dte) AS Leading
FROM
dbs.tb_dbs_acct_fact df
INNER JOIN
dbs.tb_acct_details ac ON df.dw_serv_id = ac.dw_serv_id
LEFT OUTER JOIN
dbs.tb_oz_id oz ON df.proc_dte = oz.proc_dte
)
SELECT *
, CASE
WHEN row_nbr = 1 & Lagging = ProductTypeCode
THEN OpenDate
ELSE
CASE
WHEN Lagging <> ProductTypeCode
THEN FullDate
ELSE NULL
END
END AS NewOpen
, CASE
WHEN ClosedDate IS NOT NULL
THEN ClosedDate
ELSE
CASE
WHEN Leading <> ProductTypeCode
THEN FullDate
ELSE NULL
END
END AS NewClosed
FROM CTE;"
q5 <- sqlQuery(ch, sql)