I’m trying to setup a data check, where we get the row count from a table for today and prior date. Since it isn’t loaded on weekends or holidays, I can’t say DATE-1.
I came-up with the following, to get the previous date:
SELECT LOAD_DATE ,COUNT(LOAD_DATE) RW_COUNT ,ROW_NUMBER() OVER (ORDER BY LOAD_DATE ) AS LOAD_ROWNUM FROM DATABASE1.TABLE1 WHERE LOAD_DATE >= DATE-6 GROUP BY 1
This produces the dates, counts and assigns a row number.
LOAD_DATE RW_COUNT LOAD_ROWNUM 2019-10-16 8259 1 2019-10-15 8253 2 2019-10-11 8256 3 2019-10-10 8243 4
I to take the two most current dates and compare them. Most current would be “current” and the 2nd most current would be “prior” . Then I would like to have something like this as the result set:
CURRENT_COUNT PRIOR_COUNT DIFF_PERCENT 8259 8253 .9927
My issue is, how do I reference the first two rows and compare them to each other? Unless I’m over-thinking this, I need two additional SELECT statements: 1 with the WHERE clause referencing row 1 and another with a WHERE referercing row 2.
How do I do that? Do I have two CTEs?
Eventually, I’ll need a third SELECT dividing the two rows and checking for 10% tolerance. Help, I’m in analysis paralysis.
Advertisement
Answer
You can filter the result of an OLAP-function using QUALIFY:
SELECT LOAD_DATE ,COUNT(LOAD_DATE) AS CURRENT_COUNT -- previous day's count ,LEAD(RW_COUNT) OVER (ORDER BY LOAD_DATE DESC) AS PRIOR_COUNT -- if your TD version doesn't support LAG/LEAD (i.e. < 16.10) --,MIN(RW_COUNT) -- OVER (ORDER BY LOAD_DATE DESC -- ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) AS PRIOR_COUNT ,CAST(CURRENT_COUNT AS DECIMAL(18,4)) / PRIOR_COUNT AS DIFF_PERCENT FROM DATABASE1.TABLE1 WHERE LOAD_DATE >= DATE-6 GROUP BY 1 -- return the latest row only QUALIFY ROW_NUMBER() OVER (ORDER BY LOAD_DATE DESC) = 1
checking for 10% tolerance:
DIFF_PERCENT BETWEEN 0.9 and 1.1
Either ANDed to the QUALIFY or within a CASE