I have a table A with data associated for an account for each month and there could be multiple entries in each month with following data as shown in snippet.
Expected transformed table
I want to generate table something like below as shown in below snippet. Idea is that for each distinct account_id in table A (configuration) , would like to generate a summary table such that each account_id will have a summarized entry for each month in the calendar year, latest/ last entry of account_id for that month decides whether the account is active or not. For an account with no entry for a particular month the value of ACTIVE column would be empty.
Any inputs on how to achieve this would be much appreciated . Thank you.
Advertisement
Answer
Given that nobody has answered this yet, I have given it a shot. On the other hand, I haven’t used Snowflake nor can I find a place I can try my code with it (similar to db<>fiddle sites) so you may need to modify this to fit conventions for Snowflake.
I have, however, tried to make this as simple and generic as possible. If there are any minor issues please convert it to what you need.
First things first – I created the base data set, as well as a numbers table (called Num
) which has the values 0 to 11. You can use any other way to generate the numbers; I just used this as a fairly generic approach to create the list of months.
CREATE TABLE TableA ("Time" datetime, Account_Id varchar(3), Active varchar(1)) INSERT INTO TableA ("Time", Account_Id, Active) VALUES ('2020-02-25 00:00:30', '111', 'T'), ('2020-02-15 00:10:30', '112', 'T'), ('2020-03-17 10:00:30', '111', 'F'), ('2020-04-09 01:00:24', '111', 'T'), ('2020-05-18 01:00:22', '112', 'F'), ('2020-01-01 11:11:11', '111', 'T'), ('2020-02-28 10:00:00', '111', 'F'), ('2020-05-28 01:00:22', '112', 'F'); CREATE TABLE Nums (n int); INSERT INTO Nums (n) VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11);
The next step is involved three steps: 2 CTEs, and the SELECT statement that draws those together.
- The first CTE
All_Account_Months
creates all the rows you need – one row per Account_Id per month. - The second CTE uses a window function to work out the last
Active
value for a given Account_ID and month. - The SELECT statement then does a LEFT JOIN between these – starting with the first CTE, then adding the extra column
Active
to it.
For this, I wrote it in SQL Server (t-sql) – I have a db<>fiddle here with the run. However, below, I have tried to convert it to Snowflake based on the documentation (e.g., changing column identifiers to double quotes, changing the FORMAT to get the month name to TO_VARCHAR). I’m sorry though – I don’t have a place to test it.
DECLARE @StartDate datetime = '2020-01-01 00:00:00'; WITH All_Account_Months AS (SELECT A.Account_ID, TO_VARCHAR(DATEADD(month, N.n, @StartDate), 'MMMM yyyy') AS "Month" FROM Nums N CROSS JOIN (SELECT DISTINCT Account_ID FROM TableA ) A ), Latest_Active_Values AS (SELECT * FROM (SELECT *, TO_VARCHAR(Asub."Time", 'MMMM yyyy') AS "Month", ROW_NUMBER() OVER (PARTITION BY Asub.Account_ID, YEAR(Asub."Time"), MONTH(Asub."Time") ORDER BY Asub."Time" DESC) AS rn FROM TableA Asub ) AS A WHERE A.rn = 1 ) SELECT AAM.Account_ID, AAM.Month, LAV.Active FROM All_Account_Months AAM LEFT OUTER JOIN Latest_Active_values LAV ON AAM.Account_Id = LAV.Account_Id AND AAM."Month" = LAV."Month";
Results are as follows – the same as yours except I think you typo’ed March for 111 – it should be F not T?
Account_ID Month Active 111 January 2020 T 111 February 2020 F 111 March 2020 F 111 April 2020 T 111 May 2020 NULL 111 June 2020 NULL 111 July 2020 NULL 111 August 2020 NULL 111 September 2020 NULL 111 October 2020 NULL 111 November 2020 NULL 111 December 2020 NULL 112 January 2020 NULL 112 February 2020 T 112 March 2020 NULL 112 April 2020 NULL 112 May 2020 F 112 June 2020 NULL 112 July 2020 NULL 112 August 2020 NULL 112 September 2020 NULL 112 October 2020 NULL 112 November 2020 NULL 112 December 2020 NULL