I have this Standard SQL query:
WITH utils AS ( SELECT today, month_lag, GENERATE_DATE_ARRAY(DATE_SUB(DATE_TRUNC(DATE_SUB(today, INTERVAL month_lag MONTH), MONTH), INTERVAL 11 MONTH), DATE_TRUNC(DATE_SUB(today, INTERVAL month_lag MONTH), MONTH), INTERVAL 1 MONTH) AS months FROM ( SELECT CURRENT_DATE() AS today, IF(EXTRACT(DAY FROM CURRENT_DATE()) < 7, 2, 1) AS month_lag ) ) SELECT date, MAX(ndays) AS ndays, COUNT(*) AS count FROM ( SELECT DATE_TRUNC(date, MONTH) AS date, IFNULL(DATE_DIFF(date, prev_date, DAY), 186) AS ndays FROM ( SELECT date, LAG(date, 1) OVER (ORDER BY date) AS prev_date FROM `myproject.mydataset.mytable`, utils WHERE DATE_TRUNC(date, MONTH) >= DATE_TRUNC(DATE_SUB(today, INTERVAL 17+month_lag MONTH), MONTH) AND type = 'Departamento' ), utils WHERE DATE_TRUNC(date, MONTH) BETWEEN DATE_TRUNC(DATE_SUB(today, INTERVAL 11+month_lag MONTH), MONTH) AND DATE_TRUNC(DATE_SUB(today, INTERVAL month_lag MONTH), MONTH) ) GROUP BY date ORDER BY date
With the following output:
Row date ndays count 1 2020-10-01 36 119 2 2020-11-01 35 61 3 2020-12-01 26 39 4 2021-01-01 30 33 5 2021-04-01 89 163 6 2021-06-01 57 101 7 2021-09-01 90 88
You can simulate the output with:
SELECT '2020-10-01' AS date, 36 AS ndays, 119 AS count UNION ALL SELECT '2020-11-01' AS date, 35 AS ndays, 61 AS count UNION ALL SELECT '2020-12-01' AS date, 26 AS ndays, 39 AS count UNION ALL SELECT '2021-01-01' AS date, 30 AS ndays, 33 AS count UNION ALL SELECT '2021-04-01' AS date, 89 AS ndays, 163 AS count UNION ALL SELECT '2021-06-01' AS date, 57 AS ndays, 101 AS count UNION ALL SELECT '2021-09-01' AS date, 90 AS ndays, 88 AS count
I am retreiving the entries of the last 12 months. I have already grouped and ordered them by month, but I need to also group them three by three according to the month. Moreover, as you can see in the output there are missing months, so I have created in my utils
statement a truncated month array named months
containing the months that should be considered. Groups need to be dynamics, so that if I am retrieving data from November 2020 to October 2021, groups should be 2020 November - January 2021
, February 2021 - April 2021
, May 2021 - July 2021
and August 2021 - October 2021
.
This is the output of my WITH
clause:
Row today month_lag months 1 2021-10-15 1 2020-10-01 2020-11-01 2020-12-01 2021-01-01 2021-02-01 2021-03-01 2021-04-01 2021-05-01 2021-06-01 2021-07-01 2021-08-01 2021-09-01
For each group of three months I need to get:
- Name of the group: something like
2020 October - 2020 December
- Max of ndays in the group: MAX(ndays)
- Number of entries in that group: SUM(count)
- Number of days elapsed in that group: to get the number of days of a month I usually use this sentence
32 - EXTRACT(DAY FROM DATETIME_ADD(DATETIME_TRUNC(DATETIME_SUB(date, INTERVAL 1 MONTH), MONTH), INTERVAL 31 DAY))
, so I need the sum of that for each month of the group.
Considering all mentioned above, the output should be:
Row group_name ndays count group_ndays 1 '2020 October - 2020 December' 36 219 92 2 '2020 January - 2020 March' 30 39 90 3 '2020 April - 2020 June' 89 196 91 4 '2020 July - 2020 September' 90 88 92
What would be the way to proceed?
Advertisement
Answer
I need the months composing each group to change according to the input … The month to start with is the first element in my months array at the utils
select format_date('%Y %B', min(month)) || ' - ' || format_date('%Y %B', max(month)) as group_name, max(ndays) as ndays, sum(count) as count, date_diff(last_day(max(month)) + 1, min(month), day) as group_ndays from ( select month, ndays, count, div(row_number() over(order by month) - 1, 3) grp from utils, unnest(months) month left join current_output on date = month ) group by grp
if to apply to sample data in your question
WITH utils AS ( SELECT today, month_lag, GENERATE_DATE_ARRAY(DATE_SUB(DATE_TRUNC(DATE_SUB(today, INTERVAL month_lag MONTH), MONTH), INTERVAL 11 MONTH), DATE_TRUNC(DATE_SUB(today, INTERVAL month_lag MONTH), MONTH), INTERVAL 1 MONTH) AS months FROM ( SELECT CURRENT_DATE() AS today, IF(EXTRACT(DAY FROM CURRENT_DATE()) < 7, 2, 1) AS month_lag ) ), current_output as ( SELECT date '2020-10-01' AS date, 36 AS ndays, 119 AS count UNION ALL SELECT '2020-11-01' AS date, 35 AS ndays, 61 AS count UNION ALL SELECT '2020-12-01' AS date, 26 AS ndays, 39 AS count UNION ALL SELECT '2021-01-01' AS date, 30 AS ndays, 33 AS count UNION ALL SELECT '2021-04-01' AS date, 89 AS ndays, 163 AS count UNION ALL SELECT '2021-06-01' AS date, 57 AS ndays, 101 AS count UNION ALL SELECT '2021-09-01' AS date, 90 AS ndays, 88 AS count )
the output is