I have this Standard SQL query:
WITH utils AS (
SELECT
today,
month_lag,
GENERATE_DATE_ARRAY(DATE_SUB(DATE_TRUNC(DATE_SUB(today, INTERVAL month_lag MONTH), MONTH), INTERVAL 11 MONTH), DATE_TRUNC(DATE_SUB(today, INTERVAL month_lag MONTH), MONTH), INTERVAL 1 MONTH) AS months
FROM (
SELECT
CURRENT_DATE() AS today,
IF(EXTRACT(DAY FROM CURRENT_DATE()) < 7, 2, 1) AS month_lag
)
)
SELECT
date,
MAX(ndays) AS ndays,
COUNT(*) AS count
FROM (
SELECT
DATE_TRUNC(date, MONTH) AS date,
IFNULL(DATE_DIFF(date, prev_date, DAY), 186) AS ndays
FROM (
SELECT
date,
LAG(date, 1) OVER (ORDER BY date) AS prev_date
FROM `myproject.mydataset.mytable`, utils
WHERE
DATE_TRUNC(date, MONTH) >= DATE_TRUNC(DATE_SUB(today, INTERVAL 17+month_lag MONTH), MONTH)
AND type = 'Departamento'
), utils
WHERE DATE_TRUNC(date, MONTH) BETWEEN DATE_TRUNC(DATE_SUB(today, INTERVAL 11+month_lag MONTH), MONTH) AND DATE_TRUNC(DATE_SUB(today, INTERVAL month_lag MONTH), MONTH)
)
GROUP BY date
ORDER BY date
With the following output:
Row date ndays count
1 2020-10-01 36 119
2 2020-11-01 35 61
3 2020-12-01 26 39
4 2021-01-01 30 33
5 2021-04-01 89 163
6 2021-06-01 57 101
7 2021-09-01 90 88
You can simulate the output with:
SELECT '2020-10-01' AS date, 36 AS ndays, 119 AS count UNION ALL
SELECT '2020-11-01' AS date, 35 AS ndays, 61 AS count UNION ALL
SELECT '2020-12-01' AS date, 26 AS ndays, 39 AS count UNION ALL
SELECT '2021-01-01' AS date, 30 AS ndays, 33 AS count UNION ALL
SELECT '2021-04-01' AS date, 89 AS ndays, 163 AS count UNION ALL
SELECT '2021-06-01' AS date, 57 AS ndays, 101 AS count UNION ALL
SELECT '2021-09-01' AS date, 90 AS ndays, 88 AS count
I am retreiving the entries of the last 12 months. I have already grouped and ordered them by month, but I need to also group them three by three according to the month. Moreover, as you can see in the output there are missing months, so I have created in my utils
statement a truncated month array named months
containing the months that should be considered. Groups need to be dynamics, so that if I am retrieving data from November 2020 to October 2021, groups should be 2020 November - January 2021
, February 2021 - April 2021
, May 2021 - July 2021
and August 2021 - October 2021
.
This is the output of my WITH
clause:
Row today month_lag months
1 2021-10-15 1 2020-10-01
2020-11-01
2020-12-01
2021-01-01
2021-02-01
2021-03-01
2021-04-01
2021-05-01
2021-06-01
2021-07-01
2021-08-01
2021-09-01
For each group of three months I need to get:
- Name of the group: something like
2020 October - 2020 December
- Max of ndays in the group: MAX(ndays)
- Number of entries in that group: SUM(count)
- Number of days elapsed in that group: to get the number of days of a month I usually use this sentence
32 - EXTRACT(DAY FROM DATETIME_ADD(DATETIME_TRUNC(DATETIME_SUB(date, INTERVAL 1 MONTH), MONTH), INTERVAL 31 DAY))
, so I need the sum of that for each month of the group.
Considering all mentioned above, the output should be:
Row group_name ndays count group_ndays
1 '2020 October - 2020 December' 36 219 92
2 '2020 January - 2020 March' 30 39 90
3 '2020 April - 2020 June' 89 196 91
4 '2020 July - 2020 September' 90 88 92
What would be the way to proceed?
Advertisement
Answer
I need the months composing each group to change according to the input … The month to start with is the first element in my months array at the utils
select
format_date('%Y %B', min(month)) || ' - ' || format_date('%Y %B', max(month)) as group_name,
max(ndays) as ndays,
sum(count) as count,
date_diff(last_day(max(month)) + 1, min(month), day) as group_ndays
from (
select month, ndays, count,
div(row_number() over(order by month) - 1, 3) grp
from utils, unnest(months) month
left join current_output
on date = month
)
group by grp
if to apply to sample data in your question
WITH utils AS (
SELECT
today,
month_lag,
GENERATE_DATE_ARRAY(DATE_SUB(DATE_TRUNC(DATE_SUB(today, INTERVAL month_lag MONTH), MONTH), INTERVAL 11 MONTH), DATE_TRUNC(DATE_SUB(today, INTERVAL month_lag MONTH), MONTH), INTERVAL 1 MONTH) AS months
FROM (
SELECT
CURRENT_DATE() AS today,
IF(EXTRACT(DAY FROM CURRENT_DATE()) < 7, 2, 1) AS month_lag
)
), current_output as (
SELECT date '2020-10-01' AS date, 36 AS ndays, 119 AS count UNION ALL
SELECT '2020-11-01' AS date, 35 AS ndays, 61 AS count UNION ALL
SELECT '2020-12-01' AS date, 26 AS ndays, 39 AS count UNION ALL
SELECT '2021-01-01' AS date, 30 AS ndays, 33 AS count UNION ALL
SELECT '2021-04-01' AS date, 89 AS ndays, 163 AS count UNION ALL
SELECT '2021-06-01' AS date, 57 AS ndays, 101 AS count UNION ALL
SELECT '2021-09-01' AS date, 90 AS ndays, 88 AS count
)
the output is