Skip to content
Advertisement

Group elements by month three by three

I have this Standard SQL query:

WITH utils AS (
    SELECT
        today,
        month_lag,
        GENERATE_DATE_ARRAY(DATE_SUB(DATE_TRUNC(DATE_SUB(today, INTERVAL month_lag MONTH), MONTH), INTERVAL 11 MONTH), DATE_TRUNC(DATE_SUB(today, INTERVAL month_lag MONTH), MONTH), INTERVAL 1 MONTH) AS months
    FROM (
        SELECT
            CURRENT_DATE() AS today,
            IF(EXTRACT(DAY FROM CURRENT_DATE()) < 7, 2, 1) AS month_lag
    )
)

SELECT
    date,
    MAX(ndays) AS ndays,
    COUNT(*) AS count
FROM (
    SELECT
        DATE_TRUNC(date, MONTH) AS date,
        IFNULL(DATE_DIFF(date, prev_date, DAY), 186) AS ndays
    FROM (
        SELECT
            date,
            LAG(date, 1) OVER (ORDER BY date) AS prev_date
        FROM `myproject.mydataset.mytable`, utils
        WHERE
            DATE_TRUNC(date, MONTH) >= DATE_TRUNC(DATE_SUB(today, INTERVAL 17+month_lag MONTH), MONTH)
            AND type = 'Departamento'
    ), utils
    WHERE DATE_TRUNC(date, MONTH) BETWEEN DATE_TRUNC(DATE_SUB(today, INTERVAL 11+month_lag MONTH), MONTH) AND DATE_TRUNC(DATE_SUB(today, INTERVAL month_lag MONTH), MONTH)
)
GROUP BY date
ORDER BY date

With the following output:

Row     date         ndays     count    
1       2020-10-01   36        119
2       2020-11-01   35        61
3       2020-12-01   26        39
4       2021-01-01   30        33
5       2021-04-01   89        163
6       2021-06-01   57        101
7       2021-09-01   90        88

You can simulate the output with:

SELECT '2020-10-01' AS date, 36 AS ndays, 119 AS count UNION ALL
SELECT '2020-11-01' AS date, 35 AS ndays, 61 AS count UNION ALL 
SELECT '2020-12-01' AS date, 26 AS ndays, 39 AS count UNION ALL
SELECT '2021-01-01' AS date, 30 AS ndays, 33 AS count UNION ALL
SELECT '2021-04-01' AS date, 89 AS ndays, 163 AS count UNION ALL
SELECT '2021-06-01' AS date, 57 AS ndays, 101 AS count UNION ALL
SELECT '2021-09-01' AS date, 90 AS ndays, 88 AS count

I am retreiving the entries of the last 12 months. I have already grouped and ordered them by month, but I need to also group them three by three according to the month. Moreover, as you can see in the output there are missing months, so I have created in my utils statement a truncated month array named months containing the months that should be considered. Groups need to be dynamics, so that if I am retrieving data from November 2020 to October 2021, groups should be 2020 November - January 2021, February 2021 - April 2021, May 2021 - July 2021 and August 2021 - October 2021.

This is the output of my WITH clause:

Row     today         month_lag     months  
1       2021-10-15    1             2020-10-01
                                    2020-11-01
                                    2020-12-01
                                    2021-01-01
                                    2021-02-01
                                    2021-03-01
                                    2021-04-01
                                    2021-05-01
                                    2021-06-01
                                    2021-07-01
                                    2021-08-01
                                    2021-09-01

For each group of three months I need to get:

  • Name of the group: something like 2020 October - 2020 December
  • Max of ndays in the group: MAX(ndays)
  • Number of entries in that group: SUM(count)
  • Number of days elapsed in that group: to get the number of days of a month I usually use this sentence 32 - EXTRACT(DAY FROM DATETIME_ADD(DATETIME_TRUNC(DATETIME_SUB(date, INTERVAL 1 MONTH), MONTH), INTERVAL 31 DAY)), so I need the sum of that for each month of the group.

Considering all mentioned above, the output should be:

Row     group_name                        ndays     count     group_ndays
1       '2020 October - 2020 December'    36        219       92
2       '2020 January - 2020 March'       30        39        90
3       '2020 April - 2020 June'          89        196       91
4       '2020 July - 2020 September'      90        88        92

What would be the way to proceed?

Advertisement

Answer

I need the months composing each group to change according to the input … The month to start with is the first element in my months array at the utils

select 
  format_date('%Y %B', min(month)) || ' - ' || format_date('%Y %B',  max(month)) as group_name,
  max(ndays) as ndays,
  sum(count) as count,
  date_diff(last_day(max(month)) + 1, min(month), day) as group_ndays
from (
  select month, ndays, count,
    div(row_number() over(order by month) - 1, 3) grp
  from utils, unnest(months) month
  left join current_output
  on date = month
)
group by grp   

if to apply to sample data in your question

WITH utils AS (
    SELECT
        today,
        month_lag,
        GENERATE_DATE_ARRAY(DATE_SUB(DATE_TRUNC(DATE_SUB(today, INTERVAL month_lag MONTH), MONTH), INTERVAL 11 MONTH), DATE_TRUNC(DATE_SUB(today, INTERVAL month_lag MONTH), MONTH), INTERVAL 1 MONTH) AS months
    FROM (
        SELECT
            CURRENT_DATE() AS today,
            IF(EXTRACT(DAY FROM CURRENT_DATE()) < 7, 2, 1) AS month_lag
    )
), current_output as (
  SELECT date '2020-10-01' AS date, 36 AS ndays, 119 AS count UNION ALL
  SELECT '2020-11-01' AS date, 35 AS ndays, 61 AS count UNION ALL 
  SELECT '2020-12-01' AS date, 26 AS ndays, 39 AS count UNION ALL
  SELECT '2021-01-01' AS date, 30 AS ndays, 33 AS count UNION ALL
  SELECT '2021-04-01' AS date, 89 AS ndays, 163 AS count UNION ALL
  SELECT '2021-06-01' AS date, 57 AS ndays, 101 AS count UNION ALL
  SELECT '2021-09-01' AS date, 90 AS ndays, 88 AS count
)          

the output is

enter image description here

User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement