Skip to content
Advertisement

Efficient Multiple Group-bys

I have the following table:

Year Week Day_1 Day_2 Day_3
2020 1 Walk Jump Swim
2020 3 Walk Swim Walk
2020 1 Jump Walk Swim

I want to group by YEAR, WEEK and Event (Walk, jump, Swim) and count the number of times each event occurs in Day_1, Day_2, Day_3. I.e.

Year Week Event Count_Day_1 Count_Day_2 Count_Day_3
2020 1 Walk 1 1 0
2020 3 Walk 1 0 1
2020 1 Jump 1 1 0
2020 3 Jump 0 0 0
2020 1 Swim 0 0 2
2020 3 Swim . 0 1 0

How can I do this efficiently?

Advertisement

Answer

In BigQuery, I would unpivot using arrays and then aggregate:

with t as (
       select 2020 as year, 1 as week, 'Walk' as day_1, 'Jump' as day_2, 'Swim' as day_3 union all
       select 2020, 3, 'Walk', 'Swim', 'Walk' union all
       select 2020, 1, 'Jump', 'Walk', 'Swim'
      )
select t.year, t.week, s.event,
       countif(day = 1) as day_1, countif(day = 2) as day_2, countif(day = 3) as day_3
from t cross join
     unnest([struct(t.day_1 as event, 1 as day),
             struct(t.day_2 as event, 2 as day),
             struct(t.day_3 as event, 3 as day)
            ]) s
group by t.year, t.week, s.event;
              

User contributions licensed under: CC BY-SA
5 People found this is helpful
Advertisement