I have a table (on BigQuery) that looks like the following:
| Date | Type | Score1 | Score2 | |------------|------------|-------------|-------------| | 2021-01-04 | A | 5 | NULL | | 2021-01-04 | A | 4 | NULL | | 2021-01-04 | A | 5 | NULL | | 2021-01-02 | A | 1 | NULL | | 2021-01-02 | A | 1 | NULL | | 2021-01-01 | A | 3 | 2 | | 2021-01-04 | B | NULL | 3 | | 2021-01-04 | B | NULL | NULL | | 2021-01-02 | B | NULL | 4 | | 2021-01-02 | B | NULL | 4 | | 2021-01-01 | B | 2 | 5 | | 2021-01-01 | B | 5 | 3 | | 2021-01-04 | C | NULL | NULL | | 2021-01-04 | C | 4 | NULL | | 2021-01-04 | C | NULL | NULL | | 2021-01-01 | C | 1 | 5 | | 2021-01-01 | C | 2 | 4 | | 2021-01-01 | C | 3 | 4 |
What I would like to get is the average score for each type but the average should be taken only on the most recent date for which at least one score is available for the type. From the example above, the aim is to obtain the following table in one query (that can contain subqueries):
| Type | AVG Score1 | AVG Score2 | |------------|-------------|-------------| | A | (5+4+5)/3 | (2)/1 | | B | (2+5)/2 | (3)/1 | | C | (4)/1 | (5+4+4)/3 |
I need a solution that could be adapted if I want the average score, not for each type, but for each combination of two columns (type/color), still on the most recent date for which at least one score is available for the combination. It should be also possible to adapt it if I have more score columns and for different aggregations (AVG/MAX/MIN…).
N.B.: A first question was asked to handle the same problem with only one score column: Average on the most recent date for which data is available.
Advertisement
Answer
You can follow the same approach, but you need to enumerate each column separately — so filtering doesn’t work. That would be:
select type, avg(case when seqnum_1 = 1 then score1 end) as avg_1, avg(case when seqnum_2 = 1 then score1 end) as avg_2 from (select t.*, dense_rank() over (partition by type, score1 is null order by date desc) as seqnum_1, dense_rank() over (partition by type, score2 is null order by date desc) as seqnum_2 from t ) t group by type;
Note: This includes the NULL
values in the averages. There is no harm in that, because they don’t affect the results. You could also express this as:
select type, avg(case when seqnum_1 = 1 then score1 end) as avg_1, avg(case when seqnum_2 = 1 then score1 end) as avg_2 from (select t.*, dense_rank() over (partition by type order by score1 is not null desc, date desc) as seqnum_1, dense_rank() over (partition by type order by score2 is not null desc, date desc) as seqnum_2 from t ) t group by type;