If I have data that looks like the follows:
+------+------------+-------+
| a | val | val2 |
+------+------------+-------+
| 3.14 | [1, 2, 3] | [2, 3]|
| 1.59 | [7, 8, 9] | |
| -1 | [4, 5, 6] | |
+------+------------+-------+
and I want to get the array averages of the val
column, naively I’d want to just try something like
SELECT
AVG(val)
FROM
<Table>
But that doesn’t work. I get an error like No matching signature for aggregate function AVG for argument types: ARRAY<INT64>. Supported signatures: AVG(INT64); AVG(UINT64); AVG(DOUBLE); AVG(NUMERIC)
I know that if I have just one column val
I can do something like
SELECT avg
FROM
(
SELECT AVG(val) as avg
FROM UNNEST(val) AS val
)
but what if I have multiple columns (val
, val2
, etc.) and need multiple statistics? The above method just seems really cumbersome.
To be clear the result I’d want is:
+------+------------+-------------+--------------+
| a | avg_val | std_dev_val | avg_val2 |
+------+------------+-------------+--------------+
| 3.14 | 2 | 1 | |
| 1.59 | 8 | . | |
| -1 | 5 | . | |
+------+------------+-------------+--------------+
Is there a simple way to do this? Or do I need to create some sort of temporary function to accomplish this? Or am I stuck doing something like what I see in https://stackoverflow.com/a/45560462/1902480
Advertisement
Answer
Consider below example
#standardSQL
create temp function array_avg(arr any type) as ((
select avg(val) from unnest(arr) val
));
create temp function array_std_dev(arr any type) as ((
select stddev(val) from unnest(arr) val
));
select a,
val, array_avg(val) val_avg, array_std_dev(val) val_stddev,
val2, array_avg(val2) val2_avg, array_std_dev(val2) val2_stddev
from `project.dataset.table`
if applied to sample data in your question – output is