If I have data that looks like the follows:
+------+------------+-------+ | a | val | val2 | +------+------------+-------+ | 3.14 | [1, 2, 3] | [2, 3]| | 1.59 | [7, 8, 9] | ... | | -1 | [4, 5, 6] | ... | +------+------------+-------+
and I want to get the array averages of the val
column, naively I’d want to just try something like
SELECT AVG(val) FROM <Table>
But that doesn’t work. I get an error like No matching signature for aggregate function AVG for argument types: ARRAY<INT64>. Supported signatures: AVG(INT64); AVG(UINT64); AVG(DOUBLE); AVG(NUMERIC)
I know that if I have just one column val
I can do something like
SELECT avg FROM ( SELECT AVG(val) as avg FROM UNNEST(val) AS val )
but what if I have multiple columns (val
, val2
, etc.) and need multiple statistics? The above method just seems really cumbersome.
To be clear the result I’d want is:
+------+------------+-------------+--------------+ | a | avg_val | std_dev_val | avg_val2 | +------+------------+-------------+--------------+ | 3.14 | 2 | 1 | ... | | 1.59 | 8 | .... | ... | | -1 | 5 | .... | ... | +------+------------+-------------+--------------+
Is there a simple way to do this? Or do I need to create some sort of temporary function to accomplish this? Or am I stuck doing something like what I see in https://stackoverflow.com/a/45560462/1902480
Advertisement
Answer
Consider below example
#standardSQL create temp function array_avg(arr any type) as (( select avg(val) from unnest(arr) val )); create temp function array_std_dev(arr any type) as (( select stddev(val) from unnest(arr) val )); select a, val, array_avg(val) val_avg, array_std_dev(val) val_stddev, val2, array_avg(val2) val2_avg, array_std_dev(val2) val2_stddev from `project.dataset.table`
if applied to sample data in your question – output is