Skip to content
Advertisement

sql: Calculate the correlations and convert rows into columns

So my current table has more than 100 fields and I am trying to calculate the correlations between the input variables and output variable, and then convert all those columns into rows. For example, my current table looks like this:

input_1 input_2 output
3 6 5
4 7 5
6 4 4
6 9 3
7 10 5
9 9 2
2 9 4

What I trying to achieve is:

categories correlation
input_1 -0.594
input_2 -0.27

The query that I have written below gives me a correlation per each calculation. However, I was them converted into rows,

select (Avg(input1 * output) - (Avg(input1) * Avg(output))) / (stddev(input1) * stddev(output)) AS correlation

from a

Can you help how this can be achieved? Thank you!

Advertisement

Answer

Consider below approach

execute immediate (select '''
select categories, correlation
from (
  select ''' || 
    string_agg('corr(' || category || ', output) as ' || category , ', ')
|| '''  from `project.dataset.table`
)
unpivot (correlation for categories in (''' || string_agg(category) || '''))
'''
from (
  select category from (
    select 
      array(
        select category
        from unnest(regexp_extract_all(to_json_string(t), r'"(w+)":')) category
        where category != 'output'
      ) arr
    from `project.dataset.table` t
    limit 1
  ), unnest(arr) category
))    

if applied to sample data in your question – output is

enter image description here

As you can notice – I am not using your heavy formula – rather I am using CORR built-in function

User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement