Skip to content
Advertisement

Flatten the Data in BigQuery

I have dimensions.key_value of RECORD type i run the following query with following output.

SELECT *  from table;

   event_id   value     dimensions
     1        140      {"key_value": [{"key": "app", "value": "20"}]}
     2        150      {"key_value": [{"key": "region", "value": "8"}, {"key": "loc", "value": "1"}]}
     3        600      {"key_value": [{"key": "region", "value": "8"}, {"key": "loc", "value": "2"}]}
   

To unnest the data i have created the following view:

with temp as (
  select 
    (select value from t.dimensions.key_value where key = 'region') as region,
    (select value from t.dimensions.key_value where key = 'loc') as loc,
    (select value from t.dimensions.key_value where key = 'app') as app,
    value,
    event_id
    
  from table t
) select *
from temp;

My Output:

region   loc       app   count  event_id
null    null       20     140      1   
 8       1         null   150      2
 8       2         null.  600.     3

There are two thing i need to verify is my query correct ? How i can make the query generic i.e if i don’t know all the key, there some other keys may also be present in our dataset ?

Update:

My Schema : enter image description here

My OutPut : enter image description here

Problem : Let says a user want to do group by using region and loc so there is no easy way of writing the query for that i decided create a view so user can easily do group by

 with temp as (
      select 
        (select value from t.dimensions.key_value where key = 'region') as region,
        (select value from t.dimensions.key_value where key = 'loc') as loc,
        (select value from t.dimensions.key_value where key = 'store') as store,
        value,
        metric_name, event_time
        
      from table t
    ) select *
    from temp;

Based on this view the user can easily do group by. So i wanted to check if their is way to create generic view since we don’t know the all the unique key or is there a easy way to do groupby.

Advertisement

Answer

How i can make the query generic i.e if i don’t know all the key, there some other keys may also be present in our dataset ?

Consider below

execute immediate (select
  ''' select event_id, value, ''' || string_agg('''
    (select value from b.key_value where key = "''' || key_name || '''") as ''' || key_name , ''', ''')
  || '''
  from (
    select event_id, value,
      array(
        select as struct 
          json_extract_scalar(kv, '$.key') key, 
          json_extract_scalar(kv, '$.value') value
        from a.kvs kv
      ) key_value
    from `project.dataset.table`,
    unnest([struct(json_extract_array(dimensions, '$.key_value') as kvs)]) a
  ) b
  '''
  from (
    select distinct json_extract_scalar(kv, '$.key') key_name
    from `project.dataset.table`,
    unnest(json_extract_array(dimensions, '$.key_value')) as kv
  )
)    

if applied to sample data in your question – ooutput is

enter image description here

As you can see in query – there is no any explicit references to actual key names – rather they are dynamically extracted – so no need to know them in advance and there can be any number of them too

User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement