I have 2 tables at different granularity in a BigQuery dataset. I need to join the two and roll them up using BigQuery SQL in such a way that the values in one of the columns of the 2nd table become the columns in the final table.
Table 1 – tb1 looks like this
user_id  event_date
A        2019-02-01
B        2019-02-10
C        2019-01-15
Table 2 – tb2 looks like this
user_id  activity_id  activity_date
A        1            2019-01-01
A        1            2019-02-05
A        2            2019-01-15
B        2            2019-02-02
B        3            2019-02-01
C        1            2019-01-02
I am trying to write a SQL query to create the final table, which tells us the number of records for each activity for a user_id where the activity_date is within (event_date – 90 days) for that user_id, i.e. the activity_date is in the 90 days leading up to the event_date. So, in this case, the output will look like this:
user_id  event_date  act_1  act_2  act_3
A        2019-02-01  1      1      0
B        2019-02-10  0      2      1
C        2019-01-15  1      0      0
act_1 column corresponds to activity_id=1 and so on.
There are some additional complications:
- The number of distinct activity_ids in Table 2 can change over time, so I don't know beforehand how many columns will be created in the output table.
- I cannot do this in Python; it has to be done in BQ. This is because the actual Table 2 is very large (42 TB with 31 bn rows), and pulling it out of BQ into another GCP product to run Python would be cumbersome.
Any help is appreciated.
Answer
Below is for BigQuery Standard SQL and is just to demonstrate the approach of pivoting the data.
If you knew in advance how many distinct activity_id values there are, and if that number were low (for example three, as in your sample data), you could do something as simple as this:
#standardSQL
SELECT
  user_id,
  event_date,
  COUNTIF(activity_id = 1) act_1,
  COUNTIF(activity_id = 2) act_2,
  COUNTIF(activity_id = 3) act_3
FROM `project.dataset.table1` t1
JOIN `project.dataset.table2` t2
USING(user_id)
GROUP BY user_id, event_date
ORDER BY user_id, event_date
If applied to the sample data in your question, the result will be:
Row  user_id  event_date  act_1  act_2  act_3
1    A        2019-02-01  2      1      0
2    B        2019-02-10  0      1      1
3    C        2019-01-15  1      0      0
But as you mentioned
The number of distinct activity_ids in Table 2 can change over time. So, I don’t know before hand how many columns will be created in the output table
So you need to generate the above query dynamically; below is an example of how to do that.
#standardSQL
WITH activities AS (
  SELECT DISTINCT activity_id
  FROM `project.dataset.table2`
), generate_query AS (
  SELECT CONCAT(
    'SELECT user_id, event_date',
    STRING_AGG(CONCAT(',COUNTIF(activity_id = ', CAST(activity_id AS STRING), ') act_', CAST(activity_id AS STRING)), ''),
    ' FROM `project.dataset.table1` t1 JOIN `project.dataset.table2` t2 USING(user_id) GROUP BY user_id, event_date ORDER BY user_id, event_date'
  ) AS query
  FROM activities
)
SELECT query FROM generate_query
Again, applied to your sample data, the result will be:
SELECT user_id, event_date,COUNTIF(activity_id = 1) act_1,COUNTIF(activity_id = 2) act_2,COUNTIF(activity_id = 3) act_3 FROM `project.dataset.table1` t1 JOIN `project.dataset.table2` t2 USING(user_id) GROUP BY user_id, event_date ORDER BY user_id, event_date
If you look closer at the above result, you can see that it is exactly the query we initially wrote manually, only now it was generated for us; and no matter how many distinct activity_id values you have (obviously the limit on the number of columns still applies), it will produce the needed query.
So now you just need to copy the query text from the above result and simply run it, which will produce the desired result:
Row  user_id  event_date  act_1  act_2  act_3
1    A        2019-02-01  2      1      0
2    B        2019-02-10  0      1      1
3    C        2019-01-15  1      0      0
As you can see, this is a two-step process, but you can script it using the client of your choice.
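If you want to avoid the copy-and-run step entirely, BigQuery scripting can do both steps in one pass. The sketch below is not part of the original answer and assumes EXECUTE IMMEDIATE is available in your project; the table names are the same placeholders as above.

#standardSQL
-- Sketch: build the pivot query into a variable, then run it in the same script
DECLARE gen_query STRING;

SET gen_query = (
  SELECT CONCAT(
    'SELECT user_id, event_date',
    STRING_AGG(
      CONCAT(',COUNTIF(activity_id = ', CAST(activity_id AS STRING), ') act_', CAST(activity_id AS STRING)),
      '' ORDER BY activity_id
    ),
    ' FROM `project.dataset.table1` t1 JOIN `project.dataset.table2` t2 USING(user_id)',
    ' GROUP BY user_id, event_date ORDER BY user_id, event_date'
  )
  FROM (SELECT DISTINCT activity_id FROM `project.dataset.table2`)
);

-- Runs the generated SELECT; the act_N columns come out in activity_id order
EXECUTE IMMEDIATE gen_query;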
Note: I focused on the substance of the question and have not addressed the 90-day filter at all; I feel it was a secondary detail in your question.
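That said, if you do want to fold the 90-day window in, one option (just a sketch, assuming event_date and activity_date are DATE columns and that the window means activity rows falling in the 90 days up to and including event_date) is to add a date condition before grouping:

#standardSQL
SELECT
  user_id,
  event_date,
  COUNTIF(activity_id = 1) act_1,
  COUNTIF(activity_id = 2) act_2,
  COUNTIF(activity_id = 3) act_3
FROM `project.dataset.table1` t1
JOIN `project.dataset.table2` t2
USING(user_id)
-- keep only activity rows inside the 90-day window ending at event_date
WHERE t2.activity_date BETWEEN DATE_SUB(t1.event_date, INTERVAL 90 DAY) AND t1.event_date
GROUP BY user_id, event_date
ORDER BY user_id, event_date

The same WHERE clause can be spliced into the dynamically generated query in the same spot. Note that with an inner join plus this filter, users with no qualifying activity rows disappear from the output; a LEFT JOIN with the date condition moved into the ON clause would keep them with zero counts.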