Skip to content
Advertisement

google bigQuery subqueries to joins

I have the following table. basically simplified version of my table. I need to aggregate few columns, I will explain what I am trying to do and also what I have written till now.

tableName

food.id                     STRING  NULLABLE    
food.basket.id              STRING  NULLABLE
food.foodType               STRING  NULLABLE
food.price                  INTEGER NULLABLE    
food.printed                BOOLEAN NULLABLE    
food.variations             RECORD  REPEATED    
food.variations.id          INTEGER REPEATED    
food.variations.amount      INTEGER NULLABLE
Sample data
id     basket.id.    foodType.    price.    printed.    variations.id      variations.amount
1.      abbcd.         JUNK.       100.       TRUE.        NULL.            NULL
2.      cdefg.         PIZZA.      200.       TRUE.        1234.            10
                                                           2345.            20
                                                           5678.            20
3.      abbcd.         JUNK.       200.       FALSE.       1234.            10
4.      uiwka.         TOAST.      500.       FALSE.       NULL.            NULL

variations can be like pizza toppings, each variation has an amount, say veggie toppings cost 10 cent and meat toppings cost 20 cents for simplicity

so now I am trying to aggregate some data for this table

I am trying to get

  • number of items printed (items where printed = TRUE)
  • number of items unprinted (items where printed = FALSE)
  • total cost of all items
  • total price of all variations
  • total number of unique baskets for a specific foodType

This is the query I have:

select SUM(CASE When item.printed = TRUE Then 1 Else 0 End ) as printed,
       SUM(CASE When item.printed = FALSE Then 1 Else 0 End) as nonPrinted,
       SUM(item.price) as price,
       (select COUNT(DISTINCT(item.basket.id)) from tableName where itemType = "JUNK") AS baskets,
       (select SUM(CASE when m.amount is NULL then 0 Else m.amount END) as variations_total from tableName, UNNEST(item.variations) as m) as variations
 from tableName;
printed.     unprinted.    price.    baskets.    variations. 
   2.            2.         1000.       1.            60

Now I get the result that I expect. I am trying to understand if we can do this without using subqueries and use only joins?

Advertisement

Answer

Below is for BigQuery Standard SQL and assumes that your query is really working (saying this because your data example does not exactly fit into query you provided)

So, below two subqueries

(select COUNT(DISTINCT(item.basket.id)) from tableName where itemType = "JUNK") AS baskets,
(select SUM(CASE when m.amount is NULL then 0 Else m.amount END) as variations_total from tableName, UNNEST(item.variations) as m) as variations    

can be replace with

COUNT(DISTINCT IF(itemType = "JUNK", item.basket.id, NULL)) AS baskets,
SUM((SELECT SUM(amount) FROM item.variations)) AS variations   

Believe me or not – but result will be the same

Row printed nonPrinted  price   baskets variations   
1   2       2           1000    1       60     

So, as you can see yo don’t need subqueries and you don’t need joins here either

Note: in the second row – (SELECT SUM(amount) FROM item.variations) is not really the same type of subquery as in your original query. Rather here for each row you query its array to find sum of amount in that row which is then being aggregated to total sum …
Hope you get it

Advertisement