google bigQuery subqueries to joins

I have the following table. basically simplified version of my table. I need to aggregate few columns, I will explain what I am trying to do and also what I have written till now.

tableName

food.id                     STRING  NULLABLE    
food.basket.id              STRING  NULLABLE
food.foodType               STRING  NULLABLE
food.price                  INTEGER NULLABLE    
food.printed                BOOLEAN NULLABLE    
food.variations             RECORD  REPEATED    
food.variations.id          INTEGER REPEATED    
food.variations.amount      INTEGER NULLABLE

​x
 
food.id                     STRING  NULLABLE    food.basket.id              STRING  NULLABLEfood.foodType               STRING  NULLABLEfood.price                  INTEGER NULLABLE    food.printed                BOOLEAN NULLABLE    food.variations             RECORD  REPEATED    food.variations.id          INTEGER REPEATED    food.variations.amount      INTEGER NULLABLE​

Sample data
id     basket.id.    foodType.    price.    printed.    variations.id      variations.amount
1.      abbcd.         JUNK.       100.       TRUE.        NULL.            NULL
2.      cdefg.         PIZZA.      200.       TRUE.        1234.            10
                                                           2345.            20
                                                           5678.            20
3.      abbcd.         JUNK.       200.       FALSE.       1234.            10
4.      uiwka.         TOAST.      500.       FALSE.       NULL.            NULL

 
Sample dataid     basket.id.    foodType.    price.    printed.    variations.id      variations.amount1.      abbcd.         JUNK.       100.       TRUE.        NULL.            NULL2.      cdefg.         PIZZA.      200.       TRUE.        1234.            10                                                           2345.            20                                                           5678.            203.      abbcd.         JUNK.       200.       FALSE.       1234.            104.      uiwka.         TOAST.      500.       FALSE.       NULL.            NULL​

variations can be like pizza toppings, each variation has an amount, say veggie toppings cost 10 cent and meat toppings cost 20 cents for simplicity

so now I am trying to aggregate some data for this table

I am trying to get

number of items printed (items where printed = TRUE)
number of items unprinted (items where printed = FALSE)
total cost of all items
total price of all variations
total number of unique baskets for a specific foodType

This is the query I have:

select SUM(CASE When item.printed = TRUE Then 1 Else 0 End ) as printed,
       SUM(CASE When item.printed = FALSE Then 1 Else 0 End) as nonPrinted,
       SUM(item.price) as price,
       (select COUNT(DISTINCT(item.basket.id)) from tableName where itemType = "JUNK") AS baskets,
       (select SUM(CASE when m.amount is NULL then 0 Else m.amount END) as variations_total from tableName, UNNEST(item.variations) as m) as variations
 from tableName;

 
select SUM(CASE When item.printed = TRUE Then 1 Else 0 End ) as printed,       SUM(CASE When item.printed = FALSE Then 1 Else 0 End) as nonPrinted,       SUM(item.price) as price,       (select COUNT(DISTINCT(item.basket.id)) from tableName where itemType = "JUNK") AS baskets,       (select SUM(CASE when m.amount is NULL then 0 Else m.amount END) as variations_total from tableName, UNNEST(item.variations) as m) as variations from tableName;​

printed.     unprinted.    price.    baskets.    variations. 
   2.            2.         1000.       1.            60

 
printed.     unprinted.    price.    baskets.    variations.    2.            2.         1000.       1.            60​

Now I get the result that I expect. I am trying to understand if we can do this without using subqueries and use only joins?

Answer

Below is for BigQuery Standard SQL and assumes that your query is really working (saying this because your data example does not exactly fit into query you provided)

So, below two subqueries

(select COUNT(DISTINCT(item.basket.id)) from tableName where itemType = "JUNK") AS baskets,
(select SUM(CASE when m.amount is NULL then 0 Else m.amount END) as variations_total from tableName, UNNEST(item.variations) as m) as variations

 
(select COUNT(DISTINCT(item.basket.id)) from tableName where itemType = "JUNK") AS baskets,(select SUM(CASE when m.amount is NULL then 0 Else m.amount END) as variations_total from tableName, UNNEST(item.variations) as m) as variations    ​

can be replace with

COUNT(DISTINCT IF(itemType = "JUNK", item.basket.id, NULL)) AS baskets,
SUM((SELECT SUM(amount) FROM item.variations)) AS variations

 
COUNT(DISTINCT IF(itemType = "JUNK", item.basket.id, NULL)) AS baskets,SUM((SELECT SUM(amount) FROM item.variations)) AS variations   ​

Believe me or not – but result will be the same

Row printed nonPrinted  price   baskets variations   
1   2       2           1000    1       60

 
Row printed nonPrinted  price   baskets variations   1   2       2           1000    1       60     ​

So, as you can see yo don’t need subqueries and you don’t need joins here either

Note: in the second row – (SELECT SUM(amount) FROM item.variations) is not really the same type of subquery as in your original query. Rather here for each row you query its array to find sum of amount in that row which is then being aggregated to total sum …
Hope you get it

Advertisement

Answer