I have the following table. basically simplified version of my table. I need to aggregate few columns, I will explain what I am trying to do and also what I have written till now.
tableName
food.id STRING NULLABLE food.basket.id STRING NULLABLE food.foodType STRING NULLABLE food.price INTEGER NULLABLE food.printed BOOLEAN NULLABLE food.variations RECORD REPEATED food.variations.id INTEGER REPEATED food.variations.amount INTEGER NULLABLE
Sample data id basket.id. foodType. price. printed. variations.id variations.amount 1. abbcd. JUNK. 100. TRUE. NULL. NULL 2. cdefg. PIZZA. 200. TRUE. 1234. 10 2345. 20 5678. 20 3. abbcd. JUNK. 200. FALSE. 1234. 10 4. uiwka. TOAST. 500. FALSE. NULL. NULL
variations can be like pizza toppings, each variation has an amount, say veggie toppings cost 10 cent and meat toppings cost 20 cents for simplicity
so now I am trying to aggregate some data for this table
I am trying to get
- number of items printed (items where printed = TRUE)
- number of items unprinted (items where printed = FALSE)
- total cost of all items
- total price of all variations
- total number of unique baskets for a specific foodType
This is the query I have:
select SUM(CASE When item.printed = TRUE Then 1 Else 0 End ) as printed, SUM(CASE When item.printed = FALSE Then 1 Else 0 End) as nonPrinted, SUM(item.price) as price, (select COUNT(DISTINCT(item.basket.id)) from tableName where itemType = "JUNK") AS baskets, (select SUM(CASE when m.amount is NULL then 0 Else m.amount END) as variations_total from tableName, UNNEST(item.variations) as m) as variations from tableName;
printed. unprinted. price. baskets. variations. 2. 2. 1000. 1. 60
Now I get the result that I expect. I am trying to understand if we can do this without using subqueries and use only joins?
Advertisement
Answer
Below is for BigQuery Standard SQL and assumes that your query is really working (saying this because your data example does not exactly fit into query you provided)
So, below two subqueries
(select COUNT(DISTINCT(item.basket.id)) from tableName where itemType = "JUNK") AS baskets, (select SUM(CASE when m.amount is NULL then 0 Else m.amount END) as variations_total from tableName, UNNEST(item.variations) as m) as variations
can be replace with
COUNT(DISTINCT IF(itemType = "JUNK", item.basket.id, NULL)) AS baskets, SUM((SELECT SUM(amount) FROM item.variations)) AS variations
Believe me or not – but result will be the same
Row printed nonPrinted price baskets variations 1 2 2 1000 1 60
So, as you can see yo don’t need subqueries and you don’t need joins here either
Note: in the second row – (SELECT SUM(amount) FROM item.variations)
is not really the same type of subquery as in your original query. Rather here for each row you query its array to find sum of amount in that row which is then being aggregated to total sum …
Hope you get it