Skip to content
Advertisement

BigQuery: JOIN on single matching row

I have two tables, one containing orders with a nested line_items structure and another with a pricing history for each product sku code.

Orders Table

order_id order_date item_sku item_quantity item_subtotal
1 2022-23-07 SKU1 7 12.34
SKU2 1 9.99
2 2022-12-07 SKU1 1 1.12
SKU3 5 32.54

Price History Table

item_sku effective_date cost
SKU1 2022-20-07 0.78
SKU2 2022-02-03 4.50
SKU1 2022-02-03 0.56
SKU3 2022-02-03 4.32

Desired Output

order_id order_date item_sku item_quantity item_subtotal cost
1 2022-23-07 SKU1 7 12.34 0.78
SKU2 1 9.99 4.50
2 2022-12-07 SKU1 1 1.12 0.56
SKU3 5 32.54 4.32

I’m trying to get the product cost by finding the cost at the time of the order being placed.

SELECT order_id, order_date,
  ARRAY(
    SELECT AS STRUCT
    item_sku,
    item_quantity,
    item_subtotal,
    cost.product_cost
    FROM UNNEST(line_items) as items
    JOIN `price_history_table` as cost
    ON items.item_sku = cost.sku AND effective_date < order_date 
  ) AS line_items,
  FROM
  `order_data_table`

The above query works but creates a separate line_item array row for each record in the price history table.

How can I match on just the most recent price for that sku. I want to add something like this

ORDER BY effective_date DESC LIMIT 1

But can’t work out how to add it.

Advertisement

Answer

How can I match on just the most recent price for that sku

You need to add below line into subquery and move join out of select to address correlated subquery issue

QUALIFY 1 = ROW_NUMBER() OVER(PARTITION BY item_sku ORDER BY effective_date DESC)    

so, the final query will look like below

SELECT order_id, order_date,
  ARRAY_AGG(line_item) AS line_items
FROM (
  SELECT order_id, order_date,
      STRUCT(item_sku,
      item_quantity,
      item_subtotal,
      cost.product_cost) AS line_item
  FROM `order_data_table`, UNNEST(line_items) AS items
  JOIN `price_history_table` AS cost
  ON items.item_sku = cost.sku AND effective_date < order_date 
  QUALIFY 1 = ROW_NUMBER() OVER(PARTITION BY order_id, order_date, item_sku ORDER BY effective_date DESC)
)
GROUP BY order_id, order_date   

with output

enter image description here

User contributions licensed under: CC BY-SA
9 People found this is helpful
Advertisement