Skip to content
Advertisement

BigQuery: lowest timestamp inside an array of objects and average between specific timestamps

Scenario: user has a subscription where the size of the product can be updated. This results in the following data set

Here’s a more visual example enter image description here

The subscription has a current size and an optional size history, which contains the updates and when they have been made.

My goals are to:

  • get the initial size of each subscription. If there is no history, it would be the createdAt timestamp. If there is history, it would be the “from” value of the sizeHistory array member the lowest timestamp
  • get the average value of the time it takes to go from one size to the other.

I got stuck on the first goal already without even considering the subscription without history case. Here’s what I have so far

The main reasoning was: for each subscription, get the correspondent sizeHistory element with the minimum timestamp. The problem is that the where condition is on the entire dataset, so I get just one subscriptions out (the one with the minimum timestamp).

Here’s an example of my ideal results structure (not based on the dummy data provided above):

I’m stuck on this for some days already, so any solution/help/hint would be really useful.

Advertisement

Answer

Please see my query below. I use several CTEs to break down the logic into manageable chunks; change the select statement at the end of the query to see what each is doing.

*Note: remove the interior CTE order by statements (but not in the window functions) before you run this against any actual data as it adds extra overhead.

User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement