I am trying to understand how nested SUMs with OVER clause work in SQL. Let’s say i have the following schema in my DB
Time(timeID, date, month, year); District(districtID, districtName, region, state); Delivery(destinationDistrictID, timeID, numberOfDeliveries);
and suppose that i want to calculate the (number of deliveries of each region)/(number of deliveries of the state) on the same year. The query to do the calculation is:
SELECT region, year, (state), SUM(NumberOfDeliveries)/SUM(SUM(NumberOfDeliveries)) OVER (PARTITION BY state, year), FROM Delivery del, Time t, District dis WHERE t.timeID = del.timeID AND dis.DistrictID = del.destinationDistrictID GROUP BY region, year, state
I can’t understand how the PARTITION BY works in this case. Particularly i can’t understand which data are passed to the analytic function (OVER clause).
Does the PARTITION BY
works on the whole dataset or on the rows aggregated by the GROUP BY?
Thanks a lot to everyone.
Advertisement
Answer
Window functions are set based, and generally evaluate last in the query, after everything else has evaluated. Here is your call to SUM()
again:
SUM(SUM(NumberOfDeliveries)) OVER (PARTITION BY state, year)
This will evaluate after the GROUP BY
, at which point the intermediate table will have one record for each region, state, and year. Your analytic function call sums the sum over a window of all regions in each state and year. So your expression can be summarized as:
total number of deliveries in each region, state, and year / total number of deliveries in each state and year