Running the following code:
    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.sum
    import spark.implicits._

    val sales = Seq(
      (0, 0, 0, 5),
      (1, 0, 1, 3),
      (2, 0, 2, 1),
      (3, 1, 0, 2),
      (4, 2, 0, 8),
      (5, 2, 2, 8)
    ).toDF("id", "orderID", "prodID", "orderQty")

    val orderedByID = Window.orderBy($"id")
    val totalQty = sum($"orderQty").over(orderedByID).as("running_total")
    val salesTotalQty = sales.select($"*", totalQty).orderBy($"id")
    salesTotalQty.show()
The result is:
    +---+-------+------+--------+-------------+
    | id|orderID|prodID|orderQty|running_total|
    +---+-------+------+--------+-------------+
    |  0|      0|     0|       5|            5|
    |  1|      0|     1|       3|            8|
    |  2|      0|     2|       1|            9|
    |  3|      1|     0|       2|           11|
    |  4|      2|     0|       8|           19|
    |  5|      2|     2|       8|           27|
    +---+-------+------+--------+-------------+
No window frame is defined in the code above, so it looks like the default window frame is rowsBetween(Window.unboundedPreceding, Window.currentRow).
I am not sure whether my understanding of the default window frame is correct.
Answer
From Spark Gotchas:
The default frame specification depends on other aspects of a given window definition:
- if the ORDER BY clause is specified and the function accepts the frame specification, then the frame specification is defined by RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW,
- otherwise the frame specification is defined by ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.
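In the question's data the ordering column `id` is unique, so the default RANGE frame and the ROWS frame guessed in the question happen to produce identical running totals; the two only diverge when the ordering key contains ties. Below is a minimal sketch (not part of the quoted source) that makes both frames explicit. It assumes a spark-shell session where `spark.implicits._` is in scope, and the `ties` dataset is a hypothetical example:

    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.sum
    import spark.implicits._

    // Hypothetical data with a tie on the ordering key (key = 1 appears twice).
    val ties = Seq((0, 5), (1, 3), (1, 1), (2, 2)).toDF("key", "qty")

    // RANGE frame: the boundary is defined by the ordering *value*, so all
    // rows tied on `key` share the same frame and hence the same total.
    // This is what the implicit default expands to when ORDER BY is present.
    val byRange = Window.orderBy($"key")
      .rangeBetween(Window.unboundedPreceding, Window.currentRow)

    // ROWS frame: the boundary is the current *physical* row, so tied rows
    // can receive different running totals.
    val byRows = Window.orderBy($"key")
      .rowsBetween(Window.unboundedPreceding, Window.currentRow)

    ties.select(
      $"key", $"qty",
      sum($"qty").over(byRange).as("range_total"),
      sum($"qty").over(byRows).as("rows_total")
    ).show()

    // Both key = 1 rows report range_total = 9 (5 + 3 + 1), while their
    // rows_total values differ (8, then 9); how rows_total splits within
    // the tie depends on the physical order of the tied rows.

Because the question sorts on a unique key and leaves the frame implicit, rangeBetween and rowsBetween agree on every row, which is why the observed output matches the rowsBetween hypothesis even though the actual default is the RANGE frame.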