
Doing a sum of columns based on some complex logic in PySpark


Here is the question from the attached image:

Table: (given only as an image; the worked example below refers to rows 3-5)

The result column is calculated based on the rules below:

  1. If col3 > 0, then result = col1 + col2
  2. If col3 = 0, then result = sum(col2) until col3 > 0, plus col1 (from the row where col3 > 0)

For example, for row 3, the result = 60 + 70 + 80 + 30 (col1 from row 5, because col3 > 0 there) = 240. For row 4, the result = 70 + 80 + 30 (again col1 from row 5) = 180. Similarly for the other rows.


Answer

This answers (correctly, I might add) the original version of the question.

In SQL, you can express this using window functions. Use a cumulative sum to define the group, and then an additional cumulative sum within each group:
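A minimal sketch of that approach in Postgres (the dialect the fiddle uses). Only rows 3-5 of the table survive in the question text, so the sample rows below, including the ids, rows 1-2, and the col1 values for rows 3 and 4, are assumptions made for illustration:

```sql
with t (id, col1, col2, col3) as (
    values
        (1, 10, 20, 1),   -- assumed
        (2, 20, 40, 1),   -- assumed
        (3, 25, 60, 0),   -- col1 assumed; col2 from the example
        (4, 35, 70, 0),   -- col1 assumed; col2 from the example
        (5, 30, 80, 2)    -- col1, col2 from the example
)
select id, col1, col2, col3,
       (case when col3 > 0
             then col1 + col2                                          -- rule 1
             else sum(col2) over (partition by grp order by id desc) +
                  max(case when col3 > 0 then col1 end) over (partition by grp)
        end) as result                                                 -- rule 2
from (select t.*,
             -- scanning from the bottom row upward, every row with col3 > 0
             -- starts a new group that also covers the col3 = 0 rows above it
             sum(case when col3 > 0 then 1 else 0 end)
                 over (order by id desc) as grp
      from t
     ) t
order by id;
```

The inner cumulative sum assigns each col3 = 0 row to the same group as the first col3 > 0 row below it (this assumes every run of zeros is terminated by such a row). With the assumed data, the else branch returns 240 for row 3 and 180 for row 4, matching the worked example.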

Here is a db<>fiddle (which uses Postgres).

Note:

Your description says that the else logic should be:

    sum(col2) till col3 > 0, plus col1 from the row where col3 > 0

which, read literally, excludes that row's col2 from the sum.

Your example says:

    sum(col2) up to and including the row where col3 > 0, plus col1 from that row

(for row 3: 60 + 70 + 80 + 30 = 240).

And in my opinion, this seems most logical:

    sum(col2) over the col3 = 0 rows, plus col1 + col2 from the next row where col3 > 0

which produces the same numbers as your example; the query above follows the example.
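If the literal description were intended instead, only the else branch changes. A sketch under the same assumed sample data, trimmed here to rows 3-5:

```sql
with t (id, col1, col2, col3) as (
    values (3, 25, 60, 0), (4, 35, 70, 0), (5, 30, 80, 2)  -- assumed rows
)
select id, col1, col2, col3,
       (case when col3 > 0
             then col1 + col2
             -- literal reading: only the col3 = 0 rows contribute their col2
             else sum(case when col3 = 0 then col2 else 0 end)
                      over (partition by grp order by id desc) +
                  max(case when col3 > 0 then col1 end) over (partition by grp)
        end) as result
from (select t.*,
             sum(case when col3 > 0 then 1 else 0 end)
                 over (order by id desc) as grp
      from t
     ) t
order by id;
-- row 3 now yields 60 + 70 + 30 = 160 instead of the example's 240
```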
