I have a table of data about a user’s flight booking patterns on a website. Let’s assume the following data is all the historical data I have about my user.
The session_date
is the day that the user came onto the website and searched a specific route, while the flight_date
is the departure date of the flight. I have ordered the table by the session_date
. The outcome is recorded in booked
.
+---------+--------------+----------------+--------------+-------------+--------+ | user_id | session_date | departure_code | arrival_code | flight_date | booked | +---------+--------------+----------------+--------------+-------------+--------+ | user1 | 7 Jan | CA | MY | 8 Mar | 1 | | user1 | 8 Jan | US | MY | 18 May | 0 | | user1 | 8 Jan | US | MY | 18 May | 1 | | user1 | 8 Jan | CA | MY | 19 Mar | 0 | | user1 | 9 Jan | US | MY | 18 May | 1 | +---------+--------------+----------------+--------------+-------------+--------+
I would like to output a new column in my table, called previous_flight_date
. The new column will state with each search, the previous booked flight_date
for that specific route. Even if a user has searched that same route multiple times but never booked, the value in this column will be empty.
+-------+--------------+----------------+--------------+-------------+--------+----------------------+ | _id | session_date | departure_code | arrival_code | flight_date | booked | previous_flight_date | +-------+--------------+----------------+--------------+-------------+--------+----------------------+ | user1 | 7 Jan | CA | SG | 8 Mar | 1 | null | | user1 | 8 Jan | US | MY | 18 May | 0 | null | | user1 | 8 Jan | US | MY | 18 May | 1 | null | | user1 | 8 Jan | CA | SG | 19 Mar | 0 | 8 Mar | | user1 | 2 Feb | US | MY | 2 Jul | 1 | 18 May | +-------+--------------+----------------+--------------+-------------+--------+----------------------+
so for example, the column will be null until the 4th row which reflects ‘8 Mar’, as the user had booked a flight from CA–>SG, to depart on that day.
I have tried using LAST_VALUE but it didn’t work. I also do not know how I can use LAG() when I have multiple different types of routes, and I want to find previous rows on a condition. Would be great if a solution was suggested! thank you.
Advertisement
Answer
I think you can do this with first_value()
. The trick is to put a condition within the window function, turn on the ignore nulls
option, and then use a window frame specifications that looks back on the previous rows having the same departure/arrival, not including the current row:
select t.*, first_value(case when booked = 1 then flight_date end ignore nulls) over( partition by departure_code, arrival code order by flight_date desc rows between unbounded preceding and 1 preceding ) previous_flight_date from mytable t
Actually a window max()
would work too (and then, no need for ignore nulls
):
select t.*, max(case when booked = 1 then flight_date end) over( partition by departure_code, arrival code order by flight_date desc rows between unbounded preceding and 1 preceding ) previous_flight_date from mytable t