I have a table in hive contents are below
+---+----------+----------+----------+--------+ | pk| from_d| to_d| load_date| row_num| +---+----------+----------+----------+--------+ |111|2019-03-03|2019-03-03|2019-03-03| 1| |111|2019-02-02|2019-02-02|2019-02-02| 2| |111|2019-01-01|2019-01-01|2019-01-01| 3| |222|2019-03-03|2019-03-03|2019-03-03| 1| |222|2019-01-01|2019-01-01|2019-01-01| 2| |333|2019-02-02|2019-02-02|2019-02-02| 1| |333|2019-01-01|2019-01-01|2019-01-01| 2| |444|2019-02-02|2019-02-02|2019-02-02| 1| |555|2019-03-03|2019-03-03|2019-03-03| 1| +---+----------+----------+----------+--------+
Now I want to update the to_d
column where row_num > 1 using some conditions like below
when row_num = 2 then to_d column should have row_num 1 row's from_d - 1 day when row_num = 3 then to_d column should have row_num 2 row's from_d - 1 day and so on if row_num =1 then to_d should not be updated
Please don’t get me wrong I am just trying to help one of the other user’s question Updating column values based on the other table values in hive tables
I was trying this approach but I am unable proceed further from here
Advertisement
Answer
You can use LAG
function like below
select pk, from_d, case when row_num = 1 then to_d else date_sub(lag(to_d) over (), 1) end as to_d, row_num from table;
This will give you the desired result
+---+----------+----------+-------------------+ | pk| from_d| to_d|row_number_window_0| +---+----------+----------+-------------------+ |111|2019-03-03|2019-03-03| 1| |111|2019-02-02|2019-03-02| 2| |111|2019-01-01|2019-02-01| 3| |222|2019-03-03|2019-03-03| 1| |222|2019-01-01|2019-03-02| 2| |333|2019-02-02|2019-02-02| 1| |333|2019-01-01|2019-02-01| 2| |444|2019-02-02|2019-02-02| 1| |555|2019-03-03|2019-03-03| 1| +---+----------+----------+-------------------+