This is probably a simple problem but I am quite a noob in SQL. I am using Impala. So I have data like this:
New_ID | Date | Old_ID |
---|---|---|
1 | 2020-11-14 12:41:21 | 0 |
1 | 2020-11-14 12:50:40 | 1 |
2 | 2020-10-14 15:22:00 | 1.5 |
2 | 2020-12-18 11:31:05 | 2 |
3 | 2020-11-14 12:42:25 | 3 |
Assuming that I group by New_ID, I need to check that the difference between the date and the date immediately following it (if one exists) is less that 2 months (just gonna assume that’s 60 days). If the difference is greater than 2 months then I need to change the New_ID to Old_ID. If it’s less than or equal to 2 months, then the New_ID can remain the same. Essentially, I would like the new table to look like this:
New_ID | Date | Old_ID |
---|---|---|
1 | 2020-11-14 12:41:21 | 0 |
1 | 2020-11-14 12:50:40 | 1 |
1.5 | 2020-10-14 15:22:00 | 1.5 |
2 | 2020-12-18 11:31:05 | 2 |
3 | 2020-11-14 12:42:25 | 3 |
I have tried this code snippit and variations of it, but 1. I am not sure how to handle null values and 2. I keep getting a syntax error ‘could not resolve column/field reference ‘day’ ‘
SELECT New_ID, Old_ID, Date, LAG(Date) OVER(partition by New_ID ORDER BY Date) as previous_date, case when datediff(day, previous_date, Date)/30.0 >= 2 then Old_ID else New_ID end as 'new_identifier' From MYTABLE;
Any pointers/suggestions would be greatly appreciated.
Advertisement
Answer
The Impala date function is months_between()
— and previous_date
is not recognized so you need to repeat the expression:
SELECT New_ID, Old_ID, Date, LAG(Date) OVER (partition by New_ID ORDER BY Date) as previous_date, (case when months_between(date, LAG(Date) OVER (partition by New_ID ORDER BY Date)) >= 2 then Old_ID else New_ID end) as new_identifier From MYTABLE;