I want to split a single row into multiple rows based on time.
SrNo Employee StartDate EndDate --------------------------------------------------------------------------- 1 emp1 30/03/2020 09:00:00 31/03/2020 07:15:00 2 emp2 01/04/2020 09:00:00 02/04/2020 08:00:00
Expected output is below:
SrNo Employee StartDate EndDate --------------------------------------------------------------------------- 1 emp1 30/03/2020 09:00:00 30/03/2020 11:59:00 1 emp1 31/03/2020 00:00:00 31/03/2020 07:15:00 2 emp2 01/04/2020 09:00:00 01/04/2020 11:59:00 2 emp2 02/04/2020 00:00:00 02/04/2020 08:00:00
Day start from 00:00 AM to next day 00:00 AM. When EndDate
time is greater than 00:00 AM (midnight) then split this date in two rows. First row end date is 30/03/2020 11:59:00 and next row start 31/03/2020 00:00:00.
Please help me to get is solved.
Advertisement
Answer
This would be a good spot for a recursive CTE, but unfortunatly Hive does not support those. Here is another aproach, that uses a derived table of numbers to split the periods:
select t.SrNo, t.Employee, greatest(t.startDate, date_add(to_date(t.startDate), x.n)) startDate, least(t.endDate, date_add(to_date(t.startDate), x.n + 1)) endDate from mytable t inner join (select 0 n union all select 1 union all select 2) x on date_add(to_date(t.startDate), x.n) <= t.endDate
You can expand the subquery to handle more possible periods per row.
Also note that this generates half-open intervals, where the end of the preceding interval is equal to the start of the next one (while in your resultset there is a one minute lag). The logic is that the interval is inclusive on its smaller bound and exclusive on the the outer bound (that way, you make sure to not leave any gap).