I have a table with events, where start_dt
is a start and end_dt
an end of the event. This table is partitioned by dt
column derived from the end_dt
. This means that events that start before and end after midnight are present only in one partition. What I need to do is to split each event into as many rows as the number of dates the event was associated with. Is there any smart way to achieve it using the Presto SQL syntax?
Input:
id | start_dt | end_dt | dt ------+--------------------------+-------------------------+---------- 1 | 2020-09-24 21:56:12.669 | 2020-09-25 00:26:16.440 | 2020-09-25 2 | 2020-09-25 17:12:02.699 | 2020-09-25 17:42:02.699 | 2020-09-25 3 | 2020-09-23 23:47:29.146 | 2020-09-25 00:17:29.146 | 2020-09-25
Expected output:
id | start_dt | end_dt | dt ------+--------------------------+-------------------------+---------- 1 | 2020-09-24 21:56:12.669 | 2020-09-24 23:59:59.999 | 2020-09-24 1 | 2020-09-25 00:00:00.001 | 2020-09-25 00:26:16.440 | 2020-09-25 2 | 2020-09-25 17:12:02.699 | 2020-09-25 17:42:02.699 | 2020-09-25 3 | 2020-09-23 23:47:29.146 | 2020-09-23 23:59:59.999 | 2020-09-23 3 | 2020-09-24 00:00:00.001 | 2020-09-24 23:59:59.999 | 2020-09-24 3 | 2020-09-25 00:00:00.001 | 2020-09-25 00:17:29.146 | 2020-09-25
Advertisement
Answer
In Presto, you can use sequence()
to generate an array of dates. The rest is just unnesting and conditional logic:
select t.id, case when date(t.start_dt) = s.dt then t.start_dt else cast(s.dt as timestamp) end as new_start_dt, case when date(t.end_dt) = s.dt then t.end_dt else cast(s.dt as timestamp) + interval '1' day end as new_end_dt, s.dt from mytable t cross join unnest(sequence(date(t.start_dt), date(t.end_dt))) as s(dt)
Note that this generates dates intervals that start and end at midnight exactly: the half-open intervals logic makes more sense to me than the removing or adding a millisecond here and there. You can easily change that if you like.