My data is too large to analyze since it is collecting every second or so. To reduce the data, I will like to group into interval of 5 minutes. I tried converting into unix timestamp and reverting it back, but didn’t work.
I tried something like this
SELECT operation, value, from_unixtime(unix_timestamp(update_time)/300) as interval_5 FORM storage
Original data or output
operation value update_time ---------- ------- -------------------- sensor 190 2021-03-18 22:37:04 sensor 191 2021-03-18 22:37:05 sensor 190 2021-03-18 22:37:06 sensor 192 2021-03-18 22:37:07 sensor 193 2021-03-18 22:37:08
Desired output
operation value update_time ---------- ------- -------------------- sensor 190 2021-03-18 22:37:04 sensor 195 2021-03-18 22:42:04 sensor 197 2021-03-18 22:47:04 sensor 194 2021-03-18 22:52:04 sensor 190 2021-03-18 22:57:04
Advertisement
Answer
I don’t know Hive, but make sure that the division is an integr division truncating fractions, and then multiply back, losing the last fractional 300 seconds in the process:
from_unixtime(unix_timestamp(update_time)/300*300) as interval_5
If Hive Sql does not truncate (upon integer division by 300) you have to use a function like FLOOR to do so.