Skip to content
Advertisement

Hive SQL – time interval in 5 minutes

My data is too large to analyze since it is collecting every second or so. To reduce the data, I will like to group into interval of 5 minutes. I tried converting into unix timestamp and reverting it back, but didn’t work.

I tried something like this

SELECT operation, 
value, 
from_unixtime(unix_timestamp(update_time)/300) as interval_5
FORM storage


Original data or output

operation   value   update_time
---------- ------- --------------------
sensor       190    2021-03-18 22:37:04
sensor       191    2021-03-18 22:37:05
sensor       190    2021-03-18 22:37:06
sensor       192    2021-03-18 22:37:07
sensor       193    2021-03-18 22:37:08

Desired output

operation   value   update_time
---------- ------- --------------------
sensor       190    2021-03-18 22:37:04
sensor       195    2021-03-18 22:42:04
sensor       197    2021-03-18 22:47:04
sensor       194    2021-03-18 22:52:04
sensor       190    2021-03-18 22:57:04

Advertisement

Answer

I don’t know Hive, but make sure that the division is an integr division truncating fractions, and then multiply back, losing the last fractional 300 seconds in the process:

from_unixtime(unix_timestamp(update_time)/300*300) as interval_5

If Hive Sql does not truncate (upon integer division by 300) you have to use a function like FLOOR to do so.

User contributions licensed under: CC BY-SA
5 People found this is helpful
Advertisement