
HiveQL – Aggregating hourly data to daily

I’m very new to HiveQL and would like some help with a very simple issue.

I have a table of hourly data; a simplified example is below:

datetimetimezone  ID  value
202007130000+0100 001 2
202007130100+0100 001 3
202007130200+0100 001 4
202007130300+0100 001 1
202007140000+0100 002 3
202007140100+0100 002 7
202007140200+0100 002 9
202007140300+0100 002 2

The goal is to create an output where I have datetimetimezone stripped down to “yyyymmdd” format with values being aggregated (SUM) from hourly to daily. So I would end up with a new output like the below:

datetimetimezone  ID  value
20200713          001 10
20200714          002 21

I know I can use substr(datetimetimezone, 0, 8) to obtain the required date format, but I am unsure how to aggregate the values for each ID from hourly to daily, as the second table above shows. Any help is welcome.


Answer

You seem to want the following aggregation query:

select
    substr(datetimetimezone, 1, 8) yyyymmdd,
    id,
    sum(value) value
from mytable
group by substr(datetimetimezone, 1, 8), id

Note that substr() positions are 1-based, so the first eight characters start at position 1 rather than 0.
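If you want to try the query without creating a table first, here is a minimal sketch that inlines the sample rows from the question as a CTE. It assumes a Hive version that supports CTEs and SELECT without a FROM clause (0.13 or later, as far as I know); the CTE name mytable and the output alias yyyymmdd are just placeholders.

```sql
-- Hypothetical inline copy of the question's sample rows, so no real table is needed.
with mytable as (
    select '202007130000+0100' as datetimetimezone, '001' as id, 2 as value
    union all select '202007130100+0100', '001', 3
    union all select '202007130200+0100', '001', 4
    union all select '202007130300+0100', '001', 1
    union all select '202007140000+0100', '002', 3
    union all select '202007140100+0100', '002', 7
    union all select '202007140200+0100', '002', 9
    union all select '202007140300+0100', '002', 2
)
select
    -- Keep only the leading yyyymmdd part of the timestamp string.
    substr(datetimetimezone, 1, 8) as yyyymmdd,
    id,
    -- Collapse the hourly rows of each (day, id) pair into one total.
    sum(value) as value
from mytable
group by substr(datetimetimezone, 1, 8), id
order by yyyymmdd, id;
```

With the sample data this should give the two rows from the question: 20200713 / 001 / 10 and 20200714 / 002 / 21. The group by repeats the substr() expression instead of using the alias, which avoids depending on whether your Hive configuration allows aliases in group by.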

User contributions licensed under: CC BY-SA