I’m very new to HiveQL and would like some help with a very simple issue.
I have a table of hourly data. A simplified example is below:
datetimetimezone    ID    value
202007130000+0100   001   2
202007130100+0100   001   3
202007130200+0100   001   4
202007130300+0100   001   1
202007140000+0100   002   3
202007140100+0100   002   7
202007140200+0100   002   9
202007140300+0100   002   2
The goal is to create an output where I have datetimetimezone stripped down to “yyyymmdd” format with values being aggregated (SUM) from hourly to daily. So I would end up with a new output like the below:
datetimetimezone    ID    value
20200713            001   10
20200714            002   21
I know I can use substr(datetimetimezone, 0, 8) to obtain the required date format; however, I am unsure how to aggregate the values for each ID from hourly to daily, as the second table above shows. Any help is welcome.
Answer
You seem to want the following aggregation query:
select substr(datetimetimezone, 1, 8) yyyymmdd, id, sum(value) value
from mytable
group by substr(datetimetimezone, 1, 8), id
Note that substr() starts at 1 rather than 0.
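If you want to sanity-check the aggregation without a Hive cluster, here is a minimal sketch that runs the same query against the sample rows from the question. It assumes Python's built-in sqlite3 module as a stand-in for Hive, which is reasonable here only because SQLite's substr() is also 1-based. The function name daily_totals and the ORDER BY clause are additions for illustration and are not part of the original answer.

```python
import sqlite3

# Sample rows copied from the question: (datetimetimezone, id, value).
ROWS = [
    ("202007130000+0100", "001", 2),
    ("202007130100+0100", "001", 3),
    ("202007130200+0100", "001", 4),
    ("202007130300+0100", "001", 1),
    ("202007140000+0100", "002", 3),
    ("202007140100+0100", "002", 7),
    ("202007140200+0100", "002", 9),
    ("202007140300+0100", "002", 2),
]

# Same shape as the answer's query; ORDER BY is added only to make
# the output order deterministic for this check.
QUERY = """
SELECT substr(datetimetimezone, 1, 8) AS yyyymmdd,
       id,
       SUM(value) AS value
FROM mytable
GROUP BY substr(datetimetimezone, 1, 8), id
ORDER BY yyyymmdd, id
"""


def daily_totals(rows):
    """Load rows into an in-memory SQLite table and return daily sums per id."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE mytable (datetimetimezone TEXT, id TEXT, value INTEGER)"
        )
        conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in daily_totals(ROWS):
        print(row)
```

Each returned tuple is (yyyymmdd, id, summed value), one per day and ID. This only checks the grouping logic; Hive-specific behaviour should still be confirmed on the real table.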