I have electric meter usage data for 22,000 meters that goes back 10 years (2012 to present) and it’s stored as hour intervals. So, each individual meter will have (10 years * 365 days * 24 hours) 87,600 records – one for each hour going back 10 years and the KW (usage) for that hour is recorded.
And for each month going back 10 years, I need to show the monthly KW peak and the Year, Month, Day and Hour it occurred grouped by the Substation, Feeder, and Rate. I don’t want to show any of the other hours, only the hour the summed KW peaked at for the month.
The table has around 2 billion records, and the raw records look like (I’m skipping hours, but each meter will have 24 records per day, hour 1 to 24, and the KW that was used for that hour):
Meter | Substation | Feeder | Rate | IntervalDate | IntervalHour | KW |
---|---|---|---|---|---|---|
11111 | 1 | W1 | Residential | 2022-06-12 | 1 | 2.7 |
11111 | 1 | W1 | Residential | 2022-06-12 | 2 | 5.9 |
11111 | 1 | W1 | Residential | 2022-06-12 | 24 | 8.3 |
22222 | 1 | W1 | Residential | 2022-06-13 | 1 | 4.8 |
22222 | 1 | W1 | Residential | 2022-06-13 | 2 | 2.0 |
22222 | 1 | W1 | Residential | 2022-06-13 | 24 | 3.7 |
11111 | 1 | W1 | Residential | 2022-07-25 | 1 | 10.5 |
11111 | 1 | W1 | Residential | 2022-07-25 | 2 | 4.5 |
11111 | 1 | W1 | Residential | 2022-07-25 | 24 | 4.5 |
22222 | 1 | W1 | Residential | 2022-07-25 | 1 | 15.5 |
22222 | 1 | W1 | Residential | 2022-07-25 | 2 | 7.5 |
22222 | 1 | W1 | Residential | 2022-07-25 | 24 | 5.5 |
33333 | 2 | G1 | Commercial | 2022-07-25 | 1 | 20.4 |
44444 | 3 | Z5 | Industrial | 2022-07-25 | 4 | 100.4 |
55555 | 5 | J5 | Commercial | 2022-07-25 | 24 | 5.7 |
The Meter can be ignored because it’s being grouped down to the Substation, Feeder, Rate, IntervalDate, IntervalHour level. There are 3 different rates, Residential, Commercial, and Industrial.
So, for every month for the past 10 years, I need to show the peak KW grouping by Substation, Feeder, and Rate and which Year, Day, and Hour the KW peak occurred.
@Greg_Pavlik below gets the summed totals using his first query:
WITH HOURLY_SUMS AS ( SELECT Substation, Feeder, Rate, IntervalDate, IntervalHour, SUM(KW) as TotalKW FROM T1 GROUP BY Substation, Feeder, Rate, IntervalDate, IntervalHour )
But, the result (not using above data) needs to look like this:
Year | Month | Day | Hour | Rate | Substation | Feeder | Summed KW |
---|---|---|---|---|---|---|---|
2012 | January | 23 | 8 | Residential | 3 | Z5 | 20.5 |
2012 | January | 15 | 9 | Commercial | 3 | Z5 | 40.7 |
2012 | January | 28 | 7 | Industrial | 3 | Z5 | 100.2 |
2012 | February | 23 | 8 | Residential | 3 | Z5 | 20.5 |
2012 | February | 15 | 9 | Commercial | 3 | Z5 | 40.7 |
2012 | February | 28 | 7 | Industrial | 3 | Z5 | 100.2 |
…. | …….. | .. | . | ………. | . | .. | ….. |
2022 | July | 09 | 6 | Residential | 3 | Z5 | 25.5 |
2022 | July | 12 | 5 | Commercial | 3 | Z5 | 48.7 |
2022 | July | 02 | 8 | Industrial | 3 | Z5 | 143.2 |
A substation has one or more feeders, which I’m not showing very well with my sample data. But my data has 33 different feeders.
So, the number of result records should be 11,880 (10 years * 12 months * 3 Rates * 33 Feeders).
I appreciate any guidance.
Advertisement
Answer
with agg as ( select Feeder, Substation, Rate, IntervalDate, IntervalHour, -- ties within a month are broken by picking the earliest row_number() over ( partition by Feeder, Substation, Rate, eomonth(IntervalDate) order by sum(KW) desc, IntervalDate, IntervalHour) as rn, sum(KW) as KW from T group by Feeder, Substation, Rate, IntervalDate, IntervalHour ) select year(IntervalDate) "Year", month(IntervalDate) "Month", day(IntervalDate) "Day", IntervalHour as "Hour", Rate, Substation, Feeder, KW as "Summed KW" from agg where rn = 1;