I have table which contains power values (kW) for devices. Values are read from each device once a minute and inserted into table with timestamp. What I need to do is calculate power consumption (kWh) for given time span and return 10 most power consuming devices. Right now I query results for given time span and do calculation in backend looping all records. This works fine with small amount of devices and with short time span, but in real use case I could have thousands of devices and long time span.
So my question is how could I do this all in PostgreSQL 9.4.4 so that my query would return only 10 most power consuming (device_id, power_consumption) pairs?
Example table:
CREATE TABLE measurements ( id serial primary key, device_id integer, power real, created_at timestamp );
Simple data example:
| id | device_id | power | created_at | |----|-----------|-------|--------------------------| | 1 | 1 | 10 | August, 26 2015 08:23:25 | | 2 | 1 | 13 | August, 26 2015 08:24:25 | | 3 | 1 | 12 | August, 26 2015 08:25:25 | | 4 | 2 | 103 | August, 26 2015 08:23:25 | | 5 | 2 | 134 | August, 26 2015 08:24:25 | | 6 | 2 | 2 | August, 26 2015 08:25:25 | | 7 | 3 | 10 | August, 26 2015 08:23:25 | | 8 | 3 | 13 | August, 26 2015 08:24:25 | | 9 | 3 | 20 | August, 26 2015 08:25:25 |
Wanted results for query:
| id | device_id | power_consumption | |----|-----------|-------------------| | 1 | 1 | 24.0 | | 2 | 2 | 186.5 | | 3 | 3 | 28.0 |
Simplified example (created_at in hours) how I calculate kWh value:
data = [ [ { 'id': 1, 'device_id': 1, 'power': 10.0, 'created_at': 0 }, { 'id': 2, 'device_id': 1, 'power': 13.0, 'created_at': 1 }, { 'id': 3, 'device_id': 1, 'power': 12.0, 'created_at': 2 } ], [ { 'id': 4, 'device_id': 2, 'power': 103.0, 'created_at': 0 }, { 'id': 5, 'device_id': 2, 'power': 134.0, 'created_at': 1 }, { 'id': 6, 'device_id': 2, 'power': 2.0, 'created_at': 2 } ], [ { 'id': 7, 'device_id': 3, 'power': 10.0, 'created_at': 0 }, { 'id': 8, 'device_id': 3, 'power': 13.0, 'created_at': 1 }, { 'id': 9, 'device_id': 3, 'power': 20.0, 'created_at': 2 } ] ] # device_id: power_consumption results = { 1: 0, 2: 0, 3: 0 } for d in data: for i in range(0, len(d)): if i < len(d)-1: # Area between two records gives us kWh # X-axis is time(h) # Y-axis is power(kW) x1 = d[i]['created_at'] x2 = d[i+1]['created_at'] y1 = d[i]['power'] y2 = d[i+1]['power'] # Area between two records gives us kWh # X-axis is time(h) # Y-axis is power(kW) x1 = d[i]['created_at'] x2 = d[i+1]['created_at'] y1 = d[i]['power'] y2 = d[i+1]['power'] results[d[i]['device_id']] += ((x2-x1)*(y2+y1))/2 print results
EDIT: Check this to see how I ended up solving this.
Advertisement
Answer
If someone happens to wonder same thing here is how I solved this.
I followed instructions by David and made this:
SELECT t.device_id, sum(len_y*(extract(epoch from date_trunc('milliseconds', len_x)))/7200) AS total FROM ( SELECT m.id, m.device_id, m.power, m.created_at, m.power+lag(m.power) OVER ( PARTITION BY device_id ORDER BY m.created_at ) AS len_y, m.created_at-lag(m.created_at) OVER ( PARTITION BY device_id ORDER BY m.created_at ) AS len_x FROM mes AS m WHERE m.created_at BETWEEN '2015-08-26 13:39:57.834674'::timestamp AND '2015-08-26 13:43:57.834674'::timestamp ) AS t GROUP BY t.device_id ORDER BY total DESC LIMIT 10;