I have a table called device_data
that looks like below
Column | Type | Collation | Nullable | Default ----------------+-----------------------------+-----------+----------+------------------------------------------------- id | integer | | | nextval('device_data_id_seq'::regclass) date | timestamp without time zone | | | packet_drop | real | | | jitter | real | | | latency | real | | | alert | character varying(50) | | |
It basically stores the packet drops
, jitter
, latency
on a minute basis.
Now there is another column called alert
that holds the value HIGH
, MEDIUM
and LOW
based on a threshold and holds empty string
if the threshold is not met.
Now I do an hourly average on the table, my query looks like below
select date_trunc('hour', date) as hourly, avg(jitter), avg(latency), avg(packet_drop) from device_data where date between '2022-11-26 17:41:11' and '2022-11-27 17:41:11' group by hourly order by hourly;
Output
hourly | avg | avg | avg ---------------------+--------------------+--------------------+--------------------- 2022-11-26 17:00:00 | 3.52857138642243 | 2.771428568022592 | 0 2022-11-26 18:00:00 | 2.484615419346553 | 2.815384602546692 | 0 2022-11-26 19:00:00 | 2.218461540570626 | 2.723076921242934 | 0 2022-11-26 20:00:00 | 5.098461512992015 | 2.7076923021903405 | 0 2022-11-26 21:00:00 | 2.0060606116824076 | 2.6469696814363655 | 0 2022-11-26 22:00:00 | 5.672307815345434 | 2.810769222332881 | 0 2022-11-26 23:00:00 | 2.7828124976949766 | 2.893749985843897 | 0 2022-11-27 00:00:00 | 2.6046153992414474 | 2.8030769238105187 | 0 2022-11-27 01:00:00 | 3.846031717837803 | 2.8333333200878568 | 0 ... (25 rows)
Next I do an hourly average on the table with additional column alert
, so my query looks something like below
select date_trunc('hour', date) as hourly, avg(jitter), avg(latency), avg(packet_drop), alert from device_data where date between '2022-11-26 17:41:11' and '2022-11-27 17:41:11' group by hourly, alert order by hourly;
Output
hourly | avg | avg | avg | alert ---------------------+--------------------+--------------------+---------------------+---------------- 2022-11-26 17:00:00 | 1.2649999938905239 | 2.755000001192093 | 0 | 2022-11-26 17:00:00 | 48.79999923706055 | 3.0999999046325684 | 0 | MEDIUM 2022-11-26 18:00:00 | 2.484615419346553 | 2.815384602546692 | 0 | 2022-11-26 19:00:00 | 2.218461540570626 | 2.723076921242934 | 0 | 2022-11-26 20:00:00 | 2.6603174461495307 | 2.695238092589 | 0 | 2022-11-26 20:00:00 | 106.5999984741211 | 3.0999999046325684 | 0 | HIGH 2022-11-26 20:00:00 | 57.20000076293945 | 3.0999999046325684 | 0 | MEDIUM 2022-11-26 21:00:00 | 2.0060606116824076 | 2.6469696814363655 | 0 | 2022-11-26 22:00:00 | 2.8349206649831364 | 2.793650784189739 | 0 | 2022-11-26 22:00:00 | 95.05000305175781 | 3.350000023841858 | 0 | HIGH 2022-11-26 23:00:00 | 2.7828124976949766 | 2.893749985843897 | 0 | 2022-11-27 00:00:00 | 2.132812526775524 | 2.796875 | 0 | 2022-11-27 00:00:00 | 32.79999923706055 | 3.200000047683716 | 0 | LOW 2022-11-27 01:00:00 | 1.849999995962266 | 2.822580631702177 | 0 | 2022-11-27 01:00:00 | 127.5999984741211 | 3.5 | 0 | HIGH ... (35 rows)
As you can see in my second query, some of the rows have duplicate timestamps like 2022-11-26 17:00:00
, 2022-11-26 20:00:00
and so on.
I understand that when averaging out the columns, in the case of alert
column, it encounters multiple values be it empty string along with some HIGH
, MEDIUM
or LOW
strings or some other combinations resulting in multiple rows for the same timestamp.
Requirement
What I want is to simply average out the values (initially ignoring the alert
column values) and then check for HIGH
string in the alert
column for that particular hour (I am doing hourly average). If the string is present, simply assign HIGH
for that row in the alert
column. If no HIGH
string found, just assign empty string
for that row in the alert
column. Something like below
Expected output
hourly | avg | avg | avg | alert ---------------------+--------------------+--------------------+---------------------+---------------- 2022-11-26 17:00:00 | 3.52857138642243 | 2.771428568022592 | 0 | 2022-11-26 18:00:00 | 2.484615419346553 | 2.815384602546692 | 0 | 2022-11-26 19:00:00 | 2.218461540570626 | 2.723076921242934 | 0 | 2022-11-26 20:00:00 | 5.098461512992015 | 2.7076923021903405 | 0 | HIGH 2022-11-26 21:00:00 | 2.0060606116824076 | 2.6469696814363655 | 0 | 2022-11-26 22:00:00 | 5.672307815345434 | 2.810769222332881 | 0 | HIGH 2022-11-26 23:00:00 | 2.7828124976949766 | 2.893749985843897 | 0 | 2022-11-27 00:00:00 | 2.6046153992414474 | 2.8030769238105187 | 0 | 2022-11-27 01:00:00 | 3.846031717837803 | 2.8333333200878568 | 0 | HIGH ... (25 rows)
I thought of only targeting rows containing HIGH
value in alert
column by doing something like where alert = 'HIGH'
, but then it will drop other rows and affect my average calculation.
How can I do this?
Advertisement
Answer
You need to use an aggregate function for the column ‘alert’, as well. Since you want to only bring up a certain value, you can use a conditional expression inside the aggregate, like this:
select date_trunc('hour', date) as hourly , avg(jitter) , avg(latency) , avg(packet_drop) , coalesce(MAX(case when alert='HIGH' then 'HIGH' end),'') as alert from device_data where date between '2022-11-26 17:41:11' and '2022-11-27 17:41:11' group by hourly order by hourly;
This should work, because the MAX function ignores the NULL values (all alert values other than HIGH will cause NULL result in the case expression.