I have a database that monitors a network (snapshots table, that contains a snapshot_date column). This production database was flooded by a faulty crontab, resulting in many snapshots for the same device every day.
I don’t won’t to remove everything, but i want to keep only one snapshot per snapshot_date and per device_id (column type is “timestamp without time zone”) so to reduce the number of entries in this table.
I don’t know any simple mechanism to do this in plain SQL. Can this be achieved ?
Advertisement
Answer
One option uses distinct on
:
select distinct on (snapshot_date, device_id) * from mytable order by snapshot_date, device_id, snapshot_id
This retains the one row per snapshot_date
and device_id
that has the smalles snapshot_id
. Note that this assumes that snapshot_id
is unique (or, at least, is unique for each (snapshot_date, device_id)
tuple).
If you wanted a delete
statement, then:
delete from mytable t using ( select snapshot_date, device_id, min(snapshot_id) snapshot_id from mytable group by snapshot_date, device_id ) t1 where t.snapshot_date = t1.snapshot_date and t.device_id = t1.device_id and t.snapshot_id < t1.id