I have a table (Postgres 9.3) defined as follows:
CREATE TABLE tsrs (
id SERIAL PRIMARY KEY,
customer_id INTEGER NOT NULL REFERENCES customers,
timestamp TIMESTAMP WITHOUT TIME ZONE,
licensekeys_checksum VARCHAR(32));
The pertinent details here are the customer_id, the timestamp, and the licensekeys_checksum. There can be multiple entries with the same customer_id; some of those may have matching licensekeys_checksum values, and some may differ. There will never be two rows with both the same checksum and the same timestamp.
I want to return a table containing 1 row for each group of rows with matching licensekeys_checksum entries. The row returned for each group should be the one with the newest / most recent timestamp.
Sample Input:
1, 2, 2014-08-21 16:03:35, 3FF2561A
2, 2, 2014-08-22 10:00:41, 3FF2561A
2, 2, 2014-06-10 10:00:41, 081AB3CA
3, 5, 2014-02-01 12:03:23, 299AFF90
4, 5, 2013-12-13 08:14:26, 299AFF90
5, 6, 2013-09-09 18:21:53, 49FFA891
Desired Output:
2, 2, 2014-08-22 10:00:41, 3FF2561A
2, 2, 2014-06-10 10:00:41, 081AB3CA
3, 5, 2014-02-01 12:03:23, 299AFF90
5, 6, 2013-09-09 18:21:53, 49FFA891
I have managed to piece together a query based on the comments below, and hours of searching on the internet. 🙂
select * from tsrs
inner join (
select licensekeys_checksum, max(timestamp) as mts
from tsrs
group by licensekeys_checksum
) x on x.licensekeys_checksum = tsrs.licensekeys_checksum
and x.mts = tsrs.timestamp;
It seems to work, but I am unsure. Am I on the right track?
Answer
Your query in the question should perform better than the queries in the (previously) accepted answer. Test with EXPLAIN ANALYZE.
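As a rough sketch of that test, the query from the question can be prefixed with EXPLAIN ANALYZE to see the actual plan and timings. Keep in mind that EXPLAIN ANALYZE really executes the statement, and that the numbers are only meaningful against a realistically sized table; the alias x is taken from the question.

```sql
-- Show the executed plan, with actual row counts and timings,
-- for the join-on-max(timestamp) query from the question.
EXPLAIN ANALYZE
SELECT tsrs.*
FROM   tsrs
JOIN  (
   SELECT licensekeys_checksum, max("timestamp") AS mts
   FROM   tsrs
   GROUP  BY licensekeys_checksum
   ) x ON  x.licensekeys_checksum = tsrs.licensekeys_checksum
       AND x.mts = tsrs."timestamp";
```

Running the same command on any alternative query and comparing the reported total runtime gives a like-for-like comparison on your own data.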
DISTINCT ON is typically simpler and faster:
SELECT DISTINCT ON (licensekeys_checksum) *
FROM   tsrs
ORDER  BY licensekeys_checksum, timestamp DESC NULLS LAST;
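To try this against the sample input from the question, here is a minimal sketch using a scratch table. The table name tsrs_demo is hypothetical, and the rows are copied as printed in the question (including the repeated id 2), so the demo table deliberately has no primary key or foreign key.

```sql
-- Hypothetical scratch table mirroring the relevant columns of tsrs.
CREATE TEMP TABLE tsrs_demo (
   id                   integer,
   customer_id          integer,
   "timestamp"          timestamp,
   licensekeys_checksum varchar(32)
);

-- Sample input, copied as printed in the question.
INSERT INTO tsrs_demo VALUES
   (1, 2, '2014-08-21 16:03:35', '3FF2561A'),
   (2, 2, '2014-08-22 10:00:41', '3FF2561A'),
   (2, 2, '2014-06-10 10:00:41', '081AB3CA'),
   (3, 5, '2014-02-01 12:03:23', '299AFF90'),
   (4, 5, '2013-12-13 08:14:26', '299AFF90'),
   (5, 6, '2013-09-09 18:21:53', '49FFA891');

-- Keep the first row of each checksum group; within a group the
-- ORDER BY puts the newest timestamp first and NULL timestamps last.
SELECT DISTINCT ON (licensekeys_checksum) *
FROM   tsrs_demo
ORDER  BY licensekeys_checksum, "timestamp" DESC NULLS LAST;
```

This should return one row per checksum, each carrying the newest timestamp of its group, sorted by checksum rather than by id. The leading ORDER BY expressions have to match the DISTINCT ON expressions, and NULLS LAST matters here because the timestamp column is declared without NOT NULL.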
Detailed explanation: