Returning the row with the most recent timestamp from each group

I have a table (Postgres 9.3) defined as follows:

 CREATE TABLE tsrs (
     id SERIAL PRIMARY KEY,
     customer_id INTEGER NOT NULL REFERENCES customers,
     timestamp TIMESTAMP WITHOUT TIME ZONE,
     licensekeys_checksum VARCHAR(32));

 

The pertinent columns here are customer_id, timestamp, and licensekeys_checksum. There can be multiple rows with the same customer_id; some of those may have matching licensekeys_checksum values, and some may differ. There will never be two rows with the same checksum and the same timestamp.

I want to return one row for each group of rows with matching licensekeys_checksum values. The row returned for each group should be the one with the most recent timestamp.

Sample Input:

1, 2, 2014-08-21 16:03:35, 3FF2561A
2, 2, 2014-08-22 10:00:41, 3FF2561A
2, 2, 2014-06-10 10:00:41, 081AB3CA
3, 5, 2014-02-01 12:03:23, 299AFF90
4, 5, 2013-12-13 08:14:26, 299AFF90
5, 6, 2013-09-09 18:21:53, 49FFA891

 

Desired Output:

2, 2, 2014-08-22 10:00:41, 3FF2561A
2, 2, 2014-06-10 10:00:41, 081AB3CA
3, 5, 2014-02-01 12:03:23, 299AFF90
5, 6, 2013-09-09 18:21:53, 49FFA891

 

I have managed to piece together a query based on the comments below, and hours of searching on the internet. 🙂

select * from tsrs
inner join (
   select licensekeys_checksum, max(timestamp) as mts
   from tsrs
   group by licensekeys_checksum
   ) x on x.licensekeys_checksum = tsrs.licensekeys_checksum
      and x.mts = tsrs.timestamp;

 

It seems to work, but I am unsure. Am I on the right track?

Answer

Your query in the question should perform better than the queries in the (previously) accepted answer. Test with EXPLAIN ANALYZE.
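
A minimal sketch of such a test, assuming the tsrs table from the question is already populated with realistic data. Keep in mind that EXPLAIN ANALYZE actually executes the statement; the BUFFERS option is an optional extra added here for illustration.

```sql
-- Execute the query from the question and report the real plan,
-- row counts and timings instead of the planner's estimates only.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM   tsrs
JOIN  (
   SELECT licensekeys_checksum, max(timestamp) AS mts
   FROM   tsrs
   GROUP  BY licensekeys_checksum
   ) x ON  x.licensekeys_checksum = tsrs.licensekeys_checksum
       AND x.mts = tsrs.timestamp;
```

Run each candidate query a few times and compare the reported runtimes; the first run can be slower because of a cold cache.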

DISTINCT ON is typically simpler and faster:

SELECT DISTINCT ON (licensekeys_checksum) *
FROM   tsrs
ORDER  BY licensekeys_checksum, timestamp DESC NULLS LAST;
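
To check the result against the question, here is a small self-contained sketch that loads the sample rows into a temporary table. The table name tsrs_demo and the index name are hypothetical, chosen only for illustration, and id is declared as a plain integer because the sample data lists id 2 twice.

```sql
-- Hypothetical scratch table mirroring the relevant columns of tsrs.
-- No primary key on id: the sample rows repeat id 2.
CREATE TEMP TABLE tsrs_demo (
   id                   integer,
   customer_id          integer NOT NULL,
   "timestamp"          timestamp,
   licensekeys_checksum varchar(32)
);

INSERT INTO tsrs_demo VALUES
   (1, 2, '2014-08-21 16:03:35', '3FF2561A'),
   (2, 2, '2014-08-22 10:00:41', '3FF2561A'),
   (2, 2, '2014-06-10 10:00:41', '081AB3CA'),
   (3, 5, '2014-02-01 12:03:23', '299AFF90'),
   (4, 5, '2013-12-13 08:14:26', '299AFF90'),
   (5, 6, '2013-09-09 18:21:53', '49FFA891');

-- Optional and an assumption, not part of the original answer:
-- an index matching the sort order may help on large tables.
CREATE INDEX tsrs_demo_checksum_ts_idx
   ON tsrs_demo (licensekeys_checksum, "timestamp" DESC NULLS LAST);

-- DISTINCT ON keeps the first row of each checksum group in ORDER BY order,
-- which is the row with the latest non-null timestamp.
SELECT DISTINCT ON (licensekeys_checksum) *
FROM   tsrs_demo
ORDER  BY licensekeys_checksum, "timestamp" DESC NULLS LAST;
```

This should return the four rows listed under Desired Output, sorted by checksum rather than by id. NULLS LAST matters because the timestamp column is nullable: in descending order PostgreSQL sorts NULL values first by default, so without it a row lacking a timestamp would be picked for its group.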

 

db<>fiddle here
Old sqlfiddle

Detailed explanation:

Select first row in each GROUP BY group?


Answer