Efficiently select latest row for each group in a very large table?

Question

I have (for example's sake) a table Users (user_id, status, timestamp, …). I also have another table SpecialUsers (user_id, …). I need to show each special user's latest status. The problem is that the Users table is VERY, VERY LARGE (more than 50 billion rows). Most of the solutions i…

Accepted Answer

Perhaps a join with a window function will work:

    select su.*
    from (select s.user_id, u.status, u.timestamp,
                 max(u.timestamp) over (partition by s.user_id) as max_timestamp
          from specialusers s join
               users u
               on s.user_id = u.user_id
         ) su
    where timestamp = max_timestamp;

This specifically uses max() instead of row_number() on the speculation that it might use slightly fewer resources.
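To check what the query returns, here is a small sketch that runs it against an in-memory SQLite database (window functions need SQLite 3.25 or later) alongside the row_number() variant the answer mentions. The toy rows and the helper names `build_demo_db`, `latest_status_max` and `latest_status_row_number` are hypothetical; SQLite is only a stand-in for whatever engine holds the 50-billion-row table, so this says nothing about performance at that scale. One behavioral difference is visible with tied timestamps: the max() version returns every row that shares a user's latest timestamp, while row_number() keeps exactly one.

```python
import sqlite3


def build_demo_db():
    """Hypothetical toy versions of the Users and SpecialUsers tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table users (user_id integer, status text, timestamp integer)")
    conn.execute("create table specialusers (user_id integer)")
    conn.executemany(
        "insert into users values (?, ?, ?)",
        [(1, "offline", 100), (1, "online", 200),
         (2, "away", 150), (2, "online", 120),
         (3, "online", 300)],  # user 3 is not a special user
    )
    conn.executemany("insert into specialusers values (?)", [(1,), (2,)])
    return conn


def latest_status_max(conn):
    """The answer's approach: keep rows whose timestamp equals the per-user max."""
    sql = """
        select su.user_id, su.status, su.timestamp
        from (select s.user_id, u.status, u.timestamp,
                     max(u.timestamp) over (partition by s.user_id) as max_timestamp
              from specialusers s join
                   users u
                   on s.user_id = u.user_id
             ) su
        where su.timestamp = su.max_timestamp
        order by su.user_id
    """
    return conn.execute(sql).fetchall()


def latest_status_row_number(conn):
    """row_number() alternative: exactly one row per user, even on ties."""
    sql = """
        select user_id, status, timestamp
        from (select s.user_id, u.status, u.timestamp,
                     row_number() over (partition by s.user_id
                                        order by u.timestamp desc) as rn
              from specialusers s join
                   users u
                   on s.user_id = u.user_id
             ) su
        where rn = 1
        order by user_id
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print(latest_status_max(conn))
    print(latest_status_row_number(conn))
```

With the toy data both queries return user 1 as "online" at 200 and user 2 as "away" at 150; user 3 is excluded by the join with SpecialUsers.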


Answer