Which one is quicker/optimized – Inner Join or Partition By – to obtain Aggregated data?

Question

In my table 'table1', there are multiple records for each app_id. I'm trying to get the latest entry for each app_id. Which of the queries below would be quicker or better in terms of performance …

Accepted Answer

The right answer to "which is faster" is to try the queries on your data and your systems.

That said, there are some considerations in favor of row_number(). In particular, window functions are not an "accidental" feature in databases. Adding a new string function is just a function, and the function may or may not be optimized. On the other hand, window functions required rewriting/redesigning some fundamental components of the database engine. In general, this was done with performance in mind. So, I usually find that window functions are faster than equivalent constructs.

The only exception that I regularly find (across databases) ironically applies in your case. And it is not the join and group by. Instead, it is:

    select t1.*
    from table1 t1
    where t1.datetime = (select max(tt1.datetime)
                         from table1 tt1
                         where tt1.app_id = t1.app_id
                        );

along with an index on table1(app_id, datetime).

The basic reason for the performance improvement is that this scans table1 once and does an index lookup at each row.

The join/group by scans the table multiple times, and the aggregation is expensive. The row_number() version scans the table (or index), calculates the value, and then brings the value back to every row, which is similar to two scans of the data.
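To make "try the queries on your data" concrete, here is a minimal sketch that runs the three forms discussed above side by side. It is only an illustration under several assumptions: it uses Python's built-in sqlite3 module with an in-memory database (window functions need SQLite 3.25 or later), the payload column and the sample rows are made up, and the join/group by and row_number() queries are my guess at the usual shape of those approaches, not the asker's actual queries. Only table1, app_id, datetime, and the correlated subquery come from the answer itself.

```python
import sqlite3

# Hypothetical sample data. The column names app_id and datetime come from the
# answer; the payload column and the rows themselves are invented.
ROWS = [
    (1, "2021-01-01 10:00:00", "a"),
    (1, "2021-01-03 09:30:00", "b"),
    (2, "2021-01-02 08:00:00", "c"),
    (2, "2021-01-02 12:00:00", "d"),
    (3, "2021-01-05 07:15:00", "e"),
]

QUERIES = {
    # Assumed shape of the join/group by approach.
    "join_group_by": """
        SELECT t1.app_id, t1.datetime, t1.payload
        FROM table1 t1
        JOIN (SELECT app_id, MAX(datetime) AS max_dt
              FROM table1
              GROUP BY app_id) m
          ON m.app_id = t1.app_id AND m.max_dt = t1.datetime
        ORDER BY t1.app_id
    """,
    # Assumed shape of the row_number()/partition by approach.
    "row_number": """
        SELECT app_id, datetime, payload
        FROM (SELECT t1.*,
                     ROW_NUMBER() OVER (PARTITION BY app_id
                                        ORDER BY datetime DESC) AS rn
              FROM table1 t1) ranked
        WHERE rn = 1
        ORDER BY app_id
    """,
    # The correlated subquery from the answer, with explicit columns.
    "correlated_subquery": """
        SELECT t1.app_id, t1.datetime, t1.payload
        FROM table1 t1
        WHERE t1.datetime = (SELECT MAX(tt1.datetime)
                             FROM table1 tt1
                             WHERE tt1.app_id = t1.app_id)
        ORDER BY t1.app_id
    """,
}


def latest_rows():
    """Load the sample rows, create the suggested index, run every query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table1 (app_id INTEGER, datetime TEXT, payload TEXT)"
    )
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", ROWS)
    # The index recommended in the answer: table1(app_id, datetime).
    conn.execute("CREATE INDEX idx_table1_app_dt ON table1 (app_id, datetime)")
    results = {name: conn.execute(sql).fetchall() for name, sql in QUERIES.items()}
    conn.close()
    return results


results = latest_rows()
for name, rows in results.items():
    print(name, rows)
```

On this sample all three queries return the same latest row per app_id. Two caveats: if two rows for one app_id share the maximum datetime, the row_number() version returns one of them while the other two return both, and timings on a toy SQLite table say nothing about your database, so for a real comparison run the queries against your own tables and look at the execution plans.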

Answer