Which metrics to compare when evaluating SQL query performance?

I recently watched an online course about Oracle SQL performance tuning. In the video, the lecturer constantly compares the COST value from Autotrace when comparing the performance of two queries.

But I’ve also read on other forums and websites that COST is a relative value specific to a query and should not be used as an absolute metric for evaluating performance. They suggest looking at things like consistent gets, physical reads, etc., instead.

So my interpretation is that it makes no sense to compare COST values across completely different queries meant for different purposes, because the COST value is relative. But when comparing two versions of the same query, one of which has been slightly modified for “better performance”, it is okay to compare the COST values. Is my interpretation accurate?

When is it okay to compare the COST value as opposed to some other metric?

What other metrics should we look at when evaluating/comparing query performance?


Answer

In general, I would be very wary of comparing the cost between two queries unless you have a specific reason to believe that the comparison makes sense.

People don’t look at the 99.9% of queries for which the optimizer produces a (nearly) optimal plan. People look at queries where the optimizer has produced a decidedly sub-optimal plan. The optimizer will produce a sub-optimal plan for one of two basic reasons: either it can’t transform the query into a form it can optimize (in which case a human likely needs to rewrite it), or the statistics it is using to make its estimates are incorrect, so what it thinks is an optimal plan is not. (Of course, there are other reasons queries might be slow; perhaps the optimizer produced an optimal plan, but the optimal plan is doing a table scan because an index is missing, for example.)

If I’m looking at a query that is slow, the query seems to be reasonably well-written, and a reasonable set of indexes is available, statistics are the most likely source of problems. Since cost is derived entirely from statistics, incorrect statistics mean the optimizer’s cost estimates are incorrect, and an incorrect cost is roughly equally likely to be too high as too low. If I look at the plan for a query that I know needs to aggregate hundreds of thousands of rows to produce a report and I see that the optimizer has assigned it a single-digit cost, I know that somewhere along the line it is estimating that a step will return far too few rows. In order to tune that query, I need the cost to go up so that the optimizer’s estimates accurately reflect reality. Conversely, if I look at the plan for a query that I know should only need to scan a handful of rows and I see a cost in the tens of thousands, I know that the optimizer is estimating that some step will return far too many rows. In order to tune that query, I need the cost to go down so that the estimates reflect reality.
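For reference, in SQL*Plus you can inspect the optimizer’s estimates (including the `Rows` and `Cost` columns described above) with `EXPLAIN PLAN` and `DBMS_XPLAN.DISPLAY`. The table and column names below are hypothetical, just to illustrate the mechanics:

```sql
-- Hypothetical report query; "orders" is an assumed table name.
EXPLAIN PLAN FOR
SELECT region, SUM(amount)
FROM orders
GROUP BY region;

-- Show the plan, including the optimizer's Rows and Cost estimates
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```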

If you use the gather_plan_statistics hint, you’ll see the estimated and actual row counts in your query plan. If the optimizer’s estimates are close to reality, the plan is likely to be pretty good and cost is likely to be reasonably accurate. If the optimizer’s estimates are off, the plan is likely to be poor and the cost is likely to be wrong. Trying to use a cost metric to tune a query without first confirming that the cost is reasonably close to reality is seldom very productive.
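As a concrete sketch of that workflow (again with a hypothetical `orders` table), you run the query with the hint and then, in the same session, pull the last cursor’s plan with runtime statistics:

```sql
-- Run the query with runtime statistics collection enabled
SELECT /*+ gather_plan_statistics */ region, SUM(amount)
FROM orders
GROUP BY region;

-- Immediately afterwards, display that cursor's plan with
-- estimated (E-Rows) vs. actual (A-Rows) row counts per step
SELECT *
FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));
```

Steps where E-Rows and A-Rows diverge by orders of magnitude are the ones where the optimizer’s statistics, and hence its cost, cannot be trusted.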

Personally, I would ignore cost and focus on metrics that are likely to be stable over time and that are actually correlated with performance. My bias would be to focus on logical reads, since most systems are I/O bound, but you could use CPU time or elapsed time as well (elapsed time, though, tends not to be particularly stable, because it depends on what happens to be in the cache when the query is run). If you’re looking at a plan, focus on the estimated vs. actual row counts, not on the cost.
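In SQL*Plus, the logical-read numbers (consistent gets) and physical reads mentioned in the question are easy to capture with Autotrace while suppressing the result set; the query below is the same hypothetical example:

```sql
-- Show only execution statistics (consistent gets, physical
-- reads, etc.) without printing the result rows
SET AUTOTRACE TRACEONLY STATISTICS

SELECT region, SUM(amount)   -- "orders" is a hypothetical table
FROM orders
GROUP BY region;

SET AUTOTRACE OFF
```

Comparing consistent gets between the original and the rewritten query gives a reasonably stable measure of which one does less work.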

User contributions licensed under: CC BY-SA