
Can adding a guaranteed WHERE condition to a SQL query improve performance?

Let’s say there’s a table that logs website access.

(id is the only indexed column)

| id         | domain       | logged_at                |
|------------|--------------|--------------------------|
| 1          | yahoo.com    | 2002-04-08T16:44:32.654Z |
| 2          | google.com   | 2002-04-02T19:12:12.124Z |
| 3          | google.com   | 2002-04-01T21:54:29.852Z |
| 4          | amazon.com   | 2002-03-10T02:11:01.555Z |
| 5          | cnn.com      | 2002-05-11T09:26:34.292Z |
| 6          | amazon.com   | 2002-04-02T22:06:18.119Z |
| ...        | ...          | ...                      |
| 2468955885 | netflix.com  | 2011-01-08T16:44:32.654Z |
| 2468955886 | facebook.com | 2011-07-02T19:12:12.124Z |
| 2468955887 | uber.com     | 2011-05-01T21:54:29.852Z |
| 2468955888 | google.com   | 2011-02-10T02:11:01.555Z |
| 2468955889 | uber.com     | 2011-04-11T09:26:34.292Z |
| 2468955890 | amazon.com   | 2011-05-02T22:06:18.119Z |

I want to select only the uber.com rows.

I can guarantee that every uber.com access log entry was logged later than 2009-03-01T00:00:00.000Z.

SQL-A

```sql
SELECT * FROM table WHERE domain = 'uber.com';
```

SQL-B

```sql
SELECT * FROM table WHERE domain = 'uber.com' AND logged_at > '2009-03-01T00:00:00.000Z';
```

Is there any performance difference between SQL-A and SQL-B?
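Under the stated guarantee, the extra date condition filters out nothing, so the two queries return the same rows and the question is purely about speed. A minimal sketch of that equivalence follows. It assumes SQLite through Python's built-in `sqlite3` module, a hypothetical table name `access_log` in place of `table` (which is a reserved word), and a few rows copied from the sample above. Because every timestamp is an ISO-8601 string in the same fixed format, comparing them as text orders them chronologically.

```python
import sqlite3

# Hypothetical in-memory copy of the access log; "access_log" replaces "table".
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE access_log (id INTEGER PRIMARY KEY, domain TEXT, logged_at TEXT)"
)
conn.executemany(
    "INSERT INTO access_log (id, domain, logged_at) VALUES (?, ?, ?)",
    [
        (1, "yahoo.com", "2002-04-08T16:44:32.654Z"),
        (2, "google.com", "2002-04-02T19:12:12.124Z"),
        (2468955887, "uber.com", "2011-05-01T21:54:29.852Z"),
        (2468955889, "uber.com", "2011-04-11T09:26:34.292Z"),
        (2468955890, "amazon.com", "2011-05-02T22:06:18.119Z"),
    ],
)

# ORDER BY id only makes the two result lists directly comparable.
SQL_A = "SELECT * FROM access_log WHERE domain = 'uber.com' ORDER BY id"
SQL_B = (
    "SELECT * FROM access_log WHERE domain = 'uber.com' "
    "AND logged_at > '2009-03-01T00:00:00.000Z' ORDER BY id"
)

rows_a = conn.execute(SQL_A).fetchall()
rows_b = conn.execute(SQL_B).fetchall()
print(rows_a == rows_b)  # -> True
```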


Answer

In general, no.

I can think of two situations where there would be a performance impact:

  • There is an index starting with logged_at and no index with domain as the first column.
  • The table is partitioned by logged_at.
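The first situation can be observed with SQLite's `EXPLAIN QUERY PLAN`. The following is a sketch, not a benchmark. It assumes SQLite via Python's `sqlite3`, the hypothetical table name `access_log`, and a hypothetical index `idx_logged_at` that is the only secondary index. SQL-A has no usable index and falls back to a full table scan. The range condition in SQL-B gives the planner an index it can use.

```python
import sqlite3

# Hypothetical schema mirroring the question; "access_log" replaces "table".
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE access_log (id INTEGER PRIMARY KEY, domain TEXT, logged_at TEXT)"
)
# Situation 1: the only secondary index starts with logged_at,
# and no index has domain as its first column.
conn.execute("CREATE INDEX idx_logged_at ON access_log (logged_at)")

SQL_A = "SELECT * FROM access_log WHERE domain = 'uber.com'"
SQL_B = (
    "SELECT * FROM access_log WHERE domain = 'uber.com' "
    "AND logged_at > '2009-03-01T00:00:00.000Z'"
)


def query_plan(sql):
    """Return the human-readable 'detail' column (index 3) of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


plan_a = query_plan(SQL_A)  # no index on domain: full table scan
plan_b = query_plan(SQL_B)  # the logged_at range gives the planner a usable index
print("SQL-A:", plan_a)
print("SQL-B:", plan_b)
```

On recent SQLite versions, the first plan typically reads like `SCAN access_log` and the second like `SEARCH access_log USING INDEX idx_logged_at (logged_at>?)`. The exact wording varies by version. Whether the index actually makes SQL-B faster on real data depends on how much of the table falls after the date bound.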

You did not mention anything in the question to suggest that either of these might be the case.
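For the partitioned case, real engines such as PostgreSQL (declarative partitioning with partition pruning) skip partitions internally. The Python below is only a toy model of that idea and is not how any engine is implemented. It assumes hypothetical yearly partitions keyed on the year of `logged_at`, uses rows copied from the question's sample, and shows that a lower bound on the partition key lets SQL-B skip whole partitions while returning the same rows.

```python
from collections import defaultdict

# A few rows copied from the question's sample table.
ROWS = [
    (1, "yahoo.com", "2002-04-08T16:44:32.654Z"),
    (4, "amazon.com", "2002-03-10T02:11:01.555Z"),
    (2468955887, "uber.com", "2011-05-01T21:54:29.852Z"),
    (2468955888, "google.com", "2011-02-10T02:11:01.555Z"),
    (2468955889, "uber.com", "2011-04-11T09:26:34.292Z"),
]


def build_partitions(rows):
    """Toy model: group rows into hypothetical partitions by logged_at year."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[int(row[2][:4])].append(row)
    return dict(partitions)


def scan(partitions, domain, logged_after=None):
    """Return (matching rows, years of the partitions actually read)."""
    matches, scanned = [], []
    for year in sorted(partitions):
        # Pruning: every row in an earlier year is older than the bound,
        # so that partition cannot contain a match and is skipped.
        if logged_after is not None and year < int(logged_after[:4]):
            continue
        scanned.append(year)
        for row in partitions[year]:
            if row[1] == domain and (logged_after is None or row[2] > logged_after):
                matches.append(row)
    return matches, scanned


parts = build_partitions(ROWS)
matches_a, scanned_a = scan(parts, "uber.com")  # SQL-A: reads every partition
matches_b, scanned_b = scan(parts, "uber.com", "2009-03-01T00:00:00.000Z")  # SQL-B
print(scanned_a, scanned_b)  # -> [2002, 2011] [2011]
```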

In the normal case (no partitioning), you want an index on (domain) or (domain, logged_at). Both queries can use either of these indexes and would have very similar performance.
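A quick way to check this in SQLite is to build each suggested index in turn and inspect both plans. This is a sketch under the same assumptions as before: SQLite via Python's `sqlite3`, with the hypothetical names `access_log` and `idx_access_log`.

```python
import sqlite3

SQL_A = "SELECT * FROM access_log WHERE domain = 'uber.com'"
SQL_B = (
    "SELECT * FROM access_log WHERE domain = 'uber.com' "
    "AND logged_at > '2009-03-01T00:00:00.000Z'"
)


def plans_with_index(index_columns):
    """Build an empty access_log table with one secondary index on
    index_columns and return EXPLAIN QUERY PLAN details for both queries."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE access_log (id INTEGER PRIMARY KEY, domain TEXT, logged_at TEXT)"
    )
    conn.execute(f"CREATE INDEX idx_access_log ON access_log ({index_columns})")
    plans = {}
    for name, sql in (("SQL-A", SQL_A), ("SQL-B", SQL_B)):
        # Column 3 of each EXPLAIN QUERY PLAN row is the readable detail text.
        plans[name] = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
    conn.close()
    return plans


for columns in ("domain", "domain, logged_at"):
    for name, details in plans_with_index(columns).items():
        print(f"index ({columns}), {name}: {details}")
```

In every case, the equality on `domain` lets SQLite search the index instead of scanning the table. Because `id` here is SQLite's rowid alias, the (domain, logged_at) index happens to contain every column, so SQLite may report it as a `COVERING INDEX`.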

I should note that the second query incurs a small amount of overhead for the unnecessary date comparison. However, on a large table that overhead is likely to be close to unmeasurable, because reading the rows dominates the run time.

User contributions licensed under: CC BY-SA