Does creating an index on a column that will always have a different value in each record (like a unique column) improves performances on SELECTs?
I understand that having an index on a column named ie. status which can have 3 values (such as PENDING, DONE, FAILED) and searching only FAILED in 1kk records will be faster. But what happens if I have a unique id (not the primary key) in 1kk records, and I’m doing a SELECT on that column?
Advertisement
Answer
An index on a unique column is actually better than an index on a column with a few values.
To understand why, you need a basic understanding of how databases manage storage. This is a high-level view.
The primary purpose of an index is to reduce the number of pages that need to be read for a query. The rows themselves are stored on data pages. If you don’t have an index, then all the data needs to be read.
The index is a data structure that makes it efficient to find a particular value. You can think of it as a sorted list, where a binary search is used to identify the right location. In actual fact, these are usually stored in a structure called b-trees (where the “b” stands for “balanced”, not “binary”) but that is an implementation detail. And there are types of indexes that don’t use b-trees.
So, if the values are unique, then an index is extremely helpful. Instead of doing a full table scan, the “row id” can efficiently be looked up in the index and then only one data page needs to be read.
Note that unique
constraints are implemented using indexes. If you have declared a column to be unique
, there is no need for an additional index because it is already there.