I know the importance of indexes and how order of joins can change performance. I’ve done a bunch of reading related to multi-column indexes and haven’t found the answer to my question.
I’m curious whether, in a multi-column index, the order in which the columns are specified matters at all. My guess is that it does not, and that the engine treats them as a group where ordering doesn’t matter. But I wish to verify.
For example, from MySQL’s website (http://dev.mysql.com/doc/refman/5.0/en/multiple-column-indexes.html):
CREATE TABLE test (
    id INT NOT NULL,
    last_name CHAR(30) NOT NULL,
    first_name CHAR(30) NOT NULL,
    PRIMARY KEY (id),
    INDEX name (last_name,first_name)
);
Would there be any benefit in any case where the following would be better, or is it equivalent?
CREATE TABLE test (
    id INT NOT NULL,
    last_name CHAR(30) NOT NULL,
    first_name CHAR(30) NOT NULL,
    PRIMARY KEY (id),
    INDEX name (first_name,last_name)
);
Specifically:
INDEX name (last_name,first_name)
vs
INDEX name (first_name,last_name)
Answer
When discussing multi-column indexes, I use an analogy to a telephone book. A telephone book is basically an index on last name, then first name. So the sort order is determined by which “column” is first. Searches fall into a few categories:
1. If you look up people whose last name is Smith, you can find them easily because the book is sorted by last name.
2. If you look up people whose first name is John, the telephone book doesn’t help because the Johns are scattered throughout the book. You have to scan the whole telephone book to find them all.
3. If you look up people with a specific last name Smith and a specific first name John, the book helps because you find the Smiths sorted together, and within that group of Smiths, the Johns are also found in sorted order.
If you had a telephone book sorted by first name then by last name, the sorting of the book would assist you in the above cases #2 and #3, but not case #1.
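As a rough sketch of how the three lookups could be checked in MySQL, the statements below reuse the test table from the question, with its index on (last_name, first_name), and run EXPLAIN on one query per case. The literal values are only illustrative, and the exact EXPLAIN output depends on the MySQL version, storage engine, and data, so the comments describe what one would typically expect rather than guaranteed results.

```sql
-- Table from the question: the index is sorted by last_name, then first_name.
CREATE TABLE test (
    id INT NOT NULL,
    last_name CHAR(30) NOT NULL,
    first_name CHAR(30) NOT NULL,
    PRIMARY KEY (id),
    INDEX name (last_name, first_name)
);

-- Case 1: last name only. This matches the leftmost index column,
-- so the optimizer can typically do an index lookup on `name`.
EXPLAIN SELECT * FROM test WHERE last_name = 'Smith';

-- Case 2: first name only. There is no condition on the leftmost column,
-- so the index order does not help; expect a full scan of the table
-- or of the whole index instead of a lookup.
EXPLAIN SELECT * FROM test WHERE first_name = 'John';

-- Case 3: both columns. The lookup can use both parts of the index.
EXPLAIN SELECT * FROM test WHERE last_name = 'Smith' AND first_name = 'John';
```

In the EXPLAIN output, the key and key_len columns are the ones to compare: they indicate which index was chosen and roughly how many of its leading columns were used.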
That explains cases for looking up exact values, but what if you’re looking up by ranges of values? Say you wanted to find all people whose first name is John and whose last name begins with ‘S’ (Smith, Saunders, Staunton, Sherman, etc.). The Johns are sorted under ‘J’ within each last name, but if you want all Johns for all last names starting with ‘S’, the Johns are not grouped together. They’re scattered again, so you end up having to scan through all the names with last name starting with ‘S’. Whereas if the telephone book were organized by first name then by last name, you’d find all the Johns together, then within the Johns, all the ‘S’ last names would be grouped together.
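The range case can be sketched the same way. The example below assumes the test table from the question already exists with its index `name` on (last_name, first_name), and adds a second index with the columns reversed; the index name `name_first_last` is made up for illustration. FORCE INDEX is used only to make the two plans easy to compare side by side.

```sql
-- Hypothetical second index with the column order reversed.
ALTER TABLE test ADD INDEX name_first_last (first_name, last_name);

-- With (last_name, first_name): the range on last_name comes first, so
-- every 'S...' entry in the index has to be examined and first_name can
-- only be applied as a filter on those entries.
EXPLAIN SELECT * FROM test FORCE INDEX (name)
WHERE first_name = 'John' AND last_name LIKE 'S%';

-- With (first_name, last_name): the equality on first_name narrows the
-- search to the Johns, and the range on last_name is then a contiguous
-- slice within that group.
EXPLAIN SELECT * FROM test FORCE INDEX (name_first_last)
WHERE first_name = 'John' AND last_name LIKE 'S%';
```

A common rule of thumb that follows from this: put columns compared with equality before the column compared with a range. If the second plan reports a larger key_len than the first, that suggests more of the index is being used to narrow the search, though the exact numbers depend on the character set and MySQL version.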
So the order of columns in a multi-column index definitely matters. One type of query may need a certain column order for the index. If you have several types of queries, you might need several indexes to help them, with columns in different orders.
You can read my presentation “How to Design Indexes, Really” for more information.