Clustered index – multi-part vs single-part index and effects of inserts/deletes

Question

This question is about what happens with the reorganizing of data in a clustered index when an insert is done. I assume that it should be more expensive to do inserts on a table which has a clustered ...

Accepted Answer

Yes, inserting into the middle of an existing table (or one of its pages) can be expensive when you have a less-than-optimal clustered index. The worst case is a page split: about half the rows on the page have to be moved to a newly allocated page, and the index level above it has to be updated to point at the new page.

You can alleviate that problem by choosing the right clustered index, one that ideally is:

- narrow (only a single field, as small as possible)
- static (never changes)
- unique (so that SQL Server doesn't need to add 4-byte uniqueifiers to your rows)
- ever-increasing (like an INT IDENTITY)

You want a narrow key (ideally a single INT) because each and every entry in each and every non-clustered index also contains the clustering key(s). You don't want to put lots of columns in your clustering key, nor do you want to put things like a VARCHAR(200) there!

With an ever-increasing clustering key, new rows always go at the end of the index, so inserts never cause the mid-page splits described above. The main fragmentation you could still encounter comes from deletes (the "swiss cheese" problem).

Check out Kimberly Tripp's excellent blog posts on indexing, most notably:

- GUIDs as PRIMARY KEYs and/or the clustering key
- The Clustered Index Debate Continues… (this one actually shows that a good clustered index speeds up all operations, including inserts and deletes, compared to a heap with no clustered index!)
- Ever-increasing clustering key – the Clustered Index Debate……….again!

"Assume there is a table (Junk) and there are two queries that are done on the table: the first query searches by Name and the second query searches by Name and Something. As I'm working on the database I discovered that the table has been created with two indexes, one to support each query [one on (Name), one on (Name, Something)]."

That's definitely not necessary. If you have one index on (Name, Something), that index can just as well be used when you search and restrict on just WHERE Name = 'abc'. Having a separate index on only the Name column is not needed at all; it only wastes space (and costs time to be kept up to date).

So basically, you only need a single index on (Name, Something), and I would agree with you: if you have no other indexes on this table, you could make it the clustering key. However, since that key won't be ever-increasing and could possibly change, too (right?), this might not be such a great idea.

The other option would be to introduce a surrogate ID INT IDENTITY column, cluster on that, and keep (Name, Something) as a non-clustered index. That has two benefits:

- the ID is everything a good clustering key should be, including ever-increasing, so you won't run into page splits and INSERT performance problems
- you still get all the benefits of having a clustering key (see Kimberly Tripp's blog posts: clustered tables are almost always preferable to heaps)
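The page-split argument above can be illustrated with a toy model in Python. This is not how SQL Server's storage engine is implemented; it only assumes fixed-capacity leaf pages kept in key order, a 50/50 split when a full page has to accept a key in the middle of its range, and a fresh page (with no rows moved) when a key lands past the end of the last page. `PAGE_CAPACITY`, `find_page` and `insert_keys` are hypothetical names chosen for the sketch.

```python
import bisect
import random

PAGE_CAPACITY = 4  # hypothetical, tiny rows-per-page so the effect is easy to see


def find_page(pages, key):
    """Index of the last page whose smallest key is <= key (page 0 for smaller keys)."""
    target = 0
    for i, page in enumerate(pages):
        if page and page[0] <= key:
            target = i
    return target


def insert_keys(keys, capacity=PAGE_CAPACITY):
    """Insert keys into ordered leaf pages; return (pages, number_of_mid_index_splits)."""
    pages = [[]]
    splits = 0
    for key in keys:
        idx = find_page(pages, key)
        page = pages[idx]
        if len(page) < capacity:
            bisect.insort(page, key)           # room left on the page: nothing moves
        elif idx == len(pages) - 1 and key > page[-1]:
            pages.append([key])                # past the end: new page, no rows move
        else:
            mid = len(page) // 2               # full page in the middle: split it
            new_page = page[mid:]              # the upper half of the rows moves
            del page[mid:]
            pages.insert(idx + 1, new_page)
            splits += 1
            bisect.insort(new_page if key >= new_page[0] else page, key)
    return pages, splits


if __name__ == "__main__":
    sequential = list(range(1, 1001))          # like INT IDENTITY values
    shuffled = sequential[:]
    random.Random(0).shuffle(shuffled)          # like random GUID values
    for label, keys in (("ever-increasing", sequential), ("random", shuffled)):
        pages, splits = insert_keys(keys)
        avg_fill = len(keys) / (len(pages) * PAGE_CAPACITY)
        print(f"{label:>15}: {len(pages)} pages, {splits} splits, {avg_fill:.0%} full")
```

In this model the ever-increasing keys fill every page completely and never trigger a split, while the shuffled keys cause splits that move rows and leave pages partly empty.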

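The reason for keeping the clustering key narrow is plain arithmetic: the key is repeated in every row of every non-clustered index. The back-of-the-envelope sketch below assumes a hypothetical table of 10 million rows with 3 non-clustered indexes and ignores per-row storage overhead; INT is 4 bytes and UNIQUEIDENTIFIER is 16 bytes, and the VARCHAR(200) case assumes every value uses all 200 characters.

```python
def clustering_key_overhead(rows: int, nonclustered_indexes: int, key_bytes: int) -> int:
    """Bytes spent repeating the clustering key in all non-clustered indexes.

    Simplified model (an assumption): ignores row headers, uniqueifiers and page overhead.
    """
    return rows * nonclustered_indexes * key_bytes


if __name__ == "__main__":
    ROWS, NC_INDEXES = 10_000_000, 3  # hypothetical table size
    for label, width in (("INT", 4), ("UNIQUEIDENTIFIER", 16), ("VARCHAR(200), full", 200)):
        total = clustering_key_overhead(ROWS, NC_INDEXES, width)
        print(f"{label:<20} {total / 2**20:>10,.0f} MiB")
```

Under these assumptions an INT key costs 120 million bytes across the three indexes, while a fully used VARCHAR(200) key costs 6 billion bytes for the same rows.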
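The claim that one index on (Name, Something) also serves a WHERE Name = 'abc' lookup can be checked with a query plan. SQL Server can't run in a self-contained snippet, so this sketch swaps in Python's built-in sqlite3 module; SQLite is a different engine, and its rowid table (ordered by the INTEGER PRIMARY KEY) is only a rough analogue of clustering on an INT IDENTITY. The table `junk` and index `ix_junk_name_something` are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Surrogate, ever-increasing key: in SQLite an INTEGER PRIMARY KEY is the rowid,
    -- the key the table's own B-tree is ordered by.
    CREATE TABLE junk (
        id        INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        something TEXT NOT NULL,
        payload   TEXT
    );
    -- One composite index instead of separate (name) and (name, something) indexes.
    CREATE INDEX ix_junk_name_something ON junk (name, something);
    """
)
conn.executemany(
    "INSERT INTO junk (name, something, payload) VALUES (?, ?, ?)",
    [("abc", "x", "p1"), ("abc", "y", "p2"), ("def", "x", "p3")],
)


def plan(sql, params):
    """Return the detail column of SQLite's EXPLAIN QUERY PLAN output for a query."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


# Leftmost-prefix lookup (name only) and full-key lookup (name and something).
name_only = plan("SELECT * FROM junk WHERE name = ?", ("abc",))
name_and_something = plan(
    "SELECT * FROM junk WHERE name = ? AND something = ?", ("abc", "x")
)
print(name_only)
print(name_and_something)
```

Both plans should mention `ix_junk_name_something`, which is the point of the answer: the second, name-only index adds maintenance cost without giving the optimizer anything new. The exact wording of the plan text varies between SQLite versions.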

Answer