
SQL Server range indexing ideas

I need help understanding how to create proper indexing on a table for fast range selects.

I have a table with the following columns:

Column      Type
frameidx    int
u           int
v           int
x           float(53)
y           float(53)
z           float(53)


None of these columns is unique.

There will be approximately 30 million records in this table.

An average query would look something like this:

    Select x, y, z from tablename
    Where
        frameidx = 4 AND
        u between 34 AND 500 AND
        v between 0 AND 200

Pretty straightforward, no joins, no nested stuff. Just good ol’ subset selection.

What sort of indexing should I do in MS SQL Server (2012) for this table in order to be able to fetch records (which can number in the thousands for this query) in, ideally, less than 100 ms?

Thanks.


Answer

If you don’t have indexes, SQL Server needs to scan the whole table to find the required data. For such a big table (30M rows), that’s time-consuming.

If you have indexes appropriate for your query, SQL Server will seek them (i.e. it will quickly find the required rows using the index structure). An index consists of the indexed column values, in the given index order, plus pointers to the rows in the indexed table, so once the data is found in the index, the remaining columns are retrieved from the table using those pointers.
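For instance, a minimal sketch of a single-column index (using the tablename from the question; the index name is only illustrative) could look like this. SQL Server could seek it to find the rows for frameidx = 4 and then look up u, v, x, y and z in the table:

    -- Sketch: a nonclustered index on frameidx alone (illustrative name).
    -- SQL Server can seek this index for frameidx = 4, then follow the
    -- row pointers (lookups) back to the table to fetch the other columns.
    CREATE NONCLUSTERED INDEX IX_tablename_frameidx
        ON tablename (frameidx);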

So, if you want to speed things up, you need to create indexes on the columns you’re going to use to filter the ranges.

Adding indexes will improve query response time, but they also take up more space and make inserts slower, so you shouldn’t create a lot of indexes.

If you’re going to use all the columns for filtering all the time, you should create only one index. Ideally, that index should be on the most selective column, i.e. the one with the most distinct values (the fewest repeated values). As a rule, only one index will be used for each query.
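To see which column is the most selective, you can count the distinct values yourself. A quick check along these lines (not part of the original answer; the table name is taken from the question):

    -- Rough selectivity check: the column with the most distinct values,
    -- relative to the ~30M total rows, is the most selective candidate.
    SELECT
        COUNT(DISTINCT frameidx) AS distinct_frameidx,
        COUNT(DISTINCT u)        AS distinct_u,
        COUNT(DISTINCT v)        AS distinct_v
    FROM tablename;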

If you’re going to use different sets of range filters, you should create more indexes.

Using a composite index can be good or bad. In a composite index, the rows are ordered by all of the columns in the index. So, provided you index by A, B, C & D, filtering or ordering by A will give consecutive rows of the index, and it’s a quick operation. And filtering by A, B, C & D is ideal for this index. However, filtering or ordering only by D is the worst case for this index, because it will need to retrieve data spread all over the index: remember that the data is ordered by A, then B, then C, then D, so the D info is scattered across the whole index. Depending on several factors (table stats, index selectivity, and so on), it’s even possible that no index is used at all and the table is scanned.
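For the query in the question, a composite index sketch could look like the following (the index name is illustrative, and the INCLUDE clause is an extra idea not covered above: it stores x, y and z in the index leaf level so the query can be answered from the index alone):

    -- Sketch: equality column (frameidx) first, then the range columns (u, v).
    -- INCLUDE (x, y, z) makes the index covering for the sample query,
    -- so no lookups back to the base table are needed.
    CREATE NONCLUSTERED INDEX IX_tablename_frameidx_u_v
        ON tablename (frameidx, u, v)
        INCLUDE (x, y, z);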

A final note on the clustered index: a clustered index defines the physical order in which the data is stored in the table. It doesn’t need to be unique. If you’re using one of the columns for filtering most of the time, it’s a good idea to make that column the table’s clustered index, because, in this case, instead of seeking an index and then finding the data in the table using pointers, the table itself is sought directly, and that can improve performance.
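As a sketch, clustering the table on the filtering columns might look like this (assuming the table is currently a heap; a table can have only one clustered index, so any existing one would have to be dropped first):

    -- Sketch: cluster the table on the filtering columns themselves.
    -- The data pages are then physically ordered by (frameidx, u, v),
    -- so the range predicate reads consecutive pages with no lookups.
    CREATE CLUSTERED INDEX CIX_tablename_frameidx_u_v
        ON tablename (frameidx, u, v);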

So there is no simple answer, but I hope you now have enough info to improve your query speed.

EDIT

Corrected info, according to a very interesting comment.

User contributions licensed under: CC BY-SA