Why does this simple query not use the index in PostgreSQL?

Question

In my PostgreSQL database I have a table named "product". In this table I have a column named "date_touched" with type timestamp. I created a simple btree index on this column. This is the schema of my table (I omitted irrelevant column and index definitions): The table has ~300,000 rows and I want to get the n-th element from the

Accepted Answer

It is a very good thing that SeqScan is used here. Your OFFSET 100000 is not a good thing for the IndexScan.

A bit of theory

B-tree indexes contain two structures inside: a balanced tree and a doubly linked list of keys. The first structure allows for fast key lookups; the second is responsible for the ordering. For bigger tables, the linked list cannot fit into a single page, so it is a list of linked pages, where each page's entries maintain the ordering specified during index creation.

It is wrong to think, though, that such pages sit together on the disk. In fact, it is more probable that they are spread across different locations. In order to read pages in the index's order, the system has to perform random disk reads. Random disk IO is expensive compared to sequential access. Therefore a good optimizer will prefer a SeqScan instead.

I highly recommend the "SQL Performance Explained" book to better understand indexes. It is also available on-line.

What is going on?

Your OFFSET clause would cause the database to read the index's linked list of keys (causing lots of random disk reads) and then discard all those results until it hits the wanted offset. So it is good, in fact, that Postgres decided to use SeqScan + Sort here; this should be faster.

You can check this assumption by:

1. running EXPLAIN (analyze, buffers) on your big-OFFSET query,
2. then running SET enable_seqscan TO 'off';
3. and running EXPLAIN (analyze, buffers) again, comparing the results.

In general, it is better to avoid OFFSET, as DBMSes do not always pick the right approach here. (BTW, which version of PostgreSQL are you using?) Here's a comparison of how it performs for different offset values.

EDIT: In order to avoid OFFSET, one has to base pagination on real data that exists in the table and is part of the index. For this particular case, the following might be possible:

- show the first N (say, 20) elements;
- include the maximal date_touched shown on the page in all the "Next" links. You can compute this value on the application side. Do the same for the "Previous" links, except include the minimal date_touched for these;
- on the server side you will get the limiting value. Therefore, say for the "Next" case, you can do a query like this:

    SELECT id
      FROM product
     WHERE date_touched > $max_date_seen_on_the_page
     ORDER BY date_touched ASC
     LIMIT 20;

This query makes the best use of the index.

Of course, you can adjust this example to your needs. I used pagination as it is a typical case for OFFSET.

One more note: querying 1 row many times, increasing the offset by 1 for each query, will be much more time consuming than doing a single batch query that returns all those records, which are then iterated over on the application side.
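
The "Next" link approach above can be tried end to end with a small sketch. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module instead of PostgreSQL so that it runs without a server, the table is filled with made-up rows, and the helper names (make_db, first_page, next_page, offset_page) are hypothetical. It also assumes date_touched values are distinct; with duplicate timestamps a plain `>` comparison would skip rows that share the boundary value.

```python
import sqlite3
from datetime import datetime, timedelta

PAGE_SIZE = 20  # page size taken from the example in the answer


def make_db(n_rows=100):
    """Build an in-memory stand-in for the "product" table (made-up data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE product (id INTEGER PRIMARY KEY, date_touched TEXT NOT NULL)"
    )
    conn.execute("CREATE INDEX product_date_touched ON product (date_touched)")
    start = datetime(2020, 1, 1)
    # One row per minute, so every timestamp is distinct and sorts as text.
    rows = [
        (i, (start + timedelta(minutes=i)).isoformat(sep=" "))
        for i in range(1, n_rows + 1)
    ]
    conn.executemany("INSERT INTO product (id, date_touched) VALUES (?, ?)", rows)
    return conn


def first_page(conn):
    """First page: no boundary value yet, just order and limit."""
    return conn.execute(
        "SELECT id, date_touched FROM product "
        "ORDER BY date_touched ASC LIMIT ?",
        (PAGE_SIZE,),
    ).fetchall()


def next_page(conn, max_date_seen):
    """Following page: start right after the largest date_touched already shown."""
    return conn.execute(
        "SELECT id, date_touched FROM product "
        "WHERE date_touched > ? "
        "ORDER BY date_touched ASC LIMIT ?",
        (max_date_seen, PAGE_SIZE),
    ).fetchall()


def offset_page(conn, page_number):
    """OFFSET-based equivalent, kept only to compare results."""
    return conn.execute(
        "SELECT id, date_touched FROM product "
        "ORDER BY date_touched ASC LIMIT ? OFFSET ?",
        (PAGE_SIZE, page_number * PAGE_SIZE),
    ).fetchall()


if __name__ == "__main__":
    conn = make_db()
    page1 = first_page(conn)
    # The application passes the last date_touched of page 1 in the "Next" link.
    page2 = next_page(conn, page1[-1][1])
    print([row[0] for row in page2])
```

Both variants return the same rows here; the difference is that the WHERE-based query can start reading the index at the boundary value, while the OFFSET-based one has to walk past every skipped row first.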

Answer