Skip to content
Advertisement

Why does this simple query not use the index in postgres?

In my postgreSQL database I have a table named "product". In this table I have a column named "date_touched" with type timestamp. I created a simple btree index on this column. This is the schema of my table (I omitted irrelevant column & index definitions):

                                           Table "public.product"
          Column           |           Type           | Modifiers                              
---------------------------+--------------------------+-------------------
 id                        | integer                  | not null default nextval('product_id_seq'::regclass)
 date_touched              | timestamp with time zone | not null

Indexes:
    "product_pkey" PRIMARY KEY, btree (id)
    "product_date_touched_59b16cfb121e9f06_uniq" btree (date_touched)

The table has ~300,000 rows and I want to get the n-th element from the table ordered by "date_touched". when I want to get the 1000th element, it takes 0.2s, but when I want to get the 100,000th element, it takes about 6s. My question is, why does it take too much time to retrieve the 100,000th element, although I’ve defined a btree index?

Here is my query with explain analyze that shows postgreSQL does not use the btree index and instead sorts all rows to find the 100,000th element:

  • first query (100th element):
explain analyze
  SELECT product.id
  FROM product
  ORDER BY product.date_touched ASC
  LIMIT 1
  OFFSET 1000;
                                QUERY PLAN
-----------------------------------------------------------------------------------------------------
Limit  (cost=3035.26..3038.29 rows=1 width=12) (actual time=160.208..160.209 rows=1 loops=1)
->  Index Scan using product_date_touched_59b16cfb121e9f06_uniq on product  (cost=0.42..1000880.59 rows=329797 width=12) (actual time=16.651..159.766 rows=1001 loops=1)
Total runtime: 160.395 ms
  • second query (100,000th element):
explain analyze
  SELECT product.id
  FROM product
  ORDER BY product.date_touched ASC
  LIMIT 1
  OFFSET 100000;
                           QUERY PLAN                         
------------------------------------------------------------------------------------------------------
 Limit  (cost=106392.87..106392.88 rows=1 width=12) (actual time=6621.947..6621.950 rows=1 loops=1)
   ->  Sort  (cost=106142.87..106967.37 rows=329797 width=12) (actual time=6381.174..6568.802 rows=100001 loops=1)
         Sort Key: date_touched
         Sort Method: external merge  Disk: 8376kB
         ->  Seq Scan on product  (cost=0.00..64637.97 rows=329797 width=12) (actual time=1.357..4184.115 rows=329613 loops=1)
 Total runtime: 6629.903 ms

Advertisement

Answer

It is a very good thing, that SeqScan is used here. Your OFFSET 100000 is not a good thing for the IndexScan.

A bit of theory

Btree indexes contain 2 structures inside:

  1. balanced tree and
  2. double-linked list of keys.

First structure allows for fast keys lookups, second is responsible for the ordering. For bigger tables, linked list cannot fit into a single page and therefore it is a list of linked pages, where each page’s entries maintain ordering, specified during index creation.

It is wrong to think, though, that such pages are sitting together on the disk. In fact, it is more probable that those are spread across different locations. And in order to read pages based on the index’s order, system has to perform random disk reads. Random disk IO is expensive, compared to sequential access. Therefore good optimizer will prefer a SeqScan instead.

I highly recommend “SQL Performance Explained” book to better understand indexes. It is also available on-line.

What is going on?

Your OFFSET clause would cause database to read index’s linked list of keys (causing lots of random disk reads) and than discarding all those results, till you hit the wanted offset. And it is good, in fact, that Postgres decided to use SeqScan + Sort here — this should be faster.

You can check this assumption by:

  • running EXPLAIN (analyze, buffers) of your big-OFFSET query
  • than do SET enable_seqscan TO 'off';
  • and run EXPLAIN (analyze, buffers) again, comparing the results.

In general, it is better to avoid OFFSET, as DBMSes not always pick the right approach here. (BTW, which version of PostgreSQL you’re using?) Here’s a comparison of how it performs for different offset values.


EDIT: In order to avoid OFFSET one would have to base pagination on the real data, that exists in the table and is a part of the index. For this particular case, the following might be possible:

  • show first N (say, 20) elements
  • include maximal date_touched that is shown on the page to all the “Next” links. You can compute this value on the application side. Do similar for the “Previous” links, except include minimal date_touch for these.
  • on the server side you will get the limiting value. Therefore, say for the “Next” case, you can do a query like this:
SELECT id
  FROM product
 WHERE date_touched > $max_date_seen_on_the_page
 ORDER BY date_touched ASC
 LIMIT 20;

This query makes best use of the index.

Of course, you can adjust this example to your needs. I used pagination as it is a typical case for the OFFSET.

One more note — querying 1 row many times, increasing offset for each query by 1, will be much more time consuming, than doing a single batch query that returns all those records, which are then iterated from on the application side.

8 People found this is helpful
Advertisement