In my PostgreSQL database I have a table named "product". In this table I have a column named "date_touched" of type timestamp. I created a simple btree index on this column. This is the schema of my table (I omitted irrelevant column & index definitions):
Table "public.product" Column | Type | Modifiers ---------------------------+--------------------------+------------------- id | integer | not null default nextval('product_id_seq'::regclass) date_touched | timestamp with time zone | not null Indexes: "product_pkey" PRIMARY KEY, btree (id) "product_date_touched_59b16cfb121e9f06_uniq" btree (date_touched)
The table has ~300,000 rows, and I want to get the n-th element from the table ordered by "date_touched". When I want to get the 1,000th element, it takes 0.2s, but when I want to get the 100,000th element, it takes about 6s. My question is: why does it take so much time to retrieve the 100,000th element, even though I've defined a btree index?
Here are my queries with `explain analyze`, which show that for the 100,000th element PostgreSQL does not use the btree index and instead sorts all rows:
- first query (1,000th element):
```
explain analyze SELECT product.id FROM product ORDER BY product.date_touched ASC LIMIT 1 OFFSET 1000;
                                                 QUERY PLAN
-----------------------------------------------------------------------------------------------------
 Limit  (cost=3035.26..3038.29 rows=1 width=12) (actual time=160.208..160.209 rows=1 loops=1)
   ->  Index Scan using product_date_touched_59b16cfb121e9f06_uniq on product  (cost=0.42..1000880.59 rows=329797 width=12) (actual time=16.651..159.766 rows=1001 loops=1)
 Total runtime: 160.395 ms
```
- second query (100,000th element):
```
explain analyze SELECT product.id FROM product ORDER BY product.date_touched ASC LIMIT 1 OFFSET 100000;
                                                  QUERY PLAN
------------------------------------------------------------------------------------------------------
 Limit  (cost=106392.87..106392.88 rows=1 width=12) (actual time=6621.947..6621.950 rows=1 loops=1)
   ->  Sort  (cost=106142.87..106967.37 rows=329797 width=12) (actual time=6381.174..6568.802 rows=100001 loops=1)
         Sort Key: date_touched
         Sort Method: external merge  Disk: 8376kB
         ->  Seq Scan on product  (cost=0.00..64637.97 rows=329797 width=12) (actual time=1.357..4184.115 rows=329613 loops=1)
 Total runtime: 6629.903 ms
```
Answer
It is a very good thing that a SeqScan is used here. Your `OFFSET 100000` is a bad fit for an IndexScan.
A bit of theory
B-tree indexes contain two structures inside:

- a balanced tree, and
- a doubly linked list of keys.
The first structure allows fast key lookups; the second is responsible for ordering. For bigger tables, the linked list cannot fit into a single page, so it becomes a list of linked pages, where each page's entries maintain the ordering specified at index creation.
It is wrong to think, though, that such pages sit together on disk. In fact, it is more likely that they are spread across different locations, and in order to read pages in the index's order, the system has to perform random disk reads. Random disk IO is expensive compared to sequential access, so a good optimizer will prefer a SeqScan instead.
I highly recommend the book “SQL Performance Explained” to better understand indexes. It is also available online.
What is going on?
Your `OFFSET` clause causes the database to read the index's linked list of keys (incurring lots of random disk reads) and then discard all of those results until it reaches the wanted offset. It is good, in fact, that Postgres decided to use `SeqScan` + `Sort` here; this should be faster.
You can check this assumption by:

- running `EXPLAIN (analyze, buffers)` on your big-`OFFSET` query,
- then doing `SET enable_seqscan TO 'off';`,
- and running `EXPLAIN (analyze, buffers)` again, comparing the results (see the sketch after this list).
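For example, a sketch of such a session, using the same big-`OFFSET` query from above (`enable_seqscan` is a session-local planner setting):

```sql
-- 1) Baseline plan, with the planner free to choose a SeqScan:
EXPLAIN (ANALYZE, BUFFERS)
SELECT product.id
  FROM product
 ORDER BY product.date_touched ASC
 LIMIT 1 OFFSET 100000;

-- 2) Discourage sequential scans for this session only:
SET enable_seqscan TO 'off';

-- 3) Same query again; the planner should now pick the IndexScan,
--    so buffer counts and timings can be compared against the baseline:
EXPLAIN (ANALYZE, BUFFERS)
SELECT product.id
  FROM product
 ORDER BY product.date_touched ASC
 LIMIT 1 OFFSET 100000;

-- 4) Restore the default afterwards:
RESET enable_seqscan;
```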
In general, it is better to avoid `OFFSET`, as DBMSes do not always pick the right approach here. (By the way, which version of PostgreSQL are you using?)
Here’s a comparison of how it performs for different offset values.
EDIT: In order to avoid `OFFSET`, one has to base pagination on real data that exists in the table and is part of the index. For this particular case, the following is possible:
- Show the first N (say, 20) elements.
- Include the maximal `date_touched` shown on the page in all the “Next” links; you can compute this value on the application side. Do the same for the “Previous” links, except include the minimal `date_touched` for those.
- On the server side you will receive the limiting value, so for the “Next” case you can run a query like this:
```sql
SELECT id
  FROM product
 WHERE date_touched > $max_date_seen_on_the_page
 ORDER BY date_touched ASC
 LIMIT 20;
```

This query makes the best use of the index.
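For completeness, the “Previous” direction can be sketched the same way (assuming `$min_date_seen_on_the_page` is the smallest `date_touched` on the current page; the inner query scans backwards through the index, the outer one restores ascending order for display):

```sql
SELECT id
  FROM (SELECT id, date_touched
          FROM product
         WHERE date_touched < $min_date_seen_on_the_page
         ORDER BY date_touched DESC
         LIMIT 20) AS prev_page
 ORDER BY date_touched ASC;
```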
Of course, you can adjust this example to your needs. I used pagination because it is a typical use case for `OFFSET`.
One more note: querying one row many times, increasing the offset by 1 for each query, will be much more time-consuming than a single batch query that returns all of those records, which can then be iterated over on the application side.
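As a rough sketch of that batch approach (assuming you need the first 100,000 rows in order), a single query replaces 100,000 per-row `OFFSET` round trips:

```sql
-- One ordered batch read; iterate over the result set on the application
-- side instead of issuing one LIMIT 1 OFFSET n query per row:
SELECT id, date_touched
  FROM product
 ORDER BY date_touched ASC
 LIMIT 100000;
```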