Extremely slow “row_number() over order” query

I have a users table with columns (id, xp, …) and around 1.5 million rows.

I am getting someone’s position in the XP leaderboard with the following query (which took 33 seconds to execute):

EXPLAIN ANALYZE WITH counts AS (
                SELECT DISTINCT
                                id,
                                ROW_NUMBER () OVER (ORDER BY xp DESC)
                FROM
                    users
            ) SELECT
                *
            FROM
                counts
            WHERE
                id=1;

​x
 
EXPLAIN ANALYZE WITH counts AS (                SELECT DISTINCT                                id,                                ROW_NUMBER () OVER (ORDER BY xp DESC)                FROM                    users            ) SELECT                *            FROM                counts            WHERE                id=1;​

Subquery Scan on counts  (cost=344492.80..395160.57 rows=7404 width=16) (actual time=30683.244..32174.117 rows=1 loops=1)
  Filter: (counts.id = '1'::bigint)
  Rows Removed by Filter: 1481060
  ->  HashAggregate  (cost=344492.80..376651.79 rows=1480702 width=24) (actual time=30679.440..32034.921 rows=1481061 loops=1)
        Group Key: users.id, row_number() OVER (?)"
        Planned Partitions: 64  Batches: 65  Memory Usage: 4369kB  Disk Usage: 125960kB
        ->  WindowAgg  (cost=212155.06..238067.34 rows=1480702 width=24) (actual time=2983.137..20302.548 rows=1481061 loops=1)
              ->  Sort  (cost=212155.06..215856.81 rows=1480702 width=16) (actual time=2983.082..5040.782 rows=1481061 loops=1)
                    Sort Key: users.xp DESC
                    Sort Method: external merge  Disk: 37760kB
                    ->  Seq Scan on users  (cost=0.00..35094.02 rows=1480702 width=16) (actual time=25.467..880.626 rows=1481061 loops=1)
Planning Time: 2.593 ms
JIT:
  Functions: 14
  Options: Inlining false, Optimization false, Expressions true, Deforming true"
  Timing: Generation 12.061 ms, Inlining 0.000 ms, Optimization 1.503 ms, Emission 26.086 ms, Total 39.650 ms"
Execution Time: 32325.206 ms

 
Subquery Scan on counts  (cost=344492.80..395160.57 rows=7404 width=16) (actual time=30683.244..32174.117 rows=1 loops=1)  Filter: (counts.id = '1'::bigint)  Rows Removed by Filter: 1481060  ->  HashAggregate  (cost=344492.80..376651.79 rows=1480702 width=24) (actual time=30679.440..32034.921 rows=1481061 loops=1)        Group Key: users.id, row_number() OVER (?)"        Planned Partitions: 64  Batches: 65  Memory Usage: 4369kB  Disk Usage: 125960kB        ->  WindowAgg  (cost=212155.06..238067.34 rows=1480702 width=24) (actual time=2983.137..20302.548 rows=1481061 loops=1)              ->  Sort  (cost=212155.06..215856.81 rows=1480702 width=16) (actual time=2983.082..5040.782 rows=1481061 loops=1)                    Sort Key: users.xp DESC                    Sort Method: external merge  Disk: 37760kB                    ->  Seq Scan on users  (cost=0.00..35094.02 rows=1480702 width=16) (actual time=25.467..880.626 rows=1481061 loops=1)Planning Time: 2.593 msJIT:  Functions: 14  Options: Inlining false, Optimization false, Expressions true, Deforming true"  Timing: Generation 12.061 ms, Inlining 0.000 ms, Optimization 1.503 ms, Emission 26.086 ms, Total 39.650 ms"Execution Time: 32325.206 ms​​

My table definition:

CREATE TABLE users
(
    id                    bigint                                        NOT NULL
        CONSTRAINT users_pkey
            PRIMARY KEY,
    xp                    bigint               DEFAULT 0                NOT NULL,
    ...
);
CREATE INDEX user_xp_leaderboard_index
    ON users (xp DESC, id ASC);

 
CREATE TABLE users(    id                    bigint                                        NOT NULL        CONSTRAINT users_pkey            PRIMARY KEY,    xp                    bigint               DEFAULT 0                NOT NULL,    ...);CREATE INDEX user_xp_leaderboard_index    ON users (xp DESC, id ASC);​

But it is extremely slow. Although it is not surprising considering it sorts the whole table and filter it, I don’t know how I could improve/optimize this query.

I did SET work_mem TO '1 GB';. It helped a bit but not much.

Any help would be appreciated. Thanks in advance.

Answer

You can write the query like this:

select count(*)
from users u 
where u.xp >= (select u2.xp from users u2 where u2.id = 1);

 
select count(*)from users u where u.xp >= (select u2.xp from users u2 where u2.id = 1);​

This can take advantage of an index on users(id, xp). This should totally eliminate any sorting. An index on users(xp) could also be helpful if the rows are quite wide and Postgres can use an index-only scan.

Advertisement

Answer