
Storing ‘Rank’ for Contests in Postgres

I’m trying to determine if there is a “low cost” optimization for the following query. We’ve implemented a system whereby ‘tickets’ earn ‘points’ and thus can be ranked. To support analytical queries, we store the rank of every ticket (tickets can be tied) with the ticket.

I’ve found that, at scale, updating this rank is very slow. I’m attempting to run the scenario below on a set of about 20k tickets.

I’m hoping that someone can help identify why and offer some help.

We’re on Postgres 9.3.6.

Here’s a simplified ticket table schema:

Here’s the query that I’m executing:

Here’s the explain on a set of about 10k rows:


Answer

The correlated subquery has to be executed for every row (20k times in your example). This only makes sense for a small number of rows or where the computation requires it.

This derived table is computed once before we join to it:
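A minimal sketch of that pattern, assuming a table named tickets with columns id (primary key), event_id, points, status and rank, that higher points mean a better rank, and a hypothetical event id of 123 standing in for <EVENT_ID>:

```sql
-- Assumed schema: tickets(id PK, event_id, points, status, rank).
UPDATE tickets t
SET    rank = r.rnk
FROM  (
   -- Computed once for the whole event, not once per ticket.
   SELECT tt.id
        , rank() OVER (PARTITION BY tt.event_id
                       ORDER BY tt.points DESC) AS rnk  -- tied points share a rank
   FROM   tickets tt
   WHERE  tt.status <> 'x'
   AND    tt.event_id = 123   -- hypothetical value for <EVENT_ID>
   ) r
WHERE  t.id = r.id            -- join on the PK
AND    t.rank <> r.rnk;       -- skip rows whose rank would not change
```

Running it under EXPLAIN ANALYZE next to the original statement is the simplest way to check whether the plan actually improves on your data.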

Should be quite a bit faster. 🙂

Additional improvements

The last predicate rules out empty updates:

Only makes sense if the new rank can be the old rank at least occasionally. Else remove it.
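One caveat, assuming the rank column can be NULL (for example on tickets that have never been ranked): a plain inequality evaluates to NULL for those rows, so they would be skipped by the update. A small illustration of the difference between the two comparisons:

```sql
-- <> yields NULL when one side is NULL; IS DISTINCT FROM yields true.
SELECT NULL::int <> 1               AS plain_inequality
     , NULL::int IS DISTINCT FROM 1 AS null_safe;
```

If that case applies, the last predicate could be written with IS DISTINCT FROM instead of <> so that unranked tickets still get a rank.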

We don’t need to repeat AND t.status != 'x' in the outer query: since we join on the PK column id, it’s the same value on both sides.
And the standard SQL inequality operator is <>, even if Postgres supports !=, too.

Push down the predicate event_id = <EVENT_ID> into the subquery as well. No need to compute numbers for any other event_id. This was handed down from the outer query in your original. In the rewritten query, it’s best applied directly in the subquery. Since we use PARTITION BY tt.event_id, that’s not going to mess with ranks.

User contributions licensed under: CC BY-SA