Skip to content
Advertisement

Get distinct sets of rows for an INSERT executed in concurrent transactions

I am implementing a simple pessimistic locking mechanism using Postgres as a medium. The goal is that multiple instances of an application can simultaneously acquire locks on distinct sets of users. The app instances are not trying to lock specific users. Instead they will take any user locks they can get.

Say, for example, we have three instances of the app running and there are currently 5 users that are not locked. All three instances attempt to acquire locks on up to three users at the same time. Their requests are served in arbitrary order. Ideally, the first instance served would acquire 3 user locks, the second would acquire 2 and the third would acquire no locks.

So far I have not been able to write a Query that accomplishes this. I’ll show you my best attempt so far.

Here are the tables for the example:

CREATE TABLE my_locks (
  id                  bigserial       PRIMARY KEY,
  user_id             bigint          NOT NULL UNIQUE
);

CREATE TABLE my_users (
  id                  bigserial       PRIMARY KEY,
  user_name           varchar(50)     NOT NULL
);

And this is the Query for acquiring locks:

INSERT INTO my_locks(user_id)
  SELECT u.id
  FROM my_users AS u
  LEFT JOIN my_locks
    AS l
    ON u.id = l.user_id
  WHERE l.user_id IS NULL
  LIMIT 3
RETURNING *

I had hoped that folding the collecting of lockable users and the insertion of the locks into the database into a single query would ensure that multiple simultaneous requests would be processed in their entirety one after the other.

It doesn’t work that way. If applied to the above example where three instances use this Query to simultaneously acquire locks on a pool of 5 users, one instance acquires three locks and the other instances receive an error for attempting to insert locks with non-unique user-IDs.

This is not ideal, because it prevents the locking mechanism from scaling. There are a number of workarounds to deal with this, but what I am looking for is a database-level solution. Is there a way to tweak the Query or DB configuration in such a way that multiple app instances can (near-)simultaneously acquire the maximum available number of locks in perfectly distinct sets?

Answer

The locking clause SKIP LOCKED should be perfect for you. Added with Postgres 9.5.
The manual:

With SKIP LOCKED, any selected rows that cannot be immediately locked are skipped.

FOR NO KEY UPDATE should be strong enough for your purpose. (Still allows other, non-exclusive locks.) And ideally, you take the weakest lock that’s strong enough.

Work with just locks

If you can do your work while a transaction locking involved users stays open, then that’s all you need:

BEGIN;

SELECT id FROM my_users
LIMIT  3
FOR    NO KEY UPDATE SKIP LOCKED;

-- do some work on selected users here  !!!

COMMIT;

Locks are gathered along the way and kept till the end of the current transaction. While the order can be arbitrary, we don’t even need ORDER BY. No waiting, no deadlock possible with SKIP LOCKED. Each transaction scans over the table and locks the first 3 rows still up for grabs. Very cheap and fast.

Since transaction might stay open for a while, don’t put anything else into the same transaction so not to block more than necessary.

Work with lock table additionally

If you can’t do your work while a transaction locking involved users stays open, register users in that additional table my_locks.

Before work:

INSERT INTO my_locks(user_id)
SELECT id FROM my_users u
WHERE  NOT EXISTS (
   SELECT FROM my_locks l
   WHERE  l.user_id = u.id
   )
LIMIT  3
FOR    NO KEY UPDATE SKIP LOCKED
RETRUNGING *;

No explicit transaction wrapper needed.

Users in my_locks are excluded in addition to those currently locked exclusively. That works under concurrent load. While each transaction is open, locks are active. Once those are released at the end of the transaction they have already been written to the locks table – and are visible to other transaction at the same time.

There’s a theoretical race condition for concurrent statements not seeing newly committed rows in the locks table just yet, and grabbing the same users after locks have just been released. But that would fail trying to write to the locks table. A UNIQUE constraint is absolute and will not allow duplicate entries, disregarding visibility.

Users won’t be eligible again until deleted from your locks table.

Further reading:

Aside:

… multiple simultaneous requests would be processed in their entirety one after the other.

It doesn’t work that way.

To understand how it actually works, read about the Multiversion Concurrency Control (MVCC) of Postgres in the manual, starting here.