Problem with pg_try_advisory_lock and two connections (seemingly) getting the same lock

Started by Brett Hoernerover 15 years ago4 messagesgeneral

bretthoerner@gmail.com

over 15 years ago

Hi,

I currently have a simple queue written ontop of Postgres. Jobs are
inserted and workers periodically check for jobs they can do, do them,
and then delete the rows. pg_try_advisory_lock is used to (attempt
to) stop two workers from doing the same job.

(I'm working on moving to a "real" messaging queue right now, this is
more a point of curiosity and education now.)

Here is my queue table,

CREATE TABLE queue (
id serial NOT NULL PRIMARY KEY,
rcvd timestamp with time zone,
sent timestamp with time zone,
host character varying(32),
job character varying(32),
arg text
);

Here is an example query,

SELECT q.*
FROM (SELECT id, job, arg
FROM queue
WHERE job = 'foo' OR job = 'bar'
OFFSET 0) AS q
WHERE pg_try_advisory_lock(1, q.id)
LIMIT 10

(For information on OFFSET 0 see:
http://blog.endpoint.com/2009/04/offset-0-ftw.html)

Now if I have two workers running I will periodically see that each
worker gets a row with the same q.id (and thus does the work). How is
that possible? The outer query seemingly does a WHERE on an
advisory_lock.

Does anyone have any ideas? Am I grossly misusing advisory_locks?

Thanks,
Brett

Craig Ringer

craig@2ndquadrant.com

over 15 years ago

In reply to: Brett Hoerner (#1)

Re: Problem with pg_try_advisory_lock and two connections (seemingly) getting the same lock

On 21/07/10 07:27, Brett Hoerner wrote:

Here is an example query,

SELECT q.*
FROM (SELECT id, job, arg
FROM queue
WHERE job = 'foo' OR job = 'bar'
OFFSET 0) AS q
WHERE pg_try_advisory_lock(1, q.id)
LIMIT 10

(For information on OFFSET 0 see:
http://blog.endpoint.com/2009/04/offset-0-ftw.html)

Now if I have two workers running I will periodically see that each
worker gets a row with the same q.id (and thus does the work). How is
that possible? The outer query seemingly does a WHERE on an
advisory_lock.

Does anyone have any ideas? Am I grossly misusing advisory_locks?

You kick off two queries at once. Both have subqueries that grab a set
of id,job,arg . There's no exclusion at this stage, so they can easily
both land up with some or all of the same results.

THEN you filter the result. The filter will drop the result list to
empty if it can't acquire the lock. Under what circumstances can it not
acquire the lock? If another transaction holds it.

The first transaction might have grabbe the data, acquired the lock,
done its processing, and committed/rolled back to *release* the lock
before the second transaction gets around to checking the lock. In this
case, the second transaction will happily acquire the lock.

Classic race condition.

You should probably use one of the existing queuing mechanisms rather
than rolling your own, because building a high-performance, reliable
queueing mechanism is surprisingly hard to build. A search of the
archives here will turn up several options. I've noticed that PGQ from
Skytools gets mentioned a lot.

--
Craig Ringer

Merlin Moncure

mmoncure@gmail.com

over 15 years ago

In reply to: Craig Ringer (#2)

Re: Problem with pg_try_advisory_lock and two connections (seemingly) getting the same lock

On Tue, Jul 20, 2010 at 10:52 PM, Craig Ringer
<craig@postnewspapers.com.au> wrote:

On 21/07/10 07:27, Brett Hoerner wrote:

Here is an example query,

SELECT q.*
FROM (SELECT id, job, arg
FROM queue
WHERE job = 'foo' OR job = 'bar'
OFFSET 0) AS q
WHERE pg_try_advisory_lock(1, q.id)
LIMIT 10

(For information on OFFSET 0 see:
http://blog.endpoint.com/2009/04/offset-0-ftw.html)

Now if I have two workers running I will periodically see that each
worker gets a row with the same q.id (and thus does the work). How is
that possible? The outer query seemingly does a WHERE on an
advisory_lock.

Does anyone have any ideas? Am I grossly misusing advisory_locks?

You kick off two queries at once. Both have subqueries that grab a set
of id,job,arg . There's no exclusion at this stage, so they can easily
both land up with some or all of the same results.

THEN you filter the result. The filter will drop the result list to
empty if it can't acquire the lock. Under what circumstances can it not
acquire the lock? If another transaction holds it.

The first transaction might have grabbe the data, acquired the lock,
done its processing, and committed/rolled back to *release* the lock
before the second transaction gets around to checking the lock. In this
case, the second transaction will happily acquire the lock.

Classic race condition.

You should probably use one of the existing queuing mechanisms rather
than rolling your own, because building a high-performance, reliable
queueing mechanism is surprisingly hard to build. A search of the
archives here will turn up several options. I've noticed that PGQ from
Skytools gets mentioned a lot.

I generally agree with your statements, but there is one correction to
make: advisory locks are not released at end of transaction. You
don't have to worry about the race as long as you make sure the record
'get' and 'lock' are done in the same operation. This is indeed quite
tricky to get right, but I'm interested on academic grounds; I want to
know if the issue is the lock itself or the post lock handling. OP:
can we also see how the lock is released?

merlin

Craig Ringer

craig@2ndquadrant.com

over 15 years ago

In reply to: Merlin Moncure (#3)

Re: Problem with pg_try_advisory_lock and two connections (seemingly) getting the same lock

On 24/07/10 00:00, Merlin Moncure wrote:

I generally agree with your statements, but there is one correction to
make: advisory locks are not released at end of transaction.

Argh. Good point. Every other kind of lock is, but advisory locks are
only released when the connection is closed or the lock is explicitly
released.

Thanks for the correction.

--
Craig Ringer