BUG #17141: SELECT LIMIT WITH TIES FOR UPDATE SKIP LOCKED returns wrong number of rows

Started by PG Bug reporting formover 4 years ago11 messagesbugs
Jump to latest
#1PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 17141
Logged by: Emil Iggland
Email address: emil@iggland.com
PostgreSQL version: 13.3
Operating system: Windows
Description:

I am trying to create a queue which should assign multiple tasks to a
worker. I use row locking with FOR UPDATE SKIP LOCKED, but the number of
rows returned are inconsistent with what I expect.

Minimum example:

CREATE TABLE queue (task INTEGER);
INSERT INTO queue (task)
VALUES (180),(280),(380),(480),(580),(180),(280),(380),(480),(580);

BEGIN;
SELECT * FROM queue
ORDER BY task DESC
FETCH FIRST 1 ROWS WITH TIES
FOR UPDATE SKIP LOCKED;
/* Some work to be done here */
COMMIT;

select version();
PostgreSQL 13.3, compiled by Visual C++ build 1914, 64-bit

Expected result Worker 1: (580), (580), Actual result Worker 1: (580),
(580)
Expected result Worker 2: (480), (480), Actual result Worker 2: (480)

#2Emil Iggland
emil@iggland.com
In reply to: PG Bug reporting form (#1)
Re: BUG #17141: SELECT LIMIT WITH TIES FOR UPDATE SKIP LOCKED returns wrong number of rows

FWIW, the following query exhibits the behaviour that I expect, but
isn't really usable in production as I do not know how many rows to
expect per worker.

BEGIN;
SELECT * FROM queue
ORDER BY task DESC
LIMIT 2 OFFSET 0
FOR UPDATE SKIP LOCKED;
COMMIT;

#3Emil Iggland
emil@iggland.com
In reply to: PG Bug reporting form (#1)
Re: BUG #17141: SELECT LIMIT WITH TIES FOR UPDATE SKIP LOCKED returns wrong number of rows

As does

SELECT * FROM queue
ORDER BY task DESC
FETCH NEXT 1 ROWS WITH TIES
FOR UPDATE SKIP LOCKED;

Note the use of NEXT instead of FIRST

Show quoted text

On 2021-08-11 18:38, PG Bug reporting form wrote:

The following bug has been logged on the website:

Bug reference: 17141
Logged by: Emil Iggland
Email address: emil@iggland.com
PostgreSQL version: 13.3
Operating system: Windows
Description:

I am trying to create a queue which should assign multiple tasks to a
worker. I use row locking with FOR UPDATE SKIP LOCKED, but the number of
rows returned are inconsistent with what I expect.

Minimum example:

CREATE TABLE queue (task INTEGER);
INSERT INTO queue (task)
VALUES (180),(280),(380),(480),(580),(180),(280),(380),(480),(580);

BEGIN;
SELECT * FROM queue
ORDER BY task DESC
FETCH FIRST 1 ROWS WITH TIES
FOR UPDATE SKIP LOCKED;
/* Some work to be done here */
COMMIT;

select version();
PostgreSQL 13.3, compiled by Visual C++ build 1914, 64-bit

Expected result Worker 1: (580), (580), Actual result Worker 1: (580),
(580)
Expected result Worker 2: (480), (480), Actual result Worker 2: (480)

#4Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: PG Bug reporting form (#1)
Re: BUG #17141: SELECT LIMIT WITH TIES FOR UPDATE SKIP LOCKED returns wrong number of rows

On 2021-Aug-11, PG Bug reporting form wrote:

BEGIN;
SELECT * FROM queue
ORDER BY task DESC
FETCH FIRST 1 ROWS WITH TIES
FOR UPDATE SKIP LOCKED;
/* Some work to be done here */
COMMIT;

select version();
PostgreSQL 13.3, compiled by Visual C++ build 1914, 64-bit

Expected result Worker 1: (580), (580), Actual result Worker 1: (580), (580)
Expected result Worker 2: (480), (480), Actual result Worker 2: (480)

Ouch, we already saw this actually:
/messages/by-id/16676-fd62c3c835880da6@postgresql.org
The problem is that the first worker locks the first (480) row (even
though it does not return it), so the second worker skips it due to SKIP
LOCKED.

I have this on my list of things to look at, but it's not at the top
yet sadly ...

--
Álvaro Herrera

#5David Christensen
david.christensen@crunchydata.com
In reply to: Alvaro Herrera (#4)
Re: BUG #17141: SELECT LIMIT WITH TIES FOR UPDATE SKIP LOCKED returns wrong number of rows

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

On 2021-Aug-11, PG Bug reporting form wrote:

BEGIN;
SELECT * FROM queue
ORDER BY task DESC
FETCH FIRST 1 ROWS WITH TIES
FOR UPDATE SKIP LOCKED;
/* Some work to be done here */
COMMIT;

select version();
PostgreSQL 13.3, compiled by Visual C++ build 1914, 64-bit

Expected result Worker 1: (580), (580), Actual result Worker 1: (580), (580)
Expected result Worker 2: (480), (480), Actual result Worker 2: (480)

Ouch, we already saw this actually:
/messages/by-id/16676-fd62c3c835880da6@postgresql.org
The problem is that the first worker locks the first (480) row (even
though it does not return it), so the second worker skips it due to SKIP
LOCKED.

I have this on my list of things to look at, but it's not at the top
yet sadly ...

Yeah, I'd looked at this when I found it, and short of detecting the situation "WITH TIES FOR UPDATE
SKIP LOCKED" and erroring out, it seems like it would require adding in infrastructure that we don't
support (AFAIK) with unlocking an already locked row inside a transaction or reworking the order of
LockRows and Limit such that Limit comes first (and itself handles the WITH TIES) before handing to
LockRows. Either way (other than the error), it seems to be a fairly invasive change.

If someone has another idea on how to handle this, I could take a stab at things. Detecting the
situation and erroring seems like the easiest way to handle so you're at least not getting back bad
results, though I agree that the functionality would be useful if we *could* support it somehow.

David
--

#6Emil Iggland
emil@iggland.com
In reply to: David Christensen (#5)
Re: BUG #17141: SELECT LIMIT WITH TIES FOR UPDATE SKIP LOCKED returns wrong number of rows

I continued trying the variations in the linked thread.

* Sub-query
BEGIN;
SELECT * FROM (SELECT * FROM queue
ORDER BY task DESC
FETCH NEXT 1 ROWS WITH TIES) t
FOR UPDATE SKIP LOCKED;
COMMIT;

This behaves the same way, this does not work around the bug.
The same goes for my previous "find" with NEXT. I can not replicate the
working state, I must have done something wrong last night.

I added some more tasks with the same number in order to see if there
was a problem with the first row, or with the count.

I now have the following counts:
task count(*)
180 2
280 2
380 4
480 3
580 2

I attempted to select multiple tasks at the same time, representing a
case where a worker might select multiple tasks.

SELECT * FROM queue
ORDER BY task DESC
FETCH FIRST 3 ROWS WITH TIES
FOR UPDATE SKIP LOCKED;

Here I get three rows back (580), (580), (480)

If I run
SELECT * FROM queue ORDER BY task DESC
FETCH FIRST 3 ROWS WITH TIES;
I get back 5 rows (580), (580), (480), (480), (480) as expected.

/Emil

Show quoted text

On 2021-08-11 23:39, David Christensen wrote:

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

On 2021-Aug-11, PG Bug reporting form wrote:

BEGIN;
SELECT * FROM queue
ORDER BY task DESC
FETCH FIRST 1 ROWS WITH TIES
FOR UPDATE SKIP LOCKED;
/* Some work to be done here */
COMMIT;

select version();
PostgreSQL 13.3, compiled by Visual C++ build 1914, 64-bit

Expected result Worker 1: (580), (580), Actual result Worker 1: (580), (580)
Expected result Worker 2: (480), (480), Actual result Worker 2: (480)

Ouch, we already saw this actually:
/messages/by-id/16676-fd62c3c835880da6@postgresql.org
The problem is that the first worker locks the first (480) row (even
though it does not return it), so the second worker skips it due to SKIP
LOCKED.

I have this on my list of things to look at, but it's not at the top
yet sadly ...

Yeah, I'd looked at this when I found it, and short of detecting the situation "WITH TIES FOR UPDATE
SKIP LOCKED" and erroring out, it seems like it would require adding in infrastructure that we don't
support (AFAIK) with unlocking an already locked row inside a transaction or reworking the order of
LockRows and Limit such that Limit comes first (and itself handles the WITH TIES) before handing to
LockRows. Either way (other than the error), it seems to be a fairly invasive change.

If someone has another idea on how to handle this, I could take a stab at things. Detecting the
situation and erroring seems like the easiest way to handle so you're at least not getting back bad
results, though I agree that the functionality would be useful if we *could* support it somehow.

David

#7David Christensen
david.christensen@crunchydata.com
In reply to: Alvaro Herrera (#4)
Re: BUG #17141: SELECT LIMIT WITH TIES FOR UPDATE SKIP LOCKED returns wrong number of rows

On Wed, Aug 11, 2021 at 3:07 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2021-Aug-11, PG Bug reporting form wrote:

BEGIN;
SELECT * FROM queue
ORDER BY task DESC
FETCH FIRST 1 ROWS WITH TIES
FOR UPDATE SKIP LOCKED;
/* Some work to be done here */
COMMIT;

select version();
PostgreSQL 13.3, compiled by Visual C++ build 1914, 64-bit

Expected result Worker 1: (580), (580), Actual result Worker 1: (580), (580)
Expected result Worker 2: (480), (480), Actual result Worker 2: (480)

Ouch, we already saw this actually:
/messages/by-id/16676-fd62c3c835880da6@postgresql.org
The problem is that the first worker locks the first (480) row (even
though it does not return it), so the second worker skips it due to SKIP
LOCKED.

I have this on my list of things to look at, but it's not at the top
yet sadly ...

I have written a patch[1]/messages/by-id/CAOxo6XLPccCKru3xPMaYDpa+AXyPeWFs+SskrrL+HKwDjJnLhg@mail.gmail.com to detect this situation and error out
instead of silently not returning some of the rows it ostensibly
should have. I'm not convinced it's the *right* solution, as we may
want to allow the existing types of SELECT that currently trigger this
to run instead, but it is at least a solution and the start of a
discussion.

Best,

David

[1]: /messages/by-id/CAOxo6XLPccCKru3xPMaYDpa+AXyPeWFs+SskrrL+HKwDjJnLhg@mail.gmail.com

#8Emil Iggland
emil@iggland.com
In reply to: David Christensen (#7)
Re: BUG #17141: SELECT LIMIT WITH TIES FOR UPDATE SKIP LOCKED returns wrong number of rows

I tried to read the other thread to understand the underlying problem of
an extra row being locked, but couldn't find any good explanation.

I see a lot of thought being put into how this issue can be worked
around, but very little discussion on if this behaviour is correct or
not. Without having thought about it much deeper, this seems to only be
a problem with the "WITH TIES" clause which provokes this extra row
being locked. Perhaps that is where the problem should be attacked.

Show quoted text

On 2021-08-13 23:46, David Christensen wrote:

On Wed, Aug 11, 2021 at 3:07 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2021-Aug-11, PG Bug reporting form wrote:

BEGIN;
SELECT * FROM queue
ORDER BY task DESC
FETCH FIRST 1 ROWS WITH TIES
FOR UPDATE SKIP LOCKED;
/* Some work to be done here */
COMMIT;

select version();
PostgreSQL 13.3, compiled by Visual C++ build 1914, 64-bit

Expected result Worker 1: (580), (580), Actual result Worker 1: (580), (580)
Expected result Worker 2: (480), (480), Actual result Worker 2: (480)

Ouch, we already saw this actually:
/messages/by-id/16676-fd62c3c835880da6@postgresql.org
The problem is that the first worker locks the first (480) row (even
though it does not return it), so the second worker skips it due to SKIP
LOCKED.

I have this on my list of things to look at, but it's not at the top
yet sadly ...

I have written a patch[1] to detect this situation and error out
instead of silently not returning some of the rows it ostensibly
should have. I'm not convinced it's the *right* solution, as we may
want to allow the existing types of SELECT that currently trigger this
to run instead, but it is at least a solution and the start of a
discussion.

Best,

David

[1] /messages/by-id/CAOxo6XLPccCKru3xPMaYDpa+AXyPeWFs+SskrrL+HKwDjJnLhg@mail.gmail.com

#9Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Emil Iggland (#8)
Re: BUG #17141: SELECT LIMIT WITH TIES FOR UPDATE SKIP LOCKED returns wrong number of rows

On 2021-Aug-25, Emil Iggland wrote:

I see a lot of thought being put into how this issue can be worked
around, but very little discussion on if this behaviour is correct or
not. Without having thought about it much deeper, this seems to only be
a problem with the "WITH TIES" clause which provokes this extra row
being locked. Perhaps that is where the problem should be attacked.

The problem is that when the WITH TIES clause is added, we need to read
one extra row after the one that reaches the LIMIT count, in order to
verify whether it (the next one) should be included due to a tie. With
the executor structure that we currently have, there is no way to read
that row and not lock it. So a good fix would be to separate the act of
locking the row from the act of reading it.

--
Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#9)
Re: BUG #17141: SELECT LIMIT WITH TIES FOR UPDATE SKIP LOCKED returns wrong number of rows

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

On 2021-Aug-25, Emil Iggland wrote:

I see a lot of thought being put into how this issue can be worked
around, but very little discussion on if this behaviour is correct or
not. Without having thought about it much deeper, this seems to only be
a problem with the "WITH TIES" clause which provokes this extra row
being locked. Perhaps that is where the problem should be attacked.

The problem is that when the WITH TIES clause is added, we need to read
one extra row after the one that reaches the LIMIT count, in order to
verify whether it (the next one) should be included due to a tie. With
the executor structure that we currently have, there is no way to read
that row and not lock it. So a good fix would be to separate the act of
locking the row from the act of reading it.

I'm not convinced that's a "good fix". SELECT FOR UPDATE generally
considers that it should return rows only if they still match the
WHERE condition after they're locked. The natural extension of that
to WITH TIES is that you ought to check for equality after locking.
So these features seem rather fundamentally in conflict.

regards, tom lane

#11Emil Iggland
emil@iggland.com
In reply to: Tom Lane (#10)
Re: BUG #17141: SELECT LIMIT WITH TIES FOR UPDATE SKIP LOCKED returns wrong number of rows

To be it would seem that a "peek()" type function would do a good job
also, you can always peek at the next row to determine if it is not a
candidate, if it is the "look and lock"-it.
Then you can keep both features.

Show quoted text

On 2021-10-03 00:30, Tom Lane wrote:

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

On 2021-Aug-25, Emil Iggland wrote:

I see a lot of thought being put into how this issue can be worked
around, but very little discussion on if this behaviour is correct or
not. Without having thought about it much deeper, this seems to only be
a problem with the "WITH TIES" clause which provokes this extra row
being locked. Perhaps that is where the problem should be attacked.

The problem is that when the WITH TIES clause is added, we need to read
one extra row after the one that reaches the LIMIT count, in order to
verify whether it (the next one) should be included due to a tie. With
the executor structure that we currently have, there is no way to read
that row and not lock it. So a good fix would be to separate the act of
locking the row from the act of reading it.

I'm not convinced that's a "good fix". SELECT FOR UPDATE generally
considers that it should return rows only if they still match the
WHERE condition after they're locked. The natural extension of that
to WITH TIES is that you ought to check for equality after locking.
So these features seem rather fundamentally in conflict.

regards, tom lane