SKIP LOCKED DATA (work in progress)
Hi
A couple of years ago I posted an outline of a plan [1]/messages/by-id/CADLWmXV7inA-HS=bJ-s6+Ai8DsVGx8zMohr0Ht38EWmXNeqPWw@mail.gmail.com and an
initial patch [2]/messages/by-id/CADLWmXUUjmrPU-+9ss7BCATxM-hr6__9mB6Wv7ry3-r+KXGgBw@mail.gmail.com for implementing SKIP LOCKED DATA. I have
recently come back to this idea, rebased the patch and added a
simple isolation test -- please see attached.
However, heap_lock_tuple is clearly not for the faint hearted,
and I freely admit that I don't understand half of the things
going on in there yet. My general approach has been to follow
the example set by NOWAIT, generalising that flag into a 3-way
policy... but of course NOWAIT doesn't have to worry about
cleaning anything up, because it uses ereport, hence my TODOs.
As always, I would be grateful for any feedback.
Thanks!
Thomas Munro
[1]: /messages/by-id/CADLWmXV7inA-HS=bJ-s6+Ai8DsVGx8zMohr0Ht38EWmXNeqPWw@mail.gmail.com
/messages/by-id/CADLWmXV7inA-HS=bJ-s6+Ai8DsVGx8zMohr0Ht38EWmXNeqPWw@mail.gmail.com
[2]: /messages/by-id/CADLWmXUUjmrPU-+9ss7BCATxM-hr6__9mB6Wv7ry3-r+KXGgBw@mail.gmail.com
/messages/by-id/CADLWmXUUjmrPU-+9ss7BCATxM-hr6__9mB6Wv7ry3-r+KXGgBw@mail.gmail.com
Attachments:
skip-locked-data-v3.patchtext/x-patch; charset=US-ASCII; name=skip-locked-data-v3.patchDownload+556-44
On 05/14/2014 07:06 AM, Thomas Munro wrote:
Hi
A couple of years ago I posted an outline of a plan [1] and an
initial patch [2] for implementing SKIP LOCKED DATA. I have
recently come back to this idea, rebased the patch and added a
simple isolation test -- please see attached.
Simon did some similar work a while ago, so I've cc'd him for his input.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 13 May 2014 23:06, Thomas Munro <munro@ip9.org> wrote:
A couple of years ago I posted an outline of a plan [1] and an
initial patch [2] for implementing SKIP LOCKED DATA. I have
recently come back to this idea, rebased the patch and added a
simple isolation test -- please see attached.
I have attached a new version which copies the skipLocked member
in _copyPlanRowMark, _copyRowMarkClause and _copyLockClause.
Without these, SKIP LOCKED DATA didn't work in pl/pgsql (as reported
by David Rowley off-list). (Also some whitespace changes.)
Best regards,
Thomas Munro
Attachments:
skip-locked-data-v4.patchtext/x-patch; charset=US-ASCII; name=skip-locked-data-v4.patchDownload+559-44
I'm rebasing another implementation of this against current HEAD at the
moment. It was well tested but has bitrotted a bit, in particular it
needs merging with the multixact changes (eep).
That should provide a useful basis for comparison and a chance to share
ideas.
I'll follow up with the patch and a git tree when it's ready, hopefully
tonight.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 05/16/2014 04:46 PM, Craig Ringer wrote:
I'll follow up with the patch and a git tree when it's ready, hopefully
tonight.
Here's a rebased version of Simon's original patch that runs on current
master.
I still need to merge the isolation tests for it merged and sorted out,
and after re-reading it I'd like to change "waitMode" into an enum, not
just some #defines .
Hope it's useful for comparison and ideas.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
skip-locked.patchtext/x-patch; name=skip-locked.patchDownload+118-70
On 16 May 2014 13:21, Craig Ringer <craig@2ndquadrant.com> wrote:
On 05/16/2014 04:46 PM, Craig Ringer wrote:
I'll follow up with the patch and a git tree when it's ready, hopefully
tonight.Here's a rebased version of Simon's original patch that runs on current
master.I still need to merge the isolation tests for it merged and sorted out,
and after re-reading it I'd like to change "waitMode" into an enum, not
just some #defines .Hope it's useful for comparison and ideas.
Thank you! At first glance they're sort of similar which is reassuring.
I'm especially interested in the buffer release semantics which I was
confused about and will look into that (to resolve the TODO notes in my
patch).
I noticed that in applyLockingClause, Simon has "rc->waitMode |= waitMode".
Is that right? The values are 0, 1, and 2, but if you had both NOWAIT and
SKIP LOCKED somewhere in your query you could up with rc->waitMode == 3
unless I'm mistaken. In my patch I have code that would give precedence to
NOWAIT, though looking at it again it could be simpler. (One reviewer
pointed out, that it should really be a single unified enum. In fact I have
been a bit unsure about what scope such an enumeration should have in the
application -- could it even be used in parser code? I tried to follow
existing examples which is why I used #define macros in gram.y).
From a bikeshed colour point of view:
* I used SKIP LOCKED DATA like DB2, and Simon used SKIP LOCKED like
Oracle, and I guess shorter is sweeter
* I used the term wait_policy and an enum, Simon used waitMode and an int
* I had noWait and skipLocked travelling separately in some places, Simon
had a single parameter, which is much better
Best regards,
Thomas Munro
On 05/17/2014 05:24 AM, Thomas Munro wrote:
I noticed that in applyLockingClause, Simon has "rc->waitMode |=
waitMode". Is that right? The values are 0, 1, and 2, but if you had
both NOWAIT and SKIP LOCKED somewhere in your query you could up with
rc->waitMode == 3 unless I'm mistaken.
I do not think that |= is correct there.
It may be that no case can arise where you get the bogus value, but
since in all other places the values are tested for equalty not as bit
fields ( waitMode == NOWAIT not waitMode & NOWAIT ) it doesn't make
sense to |= it there.
? In my patch I have code that
would give precedence to NOWAIT, though looking at it again it could be
simpler.
I agree; if NOWAIT is present anywhere it should be preferred to
SKIP_LOCKED, as it's OK to apply NOWAIT where SKIP LOCKED appears, but
possibly semantically incorrect to apply SKIP LOCKED where only NOWAIT
was expected.
(One reviewer pointed out, that it should really be a single
unified enum. In fact I have been a bit unsure about what scope such an
enumeration should have in the application -- could it even be used in
parser code? I tried to follow existing examples which is why I used
#define macros in gram.y).
Not sure there.
From a bikeshed colour point of view:
* I used SKIP LOCKED DATA like DB2, and Simon used SKIP LOCKED like
Oracle, and I guess shorter is sweeter
We have a long tradition of trying to allow noise keywords where it's
harmless.
So the clause should probably be
SKIP LOCKED [DATA]
in much the same way we have
BEGIN [ WORK | TRANSACTION ] ...
There won't be any ambiguity there.
* I used the term wait_policy and an enum, Simon used waitMode and an int
I prefer an enum and intended to change Simon's patch but didn't have
the time.
I have some isolation tester and regression tests that are still to follow.
* I had noWait and skipLocked travelling separately in some places,
Simon had a single parameter, which is much better
Yes, I strongly prefer that.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 17 May 2014 05:02, Craig Ringer <craig@2ndquadrant.com> wrote:
On 05/17/2014 05:24 AM, Thomas Munro wrote:
I noticed that in applyLockingClause, Simon has "rc->waitMode |=
waitMode". Is that right? The values are 0, 1, and
2, but if you had
both NOWAIT and SKIP LOCKED somewhere in your query you could up with
rc->waitMode == 3 unless I'm mistaken.I do not think that |= is correct there.
It may be that no case can arise where you get the bogus value, but
since in all other places the values are tested for equalty not as bit
fields ( waitMode == NOWAIT not waitMode & NOWAIT ) it doesn't make
sense to |= it there.
There are indeed ways (subselects etc) to have more than one
locking clause applying to the same table, which are supposed to
be merged into the appropriate wait policy by that line.
? In my patch I have code that
would give precedence to NOWAIT, though looking at it again it could be
simpler.I agree; if NOWAIT is present anywhere it should be preferred to
SKIP_LOCKED, as it's OK to apply NOWAIT where
SKIP LOCKED appears, but
possibly semantically incorrect to apply SKIP LOCKED where only NOWAIT
was expected.
In the new v5 patch (attached) I used the same approach as
the "lock strength" support, that is, relying on the values of
the enumeration being arranged so that the enumerators that
should take precendence have higher numerical values so I could
just write:
rc->strength = Max(rc->strength, strength);
+ rc->waitPolicy = Max(rc->waitPolicy, waitPolicy);
(One reviewer pointed out, that it should really be a single
unified enum. In fact I have been a bit unsure about what scope such an
enumeration should have in the application -- could it even be used in
parser code? I tried to follow existing examples which is why I used
#define macros in gram.y).Not sure there.
I removed those macros and went for an enumeration -- see below.
From a bikeshed colour point of view:
* I used SKIP LOCKED DATA like DB2, and Simon used SKIP LOCKED like
Oracle, and I guess shorter is sweeterWe have a long tradition of trying to allow noise keywords where it's
harmless.So the clause should probably be
SKIP LOCKED [DATA]
Ok, done in the v5 patch.
I have some isolation tester and regression tests that are still to
follow.
Great, I am looking forward to seeing those. I have added an
isolation test for NOWAIT, and I am planning to design some tests
that will test the various affected branched of
heap_lock_tuple (eg multi-transaction ID etc), and test
interaction with various other features (various lock stregths,
UPDATEs, combinations of SKIP LOCKED DATA, NOWAIT and default
policies etc).
* I had noWait and skipLocked travelling separately in some places,
Simon had a single parameter, which is much betterYes, I strongly prefer that.
I agree in principal but couldn't decide *how many* enumerations
to use. In v5 I have added two more:
* one called LockClauseWaitPolicy defined in parsenodes.h that is
used in LockClause and RowMarkCause
* another called RowWaitPolicy defined in plannodes.h that is
used in PlanRowMark and ExecRowMark
The parser produces LockClauseWaitPolicy values, which InitPlan
converts to RowWaitPolicy values by simple assignment (the values
are the same, which obviously wouldn't work if this were compiled
as C++). Then ExecLockRows converts those into LockWaitPolicy
values when calling heal_lock_tuple.
Obviously there is an opportunity to unify all three, or at least
LockClauseWaitPolicy (parser) and RowWaitPolicy (plan and
execution), but I didn't know how appropriate that would be, and
followed the new lock strength feature for inspiration.
One difference between my patch and Simon's is that mine
introduces a new return value for heap_lock_type called
HeapTupleWouldBlock, whereas his returns the existin
HeapTupleBeingUpdated and then handles it differently in
ExecLockRows if in SKIP LOCKED mode.
Best regards,
Thomas Munro
Attachments:
skip-locked-data-v5.patchtext/x-patch; charset=US-ASCII; name=skip-locked-data-v5.patchDownload+614-72
Hello
As a simple example for people wondering what the point of this
feature is, I created a table work (id, data, status)
and then create 10,000 items with status 'NEW' and then started
a number of worker threads that did the following pair of
transactions, with and without SKIP LOCKED DATA on the end of the
SELECT statement, until all rows were deleted:
BEGIN
SELECT id, data FROM work WHERE status = 'NEW' LIMIT 1 FOR UPDATE
-- if no rows returned, then finish
UPDATE work SET status = 'WORK' WHERE id = $id
COMMIT
BEGIN
DELETE FROM work WHERE id = $id
COMMIT
Here are the times taken to process all items, in elapsed seconds, on
a slow laptop (i5 4 core with an SSD, with default postgresql.conf
except for checkpoint_segments set to 300):
| Threads | default | SKIP |
| 1 | 26.78 | 27.02 |
| 2 | 23.46 | 22.00 |
| 3 | 22.02 | 14.83 |
| 4 | 22.59 | 11.16 |
| 5 | 22.37 | 9.05 |
| 6 | 22.55 | 7.66 |
| 7 | 22.46 | 6.69 |
| 8 | 22.57 | 8.39 |
| 9 | 22.40 | 8.38 |
| 10 | 22.38 | 7.93 |
| 11 | 22.43 | 6.86 |
| 12 | 22.34 | 6.77 |
I am not experienced at benchmarking and I don't claim that this
particular workload or configuration is particularly sensible or
representative of anything but it might give some idea of the
motivation.
Best regards,
Thomas Munro
On Sat, May 17, 2014 at 1:02 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
We have a long tradition of trying to allow noise keywords where it's
harmless.So the clause should probably be
SKIP LOCKED [DATA]
in much the same way we have
BEGIN [ WORK | TRANSACTION ] ...
There won't be any ambiguity there.
We've had some problems in the past where allowing optional noise
words resulted in grammar conflicts that made future features harder
to add. See previous discussions about LOCK TABLE, wherein we almost
went to the extreme of adding a completely separate ACQUIRE LOCK
command. A lot of these things seem harmless when you first do them,
and then later they seem less harmless.
Anyway, +1 for the general idea of this feature. It's come up a
number of times on this mailing list, and we've had customer requests
for it, too.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
On Sat, May 17, 2014 at 1:02 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
We have a long tradition of trying to allow noise keywords where it's
harmless.So the clause should probably be
SKIP LOCKED [DATA]
in much the same way we have
BEGIN [ WORK | TRANSACTION ] ...
There won't be any ambiguity there.
We've had some problems in the past where allowing optional noise
words resulted in grammar conflicts that made future features harder
to add.
In this particular case, I'd be worried about whether we'd not end up
having to fully reserve DATA in order to allow it to be optional here.
That would be necessary if this clause could be followed immediately
by an identifier, either now or in the future. That would be a mighty
high price to pay for failing to make up our minds about which syntax
to use. (How many tables out there do you think have "data" as a column
name?)
A different concern is that this patch adds not one but two new unreserved
keywords, ie SKIP and LOCKED. That bloats our parser tables, which are
too darn large already, and it has a nonzero compatibility cost (since
we only allow AS-less column aliases when they are no keyword at all).
If we're pulling syntax out of the air it'd be nice if we could avoid
adding new keywords to the grammar.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 23 May 2014 15:40, Tom Lane <tgl@sss.pgh.pa.us> wrote:
A different concern is that this patch adds not one but two new unreserved
keywords, ie SKIP and LOCKED. That bloats our parser tables, which are
too darn large already, and it has a nonzero compatibility cost (since
we only allow AS-less column aliases when they are no keyword at all).
If we're pulling syntax out of the air it'd be nice if we could avoid
adding new keywords to the grammar.
How about some of these combinations of existing words:
EXCLUDE LOCK
NOWAIT EXCLUDE
NOWAIT NEXT
NOWAIT FOLLOWING
NOWAIT DISCARD
Of those I think I prefer NOWAIT EXCLUDE (perhaps with NOWAIT ABORT as a
long version of the existing NOWAIT behaviour for contrast).
Or adding just one new keyword:
NOWAIT SKIP
SKIP LOCK
Regards,
Thomas Munro
On 23 May 2014 10:40, Tom Lane <tgl@sss.pgh.pa.us> wrote:
If we're pulling syntax out of the air it'd be nice if we could avoid
adding new keywords to the grammar.
Oracle, SQLServer and DB2 have this capability. MySQL does not.
SQLServer implements that using the table hint of READPAST. Since that
whole syntax area is radically different to what we have, it isn't
easy to maintain code compatibility.
DB2 z/OS 10 provides SKIP LOCKED DATA clause to allow moving past
already locked rows. That's fairly recent and I don't believe there
will be many programs using that. DB2 UDB supports some complex
functionality using DB2_SKIPINSERTED, DB2_EVALUNCOMMITTED and
DB2_SKIPDELETED, all of which is complex and mostly exists for
benchmarks, AFAICS.
Oracle uses both SKIP LOCKED and NOWAIT.
PostgreSQL already chose to follow the Oracle syntax when we
implemented NOWAIT. So my proposal is that we follow the Oracle syntax
again and use the words SKIP LOCKED.
I don't see any advantage in inventing new syntax that leaves us
incompatible with Oracle, nor do I see any need to be compatible with
both Oracle and DB2 since the latter is much less likely to gain us
anything in practice.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
2014-05-23 21:24 GMT+02:00 Simon Riggs <simon@2ndquadrant.com>:
On 23 May 2014 10:40, Tom Lane <tgl@sss.pgh.pa.us> wrote:
If we're pulling syntax out of the air it'd be nice if we could avoid
adding new keywords to the grammar.Oracle, SQLServer and DB2 have this capability. MySQL does not.
SQLServer implements that using the table hint of READPAST. Since that
whole syntax area is radically different to what we have, it isn't
easy to maintain code compatibility.DB2 z/OS 10 provides SKIP LOCKED DATA clause to allow moving past
already locked rows. That's fairly recent and I don't believe there
will be many programs using that. DB2 UDB supports some complex
functionality using DB2_SKIPINSERTED, DB2_EVALUNCOMMITTED and
DB2_SKIPDELETED, all of which is complex and mostly exists for
benchmarks, AFAICS.Oracle uses both SKIP LOCKED and NOWAIT.
PostgreSQL already chose to follow the Oracle syntax when we
implemented NOWAIT. So my proposal is that we follow the Oracle syntax
again and use the words SKIP LOCKED.I don't see any advantage in inventing new syntax that leaves us
incompatible with Oracle, nor do I see any need to be compatible with
both Oracle and DB2 since the latter is much less likely to gain us
anything in practice.
+1
Pavel
Show quoted text
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, May 23, 2014 at 3:24 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
PostgreSQL already chose to follow the Oracle syntax when we
implemented NOWAIT. So my proposal is that we follow the Oracle syntax
again and use the words SKIP LOCKED.I don't see any advantage in inventing new syntax that leaves us
incompatible with Oracle, nor do I see any need to be compatible with
both Oracle and DB2 since the latter is much less likely to gain us
anything in practice.
+1.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 23 May 2014 21:18, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, May 23, 2014 at 3:24 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
PostgreSQL already chose to follow the Oracle syntax when we
implemented NOWAIT. So my proposal is that we follow the Oracle syntax
again and use the words SKIP LOCKED.I don't see any advantage in inventing new syntax that leaves us
incompatible with Oracle, nor do I see any need to be compatible with
both Oracle and DB2 since the latter is much less likely to gain us
anything in practice.+1.
Hello
Please find attached a rebased version of my SKIP LOCKED
patch (formerly SKIP LOCKED DATA), updated to support only the
Oracle-like syntax.
I posted a small test program here:
https://github.com/macdice/skip-locked-tests
Would anyone who is interested in a SKIP LOCKED feature and
attending the CHAR(14)/PGDay UK conference next week be
interested in a birds-of-a-feather discussion?
Best regards,
Thomas Munro
Attachments:
skip-locked-v6.patchtext/x-patch; charset=US-ASCII; name=skip-locked-v6.patchDownload+945-78
On 29 June 2014 10:01, Thomas Munro <munro@ip9.org> wrote:
Would anyone who is interested in a SKIP LOCKED feature and
attending the CHAR(14)/PGDay UK conference next week be
interested in a birds-of-a-feather discussion?
Sounds like a plan. I'll check my schedule.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sun, Jun 29, 2014 at 9:01 PM, Thomas Munro <munro@ip9.org> wrote:
Please find attached a rebased version of my SKIP LOCKED
patch (formerly SKIP LOCKED DATA), updated to support only the
Oracle-like syntax.
Hi Thomas,
Apologies for taking this long to get to reviewing this, I'd gotten a bit
side tracked with my own patches during this commitfest.
I'm really glad to see this patch is back again. I think it will be very
useful for processing queues. I could have made good use of it in my last
work, using it for sending unsent emails which were "queued" up in a table
in the database.
I've so far read over the patch and done only some brief tests of the
functionality.
Here's what I've picked up on so far:
* In heap_lock_tuple() there's a few places where you test the wait_policy,
but you're not always doing this in the same order. The previous code did
if (nolock) first each time, but since there's now 3 values I think if
(wait_policy == LockWaitError) should be first each time as likely this is
the most common case.
* The following small whitespace change can be removed in gram.y:
@@ -119,7 +119,6 @@ typedef struct PrivTarget
#define CAS_NOT_VALID 0x10
#define CAS_NO_INHERIT 0x20
-
#define parser_yyerror(msg) scanner_yyerror(msg, yyscanner)
#define parser_errposition(pos) scanner_errposition(pos, yyscanner)
* analyze.c. rc->waitPolicy = Max(rc->waitPolicy, waitPolicy);
I'm not quite sure I understand this yet, but I'm guessing the original
code was there so that something like:
SELECT * FROM (SELECT * FROM a FOR UPDATE) a FOR UPDATE NOWAIT;
Would give:
ERROR: could not obtain lock on row in relation "a"
So it seems that with the patch as you've defined in by the order of the
enum values in LockClauseWaitPolicy that SKIP LOCKED overrides FOR UPDATE.
I'm wondering if this is dangerous.
Should the following really skip locked tuples?
SELECT * FROM (SELECT * FROM a FOR UPDATE SKIP LOCKED) a FOR UPDATE LIMIT 1;
But on the other hand perhaps I've missed a discussion on this, if so then
I think the following comment should be updated to explain it all:
* We also consider that NOWAIT wins if it's specified both ways. This
* is a bit more debatable but raising an error doesn't seem helpful.
* (Consider for instance SELECT FOR UPDATE NOWAIT from a view that
* internally contains a plain FOR UPDATE spec.)
*
* plannodes.h -> RowWaitPolicy waitPolicy; /* NOWAIT and SKIP LOCKED DATA
options */
Should be "NOWAIT and SKIP LOCKED options" since DATA has now been removed
from the syntax.
* nodeLockRow.c has extra space in if condition: else if (erm->waitPolicy
== RWP_SKIP )
I'm also wondering about this block of code in general:
if (erm->waitPolicy == RWP_WAIT)
wait_policy = LockWaitBlock;
else if (erm->waitPolicy == RWP_SKIP )
wait_policy = LockWaitSkip;
else /* erm->waitPolicy == RWP_NOWAIT */
wait_policy = LockWaitError;
Just after this code heap_lock_tuple() is called, and if that
returns HeapTupleWouldBlock, the code does a goto lnext, which then will
repeat that whole block again. I'm wondering why there's 2 enums that are
for the same thing? if you just had 1 then you'd not need this block of
code at all, you could just pass erm->waitPolicy to heap_lock_tuple().
* parsenodes.h comment does not meet project standards (
http://www.postgresql.org/docs/devel/static/source-format.html)
typedef enum LockClauseWaitPolicy
{
/* order is important (see applyLockingClause which takes the greatest
value when several wait policies have been specified), and values must
match RowWaitPolicy from plannodes.h */
* parsenode.h remove "DATA" from LockClauseWaitPolicy waitPolicy; /* NOWAIT
and SKIP LOCKED DATA */
I have noticed the /* TODO -- work out what needs to be released here */
comments in head_lock_tuple(), and perhaps the lack of cleaning up is what
is causing the following:
create table skiptest (id int primary key); insert into skiptest (id)
select x.x from generate_series(1,1000000) x(x);
Session 1: begin work; select * from skiptest for update limit 999999;
Session 2: select * from skiptest for update skip locked limit 1;
WARNING: out of shared memory
ERROR: out of shared memory
HINT: You might need to increase max_locks_per_transaction.
Yet if I do:
session 1: begin work; select * from skiptest where id > 1 for update;
session 2: select * from skiptest for update skip locked limit 1;
id
----
1
(1 row)
That test makes me think your todo comments are in the right place,
something is missing there for sure!
* plays about with patch for a bit *
I don't quite understand how heap_lock_tuple works, as if I debug session 2
from the above set of queries the first call
to ConditionalLockTupleTuplock() (heapam.c line 4236) succeeds where I'd
have thought it would fail, since session 1 should be locking this tuple?
Later at line 4527 on the line if (!ConditionalXactLockTableWait(xwait)),
it fails to grab the lock and returns HeapTupleWouldBlock. The above query
seems to work ok if I just apply the following patch:
diff --git a/src/backend/access/heap/heapam.c
b/src/backend/access/heap/heapam.c
index b661d0e..b3e9dcc 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -4525,8 +4525,10 @@ l3:
else /* wait_policy == LockWaitSkip */
{
if
(!ConditionalXactLockTableWait(xwait))
- /* TODO -- work out what
needs to be released here */
+ {
+
UnlockTupleTuplock(relation, tid, mode);
return HeapTupleWouldBlock;
+ }
}
/* if there are updates, follow the update
chain */
Running the test query again, I get:
select * from skiptest for update skip locked limit 1;
id
---------
1000000
(1 row)
Would you also be able to supply a rebased patch with current master? The
applying the patch is a bit noisy and the isolation tests part does not
seem to apply at all now, so I've not managed to look at those yet.
I've still got more to look at, but hopefully this will get things moving
again.
Regards
David Rowley
On 23 July 2014 13:15, David Rowley <dgrowleyml@gmail.com> wrote:
* In heap_lock_tuple() there's a few places where you test the wait_policy,
but you're not always doing this in the same order. The previous code did if
(nolock) first each time, but since there's now 3 values I think if
(wait_policy == LockWaitError) should be first each time as likely this is
the most common case.
Ok, I have made the them consistent -- though I put LockWaitBlock
first as that is surely the most common case (plain old FOR UPDATE).
* The following small whitespace change can be removed in gram.y:
[snip]
Fixed.
* analyze.c. rc->waitPolicy = Max(rc->waitPolicy, waitPolicy);
I'm not quite sure I understand this yet, but I'm guessing the original code
was there so that something like:SELECT * FROM (SELECT * FROM a FOR UPDATE) a FOR UPDATE NOWAIT;
Would give:
ERROR: could not obtain lock on row in relation "a"
Surely only if another session already has one or more tuples locked?
So it seems that with the patch as you've defined in by the order of the
enum values in LockClauseWaitPolicy that SKIP LOCKED overrides FOR UPDATE.
I'm wondering if this is dangerous.
My understanding was that we have to choose a single policy for
attempts to lock rows in that relation, and if you've stated in at
least one place that you don't want to wait, then it makes sense for
NOWAIT to take precedence. I had to figure out how to fit SKIP LOCKED
into that hierarchy, and figured that NOWAIT should still be the
strongest policy, overriding all others, and SKIP LOCKED should
overrule default FOR UPDATE (ie blocking).
Should the following really skip locked tuples?
SELECT * FROM (SELECT * FROM a FOR UPDATE SKIP LOCKED) a FOR UPDATE LIMIT 1;
I agree that it's somewhat arbitrary... I suppose the two most
obvious options are either to reduce to a single policy using some
rule (such as the NOWAIT > SKIP LOCKED > default hierarchy I propose),
or simply reject such queries (which I'm guessing wouldn't fly at this
point since existing valid queries would become invalid).
But on the other hand perhaps I've missed a discussion on this, if so then I
think the following comment should be updated to explain it all:* We also consider that NOWAIT wins if it's specified both ways. This
* is a bit more debatable but raising an error doesn't seem helpful.
* (Consider for instance SELECT FOR UPDATE NOWAIT from a view that
* internally contains a plain FOR UPDATE spec.)
Modified to address SKIP LOCKED (there are also couple of words to the
same effect in select.sgml).
* plannodes.h -> RowWaitPolicy waitPolicy; /* NOWAIT and SKIP LOCKED DATA
options */
Should be "NOWAIT and SKIP LOCKED options" since DATA has now been removed
from the syntax.* nodeLockRow.c has extra space in if condition: else if (erm->waitPolicy ==
RWP_SKIP )
Fixed.
I'm also wondering about this block of code in general:
if (erm->waitPolicy == RWP_WAIT)
wait_policy = LockWaitBlock;
else if (erm->waitPolicy == RWP_SKIP )
wait_policy = LockWaitSkip;
else /* erm->waitPolicy == RWP_NOWAIT */
wait_policy = LockWaitError;Just after this code heap_lock_tuple() is called, and if that returns
HeapTupleWouldBlock, the code does a goto lnext, which then will repeat that
whole block again. I'm wondering why there's 2 enums that are for the same
thing? if you just had 1 then you'd not need this block of code at all, you
could just pass erm->waitPolicy to heap_lock_tuple().
True. Somewhere upthread I mentioned the difficulty I had deciding
how many enumerations were needed, for the various subsystems, ie
which headers and type they were allowed to share. Then I put off
working on this for so long that a nice example came along that showed
me the way: the lock strength enums LockTupleMode (heapam.h) and
RowMarkType (plannodes.h). The wait policy enums LockWaitPolicy
(heapam.h) and RowWaitPolicy (plannodes.h) mirror them closely, and
the same type of enumeration translation takes place in nodeLockRows.c
immediately above the code you pasted. I don't have any more
principled argument than monkey-see-monkey-do for that one...
* parsenodes.h comment does not meet project standards
(http://www.postgresql.org/docs/devel/static/source-format.html)typedef enum LockClauseWaitPolicy
{
/* order is important (see applyLockingClause which takes the greatest
value when several wait policies have been specified), and values must
match RowWaitPolicy from plannodes.h */
Fixed.
* parsenode.h remove "DATA" from LockClauseWaitPolicy waitPolicy; /* NOWAIT
and SKIP LOCKED DATA */
Fixed.
I have noticed the /* TODO -- work out what needs to be released here */
comments in head_lock_tuple(), and perhaps the lack of cleaning up is what
is causing the following:create table skiptest (id int primary key); insert into skiptest (id) select
x.x from generate_series(1,1000000) x(x);Session 1: begin work; select * from skiptest for update limit 999999;
Session 2: select * from skiptest for update skip locked limit 1;
WARNING: out of shared memory
ERROR: out of shared memory
HINT: You might need to increase max_locks_per_transaction.Yet if I do:
session 1: begin work; select * from skiptest where id > 1 for update;
session 2: select * from skiptest for update skip locked limit 1;
id
----
1
(1 row)That test makes me think your todo comments are in the right place,
something is missing there for sure!* plays about with patch for a bit *
I don't quite understand how heap_lock_tuple works, as if I debug session 2
from the above set of queries the first call to
ConditionalLockTupleTuplock() (heapam.c line 4236) succeeds where I'd have
thought it would fail, since session 1 should be locking this tuple? Later
at line 4527 on the line if (!ConditionalXactLockTableWait(xwait)), it fails
to grab the lock and returns HeapTupleWouldBlock. The above query seems to
work ok if I just apply the following patch:diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c index b661d0e..b3e9dcc 100644 --- a/src/backend/access/heap/heapam.c +++ b/src/backend/access/heap/heapam.c @@ -4525,8 +4525,10 @@ l3: else /* wait_policy == LockWaitSkip */ { if (!ConditionalXactLockTableWait(xwait)) - /* TODO -- work out what needs to be released here */ + { + UnlockTupleTuplock(relation, tid, mode); return HeapTupleWouldBlock; + } }/* if there are updates, follow the update
chain */Running the test query again, I get:
select * from skiptest for update skip locked limit 1;
id
---------
1000000
(1 row)
Ahh! Ok, take a look now, I included that change, but also made the
unlocking conditional on have_tuple_lock, in the second two cases
where HeapTupleWouldBlock is returned. In the first case it doesn't
look possible for it to be true.
Would you also be able to supply a rebased patch with current master? The
applying the patch is a bit noisy and the isolation tests part does not seem
to apply at all now, so I've not managed to look at those yet.
Done, and attached with the above changes.
I've still got more to look at, but hopefully this will get things moving
again.
Thank you very much for taking the time to look at this!
Best regards,
Thomas Munro
Attachments:
skip-locked-v7.patchtext/x-patch; charset=US-ASCII; name=skip-locked-v7.patchDownload+963-87
On 24 July 2014 00:52, Thomas Munro <munro@ip9.org> wrote:
On 23 July 2014 13:15, David Rowley <dgrowleyml@gmail.com> wrote:
I'm also wondering about this block of code in general:
if (erm->waitPolicy == RWP_WAIT)
wait_policy = LockWaitBlock;
else if (erm->waitPolicy == RWP_SKIP )
wait_policy = LockWaitSkip;
else /* erm->waitPolicy == RWP_NOWAIT */
wait_policy = LockWaitError;Just after this code heap_lock_tuple() is called, and if that returns
HeapTupleWouldBlock, the code does a goto lnext, which then will repeat that
whole block again. I'm wondering why there's 2 enums that are for the same
thing? if you just had 1 then you'd not need this block of code at all, you
could just pass erm->waitPolicy to heap_lock_tuple().True. Somewhere upthread I mentioned the difficulty I had deciding
how many enumerations were needed, for the various subsystems, ie
which headers and type they were allowed to share. Then I put off
working on this for so long that a nice example came along that showed
me the way: the lock strength enums LockTupleMode (heapam.h) and
RowMarkType (plannodes.h). The wait policy enums LockWaitPolicy
(heapam.h) and RowWaitPolicy (plannodes.h) mirror them closely, and
the same type of enumeration translation takes place in nodeLockRows.c
immediately above the code you pasted. I don't have any more
principled argument than monkey-see-monkey-do for that one...
On reflection, I agree that this sucks, and I would like to unify the
three new enums in the current patch (see below for recap) into one
that can be passed between parser, planner, executor and heap access
manager code as I think you are suggesting. My only question is: in
which header should the enum be defined, that all those modules could
include?
Best regards,
Thomas Munro
Enumeration explosion recap:
* parsenode.h defines enum LockClauseWaitPolicy, which is used in
the structs LockClause and RowMarkClause, for use by the parser
code
* plannodes.h defines enum RowWaitPolicy, which is used in the
structs PlanRowMark and ExecRowMark, for use by the planner and
executor code (numbers coincide with LockClauseWaitPolicy)
* heapam.h defines enum LockWaitPolicy, which is used as a
parameter to heap_lock_tuple, for use by heap access code
The parser produces LockClauseWaitPolicy values. InitPlan
converts these to RowWaitPolicy values in execMain.c. Then
nodeLockRows.c converts RowWaitPolicy values to LockWaitPolicy
values (by if-then-else) so it can call heap_lock_tuple.
This roughly mirrors what happens to lock strength information.
The unpatched code simply passes a boolean 'nowait' flag around.
An earlier version of my patch passed a pair of booleans around.
Simon's independent patch[1]/messages/by-id/537610D5.3090405@2ndquadrant.com uses an int in the various node structs
and the heap_lock_tuple function, and in execNode.h it has macros to
give names to the values, and that is included by access/heapm.h.
[1]: /messages/by-id/537610D5.3090405@2ndquadrant.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers