On conflict update & hint bits

Started by Konstantin Knizhnik over 9 years ago · 15 messages
#1Konstantin Knizhnik
k.knizhnik@postgrespro.ru

Hi,

I am faced with a rarely reproduced problem in our multimaster (it never
occurs with vanilla Postgres).
We are using our own customized transaction manager, so the problem may
well be in our multimaster.
But the stack trace looks suspicious, and that is why I want to consult
people familiar with this code about whether it is a bug in
ExecOnConflictUpdate or not.

Briefly: ExecOnConflictUpdate tries to set a hint bit without holding a
lock on the buffer, and so gets an assertion failure in MarkBufferDirtyHint.

The stack trace:

#0 0x00007fe3b940acc9 in __GI_raise (sig=sig@entry=6) at
../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007fe3b940e0d8 in __GI_abort () at abort.c:89
#2 0x000000000097b996 in ExceptionalCondition (conditionName=0xb4d970
"!(LWLockHeldByMe(((LWLock*) (&(bufHdr)->content_lock))))",
errorType=0xb4d2e9 "FailedAssertion",
fileName=0xb4d2e0 "bufmgr.c", lineNumber=3380) at assert.c:54
#3 0x00000000007e365b in MarkBufferDirtyHint (buffer=946, buffer_std=1
'\001') at bufmgr.c:3380
#4 0x00000000009c3660 in SetHintBits (tuple=0x7fe396a9d858, buffer=946,
infomask=256, xid=1398) at tqual.c:136
#5 0x00000000009c5194 in HeapTupleSatisfiesMVCC (htup=0x7ffc00169030,
snapshot=0x2e79778, buffer=946) at tqual.c:1065
#6 0x00000000006ace83 in ExecCheckHeapTupleVisible (estate=0x2e81ae8,
tuple=0x7ffc00169030, buffer=946) at nodeModifyTable.c:197
#7 0x00000000006ae343 in ExecOnConflictUpdate (mtstate=0x2e81d50,
resultRelInfo=0x2e81c38, conflictTid=0x7ffc001690c0, planSlot=0x2e82428,
excludedSlot=0x2e82428, estate=0x2e81ae8,
canSetTag=1 '\001', returning=0x7ffc001690c8) at nodeModifyTable.c:1173
#8 0x00000000006ad256 in ExecInsert (mtstate=0x2e81d50, slot=0x2e82428,
planSlot=0x2e82428, arbiterIndexes=0x2e7eeb0,
onconflict=ONCONFLICT_UPDATE, estate=0x2e81ae8, canSetTag=1 '\001')
at nodeModifyTable.c:395
#9 0x00000000006aebe3 in ExecModifyTable (node=0x2e81d50) at
nodeModifyTable.c:1496

In ExecOnConflictUpdate, the buffer is pinned but not locked:

/*
* Lock tuple for update. Don't follow updates when tuple cannot be
* locked without doing so. A row locking conflict here means our
* previous conclusion that the tuple is conclusively committed is not
* true anymore.
*/
tuple.t_self = *conflictTid;
test = heap_lock_tuple(relation, &tuple, estate->es_output_cid,
lockmode, LockWaitBlock, false, &buffer,
&hufd);

heap_lock_tuple pins the buffer but does not lock it:
* *buffer: set to buffer holding tuple (pinned but not locked at exit)

Later we try to check tuple visibility:

ExecCheckHeapTupleVisible(estate, &tuple, buffer);

and inside HeapTupleSatisfiesMVCC we try to set a hint bit.

MarkBufferDirtyHint assumes that the buffer is locked:
* 2. The caller might have only share-lock instead of exclusive-lock on the
* buffer's content lock.

and we get an assertion failure at:

/* here, either share or exclusive lock is OK */
Assert(LWLockHeldByMe(BufferDescriptorGetContentLock(bufHdr)));

So the question is: is it correct that ExecOnConflictUpdate tries to
access and update the tuple without holding a lock on the buffer?

Thanks in advance,

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2Peter Geoghegan
pg@heroku.com
In reply to: Konstantin Knizhnik (#1)
Re: On conflict update & hint bits

On Fri, Sep 30, 2016 at 5:33 PM, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:

Later we try to check tuple visibility:

ExecCheckHeapTupleVisible(estate, &tuple, buffer);

and inside HeapTupleSatisfiesMVCC try to set hint bit.

So, you're using repeatable read or serializable isolation level?

--
Peter Geoghegan


#3Konstantin Knizhnik
k.knizhnik@postgrespro.ru
In reply to: Peter Geoghegan (#2)
Re: On conflict update & hint bits

On 30.09.2016 19:37, Peter Geoghegan wrote:

On Fri, Sep 30, 2016 at 5:33 PM, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:

Later we try to check tuple visibility:

ExecCheckHeapTupleVisible(estate, &tuple, buffer);

and inside HeapTupleSatisfiesMVCC try to set hint bit.

So, you're using repeatable read or serializable isolation level?

Repeatable read.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


#4Peter Geoghegan
pg@heroku.com
In reply to: Konstantin Knizhnik (#1)
Re: On conflict update & hint bits

On Fri, Sep 30, 2016 at 5:33 PM, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:

So the question is whether it is correct that ExecOnConflictUpdate tries to
access and update tuple without holding lock on the buffer?

You're right -- this is a bug in Postgres.

I'm travelling from Ireland to the USA this weekend, but will work on
this early next week. I don't think it's a particularly tricky fix --
as you say, it is necessary to have at least a shared buffer lock to
call HeapTupleSatisfiesVisibility(), and we quite simply fail to
ensure that. We must have a shared buffer lock in the visibility-check
path for ON CONFLICT DO UPDATE where isolation level > READ COMMITTED
-- a buffer pin is not enough.

It also looks like the DO NOTHING variant is similarly affected, even
when the isolation level is READ COMMITTED. :-(

Thanks for the detailed report.

--
Peter Geoghegan


#5Peter Geoghegan
pg@heroku.com
In reply to: Peter Geoghegan (#4)
Re: On conflict update & hint bits

On Sat, Oct 1, 2016 at 1:15 PM, Peter Geoghegan <pg@heroku.com> wrote:

It also looks like the DO NOTHING variant is similarly affected, even
when the isolation level is READ COMMITTED. :-(

Actually, the DO NOTHING variant is also only affected in isolation
levels greater than READ COMMITTED. Not sure why I thought otherwise.

--
Peter Geoghegan


#6Peter Geoghegan
pg@heroku.com
In reply to: Peter Geoghegan (#4)
1 attachment(s)
Re: On conflict update & hint bits

On Sat, Oct 1, 2016 at 5:15 AM, Peter Geoghegan <pg@heroku.com> wrote:

On Fri, Sep 30, 2016 at 5:33 PM, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:

So the question is whether it is correct that ExecOnConflictUpdate tries to
access and update tuple without holding lock on the buffer?

You're right -- this is a bug in Postgres.

(Thomas Munro and Kevin Grittner added to CC list.)

Attached is a revision of Thomas' patch to fix a related issue; it now
also fixes this no-buffer-lock-held issue. He posted his version of
this to a -general mailing list thread a week ago [1]. This was a
thread about spurious serialization failure with ON CONFLICT DO
NOTHING. I understand that Kevin is still not happy with the behavior
under SSI even with our fix, since serialization failures will still
occur more often than necessary (see other thread for details of what
I'm talking about).

I believe this patch to be a strict improvement on master, and I
suggest it be applied in advance of the deadline to get fixes in for
the next set of point releases. Note that this revision includes
Thomas' isolation test, which is completely unchanged. I still intend
to work with Kevin to do better than this under SSI, though that will
probably be for a future point release. The behavior proposed by my
fix here is the right thing for REPEATABLE READ isolation level, which
has nothing to do with Kevin's proposed more careful handling under
SSI. Besides, the buffer pin bug reported by Konstantin on this thread
really should be addressed ASAP.

[1]: /messages/by-id/CAEepm=3Ra9NgDHocDBtB4iiB7MWdavQybNS3F47SvKh1Mk-mFQ@mail.gmail.com
--
Peter Geoghegan

Attachments:

0001-Fix-ON-CONFLICT-bugs-at-higher-isolation-levels.patch (text/x-patch; charset=US-ASCII)
From 648de70f226831af04e1d31329118c325161da0b Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Wed, 19 Oct 2016 13:59:20 -0700
Subject: [PATCH] Fix ON CONFLICT bugs at higher isolation levels.

When used at higher isolation levels, ON CONFLICT DO NOTHING could raise
spurious serialization failure errors.  This happened when duplicate
values were proposed for insertion within a single ON CONFLICT DO
NOTHING command. (Similar ON CONFLICT DO UPDATE cases typically raised
valid cardinality violation errors instead.)

Separately, though in the same code path, insufficient care was taken
when a visibility check was performed on some existing, conflicting
tuple.  At least a shared buffer lock is required on any buffer
containing a tuple whose visibility is checked through tqual.c routines,
which may set hint bits (a buffer pin is therefore insufficient).

To fix the first issue, check if tuple was created by current
transaction as a further condition of raising a serialization failure
in relevant test.  To fix the second issue, outsource visibility test to
preexisting heap_fetch() call.

Patch by Thomas Munro and Peter Geoghegan.  First bug reported by Jason
Dusek.  Second bug reported by Konstantin Knizhnik.

Report: <CAO3NbwOycQjt2Oqy2VW-eLTq2M5uGMyHnGm=RNga4mjqcYD7gQ@mail.gmail.com>
Report: <57EE93C8.8080504@postgrespro.ru>
Backpatch-through: 9.5
---
 src/backend/executor/nodeModifyTable.c             | 41 ++++++++++------------
 .../expected/insert-conflict-do-nothing-2.out      | 29 +++++++++++++++
 src/test/isolation/isolation_schedule              |  1 +
 .../specs/insert-conflict-do-nothing-2.spec        | 34 ++++++++++++++++++
 4 files changed, 82 insertions(+), 23 deletions(-)
 create mode 100644 src/test/isolation/expected/insert-conflict-do-nothing-2.out
 create mode 100644 src/test/isolation/specs/insert-conflict-do-nothing-2.spec

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index af7b26c..3c4f458 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -179,7 +179,7 @@ ExecProcessReturning(ResultRelInfo *resultRelInfo,
 }
 
 /*
- * ExecCheckHeapTupleVisible -- verify heap tuple is visible
+ * ExecCheckTIDVisible -- fetch tuple, and verify its visibility
  *
  * It would not be consistent with guarantees of the higher isolation levels to
  * proceed with avoiding insertion (taking speculative insertion's alternative
@@ -187,23 +187,6 @@ ExecProcessReturning(ResultRelInfo *resultRelInfo,
  * Check for the need to raise a serialization failure, and do so as necessary.
  */
 static void
-ExecCheckHeapTupleVisible(EState *estate,
-						  HeapTuple tuple,
-						  Buffer buffer)
-{
-	if (!IsolationUsesXactSnapshot())
-		return;
-
-	if (!HeapTupleSatisfiesVisibility(tuple, estate->es_snapshot, buffer))
-		ereport(ERROR,
-				(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
-			 errmsg("could not serialize access due to concurrent update")));
-}
-
-/*
- * ExecCheckTIDVisible -- convenience variant of ExecCheckHeapTupleVisible()
- */
-static void
 ExecCheckTIDVisible(EState *estate,
 					ResultRelInfo *relinfo,
 					ItemPointer tid)
@@ -212,14 +195,26 @@ ExecCheckTIDVisible(EState *estate,
 	Buffer		buffer;
 	HeapTupleData tuple;
 
-	/* Redundantly check isolation level */
 	if (!IsolationUsesXactSnapshot())
 		return;
 
 	tuple.t_self = *tid;
-	if (!heap_fetch(rel, SnapshotAny, &tuple, &buffer, false, NULL))
-		elog(ERROR, "failed to fetch conflicting tuple for ON CONFLICT");
-	ExecCheckHeapTupleVisible(estate, &tuple, buffer);
+	if (!heap_fetch(rel, estate->es_snapshot, &tuple, &buffer, true, NULL))
+	{
+		/*
+		 * Must not raise serialization failure when multiple values are
+		 * proposed for insertion by the same command with duplication among
+		 * values (this is at least possible with ON CONFLICT DO NOTHING, where
+		 * cardinality violation errors are never raised).  Avoid this with
+		 * final check that determines if tuple was inserted by another
+		 * transaction.
+		 */
+		if (!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tuple.t_data)))
+			ereport(ERROR,
+					(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+				 errmsg("could not serialize access due to concurrent update")));
+	}
+
 	ReleaseBuffer(buffer);
 }
 
@@ -1170,7 +1165,7 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
 	 * snapshot.  This is in line with the way UPDATE deals with newer tuple
 	 * versions.
 	 */
-	ExecCheckHeapTupleVisible(estate, &tuple, buffer);
+	ExecCheckTIDVisible(estate, resultRelInfo, &tuple.t_self);
 
 	/* Store target's existing tuple in the state's dedicated slot */
 	ExecStoreTuple(&tuple, mtstate->mt_existing, buffer, false);
diff --git a/src/test/isolation/expected/insert-conflict-do-nothing-2.out b/src/test/isolation/expected/insert-conflict-do-nothing-2.out
new file mode 100644
index 0000000..b379ab1
--- /dev/null
+++ b/src/test/isolation/expected/insert-conflict-do-nothing-2.out
@@ -0,0 +1,29 @@
+Parsed test spec with 2 sessions
+
+starting permutation: donothing1 c1 donothing2 c2
+step donothing1: INSERT INTO ints(key, val) VALUES(1, 'donothing1') ON CONFLICT DO NOTHING;
+step c1: COMMIT;
+step donothing2: INSERT INTO ints(key, val) VALUES(1, 'donothing2'), (1, 'donothing2') ON CONFLICT DO NOTHING;
+step c2: COMMIT;
+
+starting permutation: donothing2 c2 donothing1 c1
+step donothing2: INSERT INTO ints(key, val) VALUES(1, 'donothing2'), (1, 'donothing2') ON CONFLICT DO NOTHING;
+step c2: COMMIT;
+step donothing1: INSERT INTO ints(key, val) VALUES(1, 'donothing1') ON CONFLICT DO NOTHING;
+step c1: COMMIT;
+
+starting permutation: donothing1 donothing2 c1 c2
+step donothing1: INSERT INTO ints(key, val) VALUES(1, 'donothing1') ON CONFLICT DO NOTHING;
+step donothing2: INSERT INTO ints(key, val) VALUES(1, 'donothing2'), (1, 'donothing2') ON CONFLICT DO NOTHING; <waiting ...>
+step c1: COMMIT;
+step donothing2: <... completed>
+error in steps c1 donothing2: ERROR:  could not serialize access due to concurrent update
+step c2: COMMIT;
+
+starting permutation: donothing2 donothing1 c2 c1
+step donothing2: INSERT INTO ints(key, val) VALUES(1, 'donothing2'), (1, 'donothing2') ON CONFLICT DO NOTHING;
+step donothing1: INSERT INTO ints(key, val) VALUES(1, 'donothing1') ON CONFLICT DO NOTHING; <waiting ...>
+step c2: COMMIT;
+step donothing1: <... completed>
+error in steps c2 donothing1: ERROR:  could not serialize access due to concurrent update
+step c1: COMMIT;
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index a96a318..2606a27 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -25,6 +25,7 @@ test: eval-plan-qual
 test: lock-update-delete
 test: lock-update-traversal
 test: insert-conflict-do-nothing
+test: insert-conflict-do-nothing-2
 test: insert-conflict-do-update
 test: insert-conflict-do-update-2
 test: insert-conflict-do-update-3
diff --git a/src/test/isolation/specs/insert-conflict-do-nothing-2.spec b/src/test/isolation/specs/insert-conflict-do-nothing-2.spec
new file mode 100644
index 0000000..bbe92f7
--- /dev/null
+++ b/src/test/isolation/specs/insert-conflict-do-nothing-2.spec
@@ -0,0 +1,34 @@
+# INSERT...ON CONFLICT DO NOTHING test with multiple rows in higher isolation
+
+setup
+{
+  CREATE TABLE ints (key int primary key, val text);
+}
+
+teardown
+{
+  DROP TABLE ints;
+}
+
+session "s1"
+setup
+{
+  BEGIN ISOLATION LEVEL REPEATABLE READ;
+}
+step "donothing1" { INSERT INTO ints(key, val) VALUES(1, 'donothing1') ON CONFLICT DO NOTHING; }
+step "c1" { COMMIT; }
+step "a1" { ABORT; }
+
+session "s2"
+setup
+{
+  BEGIN ISOLATION LEVEL REPEATABLE READ;
+}
+step "donothing2" { INSERT INTO ints(key, val) VALUES(1, 'donothing2'), (1, 'donothing2') ON CONFLICT DO NOTHING; }
+step "c2" { COMMIT; }
+step "a2" { ABORT; }
+
+permutation "donothing1" "c1" "donothing2" "c2"
+permutation "donothing2" "c2" "donothing1" "c1"
+permutation "donothing1" "donothing2" "c1" "c2"
+permutation "donothing2" "donothing1" "c2" "c1"
-- 
2.7.4

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Geoghegan (#6)
Re: On conflict update & hint bits

Peter Geoghegan <pg@heroku.com> writes:

Attached is a revision of Thomas' patch to fix a related issue; it now
also fixes this no-buffer-lock-held issue.

I think this needs more work.

1. Why are you replacing ExecOnConflictUpdate's ExecCheckHeapTupleVisible
call with ExecCheckTIDVisible? That appears to demand a re-fetch of the
tuple ExecOnConflictUpdate already has its hands on. Wouldn't it be
better to just put a buffer lock/unlock around the existing code?

2. Your proposed coding

+	if (!heap_fetch(rel, estate->es_snapshot, &tuple, &buffer, true, NULL))
+ ...
+		if (!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tuple.t_data)))

will dump core in the case where heap_fetch returns false with
tuple.t_data set to null. If that case is impossible here,
it's not apparent why, and it'd surely deserve at least a comment,
maybe an Assert. But I'd rather just assume it's possible.

I'm not taking a position on whether this is OK at the higher level
of whether the SSI behavior could be better. However, unless Kevin
has objections to committing something like this now, I think we
should fix the above-mentioned problems and push it. The existing
behavior is clearly bad.

regards, tom lane


#8Peter Geoghegan
pg@heroku.com
In reply to: Tom Lane (#7)
Re: On conflict update & hint bits

On Sat, Oct 22, 2016 at 9:38 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

1. Why are you replacing ExecOnConflictUpdate's ExecCheckHeapTupleVisible
call with ExecCheckTIDVisible? That appears to demand a re-fetch of the
tuple ExecOnConflictUpdate already has its hands on. Wouldn't it be
better to just put a buffer lock/unlock around the existing code?

I thought that that was less than idiomatic within nodeModifyTable.c
-- executor modules rarely directly acquire buffer locks. More
importantly, while the lighter weight ExecCheckHeapTupleVisible()
variant seemed to make sense when there was no buffer lock
acquisition, now that it's clear that a buffer lock acquisition is
necessary, I am skeptical. I now doubt that the added complexity pays
for itself, which is why I proposed to remove
ExecCheckHeapTupleVisible(). Besides, ON CONFLICT seems like a feature
that is significantly less compelling at higher isolation levels; that
must be why it took this long to hear a problem report like
Konstantin's.

2. Your proposed coding

+       if (!heap_fetch(rel, estate->es_snapshot, &tuple, &buffer, true, NULL))
+ ...
+               if (!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tuple.t_data)))

will dump core in the case where heap_fetch returns false with
tuple.t_data set to null. If that case is impossible here,
it's not apparent why, and it'd surely deserve at least a comment,
maybe an Assert. But I'd rather just assume it's possible.

You could say the same thing about at least one other existing place
that more or less assumes a conflictTid heap_fetch() or similar cannot
allow that to happen, I think. Maybe this is just as good a place to
talk about it as any other, though. I defer to you.

(It's safe because our snapshot is a kind of interlock against vacuum
killing the conflictTid tuple.)

I'm not taking a position on whether this is OK at the higher level
of whether the SSI behavior could be better. However, unless Kevin
has objections to committing something like this now, I think we
should fix the above-mentioned problems and push it. The existing
behavior is clearly bad.

I don't have any strong feelings on it myself just yet; I trust that
Kevin's judgement that we should do better under SSI is sound. I
should have mentioned that Kevin encouraged me to do this much first
in my description of the patch. I'm pretty confident that he will have
no objection to doing something like this for now.

--
Peter Geoghegan


#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Geoghegan (#8)
Re: On conflict update & hint bits

Peter Geoghegan <pg@heroku.com> writes:

On Sat, Oct 22, 2016 at 9:38 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

1. Why are you replacing ExecOnConflictUpdate's ExecCheckHeapTupleVisible
call with ExecCheckTIDVisible? That appears to demand a re-fetch of the
tuple ExecOnConflictUpdate already has its hands on. Wouldn't it be
better to just put a buffer lock/unlock around the existing code?

I thought that that was less than idiomatic within nodeModifyTable.c
-- executor modules rarely directly acquire buffer locks.

"Rarely" is not "never". A bigger problem though is that heap_fetch
does more than just lock the buffer: there are also PredicateLockTuple
and CheckForSerializableConflictOut calls in there. It's possible that
those are no-ops in this usage (because after all we already fetched
the tuple once), or maybe they're even desirable because they would help
resolve Kevin's concerns. But it hasn't been analyzed and so I'm not
prepared to risk a behavioral change in this already under-tested area
the day before a release wrap.

AFAICT, it's very hard to get to an actual failure from the lack of
buffer lock here. You would need to have a situation where the tuple
was not hintable when originally fetched but has become so by the time
ON CONFLICT is rechecking it. That's possible, eg if we're using
async commit and the previous transaction's commit record has gotten
flushed to disk in between, but it's not likely. Even then, no actual
problem would ensue (in non-assert builds) from setting a hint bit without
buffer lock, except in very hard-to-hit race conditions. So the buffer
lock issue doesn't seem urgent enough to me to justify making a fix under
time pressure.

The business about not throwing a serialization failure due to actions
of our own xact does seem worth fixing now, but if I understand correctly,
we can deal with that by adding a test for xmin-is-our-own-xact into
the existing code structure. I propose doing that much (and adding
Munro's regression test case) and calling it good for today.

regards, tom lane


#10Peter Geoghegan
pg@heroku.com
In reply to: Tom Lane (#9)
Re: On conflict update & hint bits

On Sun, Oct 23, 2016 at 1:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

"Rarely" is not "never". A bigger problem though is that heap_fetch
does more than just lock the buffer: there are also PredicateLockTuple
and CheckForSerializableConflictOut calls in there. It's possible that
those are no-ops in this usage (because after all we already fetched
the tuple once), or maybe they're even desirable because they would help
resolve Kevin's concerns. But it hasn't been analyzed and so I'm not
prepared to risk a behavioral change in this already under-tested area
the day before a release wrap.

The heap_fetch() contract doesn't ask its callers to consider any of
that. Besides, those actions (PredicateLockTuple +
CheckForSerializableConflictOut) are very probably redundant no-ops as
you say. (They won't help with Kevin's additional concerns, BTW,
because Kevin is concerned about excessive serialization failures with
SSI.)

AFAICT, it's very hard to get to an actual failure from the lack of
buffer lock here. You would need to have a situation where the tuple
was not hintable when originally fetched but has become so by the time
ON CONFLICT is rechecking it. That's possible, eg if we're using
async commit and the previous transaction's commit record has gotten
flushed to disk in between, but it's not likely. Even then, no actual
problem would ensue (in non-assert builds) from setting a hint bit without
buffer lock, except in very hard-to-hit race conditions. So the buffer
lock issue doesn't seem urgent enough to me to justify making a fix under
time pressure.

The business about not throwing a serialization failure due to actions
of our own xact does seem worth fixing now, but if I understand correctly,
we can deal with that by adding a test for xmin-is-our-own-xact into
the existing code structure. I propose doing that much (and adding
Munro's regression test case) and calling it good for today.

I'm surprised at how you've assessed the risk here. I think that the
risk of not proceeding with fixing the buffer lock issue is greater
than the risk of breaking something else with the proposed fix. Even
if the former risk isn't such a big one.

If there are a non-trivial number of users that are particularly
attached to the precise behavior of higher isolation levels in master
(assuming it would be altered by the proposed fix), then it's
surprising that it took so many months for someone to complain about
the ON CONFLICT DO NOTHING behavior being blatantly broken.

--
Peter Geoghegan


#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Geoghegan (#10)
Re: On conflict update & hint bits

Peter Geoghegan <pg@heroku.com> writes:

On Sun, Oct 23, 2016 at 1:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

"Rarely" is not "never". A bigger problem though is that heap_fetch
does more than just lock the buffer: there are also PredicateLockTuple
and CheckForSerializableConflictOut calls in there. It's possible that
those are no-ops in this usage (because after all we already fetched
the tuple once), or maybe they're even desirable because they would help
resolve Kevin's concerns. But it hasn't been analyzed and so I'm not
prepared to risk a behavioral change in this already under-tested area
the day before a release wrap.

I'm surprised at how you've assessed the risk here.

What's bothering me is (a) it's less than 24 hours to release wrap and
(b) this patch changes SSI-relevant behavior and hasn't been approved
by Kevin. I'm not familiar enough with that logic to commit a change
in it on my own authority, especially with so little time for problems
to be uncovered.

I'm okay with adding an explicit buffer lock/unlock pair, and in fact
plan to go have a look at that in a bit. I'm not okay with doing a
refactoring that might change the behavior in more ways than that
under these circumstances.

regards, tom lane


#12Peter Geoghegan
pg@heroku.com
In reply to: Tom Lane (#11)
Re: On conflict update & hint bits

On Sun, Oct 23, 2016 at 2:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

What's bothering me is (a) it's less than 24 hours to release wrap and
(b) this patch changes SSI-relevant behavior and hasn't been approved
by Kevin. I'm not familiar enough with that logic to commit a change
in it on my own authority, especially with so little time for problems
to be uncovered.

I should point out that I found out that the next set of point releases
had been moved forward much later than you did. I had to work on this fix
during the week, which was pretty far from ideal for me for my own
reasons.

I'm okay with adding an explicit buffer lock/unlock pair, and in fact
plan to go have a look at that in a bit. I'm not okay with doing a
refactoring that might change the behavior in more ways than that
under these circumstances.

Fair enough. As long as we do that much, I'm comfortable.

--
Peter Geoghegan


#13Konstantin Knizhnik
k.knizhnik@postgrespro.ru
In reply to: Peter Geoghegan (#12)
Re: On conflict update & hint bits

On 24.10.2016 00:49, Peter Geoghegan wrote:

On Sun, Oct 23, 2016 at 2:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

What's bothering me is (a) it's less than 24 hours to release wrap and
(b) this patch changes SSI-relevant behavior and hasn't been approved
by Kevin. I'm not familiar enough with that logic to commit a change
in it on my own authority, especially with so little time for problems
to be uncovered.

I should point out that I knew that the next set of point releases had
been moved forward much later than you did. I had to work on this fix
during the week, which was pretty far from ideal for me for my own
reasons.

Just for information: I know that you are working on this issue, but
since we need to proceed further with our testing of multimaster,
I have made the following obvious changes, and they fix the problem (at
least this assertion failure does not happen any more):

src/backend/executor/nodeModifyTable.c

@@ -1087,6 +1087,13 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
      test = heap_lock_tuple(relation, &tuple, estate->es_output_cid,
                             lockmode, LockWaitBlock, false, &buffer,
                             &hufd);
+    /*
+     * We must hold share lock on the buffer content while examining tuple
+     * visibility.  Afterwards, however, the tuples we have found to be
+     * visible are guaranteed good as long as we hold the buffer pin.
+     */
+    LockBuffer(buffer, BUFFER_LOCK_SHARE);
+
      switch (test)
      {
          case HeapTupleMayBeUpdated:
@@ -1142,6 +1149,7 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
               * loop here, as the new version of the row might not conflict
              * anymore, or the conflicting tuple has actually been deleted.
               */
+            LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
              ReleaseBuffer(buffer);
              return false;

@@ -1175,6 +1183,8 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
/* Store target's existing tuple in the state's dedicated slot */
ExecStoreTuple(&tuple, mtstate->mt_existing, buffer, false);

+    LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+
      /*
       * Make tuple and any needed join variables available to ExecQual and
       * ExecProject.  The EXCLUDED tuple is installed in ecxt_innertuple, while

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Konstantin Knizhnik (#13)
Re: On conflict update & hint bits

Konstantin Knizhnik <k.knizhnik@postgrespro.ru> writes:

Just for information: I know that you are working on this issue, but as
far as we need to proceed further with our testing of multimaster,
I have done the following obvious changes and it fixes the problem (at
least this assertion failure is not happen any more):

This seems kind of the hard way --- why didn't you put the buffer lock
calls into ExecCheckHeapTupleVisible?

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=8f1fb7d621b0e6bd2eb0ba2ac9634c5b5a03564b

regards, tom lane


#15Kevin Grittner
kgrittn@gmail.com
In reply to: Tom Lane (#9)
Re: On conflict update & hint bits

On Sun, Oct 23, 2016 at 3:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

The business about not throwing a serialization failure due to actions
of our own xact does seem worth fixing now, but if I understand correctly,
we can deal with that by adding a test for xmin-is-our-own-xact into
the existing code structure. I propose doing that much (and adding
Munro's regression test case) and calling it good for today.

Thanks. This is the only part of it that I consider an actual
*bug* (since you can retry the serialization failure forever and
never move on because there is no other transaction involved which
can finish to clear the problem) as opposed to an opportunity to
optimize (by reducing false positive serialization failures
actually involving other transactions). I never saw anything in
the literature addressing the intersection of UPSERT and SSI, and I
think we do need to think about and discuss this a bit more before
we can be sure of the best fix. This is probably not the thread on
which to have that discussion.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
