SERIALIZABLE and INSERTs with multiple VALUES

Started by Jason Dusekover 9 years ago32 messagesgeneral
Jump to latest
#1Jason Dusek
jason.dusek@gmail.com

Hi All,

I notice the following oddity:

=# CREATE TABLE with_pk (i integer PRIMARY KEY);CREATE TABLE

=# BEGIN;BEGIN
=# INSERT INTO with_pk VALUES (1) ON CONFLICT DO NOTHING;INSERT 0 1
=# INSERT INTO with_pk VALUES (1) ON CONFLICT DO NOTHING;INSERT 0 0
=# END;COMMIT

=# BEGIN;BEGIN
=# INSERT INTO with_pk VALUES (2), (2) ON CONFLICT DO NOTHING;
ERROR: could not serialize access due to concurrent update
=# END;ROLLBACK

How are these two transactions different?

Kind Regards,

Jason Dusek

#2Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Jason Dusek (#1)
Re: SERIALIZABLE and INSERTs with multiple VALUES

On Tue, Oct 11, 2016 at 2:29 PM, Jason Dusek <jason.dusek@gmail.com> wrote:

I notice the following oddity:

=# CREATE TABLE with_pk (i integer PRIMARY KEY);
CREATE TABLE

=# BEGIN;
BEGIN
=# INSERT INTO with_pk VALUES (2), (2) ON CONFLICT DO NOTHING;
ERROR: could not serialize access due to concurrent update
=# END;
ROLLBACK

I don't see that on development HEAD. What version are you
running? What is your setting for default_transaction_isolation?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3Jason Dusek
jason.dusek@gmail.com
In reply to: Kevin Grittner (#2)
Re: SERIALIZABLE and INSERTs with multiple VALUES

SELECT version(),
(SELECT setting FROM pg_settings WHERE name =
'default_transaction_deferrable') AS default_transaction_deferrable,
(SELECT setting FROM pg_settings WHERE name =
'default_transaction_isolation') AS default_transaction_isolation;
─[ RECORD 1 ]──────────────────┬─────────────────────────────────────────────────────────────────────────────────────────────────────────────
version │ PostgreSQL 9.5.4 on
x86_64-apple-darwin15.6.0, compiled by Apple LLVM version 8.0.0
(clang-800.0.38), 64-bit
default_transaction_deferrable │ on
default_transaction_isolation │ serializable

On Tue, 11 Oct 2016 at 13:00 Kevin Grittner <kgrittn@gmail.com> wrote:

Show quoted text

On Tue, Oct 11, 2016 at 2:29 PM, Jason Dusek <jason.dusek@gmail.com>
wrote:

I notice the following oddity:

=# CREATE TABLE with_pk (i integer PRIMARY KEY);
CREATE TABLE

=# BEGIN;
BEGIN
=# INSERT INTO with_pk VALUES (2), (2) ON CONFLICT DO NOTHING;
ERROR: could not serialize access due to concurrent update
=# END;
ROLLBACK

I don't see that on development HEAD. What version are you
running? What is your setting for default_transaction_isolation?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#4Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Kevin Grittner (#2)
Re: SERIALIZABLE and INSERTs with multiple VALUES

Kevin Grittner wrote:

Sent: Tuesday, October 11, 2016 10:00 PM
To: Jason Dusek
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] SERIALIZABLE and INSERTs with multiple VALUES

On Tue, Oct 11, 2016 at 2:29 PM, Jason Dusek <jason.dusek@gmail.com> wrote:

I notice the following oddity:

=# CREATE TABLE with_pk (i integer PRIMARY KEY);
CREATE TABLE

=# BEGIN;
BEGIN
=# INSERT INTO with_pk VALUES (2), (2) ON CONFLICT DO NOTHING;
ERROR: could not serialize access due to concurrent update
=# END;
ROLLBACK

I don't see that on development HEAD. What version are you
running? What is your setting for default_transaction_isolation?

The subject says SERIALIZABLE, and I can see it on my 9.5.4 database:

test=> CREATE TABLE with_pk (i integer PRIMARY KEY);
CREATE TABLE
test=> START TRANSACTION ISOLATION LEVEL SERIALIZABLE;
START TRANSACTION
test=> INSERT INTO with_pk VALUES (2), (2) ON CONFLICT DO NOTHING;
ERROR: could not serialize access due to concurrent update

Yours,
Laurenz Albe

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#5Thomas Munro
thomas.munro@gmail.com
In reply to: Laurenz Albe (#4)
Re: SERIALIZABLE and INSERTs with multiple VALUES

On Wed, Oct 12, 2016 at 8:50 PM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:

Kevin Grittner wrote:

On Tue, Oct 11, 2016 at 2:29 PM, Jason Dusek <jason.dusek@gmail.com> wrote:

I notice the following oddity:

=# CREATE TABLE with_pk (i integer PRIMARY KEY);
CREATE TABLE

=# BEGIN;
BEGIN
=# INSERT INTO with_pk VALUES (2), (2) ON CONFLICT DO NOTHING;
ERROR: could not serialize access due to concurrent update
=# END;
ROLLBACK

I don't see that on development HEAD. What version are you
running? What is your setting for default_transaction_isolation?

The subject says SERIALIZABLE, and I can see it on my 9.5.4 database:

test=> CREATE TABLE with_pk (i integer PRIMARY KEY);
CREATE TABLE
test=> START TRANSACTION ISOLATION LEVEL SERIALIZABLE;
START TRANSACTION
test=> INSERT INTO with_pk VALUES (2), (2) ON CONFLICT DO NOTHING;
ERROR: could not serialize access due to concurrent update

This happens in both SERIALIZABLE and REPEATABLE READ when a single
command inserts conflicting rows with an ON CONFLICT cause, and it
comes from the check in ExecCheckHeapTupleVisible whose comment says:

/*
* ExecCheckHeapTupleVisible -- verify heap tuple is visible
*
* It would not be consistent with guarantees of the higher isolation levels to
* proceed with avoiding insertion (taking speculative insertion's alternative
* path) on the basis of another tuple that is not visible to MVCC snapshot.
* Check for the need to raise a serialization failure, and do so as necessary.
*/

So it seems to be working as designed. Perhaps someone could argue
that you should make an exception for tuples inserted by the current
command.

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#6Vitaly Burovoy
vitaly.burovoy@gmail.com
In reply to: Thomas Munro (#5)
Re: SERIALIZABLE and INSERTs with multiple VALUES

On 10/12/16, Thomas Munro <thomas.munro@enterprisedb.com> wrote:

On Wed, Oct 12, 2016 at 8:50 PM, Albe Laurenz <laurenz.albe@wien.gv.at>
wrote:

Kevin Grittner wrote:

On Tue, Oct 11, 2016 at 2:29 PM, Jason Dusek <jason.dusek@gmail.com>
wrote:

I notice the following oddity:

=# CREATE TABLE with_pk (i integer PRIMARY KEY);
CREATE TABLE

=# BEGIN;
BEGIN
=# INSERT INTO with_pk VALUES (2), (2) ON CONFLICT DO NOTHING;
ERROR: could not serialize access due to concurrent update
=# END;
ROLLBACK

I don't see that on development HEAD. What version are you
running? What is your setting for default_transaction_isolation?

The subject says SERIALIZABLE, and I can see it on my 9.5.4 database:

test=> CREATE TABLE with_pk (i integer PRIMARY KEY);
CREATE TABLE
test=> START TRANSACTION ISOLATION LEVEL SERIALIZABLE;
START TRANSACTION
test=> INSERT INTO with_pk VALUES (2), (2) ON CONFLICT DO NOTHING;
ERROR: could not serialize access due to concurrent update

This happens in both SERIALIZABLE and REPEATABLE READ when a single
command inserts conflicting rows with an ON CONFLICT cause, and it
comes from the check in ExecCheckHeapTupleVisible whose comment says:

/*
* ExecCheckHeapTupleVisible -- verify heap tuple is visible
*
* It would not be consistent with guarantees of the higher isolation levels
to
* proceed with avoiding insertion (taking speculative insertion's
alternative
* path) on the basis of another tuple that is not visible to MVCC
snapshot.
* Check for the need to raise a serialization failure, and do so as
necessary.
*/

So it seems to be working as designed. Perhaps someone could argue
that you should make an exception for tuples inserted by the current
command.

I disagree. It is designed to prevent updating a tuple which was
updated in a parallel transaction which has been just committed.
This case is a little bit different: this tuple has been inserted in
the same transaction in the same command.
I think it is an obvious bug because there is no "concurrent update".
There is no update at all.

ON CONFLICT handling just does not cover all possible ways which can happen.
Normally (without "ON CONFLICT" clause) INSERT raises "duplicate key
value violates unique constraint" and doesn't run to
"ExecCheckHeapTupleVisible" check.
The "ExecInsert" handles constraint checks but not later checks like
ExecCheckHeapTupleVisible.

--
Best regards,
Vitaly Burovoy

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#7Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Laurenz Albe (#4)
Re: SERIALIZABLE and INSERTs with multiple VALUES

On Wed, Oct 12, 2016 at 2:50 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:

Kevin Grittner wrote:

I don't see that on development HEAD. What version are you
running? What is your setting for default_transaction_isolation?

The subject says SERIALIZABLE, and I can see it on my 9.5.4 database:

Oh, I see -- it doesn't happen if the row already exists at the
start of the transaction, which it does if you run the OP's entire
sample. If you skip the transaction in the middle of his sample,
the error occurs.

test=> CREATE TABLE with_pk (i integer PRIMARY KEY);
CREATE TABLE
test=> START TRANSACTION ISOLATION LEVEL SERIALIZABLE;
START TRANSACTION
test=> INSERT INTO with_pk VALUES (2), (2) ON CONFLICT DO NOTHING;
ERROR: could not serialize access due to concurrent update

Or that, as a nice, self-contained test case. :-)

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#8Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Vitaly Burovoy (#6)
Re: SERIALIZABLE and INSERTs with multiple VALUES

[adding Peter Geoghegan to the email addresses]

On Wed, Oct 12, 2016 at 7:11 AM, Vitaly Burovoy <vitaly.burovoy@gmail.com>
wrote:

On 10/12/16, Thomas Munro <thomas.munro@enterprisedb.com> wrote:

This happens in both SERIALIZABLE and REPEATABLE READ when a single
command inserts conflicting rows with an ON CONFLICT cause, and it
comes from the check in ExecCheckHeapTupleVisible whose comment says:

/*
* ExecCheckHeapTupleVisible -- verify heap tuple is visible
*
* It would not be consistent with guarantees of the higher isolation

levels to

* proceed with avoiding insertion (taking speculative insertion's

alternative

* path) on the basis of another tuple that is not visible to MVCC

snapshot.

* Check for the need to raise a serialization failure, and do so as

necessary.

*/

So it seems to be working as designed. Perhaps someone could argue
that you should make an exception for tuples inserted by the current
command.

I disagree. It is designed to prevent updating a tuple which was
updated in a parallel transaction which has been just committed.
This case is a little bit different: this tuple has been inserted in
the same transaction in the same command.
I think it is an obvious bug because there is no "concurrent update".
There is no update at all.

Yeah, the concept of a serialization failure is that the
transaction would have succeeded except for the actions of a
concurrent transaction. It is supposed to indicate that a retry
has a chance to succeed, because the conflicting transaction is
likely to have completed. To generate a "serialization failure"
(or, IMO, any error with a SQLSTATE in class "40") from the actions
of a single transaction is completely wrong, and should be
considered a bug.

ON CONFLICT handling just does not cover all possible ways which can

happen.

Normally (without "ON CONFLICT" clause) INSERT raises "duplicate key
value violates unique constraint" and doesn't run to
"ExecCheckHeapTupleVisible" check.
The "ExecInsert" handles constraint checks but not later checks like
ExecCheckHeapTupleVisible.

The test in ExecCheckHeapTupleVisible() seems wrong to me. It's
not immediately obvious what the proper fix is. Peter, do you have
any ideas on this?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#9Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Kevin Grittner (#8)
Re: SERIALIZABLE and INSERTs with multiple VALUES

On Wed, Oct 12, 2016 at 10:06 AM, Kevin Grittner <kgrittn@gmail.com> wrote:

The test in ExecCheckHeapTupleVisible() seems wrong to me. It's
not immediately obvious what the proper fix is.

To identify what cases ExecCheckHeapTupleVisible() was meant to
cover I commented out the body of the function to see which
regression tests failed. None did. The failures shown on this
thread are fixed by doing so. If there is really a need for this
function, speak up now and provide a test case showing what is
broken without it; otherwise if I can't find some justification for
this function I will rip it (and the calls to it) out of the code.
If you do have some test case showing what breaks without the
function, let's get it added to the regression tests!

I'm currently running `make check-world` with TAP tests enabled,
just in case there is some test there which demonstrates the need
for this. It seems unlikely that such a test would be under the
TAP tests, but I'm making sure...

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

In reply to: Kevin Grittner (#9)
Re: SERIALIZABLE and INSERTs with multiple VALUES

On Wed, Oct 12, 2016 at 12:45 PM, Kevin Grittner <kgrittn@gmail.com> wrote:

The test in ExecCheckHeapTupleVisible() seems wrong to me. It's
not immediately obvious what the proper fix is.

(prune old e-mail address from cc list)

I agree that the multi-value case is a bug.

To identify what cases ExecCheckHeapTupleVisible() was meant to
cover I commented out the body of the function to see which
regression tests failed. None did. The failures shown on this
thread are fixed by doing so. If there is really a need for this
function, speak up now and provide a test case showing what is
broken without it; otherwise if I can't find some justification for
this function I will rip it (and the calls to it) out of the code.
If you do have some test case showing what breaks without the
function, let's get it added to the regression tests!

I am independently working on a bug to fix to
ExecCheckHeapTupleVisible(), concerning the fact that
HeapTupleSatisfiesMVCC() can be called without even a shared buffer
lock held (only a buffer pin) [1]/messages/by-id/57EE93C8.8080504@postgrespro.ru -- Peter Geoghegan. So, anything that we do here should
probably take care of that, while we're at it.

I think that it should be pretty obvious to you why the check exists
at all, Kevin. It exists because it would be improper to decide to
take the DO NOTHING path on the basis of some other committed tuple
existing that is not visible to the original MVCC snapshot, at higher
isolation levels. There is a similar consideration for DO UPDATE. I'm
slightly surprised that you're contemplating just ripping the check
out. Did I miss something?

[1]: /messages/by-id/57EE93C8.8080504@postgrespro.ru -- Peter Geoghegan
--
Peter Geoghegan

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#11Thomas Munro
thomas.munro@gmail.com
In reply to: Kevin Grittner (#9)
Re: SERIALIZABLE and INSERTs with multiple VALUES

On Thu, Oct 13, 2016 at 8:45 AM, Kevin Grittner <kgrittn@gmail.com> wrote:

On Wed, Oct 12, 2016 at 10:06 AM, Kevin Grittner <kgrittn@gmail.com> wrote:

The test in ExecCheckHeapTupleVisible() seems wrong to me. It's
not immediately obvious what the proper fix is.

To identify what cases ExecCheckHeapTupleVisible() was meant to
cover I commented out the body of the function to see which
regression tests failed. None did. The failures shown on this
thread are fixed by doing so. If there is really a need for this
function, speak up now and provide a test case showing what is
broken without it; otherwise if I can't find some justification for
this function I will rip it (and the calls to it) out of the code.
If you do have some test case showing what breaks without the
function, let's get it added to the regression tests!

Here's an isolation test that shows the distinction between a
transaction that reports a serialization failure because it crashed
into its own invisible tuples, and one that reports a serialization
failure because it crashed into a concurrent transaction's invisible
tuples. Surely Peter intended the latter to report an error, but the
former seems like an oversight.

Here's a patch that shows one way to fix it. I think it does make
sense to change this, because otherwise automatic
retry-on-serialization-failure strategies will be befuddle by this
doomed transaction. And as you and Vitaly have said, there is
literally no concurrent update.

--
Thomas Munro
http://www.enterprisedb.com

Attachments:

isolation-test.patchapplication/octet-stream; name=isolation-test.patchDownload+64-0
check-self-inserted.patchapplication/octet-stream; name=check-self-inserted.patchDownload+15-6
In reply to: Thomas Munro (#11)
Re: SERIALIZABLE and INSERTs with multiple VALUES

On Wed, Oct 12, 2016 at 1:05 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

Here's a patch that shows one way to fix it. I think it does make
sense to change this, because otherwise automatic
retry-on-serialization-failure strategies will be befuddle by this
doomed transaction. And as you and Vitaly have said, there is
literally no concurrent update.

I think that you have the right idea, but we still need to fix that
buffer lock bug I mentioned...

--
Peter Geoghegan

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#13Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Peter Geoghegan (#12)
Re: SERIALIZABLE and INSERTs with multiple VALUES

On Wed, Oct 12, 2016 at 3:07 PM, Peter Geoghegan <pg@bowt.ie> wrote:

On Wed, Oct 12, 2016 at 1:05 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

Here's a patch that shows one way to fix it. I think it does make
sense to change this, because otherwise automatic
retry-on-serialization-failure strategies will be befuddle by this
doomed transaction. And as you and Vitaly have said, there is
literally no concurrent update.

I think that you have the right idea, but we still need to fix that
buffer lock bug I mentioned...

Aren't these two completely separate and independent bugs?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

In reply to: Kevin Grittner (#13)
Re: SERIALIZABLE and INSERTs with multiple VALUES

On Wed, Oct 12, 2016 at 1:41 PM, Kevin Grittner <kgrittn@gmail.com> wrote:

Aren't these two completely separate and independent bugs?

Technically they are, but they are both isolated to the same small
function. Surely it's better to fix them both at once?

--
Peter Geoghegan

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#15Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Peter Geoghegan (#10)
Re: SERIALIZABLE and INSERTs with multiple VALUES

On Wed, Oct 12, 2016 at 3:02 PM, Peter Geoghegan <pg@bowt.ie> wrote:

I agree that the multi-value case is a bug.

I think that it should be pretty obvious to you why the check exists
at all, Kevin. It exists because it would be improper to decide to
take the DO NOTHING path on the basis of some other committed tuple
existing that is not visible to the original MVCC snapshot, at higher
isolation levels.

That's only true if it causes a cycle in the apparent order of
execution. If we rip out the check what we have is behavior
completely consistent with the other transaction executing first;
in other words, it creates a read-write dependency (a/k/a
rw-conflict) from the current transaction to the concurrent
transaction which succeeds in its insert. That may or may not
cause a cycle, depending on what else is happening.

Now, generating a write conflict serialization failure will prevent
serialization anomalies, but unless I'm missing something it is a
cruder test than is required, and will generate false positives in
simple cases like Thomas created. What I think would be more
appropriate is to record the dependency and test for a
serialization failure.

There is a similar consideration for DO UPDATE.

But in DO UPDATE the current transaction is writing something that
the other transaction attempted to read, so that *does* create
write skew and its attendant anomalies. There I think we *do* need
to throw a serialization failure error.

I'm slightly surprised that you're contemplating just ripping the check out.

The lack of any regression tests to demonstrate the failure the
code is preventing puts the burden on others to figure it out fresh
each time, which had me a little miffed. Note that you are
mis-quoting me a bit -- I said "if I can't find some justification
for this function I will rip it (and the calls to it) out of the
code." I was still looking. I don't think the write conflict
justifies it, but the rw-conflict and attendant risk of a cycle in
apparent order of execution does.

If the "proper" fix is impossible (or just too freaking ugly) we
might fall back on the fix Thomas suggested, but I would like to
take advantage of the "special properties" of the INSERT/ON
CONFLICT DO NOTHING code to avoid false positives where we can.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#16Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Peter Geoghegan (#14)
Re: SERIALIZABLE and INSERTs with multiple VALUES

On Wed, Oct 12, 2016 at 3:55 PM, Peter Geoghegan <pg@bowt.ie> wrote:

On Wed, Oct 12, 2016 at 1:41 PM, Kevin Grittner <kgrittn@gmail.com> wrote:

Aren't these two completely separate and independent bugs?

Technically they are, but they are both isolated to the same small
function. Surely it's better to fix them both at once?

If the code is hopelessly intertwined, maybe. Generally it is
better to fix two independent bugs with two independent patches.
It documents things better and make each fix easier to understand.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

In reply to: Kevin Grittner (#15)
Re: SERIALIZABLE and INSERTs with multiple VALUES

On Wed, Oct 12, 2016 at 2:06 PM, Kevin Grittner <kgrittn@gmail.com> wrote:

If the "proper" fix is impossible (or just too freaking ugly) we
might fall back on the fix Thomas suggested, but I would like to
take advantage of the "special properties" of the INSERT/ON
CONFLICT DO NOTHING code to avoid false positives where we can.

Do you intend to propose a patch to do that?

--
Peter Geoghegan

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#18Thomas Munro
thomas.munro@gmail.com
In reply to: Kevin Grittner (#15)
Re: SERIALIZABLE and INSERTs with multiple VALUES

On Thu, Oct 13, 2016 at 10:06 AM, Kevin Grittner <kgrittn@gmail.com> wrote:

On Wed, Oct 12, 2016 at 3:02 PM, Peter Geoghegan <pg@bowt.ie> wrote:

I agree that the multi-value case is a bug.

I think that it should be pretty obvious to you why the check exists
at all, Kevin. It exists because it would be improper to decide to
take the DO NOTHING path on the basis of some other committed tuple
existing that is not visible to the original MVCC snapshot, at higher
isolation levels.

That's only true if it causes a cycle in the apparent order of
execution. If we rip out the check what we have is behavior
completely consistent with the other transaction executing first;
in other words, it creates a read-write dependency (a/k/a
rw-conflict) from the current transaction to the concurrent
transaction which succeeds in its insert. That may or may not
cause a cycle, depending on what else is happening.

The "higher isolation levels" probably shouldn't be treated the same way.

I think Peter's right about REPEATABLE READ. We should definitely
raise the error immediately as we do in that level, because our RR
(SI) doesn't care about write skew and all that stuff, it just
promises that you can only see data in your snapshot. We can't allow
you to take a different course of action based on data that your
snapshot can't see, so the only reasonable thing to do is abandon
ship.

But yeah, the existing code raises false positive serialization
failures under SERIALIZABLE, and that's visible in the isolation test
I posted: there is actually a serial order of those transactions with
the same result.

When working on commit fcff8a57 I became suspicious of the way ON
CONFLICT interacts with SSI, as I mentioned in passing back then[1]/messages/by-id/CAEepm=2kYCegxp9qMR5TM1X3oXHj16iYzLPj_go52R2R07EvnA@mail.gmail.com,
thinking mainly of false negatives. I failed to find a
non-serializable schedule involving ON CONFLICT that was allowed to
run, though I didn't spend much time on it. One thing that worries
me is the final permutation of read-write-unique-4.spec, which
produces an arguably spurious UCV, that is, a transaction that doesn't
commit but raises a UCV instead of the serialization failure you might
expect. The ON CONFLICT equivalent might be a transaction that takes
the ON CONFLICT path and then commits, even though it should be
considered non-serializable. I would really like to understand that
case better, and until then I wouldn't bet my boots that it isn't
possible to commit anomalies using ON CONFLICT under SERIALIZABLE
without Peter's check (or even with it), despite the fact that it
reaches predicate locking code via heap_fetch etc.

[1]: /messages/by-id/CAEepm=2kYCegxp9qMR5TM1X3oXHj16iYzLPj_go52R2R07EvnA@mail.gmail.com

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

In reply to: Thomas Munro (#18)
Re: SERIALIZABLE and INSERTs with multiple VALUES

On Wed, Oct 12, 2016 at 6:06 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

But yeah, the existing code raises false positive serialization
failures under SERIALIZABLE, and that's visible in the isolation test
I posted: there is actually a serial order of those transactions with
the same result.

I was under the impression that false positives of this kind are
allowed by SSI. Why focus on this false positive scenario in
particular?

--
Peter Geoghegan

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#20Thomas Munro
thomas.munro@gmail.com
In reply to: Peter Geoghegan (#19)
Re: SERIALIZABLE and INSERTs with multiple VALUES

On Thu, Oct 13, 2016 at 2:32 PM, Peter Geoghegan <pg@bowt.ie> wrote:

On Wed, Oct 12, 2016 at 6:06 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

But yeah, the existing code raises false positive serialization
failures under SERIALIZABLE, and that's visible in the isolation test
I posted: there is actually a serial order of those transactions with
the same result.

I was under the impression that false positives of this kind are
allowed by SSI. Why focus on this false positive scenario in
particular?

Sure, they're allowed. Of course the ones caused by your own command
are questionable because there is no concurrent transaction and
retrying the transaction will never succeed, as discussed, but it
seems we all agree on that. The question is just whether INSERT ...
ON CONFLICT should generate more false positives than plain old
INSERT. Two overlapping conflicting plain old INSERTs without any
other entanglement to create a cycle will result in one succeeding and
the other getting a UCV, as if one ran after the other with no
overlap. It would be nice if the ON CONFLICT case used the same
smarts to take the ON CONFLICT path, unless there is some theoretical
problem I'm overlooking. Otherwise concurrency is reduced.

I wonder if we should fix the same-command problem reported by the OP,
and then study larger questions of ON CONFLICT/SERIALIZABLE
interaction as a separate project. I may be imagining problems where
there are none...

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#21Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Peter Geoghegan (#17)
#22Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Thomas Munro (#18)
#23Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Peter Geoghegan (#19)
In reply to: Kevin Grittner (#23)
#25Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Peter Geoghegan (#24)
#26Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Kevin Grittner (#25)
#27Thomas Munro
thomas.munro@gmail.com
In reply to: Kevin Grittner (#22)
#28Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Thomas Munro (#27)
#29Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kevin Grittner (#28)
#30Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Thomas Munro (#27)
In reply to: Kevin Grittner (#30)
#32Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Peter Geoghegan (#31)