[OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation

Started by Ian Jackson over 9 years ago · 36 messages · pgsql-hackers
#1 Ian Jackson
ian.jackson@eu.citrix.com

Hi. This message is going to xen-devel (because that's where the
osstest project is) and to pgsql-hackers (because I hope they may be
able to advise about the scope of the PostgreSQL SERIALIZABLE
constraint problem).

In summary: PostgreSQL only provides transaction serialisability for
successful transactions. Even with SERIALIZABLE, transactions may
fail due to spurious and "impossible" constraint violations.

As a result, I need to make osstest retry transactions not only on
explicitly reported serialisation failures and deadlocks, but also on
integrity violations.
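The retry condition described above can be sketched as a small SQLSTATE classifier. This is a hypothetical helper for illustration, not osstest code; the list of states mirrors the match in the patch that follows:

```python
# Hypothetical helper, not part of osstest: decide from a PostgreSQL
# SQLSTATE whether a failed SERIALIZABLE transaction is worth retrying.
# 40P01 = deadlock detected, 40001 = serialization failure,
# 40002 = transaction integrity constraint violation, plus any state
# in class 23 (integrity constraint violation), since those can be
# raised spuriously under concurrent SERIALIZABLE transactions.

def should_retry(sqlstate: str) -> bool:
    """True if the failure may be a transient artifact of concurrency."""
    return sqlstate.startswith(("40P01", "40001", "40002", "23"))
```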

It is not clear to me from the thread
WIP: Detecting SSI conflicts before reporting constraint violations
which was on pgsql-hackers earlier this year, whether it is only
unique constraint violations which may spuriously occur.

Can anyone from the PostgreSQL hacker community advise ? The existing
documentation patch just talks about "successful" transactions, and
uses unique constraints as an example. In principle this leaves open
the possibility that the transaction might fail with bizarre and
unpredictable error codes, although I hope this isn't possible.

It would be good to narrow the scope of the retries in my system as
much as possible.

I'm hoping to get an authoritative answer here but it seems that this
is a common problem to which there is still no definitive
solution. I would like there to be a definitive solution.

If I get a clear answer I'll submit a further docs patch to pgsql :-).

Thanks,
Ian.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2 Ian Jackson
ian.jackson@eu.citrix.com
In reply to: Ian Jackson (#1)
[OSSTEST PATCH 1/1] PostgreSQL db: Retry transactions on constraint failures

This is unfortunate but appears to be necessary.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: pgsql-hackers@postgresql.org
---
Osstest/JobDB/Executive.pm | 45 ++++++++++++++++++++++++++++++++++++++++++++-
tcl/JobDB-Executive.tcl | 6 ++++--
2 files changed, 48 insertions(+), 3 deletions(-)

diff --git a/Osstest/JobDB/Executive.pm b/Osstest/JobDB/Executive.pm
index 610549a..dc6d3c2 100644
--- a/Osstest/JobDB/Executive.pm
+++ b/Osstest/JobDB/Executive.pm
@@ -62,8 +62,51 @@ sub need_retry ($$$) {
     my ($jd, $dbh,$committing) = @_;
     return
 	($dbh_tests->err() // 0)==7 &&
-	($dbh_tests->state =~ m/^(?:40P01|40001)/);
+	($dbh_tests->state =~ m/^(?:40P01|40001|23|40002)/);
     # DEADLOCK DETECTED or SERIALIZATION FAILURE
+    # or any Integrity Constraint Violation including
+    # TRANSACTION_INTEGRITY_CONSTRAINT_VIOLATION.
+    #
+    # An Integrity Constraint Violation ought not to occur with
+    # serialisable transactions, so it is always a bug.  These bugs
+    # should not be retried.  However, there is a longstanding bug in
+    # PostgreSQL: SERIALIZABLE's guarantee of transaction
+    # serialisability only applies to successful transactions.
+    # Concurrent SERIALIZABLE transactions may generate "impossible"
+    # errors.  For example, doing a SELECT to ensure that a row does
+    # not exist, and then inserting it, may produce a unique
+    # constraint violation.
+    #
+    # I have not been able to find out clearly which error codes may
+    # be spuriously generated.  At the very least "23505
+    # UNIQUE_VIOLATION" is, but I'm not sure about others.  I am
+    # making the (hopefully not unwarranted) assumption that this is
+    # the only class of spurious errors.  (We don't have triggers.)
+    #
+    # The undesirable side effect is that a buggy transaction would be
+    # retried at intervals until the retry count is reached.  But
+    # there seems no way to avoid this.
+    #
+    # This bug may have been fixed in very recent PostgreSQL (although
+    # a better promise still seems absent from the documentation, at
+    # the time of writing in December 2016).  But we need to work with
+    # PostgreSQL back to at least 9.1.  Perhaps in the future we can
+    # make this behaviour conditional on the pgsql bug being fixed.
+    #
+    # References:
+    #
+    # "WIP: Detecting SSI conflicts before reporting constraint violations"
+    # January 2016 - April 2016 on pgsql-hackers
+    # https://www.postgresql.org/message-id/flat/CAEepm%3D2_9PxSqnjp%3D8uo1XthkDVyOU9SO3%2BOLAgo6LASpAd5Bw%40mail.gmail.com
+    # (includes patch for PostgreSQL and its documentation)
+    #
+    # BUG #9301: INSERT WHERE NOT EXISTS on table with UNIQUE constraint in concurrent SERIALIZABLE transactions
+    # 2014, pgsql-bugs
+    # https://www.postgresql.org/message-id/flat/3F697CF1-2BB7-40D4-9D20-919D1A5D6D93%40apple.com
+    #
+    # "Working around spurious unique constraint errors due to SERIALIZABLE bug"
+    # 2009, pgsql-general
+    # https://www.postgresql.org/message-id/flat/D960CB61B694CF459DCFB4B0128514C203937E44%40exadv11.host.magwien.gv.at
 }
 sub current_flight ($) { #method
diff --git a/tcl/JobDB-Executive.tcl b/tcl/JobDB-Executive.tcl
index 62c63af..6b9bcb0 100644
--- a/tcl/JobDB-Executive.tcl
+++ b/tcl/JobDB-Executive.tcl
@@ -365,8 +365,10 @@ proc transaction {tables script {autoreconnect 0}} {
 	if {$rc} {
 	    switch -glob $errorCode {
 		{OSSTEST-PSQL * 40P01} -
-		{OSSTEST-PSQL * 40001} {
-		    # DEADLOCK DETECTED or SERIALIZATION FAILURE
+		{OSSTEST-PSQL * 40001} -
+		{OSSTEST-PSQL * 23*}   -
+		{OSSTEST-PSQL * 40002} {
+		    # See Osstest/JobDB/Executive.pm:need_retry
 		    logputs stdout \
  "transaction serialisation failure ($errorCode) ($result) retrying ..."
 		    if {$dbopen} { db-execute ROLLBACK }
-- 
2.1.4

#3 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Ian Jackson (#1)
Re: [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation

On Fri, Dec 9, 2016 at 12:26 PM, Ian Jackson <ian.jackson@eu.citrix.com> wrote:

In summary: PostgreSQL only provides transaction serialisability for
successful transactions. Even with SERIALIZABLE, transactions may
fail due to spurious and "impossible" constraint violations.

As a result, I need to make osstest retry transactions not only on
explicitly reported serialisation failures and deadlocks, but also on
integrity violations.

It is not clear to me from the thread
WIP: Detecting SSI conflicts before reporting constraint violations
which was on pgsql-hackers earlier this year, whether it is only
unique constraint violations which may spuriously occur.

Can anyone from the PostgreSQL hacker community advise ? The existing
documentation patch just talks about "successful" transactions, and
uses unique constraints as an example. In principle this leaves open
the possibility that the transaction might fail with bizarre and
unpredictable error codes, although I hope this isn't possible.

It is, especially prior to 9.6.

It would be good to narrow the scope of the retries in my system as
much as possible.

If I recall correctly, the constraints for which there can be
errors appearing due to concurrent transactions are primary key,
unique, and foreign key constraints. I don't remember seeing it
happen, but it would not surprise me if an exclusion constraint can
also cause an error due to a concurrent transaction's interaction
with the transaction receiving the error.

The good news is that in 9.6 we were able to change many of these
to serialization failures if there was an explicit check for a
violation before creating the conflict. For example, if there is a
primary key (or unique constraint or unique index) and within a
transaction you SELECT to test for the presence of a particular
value (or set of values, in the case of a multi-column constraint
or index) and it is not found, an attempt to insert that runs into
an insert from a concurrent transaction will now generate a
serialization failure.

I'm hoping to get an authoritative answer here but it seems that this
is a common problem to which there is still not yet a definitive
solution. I would like there to be a definitive solution.

I'm afraid that the chance of getting some error other than a
serialization failure due to the actions of a concurrent
transaction (even when the failing transaction could not have
failed in the absence of that concurrent transaction and would be
likely to succeed on retry) has gone down in 9.6, but it has not
been eliminated. You are forced to choose between performing some
limited number of retries for constraint violations or presenting
users with error messages which "should not be possible".
Obviously, the former means you will sometimes retry when there is
no hope of the retry succeeding and give up on retries when a retry
is likely to work, and the latter can confuse users and waste their
time. :-(
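The first of those two options can be made concrete as a bounded retry loop. This is a sketch of the strategy being described, not code from the thread; the `txn` callable and `TransactionFailed` exception are hypothetical stand-ins for real database work:

```python
# Sketch of the "limited number of retries" approach: retry on
# serialization failures/deadlocks and, reluctantly, on constraint
# violations, but cap the attempts so a genuinely buggy transaction
# still surfaces its "impossible" error to the caller.

class TransactionFailed(Exception):
    """Stand-in for a database error carrying a SQLSTATE."""
    def __init__(self, sqlstate):
        super().__init__(sqlstate)
        self.sqlstate = sqlstate

def run_with_retries(txn, max_attempts=5):
    """Call txn() until it succeeds or retries are exhausted."""
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()
        except TransactionFailed as e:
            retryable = e.sqlstate.startswith(("40001", "40P01", "23"))
            if not retryable or attempt == max_attempts:
                raise
```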

If I get a clear answer I'll submit a further docs patch to pgsql :-).

Of course, code patches to improve the situation are also welcome! :-)

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#4 Ian Jackson
ian.jackson@eu.citrix.com
In reply to: Kevin Grittner (#3)
Re: [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation

Kevin Grittner writes ("Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation"):

If I recall correctly, the constraints for which there can be
errors appearing due to concurrent transactions are primary key,
unique, and foreign key constraints. I don't remember seeing it
happen, but it would not surprise me if an exclusion constraint can
also cause an error due to a concurrent transaction's interaction
with the transaction receiving the error.

Is it not in principle also possible to contrive a situation where
some set of (suitably weird) transactions will generate almost any
error, from the outer transaction ?

This is at the very least possible using pgsql's in-sql
exception-trapping facilities. Such a construction might, in
principle, generate any error which can be conditionally generated at
query runtime.

ISTM that depending on the implementation arrangements (which I
frankly don't understand at all) there may be other constructions
which would give "impossible" answers.

Actually, now I come to think of it, the fact that pgsql has an in-sql
exception trapping facility means that the current situation is
clearly an actual bug, in the sense that the behaviour is contrary to
the documentation. (And contrary to any reasonable thing that could
be written in the documentation.)

AIUI the documented behaviour is that "every set of successful
transactions is serialisable".

But, consider the following scenario.

Context:

CREATE OR REPLACE FUNCTION demo(nk TEXT, c INTEGER) RETURNS INTEGER AS $$
BEGIN
BEGIN
INSERT INTO t (k,v) VALUES (nk, -1);
EXCEPTION WHEN unique_violation THEN
INSERT INTO c (k,v) VALUES (nk, c);
END;
RETURN 0;
END;
$$ LANGUAGE plpgsql;

DROP TABLE IF EXISTS t;
DROP TABLE IF EXISTS c;

CREATE TABLE t (k TEXT PRIMARY KEY, v INTEGER NOT NULL);
CREATE TABLE c (k TEXT PRIMARY KEY, v INTEGER NOT NULL);

Two identical transactions:

BEGIN;
SELECT count(*) FROM t WHERE k=1; -- save value
-- sleep to ensure conflict
SELECT demo(1, ?); -- using value from SELECT
COMMIT;

I have just tried this and got this result:

osstestdb_test_iwj=> select * from t;
k | v
---+----
1 | -1
(1 row)

osstestdb_test_iwj=> select * from c;
k | v
---+---
1 | 0
(1 row)

osstestdb_test_iwj=>

The row ('1',0) in table c is clearly wrong. No rows with v=0 could
ever be inserted into c by this SQL code, unless the other
transaction is somehow interfering in the middle.

The perl program I used is below. `csreadconfig' does nothing of
particular interest except obtaining the db connection as $dbh_tests.

Ian.

#!/usr/bin/perl -w

use strict qw(vars);
use Osstest;
use Data::Dumper;
use Osstest::Executive;

csreadconfig();

if (!@ARGV) {
$dbh_tests->do(<<'END');
CREATE OR REPLACE FUNCTION demo(nk TEXT, c INTEGER) RETURNS INTEGER AS $$
BEGIN
BEGIN
INSERT INTO t (k,v) VALUES (nk, -1);
EXCEPTION WHEN unique_violation THEN
INSERT INTO c (k,v) VALUES (nk, c);
END;
RETURN 0;
END;
$$ LANGUAGE plpgsql;

DROP TABLE IF EXISTS t;
DROP TABLE IF EXISTS c;

CREATE TABLE t (k TEXT PRIMARY KEY, v INTEGER NOT NULL);
CREATE TABLE c (k TEXT PRIMARY KEY, v INTEGER NOT NULL);
END
exit 0;
}

our ($k,$v) = @ARGV;

$dbh_tests->begin_work;
$dbh_tests->do("SET TRANSACTION ISOLATION LEVEL SERIALIZABLE");

our ($conflictors)= $dbh_tests->selectrow_array(<<END, {}, $k);
SELECT count(*) FROM t WHERE k=?
END

print STDERR "conflictors=$conflictors\n";
sleep 5;
print STDERR "continuing...\n";

$dbh_tests->do(<<END, {}, $k, $conflictors);
SELECT demo(?,?);
END

$dbh_tests->commit;

#5 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Ian Jackson (#4)
Re: [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation

On Mon, Dec 12, 2016 at 8:45 AM, Ian Jackson <ian.jackson@eu.citrix.com> wrote:

AIUI the documented behaviour is that "every set of successful
transactions is serialisable".

Well, in context that is referring to serializable transactions.
No such guarantee is provided for other isolation levels.

By the way, existing behavior should be sufficient to prevent
serialization anomalies from becoming manifest in the database;
where it is less than ideal is that it is hard to tell from the
SQLSTATE on a failure whether a retry is sensible. It would be
nice to provide the additional functionality, but database is
performing as intended and (as far as I know) as documented. If
the documentation on this is not clear, I'd be happy to help get it
fixed, but barring any deficiency there, this is a feature request,
not a bug report.

But, consider the following scenario.

[example]

I have just tried this and got this result:

[nonsensical results]

I didn't. First, I got this when I tried to start the concurrent
transactions using the example as provided:

test=# SELECT count(*) FROM t WHERE k=1; -- save value
ERROR: operator does not exist: text = integer
LINE 1: SELECT count(*) FROM t WHERE k=1;
^
HINT: No operator matches the given name and argument type(s). You
might need to add explicit type casts.

That is as it should be. There is no equality comparison operator
supported for text on one side and integer on the other. There
would be no principled way to determine the correct result of
comparing '2' and '15' or of comparing '01' and '1'. It kinda
raises a question of what you are running that did *not* generate
this error. What version with what modifications are you running?

So, I modified it so that it *should* run, set
default_transaction_isolation = 'serializable' on both connections,
and got this:

*** CONNECTION 1 ***
test=# CREATE OR REPLACE FUNCTION demo(nk TEXT, c INTEGER) RETURNS INTEGER AS $$
test$# BEGIN
test$# BEGIN
test$# INSERT INTO t (k,v) VALUES (nk, -1);
test$# EXCEPTION WHEN unique_violation THEN
test$# INSERT INTO c (k,v) VALUES (nk, c);
test$# END;
test$# RETURN 0;
test$# END;
test$# $$ LANGUAGE plpgsql;
CREATE FUNCTION
test=#
test=# DROP TABLE IF EXISTS t;
DROP TABLE
test=# DROP TABLE IF EXISTS c;
DROP TABLE
test=#
test=# CREATE TABLE t (k TEXT PRIMARY KEY, v INTEGER NOT NULL);
CREATE TABLE
test=# CREATE TABLE c (k TEXT PRIMARY KEY, v INTEGER NOT NULL);
CREATE TABLE
test=#
test=# BEGIN;
BEGIN
test=# SELECT count(*) FROM t WHERE k = '1'; -- save value
count
-------
0
(1 row)

test=# -- sleep to ensure conflict

*** CONNECTION 2 ***
test=# BEGIN;
BEGIN
test=# SELECT count(*) FROM t WHERE k = '1'; -- save value
count
-------
0
(1 row)

test=# -- sleep to ensure conflict

*** CONNECTION 1 ***
test=# SELECT demo('1', 0); -- using value from SELECT
demo
------
0
(1 row)

test=#

*** CONNECTION 2 ***
test=# SELECT demo('1', 0); -- using value from SELECT
*** CONNECTION 2 blocks ***

*** CONNECTION 1 ***
test=# COMMIT;
COMMIT
test=#

*** CONNECTION 2 unblocks and outputs ***
ERROR: could not serialize access due to read/write dependencies
among transactions
DETAIL: Reason code: Canceled on identification as a pivot, during write.
HINT: The transaction might succeed if retried.
CONTEXT: SQL statement "INSERT INTO t (k,v) VALUES (nk, -1)"
PL/pgSQL function demo(text,integer) line 4 at SQL statement
test=#

As you can see, this generated a serialization failure. I decided
to do what an application should, and retry the transaction.

*** CONNECTION 2 ***
test=# ROLLBACK;
ROLLBACK
test=# BEGIN;
BEGIN
test=# SELECT count(*) FROM t WHERE k = '1'; -- save value
count
-------
1
(1 row)

test=# SELECT demo('1', 1); -- using value from SELECT
demo
------
0
(1 row)

test=# COMMIT;
COMMIT
test=# SELECT * FROM t;
k | v
---+----
1 | -1
(1 row)

test=# SELECT * FROM c;
k | v
---+---
1 | 1
(1 row)

test=#

If you have some way to cause a set of concurrent serializable
transactions to generate results from those transactions which
commit which is not consistent with some one-at-a-time order of
execution, I would be very interested in seeing the test case.
The above, however, is not it.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#6 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Kevin Grittner (#5)
Re: [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation

On Mon, Dec 12, 2016 at 12:32 PM, Kevin Grittner <kgrittn@gmail.com> wrote:

As you can see, this generated a serialization failure.

That was on 9.6. On earlier versions it does indeed allow the
transaction on connection 2 to commit, yielding a non-serializable
result. This makes a pretty strong case for back-patching this
commit:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=fcff8a575198478023ada8a48e13b50f70054766

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#7 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Kevin Grittner (#6)
Re: [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation

On Mon, Dec 12, 2016 at 1:06 PM, Kevin Grittner <kgrittn@gmail.com> wrote:

On Mon, Dec 12, 2016 at 12:32 PM, Kevin Grittner <kgrittn@gmail.com> wrote:

As you can see, this generated a serialization failure.

That was on 9.6. On earlier versions it does indeed allow the
transaction on connection 2 to commit, yielding a non-serializable
result. This makes a pretty strong case for back-patching this
commit:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=fcff8a575198478023ada8a48e13b50f70054766

I have confirmed that this patch applies cleanly to all supported
branches (with some offsets), that the bug is manifest without the
patch, and that it is fixed with this patch in all supported
branches. This patch has been in the development code base for
about 8 months and in production with the 9.6 release, so it has
been in active production for 3 months with no sign of trouble. If
you ignore the code comment, doc changes, and new regression tests
it consists of adding one line of code to nbtinsert.c. What it
does is that when a "unique_violation" error is about to fire, it
adds a check to see whether the conditions for a serialization
failure also exist; if so, it fires that error instead.
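The control flow of that one-line check can be illustrated as follows. This is a Python sketch of the logic described above, not PostgreSQL's C source; the function name is invented for illustration:

```python
# Sketch: just before a duplicate-key error fires under SERIALIZABLE,
# the commit re-checks whether an SSI conflict also exists; if so it
# reports serialization_failure (40001), for which a retry is
# sensible, instead of unique_violation (23505), for which it is not.

def duplicate_key_sqlstate(ssi_conflict_exists: bool) -> str:
    """Pick the SQLSTATE to report for a duplicate-key insert."""
    if ssi_conflict_exists:
        return "40001"  # serialization_failure
    return "23505"      # unique_violation
```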

This was not initially back-patched, in spite of numerous requests
to do so, because it was a behavior change and not clearly a bug --
it has at least some minimal chance of changing behavior someone
might be relying on; however, Ian has constructed a use-case where
without this patch we clearly allow a serialization anomaly which
is not allowed with the patch, so this patch should, IMO, be
considered a bug fix on that basis.

Barring objections I will back-patch to 9.2 through 9.5 tomorrow.
(9.1 is out of support and the fix is already in 9.6 and forward.)

Release notes for the next minor release set should probably
include a note that for those using serializable transactions a
serialization error can now occur in some situations where a unique
violation would previously have been reported. I expect that this
will mostly be a welcome change, but it seems like something which
merits a compatibility warning.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#8 Thomas Munro
thomas.munro@gmail.com
In reply to: Kevin Grittner (#7)
Re: [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation

On Tue, Dec 13, 2016 at 10:47 AM, Kevin Grittner <kgrittn@gmail.com> wrote:

On Mon, Dec 12, 2016 at 1:06 PM, Kevin Grittner <kgrittn@gmail.com> wrote:

On Mon, Dec 12, 2016 at 12:32 PM, Kevin Grittner <kgrittn@gmail.com> wrote:

As you can see, this generated a serialization failure.

That was on 9.6. On earlier versions it does indeed allow the
transaction on connection 2 to commit, yielding a non-serializable
result. This makes a pretty strong case for back-patching this
commit:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=fcff8a575198478023ada8a48e13b50f70054766

I have confirmed that this patch applies cleanly to all supported
branches (with some offsets), that the bug is manifest without the
patch, and that it is fixed with this patch in all supported
branches. This patch has been in the development code base for
about 8 months and in production with the 9.6 release, so it has
been in active production for 3 months with no sign of trouble. If
you ignore the code comment, doc changes, and new regression tests
it consists of adding one line of code to nbtinsert.c. What it
does is that when a "unique_violation" error is about to fire, it
adds a check to see whether the conditions for a serialization
failure also exist; if so, it fires that error instead.

This was not initially back-patched, in spite of numerous requests
to do so, because it was a behavior change and not clearly a bug --
it has at least some minimal chance of changing behavior someone
might be relying on; however, Ian has constructed a use-case where
without this patch we clearly allow a serialization anomaly which
is not allowed with the patch, so this patch should, IMO, be
considered a bug fix on that basis.

Barring objections I will back-patch to 9.2 through 9.5 tomorrow.
(9.1 is out of support and the fix is already in 9.6 and forward.)

+1

Ian's test case uses an exception handler to convert a difference in
error code into a difference in committed effect, thereby converting a
matter of programmer convenience into a bug.

For the record, read-write-unique-4.spec's permutation r2 w1 w2 c1 c2
remains an open question for further work.

--
Thomas Munro
http://www.enterprisedb.com

#9 Ian Jackson
ian.jackson@eu.citrix.com
In reply to: Ian Jackson (#1)
Re: [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation [and 2 more messages]

Thanks to everyone for your attention.

Kevin Grittner writes ("Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation"):

On Mon, Dec 12, 2016 at 8:45 AM, Ian Jackson <ian.jackson@eu.citrix.com> wrote:

AIUI the documented behaviour is that "every set of successful
transactions is serialisable".

Well, in context that is referring to serializable transactions.
No such guarantee is provided for other isolation levels.

Indeed.

I didn't [get the same results]. First, I got this when I tried to
start the concurrent transactions using the example as provided:

I'm sorry, I didn't actually try my paraphrased repro recipe, so it
contained an error:

test=# SELECT count(*) FROM t WHERE k=1; -- save value

My Perl code said "k=?" and of course ? got (effectively) substituted
with "'1'" rather than "1". I introduced the error when transcribing
the statements from my Perl script to my prose. Sorry about that.
Next time I will test my alleged recipe with psql.

I should also have told you that I was running this on 9.1.

Kevin Grittner writes ("Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation"):

On Mon, Dec 12, 2016 at 12:32 PM, Kevin Grittner <kgrittn@gmail.com> wrote:

As you can see, this generated a serialization failure.

That was on 9.6. On earlier versions it does indeed allow the
transaction on connection 2 to commit, yielding a non-serializable
result. This makes a pretty strong case for back-patching this
commit:

Right. That's part of my point. Thanks.

[back to the first message]

If you have some way to cause a set of concurrent serializable
transactions to generate results from those transactions which
commit which is not consistent with some one-at-a-time order of
execution, I would be very interested in seeing the test case.
The above, however, is not it.

I am concerned that there are other possible bugs of this form.
In earlier messages on this topic, it has been suggested that the
"impossible" unique constraint violation is only one example of a
possible "leakage".

Earlier you wrote:

If I recall correctly, the constraints for which there can be
errors appearing due to concurrent transactions are primary key,
unique, and foreign key constraints. I don't remember seeing it
happen, but it would not surprise me if an exclusion constraint can
also cause an error due to a concurrent transaction's interaction
with the transaction receiving the error.

Are all of these cases fixed by fcff8a57519847 "Detect SSI conflicts
before reporting constraint violations" ?

I can try playing around with other kinds of constraints, to try to
discover different aspects or versions of this bug, but my knowledge
of the innards of databases is very limited and I may not be
particularly effective. Certainly if I try and fail, I wouldn't have
confidence that no such bug existed.

And,

Thomas Munro writes ("Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation"):

Ian's test case uses an exception handler to convert a difference in
error code into a difference in committed effect, thereby converting a
matter of programmer convenience into a bug.

Precisely. I think this is a fully general technique, which means
that any situation where a transaction can "spuriously" fail is a
similar bug. So I think that ISOLATION LEVEL SERIALIZABLE needs to do
what a naive programmer would expect:

All statements in such transactions, even aborted transactions, need
to see results, and have behaviour, which are completely consistent
with some serialisation of all involved transactions. This must apply
up to (but not including) any serialisation failure error.

If that is the behaviour of 9.6 then I would like to submit a
documentation patch which says so. If the patch is to be backported,
then this ought to apply to all (patched) 9.x versions ?

It would be nice if the documentation stated the error codes that
might be generated. AFAICT that's just 40P01 and 40001 ? (I'm not
sure what 40002 is.)

For the record, read-write-unique-4.spec's permutation r2 w1 w2 c1 c2
remains an open question for further work.

Is this another possible bug of this form ?

Ian.

#10 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Ian Jackson (#9)
Re: [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation [and 2 more messages]

On Tue, Dec 13, 2016 at 5:30 AM, Ian Jackson <ian.jackson@eu.citrix.com> wrote:

I am concerned that there are other possible bugs of this form.
In earlier messages on this topic, it has been suggested that the
"impossible" unique constraint violation is only one example of a
possible "leakage".

As I see it, the main point of serializable transactions is to
prevent serialization anomalies from being persisted in the
database or seen by a serializable transaction which successfully
commits. It is certainly very nice from a programming perspective
if the SQLSTATE permits easy identification of which failures it
can be expected to probably yield a different result on retry, but
it doesn't seem to me that the standard requires that, and other
researchers and developers in this area have taken advantage of the
fact that constraints prevent certain types of serialization
anomalies from reaching the database. In the initial
implementation of serializable transactions we noted papers that
described this, and counted on it for correctness.

Note that with a one year development cycle for major releases, a
feature often initially gets implemented in a "bare bones" format
that is useful but not ideal, and later releases build on it.
Unfortunately there has not been anyone putting the resources into
building on the initial implementation (in 9.1) as the current
implementation has worked well enough for people to be focused on
other areas.

Earlier you wrote:

If I recall correctly, the constraints for which there can be
errors appearing due to concurrent transactions are primary key,
unique, and foreign key constraints. I don't remember seeing it
happen, but it would not surprise me if an exclusion constraint can
also cause an error due to a concurrent transaction's interaction
with the transaction receiving the error.

Are all of these cases fixed by fcff8a57519847 "Detect SSI conflicts
before reporting constraint violations" ?

No. It was specifically meant to address duplicate keys, and there
was one particular set of steps which it was not able to address.
See post by Thomas Munro. Hopefully, he, I, or someone else will
have a chance to work on the one known remaining issue and look for
others. Your efforts have been helpful; it would be great if you
can find and document any other test cases which show a
less-than-ideal SQLSTATE or other outright serialization anomalies.

I can try playing around with other kinds of constraints, to try to
discover different aspects or versions of this bug, but my knowledge
of the innards of databases is very limited and I may not be
particularly effective. Certainly if I try and fail, I wouldn't have
confidence that no such bug existed.

Right. Proving the absence of any bug when dealing with race
conditions is notoriously hard.

All statements in such transactions, even aborted transactions, need
to see results, and have behaviour, which are completely consistent
with some serialisation of all involved transactions. This must apply
up to (but not including) any serialisation failure error.

If I understand what you are saying, I disagree. To prevent
incorrect results from being returned even when a transaction later
fails with a serialization failure would require blocking, and
would have a major detrimental effect on both concurrency and
throughput compared to the techniques PostgreSQL is using. As far
as I'm aware, the guarantee you seek can only be provided by strict
two phase locking (S2PL). Benchmarks by Dan R. K. Ports of MIT
CSAIL showed S2PL to be consistently slower than the SSI techniques
used by PostgreSQL -- with throughput up to almost 5x worse in some
workloads, even with SSI's requirement that results from a
transaction which later gets a serialization failure must be
ignored. Please see Section 8, and particularly the performance of
S2PL in Figure 4 of this paper, which Dan and I presented to the
VLDB conference in Istanbul:

http://vldb.org/pvldb/vol5/p1850_danrkports_vldb2012.pdf

It would be nice if the documentation stated the error codes that
might be generated. AFAICT that's just 40P01 and 40001 ?

Those are the only ones I know of.

(I'm not sure what 40002 is.)

That doesn't appear to me to be related to transaction
serialization issues.
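The retry policy under discussion, retrying on SQLSTATE 40001 (serialization_failure) and 40P01 (deadlock_detected), plus (as the osstest patch does) class-23 integrity violations, can be sketched roughly as follows. This is an illustrative simulation, not real driver code: the DatabaseError class, its sqlstate attribute, and run_transaction are hypothetical stand-ins for whatever database API is in use.

```python
# Hedged sketch of a transaction retry wrapper. DatabaseError and its
# .sqlstate attribute are hypothetical stand-ins for a real driver's
# exception type; the SQLSTATE values are those discussed above.

class DatabaseError(Exception):
    def __init__(self, sqlstate):
        super().__init__(sqlstate)
        self.sqlstate = sqlstate

RETRYABLE = ("40001", "40P01")   # serialization failure, deadlock

def is_retryable(sqlstate, retry_integrity=True):
    if sqlstate in RETRYABLE:
        return True
    # osstest's workaround: also retry class-23 integrity violations,
    # since under SERIALIZABLE they may be reported spuriously as a
    # result of a concurrent transaction.
    return retry_integrity and sqlstate.startswith("23")

def run_transaction(body, attempts=5):
    """Run body(); retry it when it fails with a retryable SQLSTATE."""
    for attempt in range(attempts):
        try:
            return body()
        except DatabaseError as e:
            if attempt + 1 == attempts or not is_retryable(e.sqlstate):
                raise
```

If the 9.6 fix (and its back-patches) eliminated the spurious constraint violations entirely, retry_integrity could be switched off and only 40001/40P01 retried.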

For the record, read-write-unique-4.spec's permutation r2 w1 w2 c1 c2
remains an open question for further work.

Is this another possible bug of this form ?

Yes. See the last specified permutation in this file:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/test/isolation/specs/read-write-unique-4.spec;h=ec447823484cff99a61f2c2e67ed3aef2210d680;hb=refs/heads/master

If it's not clear how to read that to construct the problem case,
the README file might help:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/test/isolation/README;h=bea278a856fffc911a7819fe8055f7e328fcac5f;hb=refs/heads/master

I guess it's short enough to just paste here:

*** SETUP ***
CREATE TABLE invoice (
    year int,
    invoice_number int,
    PRIMARY KEY (year, invoice_number)
);
INSERT INTO invoice VALUES (2016, 1), (2016, 2);

*** CONNECTION2 ***
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT COALESCE(MAX(invoice_number) + 1, 1) FROM invoice WHERE year = 2016;

*** CONNECTION1 ***
BEGIN ISOLATION LEVEL SERIALIZABLE;
INSERT INTO invoice VALUES (2016, 3);

*** CONNECTION2 ***
INSERT INTO invoice VALUES (2016, 3);

Well, the test includes commits and teardown, but this gets you to
the problem. Connection2 gets this:

ERROR: duplicate key value violates unique constraint "invoice_pkey"
DETAIL: Key (year, invoice_number)=(2016, 3) already exists.

If connection1 had explicitly read the "gap" into which it inserted
its row (i.e., with a SELECT statement) there would be a
serialization failure instead. Getting the RI index maintenance to
register as a read for this purpose is a bit tricky, and we don't
yet have a working patch for that.
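The collision in this scenario can be illustrated outside the database. Under snapshot isolation both connections evaluate MAX(invoice_number) + 1 against the same committed snapshot, so both pick the same "next" number; the following is a minimal set-based simulation (not PostgreSQL code, all names illustrative):

```python
# Hedged illustration of why the MAX(...)+1 pattern above collides
# under snapshot isolation: both transactions read the same snapshot,
# so both compute the same candidate key.

committed = {(2016, 1), (2016, 2)}   # table "invoice" at snapshot time

def next_invoice(snapshot, year):
    nums = [n for (y, n) in snapshot if y == year]
    return max(nums, default=0) + 1

snap1 = set(committed)   # connection1's snapshot
snap2 = set(committed)   # connection2's snapshot

n1 = next_invoice(snap1, 2016)   # connection1 computes 3
n2 = next_invoice(snap2, 2016)   # connection2 also computes 3

committed.add((2016, n1))        # connection1's INSERT commits first
# connection2's INSERT of the same key now violates the primary key:
duplicate = (2016, n2) in committed
```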

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Ian Jackson
ian.jackson@eu.citrix.com
In reply to: Kevin Grittner (#10)
Re: [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation [and 2 more messages]

Kevin Grittner writes ("Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation [and 2 more messages]"):

On Tue, Dec 13, 2016 at 5:30 AM, Ian Jackson <ian.jackson@eu.citrix.com> wrote:

Are all of these cases fixed by fcff8a57519847 "Detect SSI conflicts
before reporting constraint violations" ?

No. It was specifically meant to address duplicate keys, and there
was one particular set of steps which it was not able to address.
See post by Thomas Munro. Hopefully, he, I, or someone else will
have a chance to work on the one known remaining issue and look for
others. Your efforts have been helpful; it would be great if you
can find and document any other test cases which show a
less-than-ideal SQLSTATE or other outright serialization anomalies.

Well, my own usage of postgresql is not really that advanced and we do
not have very many complicated constraints. So for projects I'm
responsible for, what has been done in 9.6 is going to be good enough
in practice (when it finally bubbles through via my distro etc.); and
I have a workaround for now.

But I am trying to save others (who may have more complicated
situations) from being misled.

All statements in such transactions, even aborted transactions, need
to see results, and have behaviour, which are completely consistent
with some serialisation of all involved transactions. This must apply
up to (but not including) any serialisation failure error.

If I understand what you are saying, I disagree.

But I have just demonstrated a completely general technique which can
be used to convert any "spurious" constraint failure into a nonsense
transaction actually committed to the database. Of course my
transactions are contrived, but I don't see a way to rule out the
possibility that a real application might do something similar.
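For concreteness, the trap-and-commit pattern being referred to can be sketched as a simulation, modelled on the invoice example (sets standing in for tables and snapshots; this is not real PostgreSQL or plpgsql code, and all names are illustrative):

```python
# Hedged sketch of the error-trapping technique: a subtransaction
# catches the duplicate-key error and the outer transaction commits
# a conclusion inconsistent with its own snapshot.

class UniqueViolation(Exception):
    pass

committed = {(2016, 1), (2016, 2)}   # table "invoice"
log = []                             # a second table, e.g. "audit"

# Transaction T2 takes its snapshot before T1 commits:
snapshot = set(committed)

# ...concurrently, T1 inserts (2016, 3) and commits:
committed.add((2016, 3))

# In T2's snapshot the row is absent, but the INSERT (checked against
# the current index contents) fails with a unique violation.
try:
    if (2016, 3) in committed:
        raise UniqueViolation
    committed.add((2016, 3))
except UniqueViolation:
    # Subtransaction rolled back; T2 records its observation and
    # commits successfully.
    log.append("invoice 3 already existed")

# The committed state now asserts something no serial order explains:
# T2 read a snapshot without invoice 3, yet concluded it pre-existed.
```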

I think there are only two possible coherent positions:

1. Declare that all spurious failures, in SERIALIZABLE transactions,
are bugs.

2. State that the SERIALIZABLE guarantee in Postgresql only applies to
transactions which (a) complete successfully and (b) contain only
very simple pgsql constructs.

I think (2) would be rather a bad concession! (It is probably also
contrary to some standards document somewhere, if that matters.)
Furthermore if you adopt (2) you would have to make a list of the
"safe" pgsql constructs.

That would definitely exclude the exception trapping facility; but
what other facilities or statements might have similar effects ?
ISTM that very likely INSERT ... ON CONFLICT can be used the same way.
Surely you do not want to say "PostgreSQL does not give a
transactional guarantee when INSERT ... ON CONFLICT is used".

To prevent incorrect results from being returned even when a
transaction later fails with a serialization failure would require
blocking

I'm afraid I don't understand enough about database implementation to
answer this with confidence. But this does not seem likely to be
true. Blocking can surely always be avoided, by simply declaring a
serialisation failure instead of blocking. I have no idea whether
that is a practical approach in the general case, or whether it brings
its own problems.

But whether or not it is true, I think that "it's hard" is not really
a good answer to "PostgreSQL should implement behaviour for
SERIALIZABLE which can be coherently described and which covers
practically useful use cases".

I am concerned that there are other possible bugs of this form.
In earlier messages on this topic, it has been suggested that the
"impossible" unique constraint violation is only one example of a
possible "leakage".

As I see it, the main point of serializable transactions is to
prevent serialization anomalies from being persisted in the
database or seen by a serializable transaction which successfully
commits.

As I have demonstrated, "spurious error" conditions can result in
"serialisation anomaly persisted in the database". So there is not a
real distinction here.

There is another very important use case for serialisable transactions
which is read-only report transactions. An application doing such a
transaction needs to see a consistent snapshot. Such a transaction is
readonly so will never commit.

Any reasonable definition of what SERIALIZABLE means needs to give a
sensible semantics for readonly transactions.

Well, the test includes commits and teardown, but this gets you to
the problem. Connection2 gets this:

ERROR: duplicate key value violates unique constraint "invoice_pkey"
DETAIL: Key (year, invoice_number)=(2016, 3) already exists.

I think my error trapping technique can be used to convert this into
a scenario which commits "impossible" information to the database.

Do you agree ?

If connection1 had explicitly read the "gap" into which it inserted
its row (i.e., with a SELECT statement) there would be a
serialization failure instead. Getting the RI index maintenance to
register as a read for this purpose is a bit tricky, and we don't
yet have a working patch for that.

Of course I understand that these problems may be hard to solve. But,
I think it needs to be recognised that these problems are all bugs.

And, then, we should aim to have an understanding of what the scope of
these bugs are. That way something can be written in the
documentation.

"If you do not use features X or Y, these bugs will not affect you" is
a useful thing to perhaps say, even if X or Y is as general as "INSERT
... ON CONFLICT" or "... EXCEPTION WHEN ...".

Ian.


#12Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Ian Jackson (#11)
Re: [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation [and 2 more messages]

On Tue, Dec 13, 2016 at 9:50 AM, Ian Jackson <ian.jackson@eu.citrix.com> wrote:

Kevin Grittner writes ("Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation [and 2 more messages]"):

On Tue, Dec 13, 2016 at 5:30 AM, Ian Jackson <ian.jackson@eu.citrix.com> wrote:

Are all of these cases fixed by fcff8a57519847 "Detect SSI conflicts
before reporting constraint violations" ?

No. It was specifically meant to address duplicate keys, and there
was one particular set of steps which it was not able to address.
See post by Thomas Munro. Hopefully, he, I, or someone else will
have a chance to work on the one known remaining issue and look for
others. Your efforts have been helpful; it would be great if you
can find and document any other test cases which show a
less-than-ideal SQLSTATE or other outright serialization anomalies.

Well, my own usage of postgresql is not really that advanced and we do
not have very many complicated constraints. So for projects I'm
responsible for, what has been done in 9.6 is going to be good enough
in practice (when it finally bubbles through via my distro etc.); and
I have a workaround for now.

I worked in a shop with over 30,000 types of transactions, and
*none* of them involved using a subtransaction to catch and ignore
an attempted constraint violation. I know of several larger shops
which have not experienced such problems, either. I agree you
found a bug and it needs to be fixed, but it doesn't seem to be a
bug that occurs in common usage patterns; otherwise, it would
probably have taken less than five years for it to be found.

But I am trying to save others (who may have more complicated
situations) from being misled.

A motive I certainly appreciate.

All statements in such transactions, even aborted transactions, need
to see results, and have behaviour, which are completely consistent
with some serialisation of all involved transactions. This must apply
up to (but not including) any serialisation failure error.

If I understand what you are saying, I disagree.

But I have just demonstrated a completely general technique which can
be used to convert any "spurious" constraint failure into a nonsense
transaction actually committed to the database.

Really? I see a lot of daylight between "you must discard any
results from a transaction which later throws a serialization
failure" and "if you catch the exception from a constraint
violation within a subtransaction it is possible to create a
serialization anomaly". I don't actually see how one relates to
the other. What am I missing?

I think there are only two possible coherent positions:

1. Declare that all spurious failures, in SERIALIZABLE transactions,
are bugs.

It's not clear to me what you mean here.

2. State that the SERIALIZABLE guarantee in Postgresql only applies to
transactions which (a) complete successfully and (b) contain only
very simple pgsql constructs.

About 24 hours ago you reported a bug which was hard enough to hit
that it took 5 years for anyone to find it. I think you might
consider allowing a bit more time to analyze the situation before
declaring that serializable transactions only work "in very simple
constructs". Some of the transactions with which I have seen it
reliably work involved in excess of 20,000 lines of procedural
code. What it seems to me we should do is try to identify the
conditions under which a bug can occur (which seems likely to be
limited to a subset of cases where errors thrown by declarative
constraints are caught within a subtransaction, and the
subtransaction work is discarded without throwing another error),
and see whether the bugs can be fixed.

I think (2) would be rather a bad concession! (It is probably also
contrary to some standards document somewhere, if that matters.)
Furthermore if you adopt (2) you would have to make a list of the
"safe" pgsql constructs.

I think the unsafe constructs are likely to be a *much* shorter list.

That would definitely exclude the exception trapping facility; but
what other facilities or statements might have similar effects ?
ISTM that very likely INSERT ... ON CONFLICT can be used the same way.
Surely you do not want to say "PostgreSQL does not give a
transactional guarantee when INSERT ... ON CONFLICT is used".

I think we've chased down the bugs in this area. Counterexamples
welcome.

To prevent incorrect results from being returned even when a
transaction later fails with a serialization failure would require
blocking

I'm afraid I don't understand enough about database implementation to
answer this with confidence. But this does not seem likely to be
true. Blocking can surely always be avoided, by simply declaring a
serialisation failure instead of blocking. I have no idea whether
that is a practical approach in the general case, or whether it brings
its own problems.

The "false positive" rate (throwing a serialization failure error
when there would not have been a serialization anomaly without the
error) would be horrible. I don't think performance would be
acceptable to very many.

As a side-note, PostgreSQL originally used S2PL, and therefore
provided exactly the guarantees you're looking for (modulo any
bugs which might have been there); but that was ripped out 18 years
ago due to the horrible performance, in favor of providing snapshot
isolation.

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=3f7fbf85dc5b42dfd33c803efe6c90533773576a

Personally, I don't think the community properly understood what
they were giving up in doing so, and the documentation glossed over
the differences until I made an effort to give people a clearer
idea of the situation. During discussions, the paper on the
Serializable Snapshot Isolation technique was pointed out (it won
the "Best Paper" award at ACM SIGMOD 2008, a very prestigious
conference in the database world) and I jumped on using that to try
to improve the behavior, rather than just the docs.

But whether or not it is true, I think that "it's hard" is not really
a good answer to "PostgreSQL should implement behaviour for
SERIALIZABLE which can be coherently described and which covers
practically useful use cases".

The behavior we have been providing, modulo the bug you just found
and any related bugs of the same ilk, is "we don't allow
serialization anomalies into the database or in results from
serializable transactions which commit." That is enough to be very
useful to many.

I am concerned that there are other possible bugs of this form.
In earlier messages on this topic, it has been suggested that the
"impossible" unique constraint violation is only one example of a
possible "leakage".

As I see it, the main point of serializable transactions is to
prevent serialization anomalies from being persisted in the
database or seen by a serializable transaction which successfully
commits.

As I have demonstrated, "spurious error" conditions can result in
"serialisation anomaly persisted in the database". So there is not a
real distinction here.

You found a bug. Thanks. Let's not exaggerate the issue; let's fix it.

There is another very important use case for serialisable transactions
which is read-only report transactions. An application doing such a
transaction needs to see a consistent snapshot. Such a transaction is
readonly so will never commit.

Yes, it will. It can also get a serialization failure before it
does unless you use the DEFERRABLE option.

https://www.postgresql.org/docs/current/static/sql-set-transaction.html

| The DEFERRABLE transaction property has no effect unless the
| transaction is also SERIALIZABLE and READ ONLY. When all three of
| these properties are selected for a transaction, the transaction
| may block when first acquiring its snapshot, after which it is
| able to run without the normal overhead of a SERIALIZABLE
| transaction and without any risk of contributing to or being
| canceled by a serialization failure. This mode is well suited for
| long-running reports or backups.

Any reasonable definition of what SERIALIZABLE means needs to give a
sensible semantics for readonly transactions.

I certainly agree. I don't see where it doesn't.

Well, the test includes commits and teardown, but this gets you to
the problem. Connection2 gets this:

ERROR: duplicate key value violates unique constraint "invoice_pkey"
DETAIL: Key (year, invoice_number)=(2016, 3) already exists.

I think my error trapping technique can be used to convert this into
a scenario which commits "impossible" information to the database.

Do you agree ?

It seems likely, which converts this from a "convenience feature
request" to a bug, as of yesterday. You are certainly very
fortunate if you have worked for a software company which never had
a bug that took longer than a day to fix.

If connection1 had explicitly read the "gap" into which it inserted
its row (i.e., with a SELECT statement) there would be a
serialization failure instead. Getting the RI index maintenance to
register as a read for this purpose is a bit tricky, and we don't
yet have a working patch for that.

Of course I understand that these problems may be hard to solve. But,
I think it needs to be recognised that these problems are all bugs.

Yes, when they are shown to allow serialization anomalies, they
certainly are bugs. That has now happened.

And, then, we should aim to have an understanding of what the scope of
these bugs are. That way something can be written in the
documentation.

You sure don't aim very high. Given a higher priority now that
they have been shown to *be* bugs, I expect that they will be fixed
fairly quickly.

"If you do not use features X or Y, these bugs will not affect you" is
a useful thing to perhaps say, even if X or Y is as general as "INSERT
... ON CONFLICT" or "... EXCEPTION WHEN ...".

Well, if they are not fixed soon, it might be worth adding
something to the docs; but as far as I can see, the specifics would
be more like "If you catch errors generated by declarative
constraints or unique index insertion within a subtransaction, and
then discard the work of the subtransaction without throwing any
error, you might create a serialization anomaly, depending on what
else you do within the transaction." Have you seen anything that
suggests that this is not a complete description of the scope of
the issue? I don't see any point in a statement that there might
be undiscovered bugs in the software, either in general or relative
to any particular feature.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#13Ian Jackson
ian.jackson@eu.citrix.com
In reply to: Kevin Grittner (#12)
Re: [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation [and 2 more messages]

Kevin Grittner writes ("Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation [and 2 more messages]"):

[various things]

I get the feeling from your message that I have irritated you. I'm
sorry if I have. In particular, I wanted to say that I certainly
don't expect bugs to be fixed immediately. I appreciate your
attention to detail. I also appreciate you taking the time to deal on
your -hackers list with someone who really isn't a database hacker!

I still hope to be able to convince you that the definition of
SERIALIZABLE (in the pgsql docs) ought to be a stronger version, which
covers even non-committed transactions.

[Ian Jackson:]

But I have just demonstrated a completely general technique which can
be used to convert any "spurious" constraint failure into a nonsense
transaction actually committed to the database.

Really? I see a lot of daylight between "you must discard any
results from a transaction which later throws a serialization
failure" and "if you catch the exception from a constraint
violation within a subtransaction it is possible to create a
serialization anomaly". I don't actually see how one relates to
the other. What am I missing?

I don't think it can be useful to state (in the pgsql docs) the caveat
you propose, to wit:
"you must discard any results from a transaction which later throws
a serialization failure"

I think any scenario where this caveat declares a behaviour to be "not
a bug", can be converted into a scenario where there is undoubtedly a
bug. So this caveat does not actually reduce the difficulty of
implementing the promises made by the documentation.

The conversion is as follows: if a scenario is affected by the caveat,
then there must be at least one transaction T which first produces
"impossible" results I, and in which some later statement S produces a
serialisation failure.

To exhibit the corresponding unavoidable bug: Execute an identical
scenario, with exactly the same sequence of steps in the same order,
up to S. However, instead of S, execute ROLLBACK.

Now this anomalous transaction T' is not "a transaction which later
throws a serialization failure". There is no requirement to discard
the "impossible" results I. The results I are supposed to be correct,
but they are not.

(I am making the hopefully-warranted assumption that ROLLBACK cannot
itself cause a serialisation failure.)

In theory this argument does not prove that every bug which the caveat
is trying to address, is as important as the same bug without the
caveat. This is because T' might somehow be an unrealistic
transaction.

But I don't really think that the primary documentation should be
complicated, and should give weaker statements about the system
behaviour, simply as an aid to bug prioritisation.

1. Declare that all spurious failures, in SERIALIZABLE transactions,
are bugs.

It's not clear to me what you mean here.

I mean that the pgsql documentation should simply say that for
SERIALIZABLE transactions, there is always a coherent serialisation.
Even if the transaction failed due to a constraint violation, or
serialisation failure, or database disconnection, or whatever.

I think there is no "useful" weaker guarantee.

By which I mean that for any reasonable weaker guarantee: any scenario
in which pgsql violates the stronger guarantee, can be converted into
a scenario where pgsql violates the weaker guarantee.

Conversely, any correct implementation of the weaker guarantee
necessarily implements the stronger guarantee too. So the weaker
guarantee does not make implementation easier. All it does is
complicate application code.

2. State that the SERIALIZABLE guarantee in Postgresql only applies to
transactions which (a) complete successfully and (b) contain only
very simple pgsql constructs.

About 24 hours ago you reported a bug which was hard enough to hit
that it took 5 years for anyone to find it.

Well, of course I have a slightly different point of view.

The fact that pgsql might generate "spurious" constraint violations,
even in SERIALIZABLE transactions, has been known for many years. See
the URLs etc. in my patch, which include a reference from 2009. Many
people have always regarded this as a bug. The pgsql authors have not
agreed.

From my point of view, I have developed a trick which allows me to
convince you that it has been a bug all along :-).

The behavior we have been providing, modulo the bug you just found
and any related bugs of the same ilk, is "we don't allow
serialization anomalies into the database or in results from
serializable transactions which commit." That is enough to be very
useful to many.

As I explain, I think that if you make that promise, you will have to
do all the work necessary to make the stronger promise anyway.

So you may as well make the stronger (and more comprehensible!)
promise.

Of course I understand that these problems may be hard to solve. But,
I think it needs to be recognised that these problems are all bugs.

Yes, when they are shown to allow serialization anomalies, they
certainly are bugs. That has now happened.

I think all scenarios of this kind can be shown to allow serialisation
anomalies.

"If you do not use features X or Y, these bugs will not affect you" is
a useful thing to perhaps say, even if X or Y is as general as "INSERT
... ON CONFLICT" or "... EXCEPTION WHEN ...".

Well, if they are not fixed soon, it might be worth adding
something to the docs; but as far as I can see, the specifics would
be more like "If you catch errors generated by declarative
constraints or unique index insertion within a subtransaction, and
then discard the work of the subtransaction without throwing any
error, you might create a serialization anomaly, depending on what
else you do within the transaction." Have you seen anything that
suggests that this is not a complete description of the scope of
the issue? I don't see any point in a statement that there might
be undiscovered bugs in the software, either in general or relative
to any particular feature.

I'm afraid I have no idea. But if that is a complete description of
the bug (if it doesn't affect "INSERT ... ON CONFLICT" for example)
then that is good.

Thanks,
Ian.


#14Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Ian Jackson (#13)
Re: [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation [and 2 more messages]

On Tue, Dec 13, 2016 at 12:00 PM, Ian Jackson <ian.jackson@eu.citrix.com> wrote:

Kevin Grittner writes:

I still hope to be able to convince you that the definition of
SERIALIZABLE (in the pgsql docs) ought to be a stronger version, which
covers even non-committed transactions.

That doesn't seem likely. The stronger definition would perform so
poorly that even if you managed to convince me, the PostgreSQL
community as a whole has already rejected it.

"you must discard any results from a transaction which later throws
a serialization failure"

You argue that saying "If A happens, you must not use the results"
means that "If A doesn't happen you can use the results." That
does not logically follow; your argument based on it has no merit.
Basically, a serializable transaction must successfully commit in
order to consider any results from it to be valid. The quote above
is discussing one way that can happen, which does not preclude
other ways.

1. Declare that all spurious failures, in SERIALIZABLE transactions,
are bugs.

It's not clear to me what you mean here.

I mean that the pgsql documentation should simply say that for
SERIALIZABLE transactions, there is always a coherent serialisation.
Even if the transaction failed due to a constraint violation, or
serialisation failure, or database disconnection, or whatever.

I think there is no "useful" weaker guarantee.

That's wrong on the face of it, and certainly at odds with all
research papers and standards in the area.

By which I mean that for any reasonable weaker guarantee: any scenario
in which pgsql violates the stronger guarantee, can be converted into
a scenario where pgsql violates the weaker guarantee.

You have failed to provide any evidence of that. Twisting
statements and the principles of logical inference don't advance
your position.

The fact that pgsql might generate "spurious" constraint violations,
even in SERIALIZABLE transactions, has been known for many years.

Yes, it was clearly discussed during development.

See the URLs etc. in my patch, which include a reference from 2009.

This paper from 2007 was among those cited during development of
the PostgreSQL serializable implementation:

Automating the Detection of Snapshot Isolation Anomalies.
Sudhir Jorwekar, Alan Fekete, Krithi Ramamritham, S. Sudarshan.
VLDB ‘07, September 23-28, 2007, Vienna, Austria.
https://www.cse.iitb.ac.in/~sudarsha/Pubs-dir/VLDB07-snapshot.pdf

Among relevant statements in that paper:

| The database system ensures the preservation of some integrity
| constraints which are explicitly declared to the system in the
| schema definition, such as uniqueness of primary key and
| referential integrity. Some of the SI anomalies are avoided due
| to the dbms enforcement of these constraints.

Of course, both pre-date development of the feature itself -- which
was released as part of PostgreSQL 9.1, in September, 2011.

Many people have always regarded this as a bug. The pgsql
authors have not agreed.

Right. For some people, a system has a bug if it does not behave
as they would most like it to. In my world, it is not a bug if it
is functioning as defined, intended, and documented. The change
requested is a feature request, and a feature I would also like to
see. It is not one that anyone has offered to pay to have
developed, which is a significant barrier to implementation.

The behavior we have been providing, modulo the bug you just found
and any related bugs of the same ilk, is "we don't allow
serialization anomalies into the database or in results from
serializable transactions which commit." That is enough to be very
useful to many.

As I explain, I think that if you make that promise, you will have to
do all the work necessary to make the stronger promise anyway.

That is absolutely *not* true of your suggestion that results
returned from a serializable transaction which does not
subsequently commit must be free from serialization anomalies.
You are just factually wrong there.

I think all scenarios of this kind can be shown to allow serialisation
anomalies.

Which kind? Catching an exception from a declarative constraint?
Not all of them can, especially after the patch that went into 9.6
and that I'm intending to back-patch to other supported branches
shortly. Other exceptions? I have seen no evidence that any
others can, and I have a lot of reasons to believe that the special
properties of the accesses from constraint implementations which
allow those exceptions to be problematic are not present for other
exceptions.

Now, that is not an assertion that it is impossible for the code to
have some other bug, but if there is such a bug we would need to
find it to be able to fix it.

I'm afraid I have no idea. But if that is a complete description of
the bug (if it doesn't affect "INSERT ... ON CONFLICT" for example)
then that is good.

A bug was found in the initial release of INSERT ... ON CONFLICT,
and fixed in the next minor release after it was found. Will that
be the last bug ever found? I don't know. We still occasionally
find bugs in other areas of PostgreSQL which have been around for
decades without being noticed; if it's hard to hit, any bug could
lurk undetected for a while.

I appreciate your pointing out that catching a constraint error in
a subtransaction and throwing away the work of the subtransaction
without re-throwing that exception or throwing a new exception, in
combination with other factors, can cause a serialization anomaly.
That was helpful. If you find any other way to create a
serialization anomaly in the database or in query results from a
transaction which does not fit this pattern, I would be interested
in hearing about it.

I won't respond to any further repetition of the assertion that it
is a bug that a serializable transaction which does not
successfully commit can return results that represent a
serialization anomaly. That is a design choice which allows much
higher concurrency and throughput than what you suggest as an
alternative, and the community has already vigorously rejected
taking the slow approach on that -- even if I was somehow convinced
I'm quite sure I couldn't move the community to a consensus on that
position. Calling it a bug requires a rather tortured definition
of what constitutes a bug (basically, designed differently than you
think you would prefer). There's no benefit to you, me, or the
thousands who read this list to keep repeating that.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Thomas Munro (#8)
Re: [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation

On Mon, Dec 12, 2016 at 4:46 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Tue, Dec 13, 2016 at 10:47 AM, Kevin Grittner <kgrittn@gmail.com> wrote:

Barring objections I will back-patch to 9.2 through 9.5 tomorrow.
(9.1 is out of support and the fix is already in 9.6 and forward.)

+1

Done.

I will add a test case based on Ian's work to the master branch to
show the bug if this were to be reverted, but first want to look
for similar cases, so we know the scope of the issue and we fix
what we can. I probably won't get to that until early next month,
since I'm on vacation the last two weeks in December and am trying
to tie up some loose ends on other things first.

Thanks, Ian, for the example to show that this should be considered
a bug!

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16Robert Haas
robertmhaas@gmail.com
In reply to: Ian Jackson (#13)
Re: [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation [and 2 more messages]

On Tue, Dec 13, 2016 at 1:00 PM, Ian Jackson <ian.jackson@eu.citrix.com> wrote:

The conversion is as follows: if a scenario is affected by the caveat,
then there must be at least one transaction T which firstly produces
"impossible" results I, and in which some later statement S produces a
serialisation failure.

To exhibit the corresponding unavoidable bug: Execute an identical
scenario, with exactly the same sequence of steps in the same order,
up to S. However, instead of S, execute ROLLBACK.

I am having a hard time understanding exactly what the argument on
this thread is about, but I want to comment on this point.

Saying that a set of transactions are serializable is equivalent to
the statement that there is some serial order of execution which would
have produced results equivalent to the actual results. That is,
there must be at least one schedule (T1, ..., Tn) such that running
the transactions one after another in that order would have produced
the same results that were obtained by running them concurrently.
Any transactions that roll back whether due to serialization anomalies
or manual user intervention or any other reason are not part of the
schedule, which considers only successfully committed transactions.
The rolled-back transactions effectively didn't happen, and
serializability as a concept makes no claim about the behavior of such
transactions. That's not a PostgreSQL thing: that's database theory.
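
That definition can be checked mechanically for small cases. A minimal Python sketch (the transactions and values here are illustrative, not from the thread): enumerate every serial order of the committed transactions and ask whether any of them reproduces the observed final state.

```python
from itertools import permutations

def run_serially(txns, init):
    """Run the transactions one after another; each sees the state
    left behind by the previous one."""
    state = dict(init)
    for t in txns:
        state = t(state)
    return state

def is_serializable(observed, txns, init):
    """True iff the observed final state matches the result of at
    least one serial schedule of the committed transactions."""
    return any(run_serially(order, init) == observed
               for order in permutations(txns))

# Two transactions that each read both values and update one of them
# (the write-skew shape).  Under snapshot isolation both may read the
# initial A=1, B=1 and produce A=2, B=2 concurrently.
init = {"A": 1, "B": 1}
t1 = lambda s: {**s, "A": s["A"] + s["B"]}
t2 = lambda s: {**s, "B": s["A"] + s["B"]}

assert is_serializable({"A": 2, "B": 3}, [t1, t2], init)      # T1 then T2
assert not is_serializable({"A": 2, "B": 2}, [t1, t2], init)  # anomaly
```

The concurrent outcome A=2, B=2 matches neither serial order ({A:2, B:3} or {A:3, B:2}), which is exactly what "not serializable" means here.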

However, in practice, the scenario that you mention should generally
work fine in PostgreSQL, I think. If T performed any writes, the
rollback throws them away, so imagine removing the actual T from the
mix and replacing it with a substitute version T' which performs the
same reads but no writes and then tries to COMMIT where T tried to
ROLLBACK. T' will succeed, because we never roll back a read-only
transaction at commit time. If it were to fail, it would have to fail
*while performing one of the reads*, not later.

But imagine a hypothetical database system in which anomalies are
never detected until commit time. We somehow track the global
must-happen-before graph and refuse the commit of any transaction
which will create a cycle. Let's also suppose that this system uses
snapshots to implement MVCC. In such a system, read-only transactions
will sometimes fail at commit time if they've seen a view of the world
that is inconsistent with the only remaining possible serial
schedules. For example, suppose T1 updates A -> A' and reads B.
Concurrently, T2 updates B -> B'; thus, T1 must precede T2. Next, T2
commits. Now, T3 begins and reads B', so that T2 must precede T3.
Next T1 commits. T3, which took its snapshot before the commit of T1,
now reads A. Thus T3 has to precede T1. That's a cycle, so T3 won't
be allowed to commit, but yet it's done a couple of reads and hasn't
failed yet... because of an implementation detail of the system.
That's probably annoying from a user perspective -- if a transaction
is certainly going to fail we'd like to report the failure as early as
possible -- and it's probably crushingly slow, but according to my
understanding it's unarguably a correct implementation of
serializability (assuming there are no bugs), yet it doesn't deliver
the guarantee you're asking for here.
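
The commit-time check in this hypothetical system amounts to asking, before recording a new must-happen-before edge, whether the reverse path already exists in the graph. A small illustrative sketch (Python; the transaction names follow the example above):

```python
def creates_cycle(edges, new_edge):
    """Would adding new_edge (u must precede v) close a cycle in the
    must-happen-before graph?  True iff v can already reach u."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
    src, dst = new_edge
    stack, seen = [dst], set()
    while stack:
        node = stack.pop()
        if node == src:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return False

# T1 must precede T2 (T1 read B, T2 overwrote it); T2 must precede T3
# (T3 read B').  T3 then reads the pre-commit value of A, so T3 would
# also have to precede T1 -- a cycle, so T3's commit is refused:
edges = [("T1", "T2"), ("T2", "T3")]
print(creates_cycle(edges, ("T3", "T1")))  # True -- T3 cannot commit
```
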

I have not read any database literature on the interaction of
serializability with subtransactions. This seems very thorny.
Suppose T1 reads A and B and updates A -> A' while concurrently T2
reads A and B and updates B -> B'. This is obviously not
serializable; if either transaction had executed before the other in a
serial schedule, the second transaction in the schedule would have had
to have seen (A, B') or (A', B) rather than (A, B), but that's not
what happened. But what if each of T1 and T2 did the reads in a
subtransaction, rolled it back, and then did the write in the main
transaction and committed? The database system has two options.
First, it could assume that the toplevel transaction may have relied
on the results of the aborted subtransaction. But if it does that,
then any serialization failure which afflicts a subtransaction must
necessarily also kill the main transaction, which seems pedantic and
unhelpful. If you'd wanted the toplevel transaction to be killed,
you wouldn't have used a subtransaction, right? Second, it could
assume that the toplevel transaction in no way relied on or used the
values obtained from the aborted subtransaction. However, that
defeats the whole purpose of having subtransactions in the first
place. What's the point of being able to start subtransactions if you
can't roll them back and then decide to do something else instead? It
seems to me that what the database system should do is make that
second assumption, and what the user should do is understand that to
the degree that transactions depend on the results of aborted
subtransactions, there may be serialization anomalies that go
undetected. And the user should put up with that because the cure is
worse than the disease. Maybe there's a more formally satisfying
answer than that but I'm not really seeing it...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Robert Haas (#16)
Re: [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation [and 2 more messages]

Robert Haas wrote:

I have not read any database literature on the interaction of
serializability with subtransactions. This seems very thorny.
Suppose T1 reads A and B and updates A -> A' while concurrently T2
reads A and B and updates B -> B'. This is obviously not
serializable; if either transaction had executed before the other in a
serial schedule, the second transaction in the schedule would have had
to have seen (A, B') or (A', B) rather than (A, B), but that's not
what happened. But what if each of T1 and T2 did the reads in a
subtransaction, rolled it back, and then did the write in the main
transaction and committed? The database system has two options.
First, it could assume that the toplevel transaction may have relied
on the results of the aborted subtransaction. But if it does that,
then any serialization failure which afflicts a subtransaction must
necessarily also kill the main transaction, which seems pedantic and
unhelpful. If you'd wanted the toplevel transaction to be killed,
you wouldn't have used a subtransaction, right? Second, it could
assume that the toplevel transaction in no way relied on or used the
values obtained from the aborted subtransaction. However, that
defeats the whole purpose of having subtransactions in the first
place. What's the point of being able to start subtransactions if you
can't roll them back and then decide to do something else instead? It
seems to me that what the database system should do is make that
second assumption, and what the user should do is understand that to
the degree that transactions depend on the results of aborted
subtransactions, there may be serialization anomalies that go
undetected.

Actually, Ian's sample transaction is even more malicious, because the
problem is not caused by reads within the aborted subtransaction -- the
cached-in-app reads occured *before* the subtransaction started in the
first place. I think saving a count(*) across a possibly-failed
insertion on the same table is wrong. I can't blame the database for
the behavior.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#18Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Robert Haas (#16)
Re: [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation [and 2 more messages]

On Wed, Dec 14, 2016 at 12:44 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Dec 13, 2016 at 1:00 PM, Ian Jackson <ian.jackson@eu.citrix.com> wrote:

Saying that a set of transactions are serializable is equivalent to
the statement that there is some serial order of execution which would
have produced results equivalent to the actual results. That is,
there must be at least one schedule (T1, ..., Tn) such that running
the transactions one after another in that order would have produced
the same results that were obtained by running them concurrently.
Any transactions that roll back whether due to serialization anomalies
or manual user intervention or any other reason are not part of the
schedule, which considers only successfully committed transactions.
The rolled-back transactions effectively didn't happen, and
serializability as a concept makes no claim about the behavior of such
transactions. That's not a PostgreSQL thing: that's database theory.

Right. That said, with S2PL's reliance on blocking, it can provide
certain guarantees that are not required by the SQL standard nor by
normal definitions of serializable transactions.
(1) The effective order of execution will match commit order of
successfully committed transactions. Both the SQL standard and
most definitions of serializable transactions merely require
there is *some* order of transactions which provides the same
effects as having run the transactions one at a time in that order.
(2) Even transactions terminated to resolve deadlocks never show
serialization anomalies before being canceled.

S2PL was the most common technique for implementing serializable
transactions for a long time, and some have come to think that its
guarantees should always be provided. The problem is that in most
common workloads S2PL performs far worse than some other
techniques, including the SSI technique used by PostgreSQL, even
when the need to discard results from failed transactions is
considered. Essentially, the position Ian has been taking is that
PostgreSQL should provide the guarantee of (2) above. As far as I
can see, that would require using S2PL -- something the community
ripped out of PostgreSQL because of its horrible performance and
has refused to consider restoring (for good reason, IMO).

However, in practice, the scenario that you mention should generally
work fine in PostgreSQL, I think. If T performed any writes, the
rollback throws them away, so imagine removing the actual T from the
mix and replacing it with a substitute version T' which performs the
same reads but no writes and then tries to COMMIT where T tried to
ROLLBACK. T' will succeed, because we never roll back a read-only
transaction at commit time.

Huh, I had to think about that for a minute, but you are right
about never rolling back a read-only transaction at commit time
(except possibly for a prepared transaction -- I would have to look
at that more closely). The reason is that to have a serialization
failure under SSI we must have a "dangerous structure" (of a pivot
(Tpivot) with both a rw-dependency in (Tin) and a rw-dependency out
(Tout)) and Tout must have committed (and committed first). If
Tpivot is still active, we will cancel it, because Tin, if it were
to be canceled and perform an immediate retry, could hit the exact
same problem, and be canceled again based on conflicts with the
exact same other transactions. We don't want to burn resources
repeating transactions with no progress made. So the only time the
read-only transaction can be canceled is when Tout and Tpivot are
running concurrently, Tout commits, Tpivot develops a rw-dependency
to Tout (either before or after the commit of Tout), Tin starts
(after the commit of Tout), Tpivot commits before Tin develops a
rw-dependency with it (otherwise Tpivot would be the one canceled),
and then Tin develops a rw-dependency to Tpivot. That dependency
should be recognized when it is developed, causing Tin to be
canceled -- so the commit of Tin should never get the serialization
failure.
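
As a rough illustration (a simplification for this thread, not the actual PostgreSQL code), the rule above can be expressed as a search for (Tin, Tpivot, Tout) triples over commit times and rw-antidependency edges:

```python
def dangerous_structures(commit_time, rw_deps):
    """Find (t_in, t_pivot, t_out) triples matching SSI's dangerous
    structure: t_in -rw-> t_pivot -rw-> t_out, where t_out has
    committed, and committed before the other two.

    commit_time: txn -> commit timestamp, or None if not committed
    rw_deps: set of (reader, writer) rw-antidependency edges
    """
    found = []
    for t_in, t_pivot in rw_deps:
        for reader, t_out in rw_deps:
            if reader != t_pivot or t_out == t_in:
                continue
            out_commit = commit_time.get(t_out)
            if out_commit is None:
                continue
            if all(commit_time.get(t) is None or out_commit < commit_time[t]
                   for t in (t_in, t_pivot)):
                found.append((t_in, t_pivot, t_out))
    return found

# Robert's scenario: T1 is Tpivot, T2 is Tout, T3 is Tin (read-only).
commit_time = {"T2": 1, "T1": 2, "T3": None}
rw_deps = {("T3", "T1"), ("T1", "T2")}    # T3 -rw-> T1 -rw-> T2
print(dangerous_structures(commit_time, rw_deps))
# [('T3', 'T1', 'T2')] -- so T3, still active, is the one canceled
```
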

If it were to fail, it would have to fail *while performing one
of the reads*, not later.

Yeah, that's the concise statement of what my lengthy explanation
above proves. ;-)

But imagine a hypothetical database system in which anomalies are
never detected until commit time. We somehow track the global
must-happen-before graph and refuse the commit of any transaction
which will create a cycle.

You have just described optimistic concurrency control (OCC), which
has been used, although not in any really popular DBMS I
can think of. It certainly has enjoyed a resurgence in some ORMs,
though -- implemented outside the DBMS.

Let's also suppose that this system uses
snapshots to implement MVCC. In such a system, read-only transactions
will sometimes fail at commit time if they've seen a view of the world
that is inconsistent with the only remaining possible serial
schedules. For example, suppose T1 updates A -> A' and reads B.
Concurrently, T2 updates B -> B'; thus, T1 must precede T2. Next, T2
commits.

In this, T1 is Tpivot and T2 is Tout from my above discussion.

Now, T3 begins and reads B', so that T2 must precede T3.
Next T1 commits. T3, which took its snapshot before the commit of T1,
now reads A. Thus T3 has to precede T1. That's a cycle, so T3 won't
be allowed to commit,

Yes, all the conditions described above are matched here.

but yet it's done a couple of reads and hasn't
failed yet... because of an implementation detail of the system.
That's probably annoying from a user perspective -- if a transaction
is certainly going to fail we'd like to report the failure as early as
possible -- and it's probably crushingly slow, but according to my
understanding it's unarguably a correct implementation of
serializability (assuming there are no bugs), yet it doesn't deliver
the guarantee you're asking for here.

100% accurate.

I have not read any database literature on the interaction of
serializability with subtransactions. This seems very thorny.
Suppose T1 reads A and B and updates A -> A' while concurrently T2
reads A and B and updates B -> B'. This is obviously not
serializable;

Right, this pattern is generally described in the literature as
simple write skew.

if either transaction had executed before the other in a
serial schedule, the second transaction in the schedule would have had
to have seen (A, B') or (A', B) rather than (A, B), but that's not
what happened. But what if each of T1 and T2 did the reads in a
subtransaction, rolled it back, and then did the write in the main
transaction and committed? The database system has two options.
First, it could assume that the toplevel transaction may have relied
on the results of the aborted subtransaction.

This is what we do in our implementation of SSI. Predicate locks
from reads within subtransactions are not discarded, even if the
work of the subtransaction is otherwise discarded.

But if it does that,
then any serialization failure which afflicts a subtransaction must
necessarily also kill the main transaction, which seems pedantic and
unhelpful. If you'd wanted the toplevel transaction to be killed,
you wouldn't have used a subtransaction, right?

We reasoned that the results of reads within the subtransaction
could have influenced the decision on whether to roll back the
subtransaction, so release of the predicate locks could easily lead
to non-repeatable results.

Second, it could
assume that the toplevel transaction in no way relied on or used the
values obtained from the aborted subtransaction. However, that
defeats the whole purpose of having subtransactions in the first
place. What's the point of being able to start subtransactions if you
can't roll them back and then decide to do something else instead? It
seems to me that what the database system should do is make that
second assumption, and what the user should do is understand that to
the degree that transactions depend on the results of aborted
subtransactions, there may be serialization anomalies that go
undetected.

Well, we opted for trying to prevent the anomalies even in the face
of subtransactions which rolled back. The requirements within SSI
to cause a serialization failure (dangerous structure, commit
timings, etc.) are enough to allow useful subtransactions, I think.

And the user should put up with that because the cure is
worse than the disease.

Not as we saw it during development of the serializable transaction
feature, and I have not yet heard any complaints about that choice.

Maybe there's a more formally satisfying
answer than that but I'm not really seeing it...

It would take some actual field complaints about current behavior,
with reasonable use cases, to persuade me that your answer is
better than what has been implemented and out in the field for five
years. The subtransactions are only causing a problem in *this*
case because the exception that is being ignored is coming from a
declarative constraint. We relied on previous research showing
that apparent opportunities for serialization anomalies were often
prevented by declarative constraints. With the difficulty in
acquiring the necessary predicate locks within the code
implementing the constraints, we felt we could not do that reliably
and still make the 9.1 release with the overall feature. We
thought that the only down side was that SQLSTATE codes from
the constraints could be thrown when the violation was caused by
the actions of a concurrent transaction -- meaning that a
serialization failure would be better in terms of knowing when to
retry a transaction rather than throwing an error in the face of an
end user. This was always something we wanted to come back and
improve in a later release; but nobody has been willing to fund any
of the many improvements we know could be made to the serializable
transaction implementation, so it has so far remained in this
state.

What we missed is that, while we took action to try to ensure that
a serialization failure could not be discarded, we didn't consider
that a constraint violation exception which was preventing an
anomaly *could* be discarded. Effectively, this has escalated
detection of serialization failures involving reads which are part
of enforcing declarative constraints from the level of a feature
request to cure a significant annoyance for those using them, to a
bug fix necessary to prevent serialization anomalies.

Fortunately, Thomas Munro took an interest in the problem as it
related to duplicates on primary keys, unique constraints, and
unique indexes, and put forward a patch that cured the defect in
the common cases, and provided an easy workaround for the one case
he was unable to fix in that initial patch. To finish the work for
these constraints and indexes, I think we need to add predicate
locking while descending to the insertion point during the check
for an existing duplicate.

I'm not sure about foreign key constraints and exclusion
constraints. I have neither seen a failure related to either of
these, nor proven that there cannot be one. Without having
assessed the scope of the problems (if any) in those constraints,
it's hard to say what needs to be done or how much work it is.

It also wouldn't hurt to double-check that the serializable
transaction's "doomed" flag is set and checked everywhere it should
be; but I have no reason to think it isn't.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#19Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#17)
Re: [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation [and 2 more messages]

On Wed, Dec 14, 2016 at 9:05 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

Robert Haas wrote:

I have not read any database literature on the interaction of
serializability with subtransactions. This seems very thorny.
Suppose T1 reads A and B and updates A -> A' while concurrently T2
reads A and B and updates B -> B'. This is obviously not
serializable; if either transaction had executed before the other in a
serial schedule, the second transaction in the schedule would have had
to have seen (A, B') or (A', B) rather than (A, B), but that's not
what happened. But what if each of T1 and T2 did the reads in a
subtransaction, rolled it back, and then did the write in the main
transaction and committed? The database system has two options.
First, it could assume that the toplevel transaction may have relied
on the results of the aborted subtransaction. But if it does that,
then any serialization failure which afflicts a subtransaction must
necessarily also kill the main transaction, which seems pedantic and
unhelpful. If you'd wanted the toplevel transaction to be killed,
you wouldn't have used a subtransaction, right? Second, it could
assume that the toplevel transaction in no way relied on or used the
values obtained from the aborted subtransaction. However, that
defeats the whole purpose of having subtransactions in the first
place. What's the point of being able to start subtransactions if you
can't roll them back and then decide to do something else instead? It
seems to me that what the database system should do is make that
second assumption, and what the user should do is understand that to
the degree that transactions depend on the results of aborted
subtransactions, there may be serialization anomalies that go
undetected.

Actually, Ian's sample transaction is even more malicious, because the
problem is not caused by reads within the aborted subtransaction -- the
cached-in-app reads occurred *before* the subtransaction started in the
first place. I think saving a count(*) across a possibly-failed
insertion on the same table is wrong. I can't blame the database for
the behavior.

I don't see why it's wrong to cache a COUNT(*) across a
possibly-failed subtransaction; instead, I would argue that the
problem is that by relying on the knowledge that the subtransaction
failed while inserting A as a justification for inserting B, it's
intrinsically doing something that is sensitive to concurrency. I
understand that Ian --- and Kevin, and Thomas Munro --- would like
that subtransaction to fail with serialization_failure rather than a
unique_violation, and I previously argued that the change was fine but
shouldn't be back-patched since it might break things for existing
users. But people are continuing to show up and say "hey, this is
broken", and now Kevin's chosen to back-patch, and I'm not going to
kick up a fuss about that. After all, there's a growing number of
people complaining about the same thing and saying it's a bug, so I
guess it's a bug!
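
For reference, the retry logic being discussed looks roughly like this sketch; `DBError` and the SQLSTATE plumbing are hypothetical stand-ins for a real driver's exception type (psycopg2, for instance, exposes the code as `.pgcode`):

```python
RETRYABLE_SQLSTATES = {
    "40001",  # serialization_failure
    "40P01",  # deadlock_detected
    # Before the back-patched fix discussed in this thread, callers
    # also had to treat constraint SQLSTATEs such as 23505
    # (unique_violation) as retryable -- the point of Ian's patch.
}

class DBError(Exception):
    """Stand-in for a real driver error carrying an SQLSTATE."""
    def __init__(self, sqlstate):
        super().__init__(sqlstate)
        self.sqlstate = sqlstate

def run_with_retry(txn, max_attempts=5):
    """Run txn(); retry from scratch on a retryable SQLSTATE."""
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()
        except DBError as e:
            if e.sqlstate not in RETRYABLE_SQLSTATES or attempt == max_attempts:
                raise

# A transaction that hits serialization_failure twice, then commits:
attempts = []
def txn():
    attempts.append(1)
    if len(attempts) < 3:
        raise DBError("40001")
    return "committed"

print(run_with_retry(txn))  # committed, after two retries
```
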

But even after that fix, at the least, you'll still be able to
demonstrate the same problem by trapping serialization_failure rather
than unique_violation. And even if you decree that trapping
serialization_failure is off the table, I bet there are other ways for
a supposedly-serializable transaction to sort of cheat, find out
enough information about what's going on in the system to adjust its
behavior, and then do something different based on that. And if
that's possible and if a transaction finds a way to do it, then it's
going to allow results not equivalent to any serial schedule. For
example, imagine a transaction that queries pg_stat_activity or
pg_locks and then makes decisions based on the contents thereof. That
transaction is determined to behave differently under concurrency than
it does on an idle system, and even the ineluctable triumvirate of
Kevin Grittner, Dan Ports, and Michael Cahill will not be able to
prevent it from doing so. That's not a bug.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#20Robert Haas
robertmhaas@gmail.com
In reply to: Kevin Grittner (#18)
Re: [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation [and 2 more messages]

Thanks for the reply.

On Wed, Dec 14, 2016 at 9:26 AM, Kevin Grittner <kgrittn@gmail.com> wrote:

considered. Essentially, the position Ian has been taking is that
PostgreSQL should provide the guarantee of (2) above. As far as I
can see, that would require using S2PL -- something the community
ripped out of PostgreSQL because of its horrible performance and
has refused to consider restoring (for good reason, IMO).

I'm not sure Ian is intentionally taking that position. Not all of us
are as familiar with the ramifications of every serializability
behavior we may want as you are.

Huh, I had to think about that for a minute, but you are right
about never rolling back a read-only transaction at commit time ...

Yeah, I had to think about it for about an hour ... and look at the
code and README-SSI. So if you only had to think about it for a
minute, you win. :-)

if either transaction had executed before the other in a
serial schedule, the second transaction in the schedule would have had
to have seen (A, B') or (A', B) rather than (A, B), but that's not
what happened. But what if each of T1 and T2 did the reads in a
subtransaction, rolled it back, and then did the write in the main
transaction and committed? The database system has two options.
First, it could assume that the toplevel transaction may have relied
on the results of the aborted subtransaction.

This is what we do in our implementation of SSI. Predicate locks
from reads within subtransactions are not discarded, even if the
work of the subtransaction is otherwise discarded.

Oh, interesting. Just to be clear, I'm not lobbying to change that; I
was guessing (very late at night) what decision you probably made and
it seems I was incorrect. But doesn't that imply that if a read fails
in a subtransaction with serialization_failure, the parent MUST also be
killed with serialization_failure? If not, then I don't see how you can
escape having results that, overall, are not serializable.

What we missed is that, while we took action to try to ensure that
a serialization failure could not be discarded, we didn't consider
that a constraint violation exception which was preventing an
anomaly *could* be discarded. Effectively, this has escalated
detection of serialization failures involving reads which are part
of enforcing declarative constraints from the level of a feature
request to cure a significant annoyance for those using them, to a
bug fix necessary to prevent serialization anomalies.

Hmm. I see.

Fortunately, Thomas Munro took an interest in the problem as it
related to duplicates on primary keys, unique constraints, and
unique indexes, and put forward a patch that cured the defect in
the common cases, and provided an easy workaround for the one case
he was unable to fix in that initial patch. To finish the work for
these constraints and indexes, I think we need to add predicate
locking while descending to the insertion point during the check
for an existing duplicate.

I suggest adding something about this to README-SSI as a known issue.

I'm not sure about foreign key constraints and exclusion
constraints. I have neither seen a failure related to either of
these, nor proven that there cannot be one. Without having
assessed the scope of the problems (if any) in those constraints,
it's hard to say what needs to be done or how much work it is.

I think it's a natural result of implementing techniques from academic
research papers that there will sometimes be open research questions.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#21Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Robert Haas (#19)
#22Robert Haas
robertmhaas@gmail.com
In reply to: Kevin Grittner (#21)
#23Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Robert Haas (#20)
#24Ian Jackson
ian.jackson@eu.citrix.com
In reply to: Ian Jackson (#1)
#25Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Ian Jackson (#24)
#26Ian Jackson
ian.jackson@eu.citrix.com
In reply to: Kevin Grittner (#25)
#27Robert Haas
robertmhaas@gmail.com
In reply to: Ian Jackson (#24)
#28Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Ian Jackson (#26)
#29Ian Jackson
ian.jackson@eu.citrix.com
In reply to: Kevin Grittner (#28)
#30Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Ian Jackson (#29)
#31Ian Jackson
ian.jackson@eu.citrix.com
In reply to: Kevin Grittner (#30)
#32Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Ian Jackson (#31)
#33Robert Haas
robertmhaas@gmail.com
In reply to: Kevin Grittner (#30)
#34Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Robert Haas (#33)
#35Robert Haas
robertmhaas@gmail.com
In reply to: Kevin Grittner (#34)
#36Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Robert Haas (#35)