Documenting when to retry on serialization failure

Started by Simon Riggs about 4 years ago, 17 messages
#1 Simon Riggs
simon.riggs@enterprisedb.com

"Applications using this level must be prepared to retry transactions
due to serialization failures."
...
"When an application receives this error message, it should abort the
current transaction and retry the whole transaction from the
beginning."

I note that the specific error codes this applies to are not
documented, so let's discuss what the docs for that would look like.

I had a conversation with Kevin Grittner about retry some years back
and it seemed clear that the application should re-execute application
logic from the beginning, rather than just slavishly re-execute the
same SQL. But that is not documented either.

Is *automatic* retry possible? In all cases? None? Or maybe Some?

ISTM that we can't retry anything where a transaction has replied to a
user and then the user issued a subsequent SQL statement, since we
have no idea whether the subsequent SQL was influenced by the initial
reply.

But what about the case of a single statement transaction? Can we just
re-execute then? I guess if it didn't run anything other than
IMMUTABLE functions then it should be OK, assuming the inputs
themselves were immutable, which we've no way for the user to declare.
Could we allow a user-defined auto_retry parameter?

We don't mention that a transaction might just repeatedly fail either.

Anyway, I know y'all would have some opinions on this. Happy to document
whatever we agree on.

--
Simon Riggs http://www.EnterpriseDB.com/

#2 Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#1)
Re: Documenting when to retry on serialization failure

On Thu, Dec 9, 2021 at 7:43 AM Simon Riggs <simon.riggs@enterprisedb.com> wrote:

I had a conversation with Kevin Grittner about retry some years back
and it seemed clear that the application should re-execute application
logic from the beginning, rather than just slavishly re-execute the
same SQL. But that is not documented either.

Yeah, that would be good to mention somehow.

Is *automatic* retry possible? In all cases? None? Or maybe Some?

ISTM that we can't retry anything where a transaction has replied to a
user and then the user issued a subsequent SQL statement, since we
have no idea whether the subsequent SQL was influenced by the initial
reply.

I agree.

But what about the case of a single statement transaction? Can we just
re-execute then? I guess if it didn't run anything other than
IMMUTABLE functions then it should be OK, assuming the inputs
themselves were immutable, which we've no way for the user to declare.
Could we allow a user-defined auto_retry parameter?

I suppose in theory a user-defined parameter is possible, but I think
it's fundamentally better for this to be managed on the application
side. Even if the transaction is a single query, we don't know how
expensive that query is, and it's at least marginally possible that
the user might care about that. For example, if the user has set a
10-minute timeout someplace, and the query fails after 8 minutes, they
may want to retry. But if we retry automatically then they might hit
their timeout, or just be confused about why things are taking so
long. And they can always decide not to retry after all, but give up,
save it for a less busy period, or whatever.

We don't mention that a transaction might just repeatedly fail either.

True. I think that's another good argument against an auto-retry system.

The main thing that worries me about an auto-retry system is something
else: I think it would rarely be applicable, and people would try to
apply it to situations where it won't actually work properly. I
believe most users who need to retry transactions that fail due to
serialization problems will need some real application logic to make
sure that they do the right thing. People with single-statement
transactions that can be blindly retried probably aren't using higher
isolation levels anyway, and probably won't have many failures even if
they are. SSI is really for sophisticated applications, and I think
trying to make it "just work" for people with dumb applications will,
well, just not work.

--
Robert Haas
EDB: http://www.enterprisedb.com

#3 Greg Stark
stark@mit.edu
In reply to: Robert Haas (#2)
Re: Documenting when to retry on serialization failure

Fwiw I think the real problem with automatic retries is that the SQL
interface doesn't lend itself to it because the server never really
knows if the command is going to be followed by a commit or more
commands.

I actually think if that problem were tackled it would very likely be
a highly appreciated option. Because I think there's a big overlap
between the set of users interested in higher isolation levels and the
set of users writing stored procedures defining their business logic.
They're both kind of "traditional" SQL engine approaches and both lend
themselves to the environment where you have a lot of programmers
working on a system and you're not able to do things like define
strict locking and update orderings.

So a lot of users are probably looking at something like "BEGIN;
SELECT create_customer_order(....); COMMIT" and wondering why the
server can't handle automatically retrying the query if they get an
isolation failure.

There are actually other reasons why providing the whole logic for the
transaction up front with a promise that it'll be the whole
transaction is attractive. E.g. vacuum could ignore a transaction if
it knows the transaction will never look at the table it's
processing... Or automatic deadlock testing tools could extract the
list of tables being accessed and suggest "lock table" commands to put
at the head of the transaction sorted in a canonical order.

These things may not be easy but they're currently impossible for the
same reasons automatically retrying is. The executor doesn't know what
subsequent commands will be coming after the current one and doesn't
know whether it has the whole transaction.
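
Greg's canonical lock-ordering idea, done by hand on the client side rather
than by the tooling he imagines, might look something like the sketch below.
This is not from the thread: psycopg2, the lock mode, and the table names are
all assumptions made for illustration.

    from psycopg2 import sql

    def lock_in_canonical_order(cur, tables):
        # Locking every table the transaction will touch, in sorted
        # (canonical) order, removes the ordering freedom that lets two
        # transactions deadlock against each other on the same tables.
        for t in sorted(set(tables)):
            cur.execute(sql.SQL("LOCK TABLE {} IN SHARE ROW EXCLUSIVE MODE")
                        .format(sql.Identifier(t)))

    # e.g. as the first step of the transaction (hypothetical table names):
    # lock_in_canonical_order(cur, ["orders", "customers", "order_lines"])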

#4 Thomas Munro
thomas.munro@gmail.com
In reply to: Simon Riggs (#1)
Re: Documenting when to retry on serialization failure

On Fri, Dec 10, 2021 at 1:43 AM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:

"Applications using this level must be prepared to retry transactions
due to serialization failures."
...
"When an application receives this error message, it should abort the
current transaction and retry the whole transaction from the
beginning."

I note that the specific error codes this applies to are not
documented, so let's discuss what the docs for that would look like.

+1 for naming the error.

I had a conversation with Kevin Grittner about retry some years back
and it seemed clear that the application should re-execute application
logic from the beginning, rather than just slavishly re-execute the
same SQL. But that is not documented either.

Right, the result of the first statement could cause the application
to do something completely different the second time through. I
personally think the best way for applications to deal with this
problem (and at least also deadlock, serialisation failure's
pessimistic cousin) is to represent transactions as blocks of code
that can be automatically retried, however that looks in your client
language. It might be that you pass a
function/closure/whatever-you-call-it to the transaction management
code so it can rerun it if necessary, or that a function is decorated
in some way that some magic infrastructure understands, but that's a
little tricky to write about in a general enough way for our manual.
(A survey of how this looks with various different libraries and tools
might make a neat conference talk though.) But isn't that exactly
what that existing sentence "... from the beginning" is trying to say,
especially with the following sentence ("The second time through...")?
Hmm, yeah, perhaps that next sentence could be clearer.
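
In client terms, that closure-passing shape might look roughly like the
following. This is a minimal sketch only, not part of the thread: it assumes
psycopg2, and the table and function names are made up.

    import psycopg2

    RETRYABLE = {"40001", "40P01"}  # serialization_failure, deadlock_detected

    def run_in_transaction(conn, txn_fn, max_attempts=10):
        """Re-run the whole block of application logic in txn_fn(cur)
        until the transaction commits or a non-retryable error occurs."""
        for attempt in range(max_attempts):
            try:
                with conn.cursor() as cur:
                    # first statement of the transaction psycopg2 opens implicitly
                    cur.execute("SET TRANSACTION ISOLATION LEVEL SERIALIZABLE")
                    result = txn_fn(cur)
                conn.commit()
                return result
            except psycopg2.Error as e:
                conn.rollback()
                if e.pgcode not in RETRYABLE:
                    raise                    # not a retryable failure
        raise RuntimeError("transaction still failing after %d attempts" % max_attempts)

    # Hypothetical usage: everything that decides which SQL to issue lives
    # inside the closure, so a retry re-executes that logic from scratch.
    def bump_invoice_number(cur):
        cur.execute("SELECT COALESCE(MAX(invoice_number) + 1, 1)"
                    " FROM invoice WHERE year = 2016")
        n = cur.fetchone()[0]
        cur.execute("UPDATE invoice SET invoice_number = %s WHERE year = 2016", (n,))
        return n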

Is *automatic* retry possible? In all cases? None? Or maybe Some?

I'm aware of a couple of concrete cases that confound attempts to
retry automatically: sometimes we report a unique constraint
violation or an exclusion constraint failure, when we have the
information required to diagnose a serialisation anomaly. In those
cases, we really should figure out how to spit out 40001 (otherwise
what is general purpose auto retry code supposed to do with UCV?). We
fixed a single-index variant of this problem in commit fcff8a57. I
have an idea for how this might be fixed for the multi-index UCV[1]
and exclusion constraint[2] variants of the problem, but haven't
actually tried yet.

If there are other things that stand in the way of reliable automated
retry (= a list of error codes a client library could look for) then
I'd love to have a list of them.

But what about the case of a single statement transaction? Can we just
re-execute then? I guess if it didn't run anything other than
IMMUTABLE functions then it should be OK, assuming the inputs
themselves were immutable, which we've no way for the user to declare.
Could we allow a user-defined auto_retry parameter?

I've wondered about that too, but so far it didn't seem worth the
effort, since application developers need another solution for
multi-statement retry anyway.

We don't mention that a transaction might just repeatedly fail either.

According to the VLDB paper, the "safe retry" property (§ 5.4) means
that a retry won't abort for the same reason (due to a cycle with the
same set of other transactions as your last attempt), unless prepared
transactions are involved (§ 7.1). This means that the whole system
continues to make some kind of progress in the absence of 2PC, though
of course your transaction might or might not fail because of a cycle
with some other set of transactions. Maybe that is too technical for
our manual, which already provides the link to that paper, but it's
interesting to note that you can suffer from a stuck busy-work loop
until conflicting prepared xacts go away, with a naive
automatic-retry-forever system.

[1]: /messages/by-id/CAGPCyEZG76zjv7S31v_xPeLNRuzj-m=Y2GOY7PEzu7vhB=yQog@mail.gmail.com
[2]: /messages/by-id/CAMTXbE-sq9JoihvG-ccC70jpjMr+DWmnYUj+VdnFRFSRuaaLZQ@mail.gmail.com
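
That busy-work-loop hazard is one reason a client-side retry loop probably
wants a cap on attempts, and when the cap keeps being hit it can be worth
looking for a stuck prepared transaction. A rough sketch, again assuming
psycopg2; the columns come from the pg_prepared_xacts system view:

    def report_prepared_xacts(conn):
        # Old entries here may be what a capped retry loop keeps colliding with.
        with conn.cursor() as cur:
            cur.execute("SELECT gid, prepared, owner, database"
                        " FROM pg_prepared_xacts ORDER BY prepared")
            for gid, prepared, owner, database in cur.fetchall():
                print("prepared xact %s by %s in %s since %s"
                      % (gid, owner, database, prepared))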

#5 Simon Riggs
simon.riggs@enterprisedb.com
In reply to: Thomas Munro (#4)
1 attachment(s)
Re: Documenting when to retry on serialization failure

On Wed, 29 Dec 2021 at 03:30, Thomas Munro <thomas.munro@gmail.com> wrote:

On Fri, Dec 10, 2021 at 1:43 AM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:

"Applications using this level must be prepared to retry transactions
due to serialization failures."
...
"When an application receives this error message, it should abort the
current transaction and retry the whole transaction from the
beginning."

I note that the specific error codes this applies to are not
documented, so let's discuss what the docs for that would look like.

+1 for naming the error.

I've tried to sum up the various points from everybody into this doc
patch. Thanks all for replies.

--
Simon Riggs http://www.EnterpriseDB.com/

Attachments:

retryable_error_docs.v1.patch (application/octet-stream)
diff --git a/doc/src/sgml/mvcc.sgml b/doc/src/sgml/mvcc.sgml
index a1ae842341..0ab39cc73d 100644
--- a/doc/src/sgml/mvcc.sgml
+++ b/doc/src/sgml/mvcc.sgml
@@ -619,7 +619,7 @@ ERROR:  could not serialize access due to concurrent update
     applications using this level must
     be prepared to retry transactions due to serialization failures.
     In fact, this isolation level works exactly the same as Repeatable
-    Read except that it monitors for conditions which could make
+    Read except that it also monitors for conditions which could make
     execution of a concurrent set of serializable transactions behave
     in a manner inconsistent with all possible serial (one at a time)
     executions of those transactions.  This monitoring does not
@@ -1751,6 +1751,54 @@ SELECT pg_advisory_lock(q.id) FROM
    </sect2>
   </sect1>
 
+  <sect1 id="xact-retryable-error-handling">
+   <title>Retryable Error Handling</title>
+
+   <indexterm>
+    <primary>retryable error</primary>
+   </indexterm>
+
+   <para>
+    Both Repeatable Read and Serializable isolation levels can produce errors
+    that are designed to avoid serialization anomalies. As previously advised,
+    applications using this level must be prepared to retry transactions that
+    fail due to serialization failures. The exact error message will vary
+    according to the precise circumstances of the error, should always return
+    with an SQLSTATE value of '40001', with a condition name of
+    'serialization_failure'.
+   </para>
+
+   <para>
+    Currently, if a table has multiple unique
+    indexes defined, <productname>PostgreSQL</productname> can return an
+    SQLSTATE value of '23505', with a condition name of 'unique_violation',
+    though this may be change in later releases.
+   </para>
+
+   <para>
+    Should a deadlock condition occur an SQLSTATE value of '40P01' will be set,
+    with a condition name of 'deadlock_detected'.
+   </para>
+
+   <para>
+    For both SQLSTATE '40001' and '40P01', the transaction has definitely
+    rolled back and can be retried. It is important to retry the complete
+    transaction, including all logic that decided whch SQL and/or whch values
+    to use for transactions. As a result, <productname>PostgreSQL</productname>
+    does not offer an automatic retry facility, since it cannot do so with
+    any guarantee of safety, in the general case.
+   </para>
+
+   <para>
+    Transaction retry does not guarantee that the retried transaction will
+    complete. In cases with very high contention, it is possible that eventual
+    completion of a transaction may take many attempts. In cases involving
+    a prepared transaction, it may not be possible to make progress until the
+    conflicting prepared transaction commits or rolls back.
+   </para>
+
+  </sect1>
+
   <sect1 id="mvcc-caveats">
    <title>Caveats</title>
 
#6 Simon Riggs
simon.riggs@enterprisedb.com
In reply to: Greg Stark (#3)
Re: Documenting when to retry on serialization failure

On Thu, 16 Dec 2021 at 06:05, Greg Stark <stark@mit.edu> wrote:

So a lot of users are probably looking at something like "BEGIN;
SELECT create_customer_order(....); COMMIT" and wondering why the
server can't handle automatically retrying the query if they get an
isolation failure.

I agree with you that it would be desirable to retry for the simple
case of an autocommit/single statement transaction run with
default_transaction_isolation = 'serializable'.

The most important question before we take further action is whether
it would be correct to do so in all cases.

Some problem cases would help us decide either way.

--
Simon Riggs http://www.EnterpriseDB.com/

#7 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#5)
1 attachment(s)
Re: Documenting when to retry on serialization failure

Simon Riggs <simon.riggs@enterprisedb.com> writes:

I've tried to sum up the various points from everybody into this doc
patch. Thanks all for replies.

This seemed rather badly in need of copy-editing. How do you
like the attached text?

regards, tom lane

Attachments:

retryable_error_docs.v2.patch (text/x-diff; charset=us-ascii)
diff --git a/doc/src/sgml/mvcc.sgml b/doc/src/sgml/mvcc.sgml
index da07f3f6c6..cd659dd994 100644
--- a/doc/src/sgml/mvcc.sgml
+++ b/doc/src/sgml/mvcc.sgml
@@ -588,7 +588,7 @@ ERROR:  could not serialize access due to concurrent update
     applications using this level must
     be prepared to retry transactions due to serialization failures.
     In fact, this isolation level works exactly the same as Repeatable
-    Read except that it monitors for conditions which could make
+    Read except that it also monitors for conditions which could make
     execution of a concurrent set of serializable transactions behave
     in a manner inconsistent with all possible serial (one at a time)
     executions of those transactions.  This monitoring does not
@@ -1720,6 +1720,60 @@ SELECT pg_advisory_lock(q.id) FROM
    </sect2>
   </sect1>
 
+  <sect1 id="mvcc-serialization-failure-handling">
+   <title>Serialization Failure Handling</title>
+
+   <indexterm>
+    <primary>serialization failure</primary>
+   </indexterm>
+   <indexterm>
+    <primary>retryable error</primary>
+   </indexterm>
+
+   <para>
+    Both Repeatable Read and Serializable isolation levels can produce
+    errors that are designed to prevent serialization anomalies.  As
+    previously stated, applications using these levels must be prepared to
+    retry transactions that fail due to serialization errors.  Such an
+    error's message text will vary according to the precise circumstances,
+    but it will always have the SQLSTATE code <literal>40001</literal>
+    (<literal>serialization_failure</literal>).
+   </para>
+
+   <para>
+    It may also be advisable to retry deadlock failures.
+    These have the SQLSTATE code <literal>40P01</literal>
+    (<literal>deadlock_detected</literal>).
+   </para>
+
+   <para>
+    In some circumstances, a failure that is arguably a serialization
+    problem may manifest as a unique-key failure, with SQLSTATE
+    code <literal>23505</literal> (<literal>unique_violation</literal>),
+    or as an exclusion constraint failure, with SQLSTATE
+    code <literal>23P01</literal> (<literal>exclusion_violation</literal>).
+    Therefore, retrying these cases may also be advisable, although one must
+    be careful that such an error could be persistent.
+   </para>
+
+   <para>
+    It is important to retry the complete transaction, including all logic
+    that decides which SQL to issue and/or which values to use.
+    Therefore, <productname>PostgreSQL</productname> does not offer an
+    automatic retry facility, since it cannot do so with any guarantee of
+    correctness.
+   </para>
+
+   <para>
+    Transaction retry does not guarantee that the retried transaction will
+    complete; multiple retries may be needed.  In cases with very high
+    contention, it is possible that completion of a transaction may take
+    many attempts.  In cases involving a conflicting prepared transaction,
+    it may not be possible to make progress until the prepared transaction
+    commits or rolls back.
+   </para>
+  </sect1>
+
   <sect1 id="mvcc-caveats">
    <title>Caveats</title>
 
#8 Simon Riggs
simon.riggs@enterprisedb.com
In reply to: Tom Lane (#7)
Re: Documenting when to retry on serialization failure

On Wed, 23 Mar 2022 at 19:50, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Simon Riggs <simon.riggs@enterprisedb.com> writes:

I've tried to sum up the various points from everybody into this doc
patch. Thanks all for replies.

This seemed rather badly in need of copy-editing. How do you
like the attached text?

Seems clear and does the job.

The unique violation thing is worryingly general. Do we know enough to
say that this is thought to occur only with a) multiple unique
constraints, b) exclusion constraints?

--
Simon Riggs http://www.EnterpriseDB.com/

#9 Thomas Munro
thomas.munro@gmail.com
In reply to: Simon Riggs (#8)
Re: Documenting when to retry on serialization failure

On Thu, Mar 24, 2022 at 11:44 PM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:

The unique violation thing is worryingly general. Do we know enough to
say that this is thought to occur only with a) multiple unique
constraints, b) exclusion constraints?

I'm aware of 3 cases. The two you mentioned, which I think we can fix
(as described in the threads I posted upthread), and then there is a
third case that I'm still confused about, in the last line of
read-write-unique-4.spec.

#10 Simon Riggs
simon.riggs@enterprisedb.com
In reply to: Thomas Munro (#9)
2 attachment(s)
Re: Documenting when to retry on serialization failure

On Thu, 24 Mar 2022 at 11:01, Thomas Munro <thomas.munro@gmail.com> wrote:

On Thu, Mar 24, 2022 at 11:44 PM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:

The unique violation thing is worryingly general. Do we know enough to
say that this is thought to occur only with a) multiple unique
constraints, b) exclusion constraints?

I'm aware of 3 cases. The two you mentioned, which I think we can fix
(as described in the threads I posted upthread), and then there is a
third case that I'm still confused about, in the last line of
read-write-unique-4.spec.

I don't see any confusion - it is clearly a serialization error. What
is more, I see this as a confusing bug that we should fix.

If we were updating the row rather than inserting it, we would get
"ERROR: could not serialize access due to concurrent update", as
documented. The type of command shouldn't affect whether it is a
serialization error or not. (Attached patch proves it does throw
serializable error for UPDATE).

Solving this requires us to alter the Index API to pass down a
snapshot to allow us to test whether the concurrent insert is visible
or not. The test is shown in the attached patch, but this doesn't
attempt the major task of tweaking the APIs to allow this check to be
made.

--
Simon Riggs http://www.EnterpriseDB.com/

Attachments:

partial_patch_for_read-write-unique-4_serialization_error.v1.patch (application/octet-stream)
diff --git a/src/backend/access/nbtree/nbtinsert.c b/src/backend/access/nbtree/nbtinsert.c
index 68628ec000..412c9dc318 100644
--- a/src/backend/access/nbtree/nbtinsert.c
+++ b/src/backend/access/nbtree/nbtinsert.c
@@ -224,6 +224,25 @@ search:
 			else
 				XactLockTableWait(xwait, rel, &itup->t_tid, XLTW_InsertIndex);
 
+			/*
+			 * Similar check to ExecCheckTupleVisible()
+			 */
+			if (IsolationUsesXactSnapshot() &&
+				TransactionDidCommit(xwait) &&
+				!XidInMVCCSnapshot(xwait, check_snapshot))
+			{
+				/*
+				 * We should not raise a serialization failure if the conflict is
+				 * against a tuple inserted by our own transaction, even if it's not
+				 * visible to our snapshot.  (This would happen, for example, if
+				 * conflicting keys are proposed for insertion in a single command.)
+				 */
+				if (!TransactionIdIsCurrentTransactionId(xmin))
+					ereport(ERROR,
+							(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+							 errmsg("could not serialize access due to concurrent update")));
+			}
+
 			/* start over... */
 			if (stack)
 				_bt_freestack(stack);
isotest_read-write-unique-5.v1.patch (application/octet-stream)
commit ebbb52dd62ed7796cb3f0a2a33fbe6d33de3150a
Author: Simon Riggs <simon.riggs@enterprisedb.com>
Date:   Thu Mar 24 12:08:25 2022 +0000

    Test concurrent UPDATE version of read-write-unique-4.spec

diff --git a/src/test/isolation/expected/read-write-unique-5.out b/src/test/isolation/expected/read-write-unique-5.out
new file mode 100644
index 0000000000..179a17bd19
--- /dev/null
+++ b/src/test/isolation/expected/read-write-unique-5.out
@@ -0,0 +1,49 @@
+Parsed test spec with 2 sessions
+
+starting permutation: r1 r2 w1 w2 c1 c2
+step r1: SELECT COALESCE(MAX(invoice_number) + 1, 1) FROM invoice WHERE year = 2016;
+coalesce
+--------
+       3
+(1 row)
+
+step r2: SELECT COALESCE(MAX(invoice_number) + 1, 1) FROM invoice WHERE year = 2016;
+coalesce
+--------
+       3
+(1 row)
+
+step w1: UPDATE invoice SET invoice_number = 3 WHERE year = 2016;
+step w2: UPDATE invoice SET invoice_number = 3 WHERE year = 2016; <waiting ...>
+step c1: COMMIT;
+step w2: <... completed>
+ERROR:  could not serialize access due to concurrent update
+step c2: COMMIT;
+
+starting permutation: r1 w1 w2 c1 c2
+step r1: SELECT COALESCE(MAX(invoice_number) + 1, 1) FROM invoice WHERE year = 2016;
+coalesce
+--------
+       3
+(1 row)
+
+step w1: UPDATE invoice SET invoice_number = 3 WHERE year = 2016;
+step w2: UPDATE invoice SET invoice_number = 3 WHERE year = 2016; <waiting ...>
+step c1: COMMIT;
+step w2: <... completed>
+ERROR:  could not serialize access due to concurrent update
+step c2: COMMIT;
+
+starting permutation: r2 w1 w2 c1 c2
+step r2: SELECT COALESCE(MAX(invoice_number) + 1, 1) FROM invoice WHERE year = 2016;
+coalesce
+--------
+       3
+(1 row)
+
+step w1: UPDATE invoice SET invoice_number = 3 WHERE year = 2016;
+step w2: UPDATE invoice SET invoice_number = 3 WHERE year = 2016; <waiting ...>
+step c1: COMMIT;
+step w2: <... completed>
+ERROR:  could not serialize access due to concurrent update
+step c2: COMMIT;
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 8e87098150..04d9e87021 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -5,6 +5,7 @@ test: read-write-unique
 test: read-write-unique-2
 test: read-write-unique-3
 test: read-write-unique-4
+test: read-write-unique-5
 test: simple-write-skew
 test: receipt-report
 test: temporal-range-integrity
diff --git a/src/test/isolation/specs/read-write-unique-5.spec b/src/test/isolation/specs/read-write-unique-5.spec
new file mode 100644
index 0000000000..e34e93ad98
--- /dev/null
+++ b/src/test/isolation/specs/read-write-unique-5.spec
@@ -0,0 +1,48 @@
+# Read-write-unique test.
+# Implementing a gapless sequence of ID numbers for each year.
+
+setup
+{
+  CREATE TABLE invoice (
+    year int,
+    invoice_number int,
+    PRIMARY KEY (year)
+  );
+
+  INSERT INTO invoice VALUES (2016, 2);
+}
+
+teardown
+{
+  DROP TABLE invoice;
+}
+
+session s1
+setup { BEGIN ISOLATION LEVEL SERIALIZABLE; }
+step r1 { SELECT COALESCE(MAX(invoice_number) + 1, 1) FROM invoice WHERE year = 2016; }
+step w1 { UPDATE invoice SET invoice_number = 3 WHERE year = 2016; }
+step c1 { COMMIT; }
+
+session s2
+setup { BEGIN ISOLATION LEVEL SERIALIZABLE; }
+step r2 { SELECT COALESCE(MAX(invoice_number) + 1, 1) FROM invoice WHERE year = 2016; }
+step w2 { UPDATE invoice SET invoice_number = 3 WHERE year = 2016; }
+step c2 { COMMIT; }
+
+# if they both read first then there should be an SSI conflict
+permutation r1 r2 w1 w2 c1 c2
+
+# cases where one session doesn't explicitly read before writing:
+
+# if s2 doesn't explicitly read, then trying to insert the value
+# generates a unique constraint violation after s1 commits, as if s2
+# ran after s1
+permutation r1 w1 w2 c1 c2
+
+# if s1 doesn't explicitly read, but s2 does, then s1 inserts and
+# commits first, should s2 experience an SSI failure instead of a
+# unique constraint violation?  there is no serial order of operations
+# (s1, s2) or (s2, s1) where s1 succeeds, and s2 doesn't see the row
+# in an explicit select but then fails to insert due to unique
+# constraint violation
+permutation r2 w1 w2 c1 c2
#11 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Munro (#9)
Re: Documenting when to retry on serialization failure

Thomas Munro <thomas.munro@gmail.com> writes:

On Thu, Mar 24, 2022 at 11:44 PM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:

The unique violation thing is worryingly general. Do we know enough to
say that this is thought to occur only with a) multiple unique
constraints, b) exclusion constraints?

I'm aware of 3 cases. The two you mentioned, which I think we can fix
(as described in the threads I posted upthread), and then there is a
third case that I'm still confused about, in the last line of
read-write-unique-4.spec.

That test is modeling the case where the application does an INSERT
with values based on some data it read earlier. There is no way for
the server to know that there's any connection, so I think if you
try to throw a serialization error rather than a uniqueness error,
you're basically lying to the client by claiming something you do not
know to be true. And the lie is not without consequences: if the
application believes it, it might iterate forever vainly trying to
commit a transaction that will never succeed.

regards, tom lane
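
In retry-loop terms, the distinction Tom is drawing might translate into a
client policy along these lines; this is a sketch of one possible policy, not
anything the server enforces:

    def should_retry(sqlstate, attempt):
        if sqlstate in ("40001", "40P01"):   # serialization failure / deadlock: transient
            return True
        if sqlstate in ("23505", "23P01"):   # may be a persistent conflict: cap tightly
            return attempt < 2
        return False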

#12 Simon Riggs
simon.riggs@enterprisedb.com
In reply to: Tom Lane (#11)
Re: Documenting when to retry on serialization failure

On Thu, 24 Mar 2022 at 14:05, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Thomas Munro <thomas.munro@gmail.com> writes:

On Thu, Mar 24, 2022 at 11:44 PM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:

The unique violation thing is worryingly general. Do we know enough to
say that this is thought to occur only with a) multiple unique
constraints, b) exclusion constraints?

I'm aware of 3 cases. The two you mentioned, which I think we can fix
(as described in the threads I posted upthread), and then there is a
third case that I'm still confused about, in the last line of
read-write-unique-4.spec.

That test is modeling the case where the application does an INSERT
with values based on some data it read earlier. There is no way for
the server to know that there's any connection, so I think if you
try to throw a serialization error rather than a uniqueness error,
you're basically lying to the client by claiming something you do not
know to be true. And the lie is not without consequences: if the
application believes it, it might iterate forever vainly trying to
commit a transaction that will never succeed.

OK, I see what you mean. There are 2 types of transaction, one that
reads inside the transaction, one that decides what value to use some
other way.

So now we have 2 cases, both of which generate uniqueness violations,
but only one of which might succeed if retried. The patch does cover
this, I guess, by saying be careful, but I would be happier if we can
also add

"this is thought to occur only with multiple unique constraints and/or
an exclusion constraint"

--
Simon Riggs http://www.EnterpriseDB.com/

#13 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#12)
Re: Documenting when to retry on serialization failure

Simon Riggs <simon.riggs@enterprisedb.com> writes:

OK, I see what you mean. There are 2 types of transaction, one that
reads inside the transaction, one that decides what value to use some
other way.

So now we have 2 cases, both of which generate uniqueness violations,
but only one of which might succeed if retried. The patch does cover
this, I guess, by saying be careful, but I would be happier if we can
also add

"this is thought to occur only with multiple unique constraints and/or
an exclusion constraint"

Um, what's that got to do with it? The example in
read-write-unique-4.spec involves only a single pkey constraint.

We could add something trying to explain that if the application inserts a
value into a constrained column based on data it read earlier, then any
resulting constraint violation might be effectively a serialization
failure.

regards, tom lane

#14 Simon Riggs
simon.riggs@enterprisedb.com
In reply to: Tom Lane (#13)
Re: Documenting when to retry on serialization failure

On Thu, 24 Mar 2022 at 14:56, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Simon Riggs <simon.riggs@enterprisedb.com> writes:

OK, I see what you mean. There are 2 types of transaction, one that
reads inside the transaction, one that decides what value to use some
other way.

So now we have 2 cases, both of which generate uniqueness violations,
but only one of which might succeed if retried. The patch does cover
this, I guess, by saying be careful, but I would be happier if we can
also add

"this is thought to occur only with multiple unique constraints and/or
an exclusion constraint"

Um, what's that got to do with it? The example in
read-write-unique-4.spec involves only a single pkey constraint.

Yes, but as you explained, it's not actually a serializable case, it
just looks a bit like one.

That means we are not currently aware of any case where the situation
is serializable but the error message is uniqueness violation, unless
we have 2 or more unique constraints and/or an exclusion constraint.

We could add something trying to explain that if the application inserts a
value into a constrained column based on data it read earlier, then any
resulting constraint violation might be effectively a serialization
failure.

We could do that as well.

--
Simon Riggs http://www.EnterpriseDB.com/

#15 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#14)
1 attachment(s)
Re: Documenting when to retry on serialization failure

Simon Riggs <simon.riggs@enterprisedb.com> writes:

On Thu, 24 Mar 2022 at 14:56, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Um, what's that got to do with it? The example in
read-write-unique-4.spec involves only a single pkey constraint.

Yes, but as you explained, it's not actually a serializable case, it
just looks a bit like one.

That means we are not currently aware of any case where the situation
is serializable but the error message is uniqueness violation, unless
we have 2 or more unique constraints and/or an exclusion constraint.

Meh. I'm disinclined to document it at that level of detail, both
because it's subject to change and because we're not sure that that
list is exhaustive. I think a bit of handwaving is preferable.
How about the attached? (Only the third new para is different.)

regards, tom lane

Attachments:

retryable_error_docs.v3.patch (text/x-diff; charset=us-ascii)
diff --git a/doc/src/sgml/mvcc.sgml b/doc/src/sgml/mvcc.sgml
index da07f3f6c6..176325247d 100644
--- a/doc/src/sgml/mvcc.sgml
+++ b/doc/src/sgml/mvcc.sgml
@@ -588,7 +588,7 @@ ERROR:  could not serialize access due to concurrent update
     applications using this level must
     be prepared to retry transactions due to serialization failures.
     In fact, this isolation level works exactly the same as Repeatable
-    Read except that it monitors for conditions which could make
+    Read except that it also monitors for conditions which could make
     execution of a concurrent set of serializable transactions behave
     in a manner inconsistent with all possible serial (one at a time)
     executions of those transactions.  This monitoring does not
@@ -1720,6 +1720,71 @@ SELECT pg_advisory_lock(q.id) FROM
    </sect2>
   </sect1>
 
+  <sect1 id="mvcc-serialization-failure-handling">
+   <title>Serialization Failure Handling</title>
+
+   <indexterm>
+    <primary>serialization failure</primary>
+   </indexterm>
+   <indexterm>
+    <primary>retryable error</primary>
+   </indexterm>
+
+   <para>
+    Both Repeatable Read and Serializable isolation levels can produce
+    errors that are designed to prevent serialization anomalies.  As
+    previously stated, applications using these levels must be prepared to
+    retry transactions that fail due to serialization errors.  Such an
+    error's message text will vary according to the precise circumstances,
+    but it will always have the SQLSTATE code <literal>40001</literal>
+    (<literal>serialization_failure</literal>).
+   </para>
+
+   <para>
+    It may also be advisable to retry deadlock failures.
+    These have the SQLSTATE code <literal>40P01</literal>
+    (<literal>deadlock_detected</literal>).
+   </para>
+
+   <para>
+    In some cases it is also appropriate to retry unique-key failures,
+    which have SQLSTATE code <literal>23505</literal>
+    (<literal>unique_violation</literal>), and exclusion constraint
+    failures, which have SQLSTATE code <literal>23P01</literal>
+    (<literal>exclusion_violation</literal>).  For example, if the
+    application selects a new value for a primary key column after
+    inspecting the currently stored keys, it could get a unique-key
+    failure because another application instance selected the same new key
+    concurrently.  This is effectively a serialization failure, but the
+    server will not detect it as such because it cannot <quote>see</quote>
+    the connection between the inserted value and the previous reads.
+    There are also some corner cases in which the server will issue a
+    unique-key or exclusion constraint error even though in principle it
+    has enough information to determine that a serialization problem
+    exists.  While it's recommendable to just
+    retry <literal>40001</literal> errors unconditionally, more care is
+    needed when retrying these other error codes, since they might
+    represent persistent error conditions rather than transient failures.
+   </para>
+
+   <para>
+    It is important to retry the complete transaction, including all logic
+    that decides which SQL to issue and/or which values to use.
+    Therefore, <productname>PostgreSQL</productname> does not offer an
+    automatic retry facility, since it cannot do so with any guarantee of
+    correctness.
+   </para>
+
+   <para>
+    Transaction retry does not guarantee that the retried transaction will
+    complete; multiple retries may be needed.  In cases with very high
+    contention, it is possible that completion of a transaction may take
+    many attempts.  In cases involving a conflicting prepared transaction,
+    it may not be possible to make progress until the prepared transaction
+    commits or rolls back.
+   </para>
+  </sect1>
+
   <sect1 id="mvcc-caveats">
    <title>Caveats</title>
 
#16 Simon Riggs
simon.riggs@enterprisedb.com
In reply to: Tom Lane (#15)
Re: Documenting when to retry on serialization failure

On Thu, 24 Mar 2022 at 16:29, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Simon Riggs <simon.riggs@enterprisedb.com> writes:

On Thu, 24 Mar 2022 at 14:56, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Um, what's that got to do with it? The example in
read-write-unique-4.spec involves only a single pkey constraint.

Yes, but as you explained, it's not actually a serializable case, it
just looks a bit like one.

That means we are not currently aware of any case where the situation
is serializable but the error message is uniqueness violation, unless
we have 2 or more unique constraints and/or an exclusion constraint.

Meh. I'm disinclined to document it at that level of detail, both
because it's subject to change and because we're not sure that that
list is exhaustive. I think a bit of handwaving is preferable.
How about the attached? (Only the third new para is different.)

It's much better, thanks.

--
Simon Riggs http://www.EnterpriseDB.com/

#17 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#16)
Re: Documenting when to retry on serialization failure

Simon Riggs <simon.riggs@enterprisedb.com> writes:

On Thu, 24 Mar 2022 at 16:29, Tom Lane <tgl@sss.pgh.pa.us> wrote:

How about the attached? (Only the third new para is different.)

It's much better, thanks.

Pushed then.

regards, tom lane