BUG #12330: ACID is broken for unique constraints

Started by Nikita Volkov · over 11 years ago · 27 messages · hackers, bugs
#1 Nikita Volkov
nikita.y.volkov@mail.ru

The following bug has been logged on the website:

Bug reference: 12330
Logged by: Nikita Volkov
Email address: nikita.y.volkov@mail.ru
PostgreSQL version: 9.2.4
Operating system: OSX 10.8.2
Description:

Executing concurrent transactions inserting the same value of a unique key
fails with a "duplicate key" error under code "23505" instead of any of the
transaction conflict errors with a "40***" code.

E.g., having the following table:

CREATE TABLE "song_artist" (
  "song_id" INT8 NOT NULL,
  "artist_id" INT8 NOT NULL,
  PRIMARY KEY ("song_id", "artist_id")
);

Even trying to protect against this with a SELECT won't help to avoid
the error, because at the beginning of the transaction the key does not
exist yet.

BEGIN ISOLATION LEVEL SERIALIZABLE READ WRITE;
INSERT INTO song_artist (song_id, artist_id)
SELECT 1, 2
WHERE NOT EXISTS (
  SELECT * FROM song_artist WHERE song_id = 1 AND artist_id = 2
);
COMMIT;

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#2 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Nikita Volkov (#1)
Re: BUG #12330: ACID is broken for unique constraints

"nikita.y.volkov@mail.ru" <nikita.y.volkov@mail.ru> wrote:

Executing concurrent transactions inserting the same value of a
unique key fails with the "duplicate key" error under code
"23505" instead of any of transaction conflict errors with a
"40***" code.

This is true, and can certainly be inconvenient when using
serializable transactions to simplify handling of race conditions,
because you can't just test for a SQLSTATE of '40001' or '40P01' to
indicate the need to retry the transaction. You have two
reasonable ways to avoid duplicate keys if the values are synthetic
and automatically generated. One is to use a SEQUENCE object to
generate the values. The other (really only recommended if gaps in
the sequence are a problem) is to have the serializable transaction
update a row to "claim" the number.
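
Both alternatives can be sketched in SQL (the table, sequence, and counter names here are hypothetical, not from the thread):

```sql
-- Alternative 1: let a SEQUENCE generate the key; two concurrent
-- transactions can never obtain the same generated value.
CREATE SEQUENCE song_id_seq;
INSERT INTO song (song_id, title)
VALUES (nextval('song_id_seq'), 'Some Title');

-- Alternative 2 (only if gaps in the numbering are a problem):
-- have the serializable transaction "claim" the next number by
-- updating a counter row; conflicting claims then fail with a
-- serialization error and can be retried.
UPDATE id_counter
SET last_id = last_id + 1
WHERE table_name = 'song'
RETURNING last_id;
```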

Otherwise you need to consider errors related to duplicates as
possibly being caused by a concurrent transaction. You may want to
do one transaction retry in such cases, and fail if an identical
error is seen. Keep in mind that these errors do not allow
serialization anomalies to appear in the committed data, so are
arguably not violations of ACID principles -- more of a wart on the
otherwise clean technique of using serializable transactions to
simplify application programming under concurrency.
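
A client-side sketch of this retry policy (a hypothetical helper, not part of the thread; it assumes the driver raises errors carrying the SQLSTATE as `.pgcode`, as psycopg2 does, and a stand-in exception class is used here for illustration):

```python
class DatabaseError(Exception):
    """Stand-in for a driver error that exposes SQLSTATE as .pgcode."""
    def __init__(self, pgcode):
        super().__init__(pgcode)
        self.pgcode = pgcode

def run_with_retry(txn, max_retries=3):
    """Run `txn` (a callable performing one whole transaction).

    Retry on serialization failure ('40001') or deadlock ('40P01'),
    and retry once on duplicate key ('23505'), failing if an
    identical error is seen again.
    """
    retries = 0
    seen_duplicate = False
    while True:
        try:
            return txn()
        except DatabaseError as err:
            if err.pgcode in ('40001', '40P01') and retries < max_retries:
                retries += 1
                continue
            if err.pgcode == '23505' and not seen_duplicate:
                seen_duplicate = True  # one retry; a repeat is a real error
                continue
            raise
```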

Thinking about it just now I think we might be able to generate a
write conflict instead of a duplicate key error for this case by
checking the visibility information for the duplicate row. It
might not even have a significant performance impact, since we need
to check visibility information to generate the duplicate key
error. That would still leave similar issues (where similar
arguments can be made) relating to foreign keys; but those can
largely be addressed already by declaring the constraints to be
DEFERRED -- and anyway, that would be a separate fix.

I'm moving this discussion to the -hackers list so that I can ask
other developers:

Are there any objections to generating a write conflict instead of
a duplicate key error if the duplicate key was added by a
concurrent transaction? Only for transactions at isolation level
REPEATABLE READ or higher?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Nikita Volkov (#1)
Re: BUG #12330: ACID is broken for unique constraints

nikita.y.volkov@mail.ru writes:

Executing concurrent transactions inserting the same value of a unique key
fails with the "duplicate key" error under code "23505" instead of any of
transaction conflict errors with a "40***" code.

Sounds fine to me, in fact preferable to a 40XXX code.

regards, tom lane


#4 Marko Tiikkaja
marko@joh.to
In reply to: Kevin Grittner (#2)
Re: BUG #12330: ACID is broken for unique constraints

On 2014-12-26 17:23, Kevin Grittner wrote:

Are there any objections to generating a write conflict instead of
a duplicate key error if the duplicate key was added by a
concurrent transaction? Only for transactions at isolation level
REPEATABLE READ or higher?

Is it possible to distinguish between an actual write conflict and a
completely unrelated UNIQUE constraint ending up violated for some reason?

.marko


#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kevin Grittner (#2)
Re: BUG #12330: ACID is broken for unique constraints

Kevin Grittner <kgrittn@ymail.com> writes:

Are there any objections to generating a write conflict instead of
a duplicate key error if the duplicate key was added by a
concurrent transaction?

Yes. This will deliver a less meaningful error code, *and* break
existing code that is expecting the current behavior.

regards, tom lane


#6 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Tom Lane (#5)
Re: BUG #12330: ACID is broken for unique constraints

Tom Lane <tgl@sss.pgh.pa.us> wrote:

Kevin Grittner <kgrittn@ymail.com> writes:

Are there any objections to generating a write conflict instead
of a duplicate key error if the duplicate key was added by a
concurrent transaction?

Yes. This will deliver a less meaningful error code,

That depends entirely on whether you care more about whether the
problem was created by a concurrent transaction or exactly how that
concurrent transaction created the problem. For those using
serializable transactions to manage concurrency the former is at
least an order of magnitude more important. This is not the first
time getting a constraint violation SQLSTATE for the actions of a
concurrent serializable transaction has been reported as a bug.
Going from memory, I think this might be about the fifth time users
have reported it as a bug or potential bug on these lists.

People using serializable transactions normally run all queries
through common code which will retry the transaction from the start
if there is a SQLSTATE starting with '40' (or perhaps picking out
the specific codes '40001' and '40P01'). Not doing so for *some*
types of errors generated by concurrent transactions reduces the
application's level of user-friendliness, and may introduce
surprising bugs. In this particular case the OP wants to do one
thing if a row with a particular value for a unique index exists,
and something different if it doesn't. If we generate the write
conflict for the case that it is concurrently added, it can retry
the transaction and do one or the other; if we don't pay attention
to that, they need weird heuristics for "the third case". That
really is not more meaningful or useful.

*and* break existing code that is expecting the current behavior.

Possibly, but my experience is more that failure to behave the way
I suggest is biting people and causing them a lot of extra work and
pain. I would be fine with limiting the new behavior to
serializable transactions, since that seems to be where people want
this behavior. It would bring us closer to "the transaction will
run as though it were the only transaction running or roll back
with a serialization failure" without having to add caveats and
exceptions.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#7 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kevin Grittner (#6)
Re: BUG #12330: ACID is broken for unique constraints

Kevin Grittner <kgrittn@ymail.com> writes:

Tom Lane <tgl@sss.pgh.pa.us> wrote:

Yes. This will deliver a less meaningful error code,

That depends entirely on whether you care more about whether the
problem was created by a concurrent transaction or exactly how that
concurrent transaction created the problem.

Just for starters, a 40XXX error report will fail to provide the
duplicated key's value. This will be a functional regression,
on top of breaking existing code.

I think an appropriate response to these complaints is to fix the
documentation to point out that duplicate-key violations may also
be worthy of retries. (I sort of thought it did already, actually,
but I see no mention of the issue in chapter 13.)

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Tom Lane (#7)
hackersbugs
Re: BUG #12330: ACID is broken for unique constraints

Tom Lane <tgl@sss.pgh.pa.us> wrote:

Just for starters, a 40XXX error report will fail to provide the
duplicated key's value. This will be a functional regression,

Not if, as is normally the case, the transaction is retried from
the beginning on a serialization failure. Either the code will
check for a duplicate (as in the case of the OP on this thread) and
they won't see the error, *or* the transaction which created
the duplicate key will have committed before the start of the retry
and you will get the duplicate key error.

I think an appropriate response to these complaints is to fix the
documentation to point out that duplicate-key violations may also
be worthy of retries.

(I sort of thought it did already, actually, but I see no mention
of the issue in chapter 13.)

I agree that's the best we can do for stable branches, and worth
doing.

It would be interesting to hear from others who rely on
serializable transactions in production environments about what
makes sense to them. This is probably the wrong list to find such
people directly; but I seem to recall Josh Berkus has a lot of
clients who do. Josh? Any opinion on this thread?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#9 Nikita Volkov
nikita.y.volkov@mail.ru
In reply to: Kevin Grittner (#8)
Re: BUG #12330: ACID is broken for unique constraints

I'll repost my (OP) case, so that references to it make more sense to
others.

Having the following table:

CREATE TABLE "song_artist" (
  "song_id" INT8 NOT NULL,
  "artist_id" INT8 NOT NULL,
  PRIMARY KEY ("song_id", "artist_id")
);

Even trying to protect against this with a SELECT won't help to avoid
the error, because at the beginning of the transaction the key does not
exist yet.

BEGIN ISOLATION LEVEL SERIALIZABLE READ WRITE;
INSERT INTO song_artist (song_id, artist_id)
SELECT 1, 2
WHERE NOT EXISTS (
  SELECT * FROM song_artist WHERE song_id = 1 AND artist_id = 2
);
COMMIT;


#10 Peter Geoghegan
In reply to: Kevin Grittner (#2)
Re: BUG #12330: ACID is broken for unique constraints

On Fri, Dec 26, 2014 at 7:23 AM, Kevin Grittner <kgrittn@ymail.com> wrote:

Are there any objections to generating a write conflict instead of
a duplicate key error if the duplicate key was added by a
concurrent transaction? Only for transactions at isolation level
REPEATABLE READ or higher?

This independently occurred to me as perhaps preferable over a year
ago. The INSERT...ON CONFLICT IGNORE feature that my patch adds will
throw such an error for REPEATABLE READ and higher modes rather than
proceeding on the basis of having ignored something from a concurrent
transaction.

+1 from me for doing this in the master branch, if it isn't too
invasive (I think it might be; where is EState available from to work
with close to duplicate checking for B-Trees?). It only makes sense to
do what you propose if the user expects to check that there is no
duplicate ahead of time, and only ever fail with a serialization
failure (not a duplicate violation) from concurrent conflicts. That is
a bit narrow; the OP can only reasonably expect to not see a duplicate
violation on retry because he happens to be checking...most clients
won't, and will "waste" a retry.

The proposed ON CONFLICT IGNORE feature would do this checking for him
correctly, FWIW, but in typical cases there is no real point in
retrying the xact -- you'll just get another duplicate violation
error.

In any case I think exclusion violations should be covered, too.
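
For what it's worth, the feature Peter describes eventually shipped in PostgreSQL 9.5 with a DO NOTHING spelling rather than IGNORE; a minimal sketch against the OP's table:

```sql
-- Skip the insert silently if the key already exists, instead of
-- raising a duplicate key error.
INSERT INTO song_artist (song_id, artist_id)
VALUES (1, 2)
ON CONFLICT (song_id, artist_id) DO NOTHING;
```
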
--
Peter Geoghegan


#11 Merlin Moncure
mmoncure@gmail.com
In reply to: Kevin Grittner (#8)
Re: BUG #12330: ACID is broken for unique constraints

On Fri, Dec 26, 2014 at 12:38 PM, Kevin Grittner <kgrittn@ymail.com> wrote:

Tom Lane <tgl@sss.pgh.pa.us> wrote:

Just for starters, a 40XXX error report will fail to provide the
duplicated key's value. This will be a functional regression,

Not if, as is normally the case, the transaction is retried from
the beginning on a serialization failure. Either the code will
check for a duplicate (as in the case of the OP on this thread) and
they won't see the error, *or* the transaction which created
the duplicate key will have committed before the start of the retry
and you will get the duplicate key error.

I'm not buying that; that argument assumes duplicate key errors are
always 'upsert' driven. Although OP's code may have checked for
duplicates it's perfectly reasonable (and in many cases preferable) to
force the transaction to fail and report the error directly back to
the application. The application will then switch on the error code
and decide what to do: retry for deadlock/serialization or abort for
data integrity error. IOW, the error handling semantics are
fundamentally different and should not be mixed.

merlin


#12 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Merlin Moncure (#11)
Re: BUG #12330: ACID is broken for unique constraints

Merlin Moncure <mmoncure@gmail.com> wrote:

On Fri, Dec 26, 2014 at 12:38 PM, Kevin Grittner <kgrittn@ymail.com> wrote:

Tom Lane <tgl@sss.pgh.pa.us> wrote:

Just for starters, a 40XXX error report will fail to provide the
duplicated key's value.  This will be a functional regression,

Not if, as is normally the case, the transaction is retried from
the beginning on a serialization failure.  Either the code will
check for a duplicate (as in the case of the OP on this thread) and
they won't see the error, *or* the transaction which created
the duplicate key will have committed before the start of the retry
and you will get the duplicate key error.

I'm not buying that; that argument assumes duplicate key errors are
always 'upsert' driven.  Although OP's code may have checked for
duplicates it's perfectly reasonable (and in many cases preferable) to
force the transaction to fail and report the error directly back to
the application.  The application will then switch on the error code
and decide what to do: retry for deadlock/serialization or abort for
data integrity error.  IOW, the error handling semantics are
fundamentally different and should not be mixed.

I think you might be agreeing with me without realizing it.  Right
now you get "duplicate key error" even if the duplication is caused
by a concurrent transaction -- it is not possible to check the
error code (well, SQLSTATE, technically) to determine whether this
is fundamentally a serialization problem.  What we're talking about
is returning the serialization failure return code for the cases
where it is a concurrent transaction causing the failure and
continuing to return the duplicate key error for all other cases.

Either I'm not understanding what you wrote above, or you seem to
be arguing for being able to distinguish between errors caused by
concurrent transactions and those which aren't.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#13 Merlin Moncure
mmoncure@gmail.com
In reply to: Kevin Grittner (#12)
Re: BUG #12330: ACID is broken for unique constraints

On Mon, Dec 29, 2014 at 8:03 AM, Kevin Grittner <kgrittn@ymail.com> wrote:

Merlin Moncure <mmoncure@gmail.com> wrote:

On Fri, Dec 26, 2014 at 12:38 PM, Kevin Grittner <kgrittn@ymail.com>
wrote:

Tom Lane <tgl@sss.pgh.pa.us> wrote:

Just for starters, a 40XXX error report will fail to provide the
duplicated key's value. This will be a functional regression,

Not if, as is normally the case, the transaction is retried from
the beginning on a serialization failure. Either the code will
check for a duplicate (as in the case of the OP on this thread) and
they won't see the error, *or* the transaction which created
the duplicate key will have committed before the start of the retry
and you will get the duplicate key error.

I'm not buying that; that argument assumes duplicate key errors are
always 'upsert' driven. Although OP's code may have checked for
duplicates it's perfectly reasonable (and in many cases preferable) to
force the transaction to fail and report the error directly back to
the application. The application will then switch on the error code
and decide what to do: retry for deadlock/serialization or abort for
data integrity error. IOW, the error handling semantics are
fundamentally different and should not be mixed.

I think you might be agreeing with me without realizing it. Right
now you get "duplicate key error" even if the duplication is caused
by a concurrent transaction -- it is not possible to check the
error code (well, SQLSTATE, technically) to determine whether this
is fundamentally a serialization problem. What we're talking about
is returning the serialization failure return code for the cases
where it is a concurrent transaction causing the failure and
continuing to return the duplicate key error for all other cases.

Either I'm not understanding what you wrote above, or you seem to
be arguing for being able to distinguish between errors caused by
concurrent transactions and those which aren't.

Well, I'm arguing that duplicate key errors are not serialization
failures unless it's likely the insertion would succeed upon a retry;
a proper insert, not an upsert. If that's the case with what you're
proposing, then it makes sense to me. But that's not what it sounds
like...your language suggests AIUI that having the error simply be
caused by another transaction being concurrent would be sufficient to
switch to a serialization error (feel free to correct me if I'm
wrong!).

In other words, the current behavior is:
txn A,B begin
txn A inserts
txn B inserts over A, locks, waits
txn A commits. B aborts with duplicate key error

Assuming that case is untouched, then we're good! My long winded
point above is that case must fail with duplicate key error; a
serialization error is suggesting the transaction should be retried
and it shouldn't be...it would simply fail a second time.
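
The same timeline, written out as two concurrent psql sessions against the OP's song_artist table:

```sql
-- Session A                               -- Session B
BEGIN;                                     BEGIN;
INSERT INTO song_artist VALUES (1, 2);
                                           INSERT INTO song_artist VALUES (1, 2);
                                           -- blocks on A's lock...
COMMIT;
                                           -- ERROR: duplicate key value violates
                                           -- unique constraint "song_artist_pkey"
                                           -- (SQLSTATE 23505)
```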

merlin


#14 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Merlin Moncure (#13)
Re: BUG #12330: ACID is broken for unique constraints

Merlin Moncure <mmoncure@gmail.com> wrote:

Well, I'm arguing that duplicate key errors are not serialization
failures unless it's likely the insertion would succeed upon a retry;
a proper insert, not an upsert. If that's the case with what you're
proposing, then it makes sense to me. But that's not what it sounds
like...your language suggests AIUI that having the error simply be
caused by another transaction being concurrent would be sufficient to
switch to a serialization error (feel free to correct me if I'm
wrong!).

In other words, the current behavior is:
txn A,B begin
txn A inserts
txn B inserts over A, locks, waits
txn A commits. B aborts with duplicate key error

Assuming that case is untouched, then we're good! My long winded
point above is that case must fail with duplicate key error; a
serialization error is suggesting the transaction should be retried
and it shouldn't be...it would simply fail a second time.

What I'm proposing is that for serializable transactions B would
get a serialization failure; otherwise B would get a duplicate key
error. If the retry of B looks at something in the database to
determine what its primary key should be it will get a new value
on the retry, since it will be starting after the commit of A. If
it is using a literal key, not based on something changed by A, it
will get a duplicate key error on the retry, since it will be
starting after the commit of A.

It will either succeed on retry or get an error for a different
reason.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#15 Merlin Moncure
mmoncure@gmail.com
In reply to: Kevin Grittner (#14)
Re: BUG #12330: ACID is broken for unique constraints

On Mon, Dec 29, 2014 at 9:09 AM, Kevin Grittner <kgrittn@ymail.com> wrote:

Merlin Moncure <mmoncure@gmail.com> wrote:

In other words, the current behavior is:
txn A,B begin
txn A inserts
txn B inserts over A, locks, waits
txn A commits. B aborts with duplicate key error

What I'm proposing is that for serializable transactions B would
get a serialization failure; otherwise B would get a duplicate key
error. If the retry of B looks at something in the database to
determine what its primary key should be it will get a new value
on the retry, since it will be starting after the commit of A. If
it is using a literal key, not based on something changed by A, it
will get a duplicate key error on the retry, since it will be
starting after the commit of A.

It will either succeed on retry or get an error for a different
reason.

In that case: we don't agree. How come duplicate key errors would be
reported as serialization failures but not RI errors (for example,
inserting a record pointing to another record which a concurrent
transaction deleted)?

IMO, serialization errors are an implementation artifact and should
not mask well defined errors in SQL under any circumstances (or if
they must, those cases should absolutely be as narrow as possible).

merlin


#16 Greg Stark
stark@mit.edu
In reply to: Merlin Moncure (#15)
Re: BUG #12330: ACID is broken for unique constraints

On Mon, Dec 29, 2014 at 3:31 PM, Merlin Moncure <mmoncure@gmail.com> wrote:

In that case: we don't agree. How come duplicate key errors would be
reported as serialization failures but not RI errors (for example,
inserting a record pointing to another record which a concurrent
transaction deleted)?

The key question is not the type of error but whether the transaction
saw another state previously. That is, if you select to check for
duplicate keys, don't see any, and then try to insert and get a
duplicate key error that would be easy to argue is a serialization
error. The same could be true for an RI check -- if you select some
data and then insert a record that refers to the data you already
verified existed and get an RI failure then you could consider that a
serialization failure.

--
greg


#17 Nikita Volkov
nikita.y.volkov@mail.ru
In reply to: Merlin Moncure (#15)
Re: BUG #12330: ACID is broken for unique constraints

I believe, the objections expressed in this thread miss a very important
point of all this: the isolation property (the "I" in ACID) is violated.

Here’s a quote from the Wikipedia article on ACID
<http://en.wikipedia.org/wiki/ACID>:

The isolation property ensures that the concurrent execution of
transactions results in a system state that would be obtained if
transactions were executed serially, i.e., one after the other.

The original example proves that it's violated. Such behaviour can neither
be expected by a user, nor is it even mentioned anywhere. Instead, in the
first paragraph of the “About” section of the Postgres site
<http://www.postgresql.org/about/> it states:

It is fully ACID compliant

Which is basically just a lie, until issues like this one get dealt with.


#18 Merlin Moncure
mmoncure@gmail.com
In reply to: Greg Stark (#16)
Re: BUG #12330: ACID is broken for unique constraints

On Mon, Dec 29, 2014 at 9:44 AM, Greg Stark <stark@mit.edu> wrote:

On Mon, Dec 29, 2014 at 3:31 PM, Merlin Moncure <mmoncure@gmail.com> wrote:

In that case: we don't agree. How come duplicate key errors would be
reported as serialization failures but not RI errors (for example,
inserting a record pointing to another record which a concurrent
transaction deleted)?

The key question is not the type of error but whether the transaction
saw another state previously.

[combining replies -- nikita, better not to top-post (FYI)]

How is that relevant? Serialization errors only exist as a concession
to concurrency and performance. Again, they should be returned as
sparsely as possible because they provide absolutely (as Tom pointed
out) zero detail to the application. The precise definition of the
error is up to us, but I'd rather keep it to its current rather
specific semantics.

On Mon, Dec 29, 2014 at 9:48 AM, Nikita Volkov <nikita.y.volkov@mail.ru> wrote:

I believe, the objections expressed in this thread miss a very important
point of all this: the isolation property (the "I" in ACID) is violated.

Here’s a quote from the Wikipedia article on ACID:

The isolation property ensures that the concurrent execution of transactions
results in a system state that would be obtained if transactions were
executed serially, i.e., one after the other.

The original example proves that it's violated. Such behaviour can neither
be expected by a user, nor is even mentioned anywhere. Instead in the first
paragraph of the “About” section of the Postgres site it states:

It is fully ACID compliant

Which is basically just a lie, until issues like this one get dealt with.

That's simply untrue: inconvenience != acid violation

Transaction levels provide certain guarantees regarding the state of
the data in the presence of concurrent overlapping operations. They
do not define the mechanism of failure or how/when those failures
should occur. To prove your statement, you need to demonstrate how a
transaction left the database in a bad state given concurrent activity
without counting failures. Postgres can, and does, for example,
return concurrency type errors more aggressively than it needs to for
the 'repeatable read' level. Point being, this is completely OK, as the
database implementation is free to do that, just as it's free
to define precisely how and when it fails given concurrency, as long as
those guarantees are provided.

merlin


#19 Nikita Volkov
nikita.y.volkov@mail.ru
In reply to: Merlin Moncure (#18)
Re: BUG #12330: ACID is broken for unique constraints

[combining replies -- nikita, better not to top-post (FYI)]

I'm sorry. I don't know what you mean. I just replied to an email.

To prove your statement, you need to demonstrate how a transaction left
the database in a bad state given concurrent activity without counting
failures.

1. Transaction A looks up a row by ID 1 and gets an empty result.
2. Concurrent transaction B inserts a row with ID 1.
3. Transaction A goes on with the presumption that a row with ID 1 does not
exist, because a transaction is supposed to be isolated and because it has
made sure that the row does not exist. With this presumption it confidently
inserts a row with ID 1, only to have Postgres report a duplicate key. Wat?

2014-12-29 19:17 GMT+03:00 Merlin Moncure <mmoncure@gmail.com>:

Show quoted text

On Mon, Dec 29, 2014 at 9:44 AM, Greg Stark <stark@mit.edu> wrote:

On Mon, Dec 29, 2014 at 3:31 PM, Merlin Moncure <mmoncure@gmail.com>

wrote:

In that case: we don't agree. How come duplicate key errors would be
reported as serialization failures but not RI errors (for example,
inserting a record pointing to another record which a concurrent
transaction deleted)?

The key question is not the type of error but whether the transaction
saw another state previously.

[combining replies -- nikita, better not to top-post (FYI)]

How is that relevant? Serialization errors only exist as a concession
to concurrency and performance. Again, they should be returned as
sparsely as possible because they provide absolutely (as Tom pointed
out) zero detail to the application. The precise definition of the
error is up to us, but I'd rather keep it to its current rather
specific semantics.

On Mon, Dec 29, 2014 at 9:48 AM, Nikita Volkov <nikita.y.volkov@mail.ru>
wrote:

I believe the objections expressed in this thread miss a very important
point of all this: the isolation property (the "I" in ACID) is violated.

Here’s a quote from the Wikipedia article on ACID:

The isolation property ensures that the concurrent execution of
transactions results in a system state that would be obtained if
transactions were executed serially, i.e., one after the other.

The original example proves that it's violated. Such behaviour can
neither be expected by a user, nor is even mentioned anywhere. Instead,
in the first paragraph of the “About” section of the Postgres site it
states:

It is fully ACID compliant

Which is basically just a lie, until issues like this one get dealt with.

That's simply untrue: inconvenience != acid violation


#20Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Merlin Moncure (#18)
hackersbugs
Re: BUG #12330: ACID is broken for unique constraints

Merlin Moncure <mmoncure@gmail.com> wrote:

Serialization errors only exist as a concession to concurrency
and performance. Again, they should be returned as sparsely as
possible because they provide absolutely (as Tom pointed
out) zero detail to the application.

That is false. They provide an *extremely* valuable piece of
information which is not currently available when you get a duplicate
key error -- namely, whether the error occurred because of a race
condition, in which case a retry will not fail for the same cause.
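That distinction is exactly what a client-side retry loop keys on. A
minimal sketch, assuming a hypothetical DatabaseError that exposes the
SQLSTATE as a sqlstate attribute (real drivers expose it under names
like pgcode or sqlstate):

```python
# Hypothetical client-side retry helper, not a real driver API.
# The point: '40001' tells the client "a retry may succeed", while a
# bare '23505' cannot distinguish a race from a genuine duplicate.

RETRYABLE = {"40001", "40P01"}  # serialization_failure, deadlock_detected

class DatabaseError(Exception):
    """Stand-in for a driver error object carrying a SQLSTATE."""
    def __init__(self, sqlstate, message=""):
        super().__init__(message or sqlstate)
        self.sqlstate = sqlstate

def run_with_retry(txn, max_attempts=3):
    """Run txn(); retry only when the SQLSTATE marks a concurrency race."""
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()
        except DatabaseError as err:
            if err.sqlstate in RETRYABLE and attempt < max_attempts:
                continue  # lost a race with a concurrent txn: try again
            raise  # e.g. 23505: retrying would fail the same way
```

If unique-violation races reported 40001 instead of 23505, this loop
would handle them mechanically; as things stand, the application has to
guess.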

The precise definition of the error is up to us, but I'd rather
keep it to its current rather specific semantics.

The semantics are so imprecise that Tom argued that we should
document that transactions should be retried from the start when
you get the duplicate key error, since it *might* have been caused
by a race condition.

I'm curious how heavily you use serializable transactions, because
I have trouble believing that those who rely on them as their
primary (or only) strategy for dealing with race conditions under
high concurrency would take that position.

As for the fact that RI violations also don't return a
serialization failure when caused by a race with concurrent
transactions, I view that as another weakness in PostgreSQL. I
don't think there is a problem curing one without curing the other
at the same time. I have known of people writing their own
triggers to enforce RI rather than defining FKs precisely so that
they can get a serialization failure return code and do automatic
retry if it is caused by a race condition. That's less practical
to compensate for when it comes to unique indexes or constraints.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#21Merlin Moncure
mmoncure@gmail.com
In reply to: Nikita Volkov (#19)
hackersbugs
#22Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Merlin Moncure (#21)
hackersbugs
#23Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Kevin Grittner (#22)
hackersbugs
#24Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Kevin Grittner (#20)
hackersbugs
#25Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Jim Nasby (#24)
hackersbugs
#26Merlin Moncure
mmoncure@gmail.com
In reply to: Kevin Grittner (#22)
hackersbugs
#27Bruce Momjian
bruce@momjian.us
In reply to: Merlin Moncure (#18)
hackersbugs