9.1/9.2 SERIALIZABLE: expected serialization failure between INSERT and SELECT not encountered

Started by Craig Ringer · over 13 years ago · 9 messages · general
#1Craig Ringer
craig@2ndquadrant.com

Hi all

In this SO question:

http://dba.stackexchange.com/questions/26905/how-do-i-implement-insert-if-not-found-for-transactions-at-serializable-isolatio/26909#26909

the author is running a series of queries that I'd expect to abort on
commit with a serialisation failure. No such failure occurs, and I'm
wondering why.

SETUP

create table artist (id serial primary key, name text);

SESSION 1                                 SESSION 2

BEGIN ISOLATION LEVEL SERIALIZABLE;
                                          BEGIN ISOLATION LEVEL
                                          SERIALIZABLE;
SELECT id FROM artist
WHERE name = 'Bob';
                                          INSERT INTO artist (name)
                                          VALUES ('Bob')
INSERT INTO artist (name)
VALUES ('Bob')

COMMIT;                                   COMMIT;

I'd expect one of these two to abort with a serialization failure and
I'm not sure I understand why they don't in 9.1/9.2's new serializable
mode. Shouldn't the SELECT for "Bob" cause the insertion of "Bob" in the
other transaction to violate serializability?

--
Craig Ringer

#2Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Craig Ringer (#1)
Re: 9.1/9.2 SERIALIZABLE: expected serialization failure between INSERT and SELECT not encountered

Craig Ringer wrote:

In this SO question:

http://dba.stackexchange.com/questions/26905/how-do-i-implement-insert-if-not-found-for-transactions-at-serializable-isolatio/26909#26909

the author is running a series of queries that I'd expect to abort on
commit with a serialisation failure. No such failure occurs, and I'm
wondering why.

SETUP

create table artist (id serial primary key, name text);

SESSION 1                                 SESSION 2

BEGIN ISOLATION LEVEL SERIALIZABLE;
                                          BEGIN ISOLATION LEVEL
                                          SERIALIZABLE;
SELECT id FROM artist
WHERE name = 'Bob';
                                          INSERT INTO artist (name)
                                          VALUES ('Bob')
INSERT INTO artist (name)
VALUES ('Bob')

COMMIT;                                   COMMIT;

I'd expect one of these two to abort with a serialization failure and
I'm not sure I understand why they don't in 9.1/9.2's new serializable
mode. Shouldn't the SELECT for "Bob" cause the insertion of "Bob" in the
other transaction to violate serializability?

Why? They can be serialized. The outcome would be exactly the same
if session 2 completed before session 1 began.

You would have a serialization problem if each session tried
to read what the other one tried to write:

SESSION 1                                 SESSION 2

BEGIN ISOLATION LEVEL SERIALIZABLE;
                                          BEGIN ISOLATION LEVEL SERIALIZABLE;
INSERT INTO artist (name) VALUES ('Bob');
                                          INSERT INTO artist (name) VALUES ('Bob');
SELECT * FROM artist WHERE name = 'Bob';
                                          SELECT * FROM artist WHERE name = 'Bob';
COMMIT;
                                          COMMIT; /* throws serialization error */

Yours,
Laurenz Albe

#3Craig Ringer
craig@2ndquadrant.com
In reply to: Laurenz Albe (#2)
Re: 9.1/9.2 SERIALIZABLE: expected serialization failure between INSERT and SELECT not encountered

On 10/17/2012 04:16 PM, Albe Laurenz wrote:

Why? They can be serialized. The outcome would be exactly the same
if session 2 completed before session 1 began.

Hmm. Good point; so long as *either* ordering is valid it's fine, it's
only when *both* orderings are invalid that a serialization failure
would occur. For some reason I had myself thinking that if a conflict
could occur in either ordering the tx would fail, which wouldn't really
be desirable and isn't how it works.

BTW, the issue with the underlying question is that their "name" column
is unique. They expected to get a serialization failure on duplicate
insert into "name", not a unique constraint violation. The question
wasn't "why doesn't this fail" but "Why does this fail with a different
error than I expected". Not that the question made that particularly clear.

--
Craig Ringer

#4Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Craig Ringer (#3)
Re: 9.1/9.2 SERIALIZABLE: expected serialization failure between INSERT and SELECT not encountered

Craig Ringer wrote:

Why? They can be serialized. The outcome would be exactly the same
if session 2 completed before session 1 began.

Hmm. Good point; so long as *either* ordering is valid it's fine, it's
only when *both* orderings are invalid that a serialization failure
would occur. For some reason I had myself thinking that if a conflict
could occur in either ordering the tx would fail, which wouldn't really
be desirable and isn't how it works.

BTW, the issue with the underlying question is that their "name" column
is unique. They expected to get a serialization failure on duplicate
insert into "name", not a unique constraint violation. The question
wasn't "why doesn't this fail" but "Why does this fail with a different
error than I expected". Not that the question made that particularly
clear.

But the unasked question is also answered, right?

Yours,
Laurenz Albe

#5Chris Angelico
rosuav@gmail.com
In reply to: Craig Ringer (#3)
Re: 9.1/9.2 SERIALIZABLE: expected serialization failure between INSERT and SELECT not encountered

On Thu, Oct 18, 2012 at 3:08 PM, Craig Ringer <ringerc@ringerc.id.au> wrote:

BTW, the issue with the underlying question is that their "name" column is
unique. They expected to get a serialization failure on duplicate insert
into "name", not a unique constraint violation. The question wasn't "why
doesn't this fail" but "Why does this fail with a different error than I
expected". Not that the question made that particularly clear.

Sounds to me like it's giving a better error anyway - more helpful to
know _why_ the second transaction failed than to simply know _that_ it
failed.

I've actually never used serializable transaction isolation,
preferring more explicit constraints.

ChrisA

#6Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Chris Angelico (#5)
Re: 9.1/9.2 SERIALIZABLE: expected serialization failure between INSERT and SELECT not encountered

Chris Angelico wrote:

Craig Ringer <ringerc@ringerc.id.au> wrote:

BTW, the issue with the underlying question is that their "name"
column is unique. They expected to get a serialization failure on
duplicate insert into "name", not a unique constraint violation.
The question wasn't "why doesn't this fail" but "Why does this
fail with a different error than I expected".

I remember reading a paper about an attempt to use software to do a
static analysis of software accessing production databases, to
identify where additional protections (explicit locking, conflict
promotion, or conflict materialization) were needed to prevent
serialization anomalies under snapshot isolation. They initially got
a lot of hits for situations where no anomaly could actually happen
due to declarative constraints. Primary keys, unique indexes, and
foreign keys could all prevent anomalies in situations where you
would see them if the logic were left to, for example, trigger code
instead of the declarative constraints. The researchers argued that
in such situations, there was no point generating extra overhead to
use other techniques to redundantly protect data integrity.  I was
persuaded. (I tried to find the paper to reference it here, but
wasn't successful -- I know that Alan Fekete was one of the authors
and the software they were looking at was in production use by the
Indian Institute of Technology for accounting and also a system for
tracking academic information.)

Sounds to me like it's giving a better error anyway - more helpful
to know _why_ the second transaction failed than to simply know
_that_ it failed.

It is a double-edged sword -- you have a more efficient way to
protect the data and a more specific error message; *but*, you don't
have a SQLSTATE on the error message which makes it clear that the
error was due to a race condition and that it is reasonable to retry
the transaction. The application programmer can still find techniques
that allow automated retry without bugging the user with spurious
error messages about a race condition from which the software can
automatically recover, but this does undercut the mantra that the
transaction will either do the right thing or generate a serialization
failure.
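Such an automated-retry wrapper might look like this sketch in plain
Python; the SerializationFailure class and flaky_txn body are
illustrative stand-ins for a driver error carrying SQLSTATE 40001 and
a real transaction, not anything from this thread:

```python
class SerializationFailure(Exception):
    """Stand-in for a driver error carrying SQLSTATE 40001."""

def run_transaction(txn_body, max_attempts=5):
    # Run txn_body(); on a serialization failure, retry from scratch.
    # The user never sees the recoverable race condition.
    for attempt in range(1, max_attempts + 1):
        try:
            return txn_body()
        except SerializationFailure:
            if attempt == max_attempts:
                raise  # give up after repeated conflicts

# Demo: a transaction body that conflicts twice before succeeding.
attempts = {"n": 0}
def flaky_txn():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise SerializationFailure
    return "committed"

print(run_transaction(flaky_txn))  # committed, after two silent retries
```

A real wrapper would also roll back the open transaction and usually
back off briefly before retrying.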

As an example, if you want to let serialization failures deal with
automatic retries, without pestering the user about having hit a
recoverable race condition, you stay away from the SELECT max(id) + 1
style of key assignment in favor of sequences (if contiguous numbers
are not critical) or updating a "last_used" number in a table and
using the result (if it *is* critical that there are no gaps in
the numbers). That is actually the only "special rule" I've found to
be needed in practice so far to otherwise allow programmers to code
each transaction as though it were going to be run alone, and
otherwise ignore concurrency issues when using serializable
transactions.
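The gap-free "last_used" approach can be sketched in miniature, here
with Python's sqlite3 standard library standing in for PostgreSQL
(table and function names are illustrative, not from the thread):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoice_counter (last_used INTEGER NOT NULL)")
conn.execute("INSERT INTO invoice_counter VALUES (0)")
conn.commit()

def next_invoice_number(conn):
    # Increment the counter and read it back inside one transaction.
    # The row lock taken by the UPDATE serializes concurrent callers,
    # and because the increment rolls back if the transaction aborts,
    # the assigned numbers stay gap-free (unlike a sequence, whose
    # increments survive a rollback).
    with conn:  # commits on success, rolls back on exception
        conn.execute("UPDATE invoice_counter SET last_used = last_used + 1")
        (n,) = conn.execute("SELECT last_used FROM invoice_counter").fetchone()
        return n

print(next_invoice_number(conn))  # 1
print(next_invoice_number(conn))  # 2
```

As Kevin notes, doing this assignment near the front of the
transaction keeps the blocking window short.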

-Kevin

#7Chris Angelico
rosuav@gmail.com
In reply to: Kevin Grittner (#6)
Re: 9.1/9.2 SERIALIZABLE: expected serialization failure between INSERT and SELECT not encountered

On Thu, Oct 18, 2012 at 11:26 PM, Kevin Grittner <kgrittn@mail.com> wrote:

updating a "last_used" number in a table and
using the result (if it *is* critical that there are no gaps in
the numbers).

Correct me if I'm wrong, but wouldn't:

update some_table set last_used=last_used+1 returning last_used

simply block until the other transaction is committed/rolled back?
That avoids all the issues of serialization AND retrying transactions.
Or is the delay itself a problem?

ChrisA

#8Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Chris Angelico (#7)
Re: 9.1/9.2 SERIALIZABLE: expected serialization failure between INSERT and SELECT not encountered

Chris Angelico wrote:

Correct me if I'm wrong, but wouldn't:

update some_table set last_used=last_used+1 returning last_used

simply block until the other transaction is committed/rolled back?

Only at the READ COMMITTED transaction isolation level.

That avoids all the issues of serialization AND retrying
transactions.

If you can tolerate the occasional oddities of READ COMMITTED
transactions, it makes this quite simple, yes.

Or is the delay itself a problem?

The delay isn't a problem per se; it's hard to get around some form
of blocking if you want transactional integrity and gap-free
assignment of numbers.  If you can do this sort of assignment near
the front of the transaction in REPEATABLE READ or SERIALIZABLE
transactions, it doesn't cost that much more than in READ COMMITTED.

And the manifestations of weirdness in READ COMMITTED can be
daunting.  The simplest example looks pretty contrived, but simple
examples usually do.

Create and populate a simple table:

create table rc (id int primary key);
insert into rc select generate_series(1, 5);

Now, if no records are inserted or deleted by another connection, how
many rows will be deleted by this statement?:

delete from rc where id = (select min(id) from rc);

It's a trick question; the answer depends on a race condition.

Before you run that delete, on a separate connection run this:

begin;
update rc set id = id - 1;

Run your delete on the first connection and it will block.  Commit
the transaction on the second connection, and go look at the results
on the first, and select from the table.

Rather than make sure that every programmer writing code to go
against your database knows exactly where all the problems are and
consistently codes around them, it can be a lot easier to use
serializable transactions.

-Kevin

#9Chris Angelico
rosuav@gmail.com
In reply to: Kevin Grittner (#8)
Re: 9.1/9.2 SERIALIZABLE: expected serialization failure between INSERT and SELECT not encountered

On Fri, Oct 19, 2012 at 1:22 AM, Kevin Grittner <kgrittn@mail.com> wrote:

Now, if no records are inserted or deleted by another connection, how
many rows will be deleted by this statement?:

delete from rc where id = (select min(id) from rc);

It's a trick question; the answer depends on a race condition.

Before you run that delete, on a separate connection run this:

begin;
update rc set id = id - 1;

Run your delete on the first connection and it will block. Commit
the transaction on the second connection, and go look at the results
on the first, and select from the table.

Ah yes, I believe that one can be best explained with the Magic: The
Gathering rules about triggers and the "intervening if" clause. In
M:TG, a trigger might say "At the beginning of your upkeep, if you
have 40 or more life, you win the game". It checks the condition
twice, once when it triggers, and again when it resolves. (Yeah, I'm a
nerd. I know detaily rules to a nerd's game. So sue me.)

In the first pass, Postgres decides which row(s) should be deleted.
Okay, let's find the one that's the lowest ID. Lowest ID is 1, that
means this record. Hey, mind if I delete this? Oh, it's locked. Hold
on.

Commit the other one.

Okay, let's go delete this one. Check to see that it still matches the
WHERE clause. Does it have ID=1? Nope, ID=0. Don't delete anything!

Yes, it's slightly odd. But really, if you're doing sweeping changes
like that, a table-level massively exclusive lock is probably the best
way to do it. I haven't seen any issues with READ COMMITTED that have
caused problems; although that's partly because I usually have an
immutable ID 'serial primary key' on every row. Helps a lot.

ChrisA