How does PostgreSQL serializable transaction works? (SELECT/UPDATE vs INSERT conflict)

Started by Alexandru Lazarevabout 10 years ago7 messagesgeneral

alexandru.lazarev@gmail.com

about 10 years ago

Transaction 1 operated on set of data (`WHERE` clause) on which 2nd
transaction do an `INSERT`, which fit to clause from 1st transaction.
Shouldn't 1st transaction fail if 2nd commit first?

I have following table (in PostgreSQL 9.5 db)

`CREATE TABLE public.foo (id serial PRIMARY KEY, mynum integer);`

and following data

id | mynum
----+-------
1 | 10
2 | 10
3 | 10
4 | 10
(4 rows)

I run 2 serialize transactions in parallel (2 `psql` consoles):

-- both transactions
mydb=# begin;
BEGIN
mydb=# set transaction isolation level serializable;
SET

-- tx1
mydb=# select * from foo where mynum < 100;
id | mynum
----+-------
1 | 10
2 | 10
3 | 10
4 | 10
(4 rows)
--tx1: Shouldn't freeze data visible for tx1 select?

--tx2
mydb=# insert into foo (mynum) values (10);
INSERT 0 1
-- tx2 will insert next row with id 5 in foo table
-- Shouldn't insert of tx2 broke data snapshot visible for tx1?

--tx1
mydb=# update foo set mynum = 20 where id < 100;
UPDATE 4
-- Shouldn't here appear serialization fail or at least on tx1 commit?

--tx2
mydb=# commit;
COMMIT

--tx1
mydb=# commit;
COMMIT
-- tx1 Commit is OK - no any error

-- implicit tx
mydb=# select * from foo;
id | mynum
----+-------
1 | 20
2 | 20
3 | 20
4 | 20
5 | 10
(4 rows)

I am wondering why it behave so, taking in consideration PostgreSQL
documentation

"To guarantee true serializability PostgreSQL uses predicate locking,
which means that it keeps locks which allow it to determine when a
write would have had an impact on the result of a previous read from a
concurrent transaction, had it run first."

link: http://www.postgresql.org/docs/current/static/transaction-iso.html

Alexandru Lazarev

alexandru.lazarev@gmail.com

about 10 years ago

In reply to: Alexandru Lazarev (#1)

Fwd: How does PostgreSQL serializable transaction works? (SELECT/UPDATE vs INSERT conflict)

Transaction 1 operated on set of data (`WHERE` clause) on which 2nd
transaction do an `INSERT`, which fit to clause from 1st transaction.
Shouldn't 1st transaction fail if 2nd commit first?

I have following table (in PostgreSQL 9.5 db)

`CREATE TABLE public.foo (id serial PRIMARY KEY, mynum integer);`

and following data

id | mynum
----+-------
1 | 10
2 | 10
3 | 10
4 | 10
(4 rows)

I run 2 serialize transactions in parallel (2 `psql` consoles):

-- both transactions
mydb=# begin;
BEGIN
mydb=# set transaction isolation level serializable;
SET

-- tx1
mydb=# select * from foo where mynum < 100;
id | mynum
----+-------
1 | 10
2 | 10
3 | 10
4 | 10
(4 rows)
--tx1: Shouldn't freeze data visible for tx1 select?

--tx2
mydb=# insert into foo (mynum) values (10);
INSERT 0 1
-- tx2 will insert next row with id 5 in foo table
-- Shouldn't insert of tx2 broke data snapshot visible for tx1?

--tx1
mydb=# update foo set mynum = 20 where id < 100;
UPDATE 4
-- Shouldn't here appear serialization fail or at least on tx1 commit?

--tx2
mydb=# commit;
COMMIT

--tx1
mydb=# commit;
COMMIT
-- tx1 Commit is OK - no any error

-- implicit tx
mydb=# select * from foo;
id | mynum
----+-------
1 | 20
2 | 20
3 | 20
4 | 20
5 | 10
(5 rows)

I am wondering why it behave so, taking in consideration PostgreSQL
documentation

"To guarantee true serializability PostgreSQL uses predicate locking,
which means that it keeps locks which allow it to determine when a
write would have had an impact on the result of a previous read from a
concurrent transaction, had it run first."

link: http://www.postgresql.org/docs/current/static/transaction-iso.html

David G. Johnston

david.g.johnston@gmail.com

about 10 years ago

In reply to: Alexandru Lazarev (#1)

Re: How does PostgreSQL serializable transaction works? (SELECT/UPDATE vs INSERT conflict)

On Wed, Mar 9, 2016 at 5:18 AM, Alexandru Lazarev <
alexandru.lazarev@gmail.com> wrote:

Transaction 1 operated on set of data (`WHERE` clause) on which 2nd
transaction do an `INSERT`, which fit to clause from 1st transaction.
Shouldn't 1st transaction fail if 2nd commit first?

I have following table (in PostgreSQL 9.5 db)

`CREATE TABLE public.foo (id serial PRIMARY KEY, mynum integer);`

and following data

id | mynum
----+-------
1 | 10
2 | 10
3 | 10
4 | 10
(4 rows)

I run 2 serialize transactions in parallel (2 `psql` consoles):

-- both transactions
mydb=# begin;
BEGIN
mydb=# set transaction isolation level serializable;
SET

-- tx1
mydb=# select * from foo where mynum < 100;
id | mynum
----+-------
1 | 10
2 | 10
3 | 10
4 | 10
(4 rows)
--tx1: Shouldn't freeze data visible for tx1 select?

--tx2
mydb=# insert into foo (mynum) values (10);
INSERT 0 1
-- tx2 will insert next row with id 5 in foo table
-- Shouldn't insert of tx2 broke data snapshot visible for tx1?

--tx1
mydb=# update foo set mynum = 20 where id < 100;
UPDATE 4
-- Shouldn't here appear serialization fail or at least on tx1 commit?

--tx2
mydb=# commit;
COMMIT

--tx1
mydb=# commit;
COMMIT
-- tx1 Commit is OK - no any error

-- implicit tx
mydb=# select * from foo;
id | mynum
----+-------
1 | 20
2 | 20
3 | 20
4 | 20
5 | 10
(4 rows)

I am wondering why it behave so, taking in consideration PostgreSQL
documentation

"To guarantee true serializability PostgreSQL uses predicate locking,
which means that it keeps locks which allow it to determine when a
write would have had an impact on the result of a previous read from a
concurrent transaction, had it run first."

link: http://www.postgresql.org/docs/current/static/transaction-iso.html

Next paragraph:

"

Predicate locks in PostgreSQL, like in most other database systems, are
based on data actually accessed by a transaction
."

i.e., the system doesn't keep a record of which where clauses are
presently in effect but only which rows have been seen.

The promise of serializable is that the following will not occur:

"

The result of successfully committing a group of transactions is
inconsistent with all possible orderings of running those transactions one
at a time.
"

But as long as at least a single possible serial ordering is consistent we
are fine - and since executing tx1 to completion and then executing tx2 to
completion will result in exactly the outcome you describe (5 rows, four of
which have been incremented) there is no violation.

David J.

Jeff Janes

jeff.janes@gmail.com

about 10 years ago

In reply to: Alexandru Lazarev (#2)

Re: Fwd: How does PostgreSQL serializable transaction works? (SELECT/UPDATE vs INSERT conflict)

On Wed, Mar 9, 2016 at 11:39 AM, Alexandru Lazarev
<alexandru.lazarev@gmail.com> wrote:

Transaction 1 operated on set of data (`WHERE` clause) on which 2nd
transaction do an `INSERT`, which fit to clause from 1st transaction.
Shouldn't 1st transaction fail if 2nd commit first?

I have following table (in PostgreSQL 9.5 db)

`CREATE TABLE public.foo (id serial PRIMARY KEY, mynum integer);`

and following data

id | mynum
----+-------
1 | 10
2 | 10
3 | 10
4 | 10
(4 rows)

I run 2 serialize transactions in parallel (2 `psql` consoles):

-- both transactions
mydb=# begin;
BEGIN
mydb=# set transaction isolation level serializable;
SET

-- tx1
mydb=# select * from foo where mynum < 100;
id | mynum
----+-------
1 | 10
2 | 10
3 | 10
4 | 10
(4 rows)
--tx1: Shouldn't freeze data visible for tx1 select?

--tx2
mydb=# insert into foo (mynum) values (10);
INSERT 0 1
-- tx2 will insert next row with id 5 in foo table
-- Shouldn't insert of tx2 broke data snapshot visible for tx1?

--tx1
mydb=# update foo set mynum = 20 where id < 100;
UPDATE 4
-- Shouldn't here appear serialization fail or at least on tx1 commit?

--tx2
mydb=# commit;
COMMIT

--tx1
mydb=# commit;
COMMIT
-- tx1 Commit is OK - no any error

-- implicit tx
mydb=# select * from foo;
id | mynum
----+-------
1 | 20
2 | 20
3 | 20
4 | 20
5 | 10
(5 rows)

What you are seeing here is exactly what you would see if tx1 started
and ran to completion, and then tx2 started and ran to completion, and
then the implicit tx started and ran to completion, in that order.
Isn't it? If so, there is no serialization failure.

Serializable means that there needs to be some serial ordering of the
transactions which would result in the same overall outcome that
actually occurred. It doesn't mean that the serial ordering which
would produce that outcome has to be the same as the actual
chronological commit order.

Cheers,

Jeff

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Kevin Grittner

Kevin.Grittner@wicourts.gov

about 10 years ago

In reply to: Alexandru Lazarev (#2)

Re: Fwd: How does PostgreSQL serializable transaction works? (SELECT/UPDATE vs INSERT conflict)

On Wed, Mar 9, 2016 at 1:39 PM, Alexandru Lazarev
<alexandru.lazarev@gmail.com> wrote:

Jeff's answer is entirely correct; I'm just going to go into more
detail -- just in case you're interested enough to work through it.

`CREATE TABLE public.foo (id serial PRIMARY KEY, mynum integer);`

and following data

id | mynum
----+-------
1 | 10
2 | 10
3 | 10
4 | 10
(4 rows)

I run 2 serialize transactions in parallel (2 `psql` consoles):

-- both transactions
mydb=# begin;
BEGIN
mydb=# set transaction isolation level serializable;
SET

-- tx1
mydb=# select * from foo where mynum < 100;
id | mynum
----+-------
1 | 10
2 | 10
3 | 10
4 | 10
(4 rows)
--tx1: Shouldn't freeze data visible for tx1 select?

Yes, tx1 does have a snapshot which will guarantee that it sees a
repeatable set view of the data for this predicate.

--tx2
mydb=# insert into foo (mynum) values (10);
INSERT 0 1
-- tx2 will insert next row with id 5 in foo table
-- Shouldn't insert of tx2 broke data snapshot visible for tx1?

The snapshot tx1 has guarantees that overlapping changes won't
change it's view of things, and there is no reason for anything to
be blocked or canceled here. The insert creates what is called a
read-write dependency (or rw-conflict for short) that establishes
that in any serial ordering of a set of transactions which includes
tx1 and tx2, tx1 must precede tx2 in the apparent order of
execution.

--tx1
mydb=# update foo set mynum = 20 where id < 100;
UPDATE 4
-- Shouldn't here appear serialization fail or at least on tx1 commit?

No, there is no cycle in the apparent order of execution. The
snapshot for tx1 still limits it to the same set of rows, and there
is nothing visible that is inconsistent with tx1 running before
tx2.

--tx2
mydb=# commit;
COMMIT

--tx1
mydb=# commit;
COMMIT
-- tx1 Commit is OK - no any error

According to the SQL standard, and in the PostgreSQL implementation
of SERIALIZABLE transactions, commit order does not, by itself,
establish apparent order of execution.

-- implicit tx
mydb=# select * from foo;
id | mynum
----+-------
1 | 20
2 | 20
3 | 20
4 | 20
5 | 10
(5 rows)

As Jeff said, this is consistent with the implicit transaction
running last, so tx1 -> tx2 -> implicit_tx.

Now, you are pretty close to a situation which does need to trigger
a serialization failure -- just switch the commit of tx1 and the
implicit transaction. If tx2 has committed but tx1 has not yet
committed:

mydb=# select * from foo;
id | mynum
----+-------
1 | 10
2 | 10
3 | 10
4 | 10
5 | 10
(5 rows)

*Now* we have a problem -- this only makes sense if the implicit tx
was run after tx2 and before tx1. So apparent order of execution
is tx1 -> tx2 -> implicit_tx -> tx1. There is a cycle in the
apparent order of execution, which causes anomalies which can ruin
data integrity. Now, if the implicit transaction is not
serializable, it is allowed to see such things, but if you make it
serializable (and let's call it tx3 now) it sees a state where only
tx2 ran; tx1 could not have run:

-- tx3
mydb=# begin;
BEGIN
mydb=# set transaction isolation level serializable;
SET
mydb=# select * from foo;
id | mynum
----+-------
1 | 10
2 | 10
3 | 10
4 | 10
5 | 10
(5 rows)

mydb=# commit;
COMMIT

So now, tx1 is not allowed to commit, or for that matter do
anything else -- it has been "doomed" by tx3:

mydb=# select * from foo;
ERROR: could not serialize access due to read/write dependencies
among transactions
DETAIL: Reason code: Canceled on identification as a pivot, during
conflict out checking.
HINT: The transaction might succeed if retried.

Hopefully you are using some framework to automatically detect this
SQLSTATE and retry the transaction from the start. So on retry,
tx1 does this:

-- tx1 (retry)
mydb=# begin;
BEGIN
mydb=# set transaction isolation level serializable;
SET
mydb=# select * from foo where mynum < 100;
id | mynum
----+-------
1 | 10
2 | 10
3 | 10
4 | 10
5 | 10
(5 rows)

mydb=# update foo set mynum = 20 where id < 100;
UPDATE 5
mydb=# select * from foo;
id | mynum
----+-------
1 | 20
2 | 20
3 | 20
4 | 20
5 | 20
(5 rows)

mydb=# commit;
COMMIT

Now the result of all successfully committed serializiable
transactions is consistent with the order tx2 -> tx3 -> tx1. All
is good.

Kevin Grittner

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Alexandru Lazarev

alexandru.lazarev@gmail.com

about 10 years ago

In reply to: Kevin Grittner (#5)

Re: [GENERAL] Fwd: How does PostgreSQL serializable transaction works? (SELECT/UPDATE vs INSERT conflict)

Ok,
Now it's more clear for me. Thanks to all, especially on @Kevin deep
explanation and (tx3) example. Question is closed.

I intuited that it might be as all of You explained, but was not sure, I
was confused by "predicate lock" - I thought it's related to `SELECT+WHERE`
and not to data (a kind of "subject lock").
Now I understood that key-words are "serial execution in any order will
lead to conflict" - here I also was a little bit confused by chronological
order of commit.

P.S. One more "offtop" question - What kind of frameworks do automatically
retries for failed transactions? Are Hibernate/Spring in that list?

Best Regards,
AlexL

On Thu, Mar 10, 2016 at 12:41 AM, Kevin Grittner <kgrittn@gmail.com> wrote:

Show quoted text

On Wed, Mar 9, 2016 at 1:39 PM, Alexandru Lazarev
<alexandru.lazarev@gmail.com> wrote:

Jeff's answer is entirely correct; I'm just going to go into more
detail -- just in case you're interested enough to work through it.

`CREATE TABLE public.foo (id serial PRIMARY KEY, mynum integer);`

and following data

id | mynum
----+-------
1 | 10
2 | 10
3 | 10
4 | 10
(4 rows)

I run 2 serialize transactions in parallel (2 `psql` consoles):

-- both transactions
mydb=# begin;
BEGIN
mydb=# set transaction isolation level serializable;
SET

-- tx1
mydb=# select * from foo where mynum < 100;
id | mynum
----+-------
1 | 10
2 | 10
3 | 10
4 | 10
(4 rows)
--tx1: Shouldn't freeze data visible for tx1 select?

Yes, tx1 does have a snapshot which will guarantee that it sees a
repeatable set view of the data for this predicate.

--tx2
mydb=# insert into foo (mynum) values (10);
INSERT 0 1
-- tx2 will insert next row with id 5 in foo table
-- Shouldn't insert of tx2 broke data snapshot visible for tx1?

The snapshot tx1 has guarantees that overlapping changes won't
change it's view of things, and there is no reason for anything to
be blocked or canceled here. The insert creates what is called a
read-write dependency (or rw-conflict for short) that establishes
that in any serial ordering of a set of transactions which includes
tx1 and tx2, tx1 must precede tx2 in the apparent order of
execution.

--tx1
mydb=# update foo set mynum = 20 where id < 100;
UPDATE 4
-- Shouldn't here appear serialization fail or at least on tx1

commit?

No, there is no cycle in the apparent order of execution. The
snapshot for tx1 still limits it to the same set of rows, and there
is nothing visible that is inconsistent with tx1 running before
tx2.

--tx2
mydb=# commit;
COMMIT

--tx1
mydb=# commit;
COMMIT
-- tx1 Commit is OK - no any error

According to the SQL standard, and in the PostgreSQL implementation
of SERIALIZABLE transactions, commit order does not, by itself,
establish apparent order of execution.

-- implicit tx
mydb=# select * from foo;
id | mynum
----+-------
1 | 20
2 | 20
3 | 20
4 | 20
5 | 10
(5 rows)

As Jeff said, this is consistent with the implicit transaction
running last, so tx1 -> tx2 -> implicit_tx.

Now, you are pretty close to a situation which does need to trigger
a serialization failure -- just switch the commit of tx1 and the
implicit transaction. If tx2 has committed but tx1 has not yet
committed:

mydb=# select * from foo;
id | mynum
----+-------
1 | 10
2 | 10
3 | 10
4 | 10
5 | 10
(5 rows)

*Now* we have a problem -- this only makes sense if the implicit tx
was run after tx2 and before tx1. So apparent order of execution
is tx1 -> tx2 -> implicit_tx -> tx1. There is a cycle in the
apparent order of execution, which causes anomalies which can ruin
data integrity. Now, if the implicit transaction is not
serializable, it is allowed to see such things, but if you make it
serializable (and let's call it tx3 now) it sees a state where only
tx2 ran; tx1 could not have run:

-- tx3
mydb=# begin;
BEGIN
mydb=# set transaction isolation level serializable;
SET
mydb=# select * from foo;
id | mynum
----+-------
1 | 10
2 | 10
3 | 10
4 | 10
5 | 10
(5 rows)

mydb=# commit;
COMMIT

So now, tx1 is not allowed to commit, or for that matter do
anything else -- it has been "doomed" by tx3:

mydb=# select * from foo;
ERROR: could not serialize access due to read/write dependencies
among transactions
DETAIL: Reason code: Canceled on identification as a pivot, during
conflict out checking.
HINT: The transaction might succeed if retried.

Hopefully you are using some framework to automatically detect this
SQLSTATE and retry the transaction from the start. So on retry,
tx1 does this:

-- tx1 (retry)
mydb=# begin;
BEGIN
mydb=# set transaction isolation level serializable;
SET
mydb=# select * from foo where mynum < 100;
id | mynum
----+-------
1 | 10
2 | 10
3 | 10
4 | 10
5 | 10
(5 rows)

mydb=# update foo set mynum = 20 where id < 100;
UPDATE 5
mydb=# select * from foo;
id | mynum
----+-------
1 | 20
2 | 20
3 | 20
4 | 20
5 | 20
(5 rows)

mydb=# commit;
COMMIT

Now the result of all successfully committed serializiable
transactions is consistent with the order tx2 -> tx3 -> tx1. All
is good.

Kevin Grittner

Kevin Grittner

Kevin.Grittner@wicourts.gov

about 10 years ago

In reply to: Alexandru Lazarev (#6)

Re: Fwd: How does PostgreSQL serializable transaction works? (SELECT/UPDATE vs INSERT conflict)

On Thu, Mar 10, 2016 at 1:50 PM, Alexandru Lazarev
<alexandru.lazarev@gmail.com> wrote:

One more "offtop" question - What kind of frameworks do
automatically retries for failed transactions? Are
Hibernate/Spring in that list?

I have seen that done in Hibernate/Spring using dependency
injection to create a transaction manager with the necessary logic.
I was told by the developer that doing so was not trivial, but not
outrageously hard, either.

Every framework may have a different way to do this; I would just
say that any framework which does not provide a reasonable
mechanism for implementing such behavior is not one I would
consider to be mature enough for "prime time" -- although others
might feel differently.

Kevin Grittner

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general