Serializable Transaction Anomoly

Started by PG Bug reporting formover 1 year ago5 messagesdocs
Jump to latest
#1PG Bug reporting form
noreply@postgresql.org

The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/17/transaction-iso.html
Description:

I discovered an oddity in Serializable Transaction behavior and while
referencing the current docs there is a possible contradiction and I'm not
sure if this is a bug or expected behavior. At minimum there seems to be a
contradiction in the Transaction Isolation page of the docs.

1. At the top "serialization anomoly" is defined as "The result of
successfully committing a group of transactions is inconsistent with *all
possible* orderings of running those transactions one at a time." (emphasis
mine).
2. In the first paragraph of 13.2.3, sentence 4 states "In fact, this
isolation level works exactly the same as Repeatable Read except that it
also monitors for conditions which could make execution of a concurrent set
of serializable transactions behave in a manner inconsistent with *all
possible* serial (one at a time) executions of those transactions." (again
I'm emphasizing 'all possible")
3. In the first large paragraph above the unordered list at the bottom it
states "While PostgreSQL's Serializable transaction isolation level only
allows concurrent transactions to commit if it can prove there is a serial
order of execution that would produce the same effect, it doesn't always
prevent errors from being raised that would not occur in true serial
execution." - "if it can prove there is a serial order" implies if it can
find a serial execution of statements that would have the same effect - that
seems at odd with 1. and 2.?

The example I found is caused by poor application code design but based on
the docs I would expect the serialization anomaly detection to report a
concurrent modification. The example I'm looking at assumes there is a
`example` table with id and name.

Serializable Transaction 1:
INSERT INTO example (name) VALUES ('test1') RETURNING id; -- assume it
returns id: 10
-- Don't commit

Serializable Transaction 2:
SELECT * from example WHERE id = 10 FOR UPDATE; -- Other databases block
here, postgreSQL does not and returns 0 rows
UPDATE example SET name = 'test2' WHERE id = 10; -- updates 0 rows because
insert wasn't committed

Serializable Transaction 1:
COMMIT; -- example record with id 10 now exists in the database

Serializable Transaction 2:
COMMIT; -- I expected 40001 error but instead transaction committed without
updating name.

I understand that with Snapshot Isolation the new record doesn't exist when
either SELECT FOR UPDATE or UPDATE execute in the 2nd transaction and the
docs do specify "Predicate locks in PostgreSQL, like in most other database
systems, are based on data actually accessed by a transaction." which
implies if transaction 2 can't see the data it can't predicate lock the
data, And I believe the application code should not have been triggering a
background process (Transaction 2) before Transaction 1 commits because it
could rollback.

This seems to be in alignment with #3 but at odds with #1 and #2. Based on
the docs I referenced above, our observed behavior seems to be in alignment
with some of the documentation but in conflict with others. Am I misreading
#1 and 2?

Thanks,
Daniel

#2Laurenz Albe
laurenz.albe@cybertec.at
In reply to: PG Bug reporting form (#1)
Re: Serializable Transaction Anomoly

On Tue, 2024-11-05 at 15:05 +0000, PG Doc comments form wrote:

I discovered an oddity in Serializable Transaction behavior and while
referencing the current docs there is a possible contradiction and I'm not
sure if this is a bug or expected behavior. At minimum there seems to be a
contradiction in the Transaction Isolation page of the docs.

1. At the top "serialization anomoly" is defined as "The result of
successfully committing a group of transactions is inconsistent with *all
possible* orderings of running those transactions one at a time." (emphasis
mine).
2. In the first paragraph of 13.2.3, sentence 4 states "In fact, this
isolation level works exactly the same as Repeatable Read except that it
also monitors for conditions which could make execution of a concurrent set
of serializable transactions behave in a manner inconsistent with *all
possible* serial (one at a time) executions of those transactions." (again
I'm emphasizing 'all possible")
3. In the first large paragraph above the unordered list at the bottom it
states "While PostgreSQL's Serializable transaction isolation level only
allows concurrent transactions to commit if it can prove there is a serial
order of execution that would produce the same effect, it doesn't always
prevent errors from being raised that would not occur in true serial
execution." - "if it can prove there is a serial order" implies if it can
find a serial execution of statements that would have the same effect - that
seems at odd with 1. and 2.?

I don't see a contradiction.

#1 defines what an anomaly is.

#2 says that if there would be an anomaly with SERIALIZABLE isolation,
you will get a serialization error.
So there cannot be any false negatives.

#3 says that it is possible to get false positive serialization errors,
that is, serialization errors that occur even though the transactions really
would be serializable.

The example I found is caused by poor application code design but based on
the docs I would expect the serialization anomaly detection to report a
concurrent modification. The example I'm looking at assumes there is a
`example` table with id and name.

Serializable Transaction 1:
INSERT INTO example (name) VALUES ('test1') RETURNING id; -- assume it
returns id: 10
-- Don't commit

Serializable Transaction 2:
SELECT * from example WHERE id = 10 FOR UPDATE; -- Other databases block
here, postgreSQL does not and returns 0 rows
UPDATE example SET name = 'test2' WHERE id = 10; -- updates 0 rows because
insert wasn't committed

Serializable Transaction 1:
COMMIT; -- example record with id 10 now exists in the database

Serializable Transaction 2:
COMMIT; -- I expected 40001 error but instead transaction committed without
updating name.

I understand that with Snapshot Isolation the new record doesn't exist when
either SELECT FOR UPDATE or UPDATE execute in the 2nd transaction and the
docs do specify "Predicate locks in PostgreSQL, like in most other database
systems, are based on data actually accessed by a transaction." which
implies if transaction 2 can't see the data it can't predicate lock the
data,  And I believe the application code should not have been triggering a
background process (Transaction 2) before Transaction 1 commits because it
could rollback.

The transactions you show above are serializable: if you execute transaction 2
strictly before transaction 1, you would end up with the same result.
So there is an equivalent serial execution.

Yours,
Laurenz Albe

#3Daniel Bickler
daniel.bickler@goprominent.com
In reply to: Laurenz Albe (#2)
Re: Serializable Transaction Anomoly

Thank you for the prompt response, I’m probably misreading the documentation and how it relates to the example we ran into.

The way I interpreted the documentation, the example I ran into was a false negative according to the definition of a serialization anomaly, because it’s serial in one ordering but not the other which seems incorrect with “all possible”.

I think where I don’t fully understand is the documentation seems to imply all serial orderings must be valid to commit a SERIALIZABLE transaction but it seems like just one serial ordering must be valid?

I appreciate your assistance and responses,
Daniel Bickler

From: Laurenz Albe <laurenz.albe@cybertec.at>
Date: Tuesday, November 5, 2024 at 12:21 PM
To: Daniel Bickler <daniel.bickler@goprominent.com>, pgsql-docs@lists.postgresql.org <pgsql-docs@lists.postgresql.org>
Subject: Re: Serializable Transaction Anomoly
[Image removed by sender.]
IRONSCALES couldn't recognize this email as this is the first time you received an email from this sender laurenz.albe @ cybertec.at

[You don't often get email from laurenz.albe@cybertec.at. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ]

On Tue, 2024-11-05 at 15:05 +0000, PG Doc comments form wrote:

I discovered an oddity in Serializable Transaction behavior and while
referencing the current docs there is a possible contradiction and I'm not
sure if this is a bug or expected behavior. At minimum there seems to be a
contradiction in the Transaction Isolation page of the docs.

1. At the top "serialization anomoly" is defined as "The result of
successfully committing a group of transactions is inconsistent with *all
possible* orderings of running those transactions one at a time." (emphasis
mine).
2. In the first paragraph of 13.2.3, sentence 4 states "In fact, this
isolation level works exactly the same as Repeatable Read except that it
also monitors for conditions which could make execution of a concurrent set
of serializable transactions behave in a manner inconsistent with *all
possible* serial (one at a time) executions of those transactions." (again
I'm emphasizing 'all possible")
3. In the first large paragraph above the unordered list at the bottom it
states "While PostgreSQL's Serializable transaction isolation level only
allows concurrent transactions to commit if it can prove there is a serial
order of execution that would produce the same effect, it doesn't always
prevent errors from being raised that would not occur in true serial
execution." - "if it can prove there is a serial order" implies if it can
find a serial execution of statements that would have the same effect - that
seems at odd with 1. and 2.?

I don't see a contradiction.

#1 defines what an anomaly is.

#2 says that if there would be an anomaly with SERIALIZABLE isolation,
you will get a serialization error.
So there cannot be any false negatives.

#3 says that it is possible to get false positive serialization errors,
that is, serialization errors that occur even though the transactions really
would be serializable.

The example I found is caused by poor application code design but based on
the docs I would expect the serialization anomaly detection to report a
concurrent modification. The example I'm looking at assumes there is a
`example` table with id and name.

Serializable Transaction 1:
INSERT INTO example (name) VALUES ('test1') RETURNING id; -- assume it
returns id: 10
-- Don't commit

Serializable Transaction 2:
SELECT * from example WHERE id = 10 FOR UPDATE; -- Other databases block
here, postgreSQL does not and returns 0 rows
UPDATE example SET name = 'test2' WHERE id = 10; -- updates 0 rows because
insert wasn't committed

Serializable Transaction 1:
COMMIT; -- example record with id 10 now exists in the database

Serializable Transaction 2:
COMMIT; -- I expected 40001 error but instead transaction committed without
updating name.

I understand that with Snapshot Isolation the new record doesn't exist when
either SELECT FOR UPDATE or UPDATE execute in the 2nd transaction and the
docs do specify "Predicate locks in PostgreSQL, like in most other database
systems, are based on data actually accessed by a transaction." which
implies if transaction 2 can't see the data it can't predicate lock the
data, And I believe the application code should not have been triggering a
background process (Transaction 2) before Transaction 1 commits because it
could rollback.

The transactions you show above are serializable: if you execute transaction 2
strictly before transaction 1, you would end up with the same result.
So there is an equivalent serial execution.

Yours,
Laurenz Albe

#4Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Daniel Bickler (#3)
Re: Serializable Transaction Anomoly

On Tue, 2024-11-05 at 18:41 +0000, Daniel Bickler wrote:

The way I interpreted the documentation, the example I ran into was a false negative
according to the definition of a serialization anomaly, because it’s serial in one
ordering but not the other which seems incorrect with “all possible”.
 
I think where I don’t fully understand is the documentation seems to imply all serial
orderings must be valid to commit a SERIALIZABLE transaction but it seems like just
one serial ordering must be valid?

You seem to think that transactions are serializable if their result is consistent
with all possible serial execution orders. But that is not so.

What the documentation says is:
It is an serialization anomaly (that is, not serializable) if the execution is
*in*consistent will all possible serial executions.

That implies: It is serializable if the execution is consistent with one serial
execution.

Yours,
Laurenz Albe

#5Daniel Bickler
daniel.bickler@goprominent.com
In reply to: Laurenz Albe (#4)
Re: Serializable Transaction Anomoly

You are correct, that is where I made the mistake.
Thank you for the clarification; I understand much better now and the behavior we experienced makes sense with the correct reading of the documentation.

Daniel Bickler

From: Laurenz Albe <laurenz.albe@cybertec.at>
Date: Tuesday, November 5, 2024 at 2:17 PM
To: Daniel Bickler <daniel.bickler@goprominent.com>, pgsql-docs@lists.postgresql.org <pgsql-docs@lists.postgresql.org>
Subject: Re: Serializable Transaction Anomoly
[You don't often get email from laurenz.albe@cybertec.at. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ]

On Tue, 2024-11-05 at 18:41 +0000, Daniel Bickler wrote:

The way I interpreted the documentation, the example I ran into was a false negative
according to the definition of a serialization anomaly, because it’s serial in one
ordering but not the other which seems incorrect with “all possible”.

I think where I don’t fully understand is the documentation seems to imply all serial
orderings must be valid to commit a SERIALIZABLE transaction but it seems like just
one serial ordering must be valid?

You seem to think that transactions are serializable if their result is consistent
with all possible serial execution orders. But that is not so.

What the documentation says is:
It is an serialization anomaly (that is, not serializable) if the execution is
*in*consistent will all possible serial executions.

That implies: It is serializable if the execution is consistent with one serial
execution.

Yours,
Laurenz Albe