Repeatable Read Isolation Level "transaction start time"

Started by Wizard Bronyover 1 year ago30 messagesgeneral
Jump to latest
#1Wizard Brony
wizardbrony@gmail.com

https://www.postgresql.org/docs/16/transaction-iso.html#XACT-REPEATABLE-READ

The PostgreSQL documentation for the Repeatable Read Isolation Level states the following:

“UPDATE, DELETE, MERGE, SELECT FOR UPDATE, and SELECT FOR SHARE commands behave the same as SELECT in terms of searching for target rows: they will only find target rows that were committed as of the transaction start time.”

What is defined as the "transaction start time?" When I first read the statement, I interpreted it as the start of the transaction:

BEGIN;
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

But in my testing, I find that according to that statement, the transaction start time is actually "the start of the first non-transaction-control statement in the transaction" (as mentioned earlier in the section). Is my conclusion correct, or am I misunderstanding the documentation?

#2David G. Johnston
david.g.johnston@gmail.com
In reply to: Wizard Brony (#1)
Re: Repeatable Read Isolation Level "transaction start time"

On Monday, September 23, 2024, Wizard Brony <wizardbrony@gmail.com> wrote:

https://www.postgresql.org/docs/16/transaction-iso.html#
XACT-REPEATABLE-READ

The PostgreSQL documentation for the Repeatable Read Isolation Level
states the following:

“UPDATE, DELETE, MERGE, SELECT FOR UPDATE, and SELECT FOR SHARE commands
behave the same as SELECT in terms of searching for target rows: they will
only find target rows that were committed as of the transaction start time.”

What is defined as the "transaction start time?" When I first read the
statement, I interpreted it as the start of the transaction:

BEGIN;
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

But in my testing, I find that according to that statement, the
transaction start time is actually "the start of the first
non-transaction-control statement in the transaction" (as mentioned earlier
in the section). Is my conclusion correct, or am I misunderstanding the
documentation?

Probably, since indeed the transaction cannot start at begin because once
it does start it cannot be modified.

David J.

#3Ron
ronljohnsonjr@gmail.com
In reply to: David G. Johnston (#2)
Re: Repeatable Read Isolation Level "transaction start time"

On Tue, Sep 24, 2024 at 8:29 AM David G. Johnston <
david.g.johnston@gmail.com> wrote:

On Monday, September 23, 2024, Wizard Brony <wizardbrony@gmail.com> wrote:

https://www.postgresql.org/docs/16/transaction-iso.html#XACT-REPEATABLE-READ

The PostgreSQL documentation for the Repeatable Read Isolation Level
states the following:

“UPDATE, DELETE, MERGE, SELECT FOR UPDATE, and SELECT FOR SHARE commands
behave the same as SELECT in terms of searching for target rows: they will
only find target rows that were committed as of the transaction start time.”

What is defined as the "transaction start time?" When I first read the
statement, I interpreted it as the start of the transaction:

BEGIN;
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

But in my testing, I find that according to that statement, the
transaction start time is actually "the start of the first
non-transaction-control statement in the transaction" (as mentioned earlier
in the section). Is my conclusion correct, or am I misunderstanding the
documentation?

Probably, since indeed the transaction cannot start at begin because once
it does start it cannot be modified.

Huh?

--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> crustacean!

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Wizard Brony (#1)
Re: Repeatable Read Isolation Level "transaction start time"

Wizard Brony <wizardbrony@gmail.com> writes:

But in my testing, I find that according to that statement, the transaction start time is actually "the start of the first non-transaction-control statement in the transaction" (as mentioned earlier in the section). Is my conclusion correct, or am I misunderstanding the documentation?

It's even looser than that, really: it's the first statement that
requires an MVCC snapshot. From memory, LOCK TABLE is an important
exception --- you can acquire table locks before pinning down
a snapshot, and this is important in some scenarios.

regards, tom lane

#5Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Ron (#3)
Re: Repeatable Read Isolation Level "transaction start time"

On 9/24/24 05:59, Ron Johnson wrote:

On Tue, Sep 24, 2024 at 8:29 AM David G. Johnston
<david.g.johnston@gmail.com <mailto:david.g.johnston@gmail.com>> wrote:

On Monday, September 23, 2024, Wizard Brony <wizardbrony@gmail.com
<mailto:wizardbrony@gmail.com>> wrote:

https://www.postgresql.org/docs/16/transaction-iso.html#XACT-REPEATABLE-READ <https://www.postgresql.org/docs/16/transaction-iso.html#XACT-REPEATABLE-READ&gt;

The PostgreSQL documentation for the Repeatable Read Isolation
Level states the following:

“UPDATE, DELETE, MERGE, SELECT FOR UPDATE, and SELECT FOR SHARE
commands behave the same as SELECT in terms of searching for
target rows: they will only find target rows that were committed
as of the transaction start time.”

What is defined as the "transaction start time?" When I first
read the statement, I interpreted it as the start of the
transaction:

BEGIN;
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

But in my testing, I find that according to that statement, the
transaction start time is actually "the start of the first
non-transaction-control statement in the transaction" (as
mentioned earlier in the section). Is my conclusion correct, or
am I misunderstanding the documentation?

Probably, since indeed the transaction cannot start at begin because
once it does start it cannot be modified.

Huh?

BEGIN;
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

I read it as the transaction does not start at BEGIN because if it did
you could not SET TRANSACTION to change it's characteristics.

The docs go into more detail:

https://www.postgresql.org/docs/current/sql-set-transaction.html

The transaction isolation level cannot be changed after the first query
or data-modification statement (SELECT, INSERT, DELETE, UPDATE, MERGE,
FETCH, or COPY) of a transaction has been executed.

So:

begin ;
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SET
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
SET
select * from csv_test ;
[...]
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
ERROR: SET TRANSACTION ISOLATION LEVEL must be called before any query

--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> crustacean!

--
Adrian Klaver
adrian.klaver@aklaver.com

#6Ron
ronljohnsonjr@gmail.com
In reply to: Adrian Klaver (#5)
Re: Repeatable Read Isolation Level "transaction start time"

On Tue, Sep 24, 2024 at 12:06 PM Adrian Klaver <adrian.klaver@aklaver.com>
wrote:

On 9/24/24 05:59, Ron Johnson wrote:

On Tue, Sep 24, 2024 at 8:29 AM David G. Johnston
<david.g.johnston@gmail.com <mailto:david.g.johnston@gmail.com>> wrote:

On Monday, September 23, 2024, Wizard Brony <wizardbrony@gmail.com
<mailto:wizardbrony@gmail.com>> wrote:

https://www.postgresql.org/docs/16/transaction-iso.html#XACT-REPEATABLE-READ
<
https://www.postgresql.org/docs/16/transaction-iso.html#XACT-REPEATABLE-READ

The PostgreSQL documentation for the Repeatable Read Isolation
Level states the following:

“UPDATE, DELETE, MERGE, SELECT FOR UPDATE, and SELECT FOR SHARE
commands behave the same as SELECT in terms of searching for
target rows: they will only find target rows that were committed
as of the transaction start time.”

What is defined as the "transaction start time?" When I first
read the statement, I interpreted it as the start of the
transaction:

BEGIN;
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

But in my testing, I find that according to that statement, the
transaction start time is actually "the start of the first
non-transaction-control statement in the transaction" (as
mentioned earlier in the section). Is my conclusion correct, or
am I misunderstanding the documentation?

Probably, since indeed the transaction cannot start at begin because
once it does start it cannot be modified.

Huh?

BEGIN;
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

I read it as the transaction does not start at BEGIN because if it did
you could not SET TRANSACTION to change it's characteristics.

The docs go into more detail:

https://www.postgresql.org/docs/current/sql-set-transaction.html

The transaction isolation level cannot be changed after the first query
or data-modification statement (SELECT, INSERT, DELETE, UPDATE, MERGE,
FETCH, or COPY) of a transaction has been executed.

So:

begin ;
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SET
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
SET
select * from csv_test ;
[...]
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
ERROR: SET TRANSACTION ISOLATION LEVEL must be called before any query

Makes sense. Never would have occurred to me to try and change the
isolation level using a second SET TRANSACTION statement, though.

--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> crustacean!

#7Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Ron (#6)
Re: Repeatable Read Isolation Level "transaction start time"

On 9/24/24 09:12, Ron Johnson wrote:

Makes sense.  Never would have occurred to me to try and change the
isolation level using a second SET TRANSACTION statement, though.

From the docs:

https://www.postgresql.org/docs/current/sql-set-transaction.html

The SET TRANSACTION command sets the characteristics of the current
transaction. It has no effect on any subsequent transactions. SET
SESSION CHARACTERISTICS sets the default transaction characteristics for
subsequent transactions of a session. These defaults can be overridden
by SET TRANSACTION for an individual transaction.

--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> crustacean!

--
Adrian Klaver
adrian.klaver@aklaver.com

#8Greg Sabino Mullane
greg@turnstep.com
In reply to: Tom Lane (#4)
Re: Repeatable Read Isolation Level "transaction start time"

On Tue, Sep 24, 2024 at 10:28 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

It's even looser than that, really: it's the first statement that
requires an MVCC snapshot.

Hm....so why does "SELECT 1;" work as a transaction start marker then, as
opposed to "SHOW work_mem;", which does not? Do we simply consider anything
with a SELECT as needing a snapshot?

Cheers,
Greg

#9Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Greg Sabino Mullane (#8)
Re: Repeatable Read Isolation Level "transaction start time"

On 9/25/24 10:22 AM, Greg Sabino Mullane wrote:

On Tue, Sep 24, 2024 at 10:28 AM Tom Lane <tgl@sss.pgh.pa.us
<mailto:tgl@sss.pgh.pa.us>> wrote:

It's even looser than that, really: it's the first statement that
requires an MVCC snapshot.

Hm....so why does "SELECT 1;" work as a transaction start marker then,
as opposed to "SHOW work_mem;", which does not? Do we simply consider
anything with a SELECT as needing a snapshot?

SELECT some_func();

Where some_func() does something that requires a snapshot.

Cheers,
Greg

--
Adrian Klaver
adrian.klaver@aklaver.com

#10Ron
ronljohnsonjr@gmail.com
In reply to: Adrian Klaver (#9)
Re: Repeatable Read Isolation Level "transaction start time"

On Wed, Sep 25, 2024 at 1:45 PM Adrian Klaver <adrian.klaver@aklaver.com>
wrote:

On 9/25/24 10:22 AM, Greg Sabino Mullane wrote:

On Tue, Sep 24, 2024 at 10:28 AM Tom Lane <tgl@sss.pgh.pa.us
<mailto:tgl@sss.pgh.pa.us>> wrote:

It's even looser than that, really: it's the first statement that
requires an MVCC snapshot.

Hm....so why does "SELECT 1;" work as a transaction start marker then,
as opposed to "SHOW work_mem;", which does not? Do we simply consider
anything with a SELECT as needing a snapshot?

SELECT some_func();

Where some_func() does something that requires a snapshot.

But why does "SELECT 1;" need a snapshot? Heck, why does "SELECT
<immutable>;" need a snapshot?

--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> crustacean!

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ron (#10)
Re: Repeatable Read Isolation Level "transaction start time"

Ron Johnson <ronljohnsonjr@gmail.com> writes:

But why does "SELECT 1;" need a snapshot? Heck, why does "SELECT
<immutable>;" need a snapshot?

Because we're not going to analyze the statement in the amount of
depth needed to make that distinction before we crank up the
transactional machinery. If it says SELECT, it gets a snapshot.

regards, tom lane

#12Ron
ronljohnsonjr@gmail.com
In reply to: Tom Lane (#11)
Re: Repeatable Read Isolation Level "transaction start time"

On Wed, Sep 25, 2024 at 1:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Ron Johnson <ronljohnsonjr@gmail.com> writes:

But why does "SELECT 1;" need a snapshot? Heck, why does "SELECT
<immutable>;" need a snapshot?

Because we're not going to analyze the statement in the amount of
depth needed to make that distinction before we crank up the
transactional machinery. If it says SELECT, it gets a snapshot.

Perfectly reasonable.

--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> crustacean!

#13Greg Sabino Mullane
greg@turnstep.com
In reply to: Tom Lane (#11)
Re: Repeatable Read Isolation Level "transaction start time"

On Wed, Sep 25, 2024 at 1:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Because we're not going to analyze the statement in the amount of depth
needed to make that distinction before we crank up the
transactional machinery. If it says SELECT, it gets a snapshot.

Ok, thanks. So to the original poster's point, perhaps the path with the
least side effects / best Principle of Least Surprise (POLS) support is to
start the transaction, and immediately call a "SELECT 1;" or perhaps better
still, a 'SELECT timeofday();'

Cheers,
Greg

#14Ron
ronljohnsonjr@gmail.com
In reply to: Greg Sabino Mullane (#13)
Re: Repeatable Read Isolation Level "transaction start time"

On Wed, Sep 25, 2024 at 4:23 PM Greg Sabino Mullane <htamfids@gmail.com>
wrote:

On Wed, Sep 25, 2024 at 1:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Because we're not going to analyze the statement in the amount of depth
needed to make that distinction before we crank up the
transactional machinery. If it says SELECT, it gets a snapshot.

Ok, thanks. So to the original poster's point, perhaps the path with the
least side effects / best Principle of Least Surprise (POLS) support is to
start the transaction, and immediately call a "SELECT 1;" or perhaps better
still, a 'SELECT timeofday();'

Since transactions should be "as short as possible, without being too
short", how much time is there between when you run "BEGIN;" and the first
"work statement"?

--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> crustacean!

#15Greg Sabino Mullane
greg@turnstep.com
In reply to: Ron (#14)
Re: Repeatable Read Isolation Level "transaction start time"

Since transactions should be "as short as possible, without being too

short", how much time is there between when you run "BEGIN;" and the first
"work statement"?

I don't know that it really matters. For something automated, it would be a
few milliseconds. Either way, I'm sure most people/apps already think of
the initial 'BEGIN ...' as the start of the transaction, and act
accordingly.

Maybe long-term something like

BEGIN ISOLATION MODE REPEATABLE READ SNAPSHOT NOW;

could be useful.

Cheers,
Greg

#16Christophe Pettus
xof@thebuild.com
In reply to: Greg Sabino Mullane (#15)
Re: Repeatable Read Isolation Level "transaction start time"

On Sep 25, 2024, at 13:49, Greg Sabino Mullane <htamfids@gmail.com> wrote:
BEGIN ISOLATION MODE REPEATABLE READ SNAPSHOT NOW;

This might well be a failure of imagination on my part, but when would it pragmatically matter that the snapshot is taken at the first statement as opposed to at BEGIN?

#17Ron
ronljohnsonjr@gmail.com
In reply to: Greg Sabino Mullane (#15)
Re: Repeatable Read Isolation Level "transaction start time"

On Wed, Sep 25, 2024 at 4:50 PM Greg Sabino Mullane <htamfids@gmail.com>
wrote:

Since transactions should be "as short as possible, without being too

short", how much time is there between when you run "BEGIN;" and the first
"work statement"?

I don't know that it really matters. For something automated, it would be
a few milliseconds.

That's what I'm thinking, too. It might cause a problem if you're typing
transaction commands in between drinking coffee and poking around other
PgAdmin tabs, but that's *your* fault, not PG's fault.

Either way, I'm sure most people/apps already think of the initial 'BEGIN
...' as the start of the transaction, and act accordingly.

Maybe long-term something like

BEGIN ISOLATION MODE REPEATABLE READ SNAPSHOT NOW;

Without the "NOW", that's essentially the command used by the legacy rdbms
which I used to work on.

I'm trying to remember, though, if "SET TRANSACTION READ WRITE RESERVING
foo FOR <isolation level>;" (it's syntax for beginning a transaction)
started the transaction, or waited until an "action" statement. Been too
long.

--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> crustacean!

#18Greg Sabino Mullane
greg@turnstep.com
In reply to: Christophe Pettus (#16)
Re: Repeatable Read Isolation Level "transaction start time"

On Wed, Sep 25, 2024 at 4:54 PM Christophe Pettus <xof@thebuild.com> wrote:

On Sep 25, 2024, at 13:49, Greg Sabino Mullane <htamfids@gmail.com> wrote:

BEGIN ISOLATION MODE REPEATABLE READ SNAPSHOT NOW;

This might well be a failure of imagination on my part, but when would it
pragmatically matter that the snapshot is taken at the first statement as
opposed to at BEGIN?

I could imagine lots of cases where you know something is about to happen
(say, a major delete), and you want to get a snapshot of the database as it
existed just before that point. Many people will (quite understandably)
assume that a BEGIN ISOLATION MODE <non read committed>; command would do
just that, and be quite surprised to find that when they actually query the
table in that first process, the rows are not there.

It's certainly a non-intuitive behavior. I understand why we do it this
way, but perhaps this warrants a stronger warning in the docs at least?
It's too late in the day for me to tackle that now, but I'll throw it out
there.

Cheers,
Greg

#19Peter J. Holzer
hjp-pgsql@hjp.at
In reply to: Christophe Pettus (#16)
Re: Repeatable Read Isolation Level "transaction start time"

On 2024-09-25 13:53:30 -0700, Christophe Pettus wrote:

On Sep 25, 2024, at 13:49, Greg Sabino Mullane <htamfids@gmail.com> wrote:
BEGIN ISOLATION MODE REPEATABLE READ SNAPSHOT NOW;

This might well be a failure of imagination on my part, but when would
it pragmatically matter that the snapshot is taken at the first
statement as opposed to at BEGIN?

It may make a difference if you're comparing timestamps.

For example, if you're using isolation level REPEATABLE READ and
(mistakenly) assume that the snapshot is taken at BEGIN, you would
expect any transaction_timestamp() written by a different transaction
and readable by this transaction to be earlier than the
transaction_timestamp() of this transaction.

But that's wrong because the other transaction could have happened
entirely in the time between your BEGIN and the statement which actually
triggers the snapshot.

hp

--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp@hjp.at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"

#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter J. Holzer (#19)
Re: Repeatable Read Isolation Level "transaction start time"

"Peter J. Holzer" <hjp-pgsql@hjp.at> writes:

On 2024-09-25 13:53:30 -0700, Christophe Pettus wrote:

This might well be a failure of imagination on my part, but when would
it pragmatically matter that the snapshot is taken at the first
statement as opposed to at BEGIN?

It may make a difference if you're comparing timestamps.

For example, if you're using isolation level REPEATABLE READ and
(mistakenly) assume that the snapshot is taken at BEGIN, you would
expect any transaction_timestamp() written by a different transaction
and readable by this transaction to be earlier than the
transaction_timestamp() of this transaction.

But that's wrong because the other transaction could have happened
entirely in the time between your BEGIN and the statement which actually
triggers the snapshot.

I don't find that hugely compelling, because there's always going
to be some skew between the time we read the clock for the timestamp
and the time we obtain the snapshot. Admittedly, that would normally
not be a very long interval if BEGIN did both things ... but on a
busy system you could lose the CPU for awhile in between.

regards, tom lane

#21Peter J. Holzer
hjp-pgsql@hjp.at
In reply to: Tom Lane (#20)
#22Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Peter J. Holzer (#21)
#23Greg Sabino Mullane
greg@turnstep.com
In reply to: Adrian Klaver (#22)
#24Tom Lane
tgl@sss.pgh.pa.us
In reply to: Greg Sabino Mullane (#23)
#25Peter J. Holzer
hjp-pgsql@hjp.at
In reply to: Adrian Klaver (#22)
#26Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter J. Holzer (#25)
#27Peter J. Holzer
hjp-pgsql@hjp.at
In reply to: Tom Lane (#26)
#28Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Peter J. Holzer (#27)
#29Peter J. Holzer
hjp-pgsql@hjp.at
In reply to: Adrian Klaver (#28)
#30Greg Sabino Mullane
greg@turnstep.com
In reply to: Tom Lane (#26)