pgsql: Tune GetSnapshotData() during Hot Standby by avoiding loop

Started by Nonameover 15 years ago12 messages
#1Noname
sriggs@postgresql.org

Log Message:
-----------
Tune GetSnapshotData() during Hot Standby by avoiding loop
through normal backends. Makes code clearer also, since we
avoid various Assert()s. Performance of snapshots taken
during recovery no longer depends upon number of read-only
backends.

Modified Files:
--------------
pgsql/src/backend/access/transam:
xlog.c (r1.397 -> r1.398)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/access/transam/xlog.c?r1=1.397&r2=1.398)
pgsql/src/backend/storage/ipc:
procarray.c (r1.62 -> r1.63)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/storage/ipc/procarray.c?r1=1.62&r2=1.63)
pgsql/src/include/access:
xlog.h (r1.106 -> r1.107)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/include/access/xlog.h?r1=1.106&r2=1.107)

#2Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Noname (#1)
Re: pgsql: Tune GetSnapshotData() during Hot Standby by avoiding loop

Simon Riggs wrote:

Log Message:
-----------
Tune GetSnapshotData() during Hot Standby by avoiding loop
through normal backends. Makes code clearer also, since we
avoid various Assert()s. Performance of snapshots taken
during recovery no longer depends upon number of read-only
backends.

I think there's a little race condition there.
snapshot->takenDuringRecovery is set before acquiring ProcArrayLock, so
it's possible that by the time we acquire the lock, we're no longer in
recovery. So this can happen:

1. Backend A starts to take a snapshot, while we're still in recovery.
takenDuringRecovery is assigned true.
2. Recovery ends, and a normal transaction X begins in backend B.
3. A skips scanning ProcArray because takenDuringRecovery is true.

The snapshot doesn't include X, so any changes done in that transaction
will not be visible to the snapshot while the transaction is still
running, but will be after it commits.

Of course, it's highly improbable for 2. to happen, but it's there.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#3Simon Riggs
simon@2ndQuadrant.com
In reply to: Heikki Linnakangas (#2)
Re: [COMMITTERS] pgsql: Tune GetSnapshotData() during Hot Standby by avoiding loop

On Mon, 2010-04-19 at 10:36 +0300, Heikki Linnakangas wrote:

Simon Riggs wrote:

Log Message:
-----------
Tune GetSnapshotData() during Hot Standby by avoiding loop
through normal backends. Makes code clearer also, since we
avoid various Assert()s. Performance of snapshots taken
during recovery no longer depends upon number of read-only
backends.

I think there's a little race condition there.
snapshot->takenDuringRecovery is set before acquiring ProcArrayLock, so
it's possible that by the time we acquire the lock, we're no longer in
recovery. So this can happen:

1. Backend A starts to take a snapshot, while we're still in recovery.
takenDuringRecovery is assigned true.
2. Recovery ends, and a normal transaction X begins in backend B.
3. A skips scanning ProcArray because takenDuringRecovery is true.

The snapshot doesn't include X, so any changes done in that transaction
will not be visible to the snapshot while the transaction is still
running, but will be after it commits.

Of course, it's highly improbable for 2. to happen, but it's there.

The order of events is as you say, though I don't see the problem. The
new xids will be beyond xmax and would be filtered out even if we did
scan the procs, so they will be treated as running, which they are. Xmax
will not have advanced since that relies on completed transactions, not
started ones.

--
Simon Riggs www.2ndQuadrant.com

#4Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Simon Riggs (#3)
Re: [COMMITTERS] pgsql: Tune GetSnapshotData() during Hot Standby by avoiding loop

Simon Riggs wrote:

On Mon, 2010-04-19 at 10:36 +0300, Heikki Linnakangas wrote:

Simon Riggs wrote:

Log Message:
-----------
Tune GetSnapshotData() during Hot Standby by avoiding loop
through normal backends. Makes code clearer also, since we
avoid various Assert()s. Performance of snapshots taken
during recovery no longer depends upon number of read-only
backends.

I think there's a little race condition there.
snapshot->takenDuringRecovery is set before acquiring ProcArrayLock, so
it's possible that by the time we acquire the lock, we're no longer in
recovery. So this can happen:

1. Backend A starts to take a snapshot, while we're still in recovery.
takenDuringRecovery is assigned true.
2. Recovery ends, and a normal transaction X begins in backend B.
3. A skips scanning ProcArray because takenDuringRecovery is true.

The snapshot doesn't include X, so any changes done in that transaction
will not be visible to the snapshot while the transaction is still
running, but will be after it commits.

Of course, it's highly improbable for 2. to happen, but it's there.

The order of events is as you say, though I don't see the problem. The
new xids will be beyond xmax and would be filtered out even if we did
scan the procs, so they will be treated as running, which they are. Xmax
will not have advanced since that relies on completed transactions, not
started ones.

Good point. However, it is theoretically possible that yet another
transaction starts and commits in that same window, bumping xmax.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#5Simon Riggs
simon@2ndQuadrant.com
In reply to: Heikki Linnakangas (#4)
Re: [COMMITTERS] pgsql: Tune GetSnapshotData() during Hot Standby by avoiding loop

On Mon, 2010-04-19 at 11:37 +0300, Heikki Linnakangas wrote:

Simon Riggs wrote:

On Mon, 2010-04-19 at 10:36 +0300, Heikki Linnakangas wrote:

Simon Riggs wrote:

Log Message:
-----------
Tune GetSnapshotData() during Hot Standby by avoiding loop
through normal backends. Makes code clearer also, since we
avoid various Assert()s. Performance of snapshots taken
during recovery no longer depends upon number of read-only
backends.

I think there's a little race condition there.
snapshot->takenDuringRecovery is set before acquiring ProcArrayLock, so
it's possible that by the time we acquire the lock, we're no longer in
recovery. So this can happen:

1. Backend A starts to take a snapshot, while we're still in recovery.
takenDuringRecovery is assigned true.
2. Recovery ends, and a normal transaction X begins in backend B.
3. A skips scanning ProcArray because takenDuringRecovery is true.

The snapshot doesn't include X, so any changes done in that transaction
will not be visible to the snapshot while the transaction is still
running, but will be after it commits.

Of course, it's highly improbable for 2. to happen, but it's there.

The order of events is as you say, though I don't see the problem. The
new xids will be beyond xmax and would be filtered out even if we did
scan the procs, so they will be treated as running, which they are. Xmax
will not have advanced since that relies on completed transactions, not
started ones.

Good point. However, it is theoretically possible that yet another
transaction starts and commits in that same window, bumping xmax.

If the snapshotting backend (#1) froze at the exact point between one
CPU instruction and the next, during which time recovery ends, two new
sessions connect, two new write transactions start (#2 and #3), they
begin to write data and so assign xids, then write WAL, then the #3
writes commit record into WAL, flushes disk (maybe), updates clog and
then is granted procarraylock in exclusive mode before snapshotting
backend attempts to do so, while #2 keeps running.

Yes, it does seem theoretically possible, at that one exact point in
time, never to be repeated.

It doesn't seem to be something we should place highly on the list of
events we need protection from, does it?

--
Simon Riggs www.2ndQuadrant.com

#6Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#5)
Re: Re: [COMMITTERS] pgsql: Tune GetSnapshotData() during Hot Standby by avoiding loop

On Mon, Apr 19, 2010 at 5:05 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

It doesn't seem to be something we should place highly on the list of
events we need protection from, does it?

Since when do we not protect against race-conditions just because
they're low likelihood?

...Robert

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#6)
Re: Re: [COMMITTERS] pgsql: Tune GetSnapshotData() during Hot Standby by avoiding loop

Robert Haas <robertmhaas@gmail.com> writes:

On Mon, Apr 19, 2010 at 5:05 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

It doesn't seem to be something we should place highly on the list of
events we need protection from, does it?

Since when do we not protect against race-conditions just because
they're low likelihood?

Murphy's law says that the probability of any race condition happening
in the field is orders of magnitude higher than you think. This has
been proven true many times ...

regards, tom lane

#8Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#7)
Re: Re: [COMMITTERS] pgsql: Tune GetSnapshotData() during Hot Standby by avoiding loop

On Mon, 2010-04-19 at 10:24 -0400, Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Mon, Apr 19, 2010 at 5:05 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

It doesn't seem to be something we should place highly on the list of
events we need protection from, does it?

Since when do we not protect against race-conditions just because
they're low likelihood?

Murphy's law says that the probability of any race condition happening
in the field is orders of magnitude higher than you think. This has
been proven true many times ...

Choices are

1. Check RecoveryInProgress() once outside of lock, plus wild rumour of
Murphy

2. Check RecoveryInProgress() before and after holding lock

3. Check RecoveryInProgress() while holding lock

All of which perform better than

4. Revert patch

--
Simon Riggs www.2ndQuadrant.com

#9Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Simon Riggs (#8)
Re: Re: [COMMITTERS] pgsql: Tune GetSnapshotData() during Hot Standby by avoiding loop

Simon Riggs wrote:

On Mon, 2010-04-19 at 10:24 -0400, Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Mon, Apr 19, 2010 at 5:05 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

It doesn't seem to be something we should place highly on the list of
events we need protection from, does it?

Since when do we not protect against race-conditions just because
they're low likelihood?

Murphy's law says that the probability of any race condition happening
in the field is orders of magnitude higher than you think. This has
been proven true many times ...

Right. And some future code changes elsewhere could make it more likely,
by the time we've forgotten all about this.

Choices are

1. Check RecoveryInProgress() once outside of lock, plus wild rumour of
Murphy

2. Check RecoveryInProgress() before and after holding lock

3. Check RecoveryInProgress() while holding lock

4. Check RecoveryInProgress() once outside of lock, and scan the
ProcArray anyway, just in case. That's what we did before this patch.
Document that takenDuringRecovery == true means that the snapshot was
most likely taken during recovery, but there is some race conditions
where takenDuringRecovery is true even though the snapshot was taken
just after recovery finished. AFAICS all of the other current uses of
takenDuringRecovery work fine with that.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#10Simon Riggs
simon@2ndQuadrant.com
In reply to: Heikki Linnakangas (#9)
Re: Re: [COMMITTERS] pgsql: Tune GetSnapshotData() during Hot Standby by avoiding loop

On Mon, 2010-04-19 at 17:44 +0300, Heikki Linnakangas wrote:

Choices are

1. Check RecoveryInProgress() once outside of lock, plus wild rumour of
Murphy

2. Check RecoveryInProgress() before and after holding lock

3. Check RecoveryInProgress() while holding lock

4. Check RecoveryInProgress() once outside of lock, and scan the
ProcArray anyway, just in case. That's what we did before this patch.
Document that takenDuringRecovery == true means that the snapshot was
most likely taken during recovery, but there is some race conditions
where takenDuringRecovery is true even though the snapshot was taken
just after recovery finished. AFAICS all of the other current uses of
takenDuringRecovery work fine with that.

Checking RecoveryInProgress() is much cheaper than scanning the whole
ProcArray, so (4) is definitely worse than 1-3.

--
Simon Riggs www.2ndQuadrant.com

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#10)
Re: Re: [COMMITTERS] pgsql: Tune GetSnapshotData() during Hot Standby by avoiding loop

Simon Riggs <simon@2ndQuadrant.com> writes:

On Mon, 2010-04-19 at 17:44 +0300, Heikki Linnakangas wrote:

Choices are
1. Check RecoveryInProgress() once outside of lock, plus wild rumour of
Murphy
2. Check RecoveryInProgress() before and after holding lock
3. Check RecoveryInProgress() while holding lock

4. Check RecoveryInProgress() once outside of lock, and scan the
ProcArray anyway, just in case. That's what we did before this patch.
Document that takenDuringRecovery == true means that the snapshot was
most likely taken during recovery, but there is some race conditions
where takenDuringRecovery is true even though the snapshot was taken
just after recovery finished. AFAICS all of the other current uses of
takenDuringRecovery work fine with that.

Checking RecoveryInProgress() is much cheaper than scanning the whole
ProcArray, so (4) is definitely worse than 1-3.

If the lock we're talking about is an LWLock, #3 is okay. If it's a
spinlock, not so much.

regards, tom lane

#12Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#11)
1 attachment(s)
Re: Re: [COMMITTERS] pgsql: Tune GetSnapshotData() during Hot Standby by avoiding loop

On Mon, 2010-04-19 at 10:55 -0400, Tom Lane wrote:

Simon Riggs <simon@2ndQuadrant.com> writes:

On Mon, 2010-04-19 at 17:44 +0300, Heikki Linnakangas wrote:

Choices are
1. Check RecoveryInProgress() once outside of lock, plus wild rumour of
Murphy
2. Check RecoveryInProgress() before and after holding lock
3. Check RecoveryInProgress() while holding lock

4. Check RecoveryInProgress() once outside of lock, and scan the
ProcArray anyway, just in case. That's what we did before this patch.
Document that takenDuringRecovery == true means that the snapshot was
most likely taken during recovery, but there is some race conditions
where takenDuringRecovery is true even though the snapshot was taken
just after recovery finished. AFAICS all of the other current uses of
takenDuringRecovery work fine with that.

Checking RecoveryInProgress() is much cheaper than scanning the whole
ProcArray, so (4) is definitely worse than 1-3.

If the lock we're talking about is an LWLock, #3 is okay. If it's a
spinlock, not so much.

Just committed the following patch to implement option #3.

We test RecoveryInProgress() after the LWLockAcquire rather than before.

--
Simon Riggs www.2ndQuadrant.com

Attachments:

switcheroo.patchtext/x-patch; charset=UTF-8; name=switcheroo.patchDownload
*** a/src/backend/storage/ipc/procarray.c
--- b/src/backend/storage/ipc/procarray.c
***************
*** 1074,1081 **** GetSnapshotData(Snapshot snapshot)
  					 errmsg("out of memory")));
  	}
  
- 	snapshot->takenDuringRecovery = RecoveryInProgress();
- 
  	/*
  	 * It is sufficient to get shared lock on ProcArrayLock, even if we are
  	 * going to set MyProc->xmin.
--- 1074,1079 ----
***************
*** 1091,1098 **** GetSnapshotData(Snapshot snapshot)
  	globalxmin = xmin = xmax;
  
  	/*
! 	 * If in recovery get any known assigned xids.
  	 */
  	if (!snapshot->takenDuringRecovery)
  	{
  		/*
--- 1089,1103 ----
  	globalxmin = xmin = xmax;
  
  	/*
! 	 * If we're in recovery then snapshot data comes from a different place,
! 	 * so decide which route we take before grab the lock. It is possible
! 	 * for recovery to end before we finish taking snapshot, and for newly
! 	 * assigned transaction ids to be added to the procarray. Xmax cannot
! 	 * change while we hold ProcArrayLock, so those newly added transaction
! 	 * ids would be filtered away, so we need not be concerned about them.
  	 */
+ 	snapshot->takenDuringRecovery = RecoveryInProgress();
+ 
  	if (!snapshot->takenDuringRecovery)
  	{
  		/*