Review: Hot standby
I think we should avoid the #define's like below which uses a local
variable. I guess the same #define is used elsewhere in the code as well. If
the code is rearranged or the variable name is changed, the code would
break.
The use of malloc should also be avoided (unless the memory subsystem is not
up yet; I haven't checked).
***************
*** 706,713 ****
(errcode(ERRCODE_OUT_OF_MEMORY),
errmsg("out of memory")));
Assert(snapshot->subxip == NULL);
snapshot->subxip = (TransactionId *)
! malloc(arrayP->maxProcs * PGPROC_MAX_CACHED_SUBXIDS *
sizeof(TransactionId));
if (snapshot->subxip == NULL)
ereport(ERROR,
(errcode(ERRCODE_OUT_OF_MEMORY),
--- 1003,1011 ----
(errcode(ERRCODE_OUT_OF_MEMORY),
errmsg("out of memory")));
Assert(snapshot->subxip == NULL);
+ #define maxNumSubXids (arrayP->maxProcs * PGPROC_MAX_CACHED_SUBXIDS)
snapshot->subxip = (TransactionId *)
! malloc(maxNumSubXids * sizeof(TransactionId));
if (snapshot->subxip == NULL)
ereport(ERROR,
(errcode(ERRCODE_OUT_OF_MEMORY),
Thanks,
Pavan
--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com
I wonder if there is corner case below where there are no WAL records to
replay during standby recovery. Specifically, that may cause
IsRunningXactDataValid() to return false since latestObservedXid remains set
to InvalidTransactionId and that prevents postmaster from serving read-only
clients.
I don't have a test case, but I recall seeing the issue while experimenting.
Though that could be because of the WALInsertLock issue reported earlier.
/*
* Can we signal Postmaster to enter consistent recovery
mode?
*
* There are two points in the log that we must pass. The
first
* is minRecoveryPoint, which is the LSN at the time the
* base backup was taken that we are about to rollfoward
from.
* If recovery has ever crashed or was stopped there is also
* another point also: minSafeStartPoint, which we know the
* latest LSN that recovery could have reached prior to
crash.
*
* We must also have assembled sufficient information about
* transaction state to allow valid snapshots to be taken.
*/
if (!reachedSafeStartPoint &&
IsRunningXactDataValid() &&
XLByteLE(ControlFile->minSafeStartPoint, EndRecPtr) &&
XLByteLE(ControlFile->minRecoveryPoint, EndRecPtr))
{
reachedSafeStartPoint = true;
if (InArchiveRecovery)
{
ereport(LOG,
(errmsg("database has now reached consistent
state at %X/%X",
EndRecPtr.xlogid, EndRecPtr.xrecoff)));
InitRecoveryTransactionEnvironment();
StartCleanupDelayStats();
if (IsUnderPostmaster)
SendPostmasterSignal(PMSIGNAL_RECOVERY_START);
}
}
Thanks,
Pavan
--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com
+ /*
+ * Release locks, if any.
+ */
+ RelationReleaseRecoveryLocks(xlrec->slotId);
If I am reading the patch correctly, AccessExclusiveLocks acquired by a
transaction will be released when the transaction is committed or aborted.
If the transaction errors out with FATAL, the locks will be released when
the next transaction occupying the same slot is committed/aborted.
I smell some sort of deadlock condition here. What if the following events
happen:
- transaction A (slot 1) starts and acquires AEL lock on relation
- transaction A errors out with FATAL error
- transaction B (slot 1) starts and requests AEL lock on the same relation
Won't B deadlock with A ? Since B hasn't yet committed/aborted, the lock
held by A is not released and
relation_redo_lock() would indefinitely wait for the lock.
Thanks,
Pavan
--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com
On Fri, 2008-11-21 at 13:22 +0530, Pavan Deolasee wrote:
I think we should avoid the #define's like below which uses a local
variable. I guess the same #define is used elsewhere in the code as
well. If the code is rearranged or the variable name is changed, the
code would break.
I use a #define because same value is used elsewhere in code.
The use of malloc should also be avoided (unless the memory subsystem
is not up yet; I haven't checked).
The malloc was part of the existing code, explained by comments.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
On Fri, 2008-11-21 at 15:21 +0530, Pavan Deolasee wrote:
+ /* + * Release locks, if any. + */ + RelationReleaseRecoveryLocks(xlrec->slotId);If I am reading the patch correctly, AccessExclusiveLocks acquired by
a transaction will be released when the transaction is committed or
aborted. If the transaction errors out with FATAL, the locks will be
released when the next transaction occupying the same slot is
committed/aborted.I smell some sort of deadlock condition here. What if the following
events happen:- transaction A (slot 1) starts and acquires AEL lock on relation
- transaction A errors out with FATAL error
- transaction B (slot 1) starts and requests AEL lock on the same
relationWon't B deadlock with A ? Since B hasn't yet committed/aborted, the
lock held by A is not released and
relation_redo_lock() would indefinitely wait for the lock.
This won't happen because the lock is held by the startup process on
behalf of slot 1. Explained in comments in inval.c code.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
On Fri, Nov 21, 2008 at 6:59 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
The malloc was part of the existing code, explained by comments.
Oh I see. But I don't see any explanations for using malloc instead of
palloc. Not that the current patch is responsible for this, I am wondering
why its done that way and if we are freeing the malloced memory at all ?
malloc is used at another place in a new code. Although it seems that the
allocation happens just once, please check if its better to use palloc
there.
Thanks,
Pavan
--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com
On Sat, 2008-11-22 at 15:14 +0530, Pavan Deolasee wrote:
The malloc was part of the existing code, explained by
comments.Oh I see. But I don't see any explanations for using malloc instead of
palloc. Not that the current patch is responsible for this, I am
wondering why its done that way and if we are freeing the malloced
memory at all ?malloc is used at another place in a new code. Although it seems that
the allocation happens just once, please check if its better to use
palloc there.
Thanks for your comments. OK, here's my thoughts after checking.
The malloc in the current code occurs within GetSnapshotData where it
exists to optimise memory allocation. The memory is allocated once and
never released, so malloc is appropriate.
The patch adds another use of malloc in GetRunningTransactionData()
which is similar to GetSnapshotData in many ways. It only ever runs
within bgwriter, and again, once allocated the memory is never released.
So malloc is appropriate there also.
I will change the #define as you suggested earlier.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Pavan Deolasee escribi�:
On Fri, Nov 21, 2008 at 6:59 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
The malloc was part of the existing code, explained by comments.
Oh I see. But I don't see any explanations for using malloc instead of
palloc. Not that the current patch is responsible for this, I am wondering
why its done that way and if we are freeing the malloced memory at all ?
It's an optimization. We don't ever free it -- we alloc it once (the
first time the snapshot is taken) and then the allocated space is reused
until the backend dies. The reason for not using palloc is that if
you're not going to do any context-related management, what would be the
point? We save the palloc overhead this way (admittedly not a lot).
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
On Fri, 2008-11-21 at 14:17 +0530, Pavan Deolasee wrote:
I wonder if there is corner case
I remain on the lookout for these, so thanks for thinking of this.
below where there are no WAL records to replay during standby
recovery. Specifically, that may cause IsRunningXactDataValid() to
return false since latestObservedXid remains set to
InvalidTransactionId and that prevents postmaster from serving
read-only clients.
I don't think so, because this section of code is only called if there
is redo to perform. If no redo, we never get here.
There certainly is no guarantee that the correct conditions will ever
exist to allow read-only access. If snapshots overflow consistently over
a long period then the correct state will never be reached. That is very
unlikely, but possible. I could prevent the delay, but then the code
would so seldom run that it would probably have bugs, so it didn't seem
worth trying to plug the gap.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
On Sun, Nov 23, 2008 at 3:12 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
I don't think so, because this section of code is only called if there
is redo to perform. If no redo, we never get here.
OK. But why not open up the database for read-only access when there is no
redo action to be done ? Or am I missing something trivial ?
Anyways, I will wait for your latest version to continue with the
test/review.
Thanks,
Pavan
--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com
The following sequence of commands causes server crash.
postgres=# BEGIN TRANSACTION READ WRITE ;
BEGIN
postgres=# SELECT * FROM pg_class FOR UPDATE;
FATAL: cannot assign TransactionIds during recovery
STATEMENT: SELECT * FROM pg_class FOR UPDATE;
FATAL: cannot assign TransactionIds during recovery
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
I think we should disallow starting a read-write transaction during
recovery. The patch attempts to do that by setting transaction mode to
read-only, but not enough to prevent somebody to explicitly mark the
transaction as read-write.
Thanks,
Pavan
--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com
"Pavan Deolasee" <pavan.deolasee@gmail.com> writes:
I think we should disallow starting a read-write transaction during
recovery. The patch attempts to do that by setting transaction mode to
read-only, but not enough to prevent somebody to explicitly mark the
transaction as read-write.
Huh? The "read only" transaction mode is not hard read-only anyway,
so if that's the only step being taken, it's entirely useless.
regards, tom lane
On Tue, Nov 25, 2008 at 6:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Huh? The "read only" transaction mode is not hard read-only anyway,
so if that's the only step being taken, it's entirely useless.
I think there are explicit checks for some utility statements (like VACUUM),
but I haven't checked if all necessary code paths are covered or not.
Thanks,
Pavan
--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com
On Tue, 2008-11-25 at 19:02 +0530, Pavan Deolasee wrote:
On Tue, Nov 25, 2008 at 6:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Huh? The "read only" transaction mode is not hard read-only
anyway,
so if that's the only step being taken, it's entirely useless.I think there are explicit checks for some utility statements (like
VACUUM), but I haven't checked if all necessary code paths are covered
or not.
The commands that need protecting have been explicitly identified in the
notes and there are 7 files changed that were specifically identified
with protective changes.
You've identified a way of breaking part the first line of defence, but
the command was caught by the second line of defence in the patch.
Problem, yes. Major issue, no. Will fix.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
ISTM that the redo conflict resolution is not working as intended. I did the
following test and it throws some surprises.
On standby:
postgres=# BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE ;
BEGIN
postgres=# SELECT * from test;
a | b
-----+---
102 |
103 |
(2 rows)
On primary:
postgres=# SELECT * from test;
a | b
-----+---
102 |
103 |
(2 rows)
postgres=#
postgres=# UPDATE test SET a = a + 100;
UPDATE 2
postgres=# VACUUM test;
VACUUM
postgres=# SELECT pg_switch_xlog();
pg_switch_xlog
----------------
0/2D000288
(1 row)
On standby (server log):
LOG: restored log file "00000001000000000000002D" from archive
LOG: recovery cancels activity of virtual transaction 2/2 pid 10593 because
it blocks exclusive locks (current delay now 5 secs)
CONTEXT: xlog redo exclusive relation lock: slot 99 db 11517 rel 24576
LOG: recovery cancels activity of virtual transaction 2/2 pid 10593 because
it blocks exclusive locks (current delay now 5 secs)
CONTEXT: xlog redo exclusive relation lock: slot 99 db 11517 rel 24576
LOG: recovery cancels activity of virtual transaction 2/2 pid 10593 because
it blocks exclusive locks (current delay now 5 secs)
CONTEXT: xlog redo exclusive relation lock: slot 99 db 11517 rel 24576
LOG: recovery cancels activity of virtual transaction 2/2 pid 10593 because
it blocks exclusive locks (current delay now 5 secs)
CONTEXT: xlog redo exclusive relation lock: slot 99 db 11517 rel 24576
LOG: recovery cancels activity of virtual transaction 2/2 pid 10593 because
it blocks exclusive locks (current delay now 5 secs)
CONTEXT: xlog redo exclusive relation lock: slot 99 db 11517 rel 24576
LOG: recovery cancels activity of virtual transaction 2/2 pid 10593 because
it blocks exclusive locks (current delay now 5 secs)
CONTEXT: xlog redo exclusive relation lock: slot 99 db 11517 rel 24576
LOG: recovery cancels activity of virtual transaction 2/2 pid 10593 because
it blocks exclusive locks (current delay now 5 secs)
CONTEXT: xlog redo exclusive relation lock: slot 99 db 11517 rel 24576
<same message repeated>
The open transaction (see above) on the standby is not still not aborted and
if I query the table in the same transaction, I get:
(Note: the transaction is still open)
postgres=#
postgres=# SELECT * from test;
a | b
---+---
(0 rows)
I think whats happening is that ResolveRecoveryConflictWithVirtualXIDs() is
failing to abort the open transaction and it keeps trying for that,
everytime doubling the sleep time, so the LOG messages come less frequently
later, but they are never ending. Soon the sleep becomes exponentially
large.
Even though the standby has a open transaction, its obvious that the
cleanup_redo has also failed to abort the transaction. Thats why the tuples
have disappeared from the standby (most likely because they are cleaned up
by VACUUM).
Thanks,
Pavan
--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com
On Wed, Nov 26, 2008 at 3:52 PM, Pavan Deolasee <pavan.deolasee@gmail.com>wrote:
I think whats happening is that ResolveRecoveryConflictWithVirtualXIDs() is
failing to abort the open transaction
Btw, ISTM that SIGINT works only for statement cancellation. So if the
transaction is in idle state, SIGINT has nothing to cancel and hence also
fails to abort the transaction.
Thanks,
Pavan
--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com
On Wed, 2008-11-26 at 18:02 +0530, Pavan Deolasee wrote:
I think whats happening is that
ResolveRecoveryConflictWithVirtualXIDs() is failing to abort
the open transactionBtw, ISTM that SIGINT works only for statement cancellation. So if the
transaction is in idle state, SIGINT has nothing to cancel and hence
also fails to abort the transaction.
[If I read this correctly this second post is the cause of the first
problem, so we have one problem, rather than two.]
Understood; yes that seems to be a problem.
I will propose a solution later today. (I've been tied up with a few
things over last few days, but I'm free of that now).
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
On Thu, 2008-11-27 at 17:14 +0000, Simon Riggs wrote:
On Wed, 2008-11-26 at 18:02 +0530, Pavan Deolasee wrote:
I think whats happening is that
ResolveRecoveryConflictWithVirtualXIDs() is failing to abort
the open transactionBtw, ISTM that SIGINT works only for statement cancellation. So if the
transaction is in idle state, SIGINT has nothing to cancel and hence
also fails to abort the transaction.[If I read this correctly this second post is the cause of the first
problem, so we have one problem, rather than two.]Understood; yes that seems to be a problem.
After some thought, the way I would handle this is by sending a slightly
different kind of signal.
We can send a shared invalidation message which means "end the
transaction, whether or not you are currently running a statement". We
would then send the backend a SIGUSR1 to ensure that it reads the shared
invalidation message as soon as possible. This is easily possible with
the new sinval processing for 8.4.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Simon Riggs <simon@2ndQuadrant.com> writes:
After some thought, the way I would handle this is by sending a slightly
different kind of signal.
We can send a shared invalidation message which means "end the
transaction, whether or not you are currently running a statement".
No, a thousand times no. The sinval queue is an *utterly* inappropriate
mechanism for such a thing.
regards, tom lane
On Fri, 2008-11-28 at 11:14 -0500, Tom Lane wrote:
Simon Riggs <simon@2ndQuadrant.com> writes:
After some thought, the way I would handle this is by sending a slightly
different kind of signal.We can send a shared invalidation message which means "end the
transaction, whether or not you are currently running a statement".No, a thousand times no.
So you're against it? ;-)
The sinval queue is an *utterly* inappropriate
mechanism for such a thing.
To be honest, it did seem quite a neat solution. Any particular
direction of thought you'd like me to pursue instead?
Asking the backend to kill itself is much cleaner than the other ways I
imagined. So my other thoughts steer towards hijacking the SIGUSR1
signal somehow for my nefarious purposes. Would that way sound OK?
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Simon Riggs <simon@2ndQuadrant.com> writes:
On Fri, 2008-11-28 at 11:14 -0500, Tom Lane wrote:
The sinval queue is an *utterly* inappropriate
mechanism for such a thing.
To be honest, it did seem quite a neat solution. Any particular
direction of thought you'd like me to pursue instead?
I hadn't been following the discussion closely enough to know what the
problem is. But "cancel the current transaction" is far outside the
bounds of what sinval events are supposed to do. If you try to do that
we'll forever be fighting bugs in code that expected
AcceptInvalidationMessages to do no more than invalidate cache entries.
regards, tom lane
On Fri, 2008-11-28 at 11:44 -0500, Tom Lane wrote:
Simon Riggs <simon@2ndQuadrant.com> writes:
On Fri, 2008-11-28 at 11:14 -0500, Tom Lane wrote:
The sinval queue is an *utterly* inappropriate
mechanism for such a thing.To be honest, it did seem quite a neat solution. Any particular
direction of thought you'd like me to pursue instead?I hadn't been following the discussion closely enough to know what the
problem is.
When we replay an AccessExclusiveLock on the standby we need to kick off
any current lock holders, after a configurable grace period. Current
lock holders may include some read-only backends that are
idle-in-transaction. SIGINT, which is what the current patch uses, is
not sufficient to dislodge the idle backends.
So we need to send a signal to the idle backends and then have them
react. We could use a multi-meaning approach for SIGUSR1 as we do for
pmsignal, or ...
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Simon Riggs <simon@2ndQuadrant.com> writes:
On Fri, 2008-11-28 at 11:44 -0500, Tom Lane wrote:
I hadn't been following the discussion closely enough to know what the
problem is.
When we replay an AccessExclusiveLock on the standby we need to kick off
any current lock holders, after a configurable grace period. Current
lock holders may include some read-only backends that are
idle-in-transaction. SIGINT, which is what the current patch uses, is
not sufficient to dislodge the idle backends.
Hm. People have complained of that fact from time to time in normal
usage as well. Should we simply change SIGINT handling to allow it to
cancel an idle transaction?
regards, tom lane
On Fri, 2008-11-28 at 12:45 -0500, Tom Lane wrote:
Simon Riggs <simon@2ndQuadrant.com> writes:
On Fri, 2008-11-28 at 11:44 -0500, Tom Lane wrote:
I hadn't been following the discussion closely enough to know what the
problem is.When we replay an AccessExclusiveLock on the standby we need to kick off
any current lock holders, after a configurable grace period. Current
lock holders may include some read-only backends that are
idle-in-transaction. SIGINT, which is what the current patch uses, is
not sufficient to dislodge the idle backends.Hm. People have complained of that fact from time to time in normal
usage as well. Should we simply change SIGINT handling to allow it to
cancel an idle transaction?
Yes, that is by far the best solution. ISTM many people will be happy.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
On Fri, Nov 28, 2008 at 6:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Hm. People have complained of that fact from time to time in normal
usage as well. Should we simply change SIGINT handling to allow it to
cancel an idle transaction?
If this means that we would be able to able to easily rollback the
current transaction of a backend stuck in "idle in transaction"
status, a big +1.
--
Guillaume
Guillaume Smet wrote:
On Fri, Nov 28, 2008 at 6:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Hm. People have complained of that fact from time to time in normal
usage as well. Should we simply change SIGINT handling to allow it to
cancel an idle transaction?If this means that we would be able to able to easily rollback the
current transaction of a backend stuck in "idle in transaction"
status, a big +1.
Yes, I have had clients caught by this regularly (last time less than a
week ago), it would be a significant enhancement.
cheers
andrew
On Fri, 2008-11-28 at 12:45 -0500, Tom Lane wrote:
Simon Riggs <simon@2ndQuadrant.com> writes:
On Fri, 2008-11-28 at 11:44 -0500, Tom Lane wrote:
I hadn't been following the discussion closely enough to know what the
problem is.When we replay an AccessExclusiveLock on the standby we need to kick off
any current lock holders, after a configurable grace period. Current
lock holders may include some read-only backends that are
idle-in-transaction. SIGINT, which is what the current patch uses, is
not sufficient to dislodge the idle backends.Hm. People have complained of that fact from time to time in normal
usage as well. Should we simply change SIGINT handling to allow it to
cancel an idle transaction?
I'm looking at allowing SIGINT cancel an idle transaction.
Just edit StatementCancelHandler() in postgres.c, so that it doesn't
ignore a signal when DoingCommandRead at line 2577.
This patch does actually do what I wanted, but it has some unintended
consequences as well. These mask the fact that it does actually work,
which is confusing and has taken me a while to understand.
The backend accepts the signal and throws an error which then cancels
the transaction. The ERROR appears in the log immediately. However, a
psql client does not respond in any way when this occurs and only when a
new request is sent do we then generate the ERROR message on the client.
pg_stat_activity continues to show "<IDLE> in transaction", even after
global xmin is higher than the xid of the cancelled backend.
Then afterwards the client gets out of sync with the server and starts
putting replies on the wrong messages. Wow.
I'm not sure why recv() doesn't return with EINTR, but I guess I'm about
to find out. Hopefully?
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Attachments:
idle_sigint.patchtext/x-patch; charset=utf-8; name=idle_sigint.patchDownload
Index: src/backend/tcop/postgres.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/tcop/postgres.c,v
retrieving revision 1.562
diff -c -r1.562 postgres.c
*** src/backend/tcop/postgres.c 13 Dec 2008 02:29:21 -0000 1.562
--- src/backend/tcop/postgres.c 16 Dec 2008 17:02:05 -0000
***************
*** 2569,2580 ****
QueryCancelPending = true;
/*
! * If it's safe to interrupt, and we're waiting for a lock, service
! * the interrupt immediately. No point in interrupting if we're
! * waiting for input, however.
*/
! if (ImmediateInterruptOK && InterruptHoldoffCount == 0 &&
! CritSectionCount == 0 && !DoingCommandRead)
{
/* bump holdoff count to make ProcessInterrupts() a no-op */
/* until we are done getting ready for it */
--- 2569,2580 ----
QueryCancelPending = true;
/*
! * Service the interrupt immediately if we are waiting for input,
! * or if it's safe to interrupt, and we're waiting for a lock.
*/
! if (DoingCommandRead ||
! (ImmediateInterruptOK && InterruptHoldoffCount == 0 &&
! CritSectionCount == 0))
{
/* bump holdoff count to make ProcessInterrupts() a no-op */
/* until we are done getting ready for it */