Review: Hot standby

Started by Pavan Deolaseeover 17 years ago27 messageshackers

pavan.deolasee@gmail.com

over 17 years ago

I think we should avoid the #define's like below which uses a local
variable. I guess the same #define is used elsewhere in the code as well. If
the code is rearranged or the variable name is changed, the code would
break.

The use of malloc should also be avoided (unless the memory subsystem is not
up yet; I haven't checked).

***************
*** 706,713 ****
                    (errcode(ERRCODE_OUT_OF_MEMORY),
                     errmsg("out of memory")));
        Assert(snapshot->subxip == NULL);
        snapshot->subxip = (TransactionId *)
!           malloc(arrayP->maxProcs * PGPROC_MAX_CACHED_SUBXIDS *
sizeof(TransactionId));
        if (snapshot->subxip == NULL)
            ereport(ERROR,
                    (errcode(ERRCODE_OUT_OF_MEMORY),
--- 1003,1011 ----
                    (errcode(ERRCODE_OUT_OF_MEMORY),
                     errmsg("out of memory")));
        Assert(snapshot->subxip == NULL);
+ #define maxNumSubXids (arrayP->maxProcs * PGPROC_MAX_CACHED_SUBXIDS)
        snapshot->subxip = (TransactionId *)
!           malloc(maxNumSubXids * sizeof(TransactionId));
        if (snapshot->subxip == NULL)
            ereport(ERROR,
                    (errcode(ERRCODE_OUT_OF_MEMORY),

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com

Pavan Deolasee

pavan.deolasee@gmail.com

over 17 years ago

In reply to: Pavan Deolasee (#1)

Re: Review: Hot standby

I wonder if there is corner case below where there are no WAL records to
replay during standby recovery. Specifically, that may cause
IsRunningXactDataValid() to return false since latestObservedXid remains set
to InvalidTransactionId and that prevents postmaster from serving read-only
clients.

I don't have a test case, but I recall seeing the issue while experimenting.
Though that could be because of the WALInsertLock issue reported earlier.

/*
* Can we signal Postmaster to enter consistent recovery
mode?
*
* There are two points in the log that we must pass. The
first
* is minRecoveryPoint, which is the LSN at the time the
* base backup was taken that we are about to rollfoward
from.
* If recovery has ever crashed or was stopped there is also
* another point also: minSafeStartPoint, which we know the
* latest LSN that recovery could have reached prior to
crash.
*
* We must also have assembled sufficient information about
* transaction state to allow valid snapshots to be taken.
*/
if (!reachedSafeStartPoint &&
IsRunningXactDataValid() &&
XLByteLE(ControlFile->minSafeStartPoint, EndRecPtr) &&
XLByteLE(ControlFile->minRecoveryPoint, EndRecPtr))
{
reachedSafeStartPoint = true;
if (InArchiveRecovery)
{
ereport(LOG,
(errmsg("database has now reached consistent
state at %X/%X",
EndRecPtr.xlogid, EndRecPtr.xrecoff)));
InitRecoveryTransactionEnvironment();
StartCleanupDelayStats();
if (IsUnderPostmaster)
SendPostmasterSignal(PMSIGNAL_RECOVERY_START);
}
}

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com

Pavan Deolasee

pavan.deolasee@gmail.com

over 17 years ago

In reply to: Pavan Deolasee (#2)

Re: Review: Hot standby

+       /*
+        * Release locks, if any.
+        */
+       RelationReleaseRecoveryLocks(xlrec->slotId);

If I am reading the patch correctly, AccessExclusiveLocks acquired by a
transaction will be released when the transaction is committed or aborted.
If the transaction errors out with FATAL, the locks will be released when
the next transaction occupying the same slot is committed/aborted.

I smell some sort of deadlock condition here. What if the following events
happen:

- transaction A (slot 1) starts and acquires AEL lock on relation
- transaction A errors out with FATAL error
- transaction B (slot 1) starts and requests AEL lock on the same relation

Won't B deadlock with A ? Since B hasn't yet committed/aborted, the lock
held by A is not released and
relation_redo_lock() would indefinitely wait for the lock.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com

Simon Riggs

simon@2ndQuadrant.com

over 17 years ago

In reply to: Pavan Deolasee (#1)

Re: Review: Hot standby

On Fri, 2008-11-21 at 13:22 +0530, Pavan Deolasee wrote:

I think we should avoid the #define's like below which uses a local
variable. I guess the same #define is used elsewhere in the code as
well. If the code is rearranged or the variable name is changed, the
code would break.

I use a #define because same value is used elsewhere in code.

The use of malloc should also be avoided (unless the memory subsystem
is not up yet; I haven't checked).

The malloc was part of the existing code, explained by comments.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

Simon Riggs

simon@2ndQuadrant.com

over 17 years ago

In reply to: Pavan Deolasee (#3)

Re: Review: Hot standby

On Fri, 2008-11-21 at 15:21 +0530, Pavan Deolasee wrote:

+       /*
+        * Release locks, if any.
+        */
+       RelationReleaseRecoveryLocks(xlrec->slotId);
If I am reading the patch correctly, AccessExclusiveLocks acquired by
a transaction will be released when the transaction is committed or
aborted. If the transaction errors out with FATAL, the locks will be
released when the next transaction occupying the same slot is
committed/aborted.

I smell some sort of deadlock condition here. What if the following
events happen:

- transaction A (slot 1) starts and acquires AEL lock on relation
- transaction A errors out with FATAL error
- transaction B (slot 1) starts and requests AEL lock on the same
relation

Won't B deadlock with A ? Since B hasn't yet committed/aborted, the
lock held by A is not released and
relation_redo_lock() would indefinitely wait for the lock.

This won't happen because the lock is held by the startup process on
behalf of slot 1. Explained in comments in inval.c code.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

Pavan Deolasee

pavan.deolasee@gmail.com

over 17 years ago

In reply to: Simon Riggs (#4)

Re: Review: Hot standby

On Fri, Nov 21, 2008 at 6:59 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

The malloc was part of the existing code, explained by comments.

Oh I see. But I don't see any explanations for using malloc instead of
palloc. Not that the current patch is responsible for this, I am wondering
why its done that way and if we are freeing the malloced memory at all ?

malloc is used at another place in a new code. Although it seems that the
allocation happens just once, please check if its better to use palloc
there.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com

Simon Riggs

simon@2ndQuadrant.com

over 17 years ago

In reply to: Pavan Deolasee (#6)

Re: Review: Hot standby

On Sat, 2008-11-22 at 15:14 +0530, Pavan Deolasee wrote:

The malloc was part of the existing code, explained by
comments.

Oh I see. But I don't see any explanations for using malloc instead of
palloc. Not that the current patch is responsible for this, I am
wondering why its done that way and if we are freeing the malloced
memory at all ?

malloc is used at another place in a new code. Although it seems that
the allocation happens just once, please check if its better to use
palloc there.

Thanks for your comments. OK, here's my thoughts after checking.

The malloc in the current code occurs within GetSnapshotData where it
exists to optimise memory allocation. The memory is allocated once and
never released, so malloc is appropriate.

The patch adds another use of malloc in GetRunningTransactionData()
which is similar to GetSnapshotData in many ways. It only ever runs
within bgwriter, and again, once allocated the memory is never released.
So malloc is appropriate there also.

I will change the #define as you suggested earlier.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

Alvaro Herrera

alvherre@2ndquadrant.com

over 17 years ago

In reply to: Pavan Deolasee (#6)

Re: Review: Hot standby

Pavan Deolasee escribiï¿½:

On Fri, Nov 21, 2008 at 6:59 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

The malloc was part of the existing code, explained by comments.

Oh I see. But I don't see any explanations for using malloc instead of
palloc. Not that the current patch is responsible for this, I am wondering
why its done that way and if we are freeing the malloced memory at all ?

It's an optimization. We don't ever free it -- we alloc it once (the
first time the snapshot is taken) and then the allocated space is reused
until the backend dies. The reason for not using palloc is that if
you're not going to do any context-related management, what would be the
point? We save the palloc overhead this way (admittedly not a lot).

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Simon Riggs

simon@2ndQuadrant.com

over 17 years ago

In reply to: Pavan Deolasee (#2)

Re: Review: Hot standby

On Fri, 2008-11-21 at 14:17 +0530, Pavan Deolasee wrote:

I wonder if there is corner case

I remain on the lookout for these, so thanks for thinking of this.

below where there are no WAL records to replay during standby
recovery. Specifically, that may cause IsRunningXactDataValid() to
return false since latestObservedXid remains set to
InvalidTransactionId and that prevents postmaster from serving
read-only clients.

I don't think so, because this section of code is only called if there
is redo to perform. If no redo, we never get here.

There certainly is no guarantee that the correct conditions will ever
exist to allow read-only access. If snapshots overflow consistently over
a long period then the correct state will never be reached. That is very
unlikely, but possible. I could prevent the delay, but then the code
would so seldom run that it would probably have bugs, so it didn't seem
worth trying to plug the gap.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

#10

Pavan Deolasee

pavan.deolasee@gmail.com

over 17 years ago

In reply to: Simon Riggs (#9)

Re: Review: Hot standby

On Sun, Nov 23, 2008 at 3:12 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

I don't think so, because this section of code is only called if there
is redo to perform. If no redo, we never get here.

OK. But why not open up the database for read-only access when there is no
redo action to be done ? Or am I missing something trivial ?

Anyways, I will wait for your latest version to continue with the
test/review.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com

#11

Pavan Deolasee

pavan.deolasee@gmail.com

over 17 years ago

In reply to: Pavan Deolasee (#10)

Re: Review: Hot standby

The following sequence of commands causes server crash.

postgres=# BEGIN TRANSACTION READ WRITE ;
BEGIN
postgres=# SELECT * FROM pg_class FOR UPDATE;
FATAL: cannot assign TransactionIds during recovery
STATEMENT: SELECT * FROM pg_class FOR UPDATE;
FATAL: cannot assign TransactionIds during recovery
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.

I think we should disallow starting a read-write transaction during
recovery. The patch attempts to do that by setting transaction mode to
read-only, but not enough to prevent somebody to explicitly mark the
transaction as read-write.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com

#12

Tom Lane

tgl@sss.pgh.pa.us

over 17 years ago

In reply to: Pavan Deolasee (#11)

Re: Review: Hot standby

"Pavan Deolasee" <pavan.deolasee@gmail.com> writes:

I think we should disallow starting a read-write transaction during
recovery. The patch attempts to do that by setting transaction mode to
read-only, but not enough to prevent somebody to explicitly mark the
transaction as read-write.

Huh? The "read only" transaction mode is not hard read-only anyway,
so if that's the only step being taken, it's entirely useless.

regards, tom lane

#13

Pavan Deolasee

pavan.deolasee@gmail.com

over 17 years ago

In reply to: Tom Lane (#12)

Re: Review: Hot standby

On Tue, Nov 25, 2008 at 6:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Huh? The "read only" transaction mode is not hard read-only anyway,
so if that's the only step being taken, it's entirely useless.

I think there are explicit checks for some utility statements (like VACUUM),
but I haven't checked if all necessary code paths are covered or not.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com

#14

Simon Riggs

simon@2ndQuadrant.com

over 17 years ago

In reply to: Pavan Deolasee (#13)

Re: Review: Hot standby

On Tue, 2008-11-25 at 19:02 +0530, Pavan Deolasee wrote:

On Tue, Nov 25, 2008 at 6:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Huh? The "read only" transaction mode is not hard read-only
anyway,
so if that's the only step being taken, it's entirely useless.

I think there are explicit checks for some utility statements (like
VACUUM), but I haven't checked if all necessary code paths are covered
or not.

The commands that need protecting have been explicitly identified in the
notes and there are 7 files changed that were specifically identified
with protective changes.

You've identified a way of breaking part the first line of defence, but
the command was caught by the second line of defence in the patch.

Problem, yes. Major issue, no. Will fix.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

#15

Pavan Deolasee

pavan.deolasee@gmail.com

over 17 years ago

In reply to: Simon Riggs (#14)

Re: Review: Hot standby

ISTM that the redo conflict resolution is not working as intended. I did the
following test and it throws some surprises.

On standby:

postgres=# BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE ;
BEGIN
postgres=# SELECT * from test;
a | b
-----+---
102 |
103 |
(2 rows)

On primary:

postgres=# SELECT * from test;
a | b
-----+---
102 |
103 |
(2 rows)

postgres=#
postgres=# UPDATE test SET a = a + 100;
UPDATE 2
postgres=# VACUUM test;
VACUUM
postgres=# SELECT pg_switch_xlog();
pg_switch_xlog
----------------
0/2D000288
(1 row)

On standby (server log):

LOG: restored log file "00000001000000000000002D" from archive
LOG: recovery cancels activity of virtual transaction 2/2 pid 10593 because
it blocks exclusive locks (current delay now 5 secs)
CONTEXT: xlog redo exclusive relation lock: slot 99 db 11517 rel 24576
LOG: recovery cancels activity of virtual transaction 2/2 pid 10593 because
it blocks exclusive locks (current delay now 5 secs)
CONTEXT: xlog redo exclusive relation lock: slot 99 db 11517 rel 24576
LOG: recovery cancels activity of virtual transaction 2/2 pid 10593 because
it blocks exclusive locks (current delay now 5 secs)
CONTEXT: xlog redo exclusive relation lock: slot 99 db 11517 rel 24576
LOG: recovery cancels activity of virtual transaction 2/2 pid 10593 because
it blocks exclusive locks (current delay now 5 secs)
CONTEXT: xlog redo exclusive relation lock: slot 99 db 11517 rel 24576
LOG: recovery cancels activity of virtual transaction 2/2 pid 10593 because
it blocks exclusive locks (current delay now 5 secs)
CONTEXT: xlog redo exclusive relation lock: slot 99 db 11517 rel 24576
LOG: recovery cancels activity of virtual transaction 2/2 pid 10593 because
it blocks exclusive locks (current delay now 5 secs)
CONTEXT: xlog redo exclusive relation lock: slot 99 db 11517 rel 24576
LOG: recovery cancels activity of virtual transaction 2/2 pid 10593 because
it blocks exclusive locks (current delay now 5 secs)
CONTEXT: xlog redo exclusive relation lock: slot 99 db 11517 rel 24576
<same message repeated>

The open transaction (see above) on the standby is not still not aborted and
if I query the table in the same transaction, I get:

(Note: the transaction is still open)
postgres=#
postgres=# SELECT * from test;
a | b
---+---
(0 rows)

I think whats happening is that ResolveRecoveryConflictWithVirtualXIDs() is
failing to abort the open transaction and it keeps trying for that,
everytime doubling the sleep time, so the LOG messages come less frequently
later, but they are never ending. Soon the sleep becomes exponentially
large.

Even though the standby has a open transaction, its obvious that the
cleanup_redo has also failed to abort the transaction. Thats why the tuples
have disappeared from the standby (most likely because they are cleaned up
by VACUUM).

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com

#16

Pavan Deolasee

pavan.deolasee@gmail.com

over 17 years ago

In reply to: Pavan Deolasee (#15)

Re: Review: Hot standby

On Wed, Nov 26, 2008 at 3:52 PM, Pavan Deolasee <pavan.deolasee@gmail.com>wrote:

I think whats happening is that ResolveRecoveryConflictWithVirtualXIDs() is
failing to abort the open transaction

Btw, ISTM that SIGINT works only for statement cancellation. So if the
transaction is in idle state, SIGINT has nothing to cancel and hence also
fails to abort the transaction.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com

#17

Simon Riggs

simon@2ndQuadrant.com

over 17 years ago

In reply to: Pavan Deolasee (#16)

Re: Review: Hot standby

On Wed, 2008-11-26 at 18:02 +0530, Pavan Deolasee wrote:

I think whats happening is that
ResolveRecoveryConflictWithVirtualXIDs() is failing to abort
the open transaction

Btw, ISTM that SIGINT works only for statement cancellation. So if the
transaction is in idle state, SIGINT has nothing to cancel and hence
also fails to abort the transaction.

[If I read this correctly this second post is the cause of the first
problem, so we have one problem, rather than two.]

Understood; yes that seems to be a problem.

I will propose a solution later today. (I've been tied up with a few
things over last few days, but I'm free of that now).

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

#18

Simon Riggs

simon@2ndQuadrant.com

over 17 years ago

In reply to: Simon Riggs (#17)

Re: Review: Hot standby

On Thu, 2008-11-27 at 17:14 +0000, Simon Riggs wrote:

On Wed, 2008-11-26 at 18:02 +0530, Pavan Deolasee wrote:

I think whats happening is that
ResolveRecoveryConflictWithVirtualXIDs() is failing to abort
the open transaction

Btw, ISTM that SIGINT works only for statement cancellation. So if the
transaction is in idle state, SIGINT has nothing to cancel and hence
also fails to abort the transaction.

[If I read this correctly this second post is the cause of the first
problem, so we have one problem, rather than two.]

Understood; yes that seems to be a problem.

After some thought, the way I would handle this is by sending a slightly
different kind of signal.

We can send a shared invalidation message which means "end the
transaction, whether or not you are currently running a statement". We
would then send the backend a SIGUSR1 to ensure that it reads the shared
invalidation message as soon as possible. This is easily possible with
the new sinval processing for 8.4.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

#19

Tom Lane

tgl@sss.pgh.pa.us

over 17 years ago

In reply to: Simon Riggs (#18)

Re: Review: Hot standby

Simon Riggs <simon@2ndQuadrant.com> writes:

After some thought, the way I would handle this is by sending a slightly
different kind of signal.

We can send a shared invalidation message which means "end the
transaction, whether or not you are currently running a statement".

No, a thousand times no. The sinval queue is an *utterly* inappropriate
mechanism for such a thing.

regards, tom lane

#20

Simon Riggs

simon@2ndQuadrant.com

over 17 years ago

In reply to: Tom Lane (#19)

Re: Review: Hot standby

On Fri, 2008-11-28 at 11:14 -0500, Tom Lane wrote:

Simon Riggs <simon@2ndQuadrant.com> writes:

After some thought, the way I would handle this is by sending a slightly
different kind of signal.

We can send a shared invalidation message which means "end the
transaction, whether or not you are currently running a statement".

No, a thousand times no.

So you're against it? ;-)

The sinval queue is an *utterly* inappropriate
mechanism for such a thing.

To be honest, it did seem quite a neat solution. Any particular
direction of thought you'd like me to pursue instead?

Asking the backend to kill itself is much cleaner than the other ways I
imagined. So my other thoughts steer towards hijacking the SIGUSR1
signal somehow for my nefarious purposes. Would that way sound OK?

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

#21

Tom Lane

tgl@sss.pgh.pa.us

over 17 years ago

In reply to: Simon Riggs (#20)

#22

Simon Riggs

simon@2ndQuadrant.com

over 17 years ago

In reply to: Tom Lane (#21)

#23