Hot standby doesn't come up in some situations.
Hello, we found that hot standby doesn't come up under a certain
condition. This occurs on 9.3 and 9.4dev.
The recovery process stays in the 'inconsistent' state forever when
the server has crashed before any WAL record was inserted after
the last checkpoint.
This seems to be because minRecoveryPoint is set to EndRecPtr at the
end of crash recovery in ReadRecord. EndRecPtr here points to the
beginning of the record following the one already read, i.e. just
after the last checkpoint, and no record exists there in this case.
Subsequent calls to CheckRecoveryConsistency therefore never conclude
that the 'consistent state' has been reached, even though the server
is actually already consistent.
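For context, the relevant test in CheckRecoveryConsistency() is
essentially the comparison below (a paraphrase, not the verbatim
function, which also checks some backup-related conditions):

/* Consistency is declared once replay has caught up with
 * minRecoveryPoint. In the case above, minRecoveryPoint == EndRecPtr
 * (just past the last checkpoint) while lastReplayedEndRecPtr stays at
 * the checkpoint's start and no further WAL ever arrives, so this
 * condition never becomes true and connections keep failing with
 * "the database system is starting up". */
if (!reachedConsistency &&
    minRecoveryPoint <= XLogCtl->lastReplayedEndRecPtr)
{
    reachedConsistency = true;
    ereport(LOG,
            (errmsg("consistent recovery state reached")));
}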
I tentatively think that lastReplayedEndRecPtr is more suitable there.
The attached script reproduces the situation. Run it; then, after the
server complains that it can't connect to the primary, connecting
with psql results in:
| psql: FATAL: the database system is starting up
The attached patch fixes the problem on 9.4dev.
What do you think about this?
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachment: recovery_fix_20140228.patch (text/x-patch)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 483d5c3..f1f54f1 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -4496,7 +4496,15 @@ ReadRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr, int emode,
ControlFile->state = DB_IN_ARCHIVE_RECOVERY;
if (ControlFile->minRecoveryPoint < EndRecPtr)
{
- ControlFile->minRecoveryPoint = EndRecPtr;
+ /*
+ * Although EndRecPtr is the right value for
+ * minRecoveryPoint in archive recovery, it is a bit too
+ * far when the last checkpoint record is also the last
+ * WAL record here. Use lastReplayedEndRecPtr as
+ * minRecoveryPoint to start hot standby just after it.
+ */
+ ControlFile->minRecoveryPoint =
+ XLogCtl->lastReplayedEndRecPtr;
ControlFile->minRecoveryPointTLI = ThisTimeLineID;
}
/* update local copy */
Ouch, this is the same issue as in the mail below:
/messages/by-id/53104595.6060000@lab.ntt.co.jp
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
On 2014-02-28 17:55:21 +0900, Kyotaro HORIGUCHI wrote:
The recovery process stays on 'incosistent' state forever when
the server has crashed before any wal record is inserted after
the last checkpoint.
# killall postgres
# rm -rf $PGDATA/*
initdb
pg_ctl start -w
sleep 1
pg_ctl stop -m i
cat > $PGDATA/recovery.conf <<EOF
standby_mode = 'on'
primary_conninfo = 'host=localhost port=9999 user=repuser application_name=pm01 keepalives_idle=60 keepalives_interval=5 keepalives_count=5'
#restore_command = '/bin/true'
recovery_target_timeline = 'latest'
EOF
cat >> $PGDATA/postgresql.conf <<EOF
#log_min_messages = debug5
hot_standby = on
EOF
pg_ctl start
Uh. So, if I understand correctly, what you did is to convert a normal
live pg into a replica that doesn't have an upstream node, right?
Normally the primary will just do an additional write shortly
afterwards, resolving this situation?
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hello,
2014/02/28 18:07 "Andres Freund" :
On 2014-02-28 17:55:21 +0900, Kyotaro HORIGUCHI wrote:
The recovery process stays in the 'inconsistent' state forever when
the server has crashed before any WAL record was inserted after
the last checkpoint.
# killall postgres
# rm -rf $PGDATA/*
initdb
pg_ctl start -w
sleep 1
pg_ctl stop -m i
cat > $PGDATA/recovery.conf <<EOF
standby_mode = 'on'
primary_conninfo = 'host=localhost port=9999 user=repuser application_name=pm01 keepalives_idle=60 keepalives_interval=5 keepalives_count=5'
#restore_command = '/bin/true'
recovery_target_timeline = 'latest'
EOF
cat >> $PGDATA/postgresql.conf <<EOF
#log_min_messages = debug5
hot_standby = on
EOF
pg_ctl start
Uh. So, if I understand correctly, what you did is to convert a normal
live pg into a replica that doesn't have an upstream node, right?
Yes, but the same situation could be produced by restarting a crashed
secondary. I didn't try it since my console is out of reach now...
Could restarting a secondary, which crashed just after completing a
restartpoint with no further records, produce the same situation?
I have no idea in what scenario this behavior would actually be
regarded as undesirable, but anyway I think the secondary should start
accepting clients just after crash recovery is completed.
Normally the primary will just do an additional write shortly
afterwards, resolving this situation?
Maybe so. I haven't tried it, though.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
On 02/28/2014 11:51 AM, Kyotaro HORIGUCHI wrote:
Hello,
2014/02/28 18:07 "Andres Freund" :
On 2014-02-28 17:55:21 +0900, Kyotaro HORIGUCHI wrote:
The recovery process stays in the 'inconsistent' state forever when
the server has crashed before any WAL record was inserted after
the last checkpoint.
# killall postgres
# rm -rf $PGDATA/*
initdb
pg_ctl start -w
sleep 1
pg_ctl stop -m i
cat > $PGDATA/recovery.conf <<EOF
standby_mode = 'on'
primary_conninfo = 'host=localhost port=9999 user=repuser application_name=pm01 keepalives_idle=60 keepalives_interval=5 keepalives_count=5'
#restore_command = '/bin/true'
recovery_target_timeline = 'latest'
EOF
cat >> $PGDATA/postgresql.conf <<EOF
#log_min_messages = debug5
hot_standby = on
EOF
pg_ctl start
Uh. So, if I understand correctly, what you did is to convert a normal
live pg into a replica that doesn't have an upstream node, right?
Yes, but the same situation could be produced by restarting a crashed secondary.
Yeah.
I have no idea in what scenario this behavior would actually be
regarded as undesirable, but anyway I think the secondary should start
accepting clients just after crash recovery is completed.
Agreed, this is a bug.
I don't think your patch is the right fix for this though. Setting
minRecoveryPoint to EndRecPtr is the right thing to do; EndRecPtr points
to the end of the last read and replayed record. What's wrong in this
case is lastReplayedEndRecPtr. At the beginning of recovery, it's
initialized to the REDO point, but with a shutdown checkpoint, that's
not quite right. When starting from a shutdown checkpoint, REDO points
to the beginning of the shutdown record, but we've already effectively
replayed it. The next record we replay is the one after the checkpoint.
To see that, I added some elog(LOG) calls:
~/pgsql.93stable$ bin/postmaster -D data
LOG: database system was shut down at 2014-02-28 14:06:18 EET
LOG: ReadCheckpointRecord: 0/16479C98
LOG: database system is ready to accept connections
LOG: autovacuum launcher started
^CLOG: received fast shutdown request
LOG: aborting any active transactions
LOG: autovacuum launcher shutting down
LOG: shutting down
LOG: INSERT @ 0/16479D00: prev 0/16479C98; xid 0; len 72: XLOG -
checkpoint: redo 0/16479D00; tli 1; prev tli 1; fpw true; xid 0/793393;
oid 24988; multi 655288; offset 1356722; oldest xid 687 in DB 1; oldest
multi 1 in DB 1; oldest running xid 0; shutdown
LOG: xlog flush request 0/16479D68; write 0/0; flush 0/0
LOG: database system is shut down
~/pgsql.93stable$ bin/postmaster -D data
LOG: database system was shut down at 2014-02-28 14:06:23 EET
LOG: ReadCheckpointRecord: 0/16479D00
LOG: database system is ready to accept connections
LOG: autovacuum launcher started
Killed
At this point, the last record is the shutdown checkpoint, beginning at
16479D00, and the server has been killed (immediate shutdown).
~/pgsql.93stable$ cp recovery.conf data/recovery.conf
~/pgsql.93stable$ bin/postmaster -D data
LOG: database system was interrupted; last known up at 2014-02-28
14:06:29 EET
LOG: entering standby mode
LOG: ReadCheckpointRecord: 0/16479D00
LOG: database system was not properly shut down; automatic recovery in
progress
LOG: record with zero length at 0/16479D68
LOG: reached end of WAL in pg_xlog, entering archive recovery
LOG: EndRecPtr: 0/16479D68 lastReplayedEndRecPtr: 0/16479D00
FATAL: could not connect to the primary server: could not connect to
server: Connection refused
...
Recovery starts from the checkpoint record, but lastReplayedEndRecPtr is
set to the *beginning* of the checkpoint record, even though the
checkpoint record has already been effectively replayed, by virtue of
starting recovery from it. EndRecPtr correctly points to the end of the
checkpoint record. Because of the incorrect lastReplayedEndRecPtr value,
the CheckRecoveryConsistency() call concludes that it's not consistent.
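To restate the same thing with the LSNs from the log above (a
schematic summary, not code from the tree):

/*
 * Startup from the shutdown checkpoint in this example:
 *
 *   RecPtr    = 0/16479D00   start of the shutdown checkpoint record
 *   redo      = 0/16479D00   REDO == RecPtr for a shutdown checkpoint
 *   EndRecPtr = 0/16479D68   first byte after the checkpoint record
 *
 * lastReplayedEndRecPtr is initialized to checkPoint.redo (0/16479D00)
 * although the checkpoint record is effectively already replayed, and
 * with no further WAL it never reaches minRecoveryPoint (0/16479D68).
 * When checkPoint.redo == RecPtr, it should be initialized to
 * EndRecPtr instead.
 */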
I believe the attached fix is the right way to fix this.
- Heikki
Attachment: startup-from-shutdown-checkpoint-fix.patch (text/x-diff)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 7399fd4..a36354f 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -5457,13 +5457,17 @@ StartupXLOG(void)
/*
* Initialize shared variables for tracking progress of WAL replay,
- * as if we had just replayed the record before the REDO location.
+ * as if we had just replayed the record before the REDO location
+ * (or the checkpoint record itself, if it's a shutdown checkpoint).
*/
SpinLockAcquire(&xlogctl->info_lck);
- xlogctl->replayEndRecPtr = checkPoint.redo;
+ if (checkPoint.redo < RecPtr)
+ xlogctl->replayEndRecPtr = checkPoint.redo;
+ else
+ xlogctl->replayEndRecPtr = EndRecPtr;
xlogctl->replayEndTLI = ThisTimeLineID;
- xlogctl->lastReplayedEndRecPtr = checkPoint.redo;
- xlogctl->lastReplayedTLI = ThisTimeLineID;
+ xlogctl->lastReplayedEndRecPtr = xlogctl->replayEndRecPtr;
+ xlogctl->lastReplayedTLI = xlogctl->replayEndTLI;
xlogctl->recoveryLastXTime = 0;
xlogctl->currentChunkStartTime = 0;
xlogctl->recoveryPause = false;
Hello,
At Fri, 28 Feb 2014 14:45:58 +0200, Heikki Linnakangas <hlinnakangas@vmware.com> wrote in <53108506.2010200@vmware.com>
Yes, but the same situation could be produced by restarting a crashed
secondary.
Yeah.
I have no idea in what scenario this behavior would actually be
regarded as undesirable, but anyway I think the secondary should start
accepting clients just after crash recovery is completed.
Agreed, this is a bug.
I don't think your patch is the right fix for this though. Setting
minRecoveryPoint to EndRecPtr is the right thing to do; EndRecPtr
points to the end of the last read and replayed record. What's wrong
in this case is lastReplayedEndRecPtr. At the beginning of recovery,
it's initialized to the REDO point, but with a shutdown checkpoint,
that's not quite right. When starting from a shutdown checkpoint, REDO
points to the beginning of the shutdown record, but we've already
effectively replayed it. The next record we replay is the one after
the checkpoint.
That's more reasonable. I felt unclear about that point, but back then
I forgot to doubt the correctness of lastReplayedEndRecPtr; surely the
shutdown record itself was effectively already replayed when the
record was inserted.
To see that, I added some elog(LOG) calls:
~/pgsql.93stable$ bin/postmaster -D data
LOG: database system was shut down at 2014-02-28 14:06:18 EET
LOG: ReadCheckpointRecord: 0/16479C98
LOG: database system is ready to accept connections
LOG: autovacuum launcher started
^CLOG: received fast shutdown request
LOG: aborting any active transactions
LOG: autovacuum launcher shutting down
LOG: shutting down
LOG: INSERT @ 0/16479D00: prev 0/16479C98; xid 0; len 72: XLOG -
checkpoint: redo 0/16479D00; tli 1; prev tli 1; fpw true; xid
0/793393; oid 24988; multi 655288; offset 1356722; oldest xid 687 in
DB 1; oldest multi 1 in DB 1; oldest running xid 0; shutdown
LOG: xlog flush request 0/16479D68; write 0/0; flush 0/0
LOG: database system is shut down
~/pgsql.93stable$ bin/postmaster -D data
LOG: database system was shut down at 2014-02-28 14:06:23 EET
LOG: ReadCheckpointRecord: 0/16479D00
LOG: database system is ready to accept connections
LOG: autovacuum launcher started
Killed
At this point, the last record is the shutdown checkpoint, beginning
at 16479D00, and the server has been killed (immediate shutdown).
~/pgsql.93stable$ cp recovery.conf data/recovery.conf
~/pgsql.93stable$ bin/postmaster -D data
LOG: database system was interrupted; last known up at 2014-02-28
14:06:29 EET
LOG: entering standby mode
LOG: ReadCheckpointRecord: 0/16479D00
LOG: database system was not properly shut down; automatic recovery in
progress
LOG: record with zero length at 0/16479D68
LOG: reached end of WAL in pg_xlog, entering archive recovery
LOG: EndRecPtr: 0/16479D68 lastReplayedEndRecPtr: 0/16479D00
FATAL: could not connect to the primary server: could not connect to
server: Connection refused
...
Recovery starts from the checkpoint record, but lastReplayedEndRecPtr
is set to the *beginning* of the checkpoint record, even though the
checkpoint record has already been effectively replayed, by virtue
of starting recovery from it. EndRecPtr correctly points to the end of
the checkpoint record. Because of the incorrect lastReplayedEndRecPtr
value, the CheckRecoveryConsistency() call concludes that it's not
consistent.
I completely understood the behavior thanks to your detailed
explanation. (And how to use log messages effectively :-)
I agree that the fix is appropriate.
I believe the attached fix is the right way to fix this.
It also worked for me. Thank you.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Ouch! It brought another bug.
I completely understood the behavior thanks to your detailed
explanation. (And how to use log messages effectively :-)
Sorry, I just found that it's wrong, and found another problem
brought by your patch.
I agree that the fix is appropriate.
I believe the attached fix is the right way to fix this.
It also worked for me. Thank you.
| * as if we had just replayed the record before the REDO location
| * (or the checkpoint record itself, if it's a shutdown checkpoint).
The following test script raises an assertion failure. It adds a
'non-shutdown' checkpoint just before the immediate shutdown.
Starting the server aborts with the following message:
| LOG: database system was not properly shut down; automatic recovery in progress
| TRAP: FailedAssertion("!(((oldestActiveXID) != ((TransactionId) 0)))", File: "xlog.c", Line: 6771)
| LOG: startup process (PID 28561) was terminated by signal 6: Aborted
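For reference, the failing assertion is this spot in StartupXLOG()
(context trimmed to the relevant lines; the same lines appear in the
patch proposed later in this thread):

if (wasShutdown)
    oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
else
    oldestActiveXID = checkPoint.oldestActiveXid;   /* invalid here when
                                                     * the checkpoint was
                                                     * written with an
                                                     * insufficient wal_level */
Assert(TransactionIdIsValid(oldestActiveXID));      /* the TRAP above */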
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
===
#! /bin/sh
killall postgres
rm -rf $PGDATA/*
initdb
pg_ctl start -w
sleep 1
psql postgres -c 'checkpoint'
pg_ctl stop -m i
cat > $PGDATA/recovery.conf <<EOF
standby_mode = 'on'
primary_conninfo = 'host=localhost port=9999 user=repuser application_name=pm01 keepalives_idle=60 keepalives_interval=5 keepalives_count=5'
#restore_command = '/bin/true'
recovery_target_timeline = 'latest'
EOF
cat >> $PGDATA/postgresql.conf <<EOF
#log_min_messages = debug5
hot_standby = on
EOF
pg_ctl start
====
Correcting one point of my last mail.
Ouch! It brought another bug.
My patch also did.
regards,
====
I completely understood the behavior thanks to your detailed
explanation. (And how to use log messages effectively :-)
Sorry, I just found that it's wrong, and found another problem
brought by your patch.
I agree that the fix is appropriate.
I believe the attached fix is the right way to fix this.
It also worked for me. Thank you.
| * as if we had just replayed the record before the REDO location
| * (or the checkpoint record itself, if it's a shutdown checkpoint).
The following test script raises an assertion failure. It adds a
'non-shutdown' checkpoint just before the immediate shutdown.
Starting the server aborts with the following message:
| LOG: database system was not properly shut down; automatic recovery in progress
| TRAP: FailedAssertion("!(((oldestActiveXID) != ((TransactionId) 0)))", File: "xlog.c", Line: 6771)
| LOG: startup process (PID 28561) was terminated by signal 6: Aborted
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
===
#! /bin/sh
killall postgres
rm -rf $PGDATA/*
initdb
pg_ctl start -w
sleep 1
psql postgres -c 'checkpoint'
pg_ctl stop -m i
cat > $PGDATA/recovery.conf <<EOF
standby_mode = 'on'
primary_conninfo = 'host=localhost port=9999 user=repuser application_name=pm01 keepalives_idle=60 keepalives_interval=5 keepalives_count=5'
#restore_command = '/bin/true'
recovery_target_timeline = 'latest'
EOF
cat >> $PGDATA/postgresql.conf <<EOF
#log_min_messages = debug5
hot_standby = on
EOF
pg_ctl start
====
--
Kyotaro Horiguchi
NTT Open Source Software Center
Hello,
| * as if we had just replayed the record before the REDO location
| * (or the checkpoint record itself, if it's a shutdown checkpoint).
The following test script raises an assertion failure. It adds a
'non-shutdown' checkpoint just before the immediate shutdown.
Starting the server aborts with the following message:
| LOG: database system was not properly shut down; automatic recovery in progress
| TRAP: FailedAssertion("!(((oldestActiveXID) != ((TransactionId) 0)))", File: "xlog.c", Line: 6771)
| LOG: startup process (PID 28561) was terminated by signal 6: Aborted
This is because the checkpoint was taken with 'wal_level = minimal'.
The server restarts correctly if it is started with 'wal_level =
hot_standby' from the beginning.
It looks like mere mis-operation. The log message looks unfriendly,
but I'm not sure it needs changing.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Hello,
After all, I have confirmed that this fixes the problem in crash
recovery of hot standby, both for 9.3 and HEAD, and no problem was
found except unreadability :(
By the way, I would moderately like to change an assertion message into
an ordinary error. Details are below.
====
The server stops with the following message when restarting, after a
crash, with archive recovery requested, if the WAL was produced with a
wal_level below WAL_LEVEL_HOT_STANDBY.
| TRAP: FailedAssertion("!(((oldestActiveXID) != ((TransactionId) 0)))", File: "xlog.c", Line: 6799)
| LOG: startup process (PID 7270) was terminated by signal 6: Aborted
Surely this is the consequence of an illegal operation, but I also
don't think it is a matter for an assertion, which should fire on
something wrong in the design or in quite rare cases (is this one?).
So it might be better to show a message like the following in this case:
| FATAL: Checkpoint doesn't have valid oldest active transaction id
| HINT: The WAL being read might have been written under an insufficient wal_level.
It could be done like this:
======
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index e3d5e10..bb6922a 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6789,7 +6789,13 @@ StartupXLOG(void)
if (wasShutdown)
oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
else
+ {
oldestActiveXID = checkPoint.oldestActiveXid;
+ if (!TransactionIdIsValid(oldestActiveXID))
+ ereport(FATAL,
+ (errmsg("Checkpoint doesn't have valid oldest active transaction id"),
+ errhint("Reading WAL might have been written under insufficient wal_level.")));
+ }
Assert(TransactionIdIsValid(oldestActiveXID));
/* Tell procarray about the range of xids it has to deal with */
=====
What do you think about this? Feel free to dump it if you feel
negative about it.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
On 03/05/2014 10:51 AM, Kyotaro HORIGUCHI wrote:
Hello,
After all, I have confirmed that this fixes the problem in crash
recovery of hot standby, both for 9.3 and HEAD, and no problem was
found except unreadability :(
Ok, committed. Thanks!
Any concrete suggestions about the readability? Is there some particular
spot that needs clarifying?
By the way, I would moderately like to change an assertion message into
an ordinary error. Details are below.
====
The server stops with the following message when restarting, after a
crash, with archive recovery requested, if the WAL was produced with a
wal_level below WAL_LEVEL_HOT_STANDBY.
| TRAP: FailedAssertion("!(((oldestActiveXID) != ((TransactionId) 0)))", File: "xlog.c", Line: 6799)
| LOG: startup process (PID 7270) was terminated by signal 6: Aborted
Surely this is the consequence of an illegal operation, but I also
don't think it is a matter for an assertion, which should fire on
something wrong in the design or in quite rare cases (is this one?).
Ah, I see. Yes, that's definitely a bug. If you don't hit the assertion,
because the oldestActiveXID is set in the checkpoint record even though
wal_level is 'archive', or if you simply have assertions disabled, the
system will start up in hot standby mode even though it's not safe.
So it might be better to show a message like the following in this case:
| FATAL: Checkpoint doesn't have valid oldest active transaction id
| HINT: The WAL being read might have been written under an insufficient wal_level.
It could be done like this:
======
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index e3d5e10..bb6922a 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6789,7 +6789,13 @@ StartupXLOG(void)
if (wasShutdown)
oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
else
+ {
oldestActiveXID = checkPoint.oldestActiveXid;
+ if (!TransactionIdIsValid(oldestActiveXID))
+ ereport(FATAL,
+ (errmsg("Checkpoint doesn't have valid oldest active transaction id"),
+ errhint("The WAL being read might have been written under an insufficient wal_level.")));
+ }
Assert(TransactionIdIsValid(oldestActiveXID));
/* Tell procarray about the range of xids it has to deal with */
=====
What do you think about this? Feel free to dump it if you feel
negative about it.
Hmm. When I test that with 9.2, oldestActiveXID is not 0, even though
wal_level is 'archive'. So the above patch doesn't fix the whole problem.
The real bug here is that CheckRequiredParameterValues() tests for
InArchiveRecovery, when it should be testing for
ArchiveRecoveryRequested. Otherwise, the checks are not performed when
going through crash recovery followed by archive recovery. I
should've changed that as part of the commit that added the crash
recovery then archive recovery behavior.
Fixed, thanks for pointing it out!
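In other words, the change is essentially this (a sketch, not the
verbatim commit; the ereport bodies are unchanged and elided):

/* In CheckRequiredParameterValues(): these checks must fire whenever
 * archive recovery was requested (recovery.conf present), not only
 * once InArchiveRecovery has become true, because the latter is still
 * false while crash recovery runs first. */
- if (InArchiveRecovery && ControlFile->wal_level == WAL_LEVEL_MINIMAL)
+ if (ArchiveRecoveryRequested && ControlFile->wal_level == WAL_LEVEL_MINIMAL)
  {
      ereport(WARNING, ...);       /* existing wal_level warning */
  }

- if (InArchiveRecovery && EnableHotStandby)
+ if (ArchiveRecoveryRequested && EnableHotStandby)
  {
      /* existing hot_standby parameter checks */
  }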
- Heikki
Hello,
After all, I have confirmed that this fixes the problem in crash
recovery of hot standby, both for 9.3 and HEAD, and no problem was
found except unreadability :(
Ok, committed. Thanks!
Thank you.
Any concrete suggestions about the readability? Is there some
particular spot that needs clarifying?
Well, although I no longer see a problem there now that I understand
how it works :-), I'll describe two points that I had difficulty
understanding.
| * (or the checkpoint record itself, if it's a shutdown checkpoint).
| */
| if (checkPoint.redo < RecPtr)
First, it was a bit of tough work to confirm the equivalence between
(redo == RecPtr) and the checkpoint being a shutdown checkpoint.
Although I was finally convinced that it surely holds, that is
actually not the point. The point here is in the first half of the
phrase. The comment might be less perplexing if it read as follows,
even if only a shutdown checkpoint satisfies the condition. But that
would raise another question in readers' minds.
| * (or the checkpoint record itself, e.g. if it's a shutdown checkpoint).
Second, the added code depends on the assumption that RecPtr points to
the checkpoint record and EndRecPtr points to the next record after it.
It would be better for understandability and stability (against future
code changes) to state the precondition explicitly, like this:
| Here RecPtr points to the checkpoint record and EndRecPtr points to
| the location just after it.
Surely this is the consequence of an illegal operation, but I also
don't think it is a matter for an assertion, which should fire on
something wrong in the design or in quite rare cases (is this one?).
Ah, I see. Yes, that's definitely a bug. If you don't hit the
assertion, because the oldestActiveXID is set in the checkpoint record
even though wal_level is 'archive', or if you simply have assertions
disabled, the system will start up in hot standby mode even though
it's not safe.
So it might be better to show a message like the following in this case:
| FATAL: Checkpoint doesn't have valid oldest active transaction id
| HINT: The WAL being read might have been written under an insufficient
| wal_level.
Agreed.
Hmm. When I test that with 9.2, oldestActiveXID is not 0, even though
wal_level is 'archive'. So the above patch doesn't fix the whole
problem.
The real bug here is that CheckRequiredParameterValues() tests for
InArchiveRecovery, when it should be testing for
ArchiveRecoveryRequested. Otherwise, the checks are not performed when
going through crash recovery followed by archive recovery. I
should've changed that as part of the commit that added the crash
recovery then archive recovery behavior.
Fixed, thanks for pointing it out!
It's my pleasure.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center