9.2.3 crashes during archive recovery

Started by Kyotaro Horiguchi, about 13 years ago. 47 messages, pgsql-hackers.
#1 Kyotaro Horiguchi
horikyota.ntt@gmail.com

Hello, 9.2.3 crashes during archive recovery.

If my memory is correct, this was also fixed at some point on
origin/master, along with another problem addressed by the commit
discussed below. But current HEAD and 9.2.3 crash during archive
recovery (not on standby) with the 'marking deleted page visible'
problem.

/messages/by-id/50C7612E.5060008@vmware.com

The attached script reproduces the situation.

This can be illustrated as below:

1. StartupXLOG() initializes minRecoveryPoint with
ControlFile->checkPoint.redo (virtually) at the start of the
archive recovery process.

2. The consistency state becomes true just before redo starts.

3. Redoing a certain XLOG_HEAP2_VISIBLE record causes a crash,
because the page for the visibility map has already been removed
by smgr_truncate(), which emitted an XLOG_SMGR_TRUNCATE record
after that point.

PANIC: WAL contains references to invalid pages

Therefore, the consistency point should be postponed until the
XLOG_SMGR_TRUNCATE record.

In this case, the FINAL consistency point is at the
XLOG_SMGR_TRUNCATE record, but the current implementation does not
record the consistency point (checkpoint, commit, or smgr_truncate)
itself, so we cannot predict the final consistency point at the
start of recovery.

Recovery completed successfully with the small, rough patch below.
It allows multiple consistency points, but it also defeats quick
cessation of recovery. (And of course no check has been done yet
against replication/promotion X-)

--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6029,7 +6029,9 @@ CheckRecoveryConsistency(void)
 		 */
 		XLogCheckInvalidPages();
 
-		reachedConsistency = true;
+		//reachedConsistency = true;
+		minRecoveryPoint = InvalidXLogRecPtr;
+
 		ereport(LOG,
 				(errmsg("consistent recovery state reached at %X/%X",
 						(uint32) (XLogCtl->lastReplayedEndRecPtr >> 32),

On the other hand, updating the control file on every commit or
smgr_truncate would slow down transactions..

Regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

====== Replay script.
#! /bin/sh
PGDATA=$HOME/pgdata
PGARC=$HOME/pgarc
rm -rf $PGDATA/* $PGARC/*
initdb
cat >> $PGDATA/postgresql.conf <<EOF
wal_level = hot_standby
checkpoint_segments = 300
checkpoint_timeout = 1h
archive_mode = on
archive_command = 'cp %p $PGARC/%f'
EOF
pg_ctl -w start
createdb
psql <<EOF
select version();
create table t (a int);
insert into t values (5);
checkpoint;
vacuum;
delete from t;
vacuum;
EOF
pg_ctl -w stop -m i
cat >> $PGDATA/recovery.conf <<EOF
restore_command='if [ -f $PGARC/%f ]; then cp $PGARC/%f %p; fi'
EOF
postgres
=====

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Kyotaro Horiguchi (#1)
Re: 9.2.3 crashes during archive recovery

On 13.02.2013 09:46, Kyotaro HORIGUCHI wrote:

In this case, the FINAL consistency point is at the
XLOG_SMGR_TRUNCATE record, but the current implementation does not
record the consistency point (checkpoint, commit, or smgr_truncate)
itself, so we cannot predict the final consistency point at the
start of recovery.

Hmm, what you did was basically:

1. Run server normally.
2. Kill it with "pg_ctl stop -m immediate".
3. Create a recovery.conf file, turning the server into a hot standby.

Without step 3, the server would perform crash recovery, and it would
work. But because of the recovery.conf file, the server goes into
archive recovery, and because minRecoveryPoint is not set, it assumes
that the system is consistent from the start.

Aside from the immediate issue with truncation, the system really isn't
consistent until the WAL has been replayed far enough, so it shouldn't
open for hot standby queries. There might be other, later, changes
already flushed to data files. The system has no way of knowing how far
it needs to replay the WAL to become consistent.

At least in back-branches, I'd call this a pilot error. You can't turn a
master into a standby just by creating a recovery.conf file. At least
not if the master was not shut down cleanly first.

If there's a use case for doing that, maybe we can do something better
in HEAD. If the control file says that the system was running
(DB_IN_PRODUCTION), but there is a recovery.conf file, we could do crash
recovery first, until we reach the end of WAL, and go into archive
recovery mode after that. We'd recover all the WAL files in pg_xlog as
far as we can, same as in crash recovery, and only start restoring files
from the archive once we reach the end of WAL in pg_xlog. At that point,
we'd also consider the system as consistent, and start up for hot standby.

I'm not sure that's worth the trouble, though. Perhaps it would be
better to just throw an error if the control file state is
DB_IN_PRODUCTION and a recovery.conf file exists. The admin can always
start the server normally first, shut it down cleanly, and then create
the recovery.conf file.

On the other hand, updating the control file on every commit or
smgr_truncate would slow down transactions..

To be precise, we'd need to update the control file on every
XLogFlush(), like we do during archive recovery. That would indeed be
unacceptable from a performance point of view. Updating the control file
that often would also be bad for robustness.

- Heikki

#3 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#2)
Re: 9.2.3 crashes during archive recovery

Heikki Linnakangas <hlinnakangas@vmware.com> writes:

At least in back-branches, I'd call this a pilot error. You can't turn a
master into a standby just by creating a recovery.conf file. At least
not if the master was not shut down cleanly first.
...
I'm not sure that's worth the trouble, though. Perhaps it would be
better to just throw an error if the control file state is
DB_IN_PRODUCTION and a recovery.conf file exists.

+1 for that approach, at least until it's clear there's a market for
doing this sort of thing. I think the error check could be
back-patched, too.

regards, tom lane

#4 Simon Riggs
simon@2ndQuadrant.com
In reply to: Heikki Linnakangas (#2)
Re: 9.2.3 crashes during archive recovery

On 13 February 2013 09:04, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:

To be precise, we'd need to update the control file on every XLogFlush(),
like we do during archive recovery. That would indeed be unacceptable from a
performance point of view. Updating the control file that often would also
be bad for robustness.

If those arguments make sense, then why don't they apply to recovery as well?

It sounds like we need to look at something better for use during
archive recovery.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#4)
Re: 9.2.3 crashes during archive recovery

Simon Riggs <simon@2ndQuadrant.com> writes:

On 13 February 2013 09:04, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:

To be precise, we'd need to update the control file on every XLogFlush(),
like we do during archive recovery. That would indeed be unacceptable from a
performance point of view. Updating the control file that often would also
be bad for robustness.

If those arguments make sense, then why don't they apply to recovery as well?

In plain old crash recovery, don't the checks on whether to apply WAL
records based on LSN take care of this?

regards, tom lane

#6 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Simon Riggs (#4)
Re: 9.2.3 crashes during archive recovery

On 13.02.2013 20:25, Simon Riggs wrote:

On 13 February 2013 09:04, Heikki Linnakangas<hlinnakangas@vmware.com> wrote:

To be precise, we'd need to update the control file on every XLogFlush(),
like we do during archive recovery. That would indeed be unacceptable from a
performance point of view. Updating the control file that often would also
be bad for robustness.

If those arguments make sense, then why don't they apply to recovery as well?

To some degree, they do. The big difference is that during normal
operation, every commit is XLogFlushed(). During recovery, XLogFlush()
happens much less frequently - certainly not after replaying each commit
record.

It sounds like we need to look at something better for use during
archive recovery.

Well, no-one's complained about the performance. From a robustness point
of view, it might be good to keep the minRecoveryPoint value in a
separate file, for example, to avoid rewriting the control file that
often. Then again, why fix it when it's not broken.

- Heikki

#7 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Tom Lane (#5)
Re: 9.2.3 crashes during archive recovery

On 13.02.2013 21:03, Tom Lane wrote:

Simon Riggs<simon@2ndQuadrant.com> writes:

On 13 February 2013 09:04, Heikki Linnakangas<hlinnakangas@vmware.com> wrote:

To be precise, we'd need to update the control file on every XLogFlush(),
like we do during archive recovery. That would indeed be unacceptable from a
performance point of view. Updating the control file that often would also
be bad for robustness.

If those arguments make sense, then why don't they apply to recovery as well?

In plain old crash recovery, don't the checks on whether to apply WAL
records based on LSN take care of this?

The problem we're trying to solve is determining how much WAL needs to
be replayed until the database is consistent again. In crash recovery,
the answer is "all of it". That's why the CRC in the WAL is essential;
it's required to determine where the WAL ends. But if we had some other
mechanism, like if we updated minRecoveryPoint after every XLogFlush()
like Simon suggested, we wouldn't necessarily need the CRC to detect end
of WAL (not that I'd suggest removing it anyway), and we could throw an
error if there is a corrupt bit somewhere in the WAL before the true end
of WAL.

In archive recovery, we can't just say "replay all the WAL", because the
whole idea of PITR is to not recover all the WAL. So we use
minRecoveryPoint to keep track of how far the WAL needs to be replayed
at a minimum, for the database to be consistent.

- Heikki

#8 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#6)
Re: 9.2.3 crashes during archive recovery

Heikki Linnakangas <hlinnakangas@vmware.com> writes:

Well, no-one's complained about the performance. From a robustness point
of view, it might be good to keep the minRecoveryPoint value in a
separate file, for example, to avoid rewriting the control file that
often. Then again, why fix it when it's not broken.

It would only be broken if someone interrupted a crash recovery
mid-flight and tried to establish a recovery stop point before the end
of WAL, no? Why don't we just forbid that case? This would either be
the same as, or a small extension of, the pg_control state vs existence
of recovery.conf error check that was just discussed.

regards, tom lane

#9 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Tom Lane (#8)
Re: 9.2.3 crashes during archive recovery

On 13.02.2013 21:21, Tom Lane wrote:

Heikki Linnakangas<hlinnakangas@vmware.com> writes:

Well, no-one's complained about the performance. From a robustness point
of view, it might be good to keep the minRecoveryPoint value in a
separate file, for example, to avoid rewriting the control file that
often. Then again, why fix it when it's not broken.

It would only be broken if someone interrupted a crash recovery
mid-flight and tried to establish a recovery stop point before the end
of WAL, no? Why don't we just forbid that case? This would either be
the same as, or a small extension of, the pg_control state vs existence
of recovery.conf error check that was just discussed.

The problem is when you interrupt archive recovery (kill -9), and
restart. After restart, the system needs to know how far the WAL was
replayed before the crash, because it must not open for hot standby
queries, or allow the database to be started up in master-mode, until
it's replayed the WAL up to that same point again.

- Heikki

#10 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#9)
Re: 9.2.3 crashes during archive recovery

Heikki Linnakangas <hlinnakangas@vmware.com> writes:

On 13.02.2013 21:21, Tom Lane wrote:

It would only be broken if someone interrupted a crash recovery
mid-flight and tried to establish a recovery stop point before the end
of WAL, no? Why don't we just forbid that case? This would either be
the same as, or a small extension of, the pg_control state vs existence
of recovery.conf error check that was just discussed.

The problem is when you interrupt archive recovery (kill -9), and
restart. After restart, the system needs to know how far the WAL was
replayed before the crash, because it must not open for hot standby
queries, or allow the database to be started up in master-mode, until
it's replayed the WAL up to that same point again.

Well, archive recovery is a different scenario --- Simon was questioning
whether we need a minRecoveryPoint mechanism in crash recovery, or at
least that's what I thought he asked.

regards, tom lane

#11 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Tom Lane (#10)
Re: 9.2.3 crashes during archive recovery

On 13.02.2013 21:30, Tom Lane wrote:

Heikki Linnakangas<hlinnakangas@vmware.com> writes:

On 13.02.2013 21:21, Tom Lane wrote:

It would only be broken if someone interrupted a crash recovery
mid-flight and tried to establish a recovery stop point before the end
of WAL, no? Why don't we just forbid that case? This would either be
the same as, or a small extension of, the pg_control state vs existence
of recovery.conf error check that was just discussed.

The problem is when you interrupt archive recovery (kill -9), and
restart. After restart, the system needs to know how far the WAL was
replayed before the crash, because it must not open for hot standby
queries, or allow the database to be started up in master-mode, until
it's replayed the WAL up to that same point again.

Well, archive recovery is a different scenario --- Simon was questioning
whether we need a minRecoveryPoint mechanism in crash recovery, or at
least that's what I thought he asked.

Ah, ok. The short answer to that is "no", because in crash recovery, we
just replay the WAL all the way to the end. I thought he was questioning
updating the control file at every XLogFlush() during archive recovery.
The answer to that is that it's not so bad, because XLogFlush() is
called so infrequently during recovery.

- Heikki

#12 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#7)
Re: 9.2.3 crashes during archive recovery

Heikki Linnakangas <hlinnakangas@vmware.com> writes:

The problem we're trying to solve is determining how much WAL needs to
be replayed until the database is consistent again. In crash recovery,
the answer is "all of it". That's why the CRC in the WAL is essential;
it's required to determine where the WAL ends. But if we had some other
mechanism, like if we updated minRecoveryPoint after every XLogFlush()
like Simon suggested, we wouldn't necessarily need the CRC to detect end
of WAL (not that I'd suggest removing it anyway), and we could throw an
error if there is a corrupt bit somewhere in the WAL before the true end
of WAL.

Meh. I think that would be disastrous from both performance and
reliability standpoints. (Performance because the whole point of WAL is
to commit with only one disk write in one place, and reliability because
of greatly increasing the number of writes to the utterly-critical
pg_control file.)

regards, tom lane

#13 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#11)
Re: 9.2.3 crashes during archive recovery

Heikki Linnakangas <hlinnakangas@vmware.com> writes:

On 13.02.2013 21:30, Tom Lane wrote:

Well, archive recovery is a different scenario --- Simon was questioning
whether we need a minRecoveryPoint mechanism in crash recovery, or at
least that's what I thought he asked.

Ah, ok. The short answer to that is "no", because in crash recovery, we
just replay the WAL all the way to the end. I thought he was questioning
updating the control file at every XLogFlush() during archive recovery.
The answer to that is that it's not so bad, because XLogFlush() is
called so infrequently during recovery.

Right, and it's not so evil from a reliability standpoint either, partly
because of that and partly because, by definition, this isn't your only
copy of the data.

regards, tom lane

#14 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Tom Lane (#3)
Re: 9.2.3 crashes during archive recovery

On 13.02.2013 17:02, Tom Lane wrote:

Heikki Linnakangas<hlinnakangas@vmware.com> writes:

At least in back-branches, I'd call this a pilot error. You can't turn a
master into a standby just by creating a recovery.conf file. At least
not if the master was not shut down cleanly first.
...
I'm not sure that's worth the trouble, though. Perhaps it would be
better to just throw an error if the control file state is
DB_IN_PRODUCTION and a recovery.conf file exists.

+1 for that approach, at least until it's clear there's a market for
doing this sort of thing. I think the error check could be
back-patched, too.

Hmm, I just realized a little problem with that approach. If you take a
base backup using an atomic filesystem backup from a running server, and
start archive recovery from that, that's essentially the same thing as
Kyotaro's test case.

- Heikki

#15 Simon Riggs
simon@2ndQuadrant.com
In reply to: Heikki Linnakangas (#2)
Re: 9.2.3 crashes during archive recovery

On 13 February 2013 09:04, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:

Without step 3, the server would perform crash recovery, and it would work.
But because of the recovery.conf file, the server goes into archive
recovery, and because minRecoveryPoint is not set, it assumes that the
system is consistent from the start.

Aside from the immediate issue with truncation, the system really isn't
consistent until the WAL has been replayed far enough, so it shouldn't open
for hot standby queries. There might be other, later, changes already
flushed to data files. The system has no way of knowing how far it needs to
replay the WAL to become consistent.

At least in back-branches, I'd call this a pilot error. You can't turn a
master into a standby just by creating a recovery.conf file. At least not if
the master was not shut down cleanly first.

If there's a use case for doing that, maybe we can do something better in
HEAD. If the control file says that the system was running
(DB_IN_PRODUCTION), but there is a recovery.conf file, we could do crash
recovery first, until we reach the end of WAL, and go into archive recovery
mode after that. We'd recover all the WAL files in pg_xlog as far as we can,
same as in crash recovery, and only start restoring files from the archive
once we reach the end of WAL in pg_xlog. At that point, we'd also consider
the system as consistent, and start up for hot standby.

I'm not sure that's worth the trouble, though. Perhaps it would be better to
just throw an error if the control file state is DB_IN_PRODUCTION and a
recovery.conf file exists. The admin can always start the server normally
first, shut it down cleanly, and then create the recovery.conf file.

Now I've read the whole thing...

The problem is that we startup Hot Standby before we hit the min
recovery point because that isn't recorded. For me, the thing to do is
to make the min recovery point == end of WAL when state is
DB_IN_PRODUCTION. That way we don't need to do any new writes and we
don't need to risk people seeing inconsistent results if they do this.

But I think that still gives you a timeline problem when putting a
master back into a standby.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#16 Ants Aasma
ants.aasma@cybertec.at
In reply to: Heikki Linnakangas (#14)
Re: 9.2.3 crashes during archive recovery

On Feb 13, 2013 10:29 PM, "Heikki Linnakangas" <hlinnakangas@vmware.com>
wrote:

Hmm, I just realized a little problem with that approach. If you take a
base backup using an atomic filesystem backup from a running server, and
start archive recovery from that, that's essentially the same thing as
Kyotaro's test case.

Coincidentally I was wondering about the same thing when I was reviewing
our slave provisioning procedures. There didn't seem to be any
communication channel from pg_stop_backup for the slave to know when it was
safe to allow connections.

Maybe there should be some mechanism akin to backup label to communicate
the minimum recovery point? When the min recovery point file exists the
value inside it is used, when the recovery point is reached the file is
removed. pg_basebackup would just append the file to the archive. Custom
backup procedures could also use it to communicate the necessary WAL
location.

--
Ants Aasma

#17 Fujii Masao
masao.fujii@gmail.com
In reply to: Heikki Linnakangas (#14)
Re: 9.2.3 crashes during archive recovery

On Thu, Feb 14, 2013 at 5:15 AM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:

On 13.02.2013 17:02, Tom Lane wrote:

Heikki Linnakangas<hlinnakangas@vmware.com> writes:

At least in back-branches, I'd call this a pilot error. You can't turn a
master into a standby just by creating a recovery.conf file. At least
not if the master was not shut down cleanly first.
...
I'm not sure that's worth the trouble, though. Perhaps it would be
better to just throw an error if the control file state is
DB_IN_PRODUCTION and a recovery.conf file exists.

+1 for that approach, at least until it's clear there's a market for
doing this sort of thing. I think the error check could be
back-patched, too.

Hmm, I just realized a little problem with that approach. If you take a base
backup using an atomic filesystem backup from a running server, and start
archive recovery from that, that's essentially the same thing as Kyotaro's
test case.

Yes. And the resource agent for streaming replication in Pacemaker (the
OSS clusterware) is a user of that archive recovery scenario, too. When it
starts up the server, it always creates the recovery.conf and starts the
server as a standby. It cannot start the master directly; IOW, the server
is always promoted to the master from the standby. So when it starts up
the server after the server crashes, obviously it executes the same
recovery scenario (i.e., forced archive recovery instead of crash
recovery) as Kyotaro described.

The reason why that resource agent cannot start up the master directly is
that it manages three server states, called Master, Slave and Down. It can
move the server state from Down to Slave, and in the reverse direction.
Also, it can move the state from Slave to Master, and in the reverse
direction. But there is no way to move the state between Down and Master
directly. This kind of state transition model is an isolated case in
clusterware, I think.

Regards,

--
Fujii Masao

#18 Fujii Masao
masao.fujii@gmail.com
In reply to: Simon Riggs (#15)
Re: 9.2.3 crashes during archive recovery

On Thu, Feb 14, 2013 at 5:52 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

On 13 February 2013 09:04, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:

Without step 3, the server would perform crash recovery, and it would work.
But because of the recovery.conf file, the server goes into archive
recovery, and because minRecoveryPoint is not set, it assumes that the
system is consistent from the start.

Aside from the immediate issue with truncation, the system really isn't
consistent until the WAL has been replayed far enough, so it shouldn't open
for hot standby queries. There might be other, later, changes already
flushed to data files. The system has no way of knowing how far it needs to
replay the WAL to become consistent.

At least in back-branches, I'd call this a pilot error. You can't turn a
master into a standby just by creating a recovery.conf file. At least not if
the master was not shut down cleanly first.

If there's a use case for doing that, maybe we can do something better in
HEAD. If the control file says that the system was running
(DB_IN_PRODUCTION), but there is a recovery.conf file, we could do crash
recovery first, until we reach the end of WAL, and go into archive recovery
mode after that. We'd recover all the WAL files in pg_xlog as far as we can,
same as in crash recovery, and only start restoring files from the archive
once we reach the end of WAL in pg_xlog. At that point, we'd also consider
the system as consistent, and start up for hot standby.

I'm not sure that's worth the trouble, though. Perhaps it would be better to
just throw an error if the control file state is DB_IN_PRODUCTION and a
recovery.conf file exists. The admin can always start the server normally
first, shut it down cleanly, and then create the recovery.conf file.

Now I've read the whole thing...

The problem is that we startup Hot Standby before we hit the min
recovery point because that isn't recorded. For me, the thing to do is
to make the min recovery point == end of WAL when state is
DB_IN_PRODUCTION. That way we don't need to do any new writes and we
don't need to risk people seeing inconsistent results if they do this.

+1

And in the standby case, the min recovery point can be set to the end
of the WAL files located on the standby. IOW, we can regard the database
as consistent once we have replayed all the local WAL files and try to
connect to the master.

Regards,

--
Fujii Masao

#19 Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Heikki Linnakangas (#14)
Re: 9.2.3 crashes during archive recovery

Sorry, I neglected to mention how we found this issue.

In an HA DB cluster consisting of Pacemaker and PostgreSQL, PostgreSQL
is stopped by 'pg_ctl stop -m i' regardless of the situation.

On the other hand, the PostgreSQL RA (Resource Agent) is obliged to
start the master node via the hot standby state because of the
restrictions on Pacemaker's state transitions.

So simply stopping and then starting the master node can fall into
this situation.

Hmm, I just realized a little problem with that approach. If you take
a base backup using an atomic filesystem backup from a running server,
and start archive recovery from that, that's essentially the same
thing as Kyotaro's test case.

Regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

#20 Ants Aasma
ants.aasma@cybertec.at
In reply to: Simon Riggs (#15)
Re: 9.2.3 crashes during archive recovery

On Wed, Feb 13, 2013 at 10:52 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

The problem is that we startup Hot Standby before we hit the min
recovery point because that isn't recorded. For me, the thing to do is
to make the min recovery point == end of WAL when state is
DB_IN_PRODUCTION. That way we don't need to do any new writes and we
don't need to risk people seeing inconsistent results if they do this.

While this solution would help solve my issue, it assumes that the
correct set of WAL files is actually there. Currently the docs for
setting up a standby refer to "24.3.4. Recovering Using a Continuous
Archive Backup", and that step recommends emptying the contents of
pg_xlog. If this is chosen as the solution, the docs should be adjusted
to recommend using pg_basebackup -x for setting up the standby. As a
related point, pointing standby setup at that section has confused at
least one of my clients; that chapter is rather scarily complicated
compared to what's usually necessary.

Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de

#21 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Ants Aasma (#20)
#22 Ants Aasma
ants.aasma@cybertec.at
In reply to: Heikki Linnakangas (#21)
#23 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Ants Aasma (#22)
#24 Ants Aasma
ants.aasma@cybertec.at
In reply to: Heikki Linnakangas (#23)
#25 Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Ants Aasma (#24)
#26 Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Kyotaro Horiguchi (#25)
#27 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Kyotaro Horiguchi (#26)
#28 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Heikki Linnakangas (#21)
#29 Michael Paquier
michael@paquier.xyz
In reply to: Heikki Linnakangas (#28)
#30 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Fujii Masao (#17)
#31 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Kyotaro Horiguchi (#19)
#32 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Michael Paquier (#29)
#33 Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Heikki Linnakangas (#32)
#34 Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Heikki Linnakangas (#31)
#35 Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Heikki Linnakangas (#27)
#36 Josh Berkus
josh@agliodbs.com
In reply to: Kyotaro Horiguchi (#1)
#37 Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Kyotaro Horiguchi (#33)
#38 Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Kyotaro Horiguchi (#37)
#39 Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Kyotaro Horiguchi (#38)
#40 KONDO Mitsumasa
kondo.mitsumasa@lab.ntt.co.jp
In reply to: Kyotaro Horiguchi (#38)
#41 Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: KONDO Mitsumasa (#40)
#42 Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Kyotaro Horiguchi (#41)
#43 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: KONDO Mitsumasa (#40)
#44 KONDO Mitsumasa
kondo.mitsumasa@lab.ntt.co.jp
In reply to: Heikki Linnakangas (#43)
#45 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: KONDO Mitsumasa (#44)
#46 KONDO Mitsumasa
kondo.mitsumasa@lab.ntt.co.jp
In reply to: Heikki Linnakangas (#45)
#47 Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: KONDO Mitsumasa (#46)