Archive recovery won't be completed in some situations.
Hello, we found that PostgreSQL never completes archive recovery
in some situations. This occurs on HEAD, 9.3.3, 9.2.7 and 9.1.12.
Restarting the server with archive recovery fails as follows
when it was killed with SIGKILL after pg_start_backup and some
WAL writes but before pg_stop_backup.
| FATAL: WAL ends before end of online backup
| HINT: Online backup started with pg_start_backup() must be
| ended with pg_stop_backup(), and all WAL up to that point must
| be available at recovery.
The trouble is that once the server has entered this situation, I
could find no formal operation to get out of it.
In this situation, 'Backup start location' in controldata holds
a valid location, but the corresponding 'end of backup' WAL
record will never arrive.
But I think PG cannot tell clearly whether the 'end of backup'
record does not exist at all or will come later, especially when
the server starts as a streaming replication hot standby.
One solution would be a new parameter in recovery.conf which
tells the server that the operator wants it to start as if there
had never been a backup label when this situation arises. It
looks ugly and somewhat dangerous but seems necessary.
The first attached file is a script to reproduce the problem,
and the second is a patch that tries to do what is described
above.
After applying this patch on HEAD and uncommenting
'cancel_backup_label_on_failure = true' in test.sh, the test
script runs as follows,
| LOG: record with zero length at 0/2010F40
| WARNING: backup_label was canceled.
| HINT: server might have crashed during backup mode.
| LOG: consistent recovery state reached at 0/2010F40
| LOG: redo done at 0/2010DA0
What do you think about this?
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
On 03/14/2014 12:32 PM, Kyotaro HORIGUCHI wrote:
> Restarting the server with archive recovery fails as follows
> when it was killed with SIGKILL after pg_start_backup and some
> WAL writes but before pg_stop_backup.
> | FATAL: WAL ends before end of online backup
> | HINT: Online backup started with pg_start_backup() must be
> | ended with pg_stop_backup(), and all WAL up to that point must
> | be available at recovery.
> The trouble is that once the server has entered this situation, I
> could find no formal operation to get out of it.
> In this situation, 'Backup start location' in controldata holds
> a valid location, but the corresponding 'end of backup' WAL
> record will never arrive.
> But I think PG cannot tell clearly whether the 'end of backup'
> record does not exist at all or will come later, especially when
> the server starts as a streaming replication hot standby.
If you kill the server while a backup is in progress, the backup is
broken. It's correct that the server refuses to start up from the broken
backup. So basically, don't do that.
- Heikki
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Thank you.
2014/03/14 19:42 "Heikki Linnakangas" <hlinnakangas@vmware.com>:
> On 03/14/2014 12:32 PM, Kyotaro HORIGUCHI wrote:
>> Restarting the server with archive recovery fails as follows
>> when it was killed with SIGKILL after pg_start_backup and some
>> WAL writes but before pg_stop_backup.
>> | FATAL: WAL ends before end of online backup
>> | HINT: Online backup started with pg_start_backup() must be
>> | ended with pg_stop_backup(), and all WAL up to that point must
>> | be available at recovery.
>> The trouble is that once the server has entered this situation, I
>> could find no formal operation to get out of it.
> If you kill the server while a backup is in progress, the backup is
> broken. It's correct that the server refuses to start up from the broken
> backup. So basically, don't do that.
Hmm.. What I did was simply restart the server just after a crash, but the
server happened to be in backup mode. No backup copy was used. Basically,
the server is in the same situation as a simple restart after a crash.
The difference here is that the restart made the database completely
useless, which it had not been before. I wish to save the database for
the case and I suppose it so acceptable.
regards,
--
Kyotaro Horiguchi
NTT Opensource Software Center
Sorry, I wrote that a little wrong.
2014/03/14 20:24 "Kyotaro HORIGUCHI" <horiguchi.kyotaro@lab.ntt.co.jp>:
> I wish to save the database for the case and I suppose it so acceptable.
and I don't suppose it so unacceptable.
regards,
--
Kyotaro Horiguchi
NTT Opensource Software Center
On 03/14/2014 01:24 PM, Kyotaro HORIGUCHI wrote:
> Hmm.. What I did was simply restart the server just after a crash, but the
> server happened to be in backup mode. No backup copy was used. Basically,
> the server is in the same situation as a simple restart after a crash.
You created recovery.conf in the master server after crash. Just don't
do that.
- Heikki
Hello,
2014/03/14 20:51 "Heikki Linnakangas" <hlinnakangas@vmware.com>:
> You created recovery.conf in the master server after crash. Just don't do
> that.
Ah, ok. I understood what you meant.
Sorry that I can't confirm right now, the original issue should occur on
the standby. I might've oversimplified.
regards,
--
Kyotaro Horiguchi
NTT Opensource Software Center
Umm.. Sorry for repeated correction.
2014/03/14 21:12 "Kyotaro HORIGUCHI" <kyota.horiguchi@gmail.com>:
> Ah, ok. I understood what you meant.
> Sorry that I can't confirm right now, the original issue should occur on
> the standby.
The original issue should have occurred on standby
> I might've oversimplified.
regards,
--
Kyotaro Horiguchi
NTT Opensource Software Center
On Fri, Mar 14, 2014 at 7:32 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> Hello, we found that PostgreSQL never completes archive recovery
> in some situations. This occurs on HEAD, 9.3.3, 9.2.7 and 9.1.12.
> Restarting the server with archive recovery fails as follows
> when it was killed with SIGKILL after pg_start_backup and some
> WAL writes but before pg_stop_backup.
> | FATAL: WAL ends before end of online backup
> | HINT: Online backup started with pg_start_backup() must be
> | ended with pg_stop_backup(), and all WAL up to that point must
> | be available at recovery.
> The trouble is that once the server has entered this situation, I
> could find no formal operation to get out of it.
Though this is formal way, you can exit from that situation by
(1) Remove recovery.conf and start the server with crash recovery
(2) Execute pg_start_backup() after crash recovery ends
(3) Copy backup_label to somewhere
(4) Execute pg_stop_backup() and shutdown the server
(5) Copy backup_label back to $PGDATA
(6) Create recovery.conf and start the server with archive recovery
> In this situation, 'Backup start location' in controldata holds
> a valid location, but the corresponding 'end of backup' WAL
> record will never arrive.
> But I think PG cannot tell clearly whether the 'end of backup'
> record does not exist at all or will come later, especially when
> the server starts as a streaming replication hot standby.
> One solution would be a new parameter in recovery.conf which
> tells the server that the operator wants it to start as if there
> had never been a backup label when this situation arises. It
> looks ugly and somewhat dangerous but seems necessary.
> The first attached file is a script to reproduce the problem,
> and the second is a patch that tries to do what is described
> above.
> After applying this patch on HEAD and uncommenting
> 'cancel_backup_label_on_failure = true' in test.sh, the test
> script runs as follows,
> | LOG: record with zero length at 0/2010F40
> | WARNING: backup_label was canceled.
> | HINT: server might have crashed during backup mode.
> | LOG: consistent recovery state reached at 0/2010F40
> | LOG: redo done at 0/2010DA0
> What do you think about this?
What about adding a new option to pg_resetxlog so that we can
reset pg_control's backup start location? Even after we've
accidentally entered the situation that you described, we can
exit from it by resetting the backup start location in pg_control.
Also, this option seems helpful for salvaging the data as a last
resort from a corrupted backup.
Regards,
--
Fujii Masao
Thank you for the good suggestion.
>> The trouble is that once the server has entered this situation, I
>> could find no formal operation to get out of it.
> Though this is formal way, you can exit from that situation by
> (1) Remove recovery.conf and start the server with crash recovery
> (2) Execute pg_start_backup() after crash recovery ends
> (3) Copy backup_label to somewhere
> (4) Execute pg_stop_backup() and shutdown the server
> (5) Copy backup_label back to $PGDATA
> (6) Create recovery.conf and start the server with archive recovery
It will do. And pg_resetxlog was the first thing I checked out
for resetting backupStartPoint.
> What about adding a new option to pg_resetxlog so that we can
> reset pg_control's backup start location? Even after we've
> accidentally entered the situation that you described, we can
> exit from it by resetting the backup start location in pg_control.
> Also, this option seems helpful for salvaging the data as a last
> resort from a corrupted backup.
It is far more proportionate than a recovery.conf option :), since
pg_resetxlog is already documented as dangerous by nature. Anyway,
I'll first make sure what the situation behind the trouble was.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Hello, very sorry to have bothered you with a silly question.
me> It is far more proportionate than a recovery.conf option :), since
me> pg_resetxlog is already documented as dangerous by nature. Anyway,
me> I'll first make sure what the situation behind the trouble was.
It looks exactly like the 'starting up as a standby an ex-master
which crashed during backup mode' case, as far as I checked the
original issue. I agree that no rescue is needed for that case
since it is simply a db corruption. The usefulness of a
pg_resetxlog feature for resetting backup_label-related items is
not clear so far, so I don't wish for it to be implemented at
this time.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
On 03/15/2014 05:59 PM, Fujii Masao wrote:
> What about adding a new option to pg_resetxlog so that we can
> reset pg_control's backup start location? Even after we've
> accidentally entered the situation that you described, we can
> exit from it by resetting the backup start location in pg_control.
> Also, this option seems helpful for salvaging the data as a last
> resort from a corrupted backup.
Yeah, seems reasonable. After you run pg_resetxlog, there's no hope that
the backup end record would arrive any time later. And if it does, it
won't really do much good after you've reset the WAL.
We probably should just clear out the backup start/stop location always
when you run pg_resetxlog. Your database is potentially broken if you
reset the WAL before reaching consistency, but if you forcibly do that with
"pg_resetxlog -f", you've been warned.
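A minimal sketch of what clearing those locations could look like, assuming a stand-in struct. SketchControlData, SketchRecPtr and the sketch_ functions are made up for illustration; they are not the real pg_control layout or pg_resetxlog code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins, invented for this sketch. */
typedef uint64_t SketchRecPtr;
#define SKETCH_INVALID_REC_PTR ((SketchRecPtr) 0)

typedef struct
{
    SketchRecPtr backup_start;  /* set while an end-of-backup record is awaited */
    SketchRecPtr backup_end;
} SketchControlData;

/* True while recovery would still insist on seeing an end-of-backup record. */
bool
sketch_backup_end_awaited(const SketchControlData *ctl)
{
    assert(ctl != NULL);
    return ctl->backup_start != SKETCH_INVALID_REC_PTR;
}

/* What "clear out the backup start/stop location" amounts to in this model. */
void
sketch_clear_backup_positions(SketchControlData *ctl)
{
    assert(ctl != NULL);
    ctl->backup_start = SKETCH_INVALID_REC_PTR;
    ctl->backup_end = SKETCH_INVALID_REC_PTR;
}
```

In this model, once the positions are cleared nothing is left that would make a later recovery wait for a backup-end record.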
- Heikki
Hello, thank you for the suggestions.
The *problematic* operation sequence I saw was performed by
pgsql-RA/Pacemaker. It stops the server in immediate mode, starts
the Master as a Standby first, and then promotes it. Focusing on
this situation, it would be reasonable to reset the backup
positions. 9.4 cancels backup mode even on immediate shutdown,
so the operation causes no problem there, but 9.3 and earlier do
not. In summary, the amendments needed per version are
9.4: Nothing more is needed (but resetting backup mode by
resetxlog is acceptable)
9.3: Can be recovered without resetting backup positions in
controlfile. (but smarter with it)
9.2: Same as 9.3
9.1: Cannot be recovered without directly resetting the backup
position in controlfile. The resetting feature is needed.
At Mon, 17 Mar 2014 15:59:09 +0200, Heikki Linnakangas wrote
> On 03/15/2014 05:59 PM, Fujii Masao wrote:
>> What about adding a new option to pg_resetxlog so that we can
>> reset pg_control's backup start location? Even after we've
>> accidentally entered the situation that you described, we can
>> exit from it by resetting the backup start location in pg_control.
>> Also, this option seems helpful for salvaging the data as a last
>> resort from a corrupted backup.
> Yeah, seems reasonable. After you run pg_resetxlog, there's no hope
> that the backup end record would arrive any time later. And if it
> does, it won't really do much good after you've reset the WAL.
> We probably should just clear out the backup start/stop location
> always when you run pg_resetxlog. Your database is potentially broken
> if you reset the WAL before reaching consistency, but if you forcibly
> do that with "pg_resetxlog -f", you've been warned.
Agreed. The attached patches do that, and I could "recover" the
database state with the following steps,
(1) Remove recovery.conf and do pg_resetxlog -bf
(the option name 'b' is open to debate)
(2) Start the server (with crash recovery)
(3) Stop the server (in any mode)
(4) Create recovery.conf and start the server with archive recovery.
Steps 2 and 3 are somewhat annoying, but I don't want to support
Pacemaker's in-a-sense broken sequence any further :(
These steps can be replaced by the following ones, suggested in
Masao's previous mail, for 9.2 and later, but 9.1 needs a forced
reset of backupStartPoint.
At Sun, 16 Mar 2014 00:59:01 +0900, Fujii Masao wrote
> Though this is formal way, you can exit from that situation by
> (1) Remove recovery.conf and start the server with crash recovery
> (2) Execute pg_start_backup() after crash recovery ends
> (3) Copy backup_label to somewhere
> (4) Execute pg_stop_backup() and shutdown the server
> (5) Copy backup_label back to $PGDATA
> (6) Create recovery.conf and start the server with archive recovery
This worked for 9.2, 9.3 and HEAD but failed for 9.1 at step 1.
| 2014-03-19 15:53:02.512 JST FATAL: WAL ends before end of online backup
| 2014-03-19 15:53:02.512 JST HINT: Online backup started with pg_start_backup() must be ended with pg_stop_backup(), and all WAL up to that point must be available at recovery.
This seems inevitable.
| if (InRecovery &&
| (XLByteLT(EndOfLog, minRecoveryPoint) ||
| !XLogRecPtrIsInvalid(ControlFile->backupStartPoint)))
| {
...
| /*
| * Ran off end of WAL before reaching end-of-backup WAL record, or
| * minRecoveryPoint.
| */
| if (!XLogRecPtrIsInvalid(ControlFile->backupStartPoint))
| ereport(FATAL,
| (errmsg("WAL ends before end of online backup"),
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
On Wed, Mar 19, 2014 at 5:28 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> Hello, thank you for the suggestions.
> The *problematic* operation sequence I saw was performed by
> pgsql-RA/Pacemaker. It stops the server in immediate mode, starts
> the Master as a Standby first, and then promotes it. Focusing on
> this situation, it would be reasonable to reset the backup
> positions. 9.4 cancels backup mode even on immediate shutdown,
> so the operation causes no problem there, but 9.3 and earlier do
> not. In summary, the amendments needed per version are
> 9.4: Nothing more is needed (but resetting backup mode by
> resetxlog is acceptable)
> 9.3: Can be recovered without resetting backup positions in
> controlfile. (but smarter with it)
> 9.2: Same as 9.3
> 9.1: Cannot be recovered without directly resetting the backup
> position in controlfile. The resetting feature is needed.
> At Mon, 17 Mar 2014 15:59:09 +0200, Heikki Linnakangas wrote
>> On 03/15/2014 05:59 PM, Fujii Masao wrote:
>>> What about adding a new option to pg_resetxlog so that we can
>>> reset pg_control's backup start location? Even after we've
>>> accidentally entered the situation that you described, we can
>>> exit from it by resetting the backup start location in pg_control.
>>> Also, this option seems helpful for salvaging the data as a last
>>> resort from a corrupted backup.
>> Yeah, seems reasonable. After you run pg_resetxlog, there's no hope
>> that the backup end record would arrive any time later. And if it
>> does, it won't really do much good after you've reset the WAL.
>> We probably should just clear out the backup start/stop location
>> always when you run pg_resetxlog. Your database is potentially broken
>> if you reset the WAL before reaching consistency, but if you forcibly
>> do that with "pg_resetxlog -f", you've been warned.
> Agreed. The attached patches do that, and I could "recover" the
> database state with the following steps,
Adding a new option looks like a new feature rather than a bug fix.
I'm afraid that backpatching such a change to 9.3 or before
is not acceptable.
Regards,
--
Fujii Masao
On 03/19/2014 10:28 AM, Kyotaro HORIGUCHI wrote:
> The *problematic* operation sequence I saw was performed by
> pgsql-RA/Pacemaker. It stops the server in immediate mode, starts
> the Master as a Standby first, and then promotes it. Focusing on
> this situation, it would be reasonable to reset the backup
> positions.
Well, that's scary. I would suggest doing a fast shutdown instead. But
if you really want to do an immediate shutdown, you should delete the
backup_label file after the shutdown.
When restarting after immediate shutdown and a backup_label file is
present, the system doesn't know if the system crashed during a backup,
and it needs to perform crash recovery, or if you're trying to restore from
a backup. It makes a compromise, and starts recovery from the checkpoint
given in the backup_label, as if it was restoring from a backup, but if
it doesn't see a backup-end WAL record, it just starts up anyway (which
would be wrong if you are indeed restoring from a backup). But if you
create a recovery.conf file, that indicates that you are definitely
restoring from a backup, so it's more strict and insists that the
backup-end record must be replayed.
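As a rough sketch of that rule only: the names below are made up for illustration, this is not the actual StartupXLOG() code, and it ignores the backup start location kept in pg_control.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome names, invented for this sketch. */
typedef enum
{
    SKETCH_STARTS_UP,   /* recovery ends and the server comes up */
    SKETCH_FATAL        /* "WAL ends before end of online backup" */
} SketchOutcome;

/*
 * Outcome when replay runs off the end of the available WAL.
 * Models only the compromise as worded in this mail.
 */
SketchOutcome
sketch_end_of_wal(bool have_backup_label, bool have_recovery_conf,
                  bool saw_backup_end)
{
    /* a backup-end record only matters if a backup was started */
    assert(have_backup_label || !saw_backup_end);

    if (!have_backup_label || saw_backup_end)
        return SKETCH_STARTS_UP;

    /* backup_label present, but no backup-end record was replayed */
    return have_recovery_conf ? SKETCH_FATAL : SKETCH_STARTS_UP;
}
```

With a backup_label and a recovery.conf but no backup-end record, the sketch returns SKETCH_FATAL, which is the case reported at the start of this thread; without recovery.conf the same input returns SKETCH_STARTS_UP.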
> 9.4 cancels backup mode even on immediate shutdown,
> so the operation causes no problem there, but 9.3 and earlier do
> not.
Hmm, I don't think we've changed that behavior in 9.4.
- Heikki
On Wed, Mar 19, 2014 at 7:57 PM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
> On 03/19/2014 10:28 AM, Kyotaro HORIGUCHI wrote:
>> The *problematic* operation sequence I saw was performed by
>> pgsql-RA/Pacemaker. It stops the server in immediate mode, starts
>> the Master as a Standby first, and then promotes it. Focusing on
>> this situation, it would be reasonable to reset the backup
>> positions.
> Well, that's scary. I would suggest doing a fast shutdown instead. But if
> you really want to do an immediate shutdown, you should delete the
> backup_label file after the shutdown.
> When restarting after immediate shutdown and a backup_label file is present,
> the system doesn't know if the system crashed during a backup, and it needs
> to perform crash recovery, or if you're trying to restore from a backup. It
> makes a compromise, and starts recovery from the checkpoint given in the
> backup_label, as if it was restoring from a backup, but if it doesn't see a
> backup-end WAL record, it just starts up anyway (which would be wrong if you
> are indeed restoring from a backup). But if you create a recovery.conf file,
> that indicates that you are definitely restoring from a backup, so it's more
> strict and insists that the backup-end record must be replayed.
>> 9.4 cancels backup mode even on immediate shutdown,
>> so the operation causes no problem there, but 9.3 and earlier do
>> not.
> Hmm, I don't think we've changed that behavior in 9.4.
ISTM 82233ce7ea42d6ba519aaec63008aff49da6c7af changed immediate
shutdown that way.
Regards,
--
Fujii Masao
Fujii Masao wrote:
> On Wed, Mar 19, 2014 at 7:57 PM, Heikki Linnakangas
> <hlinnakangas@vmware.com> wrote:
>>> 9.4 cancels backup mode even on immediate shutdown, so the
>>> operation causes no problem there, but 9.3 and earlier do not.
>> Hmm, I don't think we've changed that behavior in 9.4.
> ISTM 82233ce7ea42d6ba519aaec63008aff49da6c7af changed immediate
> shutdown that way.
Uh, interesting. I didn't see that secondary effect. I hope it's not
for ill?
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi, I confirmed that 82233ce7ea4 indeed did it.
At Wed, 19 Mar 2014 09:35:16 -0300, Alvaro Herrera wrote
> Fujii Masao wrote:
>> On Wed, Mar 19, 2014 at 7:57 PM, Heikki Linnakangas
>> <hlinnakangas@vmware.com> wrote:
>>>> 9.4 cancels backup mode even on immediate shutdown, so the
>>>> operation causes no problem there, but 9.3 and earlier do not.
>>> Hmm, I don't think we've changed that behavior in 9.4.
>> ISTM 82233ce7ea42d6ba519aaec63008aff49da6c7af changed immediate
>> shutdown that way.
> Uh, interesting. I didn't see that secondary effect. I hope it's not
> for ill?
The crucial factor for the behavior change is that pmdie() no
longer exits immediately on SIGQUIT. 'case SIGQUIT:' in pmdie()
ended with "ExitPostmaster(0)" before the patch, but now it ends
with 'PostmasterStateMachine(); break;' and so continues to run
with pmState = PM_WAIT_BACKENDS, similar to SIGINT (fast
shutdown).
Eventually, pmState changes to PM_NO_CHILDREN via
PM_WAIT_DEAD_END on SIGCHLDs from non-significant processes, and
then CancelBackup() is called.
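To illustrate the two paths, here is a toy model only. The SK_ state names mirror the pmState values mentioned above, but the function is invented for this sketch and is far simpler than the real pmdie() and PostmasterStateMachine().

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified subset of the postmaster states named above. */
typedef enum
{
    SK_PM_WAIT_BACKENDS,
    SK_PM_WAIT_DEAD_END,
    SK_PM_NO_CHILDREN
} SketchPMState;

/*
 * Returns true if backup mode ends up canceled after SIGQUIT.
 * handler_exits_at_once = true models the old ExitPostmaster(0) path.
 */
bool
sketch_sigquit_cancels_backup(bool handler_exits_at_once)
{
    SketchPMState state = SK_PM_WAIT_BACKENDS;

    if (handler_exits_at_once)
        return false;           /* exits before any cancel can happen */

    /* new path: each round of SIGCHLDs advances the state machine */
    while (state != SK_PM_NO_CHILDREN)
        state = (state == SK_PM_WAIT_BACKENDS) ? SK_PM_WAIT_DEAD_END
                                               : SK_PM_NO_CHILDREN;

    assert(state == SK_PM_NO_CHILDREN);
    return true;                /* stands in for CancelBackup() */
}
```

Calling it with true models 9.3 and before (backup mode is left as it was), and with false models the behavior after 82233ce7ea4.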
Focusing on the point described above, the small patch below
reverts the behavior to that of 9.3 and before, but I don't know
whether it is appropriate with regard to the intention of the patch.
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index e9072b7..f87c25c 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -2498,16 +2498,7 @@ pmdie(SIGNAL_ARGS)
                 (errmsg("received immediate shutdown request")));
             TerminateChildren(SIGQUIT);
-            pmState = PM_WAIT_BACKENDS;
-
-            /* set stopwatch for them to die */
-            AbortStartTime = time(NULL);
-
-            /*
-             * Now wait for backends to exit. If there are none,
-             * PostmasterStateMachine will take the next step.
-             */
-            PostmasterStateMachine();
+            ExitPostmaster(0);
             break;
     }
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Hello,
At Wed, 19 Mar 2014 19:34:10 +0900, Fujii Masao wrote
>> Agreed. The attached patches do that, and I could "recover" the
>> database state with the following steps,
> Adding a new option looks like a new feature rather than a bug fix.
> I'm afraid that backpatching such a change to 9.3 or before
> is not acceptable.
Me too. But on the other hand, it is simply a remedy for the
consequences of the server's behavior (although it was caused by
an ill operation :), and it is especially needed for at least 9.1,
which apparently cannot be saved without it. Plus, it has
absolutely no impact on the server's behavior in any of the
corresponding versions. So I hope it is accepted.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Hello,
> On 03/19/2014 10:28 AM, Kyotaro HORIGUCHI wrote:
>> The *problematic* operation sequence I saw was performed by
>> pgsql-RA/Pacemaker. It stops the server in immediate mode, starts
>> the Master as a Standby first, and then promotes it. Focusing on
>> this situation, it would be reasonable to reset the backup
>> positions.
> Well, that's scary. I would suggest doing a fast shutdown instead. But
> if you really want to do an immediate shutdown, you should delete the
> backup_label file after the shutdown.
"We" had also told them the former on several occasions. They
answered that they hate the shutdown checkpoint taking a long
time before shutdown completes. The latter had not come to my
mind and seems promising. Thank you for the suggestion.
> When restarting after immediate shutdown and a backup_label file is
> present, the system doesn't know if the system crashed during a
> backup, and it needs to perform crash recovery, or if you're trying
> to restore from a backup. It makes a compromise, and starts recovery
> from the checkpoint given in the backup_label, as if it was restoring
> from a backup, but if it doesn't see a backup-end WAL record, it just
> starts up anyway (which would be wrong if you are indeed restoring
> from a backup). But if you create a recovery.conf file, that indicates
> that you are definitely restoring from a backup, so it's more strict
> and insists that the backup-end record must be replayed.
>> 9.4 cancels backup mode even on immediate shutdown,
>> so the operation causes no problem there, but 9.3 and earlier do
>> not.
> Hmm, I don't think we've changed that behavior in 9.4.
Now pmdie() behaves similarly for fast and immediate shutdown
after 82233ce7ea42d6ba519. It is a side effect of a change to
immediate shutdown which makes it wait for the children to die
by SIGQUIT.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Kyotaro HORIGUCHI wrote:
> Hi, I confirmed that 82233ce7ea4 indeed did it.
> At Wed, 19 Mar 2014 09:35:16 -0300, Alvaro Herrera wrote
>> Fujii Masao wrote:
>>> On Wed, Mar 19, 2014 at 7:57 PM, Heikki Linnakangas
>>> <hlinnakangas@vmware.com> wrote:
>>>>> 9.4 cancels backup mode even on immediate shutdown, so the
>>>>> operation causes no problem there, but 9.3 and earlier do not.
>>>> Hmm, I don't think we've changed that behavior in 9.4.
>>> ISTM 82233ce7ea42d6ba519aaec63008aff49da6c7af changed immediate
>>> shutdown that way.
>> Uh, interesting. I didn't see that secondary effect. I hope it's not
>> for ill?
> The crucial factor for the behavior change is that pmdie() no
> longer exits immediately on SIGQUIT. 'case SIGQUIT:' in pmdie()
> ended with "ExitPostmaster(0)" before the patch, but now it ends
> with 'PostmasterStateMachine(); break;' and so continues to run
> with pmState = PM_WAIT_BACKENDS, similar to SIGINT (fast
> shutdown).
> Eventually, pmState changes to PM_NO_CHILDREN via
> PM_WAIT_DEAD_END on SIGCHLDs from non-significant processes, and
> then CancelBackup() is called.
Judging from what was being said on the thread, it seems that running
CancelBackup() after an immediate shutdown is better than not doing it,
correct?
> Focusing on the point described above, the small patch below
> reverts the behavior to that of 9.3 and before, but I don't know
> whether it is appropriate with regard to the intention of the patch.
I see. Obviously your patch would, in effect, revert 82233ce7ea
completely, which is not something we want. I think if we want to go
back to the previous behavior of not stopping the backup, some other
method should be used.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services