[9.3 bug] disk space in pg_xlog increases during archive recovery
Hello,
I'm sorry I've been touching several things recently before fixing any of
them.
I've noticed an undesirable disk space increase while performing archive
recovery with PostgreSQL 9.3. This happens with 9.2, too.
I just performed archive recovery with the following parameters in
recovery.conf. I'm not using replication.
restore_command = 'cp ...'
recovery_target_timeline = 'latest'
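For concreteness, a complete restore_command of this form might look like
the following (the archive path here is purely illustrative, not my actual
setup):

    restore_command = 'cp /mnt/server/archivedir/%f "%p"'   # %f = segment file name, %p = destination path
    recovery_target_timeline = 'latest'                     # follow the newest timeline found in the archive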
As the archive recovery progresses, the disk space used by $PGDATA/pg_xlog
increases. It seems that restored archive WAL files are accumulated there.
This can be a considerable amount, depending on the number of archived WAL
files. In my case, the recovery failed because of a shortage of disk space.
This did not happen with 9.1.
The cause appears to be KeepFileRestoredFromArchive(). This function saves
restored archive WAL files in pg_xlog/. I guess this is for cascading
replication, a new feature added in 9.2.
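As far as I can tell -- this is my rough understanding, not the exact 9.3
code -- restore_command writes each segment to a scratch file such as
pg_xlog/RECOVERYXLOG, and KeepFileRestoredFromArchive() then keeps it under
its real segment name, roughly like this:

    /*
     * Hedged sketch of the "keep restored segment" step; names and error
     * handling are simplified compared to the real xlog.c.
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    static void
    keep_restored_segment(const char *restored_path, const char *xlogfname)
    {
        char        final_path[1024];

        /* e.g. pg_xlog/RECOVERYXLOG -> pg_xlog/00000002000000000000002A */
        snprintf(final_path, sizeof(final_path), "pg_xlog/%s", xlogfname);

        /* Replace any existing segment with the archival version. */
        (void) unlink(final_path);
        if (rename(restored_path, final_path) != 0)
        {
            perror("rename");
            exit(1);
        }

        /*
         * The segment now sits in pg_xlog/ until a restartpoint (or the
         * end of recovery) removes it -- which is where the space goes.
         */
    }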
So, I think it is a bug that the disk space increases if not using cascading
replication. Those who migrated from 9.1 and do not use 9.2 features would
be surprised like me.
Do you think this should be fixed? How should it be fixed? If possible,
could you fix it in the next minor release? If you are all busy, I'll try
to fix it myself, but please give me advice on how to do that.
Regards
MauMau
On Sun, Jul 28, 2013 at 7:59 AM, MauMau <maumau307@gmail.com> wrote:
> Hello,
>
> I'm sorry I've been touching several things recently before fixing any of
> them.
>
> I've noticed an undesirable disk space increase while performing archive
> recovery with PostgreSQL 9.3. This happens with 9.2, too. I just performed
> archive recovery with the following parameters in recovery.conf. I'm not
> using replication.
>
> restore_command = 'cp ...'
> recovery_target_timeline = 'latest'
>
> As the archive recovery progresses, the disk space used by $PGDATA/pg_xlog
> increases. It seems that restored archive WAL files are accumulated there.
> This can be a considerable amount, depending on the number of archived WAL
> files. In my case, the recovery failed because of a shortage of disk
> space. This did not happen with 9.1.
>
> The cause appears to be KeepFileRestoredFromArchive(). This function saves
> restored archive WAL files in pg_xlog/. I guess this is for cascading
> replication, a new feature added in 9.2.
Yes. Those accumulated WAL files will basically be removed when a
restartpoint ends. But if restartpoints are slow compared to the WAL
replay, the problem you described would happen.
> So, I think it is a bug that the disk space increases if not using
> cascading replication. Those who migrated from 9.1 and do not use 9.2
> features would be surprised, as I was.
>
> Do you think this should be fixed?
I think so.
> How should it be fixed?
What about removing the restored archived file as soon as it's replayed
if cascading replication is not enabled (i.e., max_wal_senders = 0 or
hot_standby = off)? This doesn't seem to break the existing behavior
in 9.2.
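For readers following along: if I remember correctly, AllowCascadeReplication()
is defined along these lines in the 9.2-era sources (please double-check
against src/include/replication/walsender.h before relying on it), which is
why max_wal_senders and hot_standby are the relevant settings:

    /* roughly, from src/include/replication/walsender.h */
    #define AllowCascadeReplication() (EnableHotStandby && max_wal_senders > 0)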
Regards,
--
Fujii Masao
From: "Fujii Masao" <masao.fujii@gmail.com>
>> So, I think it is a bug that the disk space increases if not using
>> cascading replication. Those who migrated from 9.1 and do not use 9.2
>> features would be surprised, as I was.
>>
>> Do you think this should be fixed?
>
> I think so.
Thanks for your agreement. I'll try to submit a patch as soon as possible,
so that the fix can be included in the next minor release and in 9.3. If it
seems unexpectedly difficult, I'd like to consult you.
>> How should it be fixed?
>
> What about removing the restored archived file as soon as it's replayed
> if cascading replication is not enabled (i.e., max_wal_senders = 0 or
> hot_standby = off)? This doesn't seem to break the existing behavior
> in 9.2.
I'll consider this condition, but I wonder if 9.1 users are setting
max_wal_senders>0 and hot_standby=on even when just performing archive
recovery (e.g., to use the same postgresql.conf for all cases). What do you
think? Is there any other good condition for judging whether cascading
replication is in use?
Regards
MauMau
Hello, Fujii-san, all,
From: "Fujii Masao" <masao.fujii@gmail.com>
> On Sun, Jul 28, 2013 at 7:59 AM, MauMau <maumau307@gmail.com> wrote:
>> Do you think this should be fixed?
>
> I think so.
>
>> How should it be fixed?
>
> What about removing the restored archived file as soon as it's replayed
> if cascading replication is not enabled (i.e., max_wal_senders = 0 or
> hot_standby = off)? This doesn't seem to break the existing behavior
> in 9.2.
Please find attached a patch to fix the problem. I changed it so that
restored WAL files are kept in pg_xlog/ only on a cascading standby server.
Could you review and commit this?
BTW, KeepFileRestoredFromArchive() is also called to keep timeline history
files in pg_xlog/. What is this for? Is this necessary for recovery other
than cascading standbys? This seems to accumulate timeline history files
forever in pg_xlog/.
Regards
MauMau
Attachment: skip_wal_save.patch
diff -rpcd a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
*** a/src/backend/access/transam/xlog.c 2013-06-25 03:55:41.000000000 +0900
--- b/src/backend/access/transam/xlog.c 2013-07-31 22:26:31.913000000 +0900
*************** XLogFileRead(XLogSegNo segno, int emode,
*** 2658,2666 ****
/*
* If the segment was fetched from archival storage, replace the existing
! * xlog segment (if any) with the archival version.
*/
! if (source == XLOG_FROM_ARCHIVE)
{
KeepFileRestoredFromArchive(path, xlogfname);
--- 2658,2667 ----
/*
* If the segment was fetched from archival storage, replace the existing
! * xlog segment (if any) with the archival version on a cascading standby.
*/
! if (source == XLOG_FROM_ARCHIVE &&
! StandbyModeRequested && AllowCascadeReplication())
{
KeepFileRestoredFromArchive(path, xlogfname);
On Wed, Jul 31, 2013 at 10:43 PM, MauMau <maumau307@gmail.com> wrote:
> Please find attached a patch to fix the problem. I changed it so that
> restored WAL files are kept in pg_xlog/ only on a cascading standby
> server. Could you review and commit this?
-        if (source == XLOG_FROM_ARCHIVE)
+        if (source == XLOG_FROM_ARCHIVE &&
+            StandbyModeRequested && AllowCascadeReplication())
I think that the StandbyModeRequested condition should be removed, because
someone might want to set up a cascading standby from a standby in a
warm-standby configuration.
> BTW, KeepFileRestoredFromArchive() is also called to keep timeline history
> files in pg_xlog/. What is this for? Is this necessary for recovery other
> than cascading standbys?
Yes. Please see commit 60df192aea0e6458f20301546e11f7673c102101.
> This seems to accumulate timeline history files
> forever in pg_xlog/.
Yes, because basically there is no way to delete the timeline history files
from pg_xlog.
Regards,
--
Fujii Masao
From: "Fujii Masao" <masao.fujii@gmail.com>
> -        if (source == XLOG_FROM_ARCHIVE)
> +        if (source == XLOG_FROM_ARCHIVE &&
> +            StandbyModeRequested && AllowCascadeReplication())
>
> I think that the StandbyModeRequested condition should be removed, because
> someone might want to set up a cascading standby from a standby in a
> warm-standby configuration.
Fixed and attached the revised patch.
However, isn't StandbyModeRequested true (i.e., standby_mode set to on) to
enable warm standby? I'm afraid people set max_wal_senders>0 and
hot_standby=on even on the primary server, to make the contents of
postgresql.conf identical on both the primary and the standby for easier
configuration. If so, normal archive recovery (PITR, not standby recovery)
would still face the original problem -- unnecessary WAL accumulation in
pg_xlog/. So I wonder if AllowCascadeReplication() is enough.
Please take either this patch or the previous one.
Regards
MauMau
Attachment: skip_wal_save_v2.patch
diff -rpcd a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
*** a/src/backend/access/transam/xlog.c 2013-06-25 03:55:41.000000000 +0900
--- b/src/backend/access/transam/xlog.c 2013-08-01 07:19:32.599000000 +0900
*************** XLogFileRead(XLogSegNo segno, int emode,
*** 2658,2666 ****
/*
* If the segment was fetched from archival storage, replace the existing
! * xlog segment (if any) with the archival version.
*/
! if (source == XLOG_FROM_ARCHIVE)
{
KeepFileRestoredFromArchive(path, xlogfname);
--- 2658,2668 ----
/*
* If the segment was fetched from archival storage, replace the existing
! * xlog segment (if any) with the archival version when cascading
! * replication is allowed.
*/
! if (source == XLOG_FROM_ARCHIVE &&
! AllowCascadeReplication())
{
KeepFileRestoredFromArchive(path, xlogfname);
On Thu, Aug 1, 2013 at 7:37 AM, MauMau <maumau307@gmail.com> wrote:
From: "Fujii Masao" <masao.fujii@gmail.com>
- if (source == XLOG_FROM_ARCHIVE) + if (source == XLOG_FROM_ARCHIVE && + StandbyModeRequested && AllowCascadeReplication())I think that the condition of StandbyModeRequested should be removed
because someone might want to set up the cascade standby from the standby
of warm-standby configuration.Fixed and attached the revised patch.
However, isn't StandbyRequested true (= standby_mode set to on) to enable
warm standby?
We can set up a warm standby by using pg_standby even if standby_mode = off.
> I'm afraid people set max_wal_senders>0 and hot_standby=on even on the
> primary server, to make the contents of postgresql.conf identical on both
> the primary and the standby for easier configuration. If so, normal
> archive recovery (PITR, not standby recovery) would still face the
> original problem -- unnecessary WAL accumulation in pg_xlog/. So I wonder
> if AllowCascadeReplication() is enough.
One idea is to add a new GUC parameter like enable_cascade_replication and
then prevent WAL files from being kept in pg_xlog when that parameter is
off. But we cannot apply such a change to the already-released version 9.2.
Regards,
--
Fujii Masao
From: "Fujii Masao" <masao.fujii@gmail.com>
>> However, isn't StandbyModeRequested true (i.e., standby_mode set to on)
>> to enable warm standby?
>
> We can set up a warm standby by using pg_standby even if standby_mode = off.
I see. However, I understand that pg_standby is a legacy feature, and the
current way to set up a warm standby is to set standby_mode=on and
restore_command in recovery.conf. So I think it might not be necessary to
support cascading replication with pg_standby, if supporting it would
prevent a better implementation of new features.
>> I'm afraid people set max_wal_senders>0 and hot_standby=on even on the
>> primary server, to make the contents of postgresql.conf identical on both
>> the primary and the standby for easier configuration. If so, normal
>> archive recovery (PITR, not standby recovery) would still face the
>> original problem -- unnecessary WAL accumulation in pg_xlog/. So I wonder
>> if AllowCascadeReplication() is enough.
>
> One idea is to add a new GUC parameter like enable_cascade_replication and
> then prevent WAL files from being kept in pg_xlog when that parameter is
> off. But we cannot apply such a change to the already-released version 9.2.
Yes, I thought the same. Adding a new GUC to enable cascading replication
now would be a usability degradation, but I have no idea of any better way.
It seems we need to take either v1 or v2 of the patch I sent. If we
consider that we don't have to support cascading replication for the legacy
pg_standby, the v1 patch is definitely better, because the standby server
would not keep restored archive WAL in pg_xlog when cascading replication
is not used. Otherwise, we have to take the v2 patch.
Could you choose either and commit it?
Regards
MauMau
On Fri, Aug 2, 2013 at 12:24 AM, MauMau <maumau307@gmail.com> wrote:
From: "Fujii Masao" <masao.fujii@gmail.com>
However, isn't StandbyRequested true (= standby_mode set to on) to enable
warm standby?
We can set up warm-standby by using pg_standby even if standby_mode = off.
I see. However, I understand that pg_standby is a legacy feature, and the
current way to setup a warm standby is to set standby=on and
restore_command in recovery.conf. So I think it might not be necessary to
support cascading replication with pg_standby, if supporting it would
prevent better implementation of new features.
You are right about that; you should stick with the core features as much
as you can and limit the use of wrapper utilities. Since 9.1 and the
introduction of a built-in standby mode inside Postgres (with the
introduction of the GUC parameter hot_standby), it seems better to use that
instead of pg_standby, which would likely (personal opinion, feel free to
scream at me) be removed in a future release.
>> One idea is to add a new GUC parameter like enable_cascade_replication
>> and then prevent WAL files from being kept in pg_xlog when that parameter
>> is off. But we cannot apply such a change to the already-released version
>> 9.2.
>
> Yes, I thought the same. Adding a new GUC to enable cascading replication
> now would be a usability degradation, but I have no idea of any better
> way. It seems we need to take either v1 or v2 of the patch I sent. If we
> consider that we don't have to support cascading replication for the
> legacy pg_standby, the v1 patch is definitely better, because the standby
> server would not keep restored archive WAL in pg_xlog when cascading
> replication is not used. Otherwise, we have to take the v2 patch.
Reading this thread: -1 for the addition of a new GUC parameter related to
cascading; it looks like overkill for the possible gain. And +1 for
removing a WAL file once it is replayed in archive recovery if cascading
replication is not allowed. However, what about wal_keep_segments? Doesn't
the server keep segments even if replication is not allowed? If yes, the
immediate WAL file drop after replay needs to take care of that...
--
Michael
Michael Paquier <michael.paquier@gmail.com> writes:
> Reading this thread: -1 for the addition of a new GUC parameter related to
> cascading; it looks like overkill for the possible gain. And +1 for
> removing a WAL file once it is replayed in archive recovery if cascading
> replication is not allowed. However, what about wal_keep_segments?
Well, we still don't register standby servers, so I vote against early
removal of files when you have no possible way of knowing whether they are
still useful or not. We need something smarter here. Please.
Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Fri, Aug 2, 2013 at 10:52 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> You are right about that; you should stick with the core features as much
> as you can and limit the use of wrapper utilities. Since 9.1 and the
> introduction of a built-in standby mode inside Postgres (with the
> introduction of the GUC parameter hot_standby), it seems better to use
> that instead of pg_standby, which would likely (personal opinion, feel
> free to scream at me) be removed in a future release.
I'm sure that there are some users who use pg_standby for warm standby for
various reasons; for example, the documentation (*1) explains such a
configuration, a user can specify the file-check interval (by using the -s
option), and so on. I agree that using pg_standby + cascading replication
would be very rare, but I'm not confident that *no one* uses pg_standby +
cascading replication.
(*1) http://www.postgresql.org/docs/devel/static/log-shipping-alternative.html
Regards,
--
Fujii Masao
On Fri, Aug 2, 2013 at 12:24 AM, MauMau <maumau307@gmail.com> wrote:
From: "Fujii Masao" <masao.fujii@gmail.com>
However, isn't StandbyRequested true (= standby_mode set to on) to enable
warm standby?We can set up warm-standby by using pg_standby even if standby_mode = off.
I see. However, I understand that pg_standby is a legacy feature, and the
current way to setup a warm standby is to set standby=on and restore_command
in recovery.conf. So I think it might not be necessary to support cascading
replication with pg_standby, if supporting it would prevent better
implementation of new features.I'm afraid people set max_wal_senders>0 and hot_standby=on
even on the primary server to make the contents of postgresql.conf
identical
on both the primary and the standby for easier configuration. If so,
normal
archive recovery (PITR, not the standby recovery) would face the original
problem -- unnecessary WAL accumulation in pg_xlog/. So I'm wonder if
AllowCascadeReplication() is enough.One idea is to add new GUC parameter like enable_cascade_replication
and then prevent WAL file from being kept in pg_xlog if that parameter is
off.
But we cannot apply such change into the already-released version 9.2.Yes, I thought the same, too. Adding a new GUC to enable cascading
replication now would be a usability degradation. But I have no idea of any
better way. It seems we need to take either v1 or v2 of the patch I sent.
If we consider that we don't have to support cascading replication for a
legacy pg_standby, v1 patch is definitely better because the standby server
would not keep restored archive WAL in pg_xlog when cascading replication is
not used. Otherwise, we have to take v2 patch.Could you choose either and commit it?
On second thought, we probably cannot remove the restored WAL files early,
because they might be required for fast promotion, which is a new feature
in 9.3. In fast promotion, an end-of-recovery checkpoint is not executed;
after the end of recovery, a normal online checkpoint starts. If the server
crashes before such an online checkpoint completes, the server needs to
replay again all the WAL files which it replayed before the promotion.
Since this is crash recovery, all those WAL files need to exist in the
pg_xlog directory. So if we remove the restored WAL files from pg_xlog
early, such a crash recovery might fail.
So even if cascading replication is disabled, if standby_mode = on, i.e.,
fast promotion can be performed, we cannot remove the restored WAL files
early.
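To spell out the failure sequence (a hypothetical timeline; I'm assuming
fast promotion is triggered with pg_ctl promote):

    $ pg_ctl promote -D $PGDATA   # 9.3 fast promotion: no end-of-recovery checkpoint
    # ... server accepts writes; the first online checkpoint has not
    # completed yet ...
    # <server crashes here>
    $ pg_ctl start -D $PGDATA     # crash recovery redoes from the last
                                  # checkpoint/restartpoint record, so it must
                                  # re-read the previously replayed WAL from
                                  # pg_xlog -- if the restored segments were
                                  # removed early, this recovery fails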
Regards,
--
Fujii Masao
On Wed, Aug 7, 2013 at 7:03 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On second thought, we probably cannot remove the restored WAL files early,
> because they might be required for fast promotion, which is a new feature
> in 9.3. In fast promotion, an end-of-recovery checkpoint is not executed;
> after the end of recovery, a normal online checkpoint starts. If the
> server crashes before such an online checkpoint completes, the server
> needs to replay again all the WAL files which it replayed before the
> promotion. Since this is crash recovery, all those WAL files need to exist
> in the pg_xlog directory. So if we remove the restored WAL files from
> pg_xlog early, such a crash recovery might fail.
>
> So even if cascading replication is disabled, if standby_mode = on, i.e.,
> fast promotion can be performed, we cannot remove the restored WAL files
> early.
But we should still be able to remove files older than the latest
restart point, right?
Cheers,
Jeff
Hi, Fujii-san,
On Wed, Aug 7, 2013 at 7:03 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On second thought, we probably cannot remove the restored WAL files early,
> because they might be required for fast promotion, which is a new feature
> in 9.3. In fast promotion, an end-of-recovery checkpoint is not executed;
> after the end of recovery, a normal online checkpoint starts. If the
> server crashes before such an online checkpoint completes, the server
> needs to replay again all the WAL files which it replayed before the
> promotion. Since this is crash recovery, all those WAL files need to exist
> in the pg_xlog directory. So if we remove the restored WAL files from
> pg_xlog early, such a crash recovery might fail.
>
> So even if cascading replication is disabled, if standby_mode = on, i.e.,
> fast promotion can be performed, we cannot remove the restored WAL files
> early.
Following Fujii-san's advice, I've made the attached patch. This prevents
unnecessary WAL accumulation in pg_xlog/ during archive recovery (not
standby recovery). To do this, I had to revive some code in
exitArchiveRecovery() from 9.1.
Could you commit this to 9.2 and later?
Regards
MauMau
Attachment: wal_increase_in_pitr.patch
diff -rpcd a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
*** a/src/backend/access/transam/xlog.c 2013-12-02 09:17:05.000000000 +0900
--- b/src/backend/access/transam/xlog.c 2013-12-16 11:25:29.000000000 +0900
*************** XLogFileRead(XLogSegNo segno, int emode,
*** 3763,3771 ****
/*
* If the segment was fetched from archival storage, replace the existing
! * xlog segment (if any) with the archival version.
*/
! if (source == XLOG_FROM_ARCHIVE)
{
KeepFileRestoredFromArchive(path, xlogfname);
--- 3763,3772 ----
/*
* If the segment was fetched from archival storage, replace the existing
! * xlog segment (if any) with the archival version. This is necessary for
! * cascading replication in 9.2 and fast promotion in 9.3.
*/
! if (source == XLOG_FROM_ARCHIVE && StandbyModeRequested)
{
KeepFileRestoredFromArchive(path, xlogfname);
*************** exitArchiveRecovery(TimeLineID endTLI, X
*** 5561,5582 ****
}
/*
! * If we are establishing a new timeline, we have to copy data from the
! * last WAL segment of the old timeline to create a starting WAL segment
! * for the new timeline.
*
! * Notify the archiver that the last WAL segment of the old timeline is
! * ready to copy to archival storage. Otherwise, it is not archived for a
! * while.
*/
! if (endTLI != ThisTimeLineID)
{
! XLogFileCopy(endLogSegNo, endTLI, endLogSegNo);
! if (XLogArchivingActive())
{
! XLogFileName(xlogpath, endTLI, endLogSegNo);
! XLogArchiveNotify(xlogpath);
}
}
--- 5562,5622 ----
}
/*
! * If the segment was fetched from archival storage, we want to replace
! * the existing xlog segment (if any) with the archival version. This is
! * because whatever is in XLOGDIR is very possibly older than what we have
! * from the archives, since it could have come from restoring a PGDATA
! * backup. In any case, the archival version certainly is more
! * descriptive of what our current database state is, because that is what
! * we replayed from.
*
! * Note that if we are establishing a new timeline, ThisTimeLineID is
! * already set to the new value, and so we will create a new file instead
! * of overwriting any existing file. (This is, in fact, always the case
! * at present.)
*/
! snprintf(recoveryPath, MAXPGPATH, XLOGDIR "/RECOVERYXLOG");
! XLogFilePath(xlogpath, ThisTimeLineID, endLogSegNo);
!
! if (restoredFromArchive)
{
! ereport(DEBUG3,
! (errmsg_internal("moving last restored xlog to \"%s\"",
! xlogpath)));
! unlink(xlogpath); /* might or might not exist */
! if (rename(recoveryPath, xlogpath) != 0)
! ereport(FATAL,
! (errcode_for_file_access(),
! errmsg("could not rename file \"%s\" to \"%s\": %m",
! recoveryPath, xlogpath)));
! /* XXX might we need to fix permissions on the file? */
! }
! else
! {
! /*
! * If the latest segment is not archival, but there's still a
! * RECOVERYXLOG laying about, get rid of it.
! */
! unlink(recoveryPath); /* ignore any error */
! /*
! * If we are establishing a new timeline, we have to copy data from
! * the last WAL segment of the old timeline to create a starting WAL
! * segment for the new timeline.
! *
! * Notify the archiver that the last WAL segment of the old timeline
! * is ready to copy to archival storage. Otherwise, it is not archived
! * for a while.
! */
! if (endTLI != ThisTimeLineID)
{
! XLogFileCopy(endLogSegNo, endTLI, endLogSegNo);
!
! if (XLogArchivingActive())
! {
! XLogFileName(xlogpath, endTLI, endLogSegNo);
! XLogArchiveNotify(xlogpath);
! }
}
}
*************** exitArchiveRecovery(TimeLineID endTLI, X
*** 5587,5599 ****
XLogFileName(xlogpath, ThisTimeLineID, endLogSegNo);
XLogArchiveCleanup(xlogpath);
- /*
- * Since there might be a partial WAL segment named RECOVERYXLOG, get rid
- * of it.
- */
- snprintf(recoveryPath, MAXPGPATH, XLOGDIR "/RECOVERYXLOG");
- unlink(recoveryPath); /* ignore any error */
-
/* Get rid of any remaining recovered timeline-history file, too */
snprintf(recoveryPath, MAXPGPATH, XLOGDIR "/RECOVERYHISTORY");
unlink(recoveryPath); /* ignore any error */
--- 5627,5632 ----
On Mon, Dec 16, 2013 at 9:22 PM, MauMau <maumau307@gmail.com> wrote:
> Following Fujii-san's advice, I've made the attached patch. This prevents
> unnecessary WAL accumulation in pg_xlog/ during archive recovery (not
> standby recovery). To do this, I had to revive some code in
> exitArchiveRecovery() from 9.1.
Thanks for the patch!
!        if (source == XLOG_FROM_ARCHIVE && StandbyModeRequested)
Even when standby_mode is not enabled, we can use cascading replication,
and it needs the accumulated WAL files. So I think that
AllowCascadeReplication() should be added to this condition.
!        snprintf(recoveryPath, MAXPGPATH, XLOGDIR "/RECOVERYXLOG");
!        XLogFilePath(xlogpath, ThisTimeLineID, endLogSegNo);
!
!        if (restoredFromArchive)
Don't we need to check !StandbyModeRequested and !AllowCascadeReplication()
here?
!        /*
!         * If the latest segment is not archival, but there's still a
!         * RECOVERYXLOG laying about, get rid of it.
!         */
!        unlink(recoveryPath);    /* ignore any error */
A similar line exists lower down in exitArchiveRecovery(), so ISTM that you
can refactor that.
Regards,
--
Fujii Masao
From: "Fujii Masao" <masao.fujii@gmail.com>
> !        if (source == XLOG_FROM_ARCHIVE && StandbyModeRequested)
>
> Even when standby_mode is not enabled, we can use cascading replication,
> and it needs the accumulated WAL files. So I think that
> AllowCascadeReplication() should be added to this condition.
>
> !        snprintf(recoveryPath, MAXPGPATH, XLOGDIR "/RECOVERYXLOG");
> !        XLogFilePath(xlogpath, ThisTimeLineID, endLogSegNo);
> !
> !        if (restoredFromArchive)
>
> Don't we need to check !StandbyModeRequested and
> !AllowCascadeReplication() here?
Oh, you are correct. Okay, done.
> !        /*
> !         * If the latest segment is not archival, but there's still a
> !         * RECOVERYXLOG laying about, get rid of it.
> !         */
> !        unlink(recoveryPath);    /* ignore any error */
>
> A similar line exists lower down in exitArchiveRecovery(), so ISTM that
> you can refactor that.
That's already done in the previous patch: the deletion of RECOVERYXLOG was
moved into the else clause, as in:
-    /*
-     * Since there might be a partial WAL segment named RECOVERYXLOG, get rid
-     * of it.
-     */
-    snprintf(recoveryPath, MAXPGPATH, XLOGDIR "/RECOVERYXLOG");
-    unlink(recoveryPath);    /* ignore any error */
-
Regards
MauMau
Attachment: wal_increase_in_pitr_v2.patch
diff -rpcd a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
*** a/src/backend/access/transam/xlog.c 2013-12-05 01:17:08.000000000 +0900
--- b/src/backend/access/transam/xlog.c 2013-12-20 21:15:32.736000000 +0900
*************** XLogFileRead(XLogSegNo segno, int emode,
*** 3763,3771 ****
/*
* If the segment was fetched from archival storage, replace the existing
! * xlog segment (if any) with the archival version.
*/
! if (source == XLOG_FROM_ARCHIVE)
{
KeepFileRestoredFromArchive(path, xlogfname);
--- 3763,3773 ----
/*
* If the segment was fetched from archival storage, replace the existing
! * xlog segment (if any) with the archival version. This is necessary for
! * cascading replication in 9.2 and fast promotion in 9.3.
*/
! if (source == XLOG_FROM_ARCHIVE &&
! (AllowCascadeReplication() || StandbyModeRequested))
{
KeepFileRestoredFromArchive(path, xlogfname);
*************** exitArchiveRecovery(TimeLineID endTLI, X
*** 5561,5582 ****
}
/*
! * If we are establishing a new timeline, we have to copy data from the
! * last WAL segment of the old timeline to create a starting WAL segment
! * for the new timeline.
*
! * Notify the archiver that the last WAL segment of the old timeline is
! * ready to copy to archival storage. Otherwise, it is not archived for a
! * while.
*/
! if (endTLI != ThisTimeLineID)
{
! XLogFileCopy(endLogSegNo, endTLI, endLogSegNo);
! if (XLogArchivingActive())
{
! XLogFileName(xlogpath, endTLI, endLogSegNo);
! XLogArchiveNotify(xlogpath);
}
}
--- 5563,5624 ----
}
/*
! * If the segment was fetched from archival storage, we want to replace
! * the existing xlog segment (if any) with the archival version. This is
! * because whatever is in XLOGDIR is very possibly older than what we have
! * from the archives, since it could have come from restoring a PGDATA
! * backup. In any case, the archival version certainly is more
! * descriptive of what our current database state is, because that is what
! * we replayed from.
*
! * Note that if we are establishing a new timeline, ThisTimeLineID is
! * already set to the new value, and so we will create a new file instead
! * of overwriting any existing file. (This is, in fact, always the case
! * at present.)
*/
! snprintf(recoveryPath, MAXPGPATH, XLOGDIR "/RECOVERYXLOG");
! XLogFilePath(xlogpath, ThisTimeLineID, endLogSegNo);
!
! if (restoredFromArchive &&
! !AllowCascadeReplication() && !StandbyModeRequested)
{
! ereport(DEBUG3,
! (errmsg_internal("moving last restored xlog to \"%s\"",
! xlogpath)));
! unlink(xlogpath); /* might or might not exist */
! if (rename(recoveryPath, xlogpath) != 0)
! ereport(FATAL,
! (errcode_for_file_access(),
! errmsg("could not rename file \"%s\" to \"%s\": %m",
! recoveryPath, xlogpath)));
! /* XXX might we need to fix permissions on the file? */
! }
! else
! {
! /*
! * If the latest segment is not archival, but there's still a
! * RECOVERYXLOG laying about, get rid of it.
! */
! unlink(recoveryPath); /* ignore any error */
! /*
! * If we are establishing a new timeline, we have to copy data from
! * the last WAL segment of the old timeline to create a starting WAL
! * segment for the new timeline.
! *
! * Notify the archiver that the last WAL segment of the old timeline
! * is ready to copy to archival storage. Otherwise, it is not archived
! * for a while.
! */
! if (endTLI != ThisTimeLineID)
{
! XLogFileCopy(endLogSegNo, endTLI, endLogSegNo);
!
! if (XLogArchivingActive())
! {
! XLogFileName(xlogpath, endTLI, endLogSegNo);
! XLogArchiveNotify(xlogpath);
! }
}
}
*************** exitArchiveRecovery(TimeLineID endTLI, X
*** 5587,5599 ****
XLogFileName(xlogpath, ThisTimeLineID, endLogSegNo);
XLogArchiveCleanup(xlogpath);
- /*
- * Since there might be a partial WAL segment named RECOVERYXLOG, get rid
- * of it.
- */
- snprintf(recoveryPath, MAXPGPATH, XLOGDIR "/RECOVERYXLOG");
- unlink(recoveryPath); /* ignore any error */
-
/* Get rid of any remaining recovered timeline-history file, too */
snprintf(recoveryPath, MAXPGPATH, XLOGDIR "/RECOVERYHISTORY");
unlink(recoveryPath); /* ignore any error */
--- 5629,5634 ----
On Fri, Dec 20, 2013 at 9:21 PM, MauMau <maumau307@gmail.com> wrote:
From: "Fujii Masao" <masao.fujii@gmail.com>
! if (source == XLOG_FROM_ARCHIVE && StandbyModeRequested)
Even when standby_mode is not enabled, we can use cascade replication and
it needs the accumulated WAL files. So I think that
AllowCascadeReplication()
should be added into this condition.! snprintf(recoveryPath, MAXPGPATH, XLOGDIR "/RECOVERYXLOG");
! XLogFilePath(xlogpath, ThisTimeLineID, endLogSegNo);
!
! if (restoredFromArchive)Don't we need to check !StandbyModeRequested and
!AllowCascadeReplication()
here?Oh, you are correct. Okay, done.
Thanks! The patch looks good to me. Attached is the updated version of
the patch. I added the comments.
Did you test whether this patch works properly in several recovery cases?
Regards,
--
Fujii Masao
Attachment: wal_increase_in_pitr_v3.patch
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***************
*** 3772,3781 **** XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
}
/*
! * If the segment was fetched from archival storage, replace the existing
! * xlog segment (if any) with the archival version.
*/
! if (source == XLOG_FROM_ARCHIVE)
{
KeepFileRestoredFromArchive(path, xlogfname);
--- 3772,3792 ----
}
/*
! * If the segment was fetched from archival storage and either cascading
! * replication or standby_mode is enabled, replace the existing xlog
! * segment (if any) with the archival version. Cascading replication needs
! * this replacement so that cascading walsender can send the xlog segment
! * which was restored from the archive. When standby_mode is enabled,
! * fast promotion is performed at the end of recovery, and also needs this
! * replacement so that the crash recovery just after fast promotion can
! * replay all the required segments from pg_xlog.
! *
! * If the replacement is not required, the segments are always restored onto
! * the same file named RECOVERYXLOG from the archive. This prevents the
! * large increase of segments in pg_xlog.
*/
! if (source == XLOG_FROM_ARCHIVE &&
! (AllowCascadeReplication() || StandbyModeRequested))
{
KeepFileRestoredFromArchive(path, xlogfname);
***************
*** 5572,5593 **** exitArchiveRecovery(TimeLineID endTLI, XLogSegNo endLogSegNo)
}
/*
! * If we are establishing a new timeline, we have to copy data from the
! * last WAL segment of the old timeline to create a starting WAL segment
! * for the new timeline.
*
! * Notify the archiver that the last WAL segment of the old timeline is
! * ready to copy to archival storage. Otherwise, it is not archived for a
! * while.
*/
! if (endTLI != ThisTimeLineID)
{
! XLogFileCopy(endLogSegNo, endTLI, endLogSegNo);
! if (XLogArchivingActive())
{
! XLogFileName(xlogpath, endTLI, endLogSegNo);
! XLogArchiveNotify(xlogpath);
}
}
--- 5583,5646 ----
}
/*
! * If the segment was fetched from archival storage and neither cascading
! * replication nor fast promotion is enabled, we replace the existing xlog
! * segment (if any) with the archival version named RECOVERYXLOG. This is
! * because whatever is in XLOGDIR is very possibly older than what we have
! * from the archives, since it could have come from restoring a PGDATA
! * backup. In any case, the archival version certainly is more
! * descriptive of what our current database state is, because that is what
! * we replayed from.
! *
! * Note that if we are establishing a new timeline, ThisTimeLineID is
! * already set to the new value, and so we will create a new file instead
! * of overwriting any existing file. (This is, in fact, always the case
! * at present.)
*
! * If either cascading replication or fast promotion is enabled, we don't
! * need the replacement here because it has already done just after the
! * segment was restored from the archive.
*/
! snprintf(recoveryPath, MAXPGPATH, XLOGDIR "/RECOVERYXLOG");
! XLogFilePath(xlogpath, ThisTimeLineID, endLogSegNo);
!
! if (restoredFromArchive &&
! !AllowCascadeReplication() && !StandbyModeRequested)
{
! unlink(xlogpath); /* might or might not exist */
! if (rename(recoveryPath, xlogpath) != 0)
! ereport(FATAL,
! (errcode_for_file_access(),
! errmsg("could not rename file \"%s\" to \"%s\": %m",
! recoveryPath, xlogpath)));
! /* XXX might we need to fix permissions on the file? */
! }
! else
! {
! /*
! * Since there might be a partial WAL segment named RECOVERYXLOG,
! * get rid of it.
! */
! unlink(recoveryPath); /* ignore any error */
! /*
! * If we are establishing a new timeline, we have to copy data from
! * the last WAL segment of the old timeline to create a starting WAL
! * segment for the new timeline.
! *
! * Notify the archiver that the last WAL segment of the old timeline
! * is ready to copy to archival storage. Otherwise, it is not archived
! * for a while.
! */
! if (endTLI != ThisTimeLineID)
{
! XLogFileCopy(endLogSegNo, endTLI, endLogSegNo);
!
! if (XLogArchivingActive())
! {
! XLogFileName(xlogpath, endTLI, endLogSegNo);
! XLogArchiveNotify(xlogpath);
! }
}
}
***************
*** 5598,5610 **** exitArchiveRecovery(TimeLineID endTLI, XLogSegNo endLogSegNo)
XLogFileName(xlogpath, ThisTimeLineID, endLogSegNo);
XLogArchiveCleanup(xlogpath);
- /*
- * Since there might be a partial WAL segment named RECOVERYXLOG, get rid
- * of it.
- */
- snprintf(recoveryPath, MAXPGPATH, XLOGDIR "/RECOVERYXLOG");
- unlink(recoveryPath); /* ignore any error */
-
/* Get rid of any remaining recovered timeline-history file, too */
snprintf(recoveryPath, MAXPGPATH, XLOGDIR "/RECOVERYHISTORY");
unlink(recoveryPath); /* ignore any error */
--- 5651,5656 ----
On 01/21/2014 07:31 PM, Fujii Masao wrote:
> Thanks! The patch looks good to me. Attached is the updated version of
> the patch. I added the comments.
Sorry for reacting so slowly, but I'm not sure I like this patch. It's
quite a useful property that all the WAL files needed for recovery are
copied into pg_xlog, even when restoring from archive, even when not doing
cascading replication. It guarantees that you can restart the standby even
if the connection to the archive is lost for some reason. I intentionally
changed the behavior for archive recovery too, when this was introduced for
cascading replication. Also, I think it's good that the behavior does not
depend on whether cascading replication is enabled - it's quite a subtle
difference.
So, IMHO this is not a bug, it's a feature.
To solve the original problem of running out of disk space in archive
recovery, I wonder if we should perform restartpoints more aggressively. We
intentionally don't trigger restartpoints by checkpoint_segments, only by
checkpoint_timeout, but I wonder if there should be an option for that.
MauMau, did you try simply reducing checkpoint_timeout while doing
recovery?
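For concreteness, these are the settings in play (values illustrative only,
not a recommendation):

    # postgresql.conf -- hypothetical values for more aggressive restartpoints
    checkpoint_timeout = 30s     # default is 5min; 30s is the allowed minimum
    checkpoint_segments = 3      # the default; per the above, not consulted
                                 # for restartpoint timing here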
- Heikki
On Wed, Jan 22, 2014 at 6:37 AM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
> On 01/21/2014 07:31 PM, Fujii Masao wrote:
>> Thanks! The patch looks good to me. Attached is the updated version of
>> the patch. I added the comments.
>
> Sorry for reacting so slowly, but I'm not sure I like this patch. It's
> quite a useful property that all the WAL files needed for recovery are
> copied into pg_xlog, even when restoring from archive, even when not doing
> cascading replication. It guarantees that you can restart the standby even
> if the connection to the archive is lost for some reason. I intentionally
> changed the behavior for archive recovery too, when this was introduced
> for cascading replication. Also, I think it's good that the behavior does
> not depend on whether cascading replication is enabled - it's quite a
> subtle difference.
>
> So, IMHO this is not a bug, it's a feature.
Yep.
> To solve the original problem of running out of disk space in archive
> recovery, I wonder if we should perform restartpoints more aggressively.
> We intentionally don't trigger restartpoints by checkpoint_segments, only
> by checkpoint_timeout, but I wonder if there should be an option for that.
That's an option.
> MauMau, did you try simply reducing checkpoint_timeout while doing
> recovery?
The problem is, we might not be able to perform restartpoints more
aggressively even if we reduce checkpoint_timeout on the server performing
archive recovery, because the frequency of restartpoints depends not only
on checkpoint_timeout but also on the checkpoints which happened while the
server was running: a restartpoint can only be established at a checkpoint
record that already exists in the WAL being replayed.
Regards,
--
Fujii Masao
From: "Fujii Masao" <masao.fujii@gmail.com>
> Thanks! The patch looks good to me. Attached is the updated version of
> the patch. I added the comments.
Thank you very much. Your comment looks great. I tested some recovery
situations, and confirmed that WAL segments were kept/removed as intended.
I'll update the CommitFest entry with this patch.
>> Sorry for reacting so slowly, but I'm not sure I like this patch. It's
>> quite a useful property that all the WAL files needed for recovery are
>> copied into pg_xlog, even when restoring from archive, even when not
>> doing cascading replication. It guarantees that you can restart the
>> standby even if the connection to the archive is lost for some reason. I
>> intentionally changed the behavior for archive recovery too, when this
>> was introduced for cascading replication. Also, I think it's good that
>> the behavior does not depend on whether cascading replication is enabled
>> - it's quite a subtle difference.
>>
>> So, IMHO this is not a bug, it's a feature.
>
> Yep.
I understand the benefit for standby recovery.
>> To solve the original problem of running out of disk space in archive
>> recovery, I wonder if we should perform restartpoints more aggressively.
>> We intentionally don't trigger restartpoints by checkpoint_segments, only
>> by checkpoint_timeout, but I wonder if there should be an option for
>> that.
>
> That's an option.
>
>> MauMau, did you try simply reducing checkpoint_timeout while doing
>> recovery?
>
> The problem is, we might not be able to perform restartpoints more
> aggressively even if we reduce checkpoint_timeout on the server performing
> archive recovery, because the frequency of restartpoints depends not only
> on checkpoint_timeout but also on the checkpoints which happened while the
> server was running.
I haven't tried reducing checkpoint_timeout. I think we cannot take that
approach, because we cannot suggest an appropriate checkpoint_timeout to
users. That is, what checkpoint_timeout setting can we suggest so that WAL
doesn't accumulate in pg_xlog/ any more than in 9.1? In addition, as
Fujii-san said, it doesn't seem we can control restartpoints completely.
Plus, if we cause restartpoints frequently, the recovery would take (much?)
longer, because shared buffers are flushed more frequently.
So, how about just removing the AllowCascadeReplication() condition from
the patch? That allows WAL to accumulate in pg_xlog/ during standby
recovery but not during archive recovery.
Regards
MauMau
On 2014-01-21 23:37:43 +0200, Heikki Linnakangas wrote:
> Sorry for reacting so slowly, but I'm not sure I like this patch. It's
> quite a useful property that all the WAL files needed for recovery are
> copied into pg_xlog, even when restoring from archive, even when not doing
> cascading replication. It guarantees that you can restart the standby even
> if the connection to the archive is lost for some reason. I intentionally
> changed the behavior for archive recovery too, when this was introduced
> for cascading replication. Also, I think it's good that the behavior does
> not depend on whether cascading replication is enabled - it's quite a
> subtle difference.
>
> So, IMHO this is not a bug, it's a feature.
Very much seconded. With the old behaviour it's possible to get into a
situation where you cannot make your standby productive anymore if the
archive server died. Since the archive server is often the primary, that's
imo unacceptable.
I'd even go as far as saying the previous behaviour is a bug.
> To solve the original problem of running out of disk space in archive
> recovery, I wonder if we should perform restartpoints more aggressively.
> We intentionally don't trigger restartpoints by checkpoint_segments, only
> by checkpoint_timeout, but I wonder if there should be an option for that.
>
> MauMau, did you try simply reducing checkpoint_timeout while doing
> recovery?
Hm, don't we actually trigger restartpoints based on checkpoint_segments?
static int
XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr, int reqLen,
             XLogRecPtr targetRecPtr, char *readBuf, TimeLineID *readTLI)
{
...
    if (readFile >= 0 && !XLByteInSeg(targetPagePtr, readSegNo))
    {
        /*
         * Request a restartpoint if we've replayed too much xlog since the
         * last one.
         */
        if (StandbyModeRequested && bgwriterLaunched)
        {
            if (XLogCheckpointNeeded(readSegNo))
            {
                (void) GetRedoRecPtr();
                if (XLogCheckpointNeeded(readSegNo))
                    RequestCheckpoint(CHECKPOINT_CAUSE_XLOG);
            }
        }
...
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2014-01-24 22:31:17 +0900, MauMau wrote:
From: "Fujii Masao" <masao.fujii@gmail.com>
On Wed, Jan 22, 2014 at 6:37 AM, Heikki Linnakangas
Thanks! The patch looks good to me. Attached is the updated version of
the patch. I added the comments.Thank you very much. Your comment looks great. I tested some recovery
situations, and confirmed that WAL segments were kept/removed as intended.
I'll update the CommitFest entry with this patch.
You haven't updated the patch's status so far, so I've now marked it as
returned with feedback, as several people voiced serious doubts about the
approach. If that's not accurate, please speak up.
>> The problem is, we might not be able to perform restartpoints more
>> aggressively even if we reduce checkpoint_timeout on the server
>> performing archive recovery, because the frequency of restartpoints
>> depends not only on checkpoint_timeout but also on the checkpoints which
>> happened while the server was running.
>
> I haven't tried reducing checkpoint_timeout.
Did you try reducing checkpoint_segments? As I pointed out, at least if
standby_mode is enabled, it will also trigger checkpoints, independently of
checkpoint_timeout.
If the issue is that you're not using standby_mode (if so, why?), then
maybe the fix is to make that apply to a wider range of situations.
> I think we cannot take that approach, because we cannot suggest an
> appropriate checkpoint_timeout to users. That is, what checkpoint_timeout
> setting can we suggest so that WAL doesn't accumulate in pg_xlog/ any more
> than in 9.1?
Well, <= 9.1's behaviour can lead to loss of a consistent database, so I
don't think it's what we need to compare the current state to.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Sun, Feb 2, 2014 at 5:49 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>> I haven't tried reducing checkpoint_timeout.
>
> Did you try reducing checkpoint_segments? As I pointed out, at least if
> standby_mode is enabled, it will also trigger checkpoints, independently
> of checkpoint_timeout.
Right. If standby_mode is enabled, checkpoint_segments can trigger
restartpoints. But the problem is that the timing of restartpoints depends
not only on the checkpoint parameters (i.e., checkpoint_timeout and
checkpoint_segments) that are used during archive recovery but also on the
checkpoint WAL that was generated by the master.
For example, imagine the case where the master generated only one
checkpoint WAL record since the last backup and then crashed with database
corruption, and the DBA decided to perform normal archive recovery using
the last backup. In this case, even if the DBA reduces both
checkpoint_timeout and checkpoint_segments, only one restartpoint can occur
during recovery. This low restartpoint frequency might fill up the disk
space with lots of WAL files.
This would be harmless if the server we are performing recovery on has
enough disk space. But I can imagine that some users want to recover the
database and restart the service temporarily on a modest server with barely
enough disk space until they can procure a sufficient one. In this case,
accumulating lots of WAL files in pg_xlog might be harmful.
> If the issue is that you're not using standby_mode (if so, why?), then
> maybe the fix is to make that apply to a wider range of situations.
I guess that he is not using standby_mode because, according to
his first email in this thread, he said he would like to prevent WAL
from accumulating in pg_xlog during normal archive recovery (i.e., PITR).
Regards,
--
Fujii Masao
On 2014-02-02 23:50:40 +0900, Fujii Masao wrote:
> Right. If standby_mode is enabled, checkpoint_segments can trigger
> restartpoints. But the problem is that the timing of restartpoints depends
> not only on the checkpoint parameters (i.e., checkpoint_timeout and
> checkpoint_segments) that are used during archive recovery but also on the
> checkpoint WAL that was generated by the master.
Sure. But we really *need* all the WAL since the last checkpoint's redo
location locally to be safe.
For example, imagine the case where the master generated only one
checkpoint WAL record since the last backup and then crashed with
database corruption. Then the DBA decided to perform normal archive
recovery using the last backup. In this case, even if the DBA reduces
both checkpoint_timeout and checkpoint_segments, only one restartpoint
can occur during recovery. This low frequency of restartpoints might
fill up the disk with lots of WAL files.

I am not sure I understand the point of this scenario. If the primary
crashed after a checkpoint, there won't be that much WAL since it
happened...
If the issue is that you're not using standby_mode (if so, why?), then
maybe the fix is to make that apply to a wider range of situations.

I guess that he is not using standby_mode because, according to his
first email in this thread, he said he would like to prevent WAL from
accumulating in pg_xlog during normal archive recovery (i.e., PITR).

Well, that doesn't necessarily prevent you from using
standby_mode... But yes, that might be the case.
I wonder if we shouldn't just always look at checkpoint segments during
!crash recovery.
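To illustrate the direction, here is a rough sketch against the 9.3-era
xlog.c recovery loop (not committed code; gating on
ArchiveRecoveryRequested instead of StandbyModeRequested is the
hypothetical change):

/*
 * Sketch only: request a restartpoint by segment count during any
 * archive recovery, not just under standby_mode.
 * XLogCheckpointNeeded() and RequestCheckpoint() are existing backend
 * routines.
 */
if (ArchiveRecoveryRequested && bgwriterLaunched &&
    XLogCheckpointNeeded(readSegNo))
    RequestCheckpoint(CHECKPOINT_CAUSE_XLOG);

Note that the checkpointer can still only establish a restartpoint at a
checkpoint record it has replayed, so this alone would not help the
one-checkpoint scenario above.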
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
From: "Andres Freund" <andres@2ndquadrant.com>
On 2014-02-02 23:50:40 +0900, Fujii Masao wrote:
Right. If standby_mode is enabled, checkpoint_segments can trigger
the restartpoint. But the problem is that the timing of restartpoints
depends not only on the checkpoint parameters (i.e., checkpoint_timeout
and checkpoint_segments) that are used during archive recovery but also
on the checkpoint WAL that was generated by the master.

Sure. But we really *need* all the WAL since the last checkpoint's redo
location locally to be safe.

For example, imagine the case where the master generated only one
checkpoint WAL record since the last backup and then crashed with
database corruption. Then the DBA decided to perform normal archive
recovery using the last backup. In this case, even if the DBA reduces
both checkpoint_timeout and checkpoint_segments, only one restartpoint
can occur during recovery. This low frequency of restartpoints might
fill up the disk with lots of WAL files.

I am not sure I understand the point of this scenario. If the primary
crashed after a checkpoint, there won't be that much WAL since it
happened...

If the issue is that you're not using standby_mode (if so, why?), then
maybe the fix is to make that apply to a wider range of situations.

I guess that he is not using standby_mode because, according to his
first email in this thread, he said he would like to prevent WAL from
accumulating in pg_xlog during normal archive recovery (i.e., PITR).

Well, that doesn't necessarily prevent you from using
standby_mode... But yes, that might be the case.

I wonder if we shouldn't just always look at checkpoint segments during
!crash recovery.
Maybe we could consider moving in that direction, but there is a problem:
archive recovery slows down compared to 9.1 because of repeated
restartpoints. Archive recovery should be as fast as possible, because it
typically applies dozens or hundreds of WAL files, and the DBA desires
immediate resumption of operation.

So, I think we should restore the 9.1 behavior for archive recovery. The
attached patch keeps restored archived WAL in pg_xlog/ only during standby
recovery. It is based on Fujii-san's revision of the patch, with the
AllowCascadeReplication() condition removed from two if statements.
Regards
MauMau
Attachments:
wal_increase_in_pitr_v4.patch
diff -rpcd a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
*** a/src/backend/access/transam/xlog.c 2014-02-03 09:17:02.000000000 +0900
--- b/src/backend/access/transam/xlog.c 2014-02-12 10:54:43.000000000 +0900
*************** XLogFileRead(XLogSegNo segno, int emode,
*** 3810,3819 ****
}
/*
! * If the segment was fetched from archival storage, replace the existing
! * xlog segment (if any) with the archival version.
*/
! if (source == XLOG_FROM_ARCHIVE)
{
KeepFileRestoredFromArchive(path, xlogfname);
--- 3810,3829 ----
}
/*
! * If the segment was fetched from archival storage and either cascading
! * replication or standby_mode is enabled, replace the existing xlog
! * segment (if any) with the archival version. Cascading replication needs
! * this replacement so that cascading walsender can send the xlog segment
! * which was restored from the archive. When standby_mode is enabled,
! * fast promotion is performed at the end of recovery, and also needs this
! * replacement so that the crash recovery just after fast promotion can
! * replay all the required segments from pg_xlog.
! *
! * If the replacement is not required, the segments are always restored onto
! * the same file named RECOVERYXLOG from the archive. This prevents a
! * large accumulation of segments in pg_xlog.
*/
! if (source == XLOG_FROM_ARCHIVE && StandbyModeRequested)
{
KeepFileRestoredFromArchive(path, xlogfname);
*************** exitArchiveRecovery(TimeLineID endTLI, X
*** 5631,5652 ****
}
/*
! * If we are establishing a new timeline, we have to copy data from the
! * last WAL segment of the old timeline to create a starting WAL segment
! * for the new timeline.
*
! * Notify the archiver that the last WAL segment of the old timeline is
! * ready to copy to archival storage. Otherwise, it is not archived for a
! * while.
*/
! if (endTLI != ThisTimeLineID)
{
! XLogFileCopy(endLogSegNo, endTLI, endLogSegNo);
! if (XLogArchivingActive())
{
! XLogFileName(xlogpath, endTLI, endLogSegNo);
! XLogArchiveNotify(xlogpath);
}
}
--- 5641,5703 ----
}
/*
! * If the segment was fetched from archival storage and neither cascading
! * replication nor fast promotion is enabled, we replace the existing xlog
! * segment (if any) with the archival version named RECOVERYXLOG. This is
! * because whatever is in XLOGDIR is very possibly older than what we have
! * from the archives, since it could have come from restoring a PGDATA
! * backup. In any case, the archival version certainly is more
! * descriptive of what our current database state is, because that is what
! * we replayed from.
*
! * Note that if we are establishing a new timeline, ThisTimeLineID is
! * already set to the new value, and so we will create a new file instead
! * of overwriting any existing file. (This is, in fact, always the case
! * at present.)
! *
! * If either cascading replication or fast promotion is enabled, we don't
! * need the replacement here because it has already been done just after the
! * segment was restored from the archive.
*/
! snprintf(recoveryPath, MAXPGPATH, XLOGDIR "/RECOVERYXLOG");
! XLogFilePath(xlogpath, ThisTimeLineID, endLogSegNo);
!
! if (restoredFromArchive && !StandbyModeRequested)
{
! unlink(xlogpath); /* might or might not exist */
! if (rename(recoveryPath, xlogpath) != 0)
! ereport(FATAL,
! (errcode_for_file_access(),
! errmsg("could not rename file \"%s\" to \"%s\": %m",
! recoveryPath, xlogpath)));
! /* XXX might we need to fix permissions on the file? */
! }
! else
! {
! /*
! * Since there might be a partial WAL segment named RECOVERYXLOG,
! * get rid of it.
! */
! unlink(recoveryPath); /* ignore any error */
! /*
! * If we are establishing a new timeline, we have to copy data from
! * the last WAL segment of the old timeline to create a starting WAL
! * segment for the new timeline.
! *
! * Notify the archiver that the last WAL segment of the old timeline
! * is ready to copy to archival storage. Otherwise, it is not archived
! * for a while.
! */
! if (endTLI != ThisTimeLineID)
{
! XLogFileCopy(endLogSegNo, endTLI, endLogSegNo);
!
! if (XLogArchivingActive())
! {
! XLogFileName(xlogpath, endTLI, endLogSegNo);
! XLogArchiveNotify(xlogpath);
! }
}
}
*************** exitArchiveRecovery(TimeLineID endTLI, X
*** 5657,5669 ****
XLogFileName(xlogpath, ThisTimeLineID, endLogSegNo);
XLogArchiveCleanup(xlogpath);
- /*
- * Since there might be a partial WAL segment named RECOVERYXLOG, get rid
- * of it.
- */
- snprintf(recoveryPath, MAXPGPATH, XLOGDIR "/RECOVERYXLOG");
- unlink(recoveryPath); /* ignore any error */
-
/* Get rid of any remaining recovered timeline-history file, too */
snprintf(recoveryPath, MAXPGPATH, XLOGDIR "/RECOVERYHISTORY");
unlink(recoveryPath); /* ignore any error */
--- 5708,5713 ----
On 2014-02-12 21:23:54 +0900, MauMau wrote:
Maybe we could consider moving in that direction, but there is a problem:
archive recovery slows down compared to 9.1 because of repeated
restartpoints. Archive recovery should be as fast as possible, because it
typically applies dozens or hundreds of WAL files, and the DBA desires
immediate resumption of operation.

So, I think we should restore the 9.1 behavior for archive recovery.

It's easy to be fast if it's not correct. I don't see how that behaviour
is acceptable; IMO the previous behaviour was simply a bug whose fix was
too invasive to backpatch.

I don't think you can convince me (but maybe others) that the old
behaviour is acceptable.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2014-02-12 13:33:31 +0100, Andres Freund wrote:
On 2014-02-12 21:23:54 +0900, MauMau wrote:
Maybe we could consider in that direction, but there is a problem. Archive
recovery slows down compared to 9.1, because of repeated restartpoints.
Archive recovery should be as fast as possible, because it typically applies
dozens or hundreds of WAL files, and the DBA desires immediate resumption of
operation.So, I think we should restore 9.1 behavior for archive recovery.
It's easy to be fast, if it's not correct. I don't see how that
behaviour is acceptable, imo the previous behaviour simply was a bug
whose fix was too invasive to backpatch.I don't think you can convince me (but maybe others) that the old
behaviour is acceptable.
I'm going to mark this patch as "Rejected". Please speak up if that's
not accurate.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services