Suppress generating WAL records during the upgrade
Dear hackers,
(CC: Julien, Sawada-san, Amit)
This is a fork thread from "[PoC] pg_upgrade: allow to upgrade publisher node" [1]https://commitfest.postgresql.org/44/4273/.
# Background
[1]: https://commitfest.postgresql.org/44/4273/
Followings describe the rough steps:
1. Boot old node as binary-upgrade mode
2. Check confirmed_lsn of all the slots, and confirm all WALs are replicated to downstream
3. Dump slot info to sql file
4. Stop old node
5. Boot new node as binary-upgrade mode
...
Here, step 2 was introduced for avoiding data loss. If there are some WAL records
ahead confirmed_lsn, such records would not be replicated anymore - it may be dangerous.
So in the current patch, pg_upgrade fails if other records than SHUTDOWN_CHECKPOINT
exits after any confirmed_flush_lsn.
# Problem
We found that following three records might be generated during the upgrade.
* RUNNING_XACT
* CHECKPOINT_ONLINE
* XLOG_FPI_FOR_HINT
RUNNING_XACT might be written by the background writer. Conditions for the generation are:
a. Elapsed 15 seconds since the last WAL creation or bootstraping of the process, and either of them:
b-1. The process had never create the RUNNING_XACT record, or
b-2. Some "important WALs" were created after the last RUNNING_XACT record
CHECKPOINT_ONLINE might be written by the checkpointer. Conditions for the generation are:
a. Elapsed checkpoint_timeout seconds since the last creation or bootstraping, and either of them:
b-1. The process had never create the CHECKPOINT_ONLINE record, or
b-2. Some "important WALs" were created after the last CHECKPOINT record
XLOG_FPI_FOR_HINT, which is raised by Sawada-san, might be generated by backend processes.
Conditions for the generation are:
a. Backend processes scanned any tuples (even if it was the system catalog), or either of them:
b-1. Data checksum was enabled, or
b-2. wal_log_hints was set to on
# Solution
I wanted to suppress generations of WALs during the upgrade, because of the "# Background".
Regarding the RUNNING_XACT and CHECKPOINT_ONLINE, it might be OK by removing the
condition b-1. The duration between bootstrap and initial {RUNNING_XACT|CHECKPOINT_ONLINE}
becomes longer, but I could not find impacts by it.
As for the XLOG_FPI_FOR_HINT, the simplest way I came up with is not to call
XLogSaveBufferForHint() during binary upgrade. Considerations may be not enough,
but I attached the patch for the fix. It passed CI on my repository.
Do you have any other considerations about it?
An approach, which adds "if (IsBinaryUpgare)" in XLogInsertAllowed(), was proposed in [2]/messages/by-id/20210121152357.s6eflhqyh4g5e6dv@dalibo.com.
But I'm not sure it could really solve the issue - e.g., XLogInsertRecord() just
raised an ERROR if !XLogInsertAllowed().
[1]: https://commitfest.postgresql.org/44/4273/
[2]: /messages/by-id/20210121152357.s6eflhqyh4g5e6dv@dalibo.com
Best Regards,
Hayato Kuroda
FUJITSU LIMITED
Attachments:
0001-Suppress-generating-WAL-records-during-the-upgrade.patchapplication/octet-stream; name=0001-Suppress-generating-WAL-records-during-the-upgrade.patchDownload
From ffa72cd4067b5d31ce6dc6bc2818607425181913 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Tue, 8 Aug 2023 08:08:31 +0000
Subject: [PATCH] Suppress generating WAL records during the upgrade
---
src/backend/access/transam/xlog.c | 3 ++-
src/backend/postmaster/bgwriter.c | 5 +++--
src/backend/storage/buffer/bufmgr.c | 5 +++++
3 files changed, 10 insertions(+), 3 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 60c0b7ec3a..2f1cbc6898 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6555,7 +6555,8 @@ CreateCheckPoint(int flags)
if ((flags & (CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_END_OF_RECOVERY |
CHECKPOINT_FORCE)) == 0)
{
- if (last_important_lsn == ControlFile->checkPoint)
+ if (XLogRecPtrIsInvalid(last_important_lsn) ||
+ last_important_lsn == ControlFile->checkPoint)
{
WALInsertLockRelease();
END_CRIT_SECTION();
diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index caad642ec9..95d88ac8bc 100644
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -276,6 +276,7 @@ BackgroundWriterMain(void)
{
TimestampTz timeout = 0;
TimestampTz now = GetCurrentTimestamp();
+ XLogRecPtr last_important = GetLastImportantRecPtr();
timeout = TimestampTzPlusMilliseconds(last_snapshot_ts,
LOG_SNAPSHOT_INTERVAL_MS);
@@ -287,8 +288,8 @@ BackgroundWriterMain(void)
* start of a record, whereas last_snapshot_lsn points just past
* the end of the record.
*/
- if (now >= timeout &&
- last_snapshot_lsn <= GetLastImportantRecPtr())
+ if (now >= timeout && last_snapshot_lsn <= last_important &&
+ !XLogRecPtrIsInvalid(last_important))
{
last_snapshot_lsn = LogStandbySnapshot();
last_snapshot_ts = now;
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index df22aaa1c5..db0fc4ef96 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -4588,8 +4588,13 @@ MarkBufferDirtyHint(Buffer buffer, bool buffer_std)
*
* We don't check full_page_writes here because that logic is included
* when we call XLogInsert() since the value changes dynamically.
+ *
+ * XXX: avoid to write WAL if we are in the binary-upgrade mode. Some
+ * catalogs are read during the upgrade, but it may trigger to generate
+ * XLOG_FPI_FOR_HINT. It may become the "WAL hole".
*/
if (XLogHintBitIsNeeded() &&
+ !IsBinaryUpgrade &&
(pg_atomic_read_u32(&bufHdr->state) & BM_PERMANENT))
{
/*
--
2.27.0