Assertion failure in WaitForWALToBecomeAvailable state machine
Hi,
The problem is that whenever we go for streaming we always set
XLogCtl->InstallXLogFileSegmentActive to true, but while switching
from streaming to archive we do not always reset it, so it hits an
assertion in some cases. Basically, we reset it inside
XLogShutdownWalRcv(), but while switching from streaming mode we
only call that conditionally, if WalRcvStreaming(). But it is very much
possible that even before we call WalRcvStreaming() the walreceiver
might have set walrcv->walRcvState to WALRCV_STOPPED. So now
WalRcvStreaming() will return false. So I agree that we do not want to
really shut down the walreceiver, but then who will reset the flag?
I just ran some tests on the primary, attached the walreceiver to gdb,
and waited for it to exit on timeout; the recovery process then
hit the assertion.
2022-02-11 14:33:56.976 IST [60978] FATAL: terminating walreceiver
due to timeout
cp: cannot stat
‘/home/dilipkumar/work/PG/install/bin/wal_archive/00000002.history’:
No such file or directory
2022-02-11 14:33:57.002 IST [60973] LOG: restored log file
"000000010000000000000003" from archive
TRAP: FailedAssertion("!XLogCtl->InstallXLogFileSegmentActive", File:
"xlog.c", Line: 3823, PID: 60973)
I applied a quick fix and that solved the issue: basically, if the
last failed source was streaming and WalRcvStreaming() is false, then
just reset this flag.
@@ -12717,6 +12717,12 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
                  */
                 if (WalRcvStreaming())
                     XLogShutdownWalRcv();
+                else
+                {
+                    LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+                    XLogCtl->InstallXLogFileSegmentActive = false;
+                    LWLockRelease(ControlFileLock);
+                }
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Fri, Feb 11, 2022 at 3:33 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
Hi,
The problem is that whenever we go for streaming we always set
XLogCtl->InstallXLogFileSegmentActive to true, but while switching
from streaming to archive we do not always reset it, so it hits an
assertion in some cases. Basically, we reset it inside
XLogShutdownWalRcv(), but while switching from streaming mode we
only call that conditionally, if WalRcvStreaming(). But it is very much
possible that even before we call WalRcvStreaming() the walreceiver
might have set walrcv->walRcvState to WALRCV_STOPPED. So now
WalRcvStreaming() will return false. So I agree that we do not want to
really shut down the walreceiver, but then who will reset the flag?

I just ran some tests on the primary, attached the walreceiver to gdb,
and waited for it to exit on timeout; the recovery process then
hit the assertion.

2022-02-11 14:33:56.976 IST [60978] FATAL: terminating walreceiver
due to timeout
cp: cannot stat
‘/home/dilipkumar/work/PG/install/bin/wal_archive/00000002.history’:
No such file or directory
2022-02-11 14:33:57.002 IST [60973] LOG: restored log file
"000000010000000000000003" from archive
TRAP: FailedAssertion("!XLogCtl->InstallXLogFileSegmentActive", File:
"xlog.c", Line: 3823, PID: 60973)I have just applied a quick fix and that solved the issue, basically
if the last failed source was streaming and the WalRcvStreaming() is
false then just reset this flag.
IIUC, the issue can happen when the walreceiver has failed to get WAL
from the primary for whatever reason and its status is not yet
WALRCV_STOPPING or WALRCV_STOPPED, while the startup process has moved
ahead in WaitForWALToBecomeAvailable to read from the archive, which
ends up in this assertion failure. ISTM a rare scenario, and it depends
on what the walreceiver does between failing to get WAL from the primary
and updating its status to WALRCV_STOPPING or WALRCV_STOPPED.
If the above race condition is a serious problem, or if one thinks it
is a problem at all, it needs to be fixed. I don't think
just making InstallXLogFileSegmentActive false is enough. By looking
at the comment [1], it doesn't make sense to move ahead with restoring
from the archive location without the WAL receiver fully stopped.
IMO, the real fix is to just remove WalRcvStreaming() and call
XLogShutdownWalRcv() unconditionally. Anyway, we have the
Assert(!WalRcvStreaming()); down below. I don't think it will create
any problem.
[1]:
/*
* Before we leave XLOG_FROM_STREAM state, make sure that
* walreceiver is not active, so that it won't overwrite
* WAL that we restore from archive.
*/
if (WalRcvStreaming())
XLogShutdownWalRcv();
Regards,
Bharath Rupireddy.
On Fri, Feb 11, 2022 at 6:22 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
On Fri, Feb 11, 2022 at 3:33 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
IIUC, the issue can happen when the walreceiver has failed to get WAL
from the primary for whatever reason and its status is not yet
WALRCV_STOPPING or WALRCV_STOPPED, while the startup process has moved
ahead in WaitForWALToBecomeAvailable to read from the archive, which
ends up in this assertion failure. ISTM a rare scenario, and it depends
on what the walreceiver does between failing to get WAL from the primary
and updating its status to WALRCV_STOPPING or WALRCV_STOPPED.

If the above race condition is a serious problem, or if one thinks it
is a problem at all, it needs to be fixed.
I don't think we can design software with open race conditions, even
if they occur only rarely :)
I don't think
just making InstallXLogFileSegmentActive false is enough. By looking
at the comment [1], it doesn't make sense to move ahead for restoring
from the archive location without the WAL receiver fully stopped.
IMO, the real fix is to just remove WalRcvStreaming() and call
XLogShutdownWalRcv() unconditionally. Anyways, we have the
Assert(!WalRcvStreaming()); down below. I don't think it will create
any problem.
If WalRcvStreaming() is returning false, that means the walreceiver is
already stopped, so we don't need to shut it down externally. I think,
just as we set this flag outside of starting streaming, we can reset it
outside of XLogShutdownWalRcv() as well. Or I am fine even if we call
XLogShutdownWalRcv(), because if the walreceiver is stopped it will just
reset the flag we want it to reset and do nothing else.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Fri, Feb 11, 2022 at 6:31 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Fri, Feb 11, 2022 at 6:22 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
On Fri, Feb 11, 2022 at 3:33 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
IIUC, the issue can happen when the walreceiver has failed to get WAL
from the primary for whatever reason and its status is not yet
WALRCV_STOPPING or WALRCV_STOPPED, while the startup process has moved
ahead in WaitForWALToBecomeAvailable to read from the archive, which
ends up in this assertion failure. ISTM a rare scenario, and it depends
on what the walreceiver does between failing to get WAL from the primary
and updating its status to WALRCV_STOPPING or WALRCV_STOPPED.

If the above race condition is a serious problem, or if one thinks it
is a problem at all, it needs to be fixed.

I don't think we can design software with open race conditions, even
if they occur only rarely :)
Yes.
I don't think
just making InstallXLogFileSegmentActive false is enough. By looking
at the comment [1], it doesn't make sense to move ahead for restoring
from the archive location without the WAL receiver fully stopped.
IMO, the real fix is to just remove WalRcvStreaming() and call
XLogShutdownWalRcv() unconditionally. Anyways, we have the
Assert(!WalRcvStreaming()); down below. I don't think it will create
any problem.

If WalRcvStreaming() is returning false, that means the walreceiver is
already stopped, so we don't need to shut it down externally. I think,
just as we set this flag outside of starting streaming, we can reset it
outside of XLogShutdownWalRcv() as well. Or I am fine even if we call
XLogShutdownWalRcv(), because if the walreceiver is stopped it will just
reset the flag we want it to reset and do nothing else.
As I said, I'm okay with just calling XLogShutdownWalRcv()
unconditionally, as it just returns in case the walreceiver has already
stopped, and updates the InstallXLogFileSegmentActive flag to false.
Let's also hear what other hackers have to say about this.
Regards,
Bharath Rupireddy.
At Fri, 11 Feb 2022 22:25:49 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in
I don't think
just making InstallXLogFileSegmentActive false is enough. By looking
at the comment [1], it doesn't make sense to move ahead for restoring
from the archive location without the WAL receiver fully stopped.
IMO, the real fix is to just remove WalRcvStreaming() and call
XLogShutdownWalRcv() unconditionally. Anyways, we have the
Assert(!WalRcvStreaming()); down below. I don't think it will create
any problem.

If WalRcvStreaming() is returning false, that means the walreceiver is
already stopped, so we don't need to shut it down externally. I think,
just as we set this flag outside of starting streaming, we can reset it
outside of XLogShutdownWalRcv() as well. Or I am fine even if we call
XLogShutdownWalRcv(), because if the walreceiver is stopped it will just
reset the flag we want it to reset and do nothing else.

As I said, I'm okay with just calling XLogShutdownWalRcv()
unconditionally, as it just returns in case the walreceiver has already
stopped, and updates the InstallXLogFileSegmentActive flag to false.

Let's also hear what other hackers have to say about this.
Firstly, good catch :) And the direction seems right.

It seems like an oversight in cc2c7d65fc. The only time we must not
install new WAL segments is while we're in archive recovery. Conversely,
we must turn it off when entering archive recovery (not when exiting
streaming recovery). So *I* would rather do that at the beginning of
XLOG_FROM_ARCHIVE/PG_WAL than at the end of XLOG_FROM_STREAM.
(And I would like to remove XLogShutdownWalRcv() and turn off the flag
in StartupXLOG explicitly, but that would be overdoing it.)
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -12800,6 +12800,16 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
*/
Assert(!WalRcvStreaming());
+ /*
+ * WAL segment installation conflicts with archive
+ * recovery. Make sure it is turned off. XLogShutdownWalRcv()
+ * does that but it is not done when the process has voluntarily
+ * exited, for example due to replication timeout.
+ */
+ LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+ XLogCtl->InstallXLogFileSegmentActive = false;
+ LWLockRelease(ControlFileLock);
+
/* Close any old file we might have open. */
if (readFile >= 0)
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
On Mon, Feb 14, 2022 at 1:44 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
At Fri, 11 Feb 2022 22:25:49 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in
I don't think
just making InstallXLogFileSegmentActive false is enough. By looking
at the comment [1], it doesn't make sense to move ahead for restoring
from the archive location without the WAL receiver fully stopped.
IMO, the real fix is to just remove WalRcvStreaming() and call
XLogShutdownWalRcv() unconditionally. Anyways, we have the
Assert(!WalRcvStreaming()); down below. I don't think it will create
any problem.

If WalRcvStreaming() is returning false, that means the walreceiver is
already stopped, so we don't need to shut it down externally. I think,
just as we set this flag outside of starting streaming, we can reset it
outside of XLogShutdownWalRcv() as well. Or I am fine even if we call
XLogShutdownWalRcv(), because if the walreceiver is stopped it will just
reset the flag we want it to reset and do nothing else.

As I said, I'm okay with just calling XLogShutdownWalRcv()
unconditionally, as it just returns in case the walreceiver has already
stopped, and updates the InstallXLogFileSegmentActive flag to false.

Let's also hear what other hackers have to say about this.

Firstly, good catch :) And the direction seems right.

It seems like an oversight in cc2c7d65fc. The only time we must not
install new WAL segments is while we're in archive recovery. Conversely,
we must turn it off when entering archive recovery (not when exiting
streaming recovery). So *I* would rather do that at the beginning of
XLOG_FROM_ARCHIVE/PG_WAL than at the end of XLOG_FROM_STREAM.
(And I would like to remove XLogShutdownWalRcv() and turn off the flag
in StartupXLOG explicitly, but that would be overdoing it.)

--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -12800,6 +12800,16 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 */
 Assert(!WalRcvStreaming());
+ /*
+ * WAL segment installation conflicts with archive
+ * recovery. Make sure it is turned off. XLogShutdownWalRcv()
+ * does that but it is not done when the process has voluntarily
+ * exited, for example due to replication timeout.
+ */
+ LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+ XLogCtl->InstallXLogFileSegmentActive = false;
+ LWLockRelease(ControlFileLock);
+
 /* Close any old file we might have open. */
 if (readFile >= 0)
Today I encountered the assertion failure [2] twice while working on
another patch [1]. The pattern seems to be that the walreceiver got
killed or crashed and set its state to WALRCV_STOPPING or
WALRCV_STOPPED by the time the WAL state machine moved to archive, and
hence the following XLogShutdownWalRcv() code will not get hit:
/*
* Before we leave XLOG_FROM_STREAM state, make sure that
* walreceiver is not active, so that it won't overwrite
* WAL that we restore from archive.
*/
if (WalRcvStreaming())
ShutdownWalRcv();
I agree with Kyotaro-san that we should reset InstallXLogFileSegmentActive
before entering archive mode. Hence I tweaked the code introduced by
the following commit a bit; the resulting v1 patch is attached herewith.
Please review it.
commit cc2c7d65fc27e877c9f407587b0b92d46cd6dd16
Author: Noah Misch <noah@leadboat.com>
Date: Mon Jun 28 18:34:56 2021 -0700
Skip WAL recycling and preallocation during archive recovery.
[1]: /messages/by-id/CALj2ACUYz1z6QPduGn5gguCkfd-ko44j4hKcOMtp6fzv9xEWgw@mail.gmail.com
[2]:
2022-08-11 05:10:29.051 UTC [1487086] FATAL: terminating walreceiver
due to timeout
cp: cannot stat '/home/ubuntu/archived_wal/00000002.history': No such
file or directory
2022-08-11 05:10:29.054 UTC [1487075] LOG: switched WAL source from
stream to archive after failure
2022-08-11 05:10:29.067 UTC [1487075] LOG: restored log file
"000000010000000000000034" from archive
TRAP: FailedAssertion("!IsInstallXLogFileSegmentActive()", File:
"xlogrecovery.c", Line: 4149, PID: 1487075)
postgres: startup waiting for
000000010000000000000034(ExceptionalCondition+0xd6)[0x559d20e6deb1]
postgres: startup waiting for
000000010000000000000034(+0x1e4e14)[0x559d20813e14]
postgres: startup waiting for
000000010000000000000034(+0x1e50c0)[0x559d208140c0]
postgres: startup waiting for
000000010000000000000034(+0x1e4354)[0x559d20813354]
postgres: startup waiting for
000000010000000000000034(+0x1e3946)[0x559d20812946]
postgres: startup waiting for
000000010000000000000034(+0x1dbc58)[0x559d2080ac58]
postgres: startup waiting for
000000010000000000000034(+0x1db3ce)[0x559d2080a3ce]
postgres: startup waiting for
000000010000000000000034(XLogReadAhead+0x3d)[0x559d2080aa23]
postgres: startup waiting for
000000010000000000000034(+0x1d9713)[0x559d20808713]
postgres: startup waiting for
000000010000000000000034(+0x1d909c)[0x559d2080809c]
postgres: startup waiting for
000000010000000000000034(XLogPrefetcherReadRecord+0xf9)[0x559d208092a2]
postgres: startup waiting for
000000010000000000000034(+0x1e3427)[0x559d20812427]
postgres: startup waiting for
000000010000000000000034(PerformWalRecovery+0x465)[0x559d2080fd0f]
postgres: startup waiting for
000000010000000000000034(StartupXLOG+0x987)[0x559d207fc732]
postgres: startup waiting for
000000010000000000000034(StartupProcessMain+0xf8)[0x559d20bc3990]
postgres: startup waiting for
000000010000000000000034(AuxiliaryProcessMain+0x1ae)[0x559d20bb62cf]
postgres: startup waiting for
000000010000000000000034(+0x593020)[0x559d20bc2020]
postgres: startup waiting for
000000010000000000000034(+0x591ebb)[0x559d20bc0ebb]
postgres: startup waiting for
000000010000000000000034(+0x590907)[0x559d20bbf907]
/lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7fe518b26520]
/lib/x86_64-linux-gnu/libc.so.6(__select+0xbd)[0x7fe518bff74d]
postgres: startup waiting for
000000010000000000000034(+0x58e1bd)[0x559d20bbd1bd]
postgres: startup waiting for
000000010000000000000034(PostmasterMain+0x14cc)[0x559d20bbcaa2]
postgres: startup waiting for
000000010000000000000034(+0x44a752)[0x559d20a79752]
/lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7fe518b0dd90]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7fe518b0de40]
postgres: startup waiting for
000000010000000000000034(_start+0x25)[0x559d206fe5b5]
2022-08-11 05:10:29.161 UTC [1486413] LOG: startup process (PID
1487075) was terminated by signal 6: Aborted
2022-08-11 05:10:29.161 UTC [1486413] LOG: terminating any other
active server processes
2022-08-11 05:10:29.169 UTC [1486413] LOG: shutting down due to
startup process failure
2022-08-11 05:10:29.181 UTC [1486413] LOG: database system is shut down
2022-08-11 05:34:54.234 UTC [1509079] LOG: switched WAL source from
stream to archive after failure
2022-08-11 05:34:54.234 UTC [1509079] LOG: current WAL source is
archive and last WAL source is stream, standby mode 1
2022-08-11 05:34:54.248 UTC [1509079] LOG: restored log file
"000000010000000000000003" from archive
TRAP: FailedAssertion("!IsInstallXLogFileSegmentActive()", File:
"xlogrecovery.c", Line: 4155, PID: 1509079)
postgres: startup waiting for
000000010000000000000003(ExceptionalCondition+0xd6)[0x55e9cb1cbf2a]
postgres: startup waiting for
000000010000000000000003(+0x1e4e8d)[0x55e9cab71e8d]
postgres: startup waiting for
000000010000000000000003(+0x1e5139)[0x55e9cab72139]
postgres: startup waiting for
000000010000000000000003(+0x1e43cd)[0x55e9cab713cd]
postgres: startup waiting for
000000010000000000000003(+0x1e3946)[0x55e9cab70946]
postgres: startup waiting for
000000010000000000000003(+0x1dbc58)[0x55e9cab68c58]
postgres: startup waiting for
000000010000000000000003(+0x1db0b7)[0x55e9cab680b7]
postgres: startup waiting for
000000010000000000000003(XLogReadAhead+0x3d)[0x55e9cab68a23]
postgres: startup waiting for
000000010000000000000003(+0x1d9713)[0x55e9cab66713]
postgres: startup waiting for
000000010000000000000003(+0x1d909c)[0x55e9cab6609c]
postgres: startup waiting for
000000010000000000000003(XLogPrefetcherReadRecord+0xf9)[0x55e9cab672a2]
postgres: startup waiting for
000000010000000000000003(+0x1e3427)[0x55e9cab70427]
postgres: startup waiting for
000000010000000000000003(+0x1e4911)[0x55e9cab71911]
postgres: startup waiting for
000000010000000000000003(InitWalRecovery+0xaf4)[0x55e9cab6bc4e]
postgres: startup waiting for
000000010000000000000003(StartupXLOG+0x4af)[0x55e9cab5a25a]
postgres: startup waiting for
000000010000000000000003(StartupProcessMain+0xf8)[0x55e9caf21a09]
postgres: startup waiting for
000000010000000000000003(AuxiliaryProcessMain+0x1ae)[0x55e9caf14348]
postgres: startup waiting for
000000010000000000000003(+0x593099)[0x55e9caf20099]
postgres: startup waiting for
000000010000000000000003(PostmasterMain+0x1476)[0x55e9caf1aac5]
postgres: startup waiting for
000000010000000000000003(+0x44a7cb)[0x55e9cadd77cb]
/lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7f754c287d90]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7f754c287e40]
postgres: startup waiting for
000000010000000000000003(_start+0x25)[0x55e9caa5c5b5]
2022-08-11 05:34:54.336 UTC [1509076] LOG: startup process (PID
1509079) was terminated by signal 6: Aborted
2022-08-11 05:34:54.336 UTC [1509076] LOG: aborting startup due to
startup process failure
2022-08-11 05:34:54.337 UTC [1509076] LOG: database system is shut down
--
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/
Attachments:
v1-0001-Avoid-race-condition-in-resetting-XLogCtl-Install.patch (application/octet-stream)
From 92924b28fc28038ecf694381250a6cea52bafd90 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Date: Thu, 11 Aug 2022 16:27:05 +0000
Subject: [PATCH v1] Avoid race condition in resetting
XLogCtl->InstallXLogFileSegmentActive
---
src/backend/access/transam/xlog.c | 15 +++++++--------
src/backend/access/transam/xlogrecovery.c | 17 ++++++++++++-----
src/include/access/xlog.h | 2 +-
3 files changed, 20 insertions(+), 14 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 9cedd6876f..52a22c944c 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8836,26 +8836,25 @@ GetOldestRestartPoint(XLogRecPtr *oldrecptr, TimeLineID *oldtli)
LWLockRelease(ControlFileLock);
}
-/* Thin wrapper around ShutdownWalRcv(). */
+/* Enable WAL file recycling and preallocation. */
void
-XLogShutdownWalRcv(void)
+SetInstallXLogFileSegmentActive(void)
{
- ShutdownWalRcv();
-
LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
- XLogCtl->InstallXLogFileSegmentActive = false;
+ XLogCtl->InstallXLogFileSegmentActive = true;
LWLockRelease(ControlFileLock);
}
-/* Enable WAL file recycling and preallocation. */
+/* Disable WAL file recycling and preallocation. */
void
-SetInstallXLogFileSegmentActive(void)
+ResetInstallXLogFileSegmentActive(void)
{
LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
- XLogCtl->InstallXLogFileSegmentActive = true;
+ XLogCtl->InstallXLogFileSegmentActive = false;
LWLockRelease(ControlFileLock);
}
+
bool
IsInstallXLogFileSegmentActive(void)
{
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index a59a0e826b..651191e93c 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -1395,7 +1395,7 @@ FinishWalRecovery(void)
* over these records and subsequent ones if it's still alive when we
* start writing WAL.
*/
- XLogShutdownWalRcv();
+ ShutdownWalRcv();
/*
* We are now done reading the xlog from stream. Turn off streaming
@@ -1403,7 +1403,7 @@ FinishWalRecovery(void)
* recovery, e.g., timeline history file) from archive or pg_wal.
*
* Note that standby mode must be turned off after killing WAL receiver,
- * i.e., calling XLogShutdownWalRcv().
+ * i.e., calling ShutdownWalRcv().
*/
Assert(!WalRcvStreaming());
StandbyMode = false;
@@ -3469,7 +3469,7 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
*/
if (StandbyMode && CheckForStandbyTrigger())
{
- XLogShutdownWalRcv();
+ ShutdownWalRcv();
return XLREAD_FAIL;
}
@@ -3517,7 +3517,7 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
* WAL that we restore from archive.
*/
if (WalRcvStreaming())
- XLogShutdownWalRcv();
+ ShutdownWalRcv();
/*
* Before we sleep, re-scan for possible new timelines if
@@ -3604,6 +3604,13 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
*/
Assert(!WalRcvStreaming());
+ /*
+ * WAL segment installation conflicts with archive recovery.
+ * Make sure it is turned off.
+ */
+ if (currentSource == XLOG_FROM_ARCHIVE)
+ ResetInstallXLogFileSegmentActive();
+
/* Close any old file we might have open. */
if (readFile >= 0)
{
@@ -3647,7 +3654,7 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
*/
if (pendingWalRcvRestart && !startWalReceiver)
{
- XLogShutdownWalRcv();
+ ShutdownWalRcv();
/*
* Re-scan for possible new timelines if we were
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index cd674c3c23..52cd01b974 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -256,8 +256,8 @@ extern bool XLogCheckpointNeeded(XLogSegNo new_segno);
extern void SwitchIntoArchiveRecovery(XLogRecPtr EndRecPtr, TimeLineID replayTLI);
extern void ReachedEndOfBackup(XLogRecPtr EndRecPtr, TimeLineID tli);
extern void SetInstallXLogFileSegmentActive(void);
+extern void ResetInstallXLogFileSegmentActive(void);
extern bool IsInstallXLogFileSegmentActive(void);
-extern void XLogShutdownWalRcv(void);
/*
* Routines to start, stop, and get status of a base backup.
--
2.34.1
On Thu, Aug 11, 2022 at 10:06 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
Today I encountered the assertion failure [2] twice while working on
another patch [1]. The pattern seems to be that the walreceiver got
killed or crashed and set its state to WALRCV_STOPPING or
WALRCV_STOPPED by the time the WAL state machine moved to archive, and
hence the following XLogShutdownWalRcv() code will not get hit:

/*
 * Before we leave XLOG_FROM_STREAM state, make sure that
 * walreceiver is not active, so that it won't overwrite
 * WAL that we restore from archive.
 */
if (WalRcvStreaming())
    ShutdownWalRcv();

I agree with Kyotaro-san that we should reset InstallXLogFileSegmentActive
before entering archive mode. Hence I tweaked the code introduced by
the following commit a bit; the resulting v1 patch is attached herewith.
Please review it.
I added it to the current commitfest to not lose track of it:
https://commitfest.postgresql.org/39/3814/.
--
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/
On Mon, Aug 15, 2022 at 11:30 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
On Thu, Aug 11, 2022 at 10:06 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

Today I encountered the assertion failure [2] twice while working on
another patch [1]. The pattern seems to be that the walreceiver got
killed or crashed and set its state to WALRCV_STOPPING or
WALRCV_STOPPED by the time the WAL state machine moves to archive, and
hence the following XLogShutdownWalRcv() code will not get hit:

/*
 * Before we leave XLOG_FROM_STREAM state, make sure that
 * walreceiver is not active, so that it won't overwrite
 * WAL that we restore from archive.
 */
if (WalRcvStreaming())
ShutdownWalRcv();

I agree with Kyotaro-san to reset InstallXLogFileSegmentActive before
entering into the archive mode. Hence I tweaked the code introduced by
the following commit a bit, the result v1 patch is attached herewith.
Please review it.

I added it to the current commitfest to not lose track of it:
https://commitfest.postgresql.org/39/3814/.
Today, I spent some more time on this issue, I modified the v1 patch
posted upthread a bit - now resetting the InstallXLogFileSegmentActive
only when the WAL source switched to archive, not every time in
archive mode.
I'm attaching the v2 patch herewith; please review it further.
Just for the record - there's another report of the assertion failure
at [1], many thanks to Kyotaro-san for providing consistent
reproducible steps.
[1]: /messages/by-id/20220909.172949.2223165886970819060.horikyota.ntt@gmail.com
--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachments:
v2-0001-Avoid-race-condition-in-resetting-XLogCtl-Install.patch
From 8eba41a83dd98ccf55c12225af5b9b659e352c6c Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Date: Sat, 10 Sep 2022 02:12:09 +0000
Subject: [PATCH v2] Avoid race condition in resetting
XLogCtl->InstallXLogFileSegmentActive
There's a possibility of recovery hitting
Assert(!IsInstallXLogFileSegmentActive()); while reading WAL from
archive immediately after failing to stream from primary. The
pattern seems to be that the walreceiver gets killed or crashed
and its status set to WALRCV_STOPPING or WALRCV_STOPPED when
WaitForWALToBecomeAvailable()'s state machine is in stream mode,
by the time the state machine moves to archive and the code that
resets XLogCtl->InstallXLogFileSegmentActive conditionally isn't
getting hit, hence the assertion failure.
Per commit cc2c7d65fc27e877c9f407587b0b92d46cd6dd16, WAL recycling
and preallocation conflicts with archive recovery hence we needed
to disable it by resetting XLogCtl->InstallXLogFileSegmentActive
before switching WaitForWALToBecomeAvailable()'s state machine
from streaming to archive. The commit did this by adding it as
part of the walreceiver shutdown code, which isn't always called,
causing race conditions and an assertion failure.
A better way to ensure that WAL recycling and preallocation is
disabled in archive recovery is to reset
XLogCtl->InstallXLogFileSegmentActive when
WaitForWALToBecomeAvailable()'s state machine switches the WAL
source to archive.
Reported-by: Dilip Kumar
Reported-by: Kyotaro Horiguchi
Author: Bharath Rupireddy
Discussion: https://www.postgresql.org/message-id/CAFiTN-sE3ry=ycMPVtC+Djw4Fd7gbUGVv_qqw6qfzp=JLvqT3g@mail.gmail.com
Discussion: https://www.postgresql.org/message-id/20220909.172949.2223165886970819060.horikyota.ntt@gmail.com
---
src/backend/access/transam/xlog.c | 15 +++++++--------
src/backend/access/transam/xlogrecovery.c | 19 ++++++++++++++-----
src/include/access/xlog.h | 2 +-
3 files changed, 22 insertions(+), 14 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 7a710e6490..b44a84b8ec 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8838,26 +8838,25 @@ GetOldestRestartPoint(XLogRecPtr *oldrecptr, TimeLineID *oldtli)
LWLockRelease(ControlFileLock);
}
-/* Thin wrapper around ShutdownWalRcv(). */
+/* Enable WAL file recycling and preallocation. */
void
-XLogShutdownWalRcv(void)
+SetInstallXLogFileSegmentActive(void)
{
- ShutdownWalRcv();
-
LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
- XLogCtl->InstallXLogFileSegmentActive = false;
+ XLogCtl->InstallXLogFileSegmentActive = true;
LWLockRelease(ControlFileLock);
}
-/* Enable WAL file recycling and preallocation. */
+/* Disable WAL file recycling and preallocation. */
void
-SetInstallXLogFileSegmentActive(void)
+ResetInstallXLogFileSegmentActive(void)
{
LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
- XLogCtl->InstallXLogFileSegmentActive = true;
+ XLogCtl->InstallXLogFileSegmentActive = false;
LWLockRelease(ControlFileLock);
}
+
bool
IsInstallXLogFileSegmentActive(void)
{
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index 9a80084a68..ae71a92f7a 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -1395,7 +1395,7 @@ FinishWalRecovery(void)
* over these records and subsequent ones if it's still alive when we
* start writing WAL.
*/
- XLogShutdownWalRcv();
+ ShutdownWalRcv();
/*
* We are now done reading the xlog from stream. Turn off streaming
@@ -1403,7 +1403,7 @@ FinishWalRecovery(void)
* recovery, e.g., timeline history file) from archive or pg_wal.
*
* Note that standby mode must be turned off after killing WAL receiver,
- * i.e., calling XLogShutdownWalRcv().
+ * i.e., calling ShutdownWalRcv().
*/
Assert(!WalRcvStreaming());
StandbyMode = false;
@@ -3485,7 +3485,7 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
*/
if (StandbyMode && CheckForStandbyTrigger())
{
- XLogShutdownWalRcv();
+ ShutdownWalRcv();
return XLREAD_FAIL;
}
@@ -3533,7 +3533,7 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
* WAL that we restore from archive.
*/
if (WalRcvStreaming())
- XLogShutdownWalRcv();
+ ShutdownWalRcv();
/*
* Before we sleep, re-scan for possible new timelines if
@@ -3599,9 +3599,18 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
}
if (currentSource != oldSource)
+ {
+ /*
+ * WAL segment installation conflicts with archive recovery, hence
+ * make sure it is turned off before fetching WAL from archive.
+ */
+ if (currentSource == XLOG_FROM_ARCHIVE)
+ ResetInstallXLogFileSegmentActive();
+
elog(DEBUG2, "switched WAL source from %s to %s after %s",
xlogSourceNames[oldSource], xlogSourceNames[currentSource],
lastSourceFailed ? "failure" : "success");
+ }
/*
* We've now handled possible failure. Try to read from the chosen
@@ -3663,7 +3672,7 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
*/
if (pendingWalRcvRestart && !startWalReceiver)
{
- XLogShutdownWalRcv();
+ ShutdownWalRcv();
/*
* Re-scan for possible new timelines if we were
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index cd674c3c23..52cd01b974 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -256,8 +256,8 @@ extern bool XLogCheckpointNeeded(XLogSegNo new_segno);
extern void SwitchIntoArchiveRecovery(XLogRecPtr EndRecPtr, TimeLineID replayTLI);
extern void ReachedEndOfBackup(XLogRecPtr EndRecPtr, TimeLineID tli);
extern void SetInstallXLogFileSegmentActive(void);
+extern void ResetInstallXLogFileSegmentActive(void);
extern bool IsInstallXLogFileSegmentActive(void);
-extern void XLogShutdownWalRcv(void);
/*
* Routines to start, stop, and get status of a base backup.
--
2.34.1
On Sat, Sep 10, 2022 at 07:52:01AM +0530, Bharath Rupireddy wrote:
Today, I spent some more time on this issue, I modified the v1 patch
posted upthread a bit - now resetting the InstallXLogFileSegmentActive
only when the WAL source switched to archive, not every time in
archive mode.

I'm attaching the v2 patch herewith; please review it further.
Just for the records - there's another report of the assertion failure
at [1], many thanks to Kyotaro-san for providing consistent
reproducible steps.

[1] - /messages/by-id/20220909.172949.2223165886970819060.horikyota.ntt@gmail.com
Well, the fact that cc2c7d6 is involved here makes this thread an open
item for PG15 as far as I can see, assigned to Noah (added now in
CC).
While reading your last patch, I found it rather confusing that we
only reset InstallXLogFileSegmentActive when the current source is the
archives and it does not match the old source. This code is already
complicated, and I don't think that having more assumptions in its
internals is a good thing when it comes to its long-term maintenance.
In short, HEAD is rather conservative when it comes to setting
InstallXLogFileSegmentActive, switching it only when we request
streaming with RequestXLogStreaming(), but too aggressive when it
comes to resetting it, and we want something in the middle ground. FWIW, I
find the approach taken by Horiguchi-san in [1] better -- to reset the
state before we attempt to read WAL from the archives *or* pg_wal,
after we know that the last source has failed, where we know that we
are not streaming yet (but recovery may be requested soon).
Side note: I don't see much point in having two routines named
SetInstallXLogFileSegmentActive and ResetInstallXLogFileSegmentActive
that do the opposite thing. We could just have one.
[1]: /messages/by-id/20220214.171428.735280610520122187.horikyota.ntt@gmail.com
--
Michael
On Mon, Sep 12, 2022 at 12:30 PM Michael Paquier <michael@paquier.xyz> wrote:
On Sat, Sep 10, 2022 at 07:52:01AM +0530, Bharath Rupireddy wrote:
Today, I spent some more time on this issue, I modified the v1 patch
posted upthread a bit - now resetting the InstallXLogFileSegmentActive
only when the WAL source switched to archive, not every time in
archive mode.

I'm attaching the v2 patch herewith; please review it further.
Just for the records - there's another report of the assertion failure
at [1], many thanks to Kyotaro-san for providing consistent
reproducible steps.

[1] - /messages/by-id/20220909.172949.2223165886970819060.horikyota.ntt@gmail.com
Well, the fact that cc2c7d6 is involved here makes this thread an open
item for PG15 as far as I can see, assigned to Noah (added now in
CC).
Thanks.
While reading your last patch, I found it rather confusing that we
only reset InstallXLogFileSegmentActive when the current source is the
archives and it does not match the old source. This code is already
complicated, and I don't think that having more assumptions in its
internals is a good thing when it comes to its long-term maintenance.
In short, HEAD is rather conservative when it comes to setting
InstallXLogFileSegmentActive, switching it only when we request
streaming with RequestXLogStreaming(), but too aggressive when it
comes to resetting it, and we want something in the middle ground. FWIW, I
find the approach taken by Horiguchi-san in [1] better -- to reset the
state before we attempt to read WAL from the archives *or* pg_wal,
after we know that the last source has failed, where we know that we
are not streaming yet (but recovery may be requested soon).

[1]: /messages/by-id/20220214.171428.735280610520122187.horikyota.ntt@gmail.com
I think the fix at [1] is wrong. It basically resets
InstallXLogFileSegmentActive for both XLOG_FROM_ARCHIVE and
XLOG_FROM_PG_WAL, but remember that we need to disable WAL file
preallocation and recycling only for archive. Moreover, if we were to
reset just for archive there, it's too aggressive, meaning that for
every WAL record we take ControlFileLock in exclusive mode to reset it.
IMO, doing it once when we switch the source to archive is the correct
way because it avoids frequent ControlFileLock acquisitions and makes
the fix more intuitive. And we have a comment saying why we reset
InstallXLogFileSegmentActive; if required, we can point to commit
cc2c7d6 in the comments.
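As a rough illustration of the lock-traffic difference, here is a hypothetical counting model (not the real locking code): resetting inside the per-record fetch loop would take the lock once per record, while resetting only on a source switch takes it once.

```c
#include <assert.h>

/* Hypothetical model: count how often the simulated ControlFileLock
 * would be taken under the two reset placements discussed above. */
typedef enum { XLOG_FROM_ARCHIVE, XLOG_FROM_STREAM } XLogSource;

static int lock_acquisitions;

static void reset_flag_locked(void)
{
    lock_acquisitions++; /* stands in for an LWLockAcquire/Release pair */
}

/* Reset on every iteration while fetching from archive (too eager). */
static int reset_per_record(int nrecords)
{
    lock_acquisitions = 0;
    for (int i = 0; i < nrecords; i++)
        reset_flag_locked();
    return lock_acquisitions;
}

/* Reset once, only when the source actually switches to archive. */
static int reset_on_switch(int nrecords)
{
    XLogSource currentSource = XLOG_FROM_STREAM;

    lock_acquisitions = 0;
    for (int i = 0; i < nrecords; i++)
    {
        XLogSource oldSource = currentSource;

        currentSource = XLOG_FROM_ARCHIVE; /* streaming failed; stay on archive */
        if (currentSource != oldSource && currentSource == XLOG_FROM_ARCHIVE)
            reset_flag_locked();
    }
    return lock_acquisitions;
}
```

For a run of N records replayed from archive, the per-record placement takes the lock N times while the on-switch placement takes it exactly once.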
Side note: I don't see much point in having two routines named
SetInstallXLogFileSegmentActive and ResetInstallXLogFileSegmentActive
that do the opposite thing. We could just have one.
It's an implementation choice. However, I changed it in the v3 patch.
Please review the attached v3 patch further.
--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachments:
v3-0001-Avoid-race-condition-in-resetting-XLogCtl-Install.patch
From 272477a0afc5e9b12defe4b10ec2bf3f1e6a8c67 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Date: Mon, 12 Sep 2022 12:22:43 +0000
Subject: [PATCH v3] Avoid race condition in resetting
XLogCtl->InstallXLogFileSegmentActive
There's a possibility of recovery hitting
Assert(!IsInstallXLogFileSegmentActive()); while reading WAL from
archive immediately after failing to stream from primary. The
pattern seems to be that the walreceiver gets killed or crashed
and its status set to WALRCV_STOPPING or WALRCV_STOPPED when
WaitForWALToBecomeAvailable()'s state machine is in stream mode,
by the time the state machine moves to archive and the code that
resets XLogCtl->InstallXLogFileSegmentActive conditionally isn't
getting hit, hence the assertion failure.
Per commit cc2c7d65fc27e877c9f407587b0b92d46cd6dd16, WAL recycling
and preallocation conflicts with archive recovery hence we needed
to disable it by resetting XLogCtl->InstallXLogFileSegmentActive
before switching WaitForWALToBecomeAvailable()'s state machine
from streaming to archive. The commit did this by adding it as
part of the walreceiver shutdown code, which isn't always called,
causing race conditions and an assertion failure.
A better way to ensure that WAL recycling and preallocation is
disabled in archive recovery is to reset
XLogCtl->InstallXLogFileSegmentActive when
WaitForWALToBecomeAvailable()'s state machine switches the WAL
source to archive.
Reported-by: Dilip Kumar
Reported-by: Kyotaro Horiguchi
Reviewed-by: Michael Paquier
Author: Bharath Rupireddy
Discussion: https://www.postgresql.org/message-id/CAFiTN-sE3ry=ycMPVtC+Djw4Fd7gbUGVv_qqw6qfzp=JLvqT3g@mail.gmail.com
Discussion: https://www.postgresql.org/message-id/20220909.172949.2223165886970819060.horikyota.ntt@gmail.com
---
src/backend/access/transam/xlog.c | 24 ++++++++---------------
src/backend/access/transam/xlogrecovery.c | 21 ++++++++++++++------
src/include/access/xlog.h | 3 +--
3 files changed, 24 insertions(+), 24 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 7a710e6490..ca5b6e798e 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -4500,7 +4500,7 @@ BootStrapXLOG(void)
pg_crc32c crc;
/* allow ordinary WAL segment creation, like StartupXLOG() would */
- SetInstallXLogFileSegmentActive();
+ UpdateInstallXLogFileSegmentActive(false);
/*
* Select a hopefully-unique system identifier code for this installation.
@@ -5364,7 +5364,7 @@ StartupXLOG(void)
* Allow ordinary WAL segment creation before possibly switching to a new
* timeline, which creates a new segment, and after the last ReadRecord().
*/
- SetInstallXLogFileSegmentActive();
+ UpdateInstallXLogFileSegmentActive(false);
/*
* Consider whether we need to assign a new timeline ID.
@@ -8838,23 +8838,15 @@ GetOldestRestartPoint(XLogRecPtr *oldrecptr, TimeLineID *oldtli)
LWLockRelease(ControlFileLock);
}
-/* Thin wrapper around ShutdownWalRcv(). */
+/* Enable/Disable WAL file recycling and preallocation. */
void
-XLogShutdownWalRcv(void)
-{
- ShutdownWalRcv();
-
- LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
- XLogCtl->InstallXLogFileSegmentActive = false;
- LWLockRelease(ControlFileLock);
-}
-
-/* Enable WAL file recycling and preallocation. */
-void
-SetInstallXLogFileSegmentActive(void)
+UpdateInstallXLogFileSegmentActive(bool reset)
{
LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
- XLogCtl->InstallXLogFileSegmentActive = true;
+ if (reset)
+ XLogCtl->InstallXLogFileSegmentActive = false;
+ else
+ XLogCtl->InstallXLogFileSegmentActive = true;
LWLockRelease(ControlFileLock);
}
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index 9a80084a68..f21f67b85a 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -1395,7 +1395,7 @@ FinishWalRecovery(void)
* over these records and subsequent ones if it's still alive when we
* start writing WAL.
*/
- XLogShutdownWalRcv();
+ ShutdownWalRcv();
/*
* We are now done reading the xlog from stream. Turn off streaming
@@ -1403,7 +1403,7 @@ FinishWalRecovery(void)
* recovery, e.g., timeline history file) from archive or pg_wal.
*
* Note that standby mode must be turned off after killing WAL receiver,
- * i.e., calling XLogShutdownWalRcv().
+ * i.e., calling ShutdownWalRcv().
*/
Assert(!WalRcvStreaming());
StandbyMode = false;
@@ -3485,7 +3485,7 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
*/
if (StandbyMode && CheckForStandbyTrigger())
{
- XLogShutdownWalRcv();
+ ShutdownWalRcv();
return XLREAD_FAIL;
}
@@ -3533,7 +3533,7 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
* WAL that we restore from archive.
*/
if (WalRcvStreaming())
- XLogShutdownWalRcv();
+ ShutdownWalRcv();
/*
* Before we sleep, re-scan for possible new timelines if
@@ -3599,9 +3599,18 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
}
if (currentSource != oldSource)
+ {
+ /*
+ * WAL segment installation conflicts with archive recovery, hence
+ * make sure it is turned off before fetching WAL from archive.
+ */
+ if (currentSource == XLOG_FROM_ARCHIVE)
+ UpdateInstallXLogFileSegmentActive(true);
+
elog(DEBUG2, "switched WAL source from %s to %s after %s",
xlogSourceNames[oldSource], xlogSourceNames[currentSource],
lastSourceFailed ? "failure" : "success");
+ }
/*
* We've now handled possible failure. Try to read from the chosen
@@ -3663,7 +3672,7 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
*/
if (pendingWalRcvRestart && !startWalReceiver)
{
- XLogShutdownWalRcv();
+ ShutdownWalRcv();
/*
* Re-scan for possible new timelines if we were
@@ -3713,7 +3722,7 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
tli, curFileTLI);
}
curFileTLI = tli;
- SetInstallXLogFileSegmentActive();
+ UpdateInstallXLogFileSegmentActive(false);
RequestXLogStreaming(tli, ptr, PrimaryConnInfo,
PrimarySlotName,
wal_receiver_create_temp_slot);
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index cd674c3c23..a23fe83b52 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -255,9 +255,8 @@ extern void RemoveNonParentXlogFiles(XLogRecPtr switchpoint, TimeLineID newTLI);
extern bool XLogCheckpointNeeded(XLogSegNo new_segno);
extern void SwitchIntoArchiveRecovery(XLogRecPtr EndRecPtr, TimeLineID replayTLI);
extern void ReachedEndOfBackup(XLogRecPtr EndRecPtr, TimeLineID tli);
-extern void SetInstallXLogFileSegmentActive(void);
+extern void UpdateInstallXLogFileSegmentActive(bool reset);
extern bool IsInstallXLogFileSegmentActive(void);
-extern void XLogShutdownWalRcv(void);
/*
* Routines to start, stop, and get status of a base backup.
--
2.34.1
On Mon, Sep 12, 2022 at 05:58:08PM +0530, Bharath Rupireddy wrote:
On Mon, Sep 12, 2022 at 12:30 PM Michael Paquier <michael@paquier.xyz> wrote:
On Sat, Sep 10, 2022 at 07:52:01AM +0530, Bharath Rupireddy wrote:
Just for the records - there's another report of the assertion failure
at [1], many thanks to Kyotaro-san for providing consistent
reproducible steps.

[1] - /messages/by-id/20220909.172949.2223165886970819060.horikyota.ntt@gmail.com
I plan to use that message's patch, because it guarantees WALRCV_STOPPED at
the code location being changed. Today, in the unlikely event of
!WalRcvStreaming() due to WALRCV_WAITING or WALRCV_STOPPING, that code
proceeds without waiting for WALRCV_STOPPED. I think WALRCV_STOPPING can't
happen there. Even if WALRCV_WAITING also can't happen, it's a simplicity win
to remove the need to think about that. Dilip Kumar and Nathan Bossart also
stated support for that design.
If WALRCV_WAITING or WALRCV_STOPPING can happen at that patch's code site, I
perhaps should back-patch the change to released versions. Does anyone know
whether one or both can happen?
FWIW, I
find better the approach taken by Horiguchi-san in [1] to reset the
state before we attempt to read WAL from the archives *or* pg_wal,
after we know that the last source has failed, where we know that we
are not streaming yet (but recovery may be requested soon).

[1]: /messages/by-id/20220214.171428.735280610520122187.horikyota.ntt@gmail.com
I think the fix at [1] is wrong. It basically resets
InstallXLogFileSegmentActive for both XLOG_FROM_ARCHIVE and
XLOG_FROM_PG_WAL, but remember that we need to disable WAL file
preallocation and recycling only for archive. Moreover, if we were to
reset just for archive there, it's too aggressive, meaning that for
every WAL record we take ControlFileLock in exclusive mode to reset it.

IMO, doing it once when we switch the source to archive is the correct
way because it avoids frequent ControlFileLock acquisitions and makes
the fix more intuitive. And we have a comment saying why we reset
InstallXLogFileSegmentActive, if required we can point to the commit
cc2c7d6 in the comments.
The startup process maintains the invariant that archive recovery and
streaming recovery don't overlap in time; one stops before the other starts.
As long as it achieves that, I'm not aware of a cause to care whether the flag
reset happens when streaming ends vs. when archive recovery begins. If the
startup process develops a defect such that it ceases to maintain that
invariant, "when archive recovery begins" does have the
advantage-or-disadvantage of causing a failure in non-assert builds. The
other way gets only an assertion failure. Michael Paquier or Bharath
Rupireddy, did you envision benefits other than that one?
On Mon, Sep 12, 2022 at 08:22:56PM -0700, Noah Misch wrote:
I plan to use that message's patch, because it guarantees WALRCV_STOPPED at
the code location being changed. Today, in the unlikely event of
!WalRcvStreaming() due to WALRCV_WAITING or WALRCV_STOPPING, that code
proceeds without waiting for WALRCV_STOPPED. I think WALRCV_STOPPING can't
happen there. Even if WALRCV_WAITING also can't happen, it's a simplicity win
to remove the need to think about that. Dilip Kumar and Nathan Bossart also
stated support for that design.
I did not notice this one. Sounds pretty much right.
If WALRCV_WAITING or WALRCV_STOPPING can happen at that patch's code site, I
perhaps should back-patch the change to released versions. Does anyone know
whether one or both can happen?
Hmm. I agree that this could be a good thing in the long-term.
The startup process maintains the invariant that archive recovery and
streaming recovery don't overlap in time; one stops before the other starts.
As long as it achieves that, I'm not aware of a cause to care whether the flag
reset happens when streaming ends vs. when archive recovery begins. If the
startup process develops a defect such that it ceases to maintain that
invariant, "when archive recovery begins" does have the
advantage-or-disadvantage of causing a failure in non-assert builds. The
other way gets only an assertion failure. Michael Paquier or Bharath
Rupireddy, did you envision benefits other than that one?
I think that you are forgetting one case here though: a standby where
standby.signal is set without restore_command and with
primary_conninfo. It could move on with recovery even if the WAL
receiver is unstable when something external pushes more WAL segments
to the standby's pg_wal/. So my point of upthread is that we should
make sure that standby.signal+primary_conninfo resets the flag state
when the WAL receiver stops streaming in this case as well.
The proposed v1-0001-Do-not-skip-calling-XLogShutdownWalRcv.patch
would achieve that, because it does not change the state of
InstallXLogFileSegmentActive depending on the source being
XLOG_FROM_ARCHIVE. So I'm fine with what you want to do.
--
Michael
On Tue, Sep 13, 2022 at 8:52 AM Noah Misch <noah@leadboat.com> wrote:
[1] - /messages/by-id/20220909.172949.2223165886970819060.horikyota.ntt@gmail.com
I plan to use that message's patch, because it guarantees WALRCV_STOPPED at
the code location being changed. Today, in the unlikely event of
!WalRcvStreaming() due to WALRCV_WAITING or WALRCV_STOPPING, that code
proceeds without waiting for WALRCV_STOPPED.
Hm. That was the original fix [2] proposed and it works. The concern
is that XLogShutdownWalRcv() does a bunch of work via ShutdownWalRcv()
- it calls ConditionVariablePrepareToSleep(),
ConditionVariableCancelSleep() (has lock 2 acquisitions and
requisitions) and 1 function call WalRcvRunning()) even for
WALRCV_STOPPED case, all this is unnecessary IMO when we determine the
walreceiver is state is already WALRCV_STOPPED. I think we can do
something like [3]diff --git a/src/backend/replication/walreceiverfuncs.c b/src/backend/replication/walreceiverfuncs.c index 90798b9d53..3487793c7a 100644 --- a/src/backend/replication/walreceiverfuncs.c +++ b/src/backend/replication/walreceiverfuncs.c @@ -181,6 +181,7 @@ ShutdownWalRcv(void) WalRcvData *walrcv = WalRcv; pid_t walrcvpid = 0; bool stopped = false; + bool was_stopped = false; to allay this concern and go with the fix proposed
at [1]/messages/by-id/20220909.172949.2223165886970819060.horikyota.ntt@gmail.com unconditionally calling XLogShutdownWalRcv().
If WALRCV_WAITING or WALRCV_STOPPING can happen at that patch's code site, I
perhaps should back-patch the change to released versions. Does anyone know
whether one or both can happen?
IMO, we must back-patch to the version where
cc2c7d65fc27e877c9f407587b0b92d46cd6dd16 got introduced irrespective
of any of the above happening.
Thoughts?
[1] - https://www.postgresql.org/message-id/flat/20220909.172949.2223165886970819060.horikyota.ntt%40gmail.com
[2] - https://www.postgresql.org/message-id/flat/CALj2ACUoBWbaFo_t0gew%2Bx6n0V%2BmpvB_23HLvsVD9abgCShV5A%40mail.gmail.com#e762ee94cbbe32a0b8c72c60793626b3
[3] -

diff --git a/src/backend/replication/walreceiverfuncs.c
b/src/backend/replication/walreceiverfuncs.c
index 90798b9d53..3487793c7a 100644
--- a/src/backend/replication/walreceiverfuncs.c
+++ b/src/backend/replication/walreceiverfuncs.c
@@ -181,6 +181,7 @@ ShutdownWalRcv(void)
WalRcvData *walrcv = WalRcv;
pid_t walrcvpid = 0;
bool stopped = false;
+ bool was_stopped = false;
/*
* Request walreceiver to stop. Walreceiver will switch to
WALRCV_STOPPED
@@ -191,6 +192,7 @@ ShutdownWalRcv(void)
switch (walrcv->walRcvState)
{
case WALRCV_STOPPED:
+ was_stopped = true;
break;
case WALRCV_STARTING:
walrcv->walRcvState = WALRCV_STOPPED;
@@ -208,6 +210,10 @@ ShutdownWalRcv(void)
}
SpinLockRelease(&walrcv->mutex);
+ /* Quick exit if walreceiver was already stopped. */
+ if (was_stopped)
+ return;
+
/* Unnecessary but consistent. */
if (stopped)
ConditionVariableBroadcast(&walrcv->walRcvStoppedCV);
--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
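The effect of the was_stopped quick exit from [3] can be sketched with hypothetical stand-ins (the counter represents the ConditionVariablePrepareToSleep() - WalRcvRunning() - ConditionVariableCancelSleep() sequence; this is not the real ShutdownWalRcv()):

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { RCV_STOPPED, RCV_STARTING, RCV_STREAMING } RcvState;

static int cv_operations; /* stands in for the condition-variable dance */

static void wait_until_stopped(void)
{
    cv_operations++; /* prepare-to-sleep / wait / cancel-sleep */
}

/* Sketch of ShutdownWalRcv() with the proposed quick exit. Returns the
 * number of condition-variable round trips performed. */
static int shutdown_walrcv_sketch(RcvState state)
{
    bool was_stopped = (state == RCV_STOPPED);

    cv_operations = 0;
    if (was_stopped)
        return cv_operations; /* already stopped: skip the CV traffic */
    wait_until_stopped();     /* otherwise wait for WALRCV_STOPPED */
    return cv_operations;
}
```

In the already-stopped case (the common one at that code site) the function returns before touching the condition variable; in all other states it still performs the full wait.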
At Tue, 13 Sep 2022 11:56:16 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in
On Tue, Sep 13, 2022 at 8:52 AM Noah Misch <noah@leadboat.com> wrote:
[1] - /messages/by-id/20220909.172949.2223165886970819060.horikyota.ntt@gmail.com
I plan to use that message's patch, because it guarantees WALRCV_STOPPED at
the code location being changed. Today, in the unlikely event of
!WalRcvStreaming() due to WALRCV_WAITING or WALRCV_STOPPING, that code
proceeds without waiting for WALRCV_STOPPED.

Hm. That was the original fix [2] proposed and it works. The concern
(Mmm. sorry for omitting that.)
is that XLogShutdownWalRcv() does a bunch of work via ShutdownWalRcv()
- it calls ConditionVariablePrepareToSleep(),
Anyway the code path is executed in almost all cases because the same
assertion fires otherwise. So I don't see a problem if we do the bunch
of synchronization things also in that rare case. I'm not sure we
want to do [3].
If WALRCV_WAITING or WALRCV_STOPPING can happen at that patch's code site, I
perhaps should back-patch the change to released versions. Does anyone know
whether one or both can happen?

IMO, we must back-patch to the version where
cc2c7d65fc27e877c9f407587b0b92d46cd6dd16 got introduced irrespective
of any of the above happening.
That is, PG15? I agree to that.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
On Tue, Sep 13, 2022 at 3:56 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
is that XLogShutdownWalRcv() does a bunch of work via ShutdownWalRcv()
- it calls ConditionVariablePrepareToSleep(),Anyway the code path is executed in almost all cases because the same
assertion fires otherwise. So I don't see a problem if we do the bunch
of synchronization things also in that rare case. I'm not sure we
want to do [3].
IMO, we don't need to let ShutdownWalRcv() do the extra work of
ConditionVariablePrepareToSleep() - WalRcvRunning() -
ConditionVariableCancelSleep() for WALRCV_STOPPED cases when we know
the walreceiver status beforehand - even though it doesn't have any
problems per se, and at that place in the code the WALRCV_STOPPED cases
are more frequent than any other walreceiver states.
Having said that, let's also hear from other hackers.
--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Tue, Sep 13, 2022 at 11:56:16AM +0530, Bharath Rupireddy wrote:
On Tue, Sep 13, 2022 at 8:52 AM Noah Misch <noah@leadboat.com> wrote:
[1] - /messages/by-id/20220909.172949.2223165886970819060.horikyota.ntt@gmail.com
I plan to use that message's patch, because it guarantees WALRCV_STOPPED at
the code location being changed. Today, in the unlikely event of
!WalRcvStreaming() due to WALRCV_WAITING or WALRCV_STOPPING, that code
proceeds without waiting for WALRCV_STOPPED.
Pushed that way.
Hm. That was the original fix [2] proposed and it works. The concern
is that XLogShutdownWalRcv() does a bunch of work via ShutdownWalRcv()
- it calls ConditionVariablePrepareToSleep(),
ConditionVariableCancelSleep() (which has 2 lock acquisitions and
releases) and 1 function call (WalRcvRunning()) even for the
WALRCV_STOPPED case; all this is unnecessary IMO when we determine that
the walreceiver's state is already WALRCV_STOPPED.
That's fine. If we're reaching this code at high frequency, that implies
we're also forking walreceiver processes at high frequency. This code would
be a trivial part of the overall cost.
If WALRCV_WAITING or WALRCV_STOPPING can happen at that patch's code site, I
perhaps should back-patch the change to released versions. Does anyone know
whether one or both can happen?
If anyone discovers such cases later, we can extend the back-patch then.
IMO, we must back-patch to the version where
cc2c7d65fc27e877c9f407587b0b92d46cd6dd16 got introduced irrespective
of any of the above happening.
Correct. The sentences were about *released* versions, not v15.
On Thu, Sep 15, 2022 at 06:58:43AM -0700, Noah Misch wrote:
Pushed that way.
Thanks for taking care of it, Noah.
--
Michael
At Fri, 16 Sep 2022 09:05:46 +0900, Michael Paquier <michael@paquier.xyz> wrote in
On Thu, Sep 15, 2022 at 06:58:43AM -0700, Noah Misch wrote:
Pushed that way.
Thanks for taking care of it, Noah.
+1. Thanks.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center