pgsql: Implement pg_wal_replay_wait() stored procedure

Started by Alexander Korotkovover 1 year ago45 messages
#1Alexander Korotkov
akorotkov@postgresql.org

Implement pg_wal_replay_wait() stored procedure

pg_wal_replay_wait() is to be used on standby and specifies waiting for
the specific WAL location to be replayed. This option is useful when
the user makes some data changes on primary and needs a guarantee to see
these changes are on standby.

The queue of waiters is stored in the shared memory as an LSN-ordered pairing
heap, where the waiter with the nearest LSN stays on the top. During
the replay of WAL, waiters whose LSNs have already been replayed are deleted
from the shared memory pairing heap and woken up by setting their latches.

pg_wal_replay_wait() needs to wait without any snapshot held. Otherwise,
the snapshot could prevent the replay of WAL records, implying a kind of
self-deadlock. This is why it is only possible to implement
pg_wal_replay_wait() as a procedure working without an active snapshot,
not a function.

Catversion is bumped.

Discussion: /messages/by-id/eb12f9b03851bb2583adab5df9579b4b@postgrespro.ru
Author: Kartyshov Ivan, Alexander Korotkov
Reviewed-by: Michael Paquier, Peter Eisentraut, Dilip Kumar, Amit Kapila
Reviewed-by: Alexander Lakhin, Bharath Rupireddy, Euler Taveira
Reviewed-by: Heikki Linnakangas, Kyotaro Horiguchi

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/3c5db1d6b01642bcd8dbf5e34b68f034365747bb

Modified Files
--------------
doc/src/sgml/func.sgml | 117 ++++++++
src/backend/access/transam/xact.c | 6 +
src/backend/access/transam/xlog.c | 7 +
src/backend/access/transam/xlogrecovery.c | 11 +
src/backend/catalog/system_functions.sql | 3 +
src/backend/commands/Makefile | 3 +-
src/backend/commands/meson.build | 1 +
src/backend/commands/waitlsn.c | 363 ++++++++++++++++++++++++
src/backend/lib/pairingheap.c | 18 +-
src/backend/storage/ipc/ipci.c | 3 +
src/backend/storage/lmgr/proc.c | 6 +
src/backend/tcop/pquery.c | 9 +-
src/backend/utils/activity/wait_event_names.txt | 2 +
src/include/catalog/catversion.h | 2 +-
src/include/catalog/pg_proc.dat | 6 +
src/include/commands/waitlsn.h | 80 ++++++
src/include/lib/pairingheap.h | 3 +
src/include/storage/lwlocklist.h | 1 +
src/test/recovery/meson.build | 1 +
src/test/recovery/t/043_wal_replay_wait.pl | 150 ++++++++++
src/tools/pgindent/typedefs.list | 2 +
21 files changed, 786 insertions(+), 8 deletions(-)

#2Peter Eisentraut
peter@eisentraut.org
In reply to: Alexander Korotkov (#1)
Re: pgsql: Implement pg_wal_replay_wait() stored procedure

On 02.08.24 20:22, Alexander Korotkov wrote:

Implement pg_wal_replay_wait() stored procedure

Why is this under src/backend/command/? Wouldn't it belong under
src/backend/utils/adt/?

Show quoted text

pg_wal_replay_wait() is to be used on standby and specifies waiting for
the specific WAL location to be replayed. This option is useful when
the user makes some data changes on primary and needs a guarantee to see
these changes are on standby.

The queue of waiters is stored in the shared memory as an LSN-ordered pairing
heap, where the waiter with the nearest LSN stays on the top. During
the replay of WAL, waiters whose LSNs have already been replayed are deleted
from the shared memory pairing heap and woken up by setting their latches.

pg_wal_replay_wait() needs to wait without any snapshot held. Otherwise,
the snapshot could prevent the replay of WAL records, implying a kind of
self-deadlock. This is why it is only possible to implement
pg_wal_replay_wait() as a procedure working without an active snapshot,
not a function.

Catversion is bumped.

Discussion: /messages/by-id/eb12f9b03851bb2583adab5df9579b4b@postgrespro.ru
Author: Kartyshov Ivan, Alexander Korotkov
Reviewed-by: Michael Paquier, Peter Eisentraut, Dilip Kumar, Amit Kapila
Reviewed-by: Alexander Lakhin, Bharath Rupireddy, Euler Taveira
Reviewed-by: Heikki Linnakangas, Kyotaro Horiguchi

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/3c5db1d6b01642bcd8dbf5e34b68f034365747bb

Modified Files
--------------
doc/src/sgml/func.sgml | 117 ++++++++
src/backend/access/transam/xact.c | 6 +
src/backend/access/transam/xlog.c | 7 +
src/backend/access/transam/xlogrecovery.c | 11 +
src/backend/catalog/system_functions.sql | 3 +
src/backend/commands/Makefile | 3 +-
src/backend/commands/meson.build | 1 +
src/backend/commands/waitlsn.c | 363 ++++++++++++++++++++++++
src/backend/lib/pairingheap.c | 18 +-
src/backend/storage/ipc/ipci.c | 3 +
src/backend/storage/lmgr/proc.c | 6 +
src/backend/tcop/pquery.c | 9 +-
src/backend/utils/activity/wait_event_names.txt | 2 +
src/include/catalog/catversion.h | 2 +-
src/include/catalog/pg_proc.dat | 6 +
src/include/commands/waitlsn.h | 80 ++++++
src/include/lib/pairingheap.h | 3 +
src/include/storage/lwlocklist.h | 1 +
src/test/recovery/meson.build | 1 +
src/test/recovery/t/043_wal_replay_wait.pl | 150 ++++++++++
src/tools/pgindent/typedefs.list | 2 +
21 files changed, 786 insertions(+), 8 deletions(-)

#3Alexander Korotkov
aekorotkov@gmail.com
In reply to: Peter Eisentraut (#2)
Re: pgsql: Implement pg_wal_replay_wait() stored procedure

On Fri, Aug 30, 2024 at 10:42 PM Peter Eisentraut <peter@eisentraut.org> wrote:

On 02.08.24 20:22, Alexander Korotkov wrote:

Implement pg_wal_replay_wait() stored procedure

Why is this under src/backend/command/? Wouldn't it belong under
src/backend/utils/adt/?

This path hasn't changes since the patch revision when it was a
utility command. I agree that this doesn't look like proper path for
stored procedure. But I don't think src/backend/utils/adt is
appropriate path either, because it's not really about data type.
pg_wal_replay_wait() looks a good neighbor for
pg_wal_replay_pause()/pg_wal_replay_resume() and other WAL-related
functions. So, what about moving it to src/backend/access/transam?

------
Regards,
Alexander Korotkov
Supabase

#4Michael Paquier
michael@paquier.xyz
In reply to: Alexander Korotkov (#3)
Re: pgsql: Implement pg_wal_replay_wait() stored procedure

On Sun, Sep 01, 2024 at 10:35:27PM +0300, Alexander Korotkov wrote:

This path hasn't changes since the patch revision when it was a
utility command. I agree that this doesn't look like proper path for
stored procedure. But I don't think src/backend/utils/adt is
appropriate path either, because it's not really about data type.
pg_wal_replay_wait() looks a good neighbor for
pg_wal_replay_pause()/pg_wal_replay_resume() and other WAL-related
functions. So, what about moving it to src/backend/access/transam?

Moving the new function to xlogfuncs.c while publishing
WaitForLSNReplay() makes sense to me.
--
Michael

#5Alexander Korotkov
aekorotkov@gmail.com
In reply to: Michael Paquier (#4)
1 attachment(s)
Re: pgsql: Implement pg_wal_replay_wait() stored procedure

On Mon, Sep 2, 2024 at 2:28 AM Michael Paquier <michael@paquier.xyz> wrote:

On Sun, Sep 01, 2024 at 10:35:27PM +0300, Alexander Korotkov wrote:

This path hasn't changes since the patch revision when it was a
utility command. I agree that this doesn't look like proper path for
stored procedure. But I don't think src/backend/utils/adt is
appropriate path either, because it's not really about data type.
pg_wal_replay_wait() looks a good neighbor for
pg_wal_replay_pause()/pg_wal_replay_resume() and other WAL-related
functions. So, what about moving it to src/backend/access/transam?

Moving the new function to xlogfuncs.c while publishing
WaitForLSNReplay() makes sense to me.

Thank you for proposal. I like this.

Could you, please, check the attached patch?

------
Regards,
Alexander Korotkov
Supabase

Attachments:

v1-0001-Move-pg_wal_replay_wait-to-xlogfuncs.c.patchapplication/octet-stream; name=v1-0001-Move-pg_wal_replay_wait-to-xlogfuncs.c.patchDownload
From fb138e8c156bf51232bf95452205b2e40df0f77c Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Mon, 2 Sep 2024 02:49:19 +0300
Subject: [PATCH v1] Move pg_wal_replay_wait() to xlogfuncs.c

This commit moves pg_wal_replay_wait() procedure to be a neighbor of
WAL-related functions in xlogfuncs.c.  The implementation of LSN waiting
continues to reside in the same place.

By proposal from Michael Paquier.

Reported-by: Peter Eisentraut
Discussion: https://postgr.es/m/18c0fa64-0475-415e-a1bd-665d922c5201%40eisentraut.org
---
 src/backend/access/transam/xlogfuncs.c | 54 ++++++++++++++++++++++++++
 src/backend/commands/waitlsn.c         | 52 +------------------------
 src/include/commands/waitlsn.h         |  1 +
 3 files changed, 56 insertions(+), 51 deletions(-)

diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 4e46baaebdf..a5a2f445a26 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -23,6 +23,7 @@
 #include "access/xlogbackup.h"
 #include "access/xlogrecovery.h"
 #include "catalog/pg_type.h"
+#include "commands/waitlsn.h"
 #include "funcapi.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -747,3 +748,56 @@ pg_promote(PG_FUNCTION_ARGS)
 						   wait_seconds)));
 	PG_RETURN_BOOL(false);
 }
+
+/*
+ * Waits until recovery replays the target LSN with optional timeout.
+ */
+Datum
+pg_wal_replay_wait(PG_FUNCTION_ARGS)
+{
+	XLogRecPtr	target_lsn = PG_GETARG_LSN(0);
+	int64		timeout = PG_GETARG_INT64(1);
+
+	if (timeout < 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+				 errmsg("\"timeout\" must not be negative")));
+
+	/*
+	 * We are going to wait for the LSN replay.  We should first care that we
+	 * don't hold a snapshot and correspondingly our MyProc->xmin is invalid.
+	 * Otherwise, our snapshot could prevent the replay of WAL records
+	 * implying a kind of self-deadlock.  This is the reason why
+	 * pg_wal_replay_wait() is a procedure, not a function.
+	 *
+	 * At first, we should check there is no active snapshot.  According to
+	 * PlannedStmtRequiresSnapshot(), even in an atomic context, CallStmt is
+	 * processed with a snapshot.  Thankfully, we can pop this snapshot,
+	 * because PortalRunUtility() can tolerate this.
+	 */
+	if (ActiveSnapshotSet())
+		PopActiveSnapshot();
+
+	/*
+	 * At second, invalidate a catalog snapshot if any.  And we should be done
+	 * with the preparation.
+	 */
+	InvalidateCatalogSnapshot();
+
+	/* Give up if there is still an active or registered sanpshot. */
+	if (GetOldestSnapshot())
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("pg_wal_replay_wait() must be only called without an active or registered snapshot"),
+				 errdetail("Make sure pg_wal_replay_wait() isn't called within a transaction with an isolation level higher than READ COMMITTED, another procedure, or a function.")));
+
+	/*
+	 * As the result we should hold no snapshot, and correspondingly our xmin
+	 * should be unset.
+	 */
+	Assert(MyProc->xmin == InvalidTransactionId);
+
+	(void) WaitForLSNReplay(target_lsn, timeout);
+
+	PG_RETURN_VOID();
+}
diff --git a/src/backend/commands/waitlsn.c b/src/backend/commands/waitlsn.c
index d9cf9e7d75e..5d273e66eb1 100644
--- a/src/backend/commands/waitlsn.c
+++ b/src/backend/commands/waitlsn.c
@@ -217,7 +217,7 @@ WaitLSNCleanup(void)
  * Wait using MyLatch till the given LSN is replayed, the postmaster dies or
  * timeout happens.
  */
-static void
+void
 WaitForLSNReplay(XLogRecPtr targetLSN, int64 timeout)
 {
 	XLogRecPtr	currentLSN;
@@ -336,53 +336,3 @@ WaitForLSNReplay(XLogRecPtr targetLSN, int64 timeout)
 						LSN_FORMAT_ARGS(currentLSN))));
 	}
 }
-
-Datum
-pg_wal_replay_wait(PG_FUNCTION_ARGS)
-{
-	XLogRecPtr	target_lsn = PG_GETARG_LSN(0);
-	int64		timeout = PG_GETARG_INT64(1);
-
-	if (timeout < 0)
-		ereport(ERROR,
-				(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
-				 errmsg("\"timeout\" must not be negative")));
-
-	/*
-	 * We are going to wait for the LSN replay.  We should first care that we
-	 * don't hold a snapshot and correspondingly our MyProc->xmin is invalid.
-	 * Otherwise, our snapshot could prevent the replay of WAL records
-	 * implying a kind of self-deadlock.  This is the reason why
-	 * pg_wal_replay_wait() is a procedure, not a function.
-	 *
-	 * At first, we should check there is no active snapshot.  According to
-	 * PlannedStmtRequiresSnapshot(), even in an atomic context, CallStmt is
-	 * processed with a snapshot.  Thankfully, we can pop this snapshot,
-	 * because PortalRunUtility() can tolerate this.
-	 */
-	if (ActiveSnapshotSet())
-		PopActiveSnapshot();
-
-	/*
-	 * At second, invalidate a catalog snapshot if any.  And we should be done
-	 * with the preparation.
-	 */
-	InvalidateCatalogSnapshot();
-
-	/* Give up if there is still an active or registered sanpshot. */
-	if (GetOldestSnapshot())
-		ereport(ERROR,
-				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
-				 errmsg("pg_wal_replay_wait() must be only called without an active or registered snapshot"),
-				 errdetail("Make sure pg_wal_replay_wait() isn't called within a transaction with an isolation level higher than READ COMMITTED, another procedure, or a function.")));
-
-	/*
-	 * As the result we should hold no snapshot, and correspondingly our xmin
-	 * should be unset.
-	 */
-	Assert(MyProc->xmin == InvalidTransactionId);
-
-	(void) WaitForLSNReplay(target_lsn, timeout);
-
-	PG_RETURN_VOID();
-}
diff --git a/src/include/commands/waitlsn.h b/src/include/commands/waitlsn.h
index f719feadb05..bb5ac858dcc 100644
--- a/src/include/commands/waitlsn.h
+++ b/src/include/commands/waitlsn.h
@@ -76,5 +76,6 @@ extern Size WaitLSNShmemSize(void);
 extern void WaitLSNShmemInit(void);
 extern void WaitLSNSetLatches(XLogRecPtr currentLSN);
 extern void WaitLSNCleanup(void);
+extern void WaitForLSNReplay(XLogRecPtr targetLSN, int64 timeout);
 
 #endif							/* WAIT_LSN_H */
-- 
2.39.3 (Apple Git-146)

#6Michael Paquier
michael@paquier.xyz
In reply to: Alexander Korotkov (#5)
Re: pgsql: Implement pg_wal_replay_wait() stored procedure

On Mon, Sep 02, 2024 at 02:55:50AM +0300, Alexander Korotkov wrote:

Could you, please, check the attached patch?

The patch moving the code looks correct at quick glance.

Now, I've been staring at this line, wondering why this is required
while WaitForLSNReplay() does not return any status:
+ (void) WaitForLSNReplay(target_lsn, timeout);

And this makes me question whether you have the right semantics
regarding the SQL function itself. Could it be more useful for
WaitForLSNReplay() to report an enum state that tells you the reason
why a wait may not have happened with a text or a tuple returned by
the function? There are quite a few reasons why a wait may or may not
happen:
- Recovery has ended, target LSN has been reached.
- We're not in recovery anymore. Is an error the most useful thing
here actually?
- Timeout may have been reached, where an error is also generated.
ERRCODE_QUERY_CANCELED is not a correct error state.
- Perhaps more of these in the future?

My point is: this is a feature that can be used for monitoring as far
as I know, still it does not stick as a feature that could be most
useful for such applications. Wouldn't it be more useful for client
applications to deal with a state returned by the SQL function rather
than having to parse error strings to know what happened? What is
returned by pg_wal_replay_wal() reflects on the design of
WaitForLSNReplay() itself.
--
Michael

#7Alexander Korotkov
aekorotkov@gmail.com
In reply to: Michael Paquier (#6)
Re: pgsql: Implement pg_wal_replay_wait() stored procedure

Hi, Michael!

On Mon, Sep 2, 2024 at 3:25 AM Michael Paquier <michael@paquier.xyz> wrote:

On Mon, Sep 02, 2024 at 02:55:50AM +0300, Alexander Korotkov wrote:

Could you, please, check the attached patch?

The patch moving the code looks correct at quick glance.

Thank you for your feedback.

Now, I've been staring at this line, wondering why this is required
while WaitForLSNReplay() does not return any status:
+ (void) WaitForLSNReplay(target_lsn, timeout);

And this makes me question whether you have the right semantics
regarding the SQL function itself. Could it be more useful for
WaitForLSNReplay() to report an enum state that tells you the reason
why a wait may not have happened with a text or a tuple returned by
the function? There are quite a few reasons why a wait may or may not
happen:
- Recovery has ended, target LSN has been reached.
- We're not in recovery anymore. Is an error the most useful thing
here actually?
- Timeout may have been reached, where an error is also generated.
ERRCODE_QUERY_CANCELED is not a correct error state.
- Perhaps more of these in the future?

My point is: this is a feature that can be used for monitoring as far
as I know, still it does not stick as a feature that could be most
useful for such applications. Wouldn't it be more useful for client
applications to deal with a state returned by the SQL function rather
than having to parse error strings to know what happened? What is
returned by pg_wal_replay_wal() reflects on the design of
WaitForLSNReplay() itself.

By design, pg_wal_replay_wal() should be followed up with read-only
query to standby. The read-only query gets guarantee that it's
executed after the replay of LSN specified as pg_wal_replay_wal()
design. Returning the status from pg_wal_replay_wal() would require
additional cycle of its processing. But one of the main objectives of
this feature was reducing roundtrips during waiting for LSN.

On the other hand, I see that returning status could make sense for
certain use cases. I think I could write two patches to provide that.
1. Make WaitForLSNReplay() return status, and make pg_wal_replay_wal()
be responsible for throwing all the errors.
2. New procedure pg_wal_replay_wal_status() (or some better name?),
which returns status to the user instead of throwing errors.

If no objections, I will push the patch moving code then go ahead
writing the two patches above.

------
Regards,
Alexander Korotkov
Supabase

#8Alexander Korotkov
aekorotkov@gmail.com
In reply to: Alexander Korotkov (#7)
1 attachment(s)
Re: pgsql: Implement pg_wal_replay_wait() stored procedure

On Tue, Sep 3, 2024 at 4:07 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:

If no objections, I will push the patch moving code then go ahead
writing the two patches above.

The patch for code movement missed couple of includes. Revised
version is attached.

------
Regards,
Alexander Korotkov
Supabase

Attachments:

v2-0001-Move-pg_wal_replay_wait-to-xlogfuncs.c.patchapplication/octet-stream; name=v2-0001-Move-pg_wal_replay_wait-to-xlogfuncs.c.patchDownload
From 53dc5245b419f86db6cf4d33a6d2cfa0e919408a Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Tue, 3 Sep 2024 16:43:52 +0300
Subject: [PATCH v2] Move pg_wal_replay_wait() to xlogfuncs.c

This commit moves pg_wal_replay_wait() procedure to be a neighbor of
WAL-related functions in xlogfuncs.c.  The implementation of LSN waiting
continues to reside in the same place.

By proposal from Michael Paquier.

Reported-by: Peter Eisentraut
Discussion: https://postgr.es/m/18c0fa64-0475-415e-a1bd-665d922c5201%40eisentraut.org
---
 src/backend/access/transam/xlogfuncs.c | 56 ++++++++++++++++++++++++++
 src/backend/commands/waitlsn.c         | 52 +-----------------------
 src/include/commands/waitlsn.h         |  1 +
 3 files changed, 58 insertions(+), 51 deletions(-)

diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 4e46baaebdf..f4f564519a4 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -23,15 +23,18 @@
 #include "access/xlogbackup.h"
 #include "access/xlogrecovery.h"
 #include "catalog/pg_type.h"
+#include "commands/waitlsn.h"
 #include "funcapi.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "replication/walreceiver.h"
 #include "storage/fd.h"
+#include "storage/proc.h"
 #include "storage/standby.h"
 #include "utils/builtins.h"
 #include "utils/memutils.h"
 #include "utils/pg_lsn.h"
+#include "utils/snapmgr.h"
 #include "utils/timestamp.h"
 
 /*
@@ -747,3 +750,56 @@ pg_promote(PG_FUNCTION_ARGS)
 						   wait_seconds)));
 	PG_RETURN_BOOL(false);
 }
+
+/*
+ * Waits until recovery replays the target LSN with optional timeout.
+ */
+Datum
+pg_wal_replay_wait(PG_FUNCTION_ARGS)
+{
+	XLogRecPtr	target_lsn = PG_GETARG_LSN(0);
+	int64		timeout = PG_GETARG_INT64(1);
+
+	if (timeout < 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+				 errmsg("\"timeout\" must not be negative")));
+
+	/*
+	 * We are going to wait for the LSN replay.  We should first care that we
+	 * don't hold a snapshot and correspondingly our MyProc->xmin is invalid.
+	 * Otherwise, our snapshot could prevent the replay of WAL records
+	 * implying a kind of self-deadlock.  This is the reason why
+	 * pg_wal_replay_wait() is a procedure, not a function.
+	 *
+	 * At first, we should check there is no active snapshot.  According to
+	 * PlannedStmtRequiresSnapshot(), even in an atomic context, CallStmt is
+	 * processed with a snapshot.  Thankfully, we can pop this snapshot,
+	 * because PortalRunUtility() can tolerate this.
+	 */
+	if (ActiveSnapshotSet())
+		PopActiveSnapshot();
+
+	/*
+	 * At second, invalidate a catalog snapshot if any.  And we should be done
+	 * with the preparation.
+	 */
+	InvalidateCatalogSnapshot();
+
+	/* Give up if there is still an active or registered sanpshot. */
+	if (GetOldestSnapshot())
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("pg_wal_replay_wait() must be only called without an active or registered snapshot"),
+				 errdetail("Make sure pg_wal_replay_wait() isn't called within a transaction with an isolation level higher than READ COMMITTED, another procedure, or a function.")));
+
+	/*
+	 * As the result we should hold no snapshot, and correspondingly our xmin
+	 * should be unset.
+	 */
+	Assert(MyProc->xmin == InvalidTransactionId);
+
+	(void) WaitForLSNReplay(target_lsn, timeout);
+
+	PG_RETURN_VOID();
+}
diff --git a/src/backend/commands/waitlsn.c b/src/backend/commands/waitlsn.c
index d9cf9e7d75e..5d273e66eb1 100644
--- a/src/backend/commands/waitlsn.c
+++ b/src/backend/commands/waitlsn.c
@@ -217,7 +217,7 @@ WaitLSNCleanup(void)
  * Wait using MyLatch till the given LSN is replayed, the postmaster dies or
  * timeout happens.
  */
-static void
+void
 WaitForLSNReplay(XLogRecPtr targetLSN, int64 timeout)
 {
 	XLogRecPtr	currentLSN;
@@ -336,53 +336,3 @@ WaitForLSNReplay(XLogRecPtr targetLSN, int64 timeout)
 						LSN_FORMAT_ARGS(currentLSN))));
 	}
 }
-
-Datum
-pg_wal_replay_wait(PG_FUNCTION_ARGS)
-{
-	XLogRecPtr	target_lsn = PG_GETARG_LSN(0);
-	int64		timeout = PG_GETARG_INT64(1);
-
-	if (timeout < 0)
-		ereport(ERROR,
-				(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
-				 errmsg("\"timeout\" must not be negative")));
-
-	/*
-	 * We are going to wait for the LSN replay.  We should first care that we
-	 * don't hold a snapshot and correspondingly our MyProc->xmin is invalid.
-	 * Otherwise, our snapshot could prevent the replay of WAL records
-	 * implying a kind of self-deadlock.  This is the reason why
-	 * pg_wal_replay_wait() is a procedure, not a function.
-	 *
-	 * At first, we should check there is no active snapshot.  According to
-	 * PlannedStmtRequiresSnapshot(), even in an atomic context, CallStmt is
-	 * processed with a snapshot.  Thankfully, we can pop this snapshot,
-	 * because PortalRunUtility() can tolerate this.
-	 */
-	if (ActiveSnapshotSet())
-		PopActiveSnapshot();
-
-	/*
-	 * At second, invalidate a catalog snapshot if any.  And we should be done
-	 * with the preparation.
-	 */
-	InvalidateCatalogSnapshot();
-
-	/* Give up if there is still an active or registered sanpshot. */
-	if (GetOldestSnapshot())
-		ereport(ERROR,
-				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
-				 errmsg("pg_wal_replay_wait() must be only called without an active or registered snapshot"),
-				 errdetail("Make sure pg_wal_replay_wait() isn't called within a transaction with an isolation level higher than READ COMMITTED, another procedure, or a function.")));
-
-	/*
-	 * As the result we should hold no snapshot, and correspondingly our xmin
-	 * should be unset.
-	 */
-	Assert(MyProc->xmin == InvalidTransactionId);
-
-	(void) WaitForLSNReplay(target_lsn, timeout);
-
-	PG_RETURN_VOID();
-}
diff --git a/src/include/commands/waitlsn.h b/src/include/commands/waitlsn.h
index f719feadb05..bb5ac858dcc 100644
--- a/src/include/commands/waitlsn.h
+++ b/src/include/commands/waitlsn.h
@@ -76,5 +76,6 @@ extern Size WaitLSNShmemSize(void);
 extern void WaitLSNShmemInit(void);
 extern void WaitLSNSetLatches(XLogRecPtr currentLSN);
 extern void WaitLSNCleanup(void);
+extern void WaitForLSNReplay(XLogRecPtr targetLSN, int64 timeout);
 
 #endif							/* WAIT_LSN_H */
-- 
2.39.3 (Apple Git-146)

#9Alexander Korotkov
aekorotkov@gmail.com
In reply to: Alexander Korotkov (#7)
2 attachment(s)
Re: pgsql: Implement pg_wal_replay_wait() stored procedure

On Tue, Sep 3, 2024 at 4:07 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:

On the other hand, I see that returning status could make sense for
certain use cases. I think I could write two patches to provide that.
1. Make WaitForLSNReplay() return status, and make pg_wal_replay_wal()
be responsible for throwing all the errors.
2. New procedure pg_wal_replay_wal_status() (or some better name?),
which returns status to the user instead of throwing errors.

If no objections, I will push the patch moving code then go ahead
writing the two patches above.

I attempted to implement the patchset as promised. The 0001 is easy
and straighforward. The 0002 became tricky. Procedures can't return
values. They can have OUTPUT parameters instead. However, even for
output parameters you need to pass something in, and that couldn't be
a default value. Additional difficulty is that having OUTPUT
parameters seem to hold additional snapshot and prevents our
snapshot-releasing trick from working....

smagen@postgres=# CALL pg_wal_replay_wait('0/0'::pg_lsn);
CALL
Time: 2.061 ms
smagen@postgres=# CALL pg_wal_replay_wait_status(NULL, '0/0'::pg_lsn);
ERROR: pg_wal_replay_wait_status() must be only called without an
active or registered snapshot
DETAIL: Make sure pg_wal_replay_wait_status() isn't called within a
transaction with an isolation level higher than READ COMMITTED,
another procedure, or a function.
Time: 1.175 ms

I'm thinking about following solution:
1. pg_wal_replay_wait_no_error() procedure, which doesn't return
anything but throws no errors.
2. Make pg_wal_replay_wait()/pg_wal_replay_wait_no_error() save
waiting result status to the local variable.
3. New function pg_wal_replay_wal_get_status() displaying the result
status of the last pg_wal_replay_wait()/pg_wal_replay_wait_no_error()
CALL.

Then one could do.
CALL pg_wal_replay_wait_no_error(...);
SELECT pg_wal_replay_wal_get_status();

Probably looks a bit excessive, but probably the best we can do.

------
Regards,
Alexander Korotkov
Supabase

Attachments:

v1-0002-Attempt-to-implement-pg_wal_replay_wait_status.patchapplication/octet-stream; name=v1-0002-Attempt-to-implement-pg_wal_replay_wait_status.patchDownload
From 8fdc2932080a7ad7151d72639a7983f980876ee1 Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Thu, 19 Sep 2024 15:34:18 +0300
Subject: [PATCH v1 2/2] Attempt to implement pg_wal_replay_wait_status()

---
 src/backend/access/transam/xlogfuncs.c   | 90 ++++++++++++++++++++----
 src/backend/catalog/system_functions.sql |  5 ++
 src/include/catalog/pg_proc.dat          |  6 ++
 3 files changed, 87 insertions(+), 14 deletions(-)

diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 74b493e437e..ecc47c045de 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -752,20 +752,11 @@ pg_promote(PG_FUNCTION_ARGS)
 }
 
 /*
- * Waits until recovery replays the target LSN with optional timeout.
+ * Prepare for waiting for LSN replay.
  */
-Datum
-pg_wal_replay_wait(PG_FUNCTION_ARGS)
+static void
+pg_wal_replay_wait_prepare(const char *funcname)
 {
-	XLogRecPtr	target_lsn = PG_GETARG_LSN(0);
-	int64		timeout = PG_GETARG_INT64(1);
-	WaitLSNResult result;
-
-	if (timeout < 0)
-		ereport(ERROR,
-				(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
-				 errmsg("\"timeout\" must not be negative")));
-
 	/*
 	 * We are going to wait for the LSN replay.  We should first care that we
 	 * don't hold a snapshot and correspondingly our MyProc->xmin is invalid.
@@ -791,14 +782,33 @@ pg_wal_replay_wait(PG_FUNCTION_ARGS)
 	if (GetOldestSnapshot())
 		ereport(ERROR,
 				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
-				 errmsg("pg_wal_replay_wait() must be only called without an active or registered snapshot"),
-				 errdetail("Make sure pg_wal_replay_wait() isn't called within a transaction with an isolation level higher than READ COMMITTED, another procedure, or a function.")));
+				 errmsg("%s must be only called without an active or registered snapshot", funcname),
+				 errdetail("Make sure %s isn't called within a transaction with an isolation level higher than READ COMMITTED, another procedure, or a function.", funcname)));
 
 	/*
 	 * As the result we should hold no snapshot, and correspondingly our xmin
 	 * should be unset.
 	 */
 	Assert(MyProc->xmin == InvalidTransactionId);
+}
+
+/*
+ * Waits until recovery replays the target LSN with optional timeout.  Throw
+ * an error on failure.
+ */
+Datum
+pg_wal_replay_wait(PG_FUNCTION_ARGS)
+{
+	XLogRecPtr	target_lsn = PG_GETARG_LSN(0);
+	int64		timeout = PG_GETARG_INT64(1);
+	WaitLSNResult result;
+
+	if (timeout < 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+				 errmsg("\"timeout\" must not be negative")));
+
+	pg_wal_replay_wait_prepare("pg_wal_replay_wait()");
 
 	result = WaitForLSNReplay(target_lsn, timeout);
 
@@ -839,3 +849,55 @@ pg_wal_replay_wait(PG_FUNCTION_ARGS)
 
 	PG_RETURN_VOID();
 }
+
+/*
+ * Waits until recovery replays the target LSN with optional timeout.  Return
+ * the waiting result as a text.
+ */
+Datum
+pg_wal_replay_wait_status(PG_FUNCTION_ARGS)
+{
+	XLogRecPtr	target_lsn = PG_GETARG_LSN(0);
+	int64		timeout = PG_GETARG_INT64(1);
+	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	WaitLSNResult result;
+	const char *result_string;
+	Datum		values[1];
+	bool		nulls[1];
+
+	if (timeout < 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+				 errmsg("\"timeout\" must not be negative")));
+
+	pg_wal_replay_wait_prepare("pg_wal_replay_wait_status()");
+
+	result = WaitForLSNReplay(target_lsn, timeout);
+
+	/*
+	 * Process the result of WaitForLSNReplay().  Throw appropriate error if
+	 * needed.
+	 */
+	switch (result)
+	{
+		case WaitLSNResultSuccess:
+			result_string = "success";
+			break;
+
+		case WaitLSNResultTimeout:
+			result_string = "timeout";
+			break;
+
+		case WaitLSNResultNotInRecovery:
+			result_string = "not in recovery";
+			break;
+
+		case WaitLSNResultPromotedConcurrently:
+			result_string = "promoted concurrently";
+			break;
+	}
+
+	tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
+						 values, nulls);
+	return (Datum) 0;
+}
diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index b0d0de051e7..ed092b748ef 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -417,6 +417,11 @@ CREATE OR REPLACE FUNCTION
 CREATE OR REPLACE PROCEDURE pg_wal_replay_wait(target_lsn pg_lsn, timeout int8 DEFAULT 0)
   LANGUAGE internal AS 'pg_wal_replay_wait';
 
+CREATE OR REPLACE PROCEDURE pg_wal_replay_wait_status(OUT status text,
+													  target_lsn pg_lsn,
+													  timeout int8 DEFAULT 0)
+  LANGUAGE internal AS 'pg_wal_replay_wait_status';
+
 CREATE OR REPLACE FUNCTION pg_logical_slot_get_changes(
     IN slot_name name, IN upto_lsn pg_lsn, IN upto_nchanges int, VARIADIC options text[] DEFAULT '{}',
     OUT lsn pg_lsn, OUT xid xid, OUT data text)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 43f608d7a0a..22e58d26db6 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -6647,6 +6647,12 @@
   proname => 'pg_wal_replay_wait', prokind => 'p', prorettype => 'void',
   proargtypes => 'pg_lsn int8', proargnames => '{target_lsn,timeout}',
   prosrc => 'pg_wal_replay_wait' },
+{ oid => '226',
+  descr => 'wait for the target LSN to be replayed on standby with an optional timeout and returning the waiting result',
+  proname => 'pg_wal_replay_wait_status', prokind => 'p', prorettype => 'void',
+  proargtypes => 'pg_lsn int8 text', proargmodes => '{i,i,o}',
+  proargnames => '{target_lsn,timeout,status}',
+  prosrc => 'pg_wal_replay_wait_status' },
 
 { oid => '6224', descr => 'get resource managers loaded in system',
   proname => 'pg_get_wal_resource_managers', prorows => '50', proretset => 't',
-- 
2.39.5 (Apple Git-154)

v1-0001-Refactor-WaitForLSNReplay-to-return-the-result-of.patchapplication/octet-stream; name=v1-0001-Refactor-WaitForLSNReplay-to-return-the-result-of.patchDownload
From e6f05633e258e214db47c82ea853c1dcf0307153 Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Thu, 19 Sep 2024 14:48:44 +0300
Subject: [PATCH v1 1/2] Refactor WaitForLSNReplay() to return the result of
 waiting

Currently WaitForLSNReplay() immediatly throws an error if waiting for LSN
replay is not successful.  This commit teaches  WaitForLSNReplay() to return
the result of waiting, while making pg_wal_replay_wait() responsible for
throwing an appropriate error.

This is preparation for new procedure pg_wal_replay_wait_status(), which will
expose the waiting result to the user instead of throwing an error.
---
 src/backend/access/transam/xlogfuncs.c | 38 +++++++++++++++++++++++++-
 src/backend/commands/waitlsn.c         | 30 ++++++--------------
 src/include/commands/waitlsn.h         | 13 ++++++++-
 3 files changed, 58 insertions(+), 23 deletions(-)

diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 3e3d2bb6189..74b493e437e 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -759,6 +759,7 @@ pg_wal_replay_wait(PG_FUNCTION_ARGS)
 {
 	XLogRecPtr	target_lsn = PG_GETARG_LSN(0);
 	int64		timeout = PG_GETARG_INT64(1);
+	WaitLSNResult result;
 
 	if (timeout < 0)
 		ereport(ERROR,
@@ -799,7 +800,42 @@ pg_wal_replay_wait(PG_FUNCTION_ARGS)
 	 */
 	Assert(MyProc->xmin == InvalidTransactionId);
 
-	(void) WaitForLSNReplay(target_lsn, timeout);
+	result = WaitForLSNReplay(target_lsn, timeout);
+
+	/*
+	 * Process the result of WaitForLSNReplay().  Throw appropriate error if
+	 * needed.
+	 */
+	switch (result)
+	{
+		case WaitLSNResultSuccess:
+			/* Nothing to do on success */
+			break;
+
+		case WaitLSNResultTimeout:
+			ereport(ERROR,
+					(errcode(ERRCODE_QUERY_CANCELED),
+					 errmsg("timed out while waiting for target LSN %X/%X to be replayed; current replay LSN %X/%X",
+							LSN_FORMAT_ARGS(target_lsn),
+							LSN_FORMAT_ARGS(GetXLogReplayRecPtr(NULL)))));
+			break;
+
+		case WaitLSNResultNotInRecovery:
+			ereport(ERROR,
+					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+					 errmsg("recovery is not in progress"),
+					 errhint("Waiting for LSN can only be executed during recovery.")));
+			break;
+
+		case WaitLSNResultPromotedConcurrently:
+			ereport(ERROR,
+					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+					 errmsg("recovery is not in progress"),
+					 errdetail("Recovery ended before replaying target LSN %X/%X; last replay LSN %X/%X.",
+							   LSN_FORMAT_ARGS(target_lsn),
+							   LSN_FORMAT_ARGS(GetXLogReplayRecPtr(NULL)))));
+			break;
+	}
 
 	PG_RETURN_VOID();
 }
diff --git a/src/backend/commands/waitlsn.c b/src/backend/commands/waitlsn.c
index 501938f4330..aab6aa18648 100644
--- a/src/backend/commands/waitlsn.c
+++ b/src/backend/commands/waitlsn.c
@@ -217,7 +217,7 @@ WaitLSNCleanup(void)
  * Wait using MyLatch till the given LSN is replayed, the postmaster dies or
  * timeout happens.
  */
-void
+WaitLSNResult
 WaitForLSNReplay(XLogRecPtr targetLSN, int64 timeout)
 {
 	XLogRecPtr	currentLSN;
@@ -240,17 +240,14 @@ WaitForLSNReplay(XLogRecPtr targetLSN, int64 timeout)
 		 * check the last replay LSN before reporting an error.
 		 */
 		if (targetLSN <= GetXLogReplayRecPtr(NULL))
-			return;
-		ereport(ERROR,
-				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
-				 errmsg("recovery is not in progress"),
-				 errhint("Waiting for LSN can only be executed during recovery.")));
+			return WaitLSNResultSuccess;
+		return WaitLSNResultNotInRecovery;
 	}
 	else
 	{
 		/* If target LSN is already replayed, exit immediately */
 		if (targetLSN <= GetXLogReplayRecPtr(NULL))
-			return;
+			return WaitLSNResultSuccess;
 	}
 
 	if (timeout > 0)
@@ -280,13 +277,8 @@ WaitForLSNReplay(XLogRecPtr targetLSN, int64 timeout)
 			 */
 			currentLSN = GetXLogReplayRecPtr(NULL);
 			if (targetLSN <= currentLSN)
-				return;
-			ereport(ERROR,
-					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
-					 errmsg("recovery is not in progress"),
-					 errdetail("Recovery ended before replaying target LSN %X/%X; last replay LSN %X/%X.",
-							   LSN_FORMAT_ARGS(targetLSN),
-							   LSN_FORMAT_ARGS(currentLSN))));
+				return WaitLSNResultSuccess;
+			return WaitLSNResultPromotedConcurrently;
 		}
 		else
 		{
@@ -328,11 +320,7 @@ WaitForLSNReplay(XLogRecPtr targetLSN, int64 timeout)
 	 * If we didn't reach the target LSN, we must be exited by timeout.
 	 */
 	if (targetLSN > currentLSN)
-	{
-		ereport(ERROR,
-				(errcode(ERRCODE_QUERY_CANCELED),
-				 errmsg("timed out while waiting for target LSN %X/%X to be replayed; current replay LSN %X/%X",
-						LSN_FORMAT_ARGS(targetLSN),
-						LSN_FORMAT_ARGS(currentLSN))));
-	}
+		return WaitLSNResultTimeout;
+
+	return WaitLSNResultSuccess;
 }
diff --git a/src/include/commands/waitlsn.h b/src/include/commands/waitlsn.h
index bb5ac858dcc..36a580674c0 100644
--- a/src/include/commands/waitlsn.h
+++ b/src/include/commands/waitlsn.h
@@ -70,12 +70,23 @@ typedef struct WaitLSNState
 	WaitLSNProcInfo procInfos[FLEXIBLE_ARRAY_MEMBER];
 } WaitLSNState;
 
+/*
+ * Results statuses for WaitForLSNReplay().
+ */
+typedef enum
+{
+	WaitLSNResultSuccess, /* Target LSN is reached */
+	WaitLSNResultTimeout, /* Timeout occured */
+	WaitLSNResultNotInRecovery, /* Recovery ended before we start waiting */
+	WaitLSNResultPromotedConcurrently, /* Standby got promoted during waiting and before target LSN is reached */
+} WaitLSNResult;
+
 extern PGDLLIMPORT WaitLSNState *waitLSNState;
 
 extern Size WaitLSNShmemSize(void);
 extern void WaitLSNShmemInit(void);
 extern void WaitLSNSetLatches(XLogRecPtr currentLSN);
 extern void WaitLSNCleanup(void);
-extern void WaitForLSNReplay(XLogRecPtr targetLSN, int64 timeout);
+extern WaitLSNResult WaitForLSNReplay(XLogRecPtr targetLSN, int64 timeout);
 
 #endif							/* WAIT_LSN_H */
-- 
2.39.5 (Apple Git-154)

#10Alexander Korotkov
aekorotkov@gmail.com
In reply to: Alexander Korotkov (#9)
2 attachment(s)
Re: pgsql: Implement pg_wal_replay_wait() stored procedure

On Thu, Sep 19, 2024 at 3:47 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:

On Tue, Sep 3, 2024 at 4:07 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:

On the other hand, I see that returning status could make sense for
certain use cases. I think I could write two patches to provide that.
1. Make WaitForLSNReplay() return status, and make pg_wal_replay_wal()
be responsible for throwing all the errors.
2. New procedure pg_wal_replay_wal_status() (or some better name?),
which returns status to the user instead of throwing errors.

If no objections, I will push the patch moving code then go ahead
writing the two patches above.

I attempted to implement the patchset as promised. The 0001 is easy
and straighforward. The 0002 became tricky. Procedures can't return
values. They can have OUTPUT parameters instead. However, even for
output parameters you need to pass something in, and that couldn't be
a default value. Additional difficulty is that having OUTPUT
parameters seem to hold additional snapshot and prevents our
snapshot-releasing trick from working....

smagen@postgres=# CALL pg_wal_replay_wait('0/0'::pg_lsn);
CALL
Time: 2.061 ms
smagen@postgres=# CALL pg_wal_replay_wait_status(NULL, '0/0'::pg_lsn);
ERROR: pg_wal_replay_wait_status() must be only called without an
active or registered snapshot
DETAIL: Make sure pg_wal_replay_wait_status() isn't called within a
transaction with an isolation level higher than READ COMMITTED,
another procedure, or a function.
Time: 1.175 ms

I'm thinking about following solution:
1. pg_wal_replay_wait_no_error() procedure, which doesn't return
anything but throws no errors.
2. Make pg_wal_replay_wait()/pg_wal_replay_wait_no_error() save
waiting result status to the local variable.
3. New function pg_wal_replay_wal_get_status() displaying the result
status of the last pg_wal_replay_wait()/pg_wal_replay_wait_no_error()
CALL.

Then one could do.
CALL pg_wal_replay_wait_no_error(...);
SELECT pg_wal_replay_wal_get_status();

Probably looks a bit excessive, but probably the best we can do.

Please, check the attached patchset for implementation of proposed approach.

------
Regards,
Alexander Korotkov
Supabase

Attachments:

v2-0001-Refactor-WaitForLSNReplay-to-return-the-result-of.patchapplication/octet-stream; name=v2-0001-Refactor-WaitForLSNReplay-to-return-the-result-of.patchDownload
From e6f05633e258e214db47c82ea853c1dcf0307153 Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Thu, 19 Sep 2024 14:48:44 +0300
Subject: [PATCH v2 1/2] Refactor WaitForLSNReplay() to return the result of
 waiting

Currently WaitForLSNReplay() immediatly throws an error if waiting for LSN
replay is not successful.  This commit teaches  WaitForLSNReplay() to return
the result of waiting, while making pg_wal_replay_wait() responsible for
throwing an appropriate error.

This is preparation for new procedure pg_wal_replay_wait_status(), which will
expose the waiting result to the user instead of throwing an error.
---
 src/backend/access/transam/xlogfuncs.c | 38 +++++++++++++++++++++++++-
 src/backend/commands/waitlsn.c         | 30 ++++++--------------
 src/include/commands/waitlsn.h         | 13 ++++++++-
 3 files changed, 58 insertions(+), 23 deletions(-)

diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 3e3d2bb6189..74b493e437e 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -759,6 +759,7 @@ pg_wal_replay_wait(PG_FUNCTION_ARGS)
 {
 	XLogRecPtr	target_lsn = PG_GETARG_LSN(0);
 	int64		timeout = PG_GETARG_INT64(1);
+	WaitLSNResult result;
 
 	if (timeout < 0)
 		ereport(ERROR,
@@ -799,7 +800,42 @@ pg_wal_replay_wait(PG_FUNCTION_ARGS)
 	 */
 	Assert(MyProc->xmin == InvalidTransactionId);
 
-	(void) WaitForLSNReplay(target_lsn, timeout);
+	result = WaitForLSNReplay(target_lsn, timeout);
+
+	/*
+	 * Process the result of WaitForLSNReplay().  Throw appropriate error if
+	 * needed.
+	 */
+	switch (result)
+	{
+		case WaitLSNResultSuccess:
+			/* Nothing to do on success */
+			break;
+
+		case WaitLSNResultTimeout:
+			ereport(ERROR,
+					(errcode(ERRCODE_QUERY_CANCELED),
+					 errmsg("timed out while waiting for target LSN %X/%X to be replayed; current replay LSN %X/%X",
+							LSN_FORMAT_ARGS(target_lsn),
+							LSN_FORMAT_ARGS(GetXLogReplayRecPtr(NULL)))));
+			break;
+
+		case WaitLSNResultNotInRecovery:
+			ereport(ERROR,
+					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+					 errmsg("recovery is not in progress"),
+					 errhint("Waiting for LSN can only be executed during recovery.")));
+			break;
+
+		case WaitLSNResultPromotedConcurrently:
+			ereport(ERROR,
+					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+					 errmsg("recovery is not in progress"),
+					 errdetail("Recovery ended before replaying target LSN %X/%X; last replay LSN %X/%X.",
+							   LSN_FORMAT_ARGS(target_lsn),
+							   LSN_FORMAT_ARGS(GetXLogReplayRecPtr(NULL)))));
+			break;
+	}
 
 	PG_RETURN_VOID();
 }
diff --git a/src/backend/commands/waitlsn.c b/src/backend/commands/waitlsn.c
index 501938f4330..aab6aa18648 100644
--- a/src/backend/commands/waitlsn.c
+++ b/src/backend/commands/waitlsn.c
@@ -217,7 +217,7 @@ WaitLSNCleanup(void)
  * Wait using MyLatch till the given LSN is replayed, the postmaster dies or
  * timeout happens.
  */
-void
+WaitLSNResult
 WaitForLSNReplay(XLogRecPtr targetLSN, int64 timeout)
 {
 	XLogRecPtr	currentLSN;
@@ -240,17 +240,14 @@ WaitForLSNReplay(XLogRecPtr targetLSN, int64 timeout)
 		 * check the last replay LSN before reporting an error.
 		 */
 		if (targetLSN <= GetXLogReplayRecPtr(NULL))
-			return;
-		ereport(ERROR,
-				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
-				 errmsg("recovery is not in progress"),
-				 errhint("Waiting for LSN can only be executed during recovery.")));
+			return WaitLSNResultSuccess;
+		return WaitLSNResultNotInRecovery;
 	}
 	else
 	{
 		/* If target LSN is already replayed, exit immediately */
 		if (targetLSN <= GetXLogReplayRecPtr(NULL))
-			return;
+			return WaitLSNResultSuccess;
 	}
 
 	if (timeout > 0)
@@ -280,13 +277,8 @@ WaitForLSNReplay(XLogRecPtr targetLSN, int64 timeout)
 			 */
 			currentLSN = GetXLogReplayRecPtr(NULL);
 			if (targetLSN <= currentLSN)
-				return;
-			ereport(ERROR,
-					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
-					 errmsg("recovery is not in progress"),
-					 errdetail("Recovery ended before replaying target LSN %X/%X; last replay LSN %X/%X.",
-							   LSN_FORMAT_ARGS(targetLSN),
-							   LSN_FORMAT_ARGS(currentLSN))));
+				return WaitLSNResultSuccess;
+			return WaitLSNResultPromotedConcurrently;
 		}
 		else
 		{
@@ -328,11 +320,7 @@ WaitForLSNReplay(XLogRecPtr targetLSN, int64 timeout)
 	 * If we didn't reach the target LSN, we must be exited by timeout.
 	 */
 	if (targetLSN > currentLSN)
-	{
-		ereport(ERROR,
-				(errcode(ERRCODE_QUERY_CANCELED),
-				 errmsg("timed out while waiting for target LSN %X/%X to be replayed; current replay LSN %X/%X",
-						LSN_FORMAT_ARGS(targetLSN),
-						LSN_FORMAT_ARGS(currentLSN))));
-	}
+		return WaitLSNResultTimeout;
+
+	return WaitLSNResultSuccess;
 }
diff --git a/src/include/commands/waitlsn.h b/src/include/commands/waitlsn.h
index bb5ac858dcc..36a580674c0 100644
--- a/src/include/commands/waitlsn.h
+++ b/src/include/commands/waitlsn.h
@@ -70,12 +70,23 @@ typedef struct WaitLSNState
 	WaitLSNProcInfo procInfos[FLEXIBLE_ARRAY_MEMBER];
 } WaitLSNState;
 
+/*
+ * Results statuses for WaitForLSNReplay().
+ */
+typedef enum
+{
+	WaitLSNResultSuccess, /* Target LSN is reached */
+	WaitLSNResultTimeout, /* Timeout occured */
+	WaitLSNResultNotInRecovery, /* Recovery ended before we start waiting */
+	WaitLSNResultPromotedConcurrently, /* Standby got promoted during waiting and before target LSN is reached */
+} WaitLSNResult;
+
 extern PGDLLIMPORT WaitLSNState *waitLSNState;
 
 extern Size WaitLSNShmemSize(void);
 extern void WaitLSNShmemInit(void);
 extern void WaitLSNSetLatches(XLogRecPtr currentLSN);
 extern void WaitLSNCleanup(void);
-extern void WaitForLSNReplay(XLogRecPtr targetLSN, int64 timeout);
+extern WaitLSNResult WaitForLSNReplay(XLogRecPtr targetLSN, int64 timeout);
 
 #endif							/* WAIT_LSN_H */
-- 
2.39.5 (Apple Git-154)

v2-0002-Implement-pg_wal_replay_wait_no_error.patchapplication/octet-stream; name=v2-0002-Implement-pg_wal_replay_wait_no_error.patchDownload
From af3d86506e5dd20b84c05c54f66d57fb480f1904 Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Thu, 19 Sep 2024 15:34:18 +0300
Subject: [PATCH v2 2/2] Implement pg_wal_replay_wait_no_error()

And function pg_wal_replay_wait_status() to get the last status.
---
 src/backend/access/transam/xlogfuncs.c     | 93 ++++++++++++++++++----
 src/backend/catalog/system_functions.sql   |  5 ++
 src/include/catalog/pg_proc.dat            | 11 +++
 src/test/recovery/t/043_wal_replay_wait.pl | 12 +++
 4 files changed, 105 insertions(+), 16 deletions(-)

diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 74b493e437e..6602507f452 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -752,20 +752,11 @@ pg_promote(PG_FUNCTION_ARGS)
 }
 
 /*
- * Waits until recovery replays the target LSN with optional timeout.
+ * Prepare for waiting for LSN replay.
  */
-Datum
-pg_wal_replay_wait(PG_FUNCTION_ARGS)
+static void
+pg_wal_replay_wait_prepare(const char *funcname)
 {
-	XLogRecPtr	target_lsn = PG_GETARG_LSN(0);
-	int64		timeout = PG_GETARG_INT64(1);
-	WaitLSNResult result;
-
-	if (timeout < 0)
-		ereport(ERROR,
-				(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
-				 errmsg("\"timeout\" must not be negative")));
-
 	/*
 	 * We are going to wait for the LSN replay.  We should first care that we
 	 * don't hold a snapshot and correspondingly our MyProc->xmin is invalid.
@@ -791,22 +782,42 @@ pg_wal_replay_wait(PG_FUNCTION_ARGS)
 	if (GetOldestSnapshot())
 		ereport(ERROR,
 				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
-				 errmsg("pg_wal_replay_wait() must be only called without an active or registered snapshot"),
-				 errdetail("Make sure pg_wal_replay_wait() isn't called within a transaction with an isolation level higher than READ COMMITTED, another procedure, or a function.")));
+				 errmsg("%s must be only called without an active or registered snapshot", funcname),
+				 errdetail("Make sure %s isn't called within a transaction with an isolation level higher than READ COMMITTED, another procedure, or a function.", funcname)));
 
 	/*
 	 * As the result we should hold no snapshot, and correspondingly our xmin
 	 * should be unset.
 	 */
 	Assert(MyProc->xmin == InvalidTransactionId);
+}
 
-	result = WaitForLSNReplay(target_lsn, timeout);
+static WaitLSNResult lastWaitLSNResult = WaitLSNResultSuccess;
+
+/*
+ * Waits until recovery replays the target LSN with optional timeout.  Throw
+ * an error on failure.
+ */
+Datum
+pg_wal_replay_wait(PG_FUNCTION_ARGS)
+{
+	XLogRecPtr	target_lsn = PG_GETARG_LSN(0);
+	int64		timeout = PG_GETARG_INT64(1);
+
+	if (timeout < 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+				 errmsg("\"timeout\" must not be negative")));
+
+	pg_wal_replay_wait_prepare("pg_wal_replay_wait()");
+
+	lastWaitLSNResult = WaitForLSNReplay(target_lsn, timeout);
 
 	/*
 	 * Process the result of WaitForLSNReplay().  Throw appropriate error if
 	 * needed.
 	 */
-	switch (result)
+	switch (lastWaitLSNResult)
 	{
 		case WaitLSNResultSuccess:
 			/* Nothing to do on success */
@@ -839,3 +850,53 @@ pg_wal_replay_wait(PG_FUNCTION_ARGS)
 
 	PG_RETURN_VOID();
 }
+
+/*
+ * Waits until recovery replays the target LSN with optional timeout.  Return
+ * the waiting result as a text.
+ */
+Datum
+pg_wal_replay_wait_no_error(PG_FUNCTION_ARGS)
+{
+	XLogRecPtr	target_lsn = PG_GETARG_LSN(0);
+	int64		timeout = PG_GETARG_INT64(1);
+
+	if (timeout < 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+				 errmsg("\"timeout\" must not be negative")));
+
+	pg_wal_replay_wait_prepare("pg_wal_replay_wait_status()");
+
+	lastWaitLSNResult = WaitForLSNReplay(target_lsn, timeout);
+
+	return (Datum) 0;
+}
+
+Datum
+pg_wal_replay_wait_status(PG_FUNCTION_ARGS)
+{
+	const char *result_string = "";
+
+	/* Process the result of WaitForLSNReplay(). */
+	switch (lastWaitLSNResult)
+	{
+		case WaitLSNResultSuccess:
+			result_string = "success";
+			break;
+
+		case WaitLSNResultTimeout:
+			result_string = "timeout";
+			break;
+
+		case WaitLSNResultNotInRecovery:
+			result_string = "not in recovery";
+			break;
+
+		case WaitLSNResultPromotedConcurrently:
+			result_string = "promoted concurrently";
+			break;
+	}
+
+	PG_RETURN_TEXT_P(cstring_to_text(result_string));
+}
diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index b0d0de051e7..ed092b748ef 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -417,6 +417,11 @@ CREATE OR REPLACE FUNCTION
 CREATE OR REPLACE PROCEDURE pg_wal_replay_wait(target_lsn pg_lsn, timeout int8 DEFAULT 0)
   LANGUAGE internal AS 'pg_wal_replay_wait';
 
+CREATE OR REPLACE PROCEDURE pg_wal_replay_wait_status(OUT status text,
+													  target_lsn pg_lsn,
+													  timeout int8 DEFAULT 0)
+  LANGUAGE internal AS 'pg_wal_replay_wait_status';
+
 CREATE OR REPLACE FUNCTION pg_logical_slot_get_changes(
     IN slot_name name, IN upto_lsn pg_lsn, IN upto_nchanges int, VARIADIC options text[] DEFAULT '{}',
     OUT lsn pg_lsn, OUT xid xid, OUT data text)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 43f608d7a0a..36024a6e2c9 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -6647,6 +6647,17 @@
   proname => 'pg_wal_replay_wait', prokind => 'p', prorettype => 'void',
   proargtypes => 'pg_lsn int8', proargnames => '{target_lsn,timeout}',
   prosrc => 'pg_wal_replay_wait' },
+{ oid => '226',
+  descr => 'wait for the target LSN to be replayed on standby with an optional timeout without error throwing',
+  proname => 'pg_wal_replay_wait_no_error', prokind => 'p', prorettype => 'void',
+  proargtypes => 'pg_lsn int8', proargmodes => '{i,i}',
+  proargnames => '{target_lsn,timeout}',
+  prosrc => 'pg_wal_replay_wait_no_error' },
+{ oid => '388',
+  descr => 'return twait for the target LSN to be replayed on standby with an optional timeout and returning the waiting result',
+  proname => 'pg_wal_replay_wait_status', prorettype => 'text',
+  proargtypes => '',
+  prosrc => 'pg_wal_replay_wait_status' },
 
 { oid => '6224', descr => 'get resource managers loaded in system',
   proname => 'pg_get_wal_resource_managers', prorows => '50', proretset => 't',
diff --git a/src/test/recovery/t/043_wal_replay_wait.pl b/src/test/recovery/t/043_wal_replay_wait.pl
index 024f1fe6488..d5bf48895fa 100644
--- a/src/test/recovery/t/043_wal_replay_wait.pl
+++ b/src/test/recovery/t/043_wal_replay_wait.pl
@@ -77,6 +77,18 @@ $node_standby->psql(
 ok( $stderr =~ /timed out while waiting for target LSN/,
 	"get timeout on waiting for unreachable LSN");
 
+$output = $node_standby->safe_psql('postgres', qq[
+	CALL pg_wal_replay_wait_no_error('${lsn2}', 10);
+	SELECT pg_wal_replay_wait_status();]);
+ok( $output == "success",
+	"pg_wal_replay_wait_status() returns correct status after successful waiting");
+$output = $node_standby->psql(
+	'postgres',	qq[
+	CALL pg_wal_replay_wait_no_error('${lsn2}', 10);
+	SELECT pg_wal_replay_wait_status();]);
+ok( $output == "timeout",
+	"pg_wal_replay_wait_status() returns correct status after timeout");
+
 # 4. Check that pg_wal_replay_wait() triggers an error if called on primary,
 # within another function, or inside a transaction with an isolation level
 # higher than READ COMMITTED.
-- 
2.39.5 (Apple Git-154)

#11Michael Paquier
michael@paquier.xyz
In reply to: Alexander Korotkov (#1)
Re: pgsql: Implement pg_wal_replay_wait() stored procedure

Hi Alexander,

On Fri, Aug 02, 2024 at 06:22:21PM +0000, Alexander Korotkov wrote:

Implement pg_wal_replay_wait() stored procedure

pg_wal_replay_wait() is to be used on standby and specifies waiting for
the specific WAL location to be replayed. This option is useful when
the user makes some data changes on primary and needs a guarantee to see
these changes are on standby.

The queue of waiters is stored in the shared memory as an LSN-ordered pairing
heap, where the waiter with the nearest LSN stays on the top. During
the replay of WAL, waiters whose LSNs have already been replayed are deleted
from the shared memory pairing heap and woken up by setting their latches.

pg_wal_replay_wait() needs to wait without any snapshot held. Otherwise,
the snapshot could prevent the replay of WAL records, implying a kind of
self-deadlock. This is why it is only possible to implement
pg_wal_replay_wait() as a procedure working without an active snapshot,
not a function.

Catversion is bumped.

I've spotted that this patch uses an OID that should not be used
during the development cycle:
+{ oid => '111',
+  descr => 'wait for the target LSN to be replayed on standby with an optional timeout', 

Please use something in the 8000-9999 range, as required by
98eab30b93d5.

Thanks,
--
Michael

#12Alexander Korotkov
aekorotkov@gmail.com
In reply to: Michael Paquier (#11)
Re: pgsql: Implement pg_wal_replay_wait() stored procedure

On Thu, Sep 26, 2024 at 11:19 AM Michael Paquier <michael@paquier.xyz> wrote:

On Fri, Aug 02, 2024 at 06:22:21PM +0000, Alexander Korotkov wrote:

Implement pg_wal_replay_wait() stored procedure

pg_wal_replay_wait() is to be used on standby and specifies waiting for
the specific WAL location to be replayed. This option is useful when
the user makes some data changes on primary and needs a guarantee to see
these changes are on standby.

The queue of waiters is stored in the shared memory as an LSN-ordered pairing
heap, where the waiter with the nearest LSN stays on the top. During
the replay of WAL, waiters whose LSNs have already been replayed are deleted
from the shared memory pairing heap and woken up by setting their latches.

pg_wal_replay_wait() needs to wait without any snapshot held. Otherwise,
the snapshot could prevent the replay of WAL records, implying a kind of
self-deadlock. This is why it is only possible to implement
pg_wal_replay_wait() as a procedure working without an active snapshot,
not a function.

Catversion is bumped.

I've spotted that this patch uses an OID that should not be used
during the development cycle:
+{ oid => '111',
+  descr => 'wait for the target LSN to be replayed on standby with an optional timeout',

Please use something in the 8000-9999 range, as required by
98eab30b93d5.

Fixed, sorry for messing this up.
I would appreciate if you have time to look at [0] to check if it
meets your expectations.

Links.
1. /messages/by-id/CAPpHfdsw9oq62Fvt65JApHJf1auUirdGJV7=nRyVnDL3M8z5xA@mail.gmail.com

------
Regards,
Alexander Korotkov
Supabase

#13Michael Paquier
michael@paquier.xyz
In reply to: Alexander Korotkov (#12)
Re: pgsql: Implement pg_wal_replay_wait() stored procedure

On Thu, Sep 26, 2024 at 06:41:18PM +0300, Alexander Korotkov wrote:

On Thu, Sep 26, 2024 at 11:19 AM Michael Paquier <michael@paquier.xyz> wrote:

Please use something in the 8000-9999 range, as required by
98eab30b93d5.

Fixed, sorry for messing this up.

Thanks for taking care of that.

I would appreciate if you have time to look at [0] to check if it
meets your expectations.

Links.
1. /messages/by-id/CAPpHfdsw9oq62Fvt65JApHJf1auUirdGJV7=nRyVnDL3M8z5xA@mail.gmail.com

I am aware of this one, sorry for the delay. It's on my pile of
things to look at, but I have not been able to get back to it.
--
Michael

#14Michael Paquier
michael@paquier.xyz
In reply to: Alexander Korotkov (#10)
Re: pgsql: Implement pg_wal_replay_wait() stored procedure

On Fri, Sep 20, 2024 at 03:00:20PM +0300, Alexander Korotkov wrote:

Please, check the attached patchset for implementation of proposed approach.

0001 looks like it requires an indentation in its .h diffs.

+typedef enum
+{
+	WaitLSNResultSuccess, /* Target LSN is reached */
+	WaitLSNResultTimeout, /* Timeout occured */

Perhaps use WAIT_LSN_RESULT_SUCCESS, etc, rather than camel casing.

+ * Results statuses for WaitForLSNReplay().

Too much plural here.

What you are doing with WaitForLSNReplay() sounds kind of right.

rc = WaitLatch(MyLatch, wake_events, delay_ms,
WAIT_EVENT_WAIT_FOR_WAL_REPLAY);

Question about this existing bit in waitlsn.c. Shouldn't this issue a
FATAL if rc reports a WL_POSTMASTER_DEATH? Or am I missing an
intention here. That was already the case before this patch set.

pg_wal_replay_wait() is new to v18, meaning that we still have some
time to agree on its final shape for this release cycle. This
discussion shares similarities with the recent exercise done in
f593c5517d14, and I think that we should be more consistent between
both to not repeat the same mistakes as in the past, even if this case
if more complex because we have more reasons behind why a wait could
not have happened.

I would suggest to keep things simple and have one single function
rather than introduce two more pg_proc entries with slight differences
in their error reporting, making the original function return a text
about the reason of the failure when waiting (be it a timeout,
success, promotion or outside recovery).

FWIW, I am confused regarding the need for WaitLSNResultNotInRecovery
and WaitLSNResultPromotedConcurrently in the error states. On a code
basis, they check the same thing: RecoveryInProgress(). I'd suggest
to cut one of them. This also points to the fact that
WaitForLSNReplay() has some duplication that's not required. We could
have less GetXLogReplayRecPtr() calls and do all the checks in the for
loop. The current structure of the code in waitlsn.c is more complex
than it could be.
--
Michael

#15Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#14)
Re: pgsql: Implement pg_wal_replay_wait() stored procedure

On Fri, Sep 27, 2024 at 01:35:23PM +0900, Michael Paquier wrote:

I would suggest to keep things simple and have one single function
rather than introduce two more pg_proc entries with slight differences
in their error reporting, making the original function return a text
about the reason of the failure when waiting (be it a timeout,
success, promotion or outside recovery).

More to the point here. I am not sure what's the benefit of having a
procedure, while we have been using SQL functions for similar purposes
in xlogfuncs.c for years. And what you have here can also be coupled
with more complex logic in SQL queries.
--
Michael

#16Alexander Korotkov
aekorotkov@gmail.com
In reply to: Michael Paquier (#15)
Re: pgsql: Implement pg_wal_replay_wait() stored procedure

On Fri, Sep 27, 2024 at 7:57 AM Michael Paquier <michael@paquier.xyz> wrote:

On Fri, Sep 27, 2024 at 01:35:23PM +0900, Michael Paquier wrote:

I would suggest to keep things simple and have one single function
rather than introduce two more pg_proc entries with slight differences
in their error reporting, making the original function return a text
about the reason of the failure when waiting (be it a timeout,
success, promotion or outside recovery).

More to the point here. I am not sure what's the benefit of having a
procedure, while we have been using SQL functions for similar purposes
in xlogfuncs.c for years. And what you have here can also be coupled
with more complex logic in SQL queries.

As I multiple times pointed in the thread [1] [2], this couldn't be
done in SQL function. SQL-function should run within snapshot, which
could prevent WAL from being replayed. In turn, depending on timeout
settings that could lead to hidden deadlock (function waits for LSN to
be replayed, replay process wait snapshot to be released) or an error.

In the stored procedure, we're releasing active and catalog snapshots
(similarly to VACUUM statement). Waiting without holding a snapshots
allows us to workaround the problem described above. We can't do this
in the SQL function, because that would leave the rest of SQL query
without a snapshot.

Links.
1. /messages/by-id/CAPpHfduBSN8j5j5Ynn5x=ThD=8ypNd53D608VXGweBsPzxPvqA@mail.gmail.com
2. /messages/by-id/CAPpHfdtiGgn0iS1KbW2HTam-1+oK+vhXZDAcnX9hKaA7Oe=F-A@mail.gmail.com

------
Regards,
Alexander Korotkov
Supabase

#17Alexander Korotkov
aekorotkov@gmail.com
In reply to: Michael Paquier (#14)
3 attachment(s)
Re: pgsql: Implement pg_wal_replay_wait() stored procedure

Hi!

Thank you for your review.

On Fri, Sep 27, 2024 at 7:35 AM Michael Paquier <michael@paquier.xyz> wrote:

On Fri, Sep 20, 2024 at 03:00:20PM +0300, Alexander Korotkov wrote:

Please, check the attached patchset for implementation of proposed approach.

0001 looks like it requires an indentation in its .h diffs.

+typedef enum
+{
+       WaitLSNResultSuccess, /* Target LSN is reached */
+       WaitLSNResultTimeout, /* Timeout occured */

Perhaps use WAIT_LSN_RESULT_SUCCESS, etc, rather than camel casing.

+ * Results statuses for WaitForLSNReplay().

Too much plural here.

What you are doing with WaitForLSNReplay() sounds kind of right.

rc = WaitLatch(MyLatch, wake_events, delay_ms,
WAIT_EVENT_WAIT_FOR_WAL_REPLAY);

Thank you, fixed in the attached patchset.

Question about this existing bit in waitlsn.c. Shouldn't this issue a
FATAL if rc reports a WL_POSTMASTER_DEATH? Or am I missing an
intention here. That was already the case before this patch set.

Fixed in the separate patch.

pg_wal_replay_wait() is new to v18, meaning that we still have some
time to agree on its final shape for this release cycle. This
discussion shares similarities with the recent exercise done in
f593c5517d14, and I think that we should be more consistent between
both to not repeat the same mistakes as in the past, even if this case
if more complex because we have more reasons behind why a wait could
not have happened.

I would suggest to keep things simple and have one single function
rather than introduce two more pg_proc entries with slight differences
in their error reporting, making the original function return a text
about the reason of the failure when waiting (be it a timeout,
success, promotion or outside recovery).

I also like to keep things simple. Keeping this as a single function
is not possible due to the reasons I described in [1]. However, it's
possible to fit into one stored procedure. I made 'no_error' as an
argument for the pg_wal_replay_wait() procedure. Done so in the
attached patchset.

FWIW, I am confused regarding the need for WaitLSNResultNotInRecovery
and WaitLSNResultPromotedConcurrently in the error states. On a code
basis, they check the same thing: RecoveryInProgress(). I'd suggest
to cut one of them.

Agreed. I initially intended to distinguish situations when the user
mistakenly calls pg_wal_replay_wait() on replication leader and when
concurrent promotion happens. However, given that the promotion could
happen after the user issued the query and before waiting starts, it
doesn't make much sense.

This also points to the fact that
WaitForLSNReplay() has some duplication that's not required. We could
have less GetXLogReplayRecPtr() calls and do all the checks in the for
loop. The current structure of the code in waitlsn.c is more complex
than it could be.

Not sure about this. I'd like to keep the fast-path check before we
call addLSNWaiter().

Links.
1. /messages/by-id/CAPpHfdukVbJZntibZZ4HM7p92zN-QmAtD1+sAALRTFCsvpAq7A@mail.gmail.com

------
Regards,
Alexander Korotkov
Supabase

Attachments:

v3-0002-Refactor-WaitForLSNReplay-to-return-the-result-of.patchapplication/octet-stream; name=v3-0002-Refactor-WaitForLSNReplay-to-return-the-result-of.patchDownload
From 54b36ff6e82f1a01b42fdb059b9c6db842534fc3 Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Thu, 19 Sep 2024 14:48:44 +0300
Subject: [PATCH v3 2/3] Refactor WaitForLSNReplay() to return the result of
 waiting

Currently WaitForLSNReplay() immediatly throws an error if waiting for LSN
replay is not successful.  This commit teaches  WaitForLSNReplay() to return
the result of waiting, while making pg_wal_replay_wait() responsible for
throwing an appropriate error.

This is preparation for new procedure pg_wal_replay_wait_status(), which will
expose the waiting result to the user instead of throwing an error.
---
 src/backend/access/transam/xlogfuncs.c | 31 +++++++++++++++++++++++-
 src/backend/commands/waitlsn.c         | 33 +++++++++-----------------
 src/include/commands/waitlsn.h         | 12 +++++++++-
 3 files changed, 52 insertions(+), 24 deletions(-)

diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 3e3d2bb6189..4c79c264e13 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -759,6 +759,7 @@ pg_wal_replay_wait(PG_FUNCTION_ARGS)
 {
 	XLogRecPtr	target_lsn = PG_GETARG_LSN(0);
 	int64		timeout = PG_GETARG_INT64(1);
+	WaitLSNResult result;
 
 	if (timeout < 0)
 		ereport(ERROR,
@@ -799,7 +800,35 @@ pg_wal_replay_wait(PG_FUNCTION_ARGS)
 	 */
 	Assert(MyProc->xmin == InvalidTransactionId);
 
-	(void) WaitForLSNReplay(target_lsn, timeout);
+	result = WaitForLSNReplay(target_lsn, timeout);
+
+	/*
+	 * Process the result of WaitForLSNReplay().  Throw appropriate error if
+	 * needed.
+	 */
+	switch (result)
+	{
+		case WAIT_LSN_RESULT_SUCCESS:
+			/* Nothing to do on success */
+			break;
+
+		case WAIT_LSN_RESULT_TIMEOUT:
+			ereport(ERROR,
+					(errcode(ERRCODE_QUERY_CANCELED),
+					 errmsg("timed out while waiting for target LSN %X/%X to be replayed; current replay LSN %X/%X",
+							LSN_FORMAT_ARGS(target_lsn),
+							LSN_FORMAT_ARGS(GetXLogReplayRecPtr(NULL)))));
+			break;
+
+		case WAIT_LSN_RESULT_NOT_IN_RECOVERY:
+			ereport(ERROR,
+					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+					 errmsg("recovery is not in progress"),
+					 errdetail("Recovery ended before replaying target LSN %X/%X; last replay LSN %X/%X.",
+							   LSN_FORMAT_ARGS(target_lsn),
+							   LSN_FORMAT_ARGS(GetXLogReplayRecPtr(NULL)))));
+			break;
+	}
 
 	PG_RETURN_VOID();
 }
diff --git a/src/backend/commands/waitlsn.c b/src/backend/commands/waitlsn.c
index 7f3cd62e2a2..237351316a8 100644
--- a/src/backend/commands/waitlsn.c
+++ b/src/backend/commands/waitlsn.c
@@ -217,7 +217,7 @@ WaitLSNCleanup(void)
  * Wait using MyLatch till the given LSN is replayed, the postmaster dies or
  * timeout happens.
  */
-void
+WaitLSNResult
 WaitForLSNReplay(XLogRecPtr targetLSN, int64 timeout)
 {
 	XLogRecPtr	currentLSN;
@@ -240,17 +240,14 @@ WaitForLSNReplay(XLogRecPtr targetLSN, int64 timeout)
 		 * check the last replay LSN before reporting an error.
 		 */
 		if (targetLSN <= GetXLogReplayRecPtr(NULL))
-			return;
-		ereport(ERROR,
-				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
-				 errmsg("recovery is not in progress"),
-				 errhint("Waiting for LSN can only be executed during recovery.")));
+			return WAIT_LSN_RESULT_SUCCESS;
+		return WAIT_LSN_RESULT_NOT_IN_RECOVERY;
 	}
 	else
 	{
 		/* If target LSN is already replayed, exit immediately */
 		if (targetLSN <= GetXLogReplayRecPtr(NULL))
-			return;
+			return WAIT_LSN_RESULT_SUCCESS;
 	}
 
 	if (timeout > 0)
@@ -276,17 +273,13 @@ WaitForLSNReplay(XLogRecPtr targetLSN, int64 timeout)
 		{
 			/*
 			 * Recovery was ended, but recheck if target LSN was already
-			 * replayed.
+			 * replayed.  See the comment regarding deleteLSNWaiter() below.
 			 */
+			deleteLSNWaiter();
 			currentLSN = GetXLogReplayRecPtr(NULL);
 			if (targetLSN <= currentLSN)
-				return;
-			ereport(ERROR,
-					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
-					 errmsg("recovery is not in progress"),
-					 errdetail("Recovery ended before replaying target LSN %X/%X; last replay LSN %X/%X.",
-							   LSN_FORMAT_ARGS(targetLSN),
-							   LSN_FORMAT_ARGS(currentLSN))));
+				return WAIT_LSN_RESULT_SUCCESS;
+			return WAIT_LSN_RESULT_NOT_IN_RECOVERY;
 		}
 		else
 		{
@@ -338,11 +331,7 @@ WaitForLSNReplay(XLogRecPtr targetLSN, int64 timeout)
 	 * If we didn't reach the target LSN, we must be exited by timeout.
 	 */
 	if (targetLSN > currentLSN)
-	{
-		ereport(ERROR,
-				(errcode(ERRCODE_QUERY_CANCELED),
-				 errmsg("timed out while waiting for target LSN %X/%X to be replayed; current replay LSN %X/%X",
-						LSN_FORMAT_ARGS(targetLSN),
-						LSN_FORMAT_ARGS(currentLSN))));
-	}
+		return WAIT_LSN_RESULT_TIMEOUT;
+
+	return WAIT_LSN_RESULT_SUCCESS;
 }
diff --git a/src/include/commands/waitlsn.h b/src/include/commands/waitlsn.h
index bb5ac858dcc..edf4a793e63 100644
--- a/src/include/commands/waitlsn.h
+++ b/src/include/commands/waitlsn.h
@@ -70,12 +70,22 @@ typedef struct WaitLSNState
 	WaitLSNProcInfo procInfos[FLEXIBLE_ARRAY_MEMBER];
 } WaitLSNState;
 
+/*
+ * Result statuses for WaitForLSNReplay().
+ */
+typedef enum
+{
+	WAIT_LSN_RESULT_SUCCESS, /* Target LSN is reached */
+	WAIT_LSN_RESULT_TIMEOUT, /* Timeout occured */
+	WAIT_LSN_RESULT_NOT_IN_RECOVERY, /* Recovery ended before or during our wait */
+} WaitLSNResult;
+
 extern PGDLLIMPORT WaitLSNState *waitLSNState;
 
 extern Size WaitLSNShmemSize(void);
 extern void WaitLSNShmemInit(void);
 extern void WaitLSNSetLatches(XLogRecPtr currentLSN);
 extern void WaitLSNCleanup(void);
-extern void WaitForLSNReplay(XLogRecPtr targetLSN, int64 timeout);
+extern WaitLSNResult WaitForLSNReplay(XLogRecPtr targetLSN, int64 timeout);
 
 #endif							/* WAIT_LSN_H */
-- 
2.39.5 (Apple Git-154)

v3-0003-Add-no_error-argument-to-pg_wal_replay_wait.patchapplication/octet-stream; name=v3-0003-Add-no_error-argument-to-pg_wal_replay_wait.patchDownload
From 60f58577ec3631199df42479ba0cf53806f9162b Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Thu, 19 Sep 2024 15:34:18 +0300
Subject: [PATCH v3 3/3] Add 'no_error' argument to pg_wal_replay_wait()

This argument allow to skip throwing an error.  Instead the result status
can be obtained using pg_wal_replay_wait_status() function.
---
 src/backend/access/transam/xlogfuncs.c     | 42 +++++++++++++++++++---
 src/backend/catalog/system_functions.sql   |  4 ++-
 src/include/catalog/pg_proc.dat            |  7 +++-
 src/test/recovery/t/043_wal_replay_wait.pl | 12 +++++++
 4 files changed, 59 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 4c79c264e13..761ec6bb9d4 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -751,15 +751,18 @@ pg_promote(PG_FUNCTION_ARGS)
 	PG_RETURN_BOOL(false);
 }
 
+static WaitLSNResult lastWaitLSNResult = WAIT_LSN_RESULT_SUCCESS;
+
 /*
- * Waits until recovery replays the target LSN with optional timeout.
+ * Waits until recovery replays the target LSN with optional timeout.  Throw
+ * an error on failure.
  */
 Datum
 pg_wal_replay_wait(PG_FUNCTION_ARGS)
 {
 	XLogRecPtr	target_lsn = PG_GETARG_LSN(0);
 	int64		timeout = PG_GETARG_INT64(1);
-	WaitLSNResult result;
+	bool		no_error = PG_GETARG_BOOL(2);
 
 	if (timeout < 0)
 		ereport(ERROR,
@@ -800,13 +803,16 @@ pg_wal_replay_wait(PG_FUNCTION_ARGS)
 	 */
 	Assert(MyProc->xmin == InvalidTransactionId);
 
-	result = WaitForLSNReplay(target_lsn, timeout);
+	lastWaitLSNResult = WaitForLSNReplay(target_lsn, timeout);
+
+	if (no_error)
+		PG_RETURN_VOID();
 
 	/*
 	 * Process the result of WaitForLSNReplay().  Throw appropriate error if
 	 * needed.
 	 */
-	switch (result)
+	switch (lastWaitLSNResult)
 	{
 		case WAIT_LSN_RESULT_SUCCESS:
 			/* Nothing to do on success */
@@ -832,3 +838,31 @@ pg_wal_replay_wait(PG_FUNCTION_ARGS)
 
 	PG_RETURN_VOID();
 }
+
+Datum
+pg_wal_replay_wait_status(PG_FUNCTION_ARGS)
+{
+	const char *result_string = "";
+
+	/* Process the result of WaitForLSNReplay(). */
+	switch (lastWaitLSNResult)
+	{
+		case WAIT_LSN_RESULT_SUCCESS:
+			result_string = "success";
+			break;
+
+		case WAIT_LSN_RESULT_TIMEOUT:
+			result_string = "timeout";
+			break;
+
+		case WAIT_LSN_RESULT_NOT_IN_RECOVERY:
+			result_string = "not in recovery";
+			break;
+
+		case WAIT_LSN_RESULT_PROMOTED_CONCURRENTLY:
+			result_string = "promoted concurrently";
+			break;
+	}
+
+	PG_RETURN_TEXT_P(cstring_to_text(result_string));
+}
diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index b0d0de051e7..26b3f33566e 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -414,7 +414,9 @@ CREATE OR REPLACE FUNCTION
   json_populate_recordset(base anyelement, from_json json, use_json_as_text boolean DEFAULT false)
   RETURNS SETOF anyelement LANGUAGE internal STABLE ROWS 100  AS 'json_populate_recordset' PARALLEL SAFE;
 
-CREATE OR REPLACE PROCEDURE pg_wal_replay_wait(target_lsn pg_lsn, timeout int8 DEFAULT 0)
+CREATE OR REPLACE PROCEDURE pg_wal_replay_wait(target_lsn pg_lsn,
+											   timeout int8 DEFAULT 0,
+											   no_error bool DEFAULT false)
   LANGUAGE internal AS 'pg_wal_replay_wait';
 
 CREATE OR REPLACE FUNCTION pg_logical_slot_get_changes(
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 322114d72a7..ef633c4f227 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -6648,8 +6648,13 @@
 { oid => '8593',
   descr => 'wait for the target LSN to be replayed on standby with an optional timeout',
   proname => 'pg_wal_replay_wait', prokind => 'p', prorettype => 'void',
-  proargtypes => 'pg_lsn int8', proargnames => '{target_lsn,timeout}',
+  proargtypes => 'pg_lsn int8 bool', proargnames => '{target_lsn,timeout,no_error}',
   prosrc => 'pg_wal_replay_wait' },
+{ oid => '8594',
+  descr => 'the last result for pg_wal_replay_wait()',
+  proname => 'pg_wal_replay_wait_status', prorettype => 'text',
+  proargtypes => '',
+  prosrc => 'pg_wal_replay_wait_status' },
 
 { oid => '6224', descr => 'get resource managers loaded in system',
   proname => 'pg_get_wal_resource_managers', prorows => '50', proretset => 't',
diff --git a/src/test/recovery/t/043_wal_replay_wait.pl b/src/test/recovery/t/043_wal_replay_wait.pl
index 024f1fe6488..e5cca59e468 100644
--- a/src/test/recovery/t/043_wal_replay_wait.pl
+++ b/src/test/recovery/t/043_wal_replay_wait.pl
@@ -77,6 +77,18 @@ $node_standby->psql(
 ok( $stderr =~ /timed out while waiting for target LSN/,
 	"get timeout on waiting for unreachable LSN");
 
+$output = $node_standby->safe_psql('postgres', qq[
+	CALL pg_wal_replay_wait('${lsn2}', 10, true);
+	SELECT pg_wal_replay_wait_status();]);
+ok( $output == "success",
+	"pg_wal_replay_wait_status() returns correct status after successful waiting");
+$output = $node_standby->psql(
+	'postgres',	qq[
+	CALL pg_wal_replay_wait('${lsn2}', 10, true);
+	SELECT pg_wal_replay_wait_status();]);
+ok( $output == "timeout",
+	"pg_wal_replay_wait_status() returns correct status after timeout");
+
 # 4. Check that pg_wal_replay_wait() triggers an error if called on primary,
 # within another function, or inside a transaction with an isolation level
 # higher than READ COMMITTED.
-- 
2.39.5 (Apple Git-154)

v3-0001-Make-WaitForLSNReplay-issue-FATAL-on-postmaster-d.patchapplication/octet-stream; name=v3-0001-Make-WaitForLSNReplay-issue-FATAL-on-postmaster-d.patchDownload
From ffc2d4966ad46a17c69e8e61d08d4dbf11600386 Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Sun, 29 Sep 2024 12:29:31 +0300
Subject: [PATCH v3 1/3] Make WaitForLSNReplay() issue FATAL on postmaster
 death

Reported-by: Michael Paquier
Discussion: https://postgr.es/m/ZvY2C8N4ZqgCFaLu%40paquier.xyz
---
 src/backend/commands/waitlsn.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/src/backend/commands/waitlsn.c b/src/backend/commands/waitlsn.c
index 501938f4330..7f3cd62e2a2 100644
--- a/src/backend/commands/waitlsn.c
+++ b/src/backend/commands/waitlsn.c
@@ -222,7 +222,7 @@ WaitForLSNReplay(XLogRecPtr targetLSN, int64 timeout)
 {
 	XLogRecPtr	currentLSN;
 	TimestampTz endtime = 0;
-	int			wake_events = WL_LATCH_SET | WL_EXIT_ON_PM_DEATH;
+	int			wake_events = WL_LATCH_SET | WL_POSTMASTER_DEATH;
 
 	/* Shouldn't be called when shmem isn't initialized */
 	Assert(waitLSNState);
@@ -313,6 +313,16 @@ WaitForLSNReplay(XLogRecPtr targetLSN, int64 timeout)
 		rc = WaitLatch(MyLatch, wake_events, delay_ms,
 					   WAIT_EVENT_WAIT_FOR_WAL_REPLAY);
 
+		/*
+		 * Emergency bailout if postmaster has died.  This is to avoid the
+		 * necessity for manual cleanup of all postmaster children.
+		 */
+		if (rc & WL_POSTMASTER_DEATH)
+			ereport(FATAL,
+					(errcode(ERRCODE_ADMIN_SHUTDOWN),
+					 errmsg("terminating connection due to unexpected postmaster exit"),
+					 errcontext("while waiting for LSN replay")));
+
 		if (rc & WL_LATCH_SET)
 			ResetLatch(MyLatch);
 	}
-- 
2.39.5 (Apple Git-154)

#18Alexander Korotkov
aekorotkov@gmail.com
In reply to: Alexander Korotkov (#17)
3 attachment(s)
Re: pgsql: Implement pg_wal_replay_wait() stored procedure

Hi!

On Sun, Sep 29, 2024 at 1:40 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:

Thank you for your review.

On Fri, Sep 27, 2024 at 7:35 AM Michael Paquier <michael@paquier.xyz> wrote:

On Fri, Sep 20, 2024 at 03:00:20PM +0300, Alexander Korotkov wrote:

Please, check the attached patchset for implementation of proposed approach.

0001 looks like it requires an indentation in its .h diffs.

+typedef enum
+{
+       WaitLSNResultSuccess, /* Target LSN is reached */
+       WaitLSNResultTimeout, /* Timeout occured */

Perhaps use WAIT_LSN_RESULT_SUCCESS, etc, rather than camel casing.

+ * Results statuses for WaitForLSNReplay().

Too much plural here.

What you are doing with WaitForLSNReplay() sounds kind of right.

rc = WaitLatch(MyLatch, wake_events, delay_ms,
WAIT_EVENT_WAIT_FOR_WAL_REPLAY);

Thank you, fixed in the attached patchset.

Question about this existing bit in waitlsn.c. Shouldn't this issue a
FATAL if rc reports a WL_POSTMASTER_DEATH? Or am I missing an
intention here. That was already the case before this patch set.

Fixed in the separate patch.

pg_wal_replay_wait() is new to v18, meaning that we still have some
time to agree on its final shape for this release cycle. This
discussion shares similarities with the recent exercise done in
f593c5517d14, and I think that we should be more consistent between
both to not repeat the same mistakes as in the past, even if this case
if more complex because we have more reasons behind why a wait could
not have happened.

I would suggest to keep things simple and have one single function
rather than introduce two more pg_proc entries with slight differences
in their error reporting, making the original function return a text
about the reason of the failure when waiting (be it a timeout,
success, promotion or outside recovery).

I also like to keep things simple. Keeping this as a single function
is not possible due to the reasons I described in [1]. However, it's
possible to fit into one stored procedure. I made 'no_error' as an
argument for the pg_wal_replay_wait() procedure. Done so in the
attached patchset.

FWIW, I am confused regarding the need for WaitLSNResultNotInRecovery
and WaitLSNResultPromotedConcurrently in the error states. On a code
basis, they check the same thing: RecoveryInProgress(). I'd suggest
to cut one of them.

Agreed. I initially intended to distinguish situations when the user
mistakenly calls pg_wal_replay_wait() on replication leader and when
concurrent promotion happens. However, given that the promotion could
happen after the user issued the query and before waiting starts, it
doesn't make much sense.

This also points to the fact that
WaitForLSNReplay() has some duplication that's not required. We could
have less GetXLogReplayRecPtr() calls and do all the checks in the for
loop. The current structure of the code in waitlsn.c is more complex
than it could be.

Not sure about this. I'd like to keep the fast-path check before we
call addLSNWaiter().

Please, check the revised patchset. It contains more tests for new
function pg_wal_replay_wait_status() and changes for documentation.

------
Regards,
Alexander Korotkov
Supabase

Attachments:

v4-0003-Add-no_error-argument-to-pg_wal_replay_wait.patchapplication/octet-stream; name=v4-0003-Add-no_error-argument-to-pg_wal_replay_wait.patchDownload
From a0995a2652648f4aa701061728ffffddd4e81a37 Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Thu, 19 Sep 2024 15:34:18 +0300
Subject: [PATCH v4 3/3] Add 'no_error' argument to pg_wal_replay_wait()

This argument allow to skip throwing an error.  Instead the result status
can be obtained using pg_wal_replay_wait_status() function.
---
 doc/src/sgml/func.sgml                     | 56 +++++++++++++++++++---
 src/backend/access/transam/xlogfuncs.c     | 38 +++++++++++++--
 src/backend/catalog/system_functions.sql   |  4 +-
 src/include/catalog/pg_proc.dat            |  7 ++-
 src/test/recovery/t/043_wal_replay_wait.pl | 19 ++++++++
 5 files changed, 112 insertions(+), 12 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 9656d1891e8..57052cec55a 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -28989,12 +28989,15 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
    </para>
 
    <table id="recovery-synchronization-procedure-table">
-    <title>Recovery Synchronization Procedure</title>
+    <title>Recovery Synchronization Procedure and Function</title>
     <tgroup cols="1">
      <thead>
       <row>
        <entry role="func_table_entry"><para role="func_signature">
-        Procedure
+        Procedure or Function
+       </para>
+       <para>
+        Type
        </para>
        <para>
         Description
@@ -29010,8 +29013,11 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
         </indexterm>
         <function>pg_wal_replay_wait</function> (
           <parameter>target_lsn</parameter> <type>pg_lsn</type>,
-          <parameter>timeout</parameter> <type>bigint</type> <literal>DEFAULT</literal> <literal>0</literal>)
-        <returnvalue>void</returnvalue>
+          <parameter>timeout</parameter> <type>bigint</type> <literal>DEFAULT</literal> <literal>0</literal>,
+          <parameter>no_error</parameter> <type>bool</type> <literal>DEFAULT</literal> <literal>false</literal>)
+       </para>
+       <para>
+        Procedure
        </para>
        <para>
         Waits until recovery replays <literal>target_lsn</literal>.
@@ -29022,7 +29028,30 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
         procedure waits until <literal>target_lsn</literal> is reached or
         the specified <parameter>timeout</parameter> has elapsed.
         On timeout, or if the server is promoted before
-        <literal>target_lsn</literal> is reached, an error is emitted.
+        <literal>target_lsn</literal> is reached, an error is emitted,
+        as soon as <parameter>no_error</parameter> is false.
+        If <parameter>no_error</parameter> is set to true, then the procedure
+        doesn't throw errors.  The last result status could be read
+        with <function>pg_wal_replay_wait_status</function>.
+       </para></entry>
+      </row>
+
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_wal_replay_wait_status</primary>
+        </indexterm>
+        <function>pg_wal_replay_wait_status</function> ()
+        <returnvalue>text</returnvalue>
+       </para>
+       <para>
+        Function
+       </para>
+       <para>
+        Returns the last result status for
+        <function>pg_wal_replay_wait</function> procedure.  The possible
+        values are <literal>success</literal>, <literal>timeout</literal>,
+        and <literal>not in recovery</literal>.
        </para></entry>
       </row>
      </tbody>
@@ -29044,7 +29073,8 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
    <para>
     <function>pg_wal_replay_wait</function> should be called on standby.
     If a user calls <function>pg_wal_replay_wait</function> on primary, it
-    will error out.  However, if <function>pg_wal_replay_wait</function> is
+    will error out as soon as <parameter>no_error</parameter> is false.
+    However, if <function>pg_wal_replay_wait</function> is
     called on primary promoted from standby and <literal>target_lsn</literal>
     was already replayed, then <function>pg_wal_replay_wait</function> just
     exits immediately.
@@ -29090,6 +29120,20 @@ postgres=# CALL pg_wal_replay_wait('0/306EE20', 100);
 ERROR:  timed out while waiting for target LSN 0/306EE20 to be replayed; current replay LSN 0/306EA60
     </programlisting>
 
+   The same example uses <function>pg_wal_replay_wait</function> with
+   <parameter>no_error</parameter> set to true.  In this case, the result
+   status must be read with <function>pg_wal_replay_wait_status</function>.
+
+   <programlisting>
+postgres=# CALL pg_wal_replay_wait('0/306EE20', 100, true);
+CALL
+postgres=# SELECT pg_wal_replay_wait_status();
+ pg_wal_replay_wait_status
+---------------------------
+ timeout
+(1 row)
+    </programlisting>
+
    </para>
 
    <para>
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 4c79c264e13..badf47e967c 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -751,15 +751,18 @@ pg_promote(PG_FUNCTION_ARGS)
 	PG_RETURN_BOOL(false);
 }
 
+static WaitLSNResult lastWaitLSNResult = WAIT_LSN_RESULT_SUCCESS;
+
 /*
- * Waits until recovery replays the target LSN with optional timeout.
+ * Waits until recovery replays the target LSN with optional timeout.  Throw
+ * an error on failure.
  */
 Datum
 pg_wal_replay_wait(PG_FUNCTION_ARGS)
 {
 	XLogRecPtr	target_lsn = PG_GETARG_LSN(0);
 	int64		timeout = PG_GETARG_INT64(1);
-	WaitLSNResult result;
+	bool		no_error = PG_GETARG_BOOL(2);
 
 	if (timeout < 0)
 		ereport(ERROR,
@@ -800,13 +803,16 @@ pg_wal_replay_wait(PG_FUNCTION_ARGS)
 	 */
 	Assert(MyProc->xmin == InvalidTransactionId);
 
-	result = WaitForLSNReplay(target_lsn, timeout);
+	lastWaitLSNResult = WaitForLSNReplay(target_lsn, timeout);
+
+	if (no_error)
+		PG_RETURN_VOID();
 
 	/*
 	 * Process the result of WaitForLSNReplay().  Throw appropriate error if
 	 * needed.
 	 */
-	switch (result)
+	switch (lastWaitLSNResult)
 	{
 		case WAIT_LSN_RESULT_SUCCESS:
 			/* Nothing to do on success */
@@ -832,3 +838,27 @@ pg_wal_replay_wait(PG_FUNCTION_ARGS)
 
 	PG_RETURN_VOID();
 }
+
+Datum
+pg_wal_replay_wait_status(PG_FUNCTION_ARGS)
+{
+	const char *result_string = "";
+
+	/* Process the result of WaitForLSNReplay(). */
+	switch (lastWaitLSNResult)
+	{
+		case WAIT_LSN_RESULT_SUCCESS:
+			result_string = "success";
+			break;
+
+		case WAIT_LSN_RESULT_TIMEOUT:
+			result_string = "timeout";
+			break;
+
+		case WAIT_LSN_RESULT_NOT_IN_RECOVERY:
+			result_string = "not in recovery";
+			break;
+	}
+
+	PG_RETURN_TEXT_P(cstring_to_text(result_string));
+}
diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index fd6b606ae90..fad857fffac 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -414,7 +414,9 @@ CREATE OR REPLACE FUNCTION
   json_populate_recordset(base anyelement, from_json json, use_json_as_text boolean DEFAULT false)
   RETURNS SETOF anyelement LANGUAGE internal STABLE ROWS 100  AS 'json_populate_recordset' PARALLEL SAFE;
 
-CREATE OR REPLACE PROCEDURE pg_wal_replay_wait(target_lsn pg_lsn, timeout int8 DEFAULT 0)
+CREATE OR REPLACE PROCEDURE pg_wal_replay_wait(target_lsn pg_lsn,
+											   timeout int8 DEFAULT 0,
+											   no_error bool DEFAULT false)
   LANGUAGE internal AS 'pg_wal_replay_wait';
 
 CREATE OR REPLACE FUNCTION pg_logical_slot_get_changes(
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index bcfb92528ee..e8c86d36bc2 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -6665,8 +6665,13 @@
 { oid => '8593',
   descr => 'wait for the target LSN to be replayed on standby with an optional timeout',
   proname => 'pg_wal_replay_wait', prokind => 'p', prorettype => 'void',
-  proargtypes => 'pg_lsn int8', proargnames => '{target_lsn,timeout}',
+  proargtypes => 'pg_lsn int8 bool', proargnames => '{target_lsn,timeout,no_error}',
   prosrc => 'pg_wal_replay_wait' },
+{ oid => '8594',
+  descr => 'the last result for pg_wal_replay_wait()',
+  proname => 'pg_wal_replay_wait_status', prorettype => 'text',
+  proargtypes => '',
+  prosrc => 'pg_wal_replay_wait_status' },
 
 { oid => '6224', descr => 'get resource managers loaded in system',
   proname => 'pg_get_wal_resource_managers', prorows => '50', proretset => 't',
diff --git a/src/test/recovery/t/043_wal_replay_wait.pl b/src/test/recovery/t/043_wal_replay_wait.pl
index 024f1fe6488..7766010ab79 100644
--- a/src/test/recovery/t/043_wal_replay_wait.pl
+++ b/src/test/recovery/t/043_wal_replay_wait.pl
@@ -77,6 +77,18 @@ $node_standby->psql(
 ok( $stderr =~ /timed out while waiting for target LSN/,
 	"get timeout on waiting for unreachable LSN");
 
+$output = $node_standby->safe_psql('postgres', qq[
+	CALL pg_wal_replay_wait('${lsn2}', 10, true);
+	SELECT pg_wal_replay_wait_status();]);
+ok( $output == "success",
+	"pg_wal_replay_wait_status() returns correct status after successful waiting");
+$output = $node_standby->psql(
+	'postgres',	qq[
+	CALL pg_wal_replay_wait('${lsn2}', 10, true);
+	SELECT pg_wal_replay_wait_status();]);
+ok( $output == "timeout",
+	"pg_wal_replay_wait_status() returns correct status after timeout");
+
 # 4. Check that pg_wal_replay_wait() triggers an error if called on primary,
 # within another function, or inside a transaction with an isolation level
 # higher than READ COMMITTED.
@@ -193,6 +205,13 @@ $node_standby->safe_psql('postgres', "CALL pg_wal_replay_wait('${lsn5}');");
 
 ok(1, 'wait for already replayed LSN exits immediately even after promotion');
 
+$output = $node_standby->psql(
+	'postgres',	qq[
+	CALL pg_wal_replay_wait('${lsn4}', 10, true);
+	SELECT pg_wal_replay_wait_status();]);
+ok( $output == "not in recovery",
+	"pg_wal_replay_wait_status() returns correct status after standby promotion");
+
 $node_standby->stop;
 $node_primary->stop;
 
-- 
2.39.5 (Apple Git-154)

v4-0002-Refactor-WaitForLSNReplay-to-return-the-result-of.patchapplication/octet-stream; name=v4-0002-Refactor-WaitForLSNReplay-to-return-the-result-of.patchDownload
From ce6af266dd7e403ad685e97282636509b4cf2875 Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Thu, 19 Sep 2024 14:48:44 +0300
Subject: [PATCH v4 2/3] Refactor WaitForLSNReplay() to return the result of
 waiting

Currently WaitForLSNReplay() immediatly throws an error if waiting for LSN
replay is not successful.  This commit teaches  WaitForLSNReplay() to return
the result of waiting, while making pg_wal_replay_wait() responsible for
throwing an appropriate error.

This is preparation for new procedure pg_wal_replay_wait_status(), which will
expose the waiting result to the user instead of throwing an error.
---
 src/backend/access/transam/xlogfuncs.c | 31 +++++++++++++++++++++++-
 src/backend/commands/waitlsn.c         | 33 +++++++++-----------------
 src/include/commands/waitlsn.h         | 12 +++++++++-
 3 files changed, 52 insertions(+), 24 deletions(-)

diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 3e3d2bb6189..4c79c264e13 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -759,6 +759,7 @@ pg_wal_replay_wait(PG_FUNCTION_ARGS)
 {
 	XLogRecPtr	target_lsn = PG_GETARG_LSN(0);
 	int64		timeout = PG_GETARG_INT64(1);
+	WaitLSNResult result;
 
 	if (timeout < 0)
 		ereport(ERROR,
@@ -799,7 +800,35 @@ pg_wal_replay_wait(PG_FUNCTION_ARGS)
 	 */
 	Assert(MyProc->xmin == InvalidTransactionId);
 
-	(void) WaitForLSNReplay(target_lsn, timeout);
+	result = WaitForLSNReplay(target_lsn, timeout);
+
+	/*
+	 * Process the result of WaitForLSNReplay().  Throw appropriate error if
+	 * needed.
+	 */
+	switch (result)
+	{
+		case WAIT_LSN_RESULT_SUCCESS:
+			/* Nothing to do on success */
+			break;
+
+		case WAIT_LSN_RESULT_TIMEOUT:
+			ereport(ERROR,
+					(errcode(ERRCODE_QUERY_CANCELED),
+					 errmsg("timed out while waiting for target LSN %X/%X to be replayed; current replay LSN %X/%X",
+							LSN_FORMAT_ARGS(target_lsn),
+							LSN_FORMAT_ARGS(GetXLogReplayRecPtr(NULL)))));
+			break;
+
+		case WAIT_LSN_RESULT_NOT_IN_RECOVERY:
+			ereport(ERROR,
+					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+					 errmsg("recovery is not in progress"),
+					 errdetail("Recovery ended before replaying target LSN %X/%X; last replay LSN %X/%X.",
+							   LSN_FORMAT_ARGS(target_lsn),
+							   LSN_FORMAT_ARGS(GetXLogReplayRecPtr(NULL)))));
+			break;
+	}
 
 	PG_RETURN_VOID();
 }
diff --git a/src/backend/commands/waitlsn.c b/src/backend/commands/waitlsn.c
index 7f3cd62e2a2..237351316a8 100644
--- a/src/backend/commands/waitlsn.c
+++ b/src/backend/commands/waitlsn.c
@@ -217,7 +217,7 @@ WaitLSNCleanup(void)
  * Wait using MyLatch till the given LSN is replayed, the postmaster dies or
  * timeout happens.
  */
-void
+WaitLSNResult
 WaitForLSNReplay(XLogRecPtr targetLSN, int64 timeout)
 {
 	XLogRecPtr	currentLSN;
@@ -240,17 +240,14 @@ WaitForLSNReplay(XLogRecPtr targetLSN, int64 timeout)
 		 * check the last replay LSN before reporting an error.
 		 */
 		if (targetLSN <= GetXLogReplayRecPtr(NULL))
-			return;
-		ereport(ERROR,
-				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
-				 errmsg("recovery is not in progress"),
-				 errhint("Waiting for LSN can only be executed during recovery.")));
+			return WAIT_LSN_RESULT_SUCCESS;
+		return WAIT_LSN_RESULT_NOT_IN_RECOVERY;
 	}
 	else
 	{
 		/* If target LSN is already replayed, exit immediately */
 		if (targetLSN <= GetXLogReplayRecPtr(NULL))
-			return;
+			return WAIT_LSN_RESULT_SUCCESS;
 	}
 
 	if (timeout > 0)
@@ -276,17 +273,13 @@ WaitForLSNReplay(XLogRecPtr targetLSN, int64 timeout)
 		{
 			/*
 			 * Recovery was ended, but recheck if target LSN was already
-			 * replayed.
+			 * replayed.  See the comment regarding deleteLSNWaiter() below.
 			 */
+			deleteLSNWaiter();
 			currentLSN = GetXLogReplayRecPtr(NULL);
 			if (targetLSN <= currentLSN)
-				return;
-			ereport(ERROR,
-					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
-					 errmsg("recovery is not in progress"),
-					 errdetail("Recovery ended before replaying target LSN %X/%X; last replay LSN %X/%X.",
-							   LSN_FORMAT_ARGS(targetLSN),
-							   LSN_FORMAT_ARGS(currentLSN))));
+				return WAIT_LSN_RESULT_SUCCESS;
+			return WAIT_LSN_RESULT_NOT_IN_RECOVERY;
 		}
 		else
 		{
@@ -338,11 +331,7 @@ WaitForLSNReplay(XLogRecPtr targetLSN, int64 timeout)
 	 * If we didn't reach the target LSN, we must be exited by timeout.
 	 */
 	if (targetLSN > currentLSN)
-	{
-		ereport(ERROR,
-				(errcode(ERRCODE_QUERY_CANCELED),
-				 errmsg("timed out while waiting for target LSN %X/%X to be replayed; current replay LSN %X/%X",
-						LSN_FORMAT_ARGS(targetLSN),
-						LSN_FORMAT_ARGS(currentLSN))));
-	}
+		return WAIT_LSN_RESULT_TIMEOUT;
+
+	return WAIT_LSN_RESULT_SUCCESS;
 }
diff --git a/src/include/commands/waitlsn.h b/src/include/commands/waitlsn.h
index bb5ac858dcc..edf4a793e63 100644
--- a/src/include/commands/waitlsn.h
+++ b/src/include/commands/waitlsn.h
@@ -70,12 +70,22 @@ typedef struct WaitLSNState
 	WaitLSNProcInfo procInfos[FLEXIBLE_ARRAY_MEMBER];
 } WaitLSNState;
 
+/*
+ * Result statuses for WaitForLSNReplay().
+ */
+typedef enum
+{
+	WAIT_LSN_RESULT_SUCCESS, /* Target LSN is reached */
+	WAIT_LSN_RESULT_TIMEOUT, /* Timeout occured */
+	WAIT_LSN_RESULT_NOT_IN_RECOVERY, /* Recovery ended before or during our wait */
+} WaitLSNResult;
+
 extern PGDLLIMPORT WaitLSNState *waitLSNState;
 
 extern Size WaitLSNShmemSize(void);
 extern void WaitLSNShmemInit(void);
 extern void WaitLSNSetLatches(XLogRecPtr currentLSN);
 extern void WaitLSNCleanup(void);
-extern void WaitForLSNReplay(XLogRecPtr targetLSN, int64 timeout);
+extern WaitLSNResult WaitForLSNReplay(XLogRecPtr targetLSN, int64 timeout);
 
 #endif							/* WAIT_LSN_H */
-- 
2.39.5 (Apple Git-154)

v4-0001-Make-WaitForLSNReplay-issue-FATAL-on-postmaster-d.patchapplication/octet-stream; name=v4-0001-Make-WaitForLSNReplay-issue-FATAL-on-postmaster-d.patchDownload
From c7eadbc5871630ddad399f0eb3d2292d4b3f31ef Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Sun, 29 Sep 2024 12:29:31 +0300
Subject: [PATCH v4 1/3] Make WaitForLSNReplay() issue FATAL on postmaster
 death

Reported-by: Michael Paquier
Discussion: https://postgr.es/m/ZvY2C8N4ZqgCFaLu%40paquier.xyz
---
 src/backend/commands/waitlsn.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/src/backend/commands/waitlsn.c b/src/backend/commands/waitlsn.c
index 501938f4330..7f3cd62e2a2 100644
--- a/src/backend/commands/waitlsn.c
+++ b/src/backend/commands/waitlsn.c
@@ -222,7 +222,7 @@ WaitForLSNReplay(XLogRecPtr targetLSN, int64 timeout)
 {
 	XLogRecPtr	currentLSN;
 	TimestampTz endtime = 0;
-	int			wake_events = WL_LATCH_SET | WL_EXIT_ON_PM_DEATH;
+	int			wake_events = WL_LATCH_SET | WL_POSTMASTER_DEATH;
 
 	/* Shouldn't be called when shmem isn't initialized */
 	Assert(waitLSNState);
@@ -313,6 +313,16 @@ WaitForLSNReplay(XLogRecPtr targetLSN, int64 timeout)
 		rc = WaitLatch(MyLatch, wake_events, delay_ms,
 					   WAIT_EVENT_WAIT_FOR_WAL_REPLAY);
 
+		/*
+		 * Emergency bailout if postmaster has died.  This is to avoid the
+		 * necessity for manual cleanup of all postmaster children.
+		 */
+		if (rc & WL_POSTMASTER_DEATH)
+			ereport(FATAL,
+					(errcode(ERRCODE_ADMIN_SHUTDOWN),
+					 errmsg("terminating connection due to unexpected postmaster exit"),
+					 errcontext("while waiting for LSN replay")));
+
 		if (rc & WL_LATCH_SET)
 			ResetLatch(MyLatch);
 	}
-- 
2.39.5 (Apple Git-154)

#19Peter Eisentraut
peter@eisentraut.org
In reply to: Alexander Korotkov (#5)
Re: pgsql: Implement pg_wal_replay_wait() stored procedure

On 02.09.24 01:55, Alexander Korotkov wrote:

On Mon, Sep 2, 2024 at 2:28 AM Michael Paquier <michael@paquier.xyz> wrote:

On Sun, Sep 01, 2024 at 10:35:27PM +0300, Alexander Korotkov wrote:

This path hasn't changes since the patch revision when it was a
utility command. I agree that this doesn't look like proper path for
stored procedure. But I don't think src/backend/utils/adt is
appropriate path either, because it's not really about data type.
pg_wal_replay_wait() looks a good neighbor for
pg_wal_replay_pause()/pg_wal_replay_resume() and other WAL-related
functions. So, what about moving it to src/backend/access/transam?

Moving the new function to xlogfuncs.c while publishing
WaitForLSNReplay() makes sense to me.

Thank you for proposal. I like this.

Could you, please, check the attached patch?

We still have stuff in src/backend/commands/waitlsn.c that is nothing
like a "command". You have moved some stuff elsewhere, but what are you
planning to do with the rest?

#20Alexander Korotkov
aekorotkov@gmail.com
In reply to: Peter Eisentraut (#19)
Re: pgsql: Implement pg_wal_replay_wait() stored procedure

On Wed, Oct 16, 2024 at 10:35 PM Peter Eisentraut <peter@eisentraut.org> wrote:

On 02.09.24 01:55, Alexander Korotkov wrote:

On Mon, Sep 2, 2024 at 2:28 AM Michael Paquier <michael@paquier.xyz> wrote:

On Sun, Sep 01, 2024 at 10:35:27PM +0300, Alexander Korotkov wrote:

This path hasn't changes since the patch revision when it was a
utility command. I agree that this doesn't look like proper path for
stored procedure. But I don't think src/backend/utils/adt is
appropriate path either, because it's not really about data type.
pg_wal_replay_wait() looks a good neighbor for
pg_wal_replay_pause()/pg_wal_replay_resume() and other WAL-related
functions. So, what about moving it to src/backend/access/transam?

Moving the new function to xlogfuncs.c while publishing
WaitForLSNReplay() makes sense to me.

Thank you for proposal. I like this.

Could you, please, check the attached patch?

We still have stuff in src/backend/commands/waitlsn.c that is nothing
like a "command". You have moved some stuff elsewhere, but what are you
planning to do with the rest?

Thank you for spotting this another time. What about moving that
somewhere like src/backend/access/transam/xlogwait.c ?

------
Regards,
Alexander Korotkov
Supabase

#21Alexander Korotkov
aekorotkov@gmail.com
In reply to: Alexander Korotkov (#20)
4 attachment(s)
Re: pgsql: Implement pg_wal_replay_wait() stored procedure

On Wed, Oct 16, 2024 at 11:20 PM Alexander Korotkov
<aekorotkov@gmail.com> wrote:

On Wed, Oct 16, 2024 at 10:35 PM Peter Eisentraut <peter@eisentraut.org> wrote:

On 02.09.24 01:55, Alexander Korotkov wrote:

On Mon, Sep 2, 2024 at 2:28 AM Michael Paquier <michael@paquier.xyz> wrote:

On Sun, Sep 01, 2024 at 10:35:27PM +0300, Alexander Korotkov wrote:

This path hasn't changes since the patch revision when it was a
utility command. I agree that this doesn't look like proper path for
stored procedure. But I don't think src/backend/utils/adt is
appropriate path either, because it's not really about data type.
pg_wal_replay_wait() looks a good neighbor for
pg_wal_replay_pause()/pg_wal_replay_resume() and other WAL-related
functions. So, what about moving it to src/backend/access/transam?

Moving the new function to xlogfuncs.c while publishing
WaitForLSNReplay() makes sense to me.

Thank you for proposal. I like this.

Could you, please, check the attached patch?

We still have stuff in src/backend/commands/waitlsn.c that is nothing
like a "command". You have moved some stuff elsewhere, but what are you
planning to do with the rest?

Thank you for spotting this another time. What about moving that
somewhere like src/backend/access/transam/xlogwait.c ?

Implemented this as a separate patch (0001). Also rebased other
pending patches on that. 0004 now revises header comment of
xlogwait.c with new procedure signature.

------
Regards,
Alexander Korotkov
Supabase

Attachments:

v5-0002-Make-WaitForLSNReplay-issue-FATAL-on-postmaster-d.patchapplication/octet-stream; name=v5-0002-Make-WaitForLSNReplay-issue-FATAL-on-postmaster-d.patchDownload
From 6614c6d5dc8907f5ca5e683edab5f7703de78dfd Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Sun, 29 Sep 2024 12:29:31 +0300
Subject: [PATCH v5 2/4] Make WaitForLSNReplay() issue FATAL on postmaster
 death

Reported-by: Michael Paquier
Discussion: https://postgr.es/m/ZvY2C8N4ZqgCFaLu%40paquier.xyz
---
 src/backend/access/transam/xlogwait.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/transam/xlogwait.c b/src/backend/access/transam/xlogwait.c
index eef58ce69ce..353b7854dc8 100644
--- a/src/backend/access/transam/xlogwait.c
+++ b/src/backend/access/transam/xlogwait.c
@@ -222,7 +222,7 @@ WaitForLSNReplay(XLogRecPtr targetLSN, int64 timeout)
 {
 	XLogRecPtr	currentLSN;
 	TimestampTz endtime = 0;
-	int			wake_events = WL_LATCH_SET | WL_EXIT_ON_PM_DEATH;
+	int			wake_events = WL_LATCH_SET | WL_POSTMASTER_DEATH;
 
 	/* Shouldn't be called when shmem isn't initialized */
 	Assert(waitLSNState);
@@ -313,6 +313,16 @@ WaitForLSNReplay(XLogRecPtr targetLSN, int64 timeout)
 		rc = WaitLatch(MyLatch, wake_events, delay_ms,
 					   WAIT_EVENT_WAIT_FOR_WAL_REPLAY);
 
+		/*
+		 * Emergency bailout if postmaster has died.  This is to avoid the
+		 * necessity for manual cleanup of all postmaster children.
+		 */
+		if (rc & WL_POSTMASTER_DEATH)
+			ereport(FATAL,
+					(errcode(ERRCODE_ADMIN_SHUTDOWN),
+					 errmsg("terminating connection due to unexpected postmaster exit"),
+					 errcontext("while waiting for LSN replay")));
+
 		if (rc & WL_LATCH_SET)
 			ResetLatch(MyLatch);
 	}
-- 
2.39.5 (Apple Git-154)

v5-0001-Move-LSN-waiting-declarations-and-definitions-to-.patchapplication/octet-stream; name=v5-0001-Move-LSN-waiting-declarations-and-definitions-to-.patchDownload
From 7fe904de12d8f00fa5535c834c63f0285017ab20 Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Sun, 20 Oct 2024 20:05:01 +0300
Subject: [PATCH v5 1/4] Move LSN waiting declarations and definitions to
 better place

3c5db1d6b implemented the pg_wal_replay_wait() stored procedure.  Due to
the patch development history, the implementation resided in
src/backend/commands/waitlsn.c (src/include/commands/waitlsn.h for headers).

014f9f34d moved pg_wal_replay_wait() itself to
src/backend/access/transam/xlogfuncs.c near to the WAL-manipulation functions.
But most of the implementation stayed in place.

The code in src/backend/commands/waitlsn.c has nothing to do with commands,
but is related to WAL.  So, this commit moves this code into
src/backend/access/transam/xlogwait.c (src/include/access/xlogwait.h for
headers).

Reported-by: Peter Eisentraut
Discussion: https://postgr.es/m/18c0fa64-0475-415e-a1bd-665d922c5201%40eisentraut.org
---
 src/backend/access/transam/Makefile                    |  3 ++-
 src/backend/access/transam/meson.build                 |  1 +
 src/backend/access/transam/xact.c                      |  2 +-
 src/backend/access/transam/xlog.c                      |  2 +-
 src/backend/access/transam/xlogfuncs.c                 |  2 +-
 src/backend/access/transam/xlogrecovery.c              |  2 +-
 .../{commands/waitlsn.c => access/transam/xlogwait.c}  |  6 +++---
 src/backend/commands/Makefile                          |  3 +--
 src/backend/commands/meson.build                       |  1 -
 src/backend/storage/ipc/ipci.c                         |  2 +-
 src/backend/storage/lmgr/proc.c                        |  2 +-
 src/include/{commands/waitlsn.h => access/xlogwait.h}  | 10 +++++-----
 12 files changed, 18 insertions(+), 18 deletions(-)
 rename src/backend/{commands/waitlsn.c => access/transam/xlogwait.c} (99%)
 rename src/include/{commands/waitlsn.h => access/xlogwait.h} (94%)

diff --git a/src/backend/access/transam/Makefile b/src/backend/access/transam/Makefile
index 661c55a9db7..a32f473e0a2 100644
--- a/src/backend/access/transam/Makefile
+++ b/src/backend/access/transam/Makefile
@@ -36,7 +36,8 @@ OBJS = \
 	xlogreader.o \
 	xlogrecovery.o \
 	xlogstats.o \
-	xlogutils.o
+	xlogutils.o \
+	xlogwait.o
 
 include $(top_srcdir)/src/backend/common.mk
 
diff --git a/src/backend/access/transam/meson.build b/src/backend/access/transam/meson.build
index 8a3522557cd..91d258f9df1 100644
--- a/src/backend/access/transam/meson.build
+++ b/src/backend/access/transam/meson.build
@@ -24,6 +24,7 @@ backend_sources += files(
   'xlogrecovery.c',
   'xlogstats.c',
   'xlogutils.c',
+  'xlogwait.c',
 )
 
 # used by frontend programs to build a frontend xlogreader
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 87700c7c5c7..cbeced71a02 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -31,6 +31,7 @@
 #include "access/xloginsert.h"
 #include "access/xlogrecovery.h"
 #include "access/xlogutils.h"
+#include "access/xlogwait.h"
 #include "catalog/index.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_enum.h"
@@ -38,7 +39,6 @@
 #include "commands/async.h"
 #include "commands/tablecmds.h"
 #include "commands/trigger.h"
-#include "commands/waitlsn.h"
 #include "common/pg_prng.h"
 #include "executor/spi.h"
 #include "libpq/be-fsstubs.h"
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 9102c8d772e..ad9b0b612f4 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -62,11 +62,11 @@
 #include "access/xlogreader.h"
 #include "access/xlogrecovery.h"
 #include "access/xlogutils.h"
+#include "access/xlogwait.h"
 #include "backup/basebackup.h"
 #include "catalog/catversion.h"
 #include "catalog/pg_control.h"
 #include "catalog/pg_database.h"
-#include "commands/waitlsn.h"
 #include "common/controldata_utils.h"
 #include "common/file_utils.h"
 #include "executor/instrument.h"
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 3e3d2bb6189..cbf84ef7d8f 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -22,8 +22,8 @@
 #include "access/xlog_internal.h"
 #include "access/xlogbackup.h"
 #include "access/xlogrecovery.h"
+#include "access/xlogwait.h"
 #include "catalog/pg_type.h"
-#include "commands/waitlsn.h"
 #include "funcapi.h"
 #include "miscadmin.h"
 #include "pgstat.h"
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index 320b14add1a..31caa49d6c3 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -40,10 +40,10 @@
 #include "access/xlogreader.h"
 #include "access/xlogrecovery.h"
 #include "access/xlogutils.h"
+#include "access/xlogwait.h"
 #include "backup/basebackup.h"
 #include "catalog/pg_control.h"
 #include "commands/tablespace.h"
-#include "commands/waitlsn.h"
 #include "common/file_utils.h"
 #include "miscadmin.h"
 #include "pgstat.h"
diff --git a/src/backend/commands/waitlsn.c b/src/backend/access/transam/xlogwait.c
similarity index 99%
rename from src/backend/commands/waitlsn.c
rename to src/backend/access/transam/xlogwait.c
index 501938f4330..eef58ce69ce 100644
--- a/src/backend/commands/waitlsn.c
+++ b/src/backend/access/transam/xlogwait.c
@@ -1,13 +1,13 @@
 /*-------------------------------------------------------------------------
  *
- * waitlsn.c
+ * xlogwait.c
  *	  Implements waiting for the given replay LSN, which is used in
  *	  CALL pg_wal_replay_wait(target_lsn pg_lsn, timeout float8).
  *
  * Copyright (c) 2024, PostgreSQL Global Development Group
  *
  * IDENTIFICATION
- *	  src/backend/commands/waitlsn.c
+ *	  src/backend/access/transam/xlogwait.c
  *
  *-------------------------------------------------------------------------
  */
@@ -20,7 +20,7 @@
 #include "pgstat.h"
 #include "access/xlog.h"
 #include "access/xlogrecovery.h"
-#include "commands/waitlsn.h"
+#include "access/xlogwait.h"
 #include "funcapi.h"
 #include "miscadmin.h"
 #include "storage/latch.h"
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index cede90c3b98..48f7348f91c 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -61,7 +61,6 @@ OBJS = \
 	vacuum.o \
 	vacuumparallel.o \
 	variable.o \
-	view.o \
-	waitlsn.o
+	view.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/commands/meson.build b/src/backend/commands/meson.build
index 7549be5dc3b..6dd00a4abde 100644
--- a/src/backend/commands/meson.build
+++ b/src/backend/commands/meson.build
@@ -50,5 +50,4 @@ backend_sources += files(
   'vacuumparallel.c',
   'variable.c',
   'view.c',
-  'waitlsn.c',
 )
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 10fc18f2529..9ff687045f4 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -24,8 +24,8 @@
 #include "access/twophase.h"
 #include "access/xlogprefetcher.h"
 #include "access/xlogrecovery.h"
+#include "access/xlogwait.h"
 #include "commands/async.h"
-#include "commands/waitlsn.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index eaf3916f282..260e7029f50 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -36,7 +36,7 @@
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xlogutils.h"
-#include "commands/waitlsn.h"
+#include "access/xlogwait.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
diff --git a/src/include/commands/waitlsn.h b/src/include/access/xlogwait.h
similarity index 94%
rename from src/include/commands/waitlsn.h
rename to src/include/access/xlogwait.h
index bb5ac858dcc..31e208cb7ad 100644
--- a/src/include/commands/waitlsn.h
+++ b/src/include/access/xlogwait.h
@@ -1,16 +1,16 @@
 /*-------------------------------------------------------------------------
  *
- * waitlsn.h
+ * xlogwait.h
  *	  Declarations for LSN replay waiting routines.
  *
  * Copyright (c) 2024, PostgreSQL Global Development Group
  *
- * src/include/commands/waitlsn.h
+ * src/include/access/xlogwait.h
  *
  *-------------------------------------------------------------------------
  */
-#ifndef WAIT_LSN_H
-#define WAIT_LSN_H
+#ifndef XLOG_WAIT_H
+#define XLOG_WAIT_H
 
 #include "lib/pairingheap.h"
 #include "postgres.h"
@@ -78,4 +78,4 @@ extern void WaitLSNSetLatches(XLogRecPtr currentLSN);
 extern void WaitLSNCleanup(void);
 extern void WaitForLSNReplay(XLogRecPtr targetLSN, int64 timeout);
 
-#endif							/* WAIT_LSN_H */
+#endif							/* XLOG_WAIT_H */
-- 
2.39.5 (Apple Git-154)

v5-0003-Refactor-WaitForLSNReplay-to-return-the-result-of.patchapplication/octet-stream; name=v5-0003-Refactor-WaitForLSNReplay-to-return-the-result-of.patchDownload
From 1f9c879800c0e1e591d490b1f9a1282f013d7696 Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Thu, 19 Sep 2024 14:48:44 +0300
Subject: [PATCH v5 3/4] Refactor WaitForLSNReplay() to return the result of
 waiting

Currently WaitForLSNReplay() immediatly throws an error if waiting for LSN
replay is not successful.  This commit teaches  WaitForLSNReplay() to return
the result of waiting, while making pg_wal_replay_wait() responsible for
throwing an appropriate error.

This is preparation for new procedure pg_wal_replay_wait_status(), which will
expose the waiting result to the user instead of throwing an error.
---
 src/backend/access/transam/xlogfuncs.c | 31 +++++++++++++++++++++++-
 src/backend/access/transam/xlogwait.c  | 33 +++++++++-----------------
 src/include/access/xlogwait.h          | 12 +++++++++-
 3 files changed, 52 insertions(+), 24 deletions(-)

diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index cbf84ef7d8f..ddca78d3717 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -759,6 +759,7 @@ pg_wal_replay_wait(PG_FUNCTION_ARGS)
 {
 	XLogRecPtr	target_lsn = PG_GETARG_LSN(0);
 	int64		timeout = PG_GETARG_INT64(1);
+	WaitLSNResult result;
 
 	if (timeout < 0)
 		ereport(ERROR,
@@ -799,7 +800,35 @@ pg_wal_replay_wait(PG_FUNCTION_ARGS)
 	 */
 	Assert(MyProc->xmin == InvalidTransactionId);
 
-	(void) WaitForLSNReplay(target_lsn, timeout);
+	result = WaitForLSNReplay(target_lsn, timeout);
+
+	/*
+	 * Process the result of WaitForLSNReplay().  Throw appropriate error if
+	 * needed.
+	 */
+	switch (result)
+	{
+		case WAIT_LSN_RESULT_SUCCESS:
+			/* Nothing to do on success */
+			break;
+
+		case WAIT_LSN_RESULT_TIMEOUT:
+			ereport(ERROR,
+					(errcode(ERRCODE_QUERY_CANCELED),
+					 errmsg("timed out while waiting for target LSN %X/%X to be replayed; current replay LSN %X/%X",
+							LSN_FORMAT_ARGS(target_lsn),
+							LSN_FORMAT_ARGS(GetXLogReplayRecPtr(NULL)))));
+			break;
+
+		case WAIT_LSN_RESULT_NOT_IN_RECOVERY:
+			ereport(ERROR,
+					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+					 errmsg("recovery is not in progress"),
+					 errdetail("Recovery ended before replaying target LSN %X/%X; last replay LSN %X/%X.",
+							   LSN_FORMAT_ARGS(target_lsn),
+							   LSN_FORMAT_ARGS(GetXLogReplayRecPtr(NULL)))));
+			break;
+	}
 
 	PG_RETURN_VOID();
 }
diff --git a/src/backend/access/transam/xlogwait.c b/src/backend/access/transam/xlogwait.c
index 353b7854dc8..58fb10aa5a8 100644
--- a/src/backend/access/transam/xlogwait.c
+++ b/src/backend/access/transam/xlogwait.c
@@ -217,7 +217,7 @@ WaitLSNCleanup(void)
  * Wait using MyLatch till the given LSN is replayed, the postmaster dies or
  * timeout happens.
  */
-void
+WaitLSNResult
 WaitForLSNReplay(XLogRecPtr targetLSN, int64 timeout)
 {
 	XLogRecPtr	currentLSN;
@@ -240,17 +240,14 @@ WaitForLSNReplay(XLogRecPtr targetLSN, int64 timeout)
 		 * check the last replay LSN before reporting an error.
 		 */
 		if (targetLSN <= GetXLogReplayRecPtr(NULL))
-			return;
-		ereport(ERROR,
-				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
-				 errmsg("recovery is not in progress"),
-				 errhint("Waiting for LSN can only be executed during recovery.")));
+			return WAIT_LSN_RESULT_SUCCESS;
+		return WAIT_LSN_RESULT_NOT_IN_RECOVERY;
 	}
 	else
 	{
 		/* If target LSN is already replayed, exit immediately */
 		if (targetLSN <= GetXLogReplayRecPtr(NULL))
-			return;
+			return WAIT_LSN_RESULT_SUCCESS;
 	}
 
 	if (timeout > 0)
@@ -276,17 +273,13 @@ WaitForLSNReplay(XLogRecPtr targetLSN, int64 timeout)
 		{
 			/*
 			 * Recovery was ended, but recheck if target LSN was already
-			 * replayed.
+			 * replayed.  See the comment regarding deleteLSNWaiter() below.
 			 */
+			deleteLSNWaiter();
 			currentLSN = GetXLogReplayRecPtr(NULL);
 			if (targetLSN <= currentLSN)
-				return;
-			ereport(ERROR,
-					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
-					 errmsg("recovery is not in progress"),
-					 errdetail("Recovery ended before replaying target LSN %X/%X; last replay LSN %X/%X.",
-							   LSN_FORMAT_ARGS(targetLSN),
-							   LSN_FORMAT_ARGS(currentLSN))));
+				return WAIT_LSN_RESULT_SUCCESS;
+			return WAIT_LSN_RESULT_NOT_IN_RECOVERY;
 		}
 		else
 		{
@@ -338,11 +331,7 @@ WaitForLSNReplay(XLogRecPtr targetLSN, int64 timeout)
 	 * If we didn't reach the target LSN, we must be exited by timeout.
 	 */
 	if (targetLSN > currentLSN)
-	{
-		ereport(ERROR,
-				(errcode(ERRCODE_QUERY_CANCELED),
-				 errmsg("timed out while waiting for target LSN %X/%X to be replayed; current replay LSN %X/%X",
-						LSN_FORMAT_ARGS(targetLSN),
-						LSN_FORMAT_ARGS(currentLSN))));
-	}
+		return WAIT_LSN_RESULT_TIMEOUT;
+
+	return WAIT_LSN_RESULT_SUCCESS;
 }
diff --git a/src/include/access/xlogwait.h b/src/include/access/xlogwait.h
index 31e208cb7ad..bbc67439855 100644
--- a/src/include/access/xlogwait.h
+++ b/src/include/access/xlogwait.h
@@ -70,12 +70,22 @@ typedef struct WaitLSNState
 	WaitLSNProcInfo procInfos[FLEXIBLE_ARRAY_MEMBER];
 } WaitLSNState;
 
+/*
+ * Result statuses for WaitForLSNReplay().
+ */
+typedef enum
+{
+	WAIT_LSN_RESULT_SUCCESS, /* Target LSN is reached */
+	WAIT_LSN_RESULT_TIMEOUT, /* Timeout occured */
+	WAIT_LSN_RESULT_NOT_IN_RECOVERY, /* Recovery ended before or during our wait */
+} WaitLSNResult;
+
 extern PGDLLIMPORT WaitLSNState *waitLSNState;
 
 extern Size WaitLSNShmemSize(void);
 extern void WaitLSNShmemInit(void);
 extern void WaitLSNSetLatches(XLogRecPtr currentLSN);
 extern void WaitLSNCleanup(void);
-extern void WaitForLSNReplay(XLogRecPtr targetLSN, int64 timeout);
+extern WaitLSNResult WaitForLSNReplay(XLogRecPtr targetLSN, int64 timeout);
 
 #endif							/* XLOG_WAIT_H */
-- 
2.39.5 (Apple Git-154)

v5-0004-Add-no_error-argument-to-pg_wal_replay_wait.patchapplication/octet-stream; name=v5-0004-Add-no_error-argument-to-pg_wal_replay_wait.patchDownload
From 9225fdd5791ac7538a92cbf6e7b4564fec68fddd Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Thu, 19 Sep 2024 15:34:18 +0300
Subject: [PATCH v5 4/4] Add 'no_error' argument to pg_wal_replay_wait()

This argument allow to skip throwing an error.  Instead the result status
can be obtained using pg_wal_replay_wait_status() function.
---
 doc/src/sgml/func.sgml                     | 56 +++++++++++++++++++---
 src/backend/access/transam/xlogfuncs.c     | 38 +++++++++++++--
 src/backend/access/transam/xlogwait.c      |  3 +-
 src/backend/catalog/system_functions.sql   |  4 +-
 src/include/catalog/pg_proc.dat            |  7 ++-
 src/test/recovery/t/043_wal_replay_wait.pl | 19 ++++++++
 6 files changed, 114 insertions(+), 13 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index ad663c94d77..1b68f284e5e 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -28989,12 +28989,15 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
    </para>
 
    <table id="recovery-synchronization-procedure-table">
-    <title>Recovery Synchronization Procedure</title>
+    <title>Recovery Synchronization Procedure and Function</title>
     <tgroup cols="1">
      <thead>
       <row>
        <entry role="func_table_entry"><para role="func_signature">
-        Procedure
+        Procedure or Function
+       </para>
+       <para>
+        Type
        </para>
        <para>
         Description
@@ -29010,8 +29013,11 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
         </indexterm>
         <function>pg_wal_replay_wait</function> (
           <parameter>target_lsn</parameter> <type>pg_lsn</type>,
-          <parameter>timeout</parameter> <type>bigint</type> <literal>DEFAULT</literal> <literal>0</literal>)
-        <returnvalue>void</returnvalue>
+          <parameter>timeout</parameter> <type>bigint</type> <literal>DEFAULT</literal> <literal>0</literal>,
+          <parameter>no_error</parameter> <type>bool</type> <literal>DEFAULT</literal> <literal>false</literal>)
+       </para>
+       <para>
+        Procedure
        </para>
        <para>
         Waits until recovery replays <literal>target_lsn</literal>.
@@ -29022,7 +29028,30 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
         procedure waits until <literal>target_lsn</literal> is reached or
         the specified <parameter>timeout</parameter> has elapsed.
         On timeout, or if the server is promoted before
-        <literal>target_lsn</literal> is reached, an error is emitted.
+        <literal>target_lsn</literal> is reached, an error is emitted,
+        as soon as <parameter>no_error</parameter> is false.
+        If <parameter>no_error</parameter> is set to true, then the procedure
+        doesn't throw errors.  The last result status could be read
+        with <function>pg_wal_replay_wait_status</function>.
+       </para></entry>
+      </row>
+
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_wal_replay_wait_status</primary>
+        </indexterm>
+        <function>pg_wal_replay_wait_status</function> ()
+        <returnvalue>text</returnvalue>
+       </para>
+       <para>
+        Function
+       </para>
+       <para>
+        Returns the last result status for
+        <function>pg_wal_replay_wait</function> procedure.  The possible
+        values are <literal>success</literal>, <literal>timeout</literal>,
+        and <literal>not in recovery</literal>.
        </para></entry>
       </row>
      </tbody>
@@ -29044,7 +29073,8 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
    <para>
     <function>pg_wal_replay_wait</function> should be called on standby.
     If a user calls <function>pg_wal_replay_wait</function> on primary, it
-    will error out.  However, if <function>pg_wal_replay_wait</function> is
+    will error out as soon as <parameter>no_error</parameter> is false.
+    However, if <function>pg_wal_replay_wait</function> is
     called on primary promoted from standby and <literal>target_lsn</literal>
     was already replayed, then <function>pg_wal_replay_wait</function> just
     exits immediately.
@@ -29090,6 +29120,20 @@ postgres=# CALL pg_wal_replay_wait('0/306EE20', 100);
 ERROR:  timed out while waiting for target LSN 0/306EE20 to be replayed; current replay LSN 0/306EA60
     </programlisting>
 
+   The same example uses <function>pg_wal_replay_wait</function> with
+   <parameter>no_error</parameter> set to true.  In this case, the result
+   status must be read with <function>pg_wal_replay_wait_status</function>.
+
+   <programlisting>
+postgres=# CALL pg_wal_replay_wait('0/306EE20', 100, true);
+CALL
+postgres=# SELECT pg_wal_replay_wait_status();
+ pg_wal_replay_wait_status
+---------------------------
+ timeout
+(1 row)
+    </programlisting>
+
    </para>
 
    <para>
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index ddca78d3717..c46271f0364 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -751,15 +751,18 @@ pg_promote(PG_FUNCTION_ARGS)
 	PG_RETURN_BOOL(false);
 }
 
+static WaitLSNResult lastWaitLSNResult = WAIT_LSN_RESULT_SUCCESS;
+
 /*
- * Waits until recovery replays the target LSN with optional timeout.
+ * Waits until recovery replays the target LSN with optional timeout.  Throw
+ * an error on failure.
  */
 Datum
 pg_wal_replay_wait(PG_FUNCTION_ARGS)
 {
 	XLogRecPtr	target_lsn = PG_GETARG_LSN(0);
 	int64		timeout = PG_GETARG_INT64(1);
-	WaitLSNResult result;
+	bool		no_error = PG_GETARG_BOOL(2);
 
 	if (timeout < 0)
 		ereport(ERROR,
@@ -800,13 +803,16 @@ pg_wal_replay_wait(PG_FUNCTION_ARGS)
 	 */
 	Assert(MyProc->xmin == InvalidTransactionId);
 
-	result = WaitForLSNReplay(target_lsn, timeout);
+	lastWaitLSNResult = WaitForLSNReplay(target_lsn, timeout);
+
+	if (no_error)
+		PG_RETURN_VOID();
 
 	/*
 	 * Process the result of WaitForLSNReplay().  Throw appropriate error if
 	 * needed.
 	 */
-	switch (result)
+	switch (lastWaitLSNResult)
 	{
 		case WAIT_LSN_RESULT_SUCCESS:
 			/* Nothing to do on success */
@@ -832,3 +838,27 @@ pg_wal_replay_wait(PG_FUNCTION_ARGS)
 
 	PG_RETURN_VOID();
 }
+
+Datum
+pg_wal_replay_wait_status(PG_FUNCTION_ARGS)
+{
+	const char *result_string = "";
+
+	/* Process the result of WaitForLSNReplay(). */
+	switch (lastWaitLSNResult)
+	{
+		case WAIT_LSN_RESULT_SUCCESS:
+			result_string = "success";
+			break;
+
+		case WAIT_LSN_RESULT_TIMEOUT:
+			result_string = "timeout";
+			break;
+
+		case WAIT_LSN_RESULT_NOT_IN_RECOVERY:
+			result_string = "not in recovery";
+			break;
+	}
+
+	PG_RETURN_TEXT_P(cstring_to_text(result_string));
+}
diff --git a/src/backend/access/transam/xlogwait.c b/src/backend/access/transam/xlogwait.c
index 58fb10aa5a8..8860a9c73da 100644
--- a/src/backend/access/transam/xlogwait.c
+++ b/src/backend/access/transam/xlogwait.c
@@ -2,7 +2,8 @@
  *
  * xlogwait.c
  *	  Implements waiting for the given replay LSN, which is used in
- *	  CALL pg_wal_replay_wait(target_lsn pg_lsn, timeout float8).
+ *	  CALL pg_wal_replay_wait(target_lsn pg_lsn,
+ *							  timeout float8, no_error bool).
  *
  * Copyright (c) 2024, PostgreSQL Global Development Group
  *
diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index 92b6aefe7aa..3fd1dfa945e 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -414,7 +414,9 @@ CREATE OR REPLACE FUNCTION
   json_populate_recordset(base anyelement, from_json json, use_json_as_text boolean DEFAULT false)
   RETURNS SETOF anyelement LANGUAGE internal STABLE ROWS 100  AS 'json_populate_recordset' PARALLEL SAFE;
 
-CREATE OR REPLACE PROCEDURE pg_wal_replay_wait(target_lsn pg_lsn, timeout int8 DEFAULT 0)
+CREATE OR REPLACE PROCEDURE pg_wal_replay_wait(target_lsn pg_lsn,
+											   timeout int8 DEFAULT 0,
+											   no_error bool DEFAULT false)
   LANGUAGE internal AS 'pg_wal_replay_wait';
 
 CREATE OR REPLACE FUNCTION pg_logical_slot_get_changes(
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 7c0b74fe055..4c8adab567f 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -6665,8 +6665,13 @@
 { oid => '8593',
   descr => 'wait for the target LSN to be replayed on standby with an optional timeout',
   proname => 'pg_wal_replay_wait', prokind => 'p', prorettype => 'void',
-  proargtypes => 'pg_lsn int8', proargnames => '{target_lsn,timeout}',
+  proargtypes => 'pg_lsn int8 bool', proargnames => '{target_lsn,timeout,no_error}',
   prosrc => 'pg_wal_replay_wait' },
+{ oid => '8594',
+  descr => 'the last result for pg_wal_replay_wait()',
+  proname => 'pg_wal_replay_wait_status', prorettype => 'text',
+  proargtypes => '',
+  prosrc => 'pg_wal_replay_wait_status' },
 
 { oid => '6224', descr => 'get resource managers loaded in system',
   proname => 'pg_get_wal_resource_managers', prorows => '50', proretset => 't',
diff --git a/src/test/recovery/t/043_wal_replay_wait.pl b/src/test/recovery/t/043_wal_replay_wait.pl
index 024f1fe6488..7766010ab79 100644
--- a/src/test/recovery/t/043_wal_replay_wait.pl
+++ b/src/test/recovery/t/043_wal_replay_wait.pl
@@ -77,6 +77,18 @@ $node_standby->psql(
 ok( $stderr =~ /timed out while waiting for target LSN/,
 	"get timeout on waiting for unreachable LSN");
 
+$output = $node_standby->safe_psql('postgres', qq[
+	CALL pg_wal_replay_wait('${lsn2}', 10, true);
+	SELECT pg_wal_replay_wait_status();]);
+ok( $output == "success",
+	"pg_wal_replay_wait_status() returns correct status after successful waiting");
+$output = $node_standby->psql(
+	'postgres',	qq[
+	CALL pg_wal_replay_wait('${lsn2}', 10, true);
+	SELECT pg_wal_replay_wait_status();]);
+ok( $output == "timeout",
+	"pg_wal_replay_wait_status() returns correct status after timeout");
+
 # 4. Check that pg_wal_replay_wait() triggers an error if called on primary,
 # within another function, or inside a transaction with an isolation level
 # higher than READ COMMITTED.
@@ -193,6 +205,13 @@ $node_standby->safe_psql('postgres', "CALL pg_wal_replay_wait('${lsn5}');");
 
 ok(1, 'wait for already replayed LSN exits immediately even after promotion');
 
+$output = $node_standby->psql(
+	'postgres',	qq[
+	CALL pg_wal_replay_wait('${lsn4}', 10, true);
+	SELECT pg_wal_replay_wait_status();]);
+ok( $output == "not in recovery",
+	"pg_wal_replay_wait_status() returns correct status after standby promotion");
+
 $node_standby->stop;
 $node_primary->stop;
 
-- 
2.39.5 (Apple Git-154)

#22Pavel Borisov
pashkin.elfe@gmail.com
In reply to: Alexander Korotkov (#21)
Re: pgsql: Implement pg_wal_replay_wait() stored procedure

Hi, Alexander!

On Tue, 22 Oct 2024 at 13:26, Alexander Korotkov <aekorotkov@gmail.com>
wrote:

On Wed, Oct 16, 2024 at 11:20 PM Alexander Korotkov
<aekorotkov@gmail.com> wrote:

On Wed, Oct 16, 2024 at 10:35 PM Peter Eisentraut <peter@eisentraut.org>

wrote:

On 02.09.24 01:55, Alexander Korotkov wrote:

On Mon, Sep 2, 2024 at 2:28 AM Michael Paquier <michael@paquier.xyz>

wrote:

On Sun, Sep 01, 2024 at 10:35:27PM +0300, Alexander Korotkov wrote:

This path hasn't changes since the patch revision when it was a
utility command. I agree that this doesn't look like proper path

for

stored procedure. But I don't think src/backend/utils/adt is
appropriate path either, because it's not really about data type.
pg_wal_replay_wait() looks a good neighbor for
pg_wal_replay_pause()/pg_wal_replay_resume() and other WAL-related
functions. So, what about moving it to src/backend/access/transam?

Moving the new function to xlogfuncs.c while publishing
WaitForLSNReplay() makes sense to me.

Thank you for proposal. I like this.

Could you, please, check the attached patch?

We still have stuff in src/backend/commands/waitlsn.c that is nothing
like a "command". You have moved some stuff elsewhere, but what are

you

planning to do with the rest?

Thank you for spotting this another time. What about moving that
somewhere like src/backend/access/transam/xlogwait.c ?

Implemented this as a separate patch (0001). Also rebased other
pending patches on that. 0004 now revises header comment of
xlogwait.c with new procedure signature.

I've looked at v5 of a patchset.

0001:
Looks completely ok.

0002:

As stated in latch.c

- WL_POSTMASTER_DEATH: Wait for postmaster to die
- WL_EXIT_ON_PM_DEATH: Exit immediately if the postmaster dies

* wakeEvents must include either WL_EXIT_ON_PM_DEATH for automatic exit
* if the postmaster dies or WL_POSTMASTER_DEATH for a flag set in the
* return value if the postmaster dies

It's not completely clear to me if these comments need some clarification
(not related to the patchset), or if we should look for WL_EXIT_ON_PM_DEATH
for immediately FATAL. Or waiting for postmaster to die on
WL_POSTMASTER_DEATH instead of just fatal immediately?

0003:
Besides refactor it looks that deleteLSNWaiter() is added
in WaitForLSNReplay. Maybe it's worth mentioning in the commit message.
Maybe refactoring for loop for assigning result variable and breaking a
loop instead of immediate return would look better and lead to natural call
of after the loop before returning.

0004:

Comment:
+ * Waits until recovery replays the target LSN with optional timeout.
Throw
+ * an error on failure.
may need mentioning "Unless no_error provided throws an error on failure"

Otherwise the patch looks good.

Regards,
Pavel Borisov
Supabase

#23Pavel Borisov
pashkin.elfe@gmail.com
In reply to: Pavel Borisov (#22)
Re: pgsql: Implement pg_wal_replay_wait() stored procedure

Fix a typo of myself:

Maybe refactoring for loop for assigning result variable and breaking a
loop instead of immediate return would look better and lead to natural call
of after the loop before returning.

Maybe refactoring for loop for assigning result variable and breaking a
loop instead of immediate return would look better and lead to natural call
of *deleteLSNWaiter()* after the loop before returning.

Pavel.

#24Alexander Korotkov
aekorotkov@gmail.com
In reply to: Pavel Borisov (#22)
4 attachment(s)
Re: pgsql: Implement pg_wal_replay_wait() stored procedure

Hi, Pavel!

Thank you for your review.

On Tue, Oct 22, 2024 at 4:30 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote:

On Tue, 22 Oct 2024 at 13:26, Alexander Korotkov <aekorotkov@gmail.com> wrote:

On Wed, Oct 16, 2024 at 11:20 PM Alexander Korotkov
<aekorotkov@gmail.com> wrote:

On Wed, Oct 16, 2024 at 10:35 PM Peter Eisentraut <peter@eisentraut.org> wrote:

On 02.09.24 01:55, Alexander Korotkov wrote:

On Mon, Sep 2, 2024 at 2:28 AM Michael Paquier <michael@paquier.xyz> wrote:

On Sun, Sep 01, 2024 at 10:35:27PM +0300, Alexander Korotkov wrote:

This path hasn't changes since the patch revision when it was a
utility command. I agree that this doesn't look like proper path for
stored procedure. But I don't think src/backend/utils/adt is
appropriate path either, because it's not really about data type.
pg_wal_replay_wait() looks a good neighbor for
pg_wal_replay_pause()/pg_wal_replay_resume() and other WAL-related
functions. So, what about moving it to src/backend/access/transam?

Moving the new function to xlogfuncs.c while publishing
WaitForLSNReplay() makes sense to me.

Thank you for proposal. I like this.

Could you, please, check the attached patch?

We still have stuff in src/backend/commands/waitlsn.c that is nothing
like a "command". You have moved some stuff elsewhere, but what are you
planning to do with the rest?

Thank you for spotting this another time. What about moving that
somewhere like src/backend/access/transam/xlogwait.c ?

Implemented this as a separate patch (0001). Also rebased other
pending patches on that. 0004 now revises header comment of
xlogwait.c with new procedure signature.

I've looked at v5 of a patchset.

0001:
Looks completely ok.

0002:

As stated in latch.c

- WL_POSTMASTER_DEATH: Wait for postmaster to die
- WL_EXIT_ON_PM_DEATH: Exit immediately if the postmaster dies

* wakeEvents must include either WL_EXIT_ON_PM_DEATH for automatic exit
* if the postmaster dies or WL_POSTMASTER_DEATH for a flag set in the
* return value if the postmaster dies

It's not completely clear to me if these comments need some clarification (not related to the patchset), or if we should look for WL_EXIT_ON_PM_DEATH for immediately FATAL. Or waiting for postmaster to die on WL_POSTMASTER_DEATH instead of just fatal immediately?

As I get from the code, WL_EXIT_ON_PM_DEATH cause process to just
proc_exit(1) without throwing FATAL. So, in the most of situations we
do throw FATAL after seeing WL_POSTMASTER_DEATH event. So, it's
reasonable to do the same here. But indeed, this is a question (not
related to this patch) whether WL_EXIT_ON_PM_DEATH should cause
process to throw FATAL.

0003:
Besides refactor it looks that deleteLSNWaiter() is added in WaitForLSNReplay. Maybe it's worth mentioning in the commit message.
Maybe refactoring for loop for assigning result variable and breaking a loop instead of immediate return would look better and lead to natural call of after the loop before returning.

I don't think we would get much simplicity/readability by breaking
loop instead of immediate return. However, I reflected the changes in
the commit message. Also I reflected that we don't distinguish any
more seeing !RecoveryInProgress() in different places.

0004:

Comment:
+ * Waits until recovery replays the target LSN with optional timeout.  Throw
+ * an error on failure.
may need mentioning "Unless no_error provided throws an error on failure"

Changed as you proposed.

Otherwise the patch looks good.

Thanks!

------
Regards,
Alexander Korotkov
Supabase

Attachments:

v6-0004-Add-no_error-argument-to-pg_wal_replay_wait.patchapplication/octet-stream; name=v6-0004-Add-no_error-argument-to-pg_wal_replay_wait.patchDownload
From 5962b11fb84358eb5df086a3059e64cda0a1b752 Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Thu, 19 Sep 2024 15:34:18 +0300
Subject: [PATCH v6 4/4] Add 'no_error' argument to pg_wal_replay_wait()

This argument allow skipping throwing an error.  Instead, the result status
can be obtained using pg_wal_replay_wait_status() function.

Catversion is bumped.

Reported-by: Michael Paquier
Discussion: https://postgr.es/m/ZtUF17gF0pNpwZDI%40paquier.xyz
Reviewed-by: Pavel Borisov
---
 doc/src/sgml/func.sgml                     | 56 +++++++++++++++++++---
 src/backend/access/transam/xlogfuncs.c     | 38 +++++++++++++--
 src/backend/access/transam/xlogwait.c      |  3 +-
 src/backend/catalog/system_functions.sql   |  4 +-
 src/include/catalog/pg_proc.dat            |  7 ++-
 src/test/recovery/t/043_wal_replay_wait.pl | 22 +++++++++
 6 files changed, 117 insertions(+), 13 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index ad663c94d77..1b68f284e5e 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -28989,12 +28989,15 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
    </para>
 
    <table id="recovery-synchronization-procedure-table">
-    <title>Recovery Synchronization Procedure</title>
+    <title>Recovery Synchronization Procedure and Function</title>
     <tgroup cols="1">
      <thead>
       <row>
        <entry role="func_table_entry"><para role="func_signature">
-        Procedure
+        Procedure or Function
+       </para>
+       <para>
+        Type
        </para>
        <para>
         Description
@@ -29010,8 +29013,11 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
         </indexterm>
         <function>pg_wal_replay_wait</function> (
           <parameter>target_lsn</parameter> <type>pg_lsn</type>,
-          <parameter>timeout</parameter> <type>bigint</type> <literal>DEFAULT</literal> <literal>0</literal>)
-        <returnvalue>void</returnvalue>
+          <parameter>timeout</parameter> <type>bigint</type> <literal>DEFAULT</literal> <literal>0</literal>,
+          <parameter>no_error</parameter> <type>bool</type> <literal>DEFAULT</literal> <literal>false</literal>)
+       </para>
+       <para>
+        Procedure
        </para>
        <para>
         Waits until recovery replays <literal>target_lsn</literal>.
@@ -29022,7 +29028,30 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
         procedure waits until <literal>target_lsn</literal> is reached or
         the specified <parameter>timeout</parameter> has elapsed.
         On timeout, or if the server is promoted before
-        <literal>target_lsn</literal> is reached, an error is emitted.
+        <literal>target_lsn</literal> is reached, an error is emitted,
+        as soon as <parameter>no_error</parameter> is false.
+        If <parameter>no_error</parameter> is set to true, then the procedure
+        doesn't throw errors.  The last result status could be read
+        with <function>pg_wal_replay_wait_status</function>.
+       </para></entry>
+      </row>
+
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_wal_replay_wait_status</primary>
+        </indexterm>
+        <function>pg_wal_replay_wait_status</function> ()
+        <returnvalue>text</returnvalue>
+       </para>
+       <para>
+        Function
+       </para>
+       <para>
+        Returns the last result status for
+        <function>pg_wal_replay_wait</function> procedure.  The possible
+        values are <literal>success</literal>, <literal>timeout</literal>,
+        and <literal>not in recovery</literal>.
        </para></entry>
       </row>
      </tbody>
@@ -29044,7 +29073,8 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
    <para>
     <function>pg_wal_replay_wait</function> should be called on standby.
     If a user calls <function>pg_wal_replay_wait</function> on primary, it
-    will error out.  However, if <function>pg_wal_replay_wait</function> is
+    will error out as soon as <parameter>no_error</parameter> is false.
+    However, if <function>pg_wal_replay_wait</function> is
     called on primary promoted from standby and <literal>target_lsn</literal>
     was already replayed, then <function>pg_wal_replay_wait</function> just
     exits immediately.
@@ -29090,6 +29120,20 @@ postgres=# CALL pg_wal_replay_wait('0/306EE20', 100);
 ERROR:  timed out while waiting for target LSN 0/306EE20 to be replayed; current replay LSN 0/306EA60
     </programlisting>
 
+   The same example uses <function>pg_wal_replay_wait</function> with
+   <parameter>no_error</parameter> set to true.  In this case, the result
+   status must be read with <function>pg_wal_replay_wait_status</function>.
+
+   <programlisting>
+postgres=# CALL pg_wal_replay_wait('0/306EE20', 100, true);
+CALL
+postgres=# SELECT pg_wal_replay_wait_status();
+ pg_wal_replay_wait_status
+---------------------------
+ timeout
+(1 row)
+    </programlisting>
+
    </para>
 
    <para>
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index ddca78d3717..bca1d395683 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -751,15 +751,18 @@ pg_promote(PG_FUNCTION_ARGS)
 	PG_RETURN_BOOL(false);
 }
 
+static WaitLSNResult lastWaitLSNResult = WAIT_LSN_RESULT_SUCCESS;
+
 /*
- * Waits until recovery replays the target LSN with optional timeout.
+ * Waits until recovery replays the target LSN with optional timeout.  Unless
+ * 'no_error' provided throws an error on failure
  */
 Datum
 pg_wal_replay_wait(PG_FUNCTION_ARGS)
 {
 	XLogRecPtr	target_lsn = PG_GETARG_LSN(0);
 	int64		timeout = PG_GETARG_INT64(1);
-	WaitLSNResult result;
+	bool		no_error = PG_GETARG_BOOL(2);
 
 	if (timeout < 0)
 		ereport(ERROR,
@@ -800,13 +803,16 @@ pg_wal_replay_wait(PG_FUNCTION_ARGS)
 	 */
 	Assert(MyProc->xmin == InvalidTransactionId);
 
-	result = WaitForLSNReplay(target_lsn, timeout);
+	lastWaitLSNResult = WaitForLSNReplay(target_lsn, timeout);
+
+	if (no_error)
+		PG_RETURN_VOID();
 
 	/*
 	 * Process the result of WaitForLSNReplay().  Throw appropriate error if
 	 * needed.
 	 */
-	switch (result)
+	switch (lastWaitLSNResult)
 	{
 		case WAIT_LSN_RESULT_SUCCESS:
 			/* Nothing to do on success */
@@ -832,3 +838,27 @@ pg_wal_replay_wait(PG_FUNCTION_ARGS)
 
 	PG_RETURN_VOID();
 }
+
+Datum
+pg_wal_replay_wait_status(PG_FUNCTION_ARGS)
+{
+	const char *result_string = "";
+
+	/* Process the result of WaitForLSNReplay(). */
+	switch (lastWaitLSNResult)
+	{
+		case WAIT_LSN_RESULT_SUCCESS:
+			result_string = "success";
+			break;
+
+		case WAIT_LSN_RESULT_TIMEOUT:
+			result_string = "timeout";
+			break;
+
+		case WAIT_LSN_RESULT_NOT_IN_RECOVERY:
+			result_string = "not in recovery";
+			break;
+	}
+
+	PG_RETURN_TEXT_P(cstring_to_text(result_string));
+}
diff --git a/src/backend/access/transam/xlogwait.c b/src/backend/access/transam/xlogwait.c
index 58fb10aa5a8..8860a9c73da 100644
--- a/src/backend/access/transam/xlogwait.c
+++ b/src/backend/access/transam/xlogwait.c
@@ -2,7 +2,8 @@
  *
  * xlogwait.c
  *	  Implements waiting for the given replay LSN, which is used in
- *	  CALL pg_wal_replay_wait(target_lsn pg_lsn, timeout float8).
+ *	  CALL pg_wal_replay_wait(target_lsn pg_lsn,
+ *							  timeout float8, no_error bool).
  *
  * Copyright (c) 2024, PostgreSQL Global Development Group
  *
diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index 92b6aefe7aa..3fd1dfa945e 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -414,7 +414,9 @@ CREATE OR REPLACE FUNCTION
   json_populate_recordset(base anyelement, from_json json, use_json_as_text boolean DEFAULT false)
   RETURNS SETOF anyelement LANGUAGE internal STABLE ROWS 100  AS 'json_populate_recordset' PARALLEL SAFE;
 
-CREATE OR REPLACE PROCEDURE pg_wal_replay_wait(target_lsn pg_lsn, timeout int8 DEFAULT 0)
+CREATE OR REPLACE PROCEDURE pg_wal_replay_wait(target_lsn pg_lsn,
+											   timeout int8 DEFAULT 0,
+											   no_error bool DEFAULT false)
   LANGUAGE internal AS 'pg_wal_replay_wait';
 
 CREATE OR REPLACE FUNCTION pg_logical_slot_get_changes(
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 7c0b74fe055..4c8adab567f 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -6665,8 +6665,13 @@
 { oid => '8593',
   descr => 'wait for the target LSN to be replayed on standby with an optional timeout',
   proname => 'pg_wal_replay_wait', prokind => 'p', prorettype => 'void',
-  proargtypes => 'pg_lsn int8', proargnames => '{target_lsn,timeout}',
+  proargtypes => 'pg_lsn int8 bool', proargnames => '{target_lsn,timeout,no_error}',
   prosrc => 'pg_wal_replay_wait' },
+{ oid => '8594',
+  descr => 'the last result for pg_wal_replay_wait()',
+  proname => 'pg_wal_replay_wait_status', prorettype => 'text',
+  proargtypes => '',
+  prosrc => 'pg_wal_replay_wait_status' },
 
 { oid => '6224', descr => 'get resource managers loaded in system',
   proname => 'pg_get_wal_resource_managers', prorows => '50', proretset => 't',
diff --git a/src/test/recovery/t/043_wal_replay_wait.pl b/src/test/recovery/t/043_wal_replay_wait.pl
index cf77a9eec70..74769acd5e0 100644
--- a/src/test/recovery/t/043_wal_replay_wait.pl
+++ b/src/test/recovery/t/043_wal_replay_wait.pl
@@ -77,6 +77,20 @@ $node_standby->psql(
 ok( $stderr =~ /timed out while waiting for target LSN/,
 	"get timeout on waiting for unreachable LSN");
 
+$output = $node_standby->safe_psql(
+	'postgres', qq[
+	CALL pg_wal_replay_wait('${lsn2}', 10, true);
+	SELECT pg_wal_replay_wait_status();]);
+ok( $output == "success",
+	"pg_wal_replay_wait_status() returns correct status after successful waiting"
+);
+$output = $node_standby->psql(
+	'postgres', qq[
+	CALL pg_wal_replay_wait('${lsn2}', 10, true);
+	SELECT pg_wal_replay_wait_status();]);
+ok($output == "timeout",
+	"pg_wal_replay_wait_status() returns correct status after timeout");
+
 # 4. Check that pg_wal_replay_wait() triggers an error if called on primary,
 # within another function, or inside a transaction with an isolation level
 # higher than READ COMMITTED.
@@ -193,6 +207,14 @@ $node_standby->safe_psql('postgres', "CALL pg_wal_replay_wait('${lsn5}');");
 
 ok(1, 'wait for already replayed LSN exits immediately even after promotion');
 
+$output = $node_standby->psql(
+	'postgres', qq[
+	CALL pg_wal_replay_wait('${lsn4}', 10, true);
+	SELECT pg_wal_replay_wait_status();]);
+ok( $output == "not in recovery",
+	"pg_wal_replay_wait_status() returns correct status after standby promotion"
+);
+
 $node_standby->stop;
 $node_primary->stop;
 
-- 
2.39.5 (Apple Git-154)

v6-0001-Move-LSN-waiting-declarations-and-definitions-to-.patchapplication/octet-stream; name=v6-0001-Move-LSN-waiting-declarations-and-definitions-to-.patchDownload
From 7cd71a5606aae9718724c6420030d944d339380f Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Tue, 22 Oct 2024 22:45:07 +0300
Subject: [PATCH v6 1/4] Move LSN waiting declarations and definitions to
 better place

3c5db1d6b implemented the pg_wal_replay_wait() stored procedure.  Due to
the patch development history, the implementation resided in
src/backend/commands/waitlsn.c (src/include/commands/waitlsn.h for headers).

014f9f34d moved pg_wal_replay_wait() itself to
src/backend/access/transam/xlogfuncs.c near to the WAL-manipulation functions.
But most of the implementation stayed in place.

The code in src/backend/commands/waitlsn.c has nothing to do with commands,
but is related to WAL.  So, this commit moves this code into
src/backend/access/transam/xlogwait.c (src/include/access/xlogwait.h for
headers).

Reported-by: Peter Eisentraut
Discussion: https://postgr.es/m/18c0fa64-0475-415e-a1bd-665d922c5201%40eisentraut.org
Reviewed-by: Pavel Borisov
---
 src/backend/access/transam/Makefile                    |  3 ++-
 src/backend/access/transam/meson.build                 |  1 +
 src/backend/access/transam/xact.c                      |  2 +-
 src/backend/access/transam/xlog.c                      |  2 +-
 src/backend/access/transam/xlogfuncs.c                 |  2 +-
 src/backend/access/transam/xlogrecovery.c              |  2 +-
 .../{commands/waitlsn.c => access/transam/xlogwait.c}  |  6 +++---
 src/backend/commands/Makefile                          |  3 +--
 src/backend/commands/meson.build                       |  1 -
 src/backend/storage/ipc/ipci.c                         |  2 +-
 src/backend/storage/lmgr/proc.c                        |  2 +-
 src/include/{commands/waitlsn.h => access/xlogwait.h}  | 10 +++++-----
 12 files changed, 18 insertions(+), 18 deletions(-)
 rename src/backend/{commands/waitlsn.c => access/transam/xlogwait.c} (99%)
 rename src/include/{commands/waitlsn.h => access/xlogwait.h} (94%)

diff --git a/src/backend/access/transam/Makefile b/src/backend/access/transam/Makefile
index 661c55a9db7..a32f473e0a2 100644
--- a/src/backend/access/transam/Makefile
+++ b/src/backend/access/transam/Makefile
@@ -36,7 +36,8 @@ OBJS = \
 	xlogreader.o \
 	xlogrecovery.o \
 	xlogstats.o \
-	xlogutils.o
+	xlogutils.o \
+	xlogwait.o
 
 include $(top_srcdir)/src/backend/common.mk
 
diff --git a/src/backend/access/transam/meson.build b/src/backend/access/transam/meson.build
index 8a3522557cd..91d258f9df1 100644
--- a/src/backend/access/transam/meson.build
+++ b/src/backend/access/transam/meson.build
@@ -24,6 +24,7 @@ backend_sources += files(
   'xlogrecovery.c',
   'xlogstats.c',
   'xlogutils.c',
+  'xlogwait.c',
 )
 
 # used by frontend programs to build a frontend xlogreader
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 87700c7c5c7..cbeced71a02 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -31,6 +31,7 @@
 #include "access/xloginsert.h"
 #include "access/xlogrecovery.h"
 #include "access/xlogutils.h"
+#include "access/xlogwait.h"
 #include "catalog/index.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_enum.h"
@@ -38,7 +39,6 @@
 #include "commands/async.h"
 #include "commands/tablecmds.h"
 #include "commands/trigger.h"
-#include "commands/waitlsn.h"
 #include "common/pg_prng.h"
 #include "executor/spi.h"
 #include "libpq/be-fsstubs.h"
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 9102c8d772e..ad9b0b612f4 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -62,11 +62,11 @@
 #include "access/xlogreader.h"
 #include "access/xlogrecovery.h"
 #include "access/xlogutils.h"
+#include "access/xlogwait.h"
 #include "backup/basebackup.h"
 #include "catalog/catversion.h"
 #include "catalog/pg_control.h"
 #include "catalog/pg_database.h"
-#include "commands/waitlsn.h"
 #include "common/controldata_utils.h"
 #include "common/file_utils.h"
 #include "executor/instrument.h"
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 3e3d2bb6189..cbf84ef7d8f 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -22,8 +22,8 @@
 #include "access/xlog_internal.h"
 #include "access/xlogbackup.h"
 #include "access/xlogrecovery.h"
+#include "access/xlogwait.h"
 #include "catalog/pg_type.h"
-#include "commands/waitlsn.h"
 #include "funcapi.h"
 #include "miscadmin.h"
 #include "pgstat.h"
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index 320b14add1a..31caa49d6c3 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -40,10 +40,10 @@
 #include "access/xlogreader.h"
 #include "access/xlogrecovery.h"
 #include "access/xlogutils.h"
+#include "access/xlogwait.h"
 #include "backup/basebackup.h"
 #include "catalog/pg_control.h"
 #include "commands/tablespace.h"
-#include "commands/waitlsn.h"
 #include "common/file_utils.h"
 #include "miscadmin.h"
 #include "pgstat.h"
diff --git a/src/backend/commands/waitlsn.c b/src/backend/access/transam/xlogwait.c
similarity index 99%
rename from src/backend/commands/waitlsn.c
rename to src/backend/access/transam/xlogwait.c
index 501938f4330..eef58ce69ce 100644
--- a/src/backend/commands/waitlsn.c
+++ b/src/backend/access/transam/xlogwait.c
@@ -1,13 +1,13 @@
 /*-------------------------------------------------------------------------
  *
- * waitlsn.c
+ * xlogwait.c
  *	  Implements waiting for the given replay LSN, which is used in
  *	  CALL pg_wal_replay_wait(target_lsn pg_lsn, timeout float8).
  *
  * Copyright (c) 2024, PostgreSQL Global Development Group
  *
  * IDENTIFICATION
- *	  src/backend/commands/waitlsn.c
+ *	  src/backend/access/transam/xlogwait.c
  *
  *-------------------------------------------------------------------------
  */
@@ -20,7 +20,7 @@
 #include "pgstat.h"
 #include "access/xlog.h"
 #include "access/xlogrecovery.h"
-#include "commands/waitlsn.h"
+#include "access/xlogwait.h"
 #include "funcapi.h"
 #include "miscadmin.h"
 #include "storage/latch.h"
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index cede90c3b98..48f7348f91c 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -61,7 +61,6 @@ OBJS = \
 	vacuum.o \
 	vacuumparallel.o \
 	variable.o \
-	view.o \
-	waitlsn.o
+	view.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/commands/meson.build b/src/backend/commands/meson.build
index 7549be5dc3b..6dd00a4abde 100644
--- a/src/backend/commands/meson.build
+++ b/src/backend/commands/meson.build
@@ -50,5 +50,4 @@ backend_sources += files(
   'vacuumparallel.c',
   'variable.c',
   'view.c',
-  'waitlsn.c',
 )
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 10fc18f2529..9ff687045f4 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -24,8 +24,8 @@
 #include "access/twophase.h"
 #include "access/xlogprefetcher.h"
 #include "access/xlogrecovery.h"
+#include "access/xlogwait.h"
 #include "commands/async.h"
-#include "commands/waitlsn.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index eaf3916f282..260e7029f50 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -36,7 +36,7 @@
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xlogutils.h"
-#include "commands/waitlsn.h"
+#include "access/xlogwait.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
diff --git a/src/include/commands/waitlsn.h b/src/include/access/xlogwait.h
similarity index 94%
rename from src/include/commands/waitlsn.h
rename to src/include/access/xlogwait.h
index bb5ac858dcc..31e208cb7ad 100644
--- a/src/include/commands/waitlsn.h
+++ b/src/include/access/xlogwait.h
@@ -1,16 +1,16 @@
 /*-------------------------------------------------------------------------
  *
- * waitlsn.h
+ * xlogwait.h
  *	  Declarations for LSN replay waiting routines.
  *
  * Copyright (c) 2024, PostgreSQL Global Development Group
  *
- * src/include/commands/waitlsn.h
+ * src/include/access/xlogwait.h
  *
  *-------------------------------------------------------------------------
  */
-#ifndef WAIT_LSN_H
-#define WAIT_LSN_H
+#ifndef XLOG_WAIT_H
+#define XLOG_WAIT_H
 
 #include "lib/pairingheap.h"
 #include "postgres.h"
@@ -78,4 +78,4 @@ extern void WaitLSNSetLatches(XLogRecPtr currentLSN);
 extern void WaitLSNCleanup(void);
 extern void WaitForLSNReplay(XLogRecPtr targetLSN, int64 timeout);
 
-#endif							/* WAIT_LSN_H */
+#endif							/* XLOG_WAIT_H */
-- 
2.39.5 (Apple Git-154)

v6-0002-Make-WaitForLSNReplay-issue-FATAL-on-postmaster-d.patchapplication/octet-stream; name=v6-0002-Make-WaitForLSNReplay-issue-FATAL-on-postmaster-d.patchDownload
From 0374929e28b8f90f44fef97aa3451aa875bad1ac Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Tue, 22 Oct 2024 22:45:36 +0300
Subject: [PATCH v6 2/4] Make WaitForLSNReplay() issue FATAL on postmaster
 death

Reported-by: Michael Paquier
Discussion: https://postgr.es/m/ZvY2C8N4ZqgCFaLu%40paquier.xyz
Reviewed-by: Pavel Borisov
---
 src/backend/access/transam/xlogwait.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/transam/xlogwait.c b/src/backend/access/transam/xlogwait.c
index eef58ce69ce..353b7854dc8 100644
--- a/src/backend/access/transam/xlogwait.c
+++ b/src/backend/access/transam/xlogwait.c
@@ -222,7 +222,7 @@ WaitForLSNReplay(XLogRecPtr targetLSN, int64 timeout)
 {
 	XLogRecPtr	currentLSN;
 	TimestampTz endtime = 0;
-	int			wake_events = WL_LATCH_SET | WL_EXIT_ON_PM_DEATH;
+	int			wake_events = WL_LATCH_SET | WL_POSTMASTER_DEATH;
 
 	/* Shouldn't be called when shmem isn't initialized */
 	Assert(waitLSNState);
@@ -313,6 +313,16 @@ WaitForLSNReplay(XLogRecPtr targetLSN, int64 timeout)
 		rc = WaitLatch(MyLatch, wake_events, delay_ms,
 					   WAIT_EVENT_WAIT_FOR_WAL_REPLAY);
 
+		/*
+		 * Emergency bailout if postmaster has died.  This is to avoid the
+		 * necessity for manual cleanup of all postmaster children.
+		 */
+		if (rc & WL_POSTMASTER_DEATH)
+			ereport(FATAL,
+					(errcode(ERRCODE_ADMIN_SHUTDOWN),
+					 errmsg("terminating connection due to unexpected postmaster exit"),
+					 errcontext("while waiting for LSN replay")));
+
 		if (rc & WL_LATCH_SET)
 			ResetLatch(MyLatch);
 	}
-- 
2.39.5 (Apple Git-154)

v6-0003-Refactor-WaitForLSNReplay-to-return-the-result-of.patchapplication/octet-stream; name=v6-0003-Refactor-WaitForLSNReplay-to-return-the-result-of.patchDownload
From 021598e60f3dd751c6a3950f1b7ab36658c84e37 Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Tue, 22 Oct 2024 23:02:16 +0300
Subject: [PATCH v6 3/4] Refactor WaitForLSNReplay() to return the result of
 waiting

Currently, WaitForLSNReplay() immediately throws an error if waiting for LSN
replay is not successful.  This commit teaches  WaitForLSNReplay() to return
the result of waiting, while making pg_wal_replay_wait() responsible for
throwing an appropriate error.

This is preparation to adding 'no_error' argument to pg_wal_replay_wait() and
new function pg_wal_replay_wait_status(), which returns the last wait result
status.

Additionally, we stop distinguishing situations when we find our instance to
be not in a recovery state before entering the waiting loop and inside
the waiting loop.  Standby promotion may happen at any moment, even between
issuing a procedure call statement and pg_wal_replay_wait() doing a first
check of recovery status.  Thus, there is no pointing distinguishing these
situations.

Also, since we may exit the waiting loop and see our instance not in recovery
without throwing an error, we need to deleteLSNWaiter() in that case. We do
this unconditionally for the sake of simplicity, even if standby was already
promoted after reaching the target LSN, the startup process surely already
deleted us.

Reported-by: Michael Paquier
Discussion: https://postgr.es/m/ZtUF17gF0pNpwZDI%40paquier.xyz
Reviewed-by: Michael Paquier, Pavel Borisov
---
 src/backend/access/transam/xlogfuncs.c | 31 +++++++++++++++++++++++-
 src/backend/access/transam/xlogwait.c  | 33 +++++++++-----------------
 src/include/access/xlogwait.h          | 13 +++++++++-
 src/tools/pgindent/typedefs.list       |  1 +
 4 files changed, 54 insertions(+), 24 deletions(-)

diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index cbf84ef7d8f..ddca78d3717 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -759,6 +759,7 @@ pg_wal_replay_wait(PG_FUNCTION_ARGS)
 {
 	XLogRecPtr	target_lsn = PG_GETARG_LSN(0);
 	int64		timeout = PG_GETARG_INT64(1);
+	WaitLSNResult result;
 
 	if (timeout < 0)
 		ereport(ERROR,
@@ -799,7 +800,35 @@ pg_wal_replay_wait(PG_FUNCTION_ARGS)
 	 */
 	Assert(MyProc->xmin == InvalidTransactionId);
 
-	(void) WaitForLSNReplay(target_lsn, timeout);
+	result = WaitForLSNReplay(target_lsn, timeout);
+
+	/*
+	 * Process the result of WaitForLSNReplay().  Throw appropriate error if
+	 * needed.
+	 */
+	switch (result)
+	{
+		case WAIT_LSN_RESULT_SUCCESS:
+			/* Nothing to do on success */
+			break;
+
+		case WAIT_LSN_RESULT_TIMEOUT:
+			ereport(ERROR,
+					(errcode(ERRCODE_QUERY_CANCELED),
+					 errmsg("timed out while waiting for target LSN %X/%X to be replayed; current replay LSN %X/%X",
+							LSN_FORMAT_ARGS(target_lsn),
+							LSN_FORMAT_ARGS(GetXLogReplayRecPtr(NULL)))));
+			break;
+
+		case WAIT_LSN_RESULT_NOT_IN_RECOVERY:
+			ereport(ERROR,
+					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+					 errmsg("recovery is not in progress"),
+					 errdetail("Recovery ended before replaying target LSN %X/%X; last replay LSN %X/%X.",
+							   LSN_FORMAT_ARGS(target_lsn),
+							   LSN_FORMAT_ARGS(GetXLogReplayRecPtr(NULL)))));
+			break;
+	}
 
 	PG_RETURN_VOID();
 }
diff --git a/src/backend/access/transam/xlogwait.c b/src/backend/access/transam/xlogwait.c
index 353b7854dc8..58fb10aa5a8 100644
--- a/src/backend/access/transam/xlogwait.c
+++ b/src/backend/access/transam/xlogwait.c
@@ -217,7 +217,7 @@ WaitLSNCleanup(void)
  * Wait using MyLatch till the given LSN is replayed, the postmaster dies or
  * timeout happens.
  */
-void
+WaitLSNResult
 WaitForLSNReplay(XLogRecPtr targetLSN, int64 timeout)
 {
 	XLogRecPtr	currentLSN;
@@ -240,17 +240,14 @@ WaitForLSNReplay(XLogRecPtr targetLSN, int64 timeout)
 		 * check the last replay LSN before reporting an error.
 		 */
 		if (targetLSN <= GetXLogReplayRecPtr(NULL))
-			return;
-		ereport(ERROR,
-				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
-				 errmsg("recovery is not in progress"),
-				 errhint("Waiting for LSN can only be executed during recovery.")));
+			return WAIT_LSN_RESULT_SUCCESS;
+		return WAIT_LSN_RESULT_NOT_IN_RECOVERY;
 	}
 	else
 	{
 		/* If target LSN is already replayed, exit immediately */
 		if (targetLSN <= GetXLogReplayRecPtr(NULL))
-			return;
+			return WAIT_LSN_RESULT_SUCCESS;
 	}
 
 	if (timeout > 0)
@@ -276,17 +273,13 @@ WaitForLSNReplay(XLogRecPtr targetLSN, int64 timeout)
 		{
 			/*
 			 * Recovery was ended, but recheck if target LSN was already
-			 * replayed.
+			 * replayed.  See the comment regarding deleteLSNWaiter() below.
 			 */
+			deleteLSNWaiter();
 			currentLSN = GetXLogReplayRecPtr(NULL);
 			if (targetLSN <= currentLSN)
-				return;
-			ereport(ERROR,
-					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
-					 errmsg("recovery is not in progress"),
-					 errdetail("Recovery ended before replaying target LSN %X/%X; last replay LSN %X/%X.",
-							   LSN_FORMAT_ARGS(targetLSN),
-							   LSN_FORMAT_ARGS(currentLSN))));
+				return WAIT_LSN_RESULT_SUCCESS;
+			return WAIT_LSN_RESULT_NOT_IN_RECOVERY;
 		}
 		else
 		{
@@ -338,11 +331,7 @@ WaitForLSNReplay(XLogRecPtr targetLSN, int64 timeout)
 	 * If we didn't reach the target LSN, we must be exited by timeout.
 	 */
 	if (targetLSN > currentLSN)
-	{
-		ereport(ERROR,
-				(errcode(ERRCODE_QUERY_CANCELED),
-				 errmsg("timed out while waiting for target LSN %X/%X to be replayed; current replay LSN %X/%X",
-						LSN_FORMAT_ARGS(targetLSN),
-						LSN_FORMAT_ARGS(currentLSN))));
-	}
+		return WAIT_LSN_RESULT_TIMEOUT;
+
+	return WAIT_LSN_RESULT_SUCCESS;
 }
diff --git a/src/include/access/xlogwait.h b/src/include/access/xlogwait.h
index 31e208cb7ad..eb2260aa2ec 100644
--- a/src/include/access/xlogwait.h
+++ b/src/include/access/xlogwait.h
@@ -70,12 +70,23 @@ typedef struct WaitLSNState
 	WaitLSNProcInfo procInfos[FLEXIBLE_ARRAY_MEMBER];
 } WaitLSNState;
 
+/*
+ * Result statuses for WaitForLSNReplay().
+ */
+typedef enum
+{
+	WAIT_LSN_RESULT_SUCCESS,	/* Target LSN is reached */
+	WAIT_LSN_RESULT_TIMEOUT,	/* Timeout occurred */
+	WAIT_LSN_RESULT_NOT_IN_RECOVERY,	/* Recovery ended before or during our
+										 * wait */
+} WaitLSNResult;
+
 extern PGDLLIMPORT WaitLSNState *waitLSNState;
 
 extern Size WaitLSNShmemSize(void);
 extern void WaitLSNShmemInit(void);
 extern void WaitLSNSetLatches(XLogRecPtr currentLSN);
 extern void WaitLSNCleanup(void);
-extern void WaitForLSNReplay(XLogRecPtr targetLSN, int64 timeout);
+extern WaitLSNResult WaitForLSNReplay(XLogRecPtr targetLSN, int64 timeout);
 
 #endif							/* XLOG_WAIT_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 57de1acff3a..f1a7bdf550f 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3125,6 +3125,7 @@ WaitEventIPC
 WaitEventSet
 WaitEventTimeout
 WaitLSNProcInfo
+WaitLSNResult
 WaitLSNState
 WaitPMResult
 WalCloseMethod
-- 
2.39.5 (Apple Git-154)

#25Pavel Borisov
pashkin.elfe@gmail.com
In reply to: Alexander Korotkov (#24)
Re: pgsql: Implement pg_wal_replay_wait() stored procedure

Hi, Alexander!

On Wed, 23 Oct 2024 at 00:12, Alexander Korotkov <aekorotkov@gmail.com>
wrote:

Hi, Pavel!

Thank you for your review.

On Tue, Oct 22, 2024 at 4:30 PM Pavel Borisov <pashkin.elfe@gmail.com>
wrote:

On Tue, 22 Oct 2024 at 13:26, Alexander Korotkov <aekorotkov@gmail.com>

wrote:

On Wed, Oct 16, 2024 at 11:20 PM Alexander Korotkov
<aekorotkov@gmail.com> wrote:

On Wed, Oct 16, 2024 at 10:35 PM Peter Eisentraut <

peter@eisentraut.org> wrote:

On 02.09.24 01:55, Alexander Korotkov wrote:

On Mon, Sep 2, 2024 at 2:28 AM Michael Paquier <

michael@paquier.xyz> wrote:

On Sun, Sep 01, 2024 at 10:35:27PM +0300, Alexander Korotkov

wrote:

This path hasn't changes since the patch revision when it was a
utility command. I agree that this doesn't look like proper

path for

stored procedure. But I don't think src/backend/utils/adt is
appropriate path either, because it's not really about data

type.

pg_wal_replay_wait() looks a good neighbor for
pg_wal_replay_pause()/pg_wal_replay_resume() and other

WAL-related

functions. So, what about moving it to

src/backend/access/transam?

Moving the new function to xlogfuncs.c while publishing
WaitForLSNReplay() makes sense to me.

Thank you for proposal. I like this.

Could you, please, check the attached patch?

We still have stuff in src/backend/commands/waitlsn.c that is

nothing

like a "command". You have moved some stuff elsewhere, but what

are you

planning to do with the rest?

Thank you for spotting this another time. What about moving that
somewhere like src/backend/access/transam/xlogwait.c ?

Implemented this as a separate patch (0001). Also rebased other
pending patches on that. 0004 now revises header comment of
xlogwait.c with new procedure signature.

I've looked at v5 of a patchset.

0002:

As stated in latch.c

- WL_POSTMASTER_DEATH: Wait for postmaster to die
- WL_EXIT_ON_PM_DEATH: Exit immediately if the postmaster dies

* wakeEvents must include either WL_EXIT_ON_PM_DEATH for automatic exit
* if the postmaster dies or WL_POSTMASTER_DEATH for a flag set in the
* return value if the postmaster dies

It's not completely clear to me if these comments need some

clarification (not related to the patchset), or if we should look for
WL_EXIT_ON_PM_DEATH for immediately FATAL. Or waiting for postmaster to die
on WL_POSTMASTER_DEATH instead of just fatal immediately?

As I get from the code, WL_EXIT_ON_PM_DEATH cause process to just
proc_exit(1) without throwing FATAL. So, in the most of situations we
do throw FATAL after seeing WL_POSTMASTER_DEATH event. So, it's
reasonable to do the same here. But indeed, this is a question (not
related to this patch) whether WL_EXIT_ON_PM_DEATH should cause
process to throw FATAL.

Libpq ends up with FATAL on WL_POSTMASTER_DEATH.
In a backend code on WL_POSTMASTER_DEATH: SyncRepWaitForLSN()
sets ProcDiePending but don't FATAL. Walsender exits proc_exit(1).
I suppose WL_POSTMASTER_DEATH expected behavior is "Do whatever you want:
wait for postmaster to die or end up immediately".
I think path 0002 is good.

I looked through patches v6 and I think they're all good now.

Regards,
Pavel Borisov
Supabase.

#26Alexander Korotkov
aekorotkov@gmail.com
In reply to: Pavel Borisov (#25)
Re: pgsql: Implement pg_wal_replay_wait() stored procedure

On Wed, Oct 23, 2024 at 9:02 AM Pavel Borisov <pashkin.elfe@gmail.com> wrote:

On Wed, 23 Oct 2024 at 00:12, Alexander Korotkov <aekorotkov@gmail.com> wrote:

Thank you for your review.

On Tue, Oct 22, 2024 at 4:30 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote:

On Tue, 22 Oct 2024 at 13:26, Alexander Korotkov <aekorotkov@gmail.com> wrote:

On Wed, Oct 16, 2024 at 11:20 PM Alexander Korotkov
<aekorotkov@gmail.com> wrote:

On Wed, Oct 16, 2024 at 10:35 PM Peter Eisentraut <peter@eisentraut.org> wrote:

On 02.09.24 01:55, Alexander Korotkov wrote:

On Mon, Sep 2, 2024 at 2:28 AM Michael Paquier <michael@paquier.xyz> wrote:

On Sun, Sep 01, 2024 at 10:35:27PM +0300, Alexander Korotkov wrote:

This path hasn't changes since the patch revision when it was a
utility command. I agree that this doesn't look like proper path for
stored procedure. But I don't think src/backend/utils/adt is
appropriate path either, because it's not really about data type.
pg_wal_replay_wait() looks a good neighbor for
pg_wal_replay_pause()/pg_wal_replay_resume() and other WAL-related
functions. So, what about moving it to src/backend/access/transam?

Moving the new function to xlogfuncs.c while publishing
WaitForLSNReplay() makes sense to me.

Thank you for proposal. I like this.

Could you, please, check the attached patch?

We still have stuff in src/backend/commands/waitlsn.c that is nothing
like a "command". You have moved some stuff elsewhere, but what are you
planning to do with the rest?

Thank you for spotting this another time. What about moving that
somewhere like src/backend/access/transam/xlogwait.c ?

Implemented this as a separate patch (0001). Also rebased other
pending patches on that. 0004 now revises header comment of
xlogwait.c with new procedure signature.

I've looked at v5 of a patchset.

0002:

As stated in latch.c

- WL_POSTMASTER_DEATH: Wait for postmaster to die
- WL_EXIT_ON_PM_DEATH: Exit immediately if the postmaster dies

* wakeEvents must include either WL_EXIT_ON_PM_DEATH for automatic exit
* if the postmaster dies or WL_POSTMASTER_DEATH for a flag set in the
* return value if the postmaster dies

It's not completely clear to me if these comments need some clarification (not related to the patchset), or if we should look for WL_EXIT_ON_PM_DEATH for immediately FATAL. Or waiting for postmaster to die on WL_POSTMASTER_DEATH instead of just fatal immediately?

As I get from the code, WL_EXIT_ON_PM_DEATH cause process to just
proc_exit(1) without throwing FATAL. So, in the most of situations we
do throw FATAL after seeing WL_POSTMASTER_DEATH event. So, it's
reasonable to do the same here. But indeed, this is a question (not
related to this patch) whether WL_EXIT_ON_PM_DEATH should cause
process to throw FATAL.

Libpq ends up with FATAL on WL_POSTMASTER_DEATH.
In a backend code on WL_POSTMASTER_DEATH: SyncRepWaitForLSN() sets ProcDiePending but don't FATAL. Walsender exits proc_exit(1).
I suppose WL_POSTMASTER_DEATH expected behavior is "Do whatever you want: wait for postmaster to die or end up immediately".
I think path 0002 is good.

I looked through patches v6 and I think they're all good now.

Thank you. I will push this patchset if now objections.

------
Regards,
Alexander Korotkov
Supabase

#27Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Alexander Korotkov (#26)
Re: pgsql: Implement pg_wal_replay_wait() stored procedure

If you call this procedure on a stand-alone server, you get:

postgres=# call pg_wal_replay_wait('1234/0');
ERROR: recovery is not in progress
DETAIL: Recovery ended before replaying target LSN 1234/0; last replay
LSN 0/0.

The DETAIL seems a bit misleading. Recovery never ended, because it
never started in the first place. Last replay LSN is indeed 0/0, but
that's not helpful.

If a standby server has been promoted and you pass an LSN that's earlier
than the last replay LSN, it returns successfully. That makes sense I
guess; if you connect to a standby and wait for it to replay a commit
that you made in the primary, and the standby gets promoted, that seems
correct. But it's a little inconsistent: If the standby crashes
immediately after promotion, and you call pg_wal_replay_wait() after
recovery, it returns success. However, if you shut down the promoted
server and restart it, then last replay LSN is 0/0, and the call will
fail because no recovery happened.

What is the use case for the 'no_error' argument? Why would you ever
want to pass no_error=true ? The separate pg_wal_replay_wait_status()
function feels weird to me. Also it surely shouldn't be marked IMMUTABLE
nor parallel safe.

This would benefit from more documentation, explaining how you would use
this in practice. I believe the use case is that you want "read your
writes" consistency between a primary and a standby. You commit a
transaction in the primary, and you then want to run read-only queries
in a standby, and you want to make sure that you see your own commit,
but you're ok with slightly delayed results otherwise. It would be good
to add a chapter somewhere in the docs to show how to do that in
practice with these functions.

--
Heikki Linnakangas
Neon (https://neon.tech)

#28Alexander Korotkov
aekorotkov@gmail.com
In reply to: Heikki Linnakangas (#27)
Re: pgsql: Implement pg_wal_replay_wait() stored procedure

Hi, Heikki!

On Fri, Oct 25, 2024 at 9:06 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

If you call this procedure on a stand-alone server, you get:

postgres=# call pg_wal_replay_wait('1234/0');
ERROR: recovery is not in progress
DETAIL: Recovery ended before replaying target LSN 1234/0; last replay
LSN 0/0.

The DETAIL seems a bit misleading. Recovery never ended, because it
never started in the first place. Last replay LSN is indeed 0/0, but
that's not helpful.

If a standby server has been promoted and you pass an LSN that's earlier
than the last replay LSN, it returns successfully. That makes sense I
guess; if you connect to a standby and wait for it to replay a commit
that you made in the primary, and the standby gets promoted, that seems
correct. But it's a little inconsistent: If the standby crashes
immediately after promotion, and you call pg_wal_replay_wait() after
recovery, it returns success. However, if you shut down the promoted
server and restart it, then last replay LSN is 0/0, and the call will
fail because no recovery happened.

What is the use case for the 'no_error' argument? Why would you ever
want to pass no_error=true ? The separate pg_wal_replay_wait_status()
function feels weird to me. Also it surely shouldn't be marked IMMUTABLE
nor parallel safe.

This would benefit from more documentation, explaining how you would use
this in practice. I believe the use case is that you want "read your
writes" consistency between a primary and a standby. You commit a
transaction in the primary, and you then want to run read-only queries
in a standby, and you want to make sure that you see your own commit,
but you're ok with slightly delayed results otherwise. It would be good
to add a chapter somewhere in the docs to show how to do that in
practice with these functions.

Thank you for your feedback!

I do agree that error reporting for "not in recovery" case needs to be
improved, as well, as the documentation.

I see that pg_wal_replay_wait_status() might look weird, but it seems
to me like the best of feasible solutions. Given that
pg_wal_replay_wait() procedure can't work concurrently to a query
involving pg_wal_replay_wait_status() function, I think
pg_wal_replay_wait_status() should be stable and parallel safe.

This is the brief answer. I will be able to come back with more
details on Monday.

------
Regards,
Alexander Korotkov
Supabase

#29Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Alexander Korotkov (#28)
Re: pgsql: Implement pg_wal_replay_wait() stored procedure

On 25/10/2024 14:56, Alexander Korotkov wrote:

I see that pg_wal_replay_wait_status() might look weird, but it seems
to me like the best of feasible solutions.

I haven't written many procedures, but our docs say:

Procedures do not return a function value; hence CREATE PROCEDURE

lacks a RETURNS clause. However, procedures can instead return data to
their callers via output parameters.

Did you consider using an output parameter?

Given that
pg_wal_replay_wait() procedure can't work concurrently to a query
involving pg_wal_replay_wait_status() function, I think
pg_wal_replay_wait_status() should be stable and parallel safe.

If you call pg_wal_replay_wait() in the backend process, and
pg_wal_replay_wait_status() in a parallel worker process, it won't
return the result of the wait. Probably not what you'd expect. So I'd
argue that it should be parallel unsafe.

This is the brief answer. I will be able to come back with more
details on Monday.

Thanks. A few more minor issues I spotted while playing with this:

- If you pass a very high value as the timeout, e.g. INT_MAX-1, it wraps
around and doesn't wait at all
- You can pass NULLs as arguments. That should probably not be allowed,
or we need to document what it means.

This is disappointing:

postgres=# set default_transaction_isolation ='repeatable read';
SET
postgres=# call pg_wal_replay_wait('0/55DA24F');
ERROR: pg_wal_replay_wait() must be only called without an active or registered snapshot
DETAIL: Make sure pg_wal_replay_wait() isn't called within a transaction with an isolation level higher than READ COMMITTED, another procedure, or a function.

Is there any way we could make that work? Otherwise, the feature just
basically doesn't work if you use repeatable read.

--
Heikki Linnakangas
Neon (https://neon.tech)

#30Robert Haas
robertmhaas@gmail.com
In reply to: Heikki Linnakangas (#29)
Re: pgsql: Implement pg_wal_replay_wait() stored procedure

Point of order:

Discussions of commits and potential followup commits are best
redirected to -hackers rather than continuing to use -committers.

...Robert

#31Alexander Korotkov
aekorotkov@gmail.com
In reply to: Robert Haas (#30)
Re: pgsql: Implement pg_wal_replay_wait() stored procedure

On Mon, Oct 28, 2024 at 5:47 PM Robert Haas <robertmhaas@gmail.com> wrote:

Point of order:

Discussions of commits and potential followup commits are best
redirected to -hackers rather than continuing to use -committers.

Thank you for pointing this.
My response to Heikki's last message will be redirected to -hackers.

------
Regards,
Alexander Korotkov
Supabase

#32Alexander Korotkov
aekorotkov@gmail.com
In reply to: Heikki Linnakangas (#29)
Re: pgsql: Implement pg_wal_replay_wait() stored procedure

On Mon, Oct 28, 2024 at 11:36 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

On 25/10/2024 14:56, Alexander Korotkov wrote:

I see that pg_wal_replay_wait_status() might look weird, but it seems
to me like the best of feasible solutions.

I haven't written many procedures, but our docs say:

Procedures do not return a function value; hence CREATE PROCEDURE

lacks a RETURNS clause. However, procedures can instead return data to
their callers via output parameters.

Did you consider using an output parameter?

Yes I did consider them and found two issues.
1) You still need to pass something to them. And that couldn't be
default values. That's a bit awkward.
2) Usage of them causes extra snapshot to be held.
I'll recheck if it's possible to workaround any of these two.

Given that
pg_wal_replay_wait() procedure can't work concurrently to a query
involving pg_wal_replay_wait_status() function, I think
pg_wal_replay_wait_status() should be stable and parallel safe.

If you call pg_wal_replay_wait() in the backend process, and
pg_wal_replay_wait_status() in a parallel worker process, it won't
return the result of the wait. Probably not what you'd expect. So I'd
argue that it should be parallel unsafe.

Oh, sorry. You're absolutely correct. That should be parallel unsafe.

This is the brief answer. I will be able to come back with more
details on Monday.

Thanks. A few more minor issues I spotted while playing with this:

- If you pass a very high value as the timeout, e.g. INT_MAX-1, it wraps
around and doesn't wait at all
- You can pass NULLs as arguments. That should probably not be allowed,
or we need to document what it means.

This is disappointing:

postgres=# set default_transaction_isolation ='repeatable read';
SET
postgres=# call pg_wal_replay_wait('0/55DA24F');
ERROR: pg_wal_replay_wait() must be only called without an active or registered snapshot
DETAIL: Make sure pg_wal_replay_wait() isn't called within a transaction with an isolation level higher than READ COMMITTED, another procedure, or a function.

Is there any way we could make that work? Otherwise, the feature just
basically doesn't work if you use repeatable read.

Thank you for catching this. The last one is really disappointing.
I'm exploring on what could be done there.

------
Regards,
Alexander Korotkov
Supabase

#33Alexander Korotkov
aekorotkov@gmail.com
In reply to: Alexander Korotkov (#32)
1 attachment(s)
Re: pgsql: Implement pg_wal_replay_wait() stored procedure

Hi, Heikki!

On Mon, Oct 28, 2024 at 9:42 PM Alexander Korotkov <aekorotkov@gmail.com>
wrote:

On Mon, Oct 28, 2024 at 11:36 AM Heikki Linnakangas <hlinnaka@iki.fi>

wrote:

On 25/10/2024 14:56, Alexander Korotkov wrote:

I see that pg_wal_replay_wait_status() might look weird, but it seems
to me like the best of feasible solutions.

I haven't written many procedures, but our docs say:

Procedures do not return a function value; hence CREATE PROCEDURE

lacks a RETURNS clause. However, procedures can instead return data to
their callers via output parameters.

Did you consider using an output parameter?

Yes I did consider them and found two issues.
1) You still need to pass something to them. And that couldn't be
default values. That's a bit awkward.
2) Usage of them causes extra snapshot to be held.
I'll recheck if it's possible to workaround any of these two.

I've rechecked the output parameters for stored procedures. And I think
the behavior I previously discovered is an anomaly.

CREATE PROCEDURE test_proc(a integer, out b integer)
LANGUAGE plpgsql
AS $$
BEGIN
b := a;
END;
$$;

# call test_proc(1);
ERROR: procedure test_proc(integer) does not exist
LINE 1: call test_proc(1);
^
HINT: No procedure matches the given name and argument types. You might
need to add explicit type casts.

# call test_proc(1,2);
b
---
1
(1 row)

Looks weird that we have to pass in some (ignored?) values for output
parameters. In contrast, functions don't require this.

CREATE FUNCTION test_func(a integer, out b integer)
LANGUAGE plpgsql
AS $$
BEGIN
b := a;
END;
$$;

# select test_func(1);
test_func
-----------
1
(1 row)

This makes me think we have an issue with stored procedures here. I'll try
to investigate it further.

Also when we have output parameters, PortalRun() have portal->strategy ==
PORTAL_UTIL_SELECT (it's PORTAL_MULTI_QUERY without them). This eventually
makes PortalRunUtility() to do RegisterSnapshot(). This is not clear
whether we can release this snapshot without a stored procedure without the
consequences.

Given that
pg_wal_replay_wait() procedure can't work concurrently to a query
involving pg_wal_replay_wait_status() function, I think
pg_wal_replay_wait_status() should be stable and parallel safe.

If you call pg_wal_replay_wait() in the backend process, and
pg_wal_replay_wait_status() in a parallel worker process, it won't
return the result of the wait. Probably not what you'd expect. So I'd
argue that it should be parallel unsafe.

Oh, sorry. You're absolutely correct. That should be parallel unsafe.

This is the brief answer. I will be able to come back with more
details on Monday.

Thanks. A few more minor issues I spotted while playing with this:

- If you pass a very high value as the timeout, e.g. INT_MAX-1, it wraps
around and doesn't wait at all
- You can pass NULLs as arguments. That should probably not be allowed,
or we need to document what it means.

This is disappointing:

postgres=# set default_transaction_isolation ='repeatable read';
SET
postgres=# call pg_wal_replay_wait('0/55DA24F');
ERROR: pg_wal_replay_wait() must be only called without an active or

registered snapshot

DETAIL: Make sure pg_wal_replay_wait() isn't called within a

transaction with an isolation level higher than READ COMMITTED, another
procedure, or a function.

Is there any way we could make that work? Otherwise, the feature just
basically doesn't work if you use repeatable read.

Thank you for catching this. The last one is really disappointing.
I'm exploring on what could be done there.

If we set repeatable read as the default transaction isolation level, then
the catalog snapshot used to find pg_wal_replay_wait() procedure got saved
as the transaction snapshot. As I get the correct way to release that
snapshot is to finish current transaction and start another one.
Implemented in the attached patch.

------
Regards,
Alexander Korotkov
Supabase

Attachments:

v1-0001-Teach-pg_wal_replay_wait-to-handle-REPEATABLE-REA.patchapplication/x-patch; name=v1-0001-Teach-pg_wal_replay_wait-to-handle-REPEATABLE-REA.patchDownload
From 96a5df8ed096a117276c89da1968139e1d20c997 Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Sun, 3 Nov 2024 22:47:05 +0200
Subject: [PATCH v1] Teach pg_wal_replay_wait() to handle REPEATABLE READ
 default level

Reported-by:
Bug:
Discussion:
Author:
Reviewed-by:
Tested-by:
Backpatch-through:
---
 src/backend/access/transam/xlogfuncs.c     | 24 ++++++++++++++++++++--
 src/test/recovery/t/043_wal_replay_wait.pl |  5 ++++-
 2 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index bca1d395683..8e90e6ae444 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -34,6 +34,7 @@
 #include "utils/builtins.h"
 #include "utils/memutils.h"
 #include "utils/pg_lsn.h"
+#include "utils/portal.h"
 #include "utils/snapmgr.h"
 #include "utils/timestamp.h"
 
@@ -763,6 +764,8 @@ pg_wal_replay_wait(PG_FUNCTION_ARGS)
 	XLogRecPtr	target_lsn = PG_GETARG_LSN(0);
 	int64		timeout = PG_GETARG_INT64(1);
 	bool		no_error = PG_GETARG_BOOL(2);
+	CallContext *context = (CallContext *) fcinfo->context;
+	bool		start_tranaction = false;
 
 	if (timeout < 0)
 		ereport(ERROR,
@@ -776,12 +779,26 @@ pg_wal_replay_wait(PG_FUNCTION_ARGS)
 	 * implying a kind of self-deadlock.  This is the reason why
 	 * pg_wal_replay_wait() is a procedure, not a function.
 	 *
-	 * At first, we should check there is no active snapshot.  According to
+	 * If we're in REPEATABLE READ isolation level or higher, the catalog
+	 * snapshot used to find this procedure will be saved as the transaction
+	 * snapshot.  Thus, we need to finish current transaction in order to
+	 * release this snapshot.  We can do this in non-atomic context.
+	 * Additionally check we're in the first command of the transaction.
+	 *
+	 * Also, we should check there is no active snapshot.  According to
 	 * PlannedStmtRequiresSnapshot(), even in an atomic context, CallStmt is
 	 * processed with a snapshot.  Thankfully, we can pop this snapshot,
 	 * because PortalRunUtility() can tolerate this.
 	 */
-	if (ActiveSnapshotSet())
+	if (IsolationUsesXactSnapshot() &&
+		GetCurrentCommandId(false) == 0 &&
+		!context->atomic)
+	{
+		ForgetPortalSnapshots();
+		CommitTransactionCommand();
+		start_tranaction = true;
+	}
+	else if (ActiveSnapshotSet())
 		PopActiveSnapshot();
 
 	/*
@@ -805,6 +822,9 @@ pg_wal_replay_wait(PG_FUNCTION_ARGS)
 
 	lastWaitLSNResult = WaitForLSNReplay(target_lsn, timeout);
 
+	if (start_tranaction)
+		StartTransactionCommand();
+
 	if (no_error)
 		PG_RETURN_VOID();
 
diff --git a/src/test/recovery/t/043_wal_replay_wait.pl b/src/test/recovery/t/043_wal_replay_wait.pl
index 5857b943711..388741ee0aa 100644
--- a/src/test/recovery/t/043_wal_replay_wait.pl
+++ b/src/test/recovery/t/043_wal_replay_wait.pl
@@ -47,13 +47,16 @@ my $output = $node_standby->safe_psql(
 ok($output >= 0,
 	"standby reached the same LSN as primary after pg_wal_replay_wait()");
 
-# 2. Check that new data is visible after calling pg_wal_replay_wait()
+# 2. Check that new data is visible after calling pg_wal_replay_wait().
+# Do this check with REPEATABLE READ as the default transaction isolation
+# mode.
 $node_primary->safe_psql('postgres',
 	"INSERT INTO wait_test VALUES (generate_series(21, 30))");
 my $lsn2 =
   $node_primary->safe_psql('postgres', "SELECT pg_current_wal_insert_lsn()");
 $output = $node_standby->safe_psql(
 	'postgres', qq[
+	SET default_transaction_isolation = 'repeatable read';
 	CALL pg_wal_replay_wait('${lsn2}');
 	SELECT count(*) FROM wait_test;
 ]);
-- 
2.39.5 (Apple Git-154)

#34Alexander Korotkov
aekorotkov@gmail.com
In reply to: Alexander Korotkov (#33)
Re: pgsql: Implement pg_wal_replay_wait() stored procedure

On Sun, Nov 3, 2024 at 10:54 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:

On Mon, Oct 28, 2024 at 9:42 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:

On Mon, Oct 28, 2024 at 11:36 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

On 25/10/2024 14:56, Alexander Korotkov wrote:

I see that pg_wal_replay_wait_status() might look weird, but it seems
to me like the best of feasible solutions.

I haven't written many procedures, but our docs say:

Procedures do not return a function value; hence CREATE PROCEDURE

lacks a RETURNS clause. However, procedures can instead return data to
their callers via output parameters.

Did you consider using an output parameter?

Yes I did consider them and found two issues.
1) You still need to pass something to them. And that couldn't be
default values. That's a bit awkward.
2) Usage of them causes extra snapshot to be held.
I'll recheck if it's possible to workaround any of these two.

I've rechecked the output parameters for stored procedures. And I think the behavior I previously discovered is an anomaly.

CREATE PROCEDURE test_proc(a integer, out b integer)
LANGUAGE plpgsql
AS $$
BEGIN
b := a;
END;
$$;

# call test_proc(1);
ERROR: procedure test_proc(integer) does not exist
LINE 1: call test_proc(1);
^
HINT: No procedure matches the given name and argument types. You might need to add explicit type casts.

# call test_proc(1,2);
b
---
1
(1 row)

Looks weird that we have to pass in some (ignored?) values for output parameters. In contrast, functions don't require this.

CREATE FUNCTION test_func(a integer, out b integer)
LANGUAGE plpgsql
AS $$
BEGIN
b := a;
END;
$$;

# select test_func(1);
test_func
-----------
1
(1 row)

This makes me think we have an issue with stored procedures here. I'll try to investigate it further.

Oh, this seems to be intentional [1] and seems to be part of standard [2].

Links
1. https://www.postgresql.org/docs/devel/xfunc-sql.html#XFUNC-OUTPUT-PARAMETERS-PROC
2. /messages/by-id/2b8490fe-51af-e671-c504-47359dc453c5@2ndquadrant.com

------
Regards,
Alexander Korotkov
Supabase

#35Alexander Korotkov
aekorotkov@gmail.com
In reply to: Alexander Korotkov (#34)
2 attachment(s)
Re: pgsql: Implement pg_wal_replay_wait() stored procedure

On Sun, Nov 3, 2024 at 11:03 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:

On Sun, Nov 3, 2024 at 10:54 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:

On Mon, Oct 28, 2024 at 9:42 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:

On Mon, Oct 28, 2024 at 11:36 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

On 25/10/2024 14:56, Alexander Korotkov wrote:

I see that pg_wal_replay_wait_status() might look weird, but it seems
to me like the best of feasible solutions.

I haven't written many procedures, but our docs say:

Procedures do not return a function value; hence CREATE PROCEDURE

lacks a RETURNS clause. However, procedures can instead return data to
their callers via output parameters.

Did you consider using an output parameter?

Yes I did consider them and found two issues.
1) You still need to pass something to them. And that couldn't be
default values. That's a bit awkward.
2) Usage of them causes extra snapshot to be held.
I'll recheck if it's possible to workaround any of these two.

I've rechecked the output parameters for stored procedures. And I think the behavior I previously discovered is an anomaly.

CREATE PROCEDURE test_proc(a integer, out b integer)
LANGUAGE plpgsql
AS $$
BEGIN
b := a;
END;
$$;

# call test_proc(1);
ERROR: procedure test_proc(integer) does not exist
LINE 1: call test_proc(1);
^
HINT: No procedure matches the given name and argument types. You might need to add explicit type casts.

# call test_proc(1,2);
b
---
1
(1 row)

Looks weird that we have to pass in some (ignored?) values for output parameters. In contrast, functions don't require this.

CREATE FUNCTION test_func(a integer, out b integer)
LANGUAGE plpgsql
AS $$
BEGIN
b := a;
END;
$$;

# select test_func(1);
test_func
-----------
1
(1 row)

This makes me think we have an issue with stored procedures here. I'll try to investigate it further.

Oh, this seems to be intentional [1] and seems to be part of standard [2].

Links
1. https://www.postgresql.org/docs/devel/xfunc-sql.html#XFUNC-OUTPUT-PARAMETERS-PROC
2. /messages/by-id/2b8490fe-51af-e671-c504-47359dc453c5@2ndquadrant.com

The attached patchset contains patch 0001, which improves handling of
not in recovery state by usage of PromoteIsTriggered(). When
(PromoteIsTriggered() == false), last replay LSN is not accepted and
not reported in errdetail().

0002 contains patch finishing implicit transaction in default
isolation level REPEATABLE READ or higher with revised commit message.

------
Regards,
Alexander Korotkov
Supabase

Attachments:

v2-0002-Teach-pg_wal_replay_wait-to-handle-REPEATABLE-REA.patchapplication/octet-stream; name=v2-0002-Teach-pg_wal_replay_wait-to-handle-REPEATABLE-REA.patchDownload
From 34080641a001f2b86fdd56d99dcbdbecf44630b1 Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Sun, 3 Nov 2024 22:47:05 +0200
Subject: [PATCH v2 2/2] Teach pg_wal_replay_wait() to handle REPEATABLE READ
 default level

When pg_wal_replay_wait() is run in the implicit transaction of
REPEATABLE READ isolation level or higher,  the catalog snapshot used to
find this procedure will be saved as the transaction snapshot.
Thus, we need to finish current transaction in order to release this
snapshot.  We can do this in non-atomic context.  Additionally check we're
in the first command of the transaction.

Reported-by: Heikki Linnakangas
Discussion: https://postgr.es/m/14de8671-e328-4c3e-b136-664f6f13a39f%40iki.fi
---
 src/backend/access/transam/xlogfuncs.c     | 24 ++++++++++++++++++++--
 src/test/recovery/t/043_wal_replay_wait.pl |  5 ++++-
 2 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 8abea2dbb57..b442799299d 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -34,6 +34,7 @@
 #include "utils/builtins.h"
 #include "utils/memutils.h"
 #include "utils/pg_lsn.h"
+#include "utils/portal.h"
 #include "utils/snapmgr.h"
 #include "utils/timestamp.h"
 
@@ -763,6 +764,8 @@ pg_wal_replay_wait(PG_FUNCTION_ARGS)
 	XLogRecPtr	target_lsn = PG_GETARG_LSN(0);
 	int64		timeout = PG_GETARG_INT64(1);
 	bool		no_error = PG_GETARG_BOOL(2);
+	CallContext *context = (CallContext *) fcinfo->context;
+	bool		start_tranaction = false;
 
 	if (timeout < 0)
 		ereport(ERROR,
@@ -776,12 +779,26 @@ pg_wal_replay_wait(PG_FUNCTION_ARGS)
 	 * implying a kind of self-deadlock.  This is the reason why
 	 * pg_wal_replay_wait() is a procedure, not a function.
 	 *
-	 * At first, we should check there is no active snapshot.  According to
+	 * If we're in REPEATABLE READ isolation level or higher, the catalog
+	 * snapshot used to find this procedure will be saved as the transaction
+	 * snapshot.  Thus, we need to finish current transaction in order to
+	 * release this snapshot.  We can do this in non-atomic context.
+	 * Additionally check we're in the first command of the transaction.
+	 *
+	 * Also, we should check there is no active snapshot.  According to
 	 * PlannedStmtRequiresSnapshot(), even in an atomic context, CallStmt is
 	 * processed with a snapshot.  Thankfully, we can pop this snapshot,
 	 * because PortalRunUtility() can tolerate this.
 	 */
-	if (ActiveSnapshotSet())
+	if (IsolationUsesXactSnapshot() &&
+		GetCurrentCommandId(false) == 0 &&
+		!context->atomic)
+	{
+		ForgetPortalSnapshots();
+		CommitTransactionCommand();
+		start_tranaction = true;
+	}
+	else if (ActiveSnapshotSet())
 		PopActiveSnapshot();
 
 	/*
@@ -805,6 +822,9 @@ pg_wal_replay_wait(PG_FUNCTION_ARGS)
 
 	lastWaitLSNResult = WaitForLSNReplay(target_lsn, timeout);
 
+	if (start_tranaction)
+		StartTransactionCommand();
+
 	if (no_error)
 		PG_RETURN_VOID();
 
diff --git a/src/test/recovery/t/043_wal_replay_wait.pl b/src/test/recovery/t/043_wal_replay_wait.pl
index 5857b943711..388741ee0aa 100644
--- a/src/test/recovery/t/043_wal_replay_wait.pl
+++ b/src/test/recovery/t/043_wal_replay_wait.pl
@@ -47,13 +47,16 @@ my $output = $node_standby->safe_psql(
 ok($output >= 0,
 	"standby reached the same LSN as primary after pg_wal_replay_wait()");
 
-# 2. Check that new data is visible after calling pg_wal_replay_wait()
+# 2. Check that new data is visible after calling pg_wal_replay_wait().
+# Do this check with REPEATABLE READ as the default transaction isolation
+# mode.
 $node_primary->safe_psql('postgres',
 	"INSERT INTO wait_test VALUES (generate_series(21, 30))");
 my $lsn2 =
   $node_primary->safe_psql('postgres', "SELECT pg_current_wal_insert_lsn()");
 $output = $node_standby->safe_psql(
 	'postgres', qq[
+	SET default_transaction_isolation = 'repeatable read';
 	CALL pg_wal_replay_wait('${lsn2}');
 	SELECT count(*) FROM wait_test;
 ]);
-- 
2.39.5 (Apple Git-154)

v2-0001-pg_wal_replay_wait-Improve-handling-of-standby-pr.patchapplication/octet-stream; name=v2-0001-pg_wal_replay_wait-Improve-handling-of-standby-pr.patchDownload
From 4c8ef6795234c1314a33c03eba4cba790e787cca Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Mon, 4 Nov 2024 06:18:08 +0200
Subject: [PATCH v2 1/2] pg_wal_replay_wait(): Improve handling of standby
 promotion

Currently, when pg_wal_replay_wait() finds the server to be not in recovery,
it assumes concurrent promotion happened.  That leads to certain problems
including:

  1. No distinction between local recovery as streaming WAL replay,
  2. Weird error reporting when there was no recovery.

This commit uses PromoteIsTriggered() checks to improve the situation in
the following ways:

 1. Accept last replay LSN not in recovery mode only when standby was
    promoted,
 2. Show last replay LSN in errdetail() only if standby was promoted.

Reported-by: Heikki Linnakangas
Discussion: https://postgr.es/m/ab0eddce-06d4-4db2-87ce-46fa2427806c%40iki.fi
---
 src/backend/access/transam/xlogfuncs.c | 18 ++++++++++++------
 src/backend/access/transam/xlogwait.c  |  6 ++++--
 2 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index bca1d395683..8abea2dbb57 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -827,12 +827,18 @@ pg_wal_replay_wait(PG_FUNCTION_ARGS)
 			break;
 
 		case WAIT_LSN_RESULT_NOT_IN_RECOVERY:
-			ereport(ERROR,
-					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
-					 errmsg("recovery is not in progress"),
-					 errdetail("Recovery ended before replaying target LSN %X/%X; last replay LSN %X/%X.",
-							   LSN_FORMAT_ARGS(target_lsn),
-							   LSN_FORMAT_ARGS(GetXLogReplayRecPtr(NULL)))));
+			if (PromoteIsTriggered())
+				ereport(ERROR,
+						(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+						 errmsg("recovery is not in progress"),
+						 errdetail("Recovery ended before replaying target LSN %X/%X; last replay LSN %X/%X.",
+								   LSN_FORMAT_ARGS(target_lsn),
+								   LSN_FORMAT_ARGS(GetXLogReplayRecPtr(NULL)))));
+			else
+				ereport(ERROR,
+						(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+						 errmsg("recovery is not in progress"),
+						 errhint("Waiting for LSN can only be executed during recovery.")));
 			break;
 	}
 
diff --git a/src/backend/access/transam/xlogwait.c b/src/backend/access/transam/xlogwait.c
index 0ec0898cfbf..fa40762c66a 100644
--- a/src/backend/access/transam/xlogwait.c
+++ b/src/backend/access/transam/xlogwait.c
@@ -238,7 +238,8 @@ WaitForLSNReplay(XLogRecPtr targetLSN, int64 timeout)
 		 * the procedure call, while target LSN is replayed.  So, we still
 		 * check the last replay LSN before reporting an error.
 		 */
-		if (targetLSN <= GetXLogReplayRecPtr(NULL))
+		if (PromoteIsTriggered() &&
+			targetLSN <= GetXLogReplayRecPtr(NULL))
 			return WAIT_LSN_RESULT_SUCCESS;
 		return WAIT_LSN_RESULT_NOT_IN_RECOVERY;
 	}
@@ -276,7 +277,8 @@ WaitForLSNReplay(XLogRecPtr targetLSN, int64 timeout)
 			 */
 			deleteLSNWaiter();
 			currentLSN = GetXLogReplayRecPtr(NULL);
-			if (targetLSN <= currentLSN)
+			if (PromoteIsTriggered() &&
+				targetLSN <= currentLSN)
 				return WAIT_LSN_RESULT_SUCCESS;
 			return WAIT_LSN_RESULT_NOT_IN_RECOVERY;
 		}
-- 
2.39.5 (Apple Git-154)

#36Michael Paquier
michael@paquier.xyz
In reply to: Alexander Korotkov (#35)
Re: pgsql: Implement pg_wal_replay_wait() stored procedure

On Mon, Nov 04, 2024 at 06:29:42AM +0200, Alexander Korotkov wrote:

The attached patchset contains patch 0001, which improves handling of
not in recovery state by usage of PromoteIsTriggered(). When
(PromoteIsTriggered() == false), last replay LSN is not accepted and
not reported in errdetail().

0002 contains patch finishing implicit transaction in default
isolation level REPEATABLE READ or higher with revised commit message.

I was just catching up with this thread, and I'm still confused about
the state of things. There are two things that are out of order for
me, at least, after skimming through the code on HEAD (I suspect there
is more):
- WaitForLSNReplay() uses an initial set of checks that are duplicated
in the main loop. This is still overcomplicated, no? Wouldn't it be
simpler to eliminate the first of checks or just have a goto block
with addLSNWaiter() called after the first round of checks?
- pg_wal_replay_wait_status() returns a status based on a static
variable that can only be accessed with the same backend as the one
that has called the wait function. That's.. Err.. Unfriendly. Why
being sticky with one backend for the job?

Using output parameters in a procedure is something I did not recall.
Based on your point about not using a function due your argument based
on the snapshots, let's just use that and forget about the status
function entirely (please?).

Based on the latest set of issues reported, it does not feel like this
is really baked and ready for prime day, either. Perhaps it would be
less confusing to just revert the whole and repost a more complete and
structured implementation with an extra round of reviews? There's
still time in this release cycle.
--
Michael

#37Alexander Korotkov
aekorotkov@gmail.com
In reply to: Michael Paquier (#36)
Re: pgsql: Implement pg_wal_replay_wait() stored procedure

On Mon, Nov 4, 2024 at 8:04 AM Michael Paquier <michael@paquier.xyz> wrote:

On Mon, Nov 04, 2024 at 06:29:42AM +0200, Alexander Korotkov wrote:

The attached patchset contains patch 0001, which improves handling of
not in recovery state by usage of PromoteIsTriggered(). When
(PromoteIsTriggered() == false), last replay LSN is not accepted and
not reported in errdetail().

0002 contains patch finishing implicit transaction in default
isolation level REPEATABLE READ or higher with revised commit message.

I was just catching up with this thread, and I'm still confused about
the state of things. There are two things that are out of order for
me, at least, after skimming through the code on HEAD (I suspect there
is more):
- WaitForLSNReplay() uses an initial set of checks that are duplicated
in the main loop. This is still overcomplicated, no? Wouldn't it be
simpler to eliminate the first of checks or just have a goto block
with addLSNWaiter() called after the first round of checks?
- pg_wal_replay_wait_status() returns a status based on a static
variable that can only be accessed with the same backend as the one
that has called the wait function. That's.. Err.. Unfriendly. Why
being sticky with one backend for the job?

That's a bit awkward, but I think generally valid. Some backend can
wait for LSN, then the result could be read from the same backend.
Difference backends could wait for different LSNs with different
results, I definitely don't feel like something should be shared
across backends.

Using output parameters in a procedure is something I did not recall.
Based on your point about not using a function due your argument based
on the snapshots, let's just use that and forget about the status
function entirely (please?).

Please, check [1]. Usage of output parameters is a bit awkward too,
because you need to pass some value in there. And more importantly,
usage of output parameters also causes snapshot problem, as it causes
another snapshot to be held.

I'm tending to think we don't necessarily need to have ability to get
the waiting status in the initial version of this feature. The work
on this feature started in 2016. It's already very long for a simple
feature of just waiting for LSN (simple in general design, while our
snapshot reality makes one full of trouble). Having in-core
implementation for this seems like breakthrough already, even if it
doesn't have all the abilities it could potentially have.

Based on the latest set of issues reported, it does not feel like this
is really baked and ready for prime day, either. Perhaps it would be
less confusing to just revert the whole and repost a more complete and
structured implementation with an extra round of reviews? There's
still time in this release cycle.

Yes, this makes sense. Will revert.

Links.
1. /messages/by-id/CAPpHfdvRmTzGJw5rQdSMkTxUPZkjwtbQ=LJE2u9Jqh9gFXHpmg@mail.gmail.com

------
Regards,
Alexander Korotkov
Supabase

#38Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Alexander Korotkov (#37)
Re: pgsql: Implement pg_wal_replay_wait() stored procedure

On 2024-Nov-04, Alexander Korotkov wrote:

On Mon, Nov 4, 2024 at 8:04 AM Michael Paquier <michael@paquier.xyz> wrote:

Using output parameters in a procedure is something I did not recall.
Based on your point about not using a function due your argument based
on the snapshots, let's just use that and forget about the status
function entirely (please?).

Please, check [1]. Usage of output parameters is a bit awkward too,
because you need to pass some value in there. And more importantly,
usage of output parameters also causes snapshot problem, as it causes
another snapshot to be held.

I wonder if it would be better to go back to the original idea of using
special DDL syntax rather than a procedure. It seems we've been piling
up hacks to get around the behavior of procedures, and we seem to have
grown one more to handle repeatable read transactions.

It's looking to me like there's just too much cruft in the quest to
avoid having to reach consensus on new syntax. This might be a mistake.
Is it possible to backtrack on that decision?

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"Linux transformó mi computadora, de una `máquina para hacer cosas',
en un aparato realmente entretenido, sobre el cual cada día aprendo
algo nuevo" (Jaime Salinas)

#39Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#38)
Re: pgsql: Implement pg_wal_replay_wait() stored procedure

On Mon, Nov 4, 2024 at 9:19 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

It's looking to me like there's just too much cruft in the quest to
avoid having to reach consensus on new syntax. This might be a mistake.
Is it possible to backtrack on that decision?

There's also the patch that Heikki posted to wait using a
protocol-level facility. Maybe that's just a better fit and we don't
need either a procedure or new syntax.

--
Robert Haas
EDB: http://www.enterprisedb.com

#40Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Robert Haas (#39)
Re: pgsql: Implement pg_wal_replay_wait() stored procedure

On 04/11/2024 17:53, Robert Haas wrote:

On Mon, Nov 4, 2024 at 9:19 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

It's looking to me like there's just too much cruft in the quest to
avoid having to reach consensus on new syntax. This might be a mistake.
Is it possible to backtrack on that decision?

There's also the patch that Heikki posted to wait using a
protocol-level facility.

It was Peter E

Maybe that's just a better fit and we don't need either a procedure
or new syntax.

I think it would still be good to expose the feature at SQL level too.
Makes it easier to test and makes it usable without client library
changes, for example.

--
Heikki Linnakangas
Neon (https://neon.tech)

#41Robert Haas
robertmhaas@gmail.com
In reply to: Heikki Linnakangas (#40)
Re: pgsql: Implement pg_wal_replay_wait() stored procedure

On Mon, Nov 4, 2024 at 10:57 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

There's also the patch that Heikki posted to wait using a
protocol-level facility.

It was Peter E

Oops.

Maybe that's just a better fit and we don't need either a procedure
or new syntax.

I think it would still be good to expose the feature at SQL level too.
Makes it easier to test and makes it usable without client library
changes, for example.

OK. Well then I agree with Álvaro. If using a procedure isn't working
out nicely, it's better to consider alternatives than to force a round
peg into a square hole (with due regard for the fact that you can fit
anything through the square hole if you make the square hole big
enough...).

--
Robert Haas
EDB: http://www.enterprisedb.com

#42Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Robert Haas (#41)
Re: pgsql: Implement pg_wal_replay_wait() stored procedure

On 04/11/2024 18:53, Robert Haas wrote:

On Mon, Nov 4, 2024 at 10:57 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

Maybe that's just a better fit and we don't need either a procedure
or new syntax.

I think it would still be good to expose the feature at SQL level too.
Makes it easier to test and makes it usable without client library
changes, for example.

OK. Well then I agree with Álvaro. If using a procedure isn't working
out nicely, it's better to consider alternatives than to force a round
peg into a square hole (with due regard for the fact that you can fit
anything through the square hole if you make the square hole big
enough...).

Agreed. Off the top of my, summary of issues I'm aware of:

- Missing documentation. The function itself is documented, but it's
clearly intended for implementing read-after-write consistency in a
cluster setup, so we should document how to do that.
- Not clear to me what the use case for the 'timeout' and 'noerror'
arguments is. Why do we need it at all? (See first point on missing docs).
- Separate pg_wal_replay_wait_status() function to get result status is
awkward
- Inconsistencies in behavior if server is not in recovery (Alexander's
latest patch addresses this, but I didn't review it yet)
- pg_wal_replay_wait_status() is incorrectly marked as STABLE and
PARALLEL SAFE
- Broken with default_transaction_isolation ='repeatable read'

Let's revert commit e546989a26 and rethink. I think the basic
pg_wal_replay_wait() interface before e546989a26 was fine. Some of the
above are issues above were present before e546989a26 already and need
to be addressed in any case, but I believe they can be fixed without
changing the interface.

--
Heikki Linnakangas
Neon (https://neon.tech)

#43Alexander Korotkov
aekorotkov@gmail.com
In reply to: Alvaro Herrera (#38)
Re: pgsql: Implement pg_wal_replay_wait() stored procedure

On Mon, Nov 4, 2024 at 4:19 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2024-Nov-04, Alexander Korotkov wrote:

On Mon, Nov 4, 2024 at 8:04 AM Michael Paquier <michael@paquier.xyz> wrote:

Using output parameters in a procedure is something I did not recall.
Based on your point about not using a function due your argument based
on the snapshots, let's just use that and forget about the status
function entirely (please?).

Please, check [1]. Usage of output parameters is a bit awkward too,
because you need to pass some value in there. And more importantly,
usage of output parameters also causes snapshot problem, as it causes
another snapshot to be held.

I wonder if it would be better to go back to the original idea of using
special DDL syntax rather than a procedure. It seems we've been piling
up hacks to get around the behavior of procedures, and we seem to have
grown one more to handle repeatable read transactions.

It's looking to me like there's just too much cruft in the quest to
avoid having to reach consensus on new syntax. This might be a mistake.
Is it possible to backtrack on that decision?

Before committing pg_wal_replay_wait() I was under impression that
stored procedure would require the same amount of efforts than utility
statement to make backend "snapshot-less". However, it appears that
if we have implicit REPEATABLE READ transaction, stored procedure
needs more efforts. That makes the whole decision, at least,
questionable.

------
Regards,
Alexander Korotkov
Supabase

#44Alexander Korotkov
aekorotkov@gmail.com
In reply to: Michael Paquier (#36)
Re: pgsql: Implement pg_wal_replay_wait() stored procedure

On Mon, Nov 4, 2024 at 8:04 AM Michael Paquier <michael@paquier.xyz> wrote:

On Mon, Nov 04, 2024 at 06:29:42AM +0200, Alexander Korotkov wrote:

The attached patchset contains patch 0001, which improves handling of
not in recovery state by usage of PromoteIsTriggered(). When
(PromoteIsTriggered() == false), last replay LSN is not accepted and
not reported in errdetail().

0002 contains patch finishing implicit transaction in default
isolation level REPEATABLE READ or higher with revised commit message.

I was just catching up with this thread, and I'm still confused about
the state of things. There are two things that are out of order for
me, at least, after skimming through the code on HEAD (I suspect there
is more):
- WaitForLSNReplay() uses an initial set of checks that are duplicated
in the main loop. This is still overcomplicated, no? Wouldn't it be
simpler to eliminate the first of checks or just have a goto block
with addLSNWaiter() called after the first round of checks?
- pg_wal_replay_wait_status() returns a status based on a static
variable that can only be accessed with the same backend as the one
that has called the wait function. That's.. Err.. Unfriendly. Why
being sticky with one backend for the job?

Using output parameters in a procedure is something I did not recall.
Based on your point about not using a function due your argument based
on the snapshots, let's just use that and forget about the status
function entirely (please?).

On second thought, status function is probably not so silly idea.
Unlike result of utility statement/stored procedure, it could be used
inside another function, that is one may implement custom result
handling logic on postgres server side. On contrast, if you get
something out of utility statement/stored procedure, it takes an extra
roundtrip to push it back to postgres backend.

Based on the latest set of issues reported, it does not feel like this
is really baked and ready for prime day, either. Perhaps it would be
less confusing to just revert the whole and repost a more complete and
structured implementation with an extra round of reviews? There's
still time in this release cycle.

Reverted.

------
Regards,
Alexander Korotkov
Supabase

#45Alexander Korotkov
aekorotkov@gmail.com
In reply to: Heikki Linnakangas (#40)
Re: pgsql: Implement pg_wal_replay_wait() stored procedure

On Mon, Nov 4, 2024 at 5:57 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

On 04/11/2024 17:53, Robert Haas wrote:

On Mon, Nov 4, 2024 at 9:19 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

It's looking to me like there's just too much cruft in the quest to
avoid having to reach consensus on new syntax. This might be a mistake.
Is it possible to backtrack on that decision?

There's also the patch that Heikki posted to wait using a
protocol-level facility.

It was Peter E

Maybe that's just a better fit and we don't need either a procedure
or new syntax.

I think it would still be good to expose the feature at SQL level too.
Makes it easier to test and makes it usable without client library
changes, for example.

+1,
Also, it could potentially has wider use cases. For example, waiting
for replay of not latest changes, but some intermediate changes. Or
issuing pg_wal_replay_wait() from another process than one which made
the changes.

------
Regards,
Alexander Korotkov
Supabase