pg_receivewal starting position

Started by Ronan Dunklauover 4 years ago67 messages
#1Ronan Dunklau
ronan.dunklau@aiven.io

Hello,

I've notived that pg_receivewal logic for deciding which LSN to start
streaming at consists of:
- looking up the latest WAL file in our destination folder, and resume from
here
- if there isn't, use the current flush location instead.

This behaviour surprised me when using it with a replication slot: I was
expecting it to start streaming at the last flushed location from the
replication slot instead. If you consider a backup tool which will take
pg_receivewal's output and transfer it somewhere else, using the replication
slot position would be the easiest way to ensure we don't miss WAL files.

Does that make sense ?

I don't know if it should be the default, toggled by a command line flag, or if
we even should let the user provide a LSN.

I'd be happy to implement any of that if we agree.

--
Ronan Dunklau

#2Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Ronan Dunklau (#1)
Re: pg_receivewal starting position

At Tue, 27 Jul 2021 07:50:39 +0200, Ronan Dunklau <ronan.dunklau@aiven.io> wrote in

Hello,

I've notived that pg_receivewal logic for deciding which LSN to start
streaming at consists of:
- looking up the latest WAL file in our destination folder, and resume from
here
- if there isn't, use the current flush location instead.

This behaviour surprised me when using it with a replication slot: I was
expecting it to start streaming at the last flushed location from the
replication slot instead. If you consider a backup tool which will take
pg_receivewal's output and transfer it somewhere else, using the replication
slot position would be the easiest way to ensure we don't miss WAL files.

Does that make sense ?

I don't know if it should be the default, toggled by a command line flag, or if
we even should let the user provide a LSN.

*I* think it is completely reasonable (or at least convenient or less
astonishing) that pg_receivewal starts from the restart_lsn of the
replication slot to use. The tool already decides the clean-start LSN
a bit unusual way. And it seems to me that proposed behavior can be
the default when -S is specified.

I'd be happy to implement any of that if we agree.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#3Ronan Dunklau
ronan.dunklau@aiven.io
In reply to: Kyotaro Horiguchi (#2)
Re: pg_receivewal starting position

Le mercredi 28 juillet 2021, 08:22:30 CEST Kyotaro Horiguchi a écrit :

At Tue, 27 Jul 2021 07:50:39 +0200, Ronan Dunklau <ronan.dunklau@aiven.io>
wrote in

Hello,

I've notived that pg_receivewal logic for deciding which LSN to start

streaming at consists of:
- looking up the latest WAL file in our destination folder, and resume
from

here

- if there isn't, use the current flush location instead.

This behaviour surprised me when using it with a replication slot: I was
expecting it to start streaming at the last flushed location from the
replication slot instead. If you consider a backup tool which will take
pg_receivewal's output and transfer it somewhere else, using the
replication slot position would be the easiest way to ensure we don't
miss WAL files.

Does that make sense ?

I don't know if it should be the default, toggled by a command line flag,
or if we even should let the user provide a LSN.

*I* think it is completely reasonable (or at least convenient or less
astonishing) that pg_receivewal starts from the restart_lsn of the
replication slot to use. The tool already decides the clean-start LSN
a bit unusual way. And it seems to me that proposed behavior can be
the default when -S is specified.

As of now we can't get the replication_slot restart_lsn with a replication
connection AFAIK.

This implies that the patch could require the user to specify a maintenance-db
parameter, and we would use that if provided to fetch the replication slot
info, or fallback to the previous behaviour. I don't really like this approach
as the behaviour changing wether we supply a maintenance-db parameter or not
is error-prone for the user.

Another option would be to add a new replication command (for example
ACQUIRE_REPLICATION_SLOT <slot_name>) to set the replication slot as the
current one, and return some info about it (restart_lsn at least for a
physical slot).

I don't see any reason not to make it work for logical replication connections
/ slots, but it wouldn't be that useful since we can query the database in
that case.

Acquiring the replication slot instead of just reading it would make sure that
no other process could start the replication between the time we read the
restart_lsn and when we issue START_REPLICATION. START_REPLICATION could then
check if we already have a replication slot, and ensure it is the same one as
the one we're trying to use.

From pg_receivewal point of view, this would amount to:

- check if we currently have wal in the target directory.
- if we do, proceed as currently done, by computing the start lsn and
timeline from the last archived wal
- if we don't, and we have a slot, run ACQUIRE_REPLICATION_SLOT. Use the
restart_lsn as the start lsn if there is one, and don't provide a timeline
- if we still don't have a start_lsn, fallback to using the current server
wal position as is done.

What do you think ? Which information should we provide about the slot ?

--
Ronan Dunklau

#4Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Ronan Dunklau (#3)
Re: pg_receivewal starting position

At Wed, 28 Jul 2021 12:57:39 +0200, Ronan Dunklau <ronan.dunklau@aiven.io> wrote in

Le mercredi 28 juillet 2021, 08:22:30 CEST Kyotaro Horiguchi a écrit :

At Tue, 27 Jul 2021 07:50:39 +0200, Ronan Dunklau <ronan.dunklau@aiven.io>
wrote in

I don't know if it should be the default, toggled by a command line flag,
or if we even should let the user provide a LSN.

*I* think it is completely reasonable (or at least convenient or less
astonishing) that pg_receivewal starts from the restart_lsn of the
replication slot to use. The tool already decides the clean-start LSN
a bit unusual way. And it seems to me that proposed behavior can be
the default when -S is specified.

As of now we can't get the replication_slot restart_lsn with a replication
connection AFAIK.

This implies that the patch could require the user to specify a maintenance-db
parameter, and we would use that if provided to fetch the replication slot
info, or fallback to the previous behaviour. I don't really like this approach
as the behaviour changing wether we supply a maintenance-db parameter or not
is error-prone for the user.

Another option would be to add a new replication command (for example
ACQUIRE_REPLICATION_SLOT <slot_name>) to set the replication slot as the
current one, and return some info about it (restart_lsn at least for a
physical slot).

I didn't thought in details. But I forgot that ordinary SQL commands
have been prohibited in physical replication connection. So we need a
new replication command but it's not that a big deal.

I don't see any reason not to make it work for logical replication connections
/ slots, but it wouldn't be that useful since we can query the database in
that case.

Ordinary SQL queries are usable on a logical replication slot so
I'm not sure how logical replication connection uses the command.
However, like you, I wouldn't bother restricting the command to
physical replication, but perhaps the new command should return the
slot type.

Acquiring the replication slot instead of just reading it would make sure that
no other process could start the replication between the time we read the
restart_lsn and when we issue START_REPLICATION. START_REPLICATION could then
check if we already have a replication slot, and ensure it is the same one as
the one we're trying to use.

I'm not sure it's worth adding complexity for such strictness.
START_REPLICATION safely fails if someone steals the slot meanwhile.
In the first place there's no means to protect a slot from others
while idle. One possible problem is the case where START_REPLICATION
successfully acquire the slot after the new command failed. But that
case doesn't seem worse than the case someone advances the slot while
absence. So I think READ_REPLICATION_SLOT is sufficient.

From pg_receivewal point of view, this would amount to:

- check if we currently have wal in the target directory.
- if we do, proceed as currently done, by computing the start lsn and
timeline from the last archived wal
- if we don't, and we have a slot, run ACQUIRE_REPLICATION_SLOT. Use the
restart_lsn as the start lsn if there is one, and don't provide a timeline
- if we still don't have a start_lsn, fallback to using the current server
wal position as is done.

That's pretty much it.

What do you think ? Which information should we provide about the slot ?

We need the timeline id to start with when using restart_lsn. The
current timeline can be used in most cases but there's a case where
the LSN is historical.

pg_receivewal doesn't send a replication status report when a segment
is finished. So after pg_receivewal stops just after a segment is
finished, the slot stays at the beginning of the last segment. Thus
next time it will start from there, creating a duplicate segment.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#5Ronan Dunklau
ronan.dunklau@aiven.io
In reply to: Kyotaro Horiguchi (#4)
2 attachment(s)
Re: pg_receivewal starting position

Le jeudi 29 juillet 2021, 07:32:37 CEST Kyotaro Horiguchi a écrit :

I didn't thought in details. But I forgot that ordinary SQL commands
have been prohibited in physical replication connection. So we need a
new replication command but it's not that a big deal.

Thank you for your feedback !

I don't see any reason not to make it work for logical replication
connections / slots, but it wouldn't be that useful since we can query
the database in that case.

Ordinary SQL queries are usable on a logical replication slot so
I'm not sure how logical replication connection uses the command.
However, like you, I wouldn't bother restricting the command to
physical replication, but perhaps the new command should return the
slot type.

Ok done in the attached patch.

I'm not sure it's worth adding complexity for such strictness.
START_REPLICATION safely fails if someone steals the slot meanwhile.
In the first place there's no means to protect a slot from others
while idle. One possible problem is the case where START_REPLICATION
successfully acquire the slot after the new command failed. But that
case doesn't seem worse than the case someone advances the slot while
absence. So I think READ_REPLICATION_SLOT is sufficient.

Ok, I implemented it like this. I tried to follow the pg_get_replication_slots
approach with regards to how to prevent concurrent modification while reading
the slot.

From pg_receivewal point of view, this would amount to:
- check if we currently have wal in the target directory.

- if we do, proceed as currently done, by computing the start lsn and

timeline from the last archived wal

- if we don't, and we have a slot, run ACQUIRE_REPLICATION_SLOT. Use the

restart_lsn as the start lsn if there is one, and don't provide a timeline

- if we still don't have a start_lsn, fallback to using the current
server

wal position as is done.

That's pretty much it.

Great.

What do you think ? Which information should we provide about the slot ?

We need the timeline id to start with when using restart_lsn. The
current timeline can be used in most cases but there's a case where
the LSN is historical.

Ok, see below.

pg_receivewal doesn't send a replication status report when a segment
is finished. So after pg_receivewal stops just after a segment is
finished, the slot stays at the beginning of the last segment. Thus
next time it will start from there, creating a duplicate segment.

I'm not sure I see where the problem is here. If we don't keep the segments in
pg_walreceiver target directory, then it would be the responsibility of
whoever moved them to make sure we don't have duplicates, or to handle them
gracefully.

Even if we were forcing a feedback after a segment is finished, there could
still be a problem if the feedback never made it to the server but the segment
was here. It might be interesting to send a feedback anyway.

Please find attached two patches implementing what we've been discussing.

Patch 0001 adds the new READ_REPLICATION_SLOT command.
It returns for a given slot the type, restart_lsn, flush_lsn,
restart_lsn_timeline and flush_lsn_timeline.
The timelines are determined by reading the current timeline history, and
finding the timeline where we may find the record. I didn't find explicit test
for eg IDENTIFY_SYSTEM so didn't write one either for this new command, but it
is tested indirectly in patch 0002.

Patch 0002 makes pg_receivewal use that command if we use a replication slot
and the command is available, and use the restart_lsn and restart_lsn_timeline
as a starting point. It also adds a small test to check that we start back
from the previous restart_lsn instead of the current flush position when our
destination directory does not contain any WAL file.

I also noticed we don't test following a timeline switch. It would probably be
good to add that, both for the case where we determine the previous timeline
from the archived segments and when it comes from the new command. What do you
think ?

Regards,

--
Ronan Dunklau

Attachments:

v1-0001-Add-READ_REPLICATION_SLOT-command.patchtext/x-patch; charset=UTF-8; name=v1-0001-Add-READ_REPLICATION_SLOT-command.patchDownload
From 37d9545c05b9e36aafac751f9dc549e75798413c Mon Sep 17 00:00:00 2001
From: Ronan Dunklau <ronan.dunklau@aiven.io>
Date: Wed, 28 Jul 2021 16:34:54 +0200
Subject: [PATCH v1 1/2] Add READ_REPLICATION_SLOT command.

This commit introduces a new READ_REPLICATION_SLOT <slot_name> command.
This command is used to read information about a replication slot when
using a physical replication connection.

In this first version it returns the slot type, restart_lsn, flush_lsn and
the timeline of the restart_lsn and flush_lsn, which are obtained by following the
current timeline history.
---
 doc/src/sgml/protocol.sgml             |  62 +++++++++++++++
 src/backend/replication/repl_gram.y    |  18 ++++-
 src/backend/replication/repl_scanner.l |   1 +
 src/backend/replication/walsender.c    | 106 +++++++++++++++++++++++++
 src/include/nodes/nodes.h              |   1 +
 src/include/nodes/replnodes.h          |  10 +++
 6 files changed, 197 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index e8cb78ff1f..ea15e888d2 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2052,6 +2052,68 @@ The commands accepted in replication mode are:
     </listitem>
   </varlistentry>
 
+  <varlistentry>
+    <term><literal>READ_REPLICATION_SLOT</literal> <replaceable class="parameter">slot_name</replaceable>
+      <indexterm><primary>READ_REPLICATION_SLOT</primary></indexterm>
+    </term>
+    <listitem>
+      <para>
+      Read information about the named replication slot.  This is useful to determine which WAL location we should be asking the server to start streaming at.
+      </para>
+      <para>
+      In response to this command, the server will return a one-row result set, containing the following fields:
+        <variablelist>
+          <varlistentry>
+            <term><literal>type</literal> (<type>text</type>)</term>
+            <listitem>
+              <para>
+               The replication slot's type, either <literal>physical</literal> or <literal>logical</literal>
+              </para>
+            </listitem>
+          </varlistentry>
+
+          <varlistentry>
+            <term><literal>restart_lsn</literal> (<type>text</type>)</term>
+            <listitem>
+              <para>
+               The replication slot's restart_lsn.
+              </para>
+            </listitem>
+          </varlistentry>
+
+          <varlistentry>
+            <term><literal>confirmed_flush_lsn</literal> (<type>text</type>)</term>
+            <listitem>
+              <para>
+               The replication slot's confirmed_flush LSN.
+              </para>
+            </listitem>
+          </varlistentry>
+
+          <varlistentry>
+            <term><literal>restart_lsn_timeline</literal> (<type>int4</type>)</term>
+            <listitem>
+              <para>
+               The timeline ID for the restart_lsn position, when following the current timeline
+               history
+              </para>
+            </listitem>
+          </varlistentry>
+
+          <varlistentry>
+            <term><literal>confirmed_flush_lsn_timeline</literal> (<type>int4</type>)</term>
+            <listitem>
+              <para>
+               The timeline ID for the confirmed_flush_lsn position, when following the current timeline
+               history
+              </para>
+            </listitem>
+          </varlistentry>
+        </variablelist>
+      </para>
+    </listitem>
+  </varlistentry>
+
   <varlistentry>
     <term><literal>START_REPLICATION</literal> [ <literal>SLOT</literal> <replaceable class="parameter">slot_name</replaceable> ] [ <literal>PHYSICAL</literal> ] <replaceable class="parameter">XXX/XXX</replaceable> [ <literal>TIMELINE</literal> <replaceable class="parameter">tli</replaceable> ]
      <indexterm><primary>START_REPLICATION</primary></indexterm>
diff --git a/src/backend/replication/repl_gram.y b/src/backend/replication/repl_gram.y
index e1e8ec29cc..7298f44008 100644
--- a/src/backend/replication/repl_gram.y
+++ b/src/backend/replication/repl_gram.y
@@ -64,6 +64,7 @@ static SQLCmd *make_sqlcmd(void);
 /* Keyword tokens. */
 %token K_BASE_BACKUP
 %token K_IDENTIFY_SYSTEM
+%token K_READ_REPLICATION_SLOT
 %token K_SHOW
 %token K_START_REPLICATION
 %token K_CREATE_REPLICATION_SLOT
@@ -94,7 +95,7 @@ static SQLCmd *make_sqlcmd(void);
 %type <node>	command
 %type <node>	base_backup start_replication start_logical_replication
 				create_replication_slot drop_replication_slot identify_system
-				timeline_history show sql_cmd
+				identify_replication_slot timeline_history show sql_cmd
 %type <list>	base_backup_opt_list
 %type <defelt>	base_backup_opt
 %type <uintval>	opt_timeline
@@ -120,6 +121,7 @@ opt_semicolon:	';'
 
 command:
 			identify_system
+			| identify_replication_slot
 			| base_backup
 			| start_replication
 			| start_logical_replication
@@ -140,6 +142,18 @@ identify_system:
 				}
 			;
 
+/*
+ * READ_REPLICATION_SLOT %s
+ */
+identify_replication_slot:
+			K_READ_REPLICATION_SLOT var_name
+				{
+					ReadReplicationSlotCmd *n = makeNode(ReadReplicationSlotCmd);
+					n->slotname = $2;
+					$$ = (Node *) n;
+				}
+			;
+
 /*
  * SHOW setting
  */
@@ -361,6 +375,8 @@ timeline_history:
 				}
 			;
 
+
+
 opt_physical:
 			K_PHYSICAL
 			| /* EMPTY */
diff --git a/src/backend/replication/repl_scanner.l b/src/backend/replication/repl_scanner.l
index c038a636c3..1b599c255e 100644
--- a/src/backend/replication/repl_scanner.l
+++ b/src/backend/replication/repl_scanner.l
@@ -85,6 +85,7 @@ identifier		{ident_start}{ident_cont}*
 BASE_BACKUP			{ return K_BASE_BACKUP; }
 FAST			{ return K_FAST; }
 IDENTIFY_SYSTEM		{ return K_IDENTIFY_SYSTEM; }
+READ_REPLICATION_SLOT	{ return K_READ_REPLICATION_SLOT; }
 SHOW		{ return K_SHOW; }
 LABEL			{ return K_LABEL; }
 NOWAIT			{ return K_NOWAIT; }
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 3ca2a11389..9a13d1c186 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -232,6 +232,7 @@ static void XLogSendLogical(void);
 static void WalSndDone(WalSndSendDataCallback send_data);
 static XLogRecPtr GetStandbyFlushRecPtr(void);
 static void IdentifySystem(void);
+static void ReadReplicationSlot(ReadReplicationSlotCmd *cmd);
 static void CreateReplicationSlot(CreateReplicationSlotCmd *cmd);
 static void DropReplicationSlot(DropReplicationSlotCmd *cmd);
 static void StartReplication(StartReplicationCmd *cmd);
@@ -457,6 +458,104 @@ IdentifySystem(void)
 	end_tup_output(tstate);
 }
 
+/* Handle READ_REPLICATION_SLOT command */
+static void
+ReadReplicationSlot(ReadReplicationSlotCmd *cmd)
+{
+	ReplicationSlot *slot;
+	ReplicationSlot slot_contents;
+	DestReceiver *dest;
+	TupOutputState *tstate;
+	TupleDesc tupdesc;
+	Datum values[5];
+	bool  nulls[5];
+	char  xloc[MAXFNAMELEN];
+	int i = 0;
+	List *timeline_history = NIL;
+	TimeLineID slots_position_timeline;
+
+	LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
+	slot = SearchNamedReplicationSlot(cmd->slotname, false);
+	if (slot == NULL || !slot->in_use)
+	{
+		LWLockRelease(ReplicationSlotControlLock);
+
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("replication slot \"%s\" does not exist",
+						cmd->slotname)));
+	}
+	/* Copy slot contents while holding spinlock */
+	SpinLockAcquire(&slot->mutex);
+	slot_contents = *slot;
+	SpinLockRelease(&slot->mutex);
+	LWLockRelease(ReplicationSlotControlLock);
+
+	tupdesc = CreateTemplateTupleDesc(5);
+	TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 1, "type",
+							  TEXTOID, -1, 0);
+	TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 2, "restart_lsn",
+							  TEXTOID, -1, 0);
+	TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 3, "confirmed_flush_lsn",
+							  TEXTOID, -1, 0);
+	TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 4, "restart_lsn_timeline",
+							  INT4OID, -1, 0);
+	TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 5, "confirmed_flush_lsn_timeline",
+							  INT4OID, -1, 0);
+
+	if (slot_contents.data.database == InvalidOid)
+		values[i] = CStringGetTextDatum("physical");
+	else
+		values[i] = CStringGetTextDatum("logical");
+	nulls[i] = false;
+	i++;
+
+	snprintf(xloc, sizeof(xloc), "%X/%X", LSN_FORMAT_ARGS(slot_contents.data.restart_lsn));
+	values[i] = CStringGetTextDatum(xloc);
+	nulls[i] = false;
+	i++;
+
+	snprintf(xloc, sizeof(xloc), "%X/%X", LSN_FORMAT_ARGS(slot_contents.data.confirmed_flush));
+	values[i] = CStringGetTextDatum(xloc);
+	nulls[i] = false;
+	i++;
+
+	/* Now get the timeline this wal was produced on, to get to the current
+	 * timeline
+	 * XXX: should we allow the caller to specify which target timeline it wants
+	 * ?
+	 */
+	if (slot_contents.data.restart_lsn != InvalidXLogRecPtr)
+	{
+		timeline_history = readTimeLineHistory(ThisTimeLineID);
+		slots_position_timeline = tliOfPointInHistory(slot_contents.data.restart_lsn, timeline_history);
+		values[i] = Int32GetDatum(slots_position_timeline);
+		nulls[i] = false;
+	} else {
+		values[i] = 0;
+		nulls[i] = true;
+	}
+	i++;
+
+	if (slot_contents.data.confirmed_flush != InvalidXLogRecPtr)
+	{
+		if (!timeline_history)
+			timeline_history = readTimeLineHistory(ThisTimeLineID);
+		slots_position_timeline = tliOfPointInHistory(slot_contents.data.confirmed_flush, timeline_history);
+		values[i] = Int32GetDatum(slots_position_timeline);
+		nulls[i] = false;
+	} else {
+		values[i] = 0;
+		nulls[i] = true;
+	}
+	i++;
+
+	dest = CreateDestReceiver(DestRemoteSimple);
+	tstate = begin_tup_output_tupdesc(dest, tupdesc, &TTSOpsVirtual);
+	do_tup_output(tstate, values, nulls);
+	end_tup_output(tstate);
+}
+
 
 /*
  * Handle TIMELINE_HISTORY command.
@@ -1618,6 +1717,13 @@ exec_replication_command(const char *cmd_string)
 			EndReplicationCommand(cmdtag);
 			break;
 
+		case T_ReadReplicationSlotCmd:
+			cmdtag = "READ_REPLICATION_SLOT";
+			set_ps_display(cmdtag);
+			ReadReplicationSlot((ReadReplicationSlotCmd *) cmd_node);
+			EndReplicationCommand(cmdtag);
+			break;
+
 		case T_BaseBackupCmd:
 			cmdtag = "BASE_BACKUP";
 			set_ps_display(cmdtag);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index f7b009ec43..ade121ad71 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -496,6 +496,7 @@ typedef enum NodeTag
 	 * TAGS FOR REPLICATION GRAMMAR PARSE NODES (replnodes.h)
 	 */
 	T_IdentifySystemCmd,
+	T_ReadReplicationSlotCmd,
 	T_BaseBackupCmd,
 	T_CreateReplicationSlotCmd,
 	T_DropReplicationSlotCmd,
diff --git a/src/include/nodes/replnodes.h b/src/include/nodes/replnodes.h
index faa3a251f2..ec85b7d993 100644
--- a/src/include/nodes/replnodes.h
+++ b/src/include/nodes/replnodes.h
@@ -33,6 +33,16 @@ typedef struct IdentifySystemCmd
 	NodeTag		type;
 } IdentifySystemCmd;
 
+/* ----------------------
+ *		READ_REPLICATION_SLOT command
+ * ----------------------
+ */
+typedef struct ReadReplicationSlotCmd
+{
+	NodeTag		type;
+	char		*slotname;
+} ReadReplicationSlotCmd;
+
 
 /* ----------------------
  *		BASE_BACKUP command
-- 
2.32.0

v1-0002-Use-READ_REPLICATION_SLOT-command-in-pg_receivewa.patchtext/x-patch; charset=UTF-8; name=v1-0002-Use-READ_REPLICATION_SLOT-command-in-pg_receivewa.patchDownload
From 0258bb4e628c5dc5fb35b8b289d8599740e42b17 Mon Sep 17 00:00:00 2001
From: Ronan Dunklau <ronan.dunklau@aiven.io>
Date: Wed, 28 Jul 2021 16:35:39 +0200
Subject: [PATCH v1 2/2] Use READ_REPLICATION_SLOT command in pg_receivewal.

Prior to this patch, when running pg_receivewal, the start LSN is determined by looking at
the WAL files currently stored on disk, then using the current flush lsn
from the server.

If for some reason the WAL files from pg_receivewal were moved, we want
to restart where we left at, which is the replication slot's restart_lsn
instead of skipping right to the current flush location.

To keep compatibility with prior server versions, we only attempt it if
the version is < 15.
---
 src/bin/pg_basebackup/pg_receivewal.c        | 61 +++++++++++++++++++-
 src/bin/pg_basebackup/t/020_pg_receivewal.pl | 46 ++++++++++++++-
 2 files changed, 103 insertions(+), 4 deletions(-)

diff --git a/src/bin/pg_basebackup/pg_receivewal.c b/src/bin/pg_basebackup/pg_receivewal.c
index 4474273daf..02c4d4f1a5 100644
--- a/src/bin/pg_basebackup/pg_receivewal.c
+++ b/src/bin/pg_basebackup/pg_receivewal.c
@@ -26,6 +26,7 @@
 #include "fe_utils/option_utils.h"
 #include "getopt_long.h"
 #include "libpq-fe.h"
+#include "pqexpbuffer.h"
 #include "receivelog.h"
 #include "streamutil.h"
 
@@ -190,6 +191,48 @@ close_destination_dir(DIR *dest_dir, char *dest_folder)
 	}
 }
 
+static XLogRecPtr
+GetSlotRestartLSN(const char *slot_name, uint32 *timeline)
+{
+	PGresult *res;
+	PQExpBuffer query;
+	uint32 hi,
+		   lo;
+	XLogRecPtr startpos;
+	if (slot_name == NULL)
+		return InvalidXLogRecPtr;
+
+	query = createPQExpBuffer();
+	appendPQExpBuffer(query, "READ_REPLICATION_SLOT %s",
+					  slot_name);
+	res = PQexec(conn, query->data);
+	destroyPQExpBuffer(query);
+	if (PQresultStatus(res) != PGRES_TUPLES_OK)
+	{
+		pg_log_error("could not acquire replication slot \"%s\": %s",
+					 slot_name, PQerrorMessage(conn));
+		PQclear(res);
+		return InvalidXLogRecPtr;
+	}
+	if (PQntuples(res) != 1 || PQnfields(res) < 4)
+	{
+		pg_log_error("could not fetch replication slot LSN: got %d rows and %d fields, expected %d rows and %d or more fields",
+					 PQntuples(res), PQnfields(res), 1, 2);
+		PQclear(res);
+		return InvalidXLogRecPtr;
+	}
+	if (sscanf(PQgetvalue(res, 0, 1), "%X/%X", &hi, &lo) != 2)
+	{
+		pg_log_error("could not parse slot's restart_lsn \"%s\"",
+					 PQgetvalue(res, 0, 0));
+		PQclear(res);
+		return InvalidXLogRecPtr;
+	}
+	startpos = ((uint64) hi) << 32 | lo;
+	*timeline = atoi(PQgetvalue(res, 0, 3));
+	PQclear(res);
+	return startpos;
+}
 
 /*
  * Determine starting location for streaming, based on any existing xlog
@@ -408,8 +451,22 @@ StreamLog(void)
 	stream.startpos = FindStreamingStart(&stream.timeline);
 	if (stream.startpos == InvalidXLogRecPtr)
 	{
-		stream.startpos = serverpos;
-		stream.timeline = servertli;
+		/* Try to get it from the slot if any, and the server supports it */
+		if (replication_slot)
+		{
+			if (PQserverVersion(conn) >= 150000)
+				stream.startpos = GetSlotRestartLSN(replication_slot, &stream.timeline);
+			else
+				pg_log_warning("Server does not suport fetching the slot's position, "
+							   "resuming from the current server position instead");
+		}
+		/* If it is still unknown, use the current flush value from the server
+		 */
+		if (stream.startpos == InvalidXLogRecPtr)
+		{
+			stream.startpos = serverpos;
+			stream.timeline = servertli;
+		}
 	}
 
 	/*
diff --git a/src/bin/pg_basebackup/t/020_pg_receivewal.pl b/src/bin/pg_basebackup/t/020_pg_receivewal.pl
index 950083d21c..537a0d9602 100644
--- a/src/bin/pg_basebackup/t/020_pg_receivewal.pl
+++ b/src/bin/pg_basebackup/t/020_pg_receivewal.pl
@@ -5,7 +5,7 @@ use strict;
 use warnings;
 use TestLib;
 use PostgresNode;
-use Test::More tests => 27;
+use Test::More tests => 30;
 
 program_help_ok('pg_receivewal');
 program_version_ok('pg_receivewal');
@@ -49,7 +49,7 @@ is($primary->slot($slot_name)->{'slot_type'},
 
 # Generate some WAL.  Use --synchronous at the same time to add more
 # code coverage.  Switch to the next segment first so that subsequent
-# restarts of pg_receivewal will see this segment as full..
+# restarts of pg_receivewal will see this segment as full../
 $primary->psql('postgres', 'CREATE TABLE test_table(x integer);');
 $primary->psql('postgres', 'SELECT pg_switch_wal();');
 my $nextlsn =
@@ -144,6 +144,48 @@ $primary->command_ok(
 $partial_wals[0] =~ s/(\.gz)?.partial//;
 ok(-e $partial_wals[0], "check that previously partial WAL is now complete");
 
+# Verify that if we use a replication slot, we resume where we left even in the
+# absence of WALs
+
+# Setup the slot, and connect to it a first time
+$primary->run_log(
+	[ 'pg_receivewal', '--slot', $slot_name, '--create-slot' ],
+	'creating a replication slot');
+$primary->psql('postgres',
+	'INSERT INTO test_table VALUES (generate_series(1,100));');
+$primary->psql('postgres', 'SELECT pg_switch_wal();');
+$nextlsn =
+  $primary->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();');
+chomp($nextlsn);
+$primary->run_log(
+	[ 'pg_receivewal', '-D', $stream_dir, '--slot', $slot_name, '--verbose', '--endpos', $nextlsn ],
+	"streaming some WAL");
+
+# Get the slot restart_lsn and make sure we retrieve the associated WAL file
+# even after deletion of the previous stored WAL files.
+$slot = $primary->slot($slot_name);
+my $restart_lsn = $slot->{'restart_lsn'};
+# Add one so that the restart_lsn doesn't correspond to the previous file.
+$restart_lsn =~ s/.$/1/;
+my $walfile_to_be_archived = $primary->safe_psql('postgres',
+	"SELECT pg_walfile_name('$restart_lsn');");
+isnt($restart_lsn, '', 'restart LSN of new slot is not null');
+
+unlink glob "'${stream_dir}/*'";
+
+$primary->psql('postgres',
+	'INSERT INTO test_table VALUES (generate_series(1,100));');
+$primary->psql('postgres', 'SELECT pg_switch_wal();');
+$nextlsn =
+  $primary->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();');
+chomp($nextlsn);
+$primary->command_ok(
+	[ 'pg_receivewal', '-D', $stream_dir, '--slot', $slot_name, '--verbose', '--endpos', $nextlsn ],
+	"Stream some wal starting from the slot's restart_lsn");
+$slot = $primary->slot($slot_name);
+my @walfiles = glob "${stream_dir}/*";
+ok(-e "$stream_dir/$walfile_to_be_archived", "WAL from the slot's restart_lsn has been archived");
+
 # Permissions on WAL files should be default
 SKIP:
 {
-- 
2.32.0

#6Ronan Dunklau
ronan.dunklau@aiven.io
In reply to: Ronan Dunklau (#5)
3 attachment(s)
Re: pg_receivewal starting position

Le jeudi 29 juillet 2021, 11:09:40 CEST Ronan Dunklau a écrit :

Patch 0001 adds the new READ_REPLICATION_SLOT command.
It returns for a given slot the type, restart_lsn, flush_lsn,
restart_lsn_timeline and flush_lsn_timeline.
The timelines are determined by reading the current timeline history, and
finding the timeline where we may find the record. I didn't find explicit
test for eg IDENTIFY_SYSTEM so didn't write one either for this new
command, but it is tested indirectly in patch 0002.

Patch 0002 makes pg_receivewal use that command if we use a replication slot
and the command is available, and use the restart_lsn and
restart_lsn_timeline as a starting point. It also adds a small test to
check that we start back from the previous restart_lsn instead of the
current flush position when our destination directory does not contain any
WAL file.

I also noticed we don't test following a timeline switch. It would probably
be good to add that, both for the case where we determine the previous
timeline from the archived segments and when it comes from the new command.
What do you think ?

Following the discussion at [1]/messages/by-id/flat/ CAD21AoDYmv0yJMQnWtCx_kZGwVZnkQSTQ1re2JNSgM0k37afYQ%40mail.gmail.com, I refactored the implementation into
streamutil and added a third patch making use of it in pg_basebackup itself in
order to fail early if the replication slot doesn't exist, so please find
attached v2 for that.

Best regards,

[1]: /messages/by-id/flat/ CAD21AoDYmv0yJMQnWtCx_kZGwVZnkQSTQ1re2JNSgM0k37afYQ%40mail.gmail.com
CAD21AoDYmv0yJMQnWtCx_kZGwVZnkQSTQ1re2JNSgM0k37afYQ%40mail.gmail.com

--
Ronan Dunklau

Attachments:

v2-0001-Add-READ_REPLICATION_SLOT-command.patchtext/x-patch; charset=UTF-8; name=v2-0001-Add-READ_REPLICATION_SLOT-command.patchDownload
From fff8786049326864d3ef8fe4539e1829f933f32f Mon Sep 17 00:00:00 2001
From: Ronan Dunklau <ronan.dunklau@aiven.io>
Date: Wed, 28 Jul 2021 16:34:54 +0200
Subject: [PATCH v2 1/3] Add READ_REPLICATION_SLOT command.

This commit introduces a new READ_REPLICATION_SLOT <slot_name> command.
This command is used to read information about a replication slot when
using a physical replication connection.

In this first version it returns the slot type, restart_lsn, flush_lsn and
the timeline of the restart_lsn and flush_lsn, which are obtained by following the
current timeline history.
---
 doc/src/sgml/protocol.sgml             |  62 +++++++++++++++
 src/backend/replication/repl_gram.y    |  18 ++++-
 src/backend/replication/repl_scanner.l |   1 +
 src/backend/replication/walsender.c    | 106 +++++++++++++++++++++++++
 src/include/nodes/nodes.h              |   1 +
 src/include/nodes/replnodes.h          |  10 +++
 6 files changed, 197 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index a232546b1d..6207171426 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2052,6 +2052,68 @@ The commands accepted in replication mode are:
     </listitem>
   </varlistentry>
 
+  <varlistentry>
+    <term><literal>READ_REPLICATION_SLOT</literal> <replaceable class="parameter">slot_name</replaceable>
+      <indexterm><primary>READ_REPLICATION_SLOT</primary></indexterm>
+    </term>
+    <listitem>
+      <para>
+      Read information about the named replication slot.  This is useful to determine which WAL location we should be asking the server to start streaming at.
+      </para>
+      <para>
+      In response to this command, the server will return a one-row result set, containing the following fields:
+        <variablelist>
+          <varlistentry>
+            <term><literal>type</literal> (<type>text</type>)</term>
+            <listitem>
+              <para>
+               The replication slot's type, either <literal>physical</literal> or <literal>logical</literal>
+              </para>
+            </listitem>
+          </varlistentry>
+
+          <varlistentry>
+            <term><literal>restart_lsn</literal> (<type>text</type>)</term>
+            <listitem>
+              <para>
+               The replication slot's restart_lsn.
+              </para>
+            </listitem>
+          </varlistentry>
+
+          <varlistentry>
+            <term><literal>confirmed_flush_lsn</literal> (<type>text</type>)</term>
+            <listitem>
+              <para>
+               The replication slot's confirmed_flush LSN.
+              </para>
+            </listitem>
+          </varlistentry>
+
+          <varlistentry>
+            <term><literal>restart_lsn_timeline</literal> (<type>int4</type>)</term>
+            <listitem>
+              <para>
+               The timeline ID for the restart_lsn position, when following the current timeline
+               history
+              </para>
+            </listitem>
+          </varlistentry>
+
+          <varlistentry>
+            <term><literal>confirmed_flush_lsn_timeline</literal> (<type>int4</type>)</term>
+            <listitem>
+              <para>
+               The timeline ID for the confirmed_flush_lsn position, when following the current timeline
+               history
+              </para>
+            </listitem>
+          </varlistentry>
+        </variablelist>
+      </para>
+    </listitem>
+  </varlistentry>
+
   <varlistentry>
     <term><literal>START_REPLICATION</literal> [ <literal>SLOT</literal> <replaceable class="parameter">slot_name</replaceable> ] [ <literal>PHYSICAL</literal> ] <replaceable class="parameter">XXX/XXX</replaceable> [ <literal>TIMELINE</literal> <replaceable class="parameter">tli</replaceable> ]
      <indexterm><primary>START_REPLICATION</primary></indexterm>
diff --git a/src/backend/replication/repl_gram.y b/src/backend/replication/repl_gram.y
index e1e8ec29cc..7298f44008 100644
--- a/src/backend/replication/repl_gram.y
+++ b/src/backend/replication/repl_gram.y
@@ -64,6 +64,7 @@ static SQLCmd *make_sqlcmd(void);
 /* Keyword tokens. */
 %token K_BASE_BACKUP
 %token K_IDENTIFY_SYSTEM
+%token K_READ_REPLICATION_SLOT
 %token K_SHOW
 %token K_START_REPLICATION
 %token K_CREATE_REPLICATION_SLOT
@@ -94,7 +95,7 @@ static SQLCmd *make_sqlcmd(void);
 %type <node>	command
 %type <node>	base_backup start_replication start_logical_replication
 				create_replication_slot drop_replication_slot identify_system
-				timeline_history show sql_cmd
+				identify_replication_slot timeline_history show sql_cmd
 %type <list>	base_backup_opt_list
 %type <defelt>	base_backup_opt
 %type <uintval>	opt_timeline
@@ -120,6 +121,7 @@ opt_semicolon:	';'
 
 command:
 			identify_system
+			| identify_replication_slot
 			| base_backup
 			| start_replication
 			| start_logical_replication
@@ -140,6 +142,18 @@ identify_system:
 				}
 			;
 
+/*
+ * READ_REPLICATION_SLOT %s
+ */
+identify_replication_slot:
+			K_READ_REPLICATION_SLOT var_name
+				{
+					ReadReplicationSlotCmd *n = makeNode(ReadReplicationSlotCmd);
+					n->slotname = $2;
+					$$ = (Node *) n;
+				}
+			;
+
 /*
  * SHOW setting
  */
@@ -361,6 +375,8 @@ timeline_history:
 				}
 			;
 
+
+
 opt_physical:
 			K_PHYSICAL
 			| /* EMPTY */
diff --git a/src/backend/replication/repl_scanner.l b/src/backend/replication/repl_scanner.l
index c038a636c3..1b599c255e 100644
--- a/src/backend/replication/repl_scanner.l
+++ b/src/backend/replication/repl_scanner.l
@@ -85,6 +85,7 @@ identifier		{ident_start}{ident_cont}*
 BASE_BACKUP			{ return K_BASE_BACKUP; }
 FAST			{ return K_FAST; }
 IDENTIFY_SYSTEM		{ return K_IDENTIFY_SYSTEM; }
+READ_REPLICATION_SLOT	{ return K_READ_REPLICATION_SLOT; }
 SHOW		{ return K_SHOW; }
 LABEL			{ return K_LABEL; }
 NOWAIT			{ return K_NOWAIT; }
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 3ca2a11389..9a13d1c186 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -232,6 +232,7 @@ static void XLogSendLogical(void);
 static void WalSndDone(WalSndSendDataCallback send_data);
 static XLogRecPtr GetStandbyFlushRecPtr(void);
 static void IdentifySystem(void);
+static void ReadReplicationSlot(ReadReplicationSlotCmd *cmd);
 static void CreateReplicationSlot(CreateReplicationSlotCmd *cmd);
 static void DropReplicationSlot(DropReplicationSlotCmd *cmd);
 static void StartReplication(StartReplicationCmd *cmd);
@@ -457,6 +458,104 @@ IdentifySystem(void)
 	end_tup_output(tstate);
 }
 
+/* Handle READ_REPLICATION_SLOT command */
+static void
+ReadReplicationSlot(ReadReplicationSlotCmd *cmd)
+{
+	ReplicationSlot *slot;
+	ReplicationSlot slot_contents;
+	DestReceiver *dest;
+	TupOutputState *tstate;
+	TupleDesc tupdesc;
+	Datum values[5];
+	bool  nulls[5];
+	char  xloc[MAXFNAMELEN];
+	int i = 0;
+	List *timeline_history = NIL;
+	TimeLineID slots_position_timeline;
+
+	LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
+	slot = SearchNamedReplicationSlot(cmd->slotname, false);
+	if (slot == NULL || !slot->in_use)
+	{
+		LWLockRelease(ReplicationSlotControlLock);
+
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("replication slot \"%s\" does not exist",
+						cmd->slotname)));
+	}
+	/* Copy slot contents while holding spinlock */
+	SpinLockAcquire(&slot->mutex);
+	slot_contents = *slot;
+	SpinLockRelease(&slot->mutex);
+	LWLockRelease(ReplicationSlotControlLock);
+
+	tupdesc = CreateTemplateTupleDesc(5);
+	TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 1, "type",
+							  TEXTOID, -1, 0);
+	TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 2, "restart_lsn",
+							  TEXTOID, -1, 0);
+	TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 3, "confirmed_flush_lsn",
+							  TEXTOID, -1, 0);
+	TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 4, "restart_lsn_timeline",
+							  INT4OID, -1, 0);
+	TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 5, "confirmed_flush_lsn_timeline",
+							  INT4OID, -1, 0);
+
+	if (slot_contents.data.database == InvalidOid)
+		values[i] = CStringGetTextDatum("physical");
+	else
+		values[i] = CStringGetTextDatum("logical");
+	nulls[i] = false;
+	i++;
+
+	snprintf(xloc, sizeof(xloc), "%X/%X", LSN_FORMAT_ARGS(slot_contents.data.restart_lsn));
+	values[i] = CStringGetTextDatum(xloc);
+	nulls[i] = false;
+	i++;
+
+	snprintf(xloc, sizeof(xloc), "%X/%X", LSN_FORMAT_ARGS(slot_contents.data.confirmed_flush));
+	values[i] = CStringGetTextDatum(xloc);
+	nulls[i] = false;
+	i++;
+
+	/* Now get the timeline this wal was produced on, to get to the current
+	 * timeline
+	 * XXX: should we allow the caller to specify which target timeline it wants
+	 * ?
+	 */
+	if (slot_contents.data.restart_lsn != InvalidXLogRecPtr)
+	{
+		timeline_history = readTimeLineHistory(ThisTimeLineID);
+		slots_position_timeline = tliOfPointInHistory(slot_contents.data.restart_lsn, timeline_history);
+		values[i] = Int32GetDatum(slots_position_timeline);
+		nulls[i] = false;
+	} else {
+		values[i] = 0;
+		nulls[i] = true;
+	}
+	i++;
+
+	if (slot_contents.data.confirmed_flush != InvalidXLogRecPtr)
+	{
+		if (!timeline_history)
+			timeline_history = readTimeLineHistory(ThisTimeLineID);
+		slots_position_timeline = tliOfPointInHistory(slot_contents.data.confirmed_flush, timeline_history);
+		values[i] = Int32GetDatum(slots_position_timeline);
+		nulls[i] = false;
+	} else {
+		values[i] = 0;
+		nulls[i] = true;
+	}
+	i++;
+
+	dest = CreateDestReceiver(DestRemoteSimple);
+	tstate = begin_tup_output_tupdesc(dest, tupdesc, &TTSOpsVirtual);
+	do_tup_output(tstate, values, nulls);
+	end_tup_output(tstate);
+}
+
 
 /*
  * Handle TIMELINE_HISTORY command.
@@ -1618,6 +1717,13 @@ exec_replication_command(const char *cmd_string)
 			EndReplicationCommand(cmdtag);
 			break;
 
+		case T_ReadReplicationSlotCmd:
+			cmdtag = "READ_REPLICATION_SLOT";
+			set_ps_display(cmdtag);
+			ReadReplicationSlot((ReadReplicationSlotCmd *) cmd_node);
+			EndReplicationCommand(cmdtag);
+			break;
+
 		case T_BaseBackupCmd:
 			cmdtag = "BASE_BACKUP";
 			set_ps_display(cmdtag);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 6a4d82f0a8..5f78bdd573 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -495,6 +495,7 @@ typedef enum NodeTag
 	 * TAGS FOR REPLICATION GRAMMAR PARSE NODES (replnodes.h)
 	 */
 	T_IdentifySystemCmd,
+	T_ReadReplicationSlotCmd,
 	T_BaseBackupCmd,
 	T_CreateReplicationSlotCmd,
 	T_DropReplicationSlotCmd,
diff --git a/src/include/nodes/replnodes.h b/src/include/nodes/replnodes.h
index faa3a251f2..ec85b7d993 100644
--- a/src/include/nodes/replnodes.h
+++ b/src/include/nodes/replnodes.h
@@ -33,6 +33,16 @@ typedef struct IdentifySystemCmd
 	NodeTag		type;
 } IdentifySystemCmd;
 
+/* ----------------------
+ *		READ_REPLICATION_SLOT command
+ * ----------------------
+ */
+typedef struct ReadReplicationSlotCmd
+{
+	NodeTag		type;
+	char		*slotname;
+} ReadReplicationSlotCmd;
+
 
 /* ----------------------
  *		BASE_BACKUP command
-- 
2.32.0

v2-0002-Use-READ_REPLICATION_SLOT-command-in-pg_receivewa.patchtext/x-patch; charset=UTF-8; name=v2-0002-Use-READ_REPLICATION_SLOT-command-in-pg_receivewa.patchDownload
From 935cdeef089c985867c9d18df96caca1c751f02d Mon Sep 17 00:00:00 2001
From: Ronan Dunklau <ronan.dunklau@aiven.io>
Date: Wed, 28 Jul 2021 16:35:39 +0200
Subject: [PATCH v2 2/3] Use READ_REPLICATION_SLOT command in pg_receivewal.

Prior to this patch, when running pg_receivewal, the start LSN is determined by looking at
the WAL files currently stored on disk, then using the current flush lsn
from the server.

If for some reason the WAL files from pg_receivewal were moved, we want
to restart where we left at, which is the replication slot's restart_lsn
instead of skipping right to the current flush location.

To keep compatibility with prior server versions, we only attempt it if
the version is < 15.
---
 src/bin/pg_basebackup/pg_receivewal.c        | 26 ++++++++-
 src/bin/pg_basebackup/streamutil.c           | 61 ++++++++++++++++++++
 src/bin/pg_basebackup/streamutil.h           |  2 +
 src/bin/pg_basebackup/t/020_pg_receivewal.pl | 46 ++++++++++++++-
 4 files changed, 131 insertions(+), 4 deletions(-)

diff --git a/src/bin/pg_basebackup/pg_receivewal.c b/src/bin/pg_basebackup/pg_receivewal.c
index 4474273daf..453b3644ca 100644
--- a/src/bin/pg_basebackup/pg_receivewal.c
+++ b/src/bin/pg_basebackup/pg_receivewal.c
@@ -26,6 +26,7 @@
 #include "fe_utils/option_utils.h"
 #include "getopt_long.h"
 #include "libpq-fe.h"
+#include "pqexpbuffer.h"
 #include "receivelog.h"
 #include "streamutil.h"
 
@@ -191,6 +192,7 @@ close_destination_dir(DIR *dest_dir, char *dest_folder)
 }
 
 
+
 /*
  * Determine starting location for streaming, based on any existing xlog
  * segments in the directory. We start at the end of the last one that is
@@ -408,8 +410,28 @@ StreamLog(void)
 	stream.startpos = FindStreamingStart(&stream.timeline);
 	if (stream.startpos == InvalidXLogRecPtr)
 	{
-		stream.startpos = serverpos;
-		stream.timeline = servertli;
+		/* Try to get it from the slot if any, and the server supports it */
+		if (replication_slot)
+		{
+			if (PQserverVersion(conn) >= 150000)
+			{
+				if (!GetSlotInformation(conn, replication_slot, &stream.startpos, &stream.timeline))
+					pg_log_warning("Could not fetch the replication_slot \"%s\" information "
+								   "resuming from the current server position instead", replication_slot);
+			}
+			else
+				pg_log_warning("Server does not suport fetching the slot's position, "
+							   "resuming from the current server position instead");
+		}
+
+		/*
+		 * If it is still unknown, use the current flush value from the server
+		 */
+		if (stream.startpos == InvalidXLogRecPtr)
+		{
+			stream.startpos = serverpos;
+			stream.timeline = servertli;
+		}
 	}
 
 	/*
diff --git a/src/bin/pg_basebackup/streamutil.c b/src/bin/pg_basebackup/streamutil.c
index f5b3b476e5..c902824617 100644
--- a/src/bin/pg_basebackup/streamutil.c
+++ b/src/bin/pg_basebackup/streamutil.c
@@ -397,6 +397,8 @@ RetrieveDataDirCreatePerm(PGconn *conn)
 	return true;
 }
 
+
+
 /*
  * Run IDENTIFY_SYSTEM through a given connection and give back to caller
  * some result information if requested:
@@ -479,6 +481,65 @@ RunIdentifySystem(PGconn *conn, char **sysid, TimeLineID *starttli,
 	return true;
 }
 
+
+/*
+ * Check a replication slot exists through a given connection, and give back to
+ * caller some result information if requested:
+ * 	- restart_lsn
+ * 	- timeline
+ */
+bool
+GetSlotInformation(PGconn *conn, const char *slot_name, XLogRecPtr *restart_lsn, uint32 *restart_lsn_timeline)
+{
+	PGresult   *res;
+	PQExpBuffer query;
+	uint32		hi,
+				lo;
+
+	if (slot_name == NULL)
+		return InvalidXLogRecPtr;
+
+	query = createPQExpBuffer();
+	appendPQExpBuffer(query, "READ_REPLICATION_SLOT %s",
+					  slot_name);
+	res = PQexec(conn, query->data);
+	destroyPQExpBuffer(query);
+	if (PQresultStatus(res) != PGRES_TUPLES_OK)
+	{
+		pg_log_error("could not read replication slot : %s",
+					 PQerrorMessage(conn));
+		PQclear(res);
+		return InvalidXLogRecPtr;
+	}
+	if (PQntuples(res) == 0)
+	{
+		pg_log_error("replication slot %s does not exist", slot_name);
+		PQclear(0);
+		return false;
+	}
+
+	if (PQntuples(res) != 1 || PQnfields(res) < 4)
+	{
+		pg_log_error("could not fetch replication slot LSN: got %d rows and %d fields, expected %d rows and %d or more fields",
+					 PQntuples(res), PQnfields(res), 1, 2);
+		PQclear(res);
+		return false;
+	}
+	if (sscanf(PQgetvalue(res, 0, 1), "%X/%X", &hi, &lo) != 2)
+	{
+		pg_log_error("could not parse slot's restart_lsn \"%s\"",
+					 PQgetvalue(res, 0, 0));
+		PQclear(res);
+		return false;
+	}
+	if (restart_lsn)
+		*restart_lsn = ((uint64) hi) << 32 | lo;
+	if (restart_lsn_timeline)
+		*restart_lsn_timeline = atoi(PQgetvalue(res, 0, 3));
+	PQclear(res);
+	return true;
+}
+
 /*
  * Create a replication slot for the given connection. This function
  * returns true in case of success.
diff --git a/src/bin/pg_basebackup/streamutil.h b/src/bin/pg_basebackup/streamutil.h
index 504803b976..e34f986136 100644
--- a/src/bin/pg_basebackup/streamutil.h
+++ b/src/bin/pg_basebackup/streamutil.h
@@ -40,6 +40,8 @@ extern bool RunIdentifySystem(PGconn *conn, char **sysid,
 							  TimeLineID *starttli,
 							  XLogRecPtr *startpos,
 							  char **db_name);
+extern bool GetSlotInformation(PGconn *conn, const char *slot_name, XLogRecPtr *restart_lsn, uint32 *restart_lsn_timeline);
+
 extern bool RetrieveWalSegSize(PGconn *conn);
 extern TimestampTz feGetCurrentTimestamp(void);
 extern void feTimestampDifference(TimestampTz start_time, TimestampTz stop_time,
diff --git a/src/bin/pg_basebackup/t/020_pg_receivewal.pl b/src/bin/pg_basebackup/t/020_pg_receivewal.pl
index 0b33d73900..4f0f6956b5 100644
--- a/src/bin/pg_basebackup/t/020_pg_receivewal.pl
+++ b/src/bin/pg_basebackup/t/020_pg_receivewal.pl
@@ -5,7 +5,7 @@ use strict;
 use warnings;
 use TestLib;
 use PostgresNode;
-use Test::More tests => 27;
+use Test::More tests => 30;
 
 program_help_ok('pg_receivewal');
 program_version_ok('pg_receivewal');
@@ -49,7 +49,7 @@ is($primary->slot($slot_name)->{'slot_type'},
 
 # Generate some WAL.  Use --synchronous at the same time to add more
 # code coverage.  Switch to the next segment first so that subsequent
-# restarts of pg_receivewal will see this segment as full..
+# restarts of pg_receivewal will see this segment as full../
 $primary->psql('postgres', 'CREATE TABLE test_table(x integer);');
 $primary->psql('postgres', 'SELECT pg_switch_wal();');
 my $nextlsn =
@@ -146,6 +146,48 @@ $primary->command_ok(
 $partial_wals[0] =~ s/(\.gz)?.partial//;
 ok(-e $partial_wals[0], "check that previously partial WAL is now complete");
 
+# Verify that if we use a replication slot, we resume where we left even in the
+# absence of WALs
+
+# Setup the slot, and connect to it a first time
+$primary->run_log(
+	[ 'pg_receivewal', '--slot', $slot_name, '--create-slot' ],
+	'creating a replication slot');
+$primary->psql('postgres',
+	'INSERT INTO test_table VALUES (generate_series(1,100));');
+$primary->psql('postgres', 'SELECT pg_switch_wal();');
+$nextlsn =
+  $primary->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();');
+chomp($nextlsn);
+$primary->run_log(
+	[ 'pg_receivewal', '-D', $stream_dir, '--slot', $slot_name, '--verbose', '--endpos', $nextlsn ],
+	"streaming some WAL");
+
+# Get the slot restart_lsn and make sure we retrieve the associated WAL file
+# even after deletion of the previous stored WAL files.
+$slot = $primary->slot($slot_name);
+my $restart_lsn = $slot->{'restart_lsn'};
+# Add one so that the restart_lsn doesn't correspond to the previous file.
+$restart_lsn =~ s/.$/1/;
+my $walfile_to_be_archived = $primary->safe_psql('postgres',
+	"SELECT pg_walfile_name('$restart_lsn');");
+isnt($restart_lsn, '', 'restart LSN of new slot is not null');
+
+unlink glob "'${stream_dir}/*'";
+
+$primary->psql('postgres',
+	'INSERT INTO test_table VALUES (generate_series(1,100));');
+$primary->psql('postgres', 'SELECT pg_switch_wal();');
+$nextlsn =
+  $primary->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();');
+chomp($nextlsn);
+$primary->command_ok(
+	[ 'pg_receivewal', '-D', $stream_dir, '--slot', $slot_name, '--verbose', '--endpos', $nextlsn ],
+	"Stream some wal starting from the slot's restart_lsn");
+$slot = $primary->slot($slot_name);
+my @walfiles = glob "${stream_dir}/*";
+ok(-e "$stream_dir/$walfile_to_be_archived", "WAL from the slot's restart_lsn has been archived");
+
 # Permissions on WAL files should be default
 SKIP:
 {
-- 
2.32.0

v2-0003-Check-slot-existence-in-pg_basebackup.patchtext/x-patch; charset=UTF-8; name=v2-0003-Check-slot-existence-in-pg_basebackup.patchDownload
From 5eec8896036da42175f1f58893a87faa8f92d560 Mon Sep 17 00:00:00 2001
From: Ronan Dunklau <ronan.dunklau@aiven.io>
Date: Thu, 26 Aug 2021 14:05:26 +0200
Subject: [PATCH v2 3/3] Check slot existence in pg_basebackup.

Use the newly introduced READ_REPLICATION_SLOT command to check for a
slot existence in pg_basebackup. That way, we can fail early.
---
 src/bin/pg_basebackup/pg_basebackup.c        |  9 +++++++++
 src/bin/pg_basebackup/t/010_pg_basebackup.pl | 10 ++++++----
 2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 7296eb97d0..6794994c69 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -1867,6 +1867,15 @@ BaseBackup(void)
 	if (!RunIdentifySystem(conn, &sysidentifier, &latesttli, NULL, NULL))
 		exit(1);
 
+	/*
+	 * Check the replication slot exists if applicable
+	 */
+	if (replication_slot && !(temp_replication_slot || create_slot) && PQserverVersion(conn) >= 15000)
+	{
+		if (!GetSlotInformation(conn, replication_slot, NULL, NULL))
+			exit(1);
+	}
+
 	/*
 	 * Start the actual backup
 	 */
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index a2cb2a7679..787edd2f4f 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -10,7 +10,7 @@ use File::Path qw(rmtree);
 use Fcntl qw(:seek);
 use PostgresNode;
 use TestLib;
-use Test::More tests => 110;
+use Test::More tests => 111;
 
 program_help_ok('pg_basebackup');
 program_version_ok('pg_basebackup');
@@ -465,14 +465,16 @@ $node->command_ok(
 	'pg_basebackup -X stream runs with --no-slot');
 rmtree("$tempdir/backupnoslot");
 
-$node->command_fails(
+command_fails_like(
 	[
 		'pg_basebackup',             '-D',
 		"$tempdir/backupxs_sl_fail", '-X',
 		'stream',                    '-S',
-		'slot0'
+		'slot0',                     '-p',
+		"$port"
 	],
-	'pg_basebackup fails with nonexistent replication slot');
+	qr/pg_basebackup: error: could not read replication slot/,
+	'pg_basebackup fails early with nonexistent replication slot');
 
 $node->command_fails(
 	[ 'pg_basebackup', '-D', "$tempdir/backupxs_slot", '-C' ],
-- 
2.32.0

#7Michael Paquier
michael@paquier.xyz
In reply to: Ronan Dunklau (#6)
Re: pg_receivewal starting position

On Thu, Aug 26, 2021 at 02:14:27PM +0200, Ronan Dunklau wrote:

Following the discussion at [1], I refactored the implementation into
streamutil and added a third patch making use of it in pg_basebackup itself in
order to fail early if the replication slot doesn't exist, so please find
attached v2 for that.

Thanks for the split. That helps a lot.

+
+
 /*
  * Run IDENTIFY_SYSTEM through a given connection and give back to caller

The patch series has some noise diffs here and there, you may want to
clean up that for clarity.

+   if (slot == NULL || !slot->in_use)
+   {
+       LWLockRelease(ReplicationSlotControlLock);
+
+       ereport(ERROR,
+               (errcode(ERRCODE_UNDEFINED_OBJECT),
LWLocks are released on ERROR, so no need for LWLockRelease() here.
+    <listitem>
+      <para>
+      Read information about the named replication slot.  This is
useful to determine which WAL location we should be asking the server
to start streaming at.

A nit. You may want to be more careful with the indentation of the
documentation. Things are usually limited in width for readability.
More <literal> markups would be nice for the field names used in the
descriptions.

+   if (slot == NULL || !slot->in_use)                                                                                                                                                                                       [...]
+       ereport(ERROR,
+               (errcode(ERRCODE_UNDEFINED_OBJECT),
+                errmsg("replication slot \"%s\" does not exist",
+                       cmd->slotname)));
[...]
+       if (PQntuples(res) == 0)
+       {
+               pg_log_error("replication slot %s does not exist", slot_name);
+               PQclear(0);
+               return false;
So, the backend and ReadReplicationSlot() report an ERROR if a slot
does not exist but pg_basebackup's GetSlotInformation() does the same
if there are no tuples returned.  That's inconsistent.  Wouldn't it be
more instinctive to return a NULL tuple instead if the slot does not
exist to be able to check after real ERRORs in frontends using this
interface?  A slot in use exists, so the error is a bit confusing here
anyway, no?
+    * XXX: should we allow the caller to specify which target timeline it wants
+    * ?
+    */
What are you thinking about here?
-# restarts of pg_receivewal will see this segment as full..
+# restarts of pg_receivewal will see this segment as full../
Typo.
+   TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 4, "restart_lsn_timeline",
+                             INT4OID, -1, 0);
+   TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 5, "confirmed_flush_lsn_timeline",
+                             INT4OID, -1, 0);
I would call these restart_tli and confirmed_flush_tli., without the
"lsn" part.

The patch for READ_REPLICATION_SLOT could have some tests using a
connection that has replication=1 in some TAP tests. We do that in
001_stream_rep.pl with SHOW, as one example.

-               'slot0'
+               'slot0',                     '-p',
+               "$port"
Something we are missing here?
--
Michael
#8Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Ronan Dunklau (#6)
Re: pg_receivewal starting position

On Thu, Aug 26, 2021 at 5:45 PM Ronan Dunklau <ronan.dunklau@aiven.io> wrote:

order to fail early if the replication slot doesn't exist, so please find
attached v2 for that.

Thanks for the patches. Here are some comments:

1) While the intent of these patches looks good, I have following
concern with new replication command READ_REPLICATION_SLOT: what if
the pg_receivewal exits (because user issued a SIGINT or for some
reason) after flushing the received WAL to disk, before it sends
sendFeedback to postgres server's walsender so that it doesn't get a
chance to update the restart_lsn in the replication slot via
PhysicalConfirmReceivedLocation. If the pg_receivewal is started
again, isn't it going to get the previous restart_lsn and receive the
last chunk of flushed WAL again?

2) What is the significance of READ_REPLICATION_SLOT for logical
replication slots? I read above that somebody suggested to restrict
the walsender to handle READ_REPLICATION_SLOT for physical replication
slots so that the callers will see a command failure. But I tend to
think that it is clean to have this command common for both physical
and logical replication slots and the callers can have an Assert(type
== 'physical').

3) Isn't it useful to send active, active_pid info of the replication
slot via READ_REPLICATION_SLOT? pg_receivewal can use Assert(active ==
true && active_pid == getpid()) as an assertion to ensure that it is
the sole owner of the replication slot? Also, is it good send
wal_status info

4) I think below messages should start with lower case letter and also
there are some typos:
+ pg_log_warning("Could not fetch the replication_slot \"%s\" information "
+ pg_log_warning("Server does not suport fetching the slot's position, "
something like:
+ pg_log_warning("could not fetch replication slot \"%s\" information, "
+    "resuming from current server position instead", replication_slot);
+ pg_log_warning("server does not support fetching replication slot
information, "
+    "resuming from current server position instead");

5) How about emitting the above messages in case of "verbose"?

6) How about an assertion like below?
+ if (stream.startpos == InvalidXLogRecPtr)
+ {
+ stream.startpos = serverpos;
+ stream.timeline = servertli;
+ }
+
+Assert(stream.startpos != InvalidXLogRecPtr)>>

7) How about we let pg_receivewal use READ_REPLICATION_SLOT as an option?

8) Just an idea, how about we store pg_receivewal's lastFlushPosition
in a file before pg_receivewal exits and compare it with the
restart_lsn that it received from the replication slot, if
lastFlushPosition == received_restart_lsn well and good, if not, then
something would have happened and we always start at the
lastFlushPosition ?

Regards,
Bharath Rupireddy.

#9Ronan Dunklau
ronan.dunklau@aiven.io
In reply to: Michael Paquier (#7)
3 attachment(s)
Re: pg_receivewal starting position

Le vendredi 27 août 2021, 05:44:32 CEST Michael Paquier a écrit :

On Thu, Aug 26, 2021 at 02:14:27PM +0200, Ronan Dunklau wrote:

Following the discussion at [1], I refactored the implementation into
streamutil and added a third patch making use of it in pg_basebackup
itself in order to fail early if the replication slot doesn't exist, so
please find attached v2 for that.

Thanks for the split. That helps a lot.

Thank you very much for the review, please find attached an updated patchset.
I've also taken into account some remarks made by Bharath Rupireddy.

+
+
/*
* Run IDENTIFY_SYSTEM through a given connection and give back to caller

The patch series has some noise diffs here and there, you may want to
clean up that for clarity.

Ok, sorry about that.

+   if (slot == NULL || !slot->in_use)
+   {
+       LWLockRelease(ReplicationSlotControlLock);
+
+       ereport(ERROR,
+               (errcode(ERRCODE_UNDEFINED_OBJECT),
LWLocks are released on ERROR, so no need for LWLockRelease() here.

Following your suggestion of not erroring out on an unexisting slot this point
is no longer be relevant, but thanks for pointing this out anyway.

+    <listitem>
+      <para>
+      Read information about the named replication slot.  This is
useful to determine which WAL location we should be asking the server
to start streaming at.

A nit. You may want to be more careful with the indentation of the
documentation. Things are usually limited in width for readability.
More <literal> markups would be nice for the field names used in the
descriptions.

Ok.

+ if (slot == NULL || !slot->in_use)

[...] +
ereport(ERROR,
+               (errcode(ERRCODE_UNDEFINED_OBJECT),
+                errmsg("replication slot \"%s\" does not exist",
+                       cmd->slotname)));
[...]
+       if (PQntuples(res) == 0)
+       {
+               pg_log_error("replication slot %s does not exist",
slot_name); +               PQclear(0);
+               return false;
So, the backend and ReadReplicationSlot() report an ERROR if a slot
does not exist but pg_basebackup's GetSlotInformation() does the same
if there are no tuples returned.  That's inconsistent.  Wouldn't it be
more instinctive to return a NULL tuple instead if the slot does not
exist to be able to check after real ERRORs in frontends using this
interface?  

The attached patch returns no tuple at all when the replication slot doesn't
exist. I'm not sure if that's what you meant by returning a NULL tuple ?

A slot in use exists, so the error is a bit confusing here
anyway, no?

From my understanding, a slot *not* in use doesn't exist anymore, as such I
don't really understand this point. Could you clarify ?

+    * XXX: should we allow the caller to specify which target timeline it
wants +    * ?
+    */
What are you thinking about here?

I was thinking that maybe instead of walking back the timeline history from
where we currently are on the server, we could allow an additional argument
for the client to specify which timeline it wants. But I guess a replication
slot can not be present for a past, divergent timeline ? I have removed that
suggestion.

-# restarts of pg_receivewal will see this segment as full..
+# restarts of pg_receivewal will see this segment as full../
Typo.

Ok.

+   TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 4,
"restart_lsn_timeline", +                             INT4OID, -1, 0);
+   TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 5,
"confirmed_flush_lsn_timeline", +                             INT4OID, -1,
0);
I would call these restart_tli and confirmed_flush_tli., without the
"lsn" part.

Ok.

The patch for READ_REPLICATION_SLOT could have some tests using a
connection that has replication=1 in some TAP tests. We do that in
001_stream_rep.pl with SHOW, as one example.

Ok. I added the physical part to 001_stream_rep.pl, using the protocol
interface directly for creating / dropping the slot, and some tests for
logical replication slots to 006_logical_decoding.pl.

-               'slot0'
+               'slot0',                     '-p',
+               "$port"
Something we are missing here?

The thing we're missing here is a wrapper for command_fails_like. I've added
this to PostgresNode.pm.

Best regards,

--
Ronan Dunklau

Attachments:

v3-0001-Add-READ_REPLICATION_SLOT-command.patchtext/x-patch; charset=UTF-8; name=v3-0001-Add-READ_REPLICATION_SLOT-command.patchDownload
From 9fa01789f663975b963c26875b70857055cadb9b Mon Sep 17 00:00:00 2001
From: Ronan Dunklau <ronan.dunklau@aiven.io>
Date: Wed, 28 Jul 2021 16:34:54 +0200
Subject: [PATCH v3 1/3] Add READ_REPLICATION_SLOT command.

This commit introduces a new READ_REPLICATION_SLOT <slot_name> command.
This command is used to read information about a replication slot when
using a physical replication connection.

In this first version it returns the slot type, restart_lsn, flush_lsn and
the timeline of the restart_lsn and flush_lsn, which are obtained by following the
current timeline history.
---
 doc/src/sgml/protocol.sgml                  |  66 ++++++++++
 src/backend/replication/repl_gram.y         |  16 ++-
 src/backend/replication/repl_scanner.l      |   1 +
 src/backend/replication/walsender.c         | 129 ++++++++++++++++++++
 src/include/nodes/nodes.h                   |   1 +
 src/include/nodes/replnodes.h               |  10 ++
 src/test/recovery/t/001_stream_rep.pl       |  46 ++++++-
 src/test/recovery/t/006_logical_decoding.pl |  13 +-
 8 files changed, 279 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index a232546b1d..0e6fb01054 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2052,6 +2052,72 @@ The commands accepted in replication mode are:
     </listitem>
   </varlistentry>
 
+  <varlistentry>
+    <term><literal>READ_REPLICATION_SLOT</literal> <replaceable class="parameter">slot_name</replaceable>
+      <indexterm><primary>READ_REPLICATION_SLOT</primary></indexterm>
+    </term>
+    <listitem>
+      <para>
+      Read information about the named replication slot.
+      This is useful to determine which WAL location we should be asking the
+      server to start streaming at.
+      </para>
+      <para>
+      In response to this command, the server will return a one-row result set,
+      containing the following fields:
+        <variablelist>
+          <varlistentry>
+            <term><literal>type</literal> (<type>text</type>)</term>
+            <listitem>
+              <para>
+               The replication slot's type, either <literal>physical</literal> or
+               <literal>logical</literal>
+              </para>
+            </listitem>
+          </varlistentry>
+
+          <varlistentry>
+            <term><literal>restart_lsn</literal> (<type>text</type>)</term>
+            <listitem>
+              <para>
+               The replication slot's <literal>restart_lsn</literal>.
+              </para>
+            </listitem>
+          </varlistentry>
+
+          <varlistentry>
+            <term><literal>confirmed_flush_lsn</literal> (<type>text</type>)</term>
+            <listitem>
+              <para>
+               The replication slot's <literal>confirmed_flush_lsn</literal>.
+              </para>
+            </listitem>
+          </varlistentry>
+
+          <varlistentry>
+            <term><literal>restart_tli</literal> (<type>int4</type>)</term>
+            <listitem>
+              <para>
+               The timeline ID for the <literal>restart_lsn</literal> position,
+               when following the current timeline history
+              </para>
+            </listitem>
+          </varlistentry>
+
+          <varlistentry>
+            <term><literal>confirmed_flush_tli</literal> (<type>int4</type>)</term>
+            <listitem>
+              <para>
+               The timeline ID for the <literal>confirmed_flush_lsn</literal>
+               position, when following the current timeline history
+              </para>
+            </listitem>
+          </varlistentry>
+        </variablelist>
+      </para>
+    </listitem>
+  </varlistentry>
+
   <varlistentry>
     <term><literal>START_REPLICATION</literal> [ <literal>SLOT</literal> <replaceable class="parameter">slot_name</replaceable> ] [ <literal>PHYSICAL</literal> ] <replaceable class="parameter">XXX/XXX</replaceable> [ <literal>TIMELINE</literal> <replaceable class="parameter">tli</replaceable> ]
      <indexterm><primary>START_REPLICATION</primary></indexterm>
diff --git a/src/backend/replication/repl_gram.y b/src/backend/replication/repl_gram.y
index e1e8ec29cc..5de003b7dc 100644
--- a/src/backend/replication/repl_gram.y
+++ b/src/backend/replication/repl_gram.y
@@ -64,6 +64,7 @@ static SQLCmd *make_sqlcmd(void);
 /* Keyword tokens. */
 %token K_BASE_BACKUP
 %token K_IDENTIFY_SYSTEM
+%token K_READ_REPLICATION_SLOT
 %token K_SHOW
 %token K_START_REPLICATION
 %token K_CREATE_REPLICATION_SLOT
@@ -94,7 +95,7 @@ static SQLCmd *make_sqlcmd(void);
 %type <node>	command
 %type <node>	base_backup start_replication start_logical_replication
 				create_replication_slot drop_replication_slot identify_system
-				timeline_history show sql_cmd
+				identify_replication_slot timeline_history show sql_cmd
 %type <list>	base_backup_opt_list
 %type <defelt>	base_backup_opt
 %type <uintval>	opt_timeline
@@ -120,6 +121,7 @@ opt_semicolon:	';'
 
 command:
 			identify_system
+			| identify_replication_slot
 			| base_backup
 			| start_replication
 			| start_logical_replication
@@ -140,6 +142,18 @@ identify_system:
 				}
 			;
 
+/*
+ * READ_REPLICATION_SLOT %s
+ */
+identify_replication_slot:
+			K_READ_REPLICATION_SLOT var_name
+				{
+					ReadReplicationSlotCmd *n = makeNode(ReadReplicationSlotCmd);
+					n->slotname = $2;
+					$$ = (Node *) n;
+				}
+			;
+
 /*
  * SHOW setting
  */
diff --git a/src/backend/replication/repl_scanner.l b/src/backend/replication/repl_scanner.l
index c038a636c3..1b599c255e 100644
--- a/src/backend/replication/repl_scanner.l
+++ b/src/backend/replication/repl_scanner.l
@@ -85,6 +85,7 @@ identifier		{ident_start}{ident_cont}*
 BASE_BACKUP			{ return K_BASE_BACKUP; }
 FAST			{ return K_FAST; }
 IDENTIFY_SYSTEM		{ return K_IDENTIFY_SYSTEM; }
+READ_REPLICATION_SLOT	{ return K_READ_REPLICATION_SLOT; }
 SHOW		{ return K_SHOW; }
 LABEL			{ return K_LABEL; }
 NOWAIT			{ return K_NOWAIT; }
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 3ca2a11389..de50f06e90 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -232,6 +232,7 @@ static void XLogSendLogical(void);
 static void WalSndDone(WalSndSendDataCallback send_data);
 static XLogRecPtr GetStandbyFlushRecPtr(void);
 static void IdentifySystem(void);
+static void ReadReplicationSlot(ReadReplicationSlotCmd *cmd);
 static void CreateReplicationSlot(CreateReplicationSlotCmd *cmd);
 static void DropReplicationSlot(DropReplicationSlotCmd *cmd);
 static void StartReplication(StartReplicationCmd *cmd);
@@ -457,6 +458,127 @@ IdentifySystem(void)
 	end_tup_output(tstate);
 }
 
+/* Handle READ_REPLICATION_SLOT command */
+static void
+ReadReplicationSlot(ReadReplicationSlotCmd * cmd)
+{
+	ReplicationSlot *slot;
+	ReplicationSlot slot_contents;
+	DestReceiver *dest;
+	TupOutputState *tstate;
+	TupleDesc	tupdesc;
+	Datum		values[5];
+	bool		nulls[5];
+	char		xloc[MAXFNAMELEN];
+	int			i = 0;
+	List	   *timeline_history = NIL;
+	TimeLineID	slots_position_timeline;
+	bool		has_value;
+
+	tupdesc = CreateTemplateTupleDesc(5);
+	TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 1, "type",
+							  TEXTOID, -1, 0);
+	TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 2, "restart_lsn",
+							  TEXTOID, -1, 0);
+	TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 3, "confirmed_flush_lsn",
+							  TEXTOID, -1, 0);
+	TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 4, "restart_tli",
+							  INT4OID, -1, 0);
+	TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 5, "confirmed_flush_tli",
+							  INT4OID, -1, 0);
+
+	LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
+	slot = SearchNamedReplicationSlot(cmd->slotname, false);
+	if (slot == NULL || !slot->in_use)
+	{
+		LWLockRelease(ReplicationSlotControlLock);
+		has_value = false;
+	}
+	else
+	{
+		/* Copy slot contents while holding spinlock */
+		SpinLockAcquire(&slot->mutex);
+		slot_contents = *slot;
+		SpinLockRelease(&slot->mutex);
+		LWLockRelease(ReplicationSlotControlLock);
+
+
+		if (slot_contents.data.database == InvalidOid)
+			values[i] = CStringGetTextDatum("physical");
+		else
+			values[i] = CStringGetTextDatum("logical");
+		nulls[i] = false;
+		i++;
+
+		if (slot_contents.data.restart_lsn != InvalidXLogRecPtr)
+		{
+			snprintf(xloc, sizeof(xloc), "%X/%X", LSN_FORMAT_ARGS(slot_contents.data.restart_lsn));
+			values[i] = CStringGetTextDatum(xloc);
+			nulls[i] = false;
+		}
+		else
+		{
+			values[i] = 0;
+			nulls[i] = true;
+
+		}
+		i++;
+
+		if (slot_contents.data.confirmed_flush != InvalidXLogRecPtr)
+		{
+			snprintf(xloc, sizeof(xloc), "%X/%X", LSN_FORMAT_ARGS(slot_contents.data.confirmed_flush));
+			values[i] = CStringGetTextDatum(xloc);
+			nulls[i] = false;
+		}
+		else
+		{
+			values[i] = 0;
+			nulls[i] = true;
+		}
+		i++;
+
+		/*
+		 * Now get the timeline this wal was produced on, to get to the
+		 * current timeline
+		 */
+		if (slot_contents.data.restart_lsn != InvalidXLogRecPtr)
+		{
+			timeline_history = readTimeLineHistory(ThisTimeLineID);
+			slots_position_timeline = tliOfPointInHistory(slot_contents.data.restart_lsn, timeline_history);
+			values[i] = Int32GetDatum(slots_position_timeline);
+			nulls[i] = false;
+		}
+		else
+		{
+			values[i] = 0;
+			nulls[i] = true;
+		}
+		i++;
+
+		if (slot_contents.data.confirmed_flush != InvalidXLogRecPtr)
+		{
+			if (!timeline_history)
+				timeline_history = readTimeLineHistory(ThisTimeLineID);
+			slots_position_timeline = tliOfPointInHistory(slot_contents.data.confirmed_flush, timeline_history);
+			values[i] = Int32GetDatum(slots_position_timeline);
+			nulls[i] = false;
+		}
+		else
+		{
+			values[i] = 0;
+			nulls[i] = true;
+		}
+		i++;
+		has_value = true;
+	}
+
+	dest = CreateDestReceiver(DestRemoteSimple);
+	tstate = begin_tup_output_tupdesc(dest, tupdesc, &TTSOpsVirtual);
+	if (has_value)
+		do_tup_output(tstate, values, nulls);
+	end_tup_output(tstate);
+}
+
 
 /*
  * Handle TIMELINE_HISTORY command.
@@ -1618,6 +1740,13 @@ exec_replication_command(const char *cmd_string)
 			EndReplicationCommand(cmdtag);
 			break;
 
+		case T_ReadReplicationSlotCmd:
+			cmdtag = "READ_REPLICATION_SLOT";
+			set_ps_display(cmdtag);
+			ReadReplicationSlot((ReadReplicationSlotCmd *) cmd_node);
+			EndReplicationCommand(cmdtag);
+			break;
+
 		case T_BaseBackupCmd:
 			cmdtag = "BASE_BACKUP";
 			set_ps_display(cmdtag);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 6a4d82f0a8..5f78bdd573 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -495,6 +495,7 @@ typedef enum NodeTag
 	 * TAGS FOR REPLICATION GRAMMAR PARSE NODES (replnodes.h)
 	 */
 	T_IdentifySystemCmd,
+	T_ReadReplicationSlotCmd,
 	T_BaseBackupCmd,
 	T_CreateReplicationSlotCmd,
 	T_DropReplicationSlotCmd,
diff --git a/src/include/nodes/replnodes.h b/src/include/nodes/replnodes.h
index faa3a251f2..ec85b7d993 100644
--- a/src/include/nodes/replnodes.h
+++ b/src/include/nodes/replnodes.h
@@ -33,6 +33,16 @@ typedef struct IdentifySystemCmd
 	NodeTag		type;
 } IdentifySystemCmd;
 
+/* ----------------------
+ *		READ_REPLICATION_SLOT command
+ * ----------------------
+ */
+typedef struct ReadReplicationSlotCmd
+{
+	NodeTag		type;
+	char		*slotname;
+} ReadReplicationSlotCmd;
+
 
 /* ----------------------
  *		BASE_BACKUP command
diff --git a/src/test/recovery/t/001_stream_rep.pl b/src/test/recovery/t/001_stream_rep.pl
index ac581c1c07..75a8c6e45e 100644
--- a/src/test/recovery/t/001_stream_rep.pl
+++ b/src/test/recovery/t/001_stream_rep.pl
@@ -6,7 +6,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 49;
+use Test::More tests => 55;
 
 # Initialize primary node
 my $node_primary = PostgresNode->new('primary');
@@ -252,6 +252,50 @@ ok( $ret == 0,
 	"SHOW with superuser-settable parameter, replication role and logical replication"
 );
 
+note "testing READ_REPLICATION_SLOT command";
+
+my $slotname = 'test_read_replication_slot_physical';
+
+($ret, $stdout, $stderr) = $node_primary->psql(
+  'postgres', 'READ_REPLICATION_SLOT non_existent_slot;',
+  extra_params => [ '-d', $connstr_rep ]);
+ok( $ret == 0,
+  "READ_REPLICATION_SLOT does not produce an error with non existent slot");
+ok( $stdout eq '',
+    "READ_REPLICATION_SLOT returns no tuple if a slot is non existent");
+
+($ret, $stdout, $stderr) = $node_primary->psql(
+  'postgres',
+  "CREATE_REPLICATION_SLOT $slotname PHYSICAL RESERVE_WAL;",
+  extra_params => [ '-d', $connstr_rep ],
+  0,
+  'physical slot created on primary');
+
+($ret, $stdout, $stderr) = $node_primary->psql(
+  'postgres',
+  "READ_REPLICATION_SLOT $slotname;",
+  extra_params => [ '-d', $connstr_rep ]);
+ok( $ret == 0,
+  "READ_REPLICATION_SLOT does not produce an error with existing slot");
+ok( $stdout =~ 'physical\|[^|]*\|\|1\|',
+    "READ_REPLICATION_SLOT returns tuple corresponding to the slot");
+
+$node_primary->psql(
+  'postgres',
+  "DROP_REPLICATION_SLOT $slotname;",
+  extra_params => [ '-d', $connstr_rep ],
+  0,
+  'physical slot dropped on primary');
+
+($ret, $stdout, $stderr) = $node_primary->psql(
+  'postgres',
+  "READ_REPLICATION_SLOT $slotname;",
+  extra_params => [ '-d', $connstr_rep ]);
+ok( $ret == 0,
+  "READ_REPLICATION_SLOT does not produce an error with dropped slot");
+ok( $stdout eq '',
+    "READ_REPLICATION_SLOT returns no tuple if a slot has been dropped");
+
 note "switching to physical replication slot";
 
 # Switch to using a physical replication slot. We can do this without a new
diff --git a/src/test/recovery/t/006_logical_decoding.pl b/src/test/recovery/t/006_logical_decoding.pl
index cc116062c2..07396f40e8 100644
--- a/src/test/recovery/t/006_logical_decoding.pl
+++ b/src/test/recovery/t/006_logical_decoding.pl
@@ -10,7 +10,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 14;
+use Test::More tests => 16;
 use Config;
 
 # Initialize primary node
@@ -39,6 +39,17 @@ ok( $stderr =~
 	  m/replication slot "test_slot" was not created in this database/,
 	"Logical decoding correctly fails to start");
 
+($result, $stdout, $stderr) = $node_primary->psql('template1',
+  qq[READ_REPLICATION_SLOT test_slot;],
+  replication => 'database');
+ok ($stdout =~ 'logical\|[^|]*\|[^|]*\|1\|1',
+  'Logical replication slot can be read on any logical connection');
+($result, $stdout, $stderr) = $node_primary->psql('postgres',
+  qq[READ_REPLICATION_SLOT test_slot;],
+  replication => '1');
+ok ($stdout =~ 'logical\|[^|]*\|[^|]*\|1\|1',
+  'Logical replication slot can be read on a physical connection');
+
 # Check case of walsender not using a database connection.  Logical
 # decoding should not be allowed.
 ($result, $stdout, $stderr) = $node_primary->psql(
-- 
2.32.0

v3-0002-Use-READ_REPLICATION_SLOT-command-in-pg_receivewa.patchtext/x-patch; charset=UTF-8; name=v3-0002-Use-READ_REPLICATION_SLOT-command-in-pg_receivewa.patchDownload
From 546af79c4d0c2da5fa5bbf00d9280f6fee961e26 Mon Sep 17 00:00:00 2001
From: Ronan Dunklau <ronan.dunklau@aiven.io>
Date: Wed, 28 Jul 2021 16:35:39 +0200
Subject: [PATCH v3 2/3] Use READ_REPLICATION_SLOT command in pg_receivewal.

Prior to this patch, when running pg_receivewal, the start LSN is determined by looking at
the WAL files currently stored on disk, then using the current flush lsn
from the server.

If for some reason the WAL files from pg_receivewal were moved, we want
to restart where we left at, which is the replication slot's restart_lsn
instead of skipping right to the current flush location.

To keep compatibility with prior server versions, we only attempt it if
the version is < 15.
---
 src/bin/pg_basebackup/pg_receivewal.c        | 35 ++++++++-
 src/bin/pg_basebackup/streamutil.c           | 74 ++++++++++++++++++++
 src/bin/pg_basebackup/streamutil.h           |  3 +
 src/bin/pg_basebackup/t/020_pg_receivewal.pl | 46 +++++++++++-
 4 files changed, 154 insertions(+), 4 deletions(-)

diff --git a/src/bin/pg_basebackup/pg_receivewal.c b/src/bin/pg_basebackup/pg_receivewal.c
index 4474273daf..1f769f6179 100644
--- a/src/bin/pg_basebackup/pg_receivewal.c
+++ b/src/bin/pg_basebackup/pg_receivewal.c
@@ -26,6 +26,7 @@
 #include "fe_utils/option_utils.h"
 #include "getopt_long.h"
 #include "libpq-fe.h"
+#include "pqexpbuffer.h"
 #include "receivelog.h"
 #include "streamutil.h"
 
@@ -408,10 +409,40 @@ StreamLog(void)
 	stream.startpos = FindStreamingStart(&stream.timeline);
 	if (stream.startpos == InvalidXLogRecPtr)
 	{
-		stream.startpos = serverpos;
-		stream.timeline = servertli;
+		/* Try to get it from the slot if any, and the server supports it */
+		if (replication_slot)
+		{
+			if (PQserverVersion(conn) >= 150000)
+			{
+				char	   *slot_type = NULL;
+
+				if (!GetSlotInformation(conn, replication_slot, &stream.startpos, &stream.timeline, &slot_type))
+					pg_log_warning("could not fetch the replication_slot \"%s\" information "
+								   "resuming from the current server position instead", replication_slot);
+				if (strcmp(slot_type, "physical") != 0)
+				{
+					pg_log_error("slot \"%s\" is not a physical replication slot",
+								 replication_slot);
+					exit(1);
+				}
+			}
+			else
+				pg_log_warning("server does not suport fetching the slot's position, "
+							   "resuming from the current server position instead");
+		}
+
+		/*
+		 * If it is still unknown, use the current flush value from the server
+		 */
+		if (stream.startpos == InvalidXLogRecPtr)
+		{
+			stream.startpos = serverpos;
+			stream.timeline = servertli;
+		}
 	}
 
+	Assert(stream.startpos != InvalidXLogRecPtr);
+
 	/*
 	 * Always start streaming at the beginning of a segment
 	 */
diff --git a/src/bin/pg_basebackup/streamutil.c b/src/bin/pg_basebackup/streamutil.c
index f5b3b476e5..b100a073d9 100644
--- a/src/bin/pg_basebackup/streamutil.c
+++ b/src/bin/pg_basebackup/streamutil.c
@@ -479,6 +479,80 @@ RunIdentifySystem(PGconn *conn, char **sysid, TimeLineID *starttli,
 	return true;
 }
 
+
+/*
+ * Check a replication slot exists through a given connection, and give back to
+ * caller some result information if requested:
+ * 	- restart_lsn
+ * 	- timeline
+ */
+bool
+GetSlotInformation(PGconn *conn, const char *slot_name, XLogRecPtr *restart_lsn,
+				   uint32 *restart_lsn_timeline, char **slot_type)
+{
+	PGresult   *res;
+	PQExpBuffer query;
+	uint32		hi,
+				lo;
+
+	if (slot_name == NULL)
+		return InvalidXLogRecPtr;
+
+	query = createPQExpBuffer();
+	appendPQExpBuffer(query, "READ_REPLICATION_SLOT %s",
+					  slot_name);
+	res = PQexec(conn, query->data);
+	destroyPQExpBuffer(query);
+	if (PQresultStatus(res) != PGRES_TUPLES_OK)
+	{
+		pg_log_error("could not read replication slot : %s",
+					 PQerrorMessage(conn));
+		PQclear(res);
+		return InvalidXLogRecPtr;
+	}
+	if (PQntuples(res) == 0)
+	{
+		pg_log_error("replication slot \"%s\" does not exist", slot_name);
+		PQclear(0);
+		return false;
+	}
+	if (slot_type != NULL)
+	{
+		*slot_type = NULL;
+		if (!PQgetisnull(res, 0, 0))
+			*slot_type = pg_strdup(PQgetvalue(res, 0, 0));
+	}
+
+	if (PQntuples(res) != 1 || PQnfields(res) < 4)
+	{
+		pg_log_error("could not fetch replication slot LSN: got %d rows and %d fields, expected %d rows and %d or more fields",
+					 PQntuples(res), PQnfields(res), 1, 2);
+		PQclear(res);
+		return false;
+	}
+	if (PQgetisnull(res, 0, 1))
+	{
+		hi = 0;
+		lo = 0;
+	}
+	else
+	{
+		if (sscanf(PQgetvalue(res, 0, 1), "%X/%X", &hi, &lo) != 2)
+		{
+			pg_log_error("could not parse slot's restart_lsn \"%s\"",
+						 PQgetvalue(res, 0, 1));
+			PQclear(res);
+			return false;
+		}
+	}
+	if (restart_lsn)
+		*restart_lsn = ((uint64) hi) << 32 | lo;
+	if (restart_lsn_timeline && !PQgetisnull(res, 0, 3))
+		*restart_lsn_timeline = atoi(PQgetvalue(res, 0, 3));
+	PQclear(res);
+	return true;
+}
+
 /*
  * Create a replication slot for the given connection. This function
  * returns true in case of success.
diff --git a/src/bin/pg_basebackup/streamutil.h b/src/bin/pg_basebackup/streamutil.h
index 504803b976..81115029d2 100644
--- a/src/bin/pg_basebackup/streamutil.h
+++ b/src/bin/pg_basebackup/streamutil.h
@@ -40,6 +40,9 @@ extern bool RunIdentifySystem(PGconn *conn, char **sysid,
 							  TimeLineID *starttli,
 							  XLogRecPtr *startpos,
 							  char **db_name);
+extern bool GetSlotInformation(PGconn *conn, const char *slot_name,
+							   XLogRecPtr *restart_lsn, uint32 *restart_lsn_timeline,
+							   char **slot_type);
 extern bool RetrieveWalSegSize(PGconn *conn);
 extern TimestampTz feGetCurrentTimestamp(void);
 extern void feTimestampDifference(TimestampTz start_time, TimestampTz stop_time,
diff --git a/src/bin/pg_basebackup/t/020_pg_receivewal.pl b/src/bin/pg_basebackup/t/020_pg_receivewal.pl
index 0b33d73900..6b40778429 100644
--- a/src/bin/pg_basebackup/t/020_pg_receivewal.pl
+++ b/src/bin/pg_basebackup/t/020_pg_receivewal.pl
@@ -5,7 +5,7 @@ use strict;
 use warnings;
 use TestLib;
 use PostgresNode;
-use Test::More tests => 27;
+use Test::More tests => 30;
 
 program_help_ok('pg_receivewal');
 program_version_ok('pg_receivewal');
@@ -49,7 +49,7 @@ is($primary->slot($slot_name)->{'slot_type'},
 
 # Generate some WAL.  Use --synchronous at the same time to add more
 # code coverage.  Switch to the next segment first so that subsequent
-# restarts of pg_receivewal will see this segment as full..
+# restarts of pg_receivewal will see this segment as full
 $primary->psql('postgres', 'CREATE TABLE test_table(x integer);');
 $primary->psql('postgres', 'SELECT pg_switch_wal();');
 my $nextlsn =
@@ -146,6 +146,48 @@ $primary->command_ok(
 $partial_wals[0] =~ s/(\.gz)?.partial//;
 ok(-e $partial_wals[0], "check that previously partial WAL is now complete");
 
+# Verify that if we use a replication slot, we resume where we left even in the
+# absence of WALs
+
+# Setup the slot, and connect to it a first time
+$primary->run_log(
+	[ 'pg_receivewal', '--slot', $slot_name, '--create-slot' ],
+	'creating a replication slot');
+$primary->psql('postgres',
+	'INSERT INTO test_table VALUES (generate_series(1,100));');
+$primary->psql('postgres', 'SELECT pg_switch_wal();');
+$nextlsn =
+  $primary->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();');
+chomp($nextlsn);
+$primary->run_log(
+	[ 'pg_receivewal', '-D', $stream_dir, '--slot', $slot_name, '--verbose', '--endpos', $nextlsn ],
+	"streaming some WAL");
+
+# Get the slot restart_lsn and make sure we retrieve the associated WAL file
+# even after deletion of the previous stored WAL files.
+$slot = $primary->slot($slot_name);
+my $restart_lsn = $slot->{'restart_lsn'};
+# Add one so that the restart_lsn doesn't correspond to the previous file.
+$restart_lsn =~ s/.$/1/;
+my $walfile_to_be_archived = $primary->safe_psql('postgres',
+	"SELECT pg_walfile_name('$restart_lsn');");
+isnt($restart_lsn, '', 'restart LSN of new slot is not null');
+
+unlink glob "'${stream_dir}/*'";
+
+$primary->psql('postgres',
+	'INSERT INTO test_table VALUES (generate_series(1,100));');
+$primary->psql('postgres', 'SELECT pg_switch_wal();');
+$nextlsn =
+  $primary->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();');
+chomp($nextlsn);
+$primary->command_ok(
+	[ 'pg_receivewal', '-D', $stream_dir, '--slot', $slot_name, '--verbose', '--endpos', $nextlsn ],
+	"Stream some wal starting from the slot's restart_lsn");
+$slot = $primary->slot($slot_name);
+my @walfiles = glob "${stream_dir}/*";
+ok(-e "$stream_dir/$walfile_to_be_archived", "WAL from the slot's restart_lsn has been archived");
+
 # Permissions on WAL files should be default
 SKIP:
 {
-- 
2.32.0

v3-0003-Check-slot-existence-in-pg_basebackup.patchtext/x-patch; charset=UTF-8; name=v3-0003-Check-slot-existence-in-pg_basebackup.patchDownload
From e6a867c8a9724d37ce04b56bd132f8bfc571d659 Mon Sep 17 00:00:00 2001
From: Ronan Dunklau <ronan.dunklau@aiven.io>
Date: Thu, 26 Aug 2021 14:05:26 +0200
Subject: [PATCH v3 3/3] Check slot existence in pg_basebackup.

Use the newly introduced READ_REPLICATION_SLOT command to check for a
slot existence in pg_basebackup. That way, we can fail early.
---
 src/bin/pg_basebackup/pg_basebackup.c        | 17 +++++++++++++++++
 src/bin/pg_basebackup/t/010_pg_basebackup.pl |  7 ++++---
 src/test/perl/PostgresNode.pm                | 20 ++++++++++++++++++++
 3 files changed, 41 insertions(+), 3 deletions(-)

diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 7296eb97d0..b93db32e23 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -1867,6 +1867,23 @@ BaseBackup(void)
 	if (!RunIdentifySystem(conn, &sysidentifier, &latesttli, NULL, NULL))
 		exit(1);
 
+	/*
+	 * Check the replication slot exists if applicable
+	 */
+	if (replication_slot && !(temp_replication_slot || create_slot) && PQserverVersion(conn) >= 15000)
+	{
+		char	   *slot_type = NULL;
+
+		if (!GetSlotInformation(conn, replication_slot, NULL, NULL, &slot_type))
+			exit(1);
+		if (strcmp(slot_type, "physical") != 0)
+		{
+			pg_log_error("Slot \"%s\" is not a physical replication slot",
+						 replication_slot);
+			exit(1);
+		}
+	}
+
 	/*
 	 * Start the actual backup
 	 */
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index a2cb2a7679..69fd262a97 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -10,7 +10,7 @@ use File::Path qw(rmtree);
 use Fcntl qw(:seek);
 use PostgresNode;
 use TestLib;
-use Test::More tests => 110;
+use Test::More tests => 111;
 
 program_help_ok('pg_basebackup');
 program_version_ok('pg_basebackup');
@@ -465,14 +465,15 @@ $node->command_ok(
 	'pg_basebackup -X stream runs with --no-slot');
 rmtree("$tempdir/backupnoslot");
 
-$node->command_fails(
+$node->command_fails_like(
 	[
 		'pg_basebackup',             '-D',
 		"$tempdir/backupxs_sl_fail", '-X',
 		'stream',                    '-S',
 		'slot0'
 	],
-	'pg_basebackup fails with nonexistent replication slot');
+	qr/pg_basebackup: error: replication slot "slot0" does not exist/,
+	'pg_basebackup fails early with nonexistent replication slot');
 
 $node->command_fails(
 	[ 'pg_basebackup', '-D', "$tempdir/backupxs_slot", '-C' ],
diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index 8158ea5b2f..c59da758c7 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -2267,6 +2267,26 @@ sub command_like
 
 =pod
 
+=item $node->command_fails_like(...)
+
+TestLib::command_fails_like with our connection parameters. See command_ok(...)
+
+=cut
+
+sub command_fails_like
+{
+	local $Test::Builder::Level = $Test::Builder::Level + 1;
+
+	my $self = shift;
+
+	local %ENV = $self->_get_env();
+
+	TestLib::command_fails_like(@_);
+	return;
+}
+
+=pod
+
 =item $node->command_checks_all(...)
 
 TestLib::command_checks_all with our connection parameters. See
-- 
2.32.0

#10Ronan Dunklau
ronan.dunklau@aiven.io
In reply to: Bharath Rupireddy (#8)
Re: pg_receivewal starting position

Le samedi 28 août 2021, 14:10:25 CEST Bharath Rupireddy a écrit :

On Thu, Aug 26, 2021 at 5:45 PM Ronan Dunklau <ronan.dunklau@aiven.io>

wrote:

order to fail early if the replication slot doesn't exist, so please find
attached v2 for that.

Thanks for the patches. Here are some comments:

Thank you for this review ! Please see the other side of the thread where I
answered Michael Paquier with a new patchset, which includes some of your
proposed modifications.

1) While the intent of these patches looks good, I have following
concern with new replication command READ_REPLICATION_SLOT: what if
the pg_receivewal exits (because user issued a SIGINT or for some
reason) after flushing the received WAL to disk, before it sends
sendFeedback to postgres server's walsender so that it doesn't get a
chance to update the restart_lsn in the replication slot via
PhysicalConfirmReceivedLocation. If the pg_receivewal is started
again, isn't it going to get the previous restart_lsn and receive the
last chunk of flushed WAL again?

I've kept the existing directory as the source of truth if we have any WAL
there already. If we don't, we fallback to the restart_lsn on the replication
slot.
So in the event that we start it again from somewhere else where we don't have
access to those WALs anymore, we could be receiving it again, which in my
opinion is better than losing everything in between in that case.

2) What is the significance of READ_REPLICATION_SLOT for logical
replication slots? I read above that somebody suggested to restrict
the walsender to handle READ_REPLICATION_SLOT for physical replication
slots so that the callers will see a command failure. But I tend to
think that it is clean to have this command common for both physical
and logical replication slots and the callers can have an Assert(type
== 'physical').

I've updated the patch to make it easy for the caller to check the slot's type
and added a verification for those cases.

In general, I tried to implement the meaning of the different fields exactly as
it's done in the pg_replication_slots view.

3) Isn't it useful to send active, active_pid info of the replication
slot via READ_REPLICATION_SLOT? pg_receivewal can use Assert(active ==
true && active_pid == getpid()) as an assertion to ensure that it is
the sole owner of the replication slot? Also, is it good send
wal_status info

Typically we read the slot before attaching to it, since what we want to do is
check if it exists. It may be worthwile to check that it's not already used by
another backend though.

4) I think below messages should start with lower case letter and also
there are some typos:
+ pg_log_warning("Could not fetch the replication_slot \"%s\" information "
+ pg_log_warning("Server does not suport fetching the slot's position, "
something like:
+ pg_log_warning("could not fetch replication slot \"%s\" information, "
+    "resuming from current server position instead", replication_slot);
+ pg_log_warning("server does not support fetching replication slot
information, "
+    "resuming from current server position instead");

I've rephrased it a bit in v3, let me know if that's what you had in mind.

5) How about emitting the above messages in case of "verbose"?

I think it is useful to warn the user even if not in the verbose case, but if
the consensus is to move it to verbose only output I can change it.

6) How about an assertion like below?
+ if (stream.startpos == InvalidXLogRecPtr)
+ {
+ stream.startpos = serverpos;
+ stream.timeline = servertli;
+ }
+
+Assert(stream.startpos != InvalidXLogRecPtr)>>

Good idea.

7) How about we let pg_receivewal use READ_REPLICATION_SLOT as an option?

From my point of view, I already expected it to use something like that when
using a replication slot. Maybe an option to turn it off could be useful ?

8) Just an idea, how about we store pg_receivewal's lastFlushPosition
in a file before pg_receivewal exits and compare it with the
restart_lsn that it received from the replication slot, if
lastFlushPosition == received_restart_lsn well and good, if not, then
something would have happened and we always start at the
lastFlushPosition ?

The patch idea originally came from the fact that some utility use
pg_receivewal to fetch WALs, and move them elsewhere. In that sense I don't
really see what value this brings compared to the existing (and unmodified) way
of computing the restart position from the already stored files ?

Best regards,

--
Ronan Dunklau

#11Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Ronan Dunklau (#10)
Re: pg_receivewal starting position

On Mon, Aug 30, 2021 at 3:26 PM Ronan Dunklau <ronan.dunklau@aiven.io> wrote:

Thank you for this review ! Please see the other side of the thread where I
answered Michael Paquier with a new patchset, which includes some of your
proposed modifications.

Thanks for the updated patches. Here are some comments on v3-0001
patch. I will continue to review 0002 and 0003.

1) Missing full stops "." at the end.
+               <literal>logical</literal>
+               when following the current timeline history
+               position, when following the current timeline history

2) Can we have the "type" column as "slot_type" just to keep in sync
with the pg_replication_slots view?

3) Can we mention the link to pg_replication_slots view in the columns
- "type", "restart_lsn", "confirmed_flush_lsn"?
Something like: the "slot_type"/"restart_lsn"/"confirmed_flush_lsn" is
same as <link linkend="view-pg-replication-slots"><structname>pg_replication_slots</structname></link>
view.

4) Can we use "read_replication_slot" instead of
"identify_replication_slot", just to be in sync with the actual
command?

5) Are you actually copying the slot contents into the slot_contents
variable here? Isn't just taking the pointer to the shared memory?
+ /* Copy slot contents while holding spinlock */
+ SpinLockAcquire(&slot->mutex);
+ slot_contents = *slot;
+ SpinLockRelease(&slot->mutex);
+ LWLockRelease(ReplicationSlotControlLock);
You could just do:
+ Oid dbid;
+ XLogRecPtr restart_lsn;
+ XLogRecPtr confirmed_flush;
+ /* Copy the required slot contents */
+ SpinLockAcquire(&slot->mutex);
+ dbid = slot.data.database;
+ restart_lsn = slot.data.restart_lsn;
+ confirmed_flush = slot.data.confirmed_flush;
+ SpinLockRelease(&slot->mutex);
+ LWLockRelease(ReplicationSlotControlLock);

6) It looks like you are not sending anything as a response to the
READ_REPLICATION_SLOT command, if the slot specified doesn't exist.
You are just calling end_tup_output which just calls rShutdown (points
to donothingCleanup of printsimpleDR)
if (has_value)
do_tup_output(tstate, values, nulls);
end_tup_output(tstate);

Can you test the use case where the pg_receivewal asks
READ_REPLICATION_SLOT with a non-existent replication slot and see
with your v3 patch how it behaves?

Why don't you remove has_value flag and do this in ReadReplicationSlot:
Datum values[5];
bool nulls[5];
MemSet(values, 0, sizeof(values));
MemSet(nulls, 0, sizeof(nulls));

+ dest = CreateDestReceiver(DestRemoteSimple);
+ tstate = begin_tup_output_tupdesc(dest, tupdesc, &TTSOpsVirtual);
+ do_tup_output(tstate, values, nulls);
+ end_tup_output(tstate);

7) Why don't we use 2 separate variables "restart_tli",
"confirmed_flush_tli" instead of "slots_position_timeline", just to be
more informative?

8) I still have the question like, how can a client (pg_receivewal for
instance) know that it is the current owner/user of the slot it is
requesting the info? As I said upthread, why don't we send "active"
and "active_pid" fields of the pg_replication_slots view?
Also, it would be good to send the "wal_status" field so that the
client can check if the "wal_status" is not "lost"?

9) There are 2 new lines at the end of ReadReplicationSlot. We give
only one new line after each function definition.
end_tup_output(tstate);
}
<<1stnewline>>
<<2ndnewline>>
/*
* Handle TIMELINE_HISTORY command.
*/

10) Why do we need to have two test cases for "non-existent" slots?
Isn't the test case after "DROP REPLICATION" enough?
+($ret, $stdout, $stderr) = $node_primary->psql(
+  'postgres', 'READ_REPLICATION_SLOT non_existent_slot;',
+  extra_params => [ '-d', $connstr_rep ]);
+ok( $ret == 0,
+  "READ_REPLICATION_SLOT does not produce an error with non existent slot");
+ok( $stdout eq '',
+    "READ_REPLICATION_SLOT returns no tuple if a slot is non existent");
You can just rename the test case name from:
+    "READ_REPLICATION_SLOT returns no tuple if a slot has been dropped");
to
+    "READ_REPLICATION_SLOT returns no tuple if a slot is non existent");

Regards,
Bharath Rupireddy.

#12Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Bharath Rupireddy (#11)
Re: pg_receivewal starting position

On Tue, Aug 31, 2021 at 4:47 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Mon, Aug 30, 2021 at 3:26 PM Ronan Dunklau <ronan.dunklau@aiven.io> wrote:

Thank you for this review ! Please see the other side of the thread where I
answered Michael Paquier with a new patchset, which includes some of your
proposed modifications.

Thanks for the updated patches. Here are some comments on v3-0001
patch. I will continue to review 0002 and 0003.

Continuing my review on the v3 patch set:

0002:
1) I think the following message
"could not fetch replication slot LSN: got %d rows and %d fields,
expected %d rows and %d or more fields"
should be:
"could not read replication slot: got %d rows and %d fields, expected
%d rows and %d or more fields"

2) Why GetSlotInformation is returning InvalidXLogRecPtr? Change it to
return false instead.

3) Alignment issue:
Change below 2 lines:
+ appendPQExpBuffer(query, "READ_REPLICATION_SLOT %s",
+   slot_name);
To 1 line, as it will be under 80 char limit:
+ appendPQExpBuffer(query, "READ_REPLICATION_SLOT %s", slot_name);

4) Shouldn't your GetSlotInformation return what ever the
READ_REPLICATION_SLOT gets as output columns to be more generic?
Callers would pass non-null/null pointers to the inputs required/not
required for them. Please refer to RunIdentifySystem how it does that.
GetSlotInformation can just read the tuples that are interested for the callers.
bool
GetSlotInformation(PGconn *conn, const char *slot_name, char **slot_type,
XLogRecPtr *restart_lsn, uint32 *restart_lsn_tli,
XLogRecPtr *confirmed_lsn, XLogRecPtr *confirmed_lsn_tli)
{
if (slot_type)
{
/* get the slot_type value from the received tuple */
}
if (restart_lsn)
{
/* get the restart_lsn value from the received tuple */
}
if (restart_lsn_tli)
{
/* get the restart_lsn_tli value from the received tuple */
}

if (confirmed_lsn)
{
/* get the confirmed_lsn value from the received tuple */
}

if (confirmed_lsn_tli)
{
/* get the confirmed_lsn_tli value from the received tuple */
}
}

5) How about below as GetSlotInformation function comment?
/*
* Run READ_REPLICATION_SLOT through a given connection and give back to caller
* following information of the slot if requested:
* - type
* - restart lsn
* - restart lsn timeline
* - confirmed lsn
* - confirmed lsn timeline
*/

6) Do you need +#include "pqexpbuffer.h" in pg_receivewal.c?

7) Missing "," after information and it is not required to use "the"
in the messages.
Change below
+ pg_log_warning("could not fetch the replication_slot \"%s\" information "
+    "resuming from the current server position instead", replication_slot);
to:
+ pg_log_warning("could not fetch replication_slot \"%s\" information, "
+    "resuming from current server position instead", replication_slot);

8) A typo "suport". Ignore this comment, if you incorporate review comment #10.
Change below
pg_log_warning("server does not suport fetching the slot's position, "
"resuming from the current server position instead");
to:
pg_log_warning("server does not support getting start LSN from
replication slot, "
"resuming from current server position instead");

9) I think you should do free the memory allocated to slot_type by
GetSlotInformation:
+ if (strcmp(slot_type, "physical") != 0)
+ {
+ pg_log_error("slot \"%s\" is not a physical replication slot",
+ replication_slot);
+ exit(1);
+ }
+
+ pg_free(slot_type);

10) Isn't it PQclear(res);?
+ PQclear(0);

11) I don't think you need to check for the null value of
replication_slot. In the StreamLog it can't be null, so you can safely
remove the below if condition.
+ if (replication_slot)
+ {

12) How about
/* Try to get start position from server's replication slot information */
insted of
+ /* Try to get it from the slot if any, and the server supports it */

13) When you say that the server supports the new
READ_REPLICATION_SLOT command only if version >= 15000, then isn't it
the function GetSlotInformation doing the following:
bool
GetSlotInformation(PGconn *conn,....,bool *is_supported)
{

if (PQserverVersion(conn) < 15000)
{
*is_supported = false;
return false;
}
*is_supported = true;
}
So, the callers will just do:
/* Try to get start position from server's replication slot information */
char *slot_type = NULL;
bool is_supported;

if (!GetSlotInformation(conn, replication_slot, &stream.startpos,
&stream.timeline, &slot_type, &is_supported))
{
if (!is_supported)
pg_log_warning("server does not support getting start LSN from
replication slot, "
"resuming from current server position instead");
else
pg_log_warning("could not fetch replication_slot \"%s\" information, "
"resuming from current server position instead",
replication_slot);
}

if (slot_type && strcmp(slot_type, "physical") != 0)
{
pg_log_error("slot \"%s\" is not a physical replication slot",
replication_slot);
exit(1);
}

pg_free(slot_type);
}

14) Instead of just
+ if (strcmp(slot_type, "physical") != 0)
do
+ if (slot_type && strcmp(slot_type, "physical") != 0)

0003:
1) The message should start with lower case: "slot \"%s\" is not a
physical replication slot".
+ pg_log_error("Slot \"%s\" is not a physical replication slot",

2)
+ if (strcmp(slot_type, "physical") != 0)
do
+ if (slot_type && strcmp(slot_type, "physical") != 0)

Regards,
Bharath Rupireddy.

#13Ronan Dunklau
ronan.dunklau@aiven.io
In reply to: Bharath Rupireddy (#11)
Re: pg_receivewal starting position

Le mardi 31 août 2021, 13:17:22 CEST Bharath Rupireddy a écrit :

On Mon, Aug 30, 2021 at 3:26 PM Ronan Dunklau <ronan.dunklau@aiven.io>

wrote:

Thank you for this review ! Please see the other side of the thread where
I
answered Michael Paquier with a new patchset, which includes some of your
proposed modifications.

Thanks for the updated patches. Here are some comments on v3-0001
patch. I will continue to review 0002 and 0003.

Thank you ! I will send a new version shortly, once I address your remarks
concerning patch 0002 (and hopefully 0003 :-) )

1) Missing full stops "." at the end.
+               <literal>logical</literal>
+               when following the current timeline history
+               position, when following the current timeline history

Good catch, I will take care of it for the next version.

2) Can we have the "type" column as "slot_type" just to keep in sync
with the pg_replication_slots view?

You're right, it makes more sense like this.

3) Can we mention the link to pg_replication_slots view in the columns
- "type", "restart_lsn", "confirmed_flush_lsn"?
Something like: the "slot_type"/"restart_lsn"/"confirmed_flush_lsn" is
same as <link
linkend="view-pg-replication-slots"><structname>pg_replication_slots</struc
tname></link> view.

Same as above, thanks.

4) Can we use "read_replication_slot" instead of
"identify_replication_slot", just to be in sync with the actual
command?

That must have been a leftover from an earlier version of the patch, I will fix
it also.

5) Are you actually copying the slot contents into the slot_contents
variable here? Isn't just taking the pointer to the shared memory?
+ /* Copy slot contents while holding spinlock */
+ SpinLockAcquire(&slot->mutex);
+ slot_contents = *slot;
+ SpinLockRelease(&slot->mutex);
+ LWLockRelease(ReplicationSlotControlLock);
You could just do:
+ Oid dbid;
+ XLogRecPtr restart_lsn;
+ XLogRecPtr confirmed_flush;
+ /* Copy the required slot contents */
+ SpinLockAcquire(&slot->mutex);
+ dbid = slot.data.database;
+ restart_lsn = slot.data.restart_lsn;
+ confirmed_flush = slot.data.confirmed_flush;
+ SpinLockRelease(&slot->mutex);
+ LWLockRelease(ReplicationSlotControlLock);

It's probably simpler that way.

6) It looks like you are not sending anything as a response to the
READ_REPLICATION_SLOT command, if the slot specified doesn't exist.
You are just calling end_tup_output which just calls rShutdown (points
to donothingCleanup of printsimpleDR)
if (has_value)
do_tup_output(tstate, values, nulls);
end_tup_output(tstate);

Can you test the use case where the pg_receivewal asks
READ_REPLICATION_SLOT with a non-existent replication slot and see
with your v3 patch how it behaves?

Why don't you remove has_value flag and do this in ReadReplicationSlot:
Datum values[5];
bool nulls[5];
MemSet(values, 0, sizeof(values));
MemSet(nulls, 0, sizeof(nulls));

+ dest = CreateDestReceiver(DestRemoteSimple);
+ tstate = begin_tup_output_tupdesc(dest, tupdesc, &TTSOpsVirtual);
+ do_tup_output(tstate, values, nulls);
+ end_tup_output(tstate);

As for the API, I think it's cleaner to just send an empty result set instead
of a null tuple in that case but I won't fight over it if there is consensus on
having an all-nulls tuple value instead.

There is indeed a bug, but not here, in the second patch: I still test the
slot type even when we didn't fetch anything. So I will add a test for that
too.

7) Why don't we use 2 separate variables "restart_tli",
"confirmed_flush_tli" instead of "slots_position_timeline", just to be
more informative?

You're right.

8) I still have the question like, how can a client (pg_receivewal for
instance) know that it is the current owner/user of the slot it is
requesting the info? As I said upthread, why don't we send "active"
and "active_pid" fields of the pg_replication_slots view?
Also, it would be good to send the "wal_status" field so that the
client can check if the "wal_status" is not "lost"?

As for pg_receivewal, it can only check that it's not active at that time,
since we only aquire the replication slot once we know the start_lsn. For the
basebackup case it's the same thing as we only want to check if it exists.
As such, I didn't add them as I didn't see the need, but if it can be useful
why not ? I will do that in the next version.

9) There are 2 new lines at the end of ReadReplicationSlot. We give
only one new line after each function definition.
end_tup_output(tstate);
}
<<1stnewline>>
<<2ndnewline>>
/*
* Handle TIMELINE_HISTORY command.
*/

Ok !

10) Why do we need to have two test cases for "non-existent" slots?
Isn't the test case after "DROP REPLICATION" enough?
+($ret, $stdout, $stderr) = $node_primary->psql(
+  'postgres', 'READ_REPLICATION_SLOT non_existent_slot;',
+  extra_params => [ '-d', $connstr_rep ]);
+ok( $ret == 0,
+  "READ_REPLICATION_SLOT does not produce an error with non existent
slot"); +ok( $stdout eq '',
+    "READ_REPLICATION_SLOT returns no tuple if a slot is non existent");
You can just rename the test case name from:
+    "READ_REPLICATION_SLOT returns no tuple if a slot has been dropped");
to
+    "READ_REPLICATION_SLOT returns no tuple if a slot is non existent");

I wanted to test both the case where no slot by this name exists, and the case
where it has been dropped hence still referenced but marked as not "in_use".
Maybe it's not worth it and we can remove the first case as you suggest.

Best regards,

--
Ronan Dunklau

#14Michael Paquier
michael@paquier.xyz
In reply to: Ronan Dunklau (#9)
Re: pg_receivewal starting position

On Mon, Aug 30, 2021 at 11:55:42AM +0200, Ronan Dunklau wrote:

Le vendredi 27 août 2021, 05:44:32 CEST Michael Paquier a écrit :

+ if (slot == NULL || !slot->in_use)

[...] +
ereport(ERROR,
+               (errcode(ERRCODE_UNDEFINED_OBJECT),
+                errmsg("replication slot \"%s\" does not exist",
+                       cmd->slotname)));
[...]
+       if (PQntuples(res) == 0)
+       {
+               pg_log_error("replication slot %s does not exist",
slot_name); +               PQclear(0);
+               return false;
So, the backend and ReadReplicationSlot() report an ERROR if a slot
does not exist but pg_basebackup's GetSlotInformation() does the same
if there are no tuples returned.  That's inconsistent.  Wouldn't it be
more instinctive to return a NULL tuple instead if the slot does not
exist to be able to check after real ERRORs in frontends using this
interface?  

The attached patch returns no tuple at all when the replication slot doesn't
exist. I'm not sure if that's what you meant by returning a NULL tuple ?

Just return a tuple filled only with NULL values. I would tend to
code things so as we set all the flags of nulls[] to true by default,
remove has_value and define the number of columns in a #define, as of:
#define READ_REPLICATION_SLOT_COLS 5
[...]
Datum values[READ_REPLICATION_SLOT_COLS];
bool nulls[READ_REPLICATION_SLOT_COLS];
[...]
MemSet(nulls, true, READ_REPLICATION_SLOT_COLS * sizeof(bool));
Assert(i == READ_REPLICATION_SLOT_COLS); // when filling values.

This would make ReadReplicationSlot() cleaner by removing all the
else{} blocks coded now to handle the NULL values, and that would be
more in-line with the documentation where we state that one tuple is
returned. Note that this is the same kind of behavior for similar
in-core functions where objects are queried if they don't exist.

I would also suggest a reword of some of the docs, say:
+    <listitem>
+     <para>
+      Read the information of a replication slot. Returns a tuple with
+      <literal>NULL</literal> values if the replication slot does not
+      exist.
+     </para>

A slot in use exists, so the error is a bit confusing here
anyway, no?

From my understanding, a slot *not* in use doesn't exist anymore, as such I
don't really understand this point. Could you clarify ?

Yeah, sorry about that. I did not recall the exact meaning of
in_use. Considering the slot as undefined if the flag is false is the
right thing to do.

I was thinking that maybe instead of walking back the timeline history from
where we currently are on the server, we could allow an additional argument
for the client to specify which timeline it wants. But I guess a replication
slot can not be present for a past, divergent timeline ? I have removed that
suggestion.

The parent TLI history is linear, so I'd find that a bit strange in
concept, FWIW.

-               'slot0'
+               'slot0',                     '-p',
+               "$port"
Something we are missing here?

The thing we're missing here is a wrapper for command_fails_like. I've added
this to PostgresNode.pm.

It may be better to apply this bit separately, then.
--
Michael

#15Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Ronan Dunklau (#10)
Re: pg_receivewal starting position

On Mon, Aug 30, 2021 at 3:26 PM Ronan Dunklau <ronan.dunklau@aiven.io> wrote:

7) How about we let pg_receivewal use READ_REPLICATION_SLOT as an option?

From my point of view, I already expected it to use something like that when
using a replication slot. Maybe an option to turn it off could be useful ?

IMO, pg_receivewal should have a way to turn off/on using
READ_REPLICATION_SLOT. Imagine if the postgres server doesn't support
READ_REPLICATION_SLOT (a lower version) but for some reasons the
pg_receivewal(running separately) is upgraded to the version that uses
READ_REPLICATION_SLOT, knowing that the server doesn't support
READ_REPLICATION_SLOT why should user let pg_receivewal run an
unnecessary code?

Regards,
Bharath Rupireddy.

#16Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Bharath Rupireddy (#15)
Re: pg_receivewal starting position

At Wed, 1 Sep 2021 10:30:05 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in

On Mon, Aug 30, 2021 at 3:26 PM Ronan Dunklau <ronan.dunklau@aiven.io> wrote:

7) How about we let pg_receivewal use READ_REPLICATION_SLOT as an option?

From my point of view, I already expected it to use something like that when
using a replication slot. Maybe an option to turn it off could be useful ?

IMO, pg_receivewal should have a way to turn off/on using
READ_REPLICATION_SLOT. Imagine if the postgres server doesn't support
READ_REPLICATION_SLOT (a lower version) but for some reasons the
pg_receivewal(running separately) is upgraded to the version that uses
READ_REPLICATION_SLOT, knowing that the server doesn't support
READ_REPLICATION_SLOT why should user let pg_receivewal run an
unnecessary code?

If I read the patch correctly the situation above is warned by the
following message then continue to the next step giving up to identify
start position from slot data.

"server does not suport fetching the slot's position, resuming from the current server position instead"

(The message looks a bit too long, though..)

However, if the operator doesn't know the server is old, pg_receivewal
starts streaming from unexpected position, which is a kind of
disaster. So I'm inclined to agree to Bharath, but rather I imagine of
an option to explicitly specify how to determine the start position.

--start-source=[server,wal,slot] specify starting-LSN source, default is
trying all of them in the order of wal, slot, server.

I don't think the option doesn't need to accept multiple values at once.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#17Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Kyotaro Horiguchi (#16)
Re: pg_receivewal starting position

On Thu, Sep 2, 2021 at 11:15 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

At Wed, 1 Sep 2021 10:30:05 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in

On Mon, Aug 30, 2021 at 3:26 PM Ronan Dunklau <ronan.dunklau@aiven.io> wrote:

7) How about we let pg_receivewal use READ_REPLICATION_SLOT as an option?

From my point of view, I already expected it to use something like that when
using a replication slot. Maybe an option to turn it off could be useful ?

IMO, pg_receivewal should have a way to turn off/on using
READ_REPLICATION_SLOT. Imagine if the postgres server doesn't support
READ_REPLICATION_SLOT (a lower version) but for some reasons the
pg_receivewal(running separately) is upgraded to the version that uses
READ_REPLICATION_SLOT, knowing that the server doesn't support
READ_REPLICATION_SLOT why should user let pg_receivewal run an
unnecessary code?

If I read the patch correctly the situation above is warned by the
following message then continue to the next step giving up to identify
start position from slot data.

"server does not suport fetching the slot's position, resuming from the current server position instead"

(The message looks a bit too long, though..)

However, if the operator doesn't know the server is old, pg_receivewal
starts streaming from unexpected position, which is a kind of
disaster. So I'm inclined to agree to Bharath, but rather I imagine of
an option to explicitly specify how to determine the start position.

--start-source=[server,wal,slot] specify starting-LSN source, default is
trying all of them in the order of wal, slot, server.

I don't think the option doesn't need to accept multiple values at once.

If --start-source = 'wal' fails, then pg_receivewal should show up an
error saying "cannot find start position from <<user-specified-wal>>
directory, try with "server" or "slot" for --start-source". We might
end having similar errors for other options as well. Isn't this going
to create unnecessary complexity?

The existing way the pg_receivewal fetches start pos i.e. first from
wal directory and then from server start position, isn't known to the
user at all, no verbose message or anything specified in the docs. Why
do we need to expose this with the --start-source option? IMO, we can
keep it that way and we can just have a way to turn off the new
behaviour that we are proposing here, i.e.fetching the start position
from the slot's restart_lsn.

Regards,
Bharath Rupireddy.

#18Ronan Dunklau
ronan.dunklau@aiven.io
In reply to: Bharath Rupireddy (#17)
Re: pg_receivewal starting position

Le jeudi 2 septembre 2021, 08:42:22 CEST Bharath Rupireddy a écrit :

On Thu, Sep 2, 2021 at 11:15 AM Kyotaro Horiguchi

<horikyota.ntt@gmail.com> wrote:

At Wed, 1 Sep 2021 10:30:05 +0530, Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote in>

On Mon, Aug 30, 2021 at 3:26 PM Ronan Dunklau <ronan.dunklau@aiven.io>

wrote:

7) How about we let pg_receivewal use READ_REPLICATION_SLOT as an
option?

From my point of view, I already expected it to use something like
that when using a replication slot. Maybe an option to turn it off
could be useful ?> >

IMO, pg_receivewal should have a way to turn off/on using
READ_REPLICATION_SLOT. Imagine if the postgres server doesn't support
READ_REPLICATION_SLOT (a lower version) but for some reasons the
pg_receivewal(running separately) is upgraded to the version that uses
READ_REPLICATION_SLOT, knowing that the server doesn't support
READ_REPLICATION_SLOT why should user let pg_receivewal run an
unnecessary code?

If I read the patch correctly the situation above is warned by the
following message then continue to the next step giving up to identify
start position from slot data.

"server does not suport fetching the slot's position, resuming from the
current server position instead">

(The message looks a bit too long, though..)

However, if the operator doesn't know the server is old, pg_receivewal
starts streaming from unexpected position, which is a kind of
disaster. So I'm inclined to agree to Bharath, but rather I imagine of
an option to explicitly specify how to determine the start position.

--start-source=[server,wal,slot] specify starting-LSN source, default is

trying all of them in the order of wal, slot, server.

I don't think the option doesn't need to accept multiple values at once.

If --start-source = 'wal' fails, then pg_receivewal should show up an
error saying "cannot find start position from <<user-specified-wal>>
directory, try with "server" or "slot" for --start-source". We might
end having similar errors for other options as well. Isn't this going
to create unnecessary complexity?

The existing way the pg_receivewal fetches start pos i.e. first from
wal directory and then from server start position, isn't known to the
user at all, no verbose message or anything specified in the docs. Why
do we need to expose this with the --start-source option? IMO, we can
keep it that way and we can just have a way to turn off the new
behaviour that we are proposing here, i.e.fetching the start position
from the slot's restart_lsn.

Then it should probably be documented. We write in the docs that it is
strongly recommended to use a replication slot, but do not mention how we
resume from have been already processed.

If someone really cares about having control over how the start position is
defined instead of relying on the auto detection, it would be wiser to add a --
startpos parameter similar to the endpos one, which would override everything
else, instead of different flags for different behaviours.

Regards,

--
Ronan Dunklau

#19Michael Paquier
michael@paquier.xyz
In reply to: Kyotaro Horiguchi (#16)
1 attachment(s)
Re: pg_receivewal starting position

On Thu, Sep 02, 2021 at 02:45:54PM +0900, Kyotaro Horiguchi wrote:

At Wed, 1 Sep 2021 10:30:05 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in
If I read the patch correctly the situation above is warned by the
following message then continue to the next step giving up to identify
start position from slot data.

Better to fallback to the past behavior if attempting to use a
pg_receivewal >= 15 with a PG instance older than 14.

"server does not suport fetching the slot's position, resuming from the current server position instead"

(The message looks a bit too long, though..)

Agreed. Falling back to a warning is not the best answer we can have
here, as there could be various failure types, and for some of them a
hard failure is adapted;
- Failure in the backend while running READ_REPLICATION_SLOT. This
should imply a hard failure, no?
- Slot that does not exist. In this case, we could fall back to the
current write position of the server.

by default if the slot information cannot be retrieved.
Something that's disturbing me in patch 0002 is that we would ignore
the results of GetSlotInformation() if any error happens, even if
there is a problem in the backend, like an OOM. We should be careful
about the semantics here.

However, if the operator doesn't know the server is old, pg_receivewal
starts streaming from unexpected position, which is a kind of
disaster. So I'm inclined to agree to Bharath, but rather I imagine of
an option to explicitly specify how to determine the start position.

--start-source=[server,wal,slot] specify starting-LSN source, default is
trying all of them in the order of wal, slot, server.

I don't think the option doesn't need to accept multiple values at once.

What is the difference between "wal" and "server"? "wal" stands for
the start position of the set of files stored in the location
directory, and "server" is the location that we'd receive from the
server? I don't think that we need that because, when using a slot,
we know that we can rely on the LSN that the slot retains for
pg_receivewal as that should be the same point as what has been
streamed last. Could there be an argument instead for changing the
default and rely on the slot information rather than scanning the
local WAL archive path for the start point when using --slot? When
using pg_receivewal as a service, relying on a scan of the WAL archive
directory if there is no slot and fallback to an invalid LSN if there
is nothing is fine by me, but I think that just relying on the slot
information is saner as the backend makes sure that nothing is
missing. That's also more useful when streaming changes from a single
slot from multiple locations (stream to location 1 with a slot, stop
pg_receivewal, stream to location 2 that completes 1 with the same
slot).

+        pg_log_error("Slot \"%s\" is not a physical replication slot",
+                     replication_slot);
In 0003, the format of this error is not really project-like.
Something like that perhaps would be more adapted:
"cannot use the slot provided, physical slot expected."

I am not really convinced about the need of getting the active state
and the PID used in the backend when fetcing the slot data,
particularly if that's just for some frontend-side checks. The
backend has safeguards already for all that.

While looking at that, I have applied de1d4fe to add
PostgresNode::command_fails_like(), coming from 0003, and put my hands
on 0001 as per the attached, as the starting point. That basically
comes down to all the points raised upthread, plus some tweaks for
things I bumped into to get the semantics of the command to what looks
like the right shape.
--
Michael

Attachments:

v4-0001-Add-READ_REPLICATION_SLOT-command.patchtext/plain; charset=us-asciiDownload
From adbd72f70ee3592965f2a52500820d1387dcbf85 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Thu, 2 Sep 2021 16:25:25 +0900
Subject: [PATCH v4] Add READ_REPLICATION_SLOT command

---
 src/include/nodes/nodes.h                   |   1 +
 src/include/nodes/replnodes.h               |  10 ++
 src/backend/replication/repl_gram.y         |  16 ++-
 src/backend/replication/repl_scanner.l      |   1 +
 src/backend/replication/walsender.c         | 112 ++++++++++++++++++++
 src/test/recovery/t/001_stream_rep.pl       |  47 +++++++-
 src/test/recovery/t/006_logical_decoding.pl |  15 ++-
 doc/src/sgml/protocol.sgml                  |  66 ++++++++++++
 src/tools/pgindent/typedefs.list            |   1 +
 9 files changed, 266 insertions(+), 3 deletions(-)

diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 6a4d82f0a8..5f78bdd573 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -495,6 +495,7 @@ typedef enum NodeTag
 	 * TAGS FOR REPLICATION GRAMMAR PARSE NODES (replnodes.h)
 	 */
 	T_IdentifySystemCmd,
+	T_ReadReplicationSlotCmd,
 	T_BaseBackupCmd,
 	T_CreateReplicationSlotCmd,
 	T_DropReplicationSlotCmd,
diff --git a/src/include/nodes/replnodes.h b/src/include/nodes/replnodes.h
index faa3a251f2..46384ea074 100644
--- a/src/include/nodes/replnodes.h
+++ b/src/include/nodes/replnodes.h
@@ -33,6 +33,16 @@ typedef struct IdentifySystemCmd
 	NodeTag		type;
 } IdentifySystemCmd;
 
+/* ----------------------
+ *		READ_REPLICATION_SLOT command
+ * ----------------------
+ */
+typedef struct ReadReplicationSlotCmd
+{
+	NodeTag		type;
+	char	   *slotname;
+} ReadReplicationSlotCmd;
+
 
 /* ----------------------
  *		BASE_BACKUP command
diff --git a/src/backend/replication/repl_gram.y b/src/backend/replication/repl_gram.y
index e1e8ec29cc..d34d617045 100644
--- a/src/backend/replication/repl_gram.y
+++ b/src/backend/replication/repl_gram.y
@@ -64,6 +64,7 @@ static SQLCmd *make_sqlcmd(void);
 /* Keyword tokens. */
 %token K_BASE_BACKUP
 %token K_IDENTIFY_SYSTEM
+%token K_READ_REPLICATION_SLOT
 %token K_SHOW
 %token K_START_REPLICATION
 %token K_CREATE_REPLICATION_SLOT
@@ -94,7 +95,7 @@ static SQLCmd *make_sqlcmd(void);
 %type <node>	command
 %type <node>	base_backup start_replication start_logical_replication
 				create_replication_slot drop_replication_slot identify_system
-				timeline_history show sql_cmd
+				read_replication_slot timeline_history show sql_cmd
 %type <list>	base_backup_opt_list
 %type <defelt>	base_backup_opt
 %type <uintval>	opt_timeline
@@ -120,6 +121,7 @@ opt_semicolon:	';'
 
 command:
 			identify_system
+			| read_replication_slot
 			| base_backup
 			| start_replication
 			| start_logical_replication
@@ -140,6 +142,18 @@ identify_system:
 				}
 			;
 
+/*
+ * READ_REPLICATION_SLOT %s
+ */
+read_replication_slot:
+			K_READ_REPLICATION_SLOT var_name
+				{
+					ReadReplicationSlotCmd *n = makeNode(ReadReplicationSlotCmd);
+					n->slotname = $2;
+					$$ = (Node *) n;
+				}
+			;
+
 /*
  * SHOW setting
  */
diff --git a/src/backend/replication/repl_scanner.l b/src/backend/replication/repl_scanner.l
index c038a636c3..1b599c255e 100644
--- a/src/backend/replication/repl_scanner.l
+++ b/src/backend/replication/repl_scanner.l
@@ -85,6 +85,7 @@ identifier		{ident_start}{ident_cont}*
 BASE_BACKUP			{ return K_BASE_BACKUP; }
 FAST			{ return K_FAST; }
 IDENTIFY_SYSTEM		{ return K_IDENTIFY_SYSTEM; }
+READ_REPLICATION_SLOT	{ return K_READ_REPLICATION_SLOT; }
 SHOW		{ return K_SHOW; }
 LABEL			{ return K_LABEL; }
 NOWAIT			{ return K_NOWAIT; }
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 3ca2a11389..afcc5f6612 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -232,6 +232,7 @@ static void XLogSendLogical(void);
 static void WalSndDone(WalSndSendDataCallback send_data);
 static XLogRecPtr GetStandbyFlushRecPtr(void);
 static void IdentifySystem(void);
+static void ReadReplicationSlot(ReadReplicationSlotCmd *cmd);
 static void CreateReplicationSlot(CreateReplicationSlotCmd *cmd);
 static void DropReplicationSlot(DropReplicationSlotCmd *cmd);
 static void StartReplication(StartReplicationCmd *cmd);
@@ -457,6 +458,110 @@ IdentifySystem(void)
 	end_tup_output(tstate);
 }
 
+/* Handle READ_REPLICATION_SLOT command */
+static void
+ReadReplicationSlot(ReadReplicationSlotCmd *cmd)
+{
+#define READ_REPLICATION_SLOT_COLS 5
+	ReplicationSlot *slot;
+	DestReceiver *dest;
+	TupOutputState *tstate;
+	TupleDesc	tupdesc;
+	Datum		values[READ_REPLICATION_SLOT_COLS];
+	bool		nulls[READ_REPLICATION_SLOT_COLS];
+
+	tupdesc = CreateTemplateTupleDesc(READ_REPLICATION_SLOT_COLS);
+	TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 1, "slot_type",
+							  TEXTOID, -1, 0);
+	TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 2, "restart_lsn",
+							  TEXTOID, -1, 0);
+	TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 3, "confirmed_flush_lsn",
+							  TEXTOID, -1, 0);
+	TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 4, "restart_tli",
+							  INT4OID, -1, 0);
+	TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 5, "confirmed_flush_tli",
+							  INT4OID, -1, 0);
+
+	MemSet(nulls, true, READ_REPLICATION_SLOT_COLS * sizeof(bool));
+
+	LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
+	slot = SearchNamedReplicationSlot(cmd->slotname, false);
+	if (slot == NULL || !slot->in_use)
+	{
+		LWLockRelease(ReplicationSlotControlLock);
+	}
+	else
+	{
+		List	   *timeline_history = NIL;
+		ReplicationSlot slot_contents;
+		int			i = 0;
+		char		xloc[MAXFNAMELEN];
+		TimeLineID	slots_position_timeline;
+
+		/* Copy slot contents while holding spinlock */
+		SpinLockAcquire(&slot->mutex);
+		slot_contents = *slot;
+		SpinLockRelease(&slot->mutex);
+		LWLockRelease(ReplicationSlotControlLock);
+
+		if (OidIsValid(slot_contents.data.database))
+			values[i] = CStringGetTextDatum("logical");
+		else
+			values[i] = CStringGetTextDatum("physical");
+		nulls[i] = false;
+		i++;
+
+		if (!XLogRecPtrIsInvalid(slot_contents.data.restart_lsn))
+		{
+			snprintf(xloc, sizeof(xloc), "%X/%X",
+					 LSN_FORMAT_ARGS(slot_contents.data.restart_lsn));
+			values[i] = CStringGetTextDatum(xloc);
+			nulls[i] = false;
+		}
+		i++;
+
+		if (!XLogRecPtrIsInvalid(slot_contents.data.confirmed_flush))
+		{
+			snprintf(xloc, sizeof(xloc), "%X/%X",
+					 LSN_FORMAT_ARGS(slot_contents.data.confirmed_flush));
+			values[i] = CStringGetTextDatum(xloc);
+			nulls[i] = false;
+		}
+		i++;
+
+		/*
+		 * Now get the timeline this wal was produced on, to get to the
+		 * current timeline
+		 */
+		if (!XLogRecPtrIsInvalid(slot_contents.data.restart_lsn))
+		{
+			timeline_history = readTimeLineHistory(ThisTimeLineID);
+			slots_position_timeline = tliOfPointInHistory(slot_contents.data.restart_lsn,
+														  timeline_history);
+			values[i] = Int32GetDatum(slots_position_timeline);
+			nulls[i] = false;
+		}
+		i++;
+
+		if (!XLogRecPtrIsInvalid(slot_contents.data.confirmed_flush))
+		{
+			if (!timeline_history)
+				timeline_history = readTimeLineHistory(ThisTimeLineID);
+			slots_position_timeline = tliOfPointInHistory(slot_contents.data.confirmed_flush,
+														  timeline_history);
+			values[i] = Int32GetDatum(slots_position_timeline);
+			nulls[i] = false;
+		}
+		i++;
+		Assert(i == READ_REPLICATION_SLOT_COLS);
+	}
+
+	dest = CreateDestReceiver(DestRemoteSimple);
+	tstate = begin_tup_output_tupdesc(dest, tupdesc, &TTSOpsVirtual);
+	do_tup_output(tstate, values, nulls);
+	end_tup_output(tstate);
+}
+
 
 /*
  * Handle TIMELINE_HISTORY command.
@@ -1618,6 +1723,13 @@ exec_replication_command(const char *cmd_string)
 			EndReplicationCommand(cmdtag);
 			break;
 
+		case T_ReadReplicationSlotCmd:
+			cmdtag = "READ_REPLICATION_SLOT";
+			set_ps_display(cmdtag);
+			ReadReplicationSlot((ReadReplicationSlotCmd *) cmd_node);
+			EndReplicationCommand(cmdtag);
+			break;
+
 		case T_BaseBackupCmd:
 			cmdtag = "BASE_BACKUP";
 			set_ps_display(cmdtag);
diff --git a/src/test/recovery/t/001_stream_rep.pl b/src/test/recovery/t/001_stream_rep.pl
index ac581c1c07..cbd817cda3 100644
--- a/src/test/recovery/t/001_stream_rep.pl
+++ b/src/test/recovery/t/001_stream_rep.pl
@@ -6,7 +6,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 49;
+use Test::More tests => 55;
 
 # Initialize primary node
 my $node_primary = PostgresNode->new('primary');
@@ -252,6 +252,51 @@ ok( $ret == 0,
 	"SHOW with superuser-settable parameter, replication role and logical replication"
 );
 
+note "testing READ_REPLICATION_SLOT command";
+
+my $slotname = 'test_read_replication_slot_physical';
+
+($ret, $stdout, $stderr) = $node_primary->psql(
+	'postgres',
+	'READ_REPLICATION_SLOT non_existent_slot;',
+	extra_params => [ '-d', $connstr_rep ]);
+ok( $ret == 0,
+	"READ_REPLICATION_SLOT does not produce an error with non existent slot");
+ok($stdout eq '||||',
+	"READ_REPLICATION_SLOT returns NULL values if slot does not exist");
+
+($ret, $stdout, $stderr) = $node_primary->psql(
+	'postgres',
+	"CREATE_REPLICATION_SLOT $slotname PHYSICAL RESERVE_WAL;",
+	extra_params => [ '-d', $connstr_rep ],
+	0,
+	'physical slot created on primary');
+
+($ret, $stdout, $stderr) = $node_primary->psql(
+	'postgres',
+	"READ_REPLICATION_SLOT $slotname;",
+	extra_params => [ '-d', $connstr_rep ]);
+ok($ret == 0,
+	"READ_REPLICATION_SLOT does not produce an error with existing slot");
+ok($stdout =~ 'physical\|[^|]*\|\|1\|',
+	"READ_REPLICATION_SLOT returns tuple corresponding to the slot");
+
+$node_primary->psql(
+	'postgres',
+	"DROP_REPLICATION_SLOT $slotname;",
+	extra_params => [ '-d', $connstr_rep ],
+	0,
+	'physical slot dropped on primary');
+
+($ret, $stdout, $stderr) = $node_primary->psql(
+	'postgres',
+	"READ_REPLICATION_SLOT $slotname;",
+	extra_params => [ '-d', $connstr_rep ]);
+ok($ret == 0,
+	"READ_REPLICATION_SLOT does not produce an error with dropped slot");
+ok($stdout eq '||||',
+	"READ_REPLICATION_SLOT returns NULL values if slot has been dropped");
+
 note "switching to physical replication slot";
 
 # Switch to using a physical replication slot. We can do this without a new
diff --git a/src/test/recovery/t/006_logical_decoding.pl b/src/test/recovery/t/006_logical_decoding.pl
index cc116062c2..50e4fa1042 100644
--- a/src/test/recovery/t/006_logical_decoding.pl
+++ b/src/test/recovery/t/006_logical_decoding.pl
@@ -10,7 +10,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 14;
+use Test::More tests => 16;
 use Config;
 
 # Initialize primary node
@@ -39,6 +39,19 @@ ok( $stderr =~
 	  m/replication slot "test_slot" was not created in this database/,
 	"Logical decoding correctly fails to start");
 
+($result, $stdout, $stderr) = $node_primary->psql(
+	'template1',
+	qq[READ_REPLICATION_SLOT test_slot;],
+	replication => 'database');
+ok($stdout =~ 'logical\|[^|]*\|[^|]*\|1\|1',
+	'Logical replication slot can be read on any logical connection');
+($result, $stdout, $stderr) = $node_primary->psql(
+	'postgres',
+	qq[READ_REPLICATION_SLOT test_slot;],
+	replication => '1');
+ok($stdout =~ 'logical\|[^|]*\|[^|]*\|1\|1',
+	'Logical replication slot can be read on a physical connection');
+
 # Check case of walsender not using a database connection.  Logical
 # decoding should not be allowed.
 ($result, $stdout, $stderr) = $node_primary->psql(
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index a232546b1d..8191f17137 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2052,6 +2052,72 @@ The commands accepted in replication mode are:
     </listitem>
   </varlistentry>
 
+  <varlistentry>
+    <term><literal>READ_REPLICATION_SLOT</literal> <replaceable class="parameter">slot_name</replaceable>
+      <indexterm><primary>READ_REPLICATION_SLOT</primary></indexterm>
+    </term>
+    <listitem>
+     <para>
+      Read the information of a replication slot. Returns a tuple with
+      <literal>NULL</literal> values if the replication slot does not
+      exist.
+     </para>
+     <para>
+      In response to this command, the server will return a one-row result set,
+      containing the following fields:
+      <variablelist>
+       <varlistentry>
+        <term><literal>slot_type</literal> (<type>text</type>)</term>
+        <listitem>
+         <para>
+          The replication slot's type, either <literal>physical</literal> or
+          <literal>logical</literal>.
+         </para>
+        </listitem>
+       </varlistentry>
+
+       <varlistentry>
+        <term><literal>restart_lsn</literal> (<type>text</type>)</term>
+        <listitem>
+         <para>
+          The replication slot's <literal>restart_lsn</literal>.
+         </para>
+        </listitem>
+       </varlistentry>
+
+       <varlistentry>
+        <term><literal>confirmed_flush_lsn</literal> (<type>text</type>)</term>
+        <listitem>
+         <para>
+          The replication slot's <literal>confirmed_flush_lsn</literal>.
+         </para>
+        </listitem>
+       </varlistentry>
+
+       <varlistentry>
+        <term><literal>restart_tli</literal> (<type>int4</type>)</term>
+        <listitem>
+         <para>
+          The timeline ID for the <literal>restart_lsn</literal> position,
+          when following the current timeline history.
+         </para>
+        </listitem>
+       </varlistentry>
+
+       <varlistentry>
+        <term><literal>confirmed_flush_tli</literal> (<type>int4</type>)</term>
+        <listitem>
+         <para>
+          The timeline ID for the <literal>confirmed_flush_lsn</literal>
+          position, when following the current timeline history.
+         </para>
+        </listitem>
+       </varlistentry>
+      </variablelist>
+     </para>
+    </listitem>
+  </varlistentry>
+
   <varlistentry>
     <term><literal>START_REPLICATION</literal> [ <literal>SLOT</literal> <replaceable class="parameter">slot_name</replaceable> ] [ <literal>PHYSICAL</literal> ] <replaceable class="parameter">XXX/XXX</replaceable> [ <literal>TIMELINE</literal> <replaceable class="parameter">tli</replaceable> ]
      <indexterm><primary>START_REPLICATION</primary></indexterm>
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index f31a1e4e1e..b256ee3be6 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2126,6 +2126,7 @@ ReadBufferMode
 ReadBytePtrType
 ReadExtraTocPtrType
 ReadFunc
+ReadReplicationSlotCmd
 ReassignOwnedStmt
 RecheckForeignScan_function
 RecordCacheEntry
-- 
2.33.0

#20Ronan Dunklau
ronan.dunklau@aiven.io
In reply to: Michael Paquier (#19)
Re: pg_receivewal starting position

Le jeudi 2 septembre 2021, 09:28:29 CEST Michael Paquier a écrit :

On Thu, Sep 02, 2021 at 02:45:54PM +0900, Kyotaro Horiguchi wrote:

At Wed, 1 Sep 2021 10:30:05 +0530, Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote in If I read the patch
correctly the situation above is warned by the following message then
continue to the next step giving up to identify start position from slot
data.

Better to fallback to the past behavior if attempting to use a
pg_receivewal >= 15 with a PG instance older than 14.

"server does not suport fetching the slot's position, resuming from the
current server position instead">

(The message looks a bit too long, though..)

Agreed. Falling back to a warning is not the best answer we can have
here, as there could be various failure types, and for some of them a
hard failure is adapted;
- Failure in the backend while running READ_REPLICATION_SLOT. This
should imply a hard failure, no?
- Slot that does not exist. In this case, we could fall back to the
current write position of the server.

by default if the slot information cannot be retrieved.
Something that's disturbing me in patch 0002 is that we would ignore
the results of GetSlotInformation() if any error happens, even if
there is a problem in the backend, like an OOM. We should be careful
about the semantics here.

Ok !

However, if the operator doesn't know the server is old, pg_receivewal
starts streaming from unexpected position, which is a kind of
disaster. So I'm inclined to agree to Bharath, but rather I imagine of
an option to explicitly specify how to determine the start position.

--start-source=[server,wal,slot] specify starting-LSN source, default is

trying all of them in the order of wal, slot, server.

I don't think the option doesn't need to accept multiple values at once.

What is the difference between "wal" and "server"? "wal" stands for
the start position of the set of files stored in the location
directory, and "server" is the location that we'd receive from the
server? I don't think that we need that because, when using a slot,
we know that we can rely on the LSN that the slot retains for
pg_receivewal as that should be the same point as what has been
streamed last. Could there be an argument instead for changing the
default and rely on the slot information rather than scanning the
local WAL archive path for the start point when using --slot? When
using pg_receivewal as a service, relying on a scan of the WAL archive
directory if there is no slot and fallback to an invalid LSN if there
is nothing is fine by me, but I think that just relying on the slot
information is saner as the backend makes sure that nothing is
missing. That's also more useful when streaming changes from a single
slot from multiple locations (stream to location 1 with a slot, stop
pg_receivewal, stream to location 2 that completes 1 with the same
slot).

One benefit I see from first trying to get it from the local WAL stream is that
we may end up in a state where it has been flushed to disk but we couldn't
advance the replication slot. In that case it is better to resume from the
point on disk. Maybe taking the max(slot_lsn, local_file_lsn) would work best
for the use case you're describing.

+        pg_log_error("Slot \"%s\" is not a physical replication slot",
+                     replication_slot);
In 0003, the format of this error is not really project-like.
Something like that perhaps would be more adapted:
"cannot use the slot provided, physical slot expected."

I am not really convinced about the need of getting the active state
and the PID used in the backend when fetcing the slot data,
particularly if that's just for some frontend-side checks. The
backend has safeguards already for all that.

I could see the use for sending active_pid for use within pg_basebackup: at
least we could fail early if the slot is already in use. But at the same time,
maybe it won't be in use anymore once we need it.

While looking at that, I have applied de1d4fe to add
PostgresNode::command_fails_like(), coming from 0003, and put my hands
on 0001 as per the attached, as the starting point. That basically
comes down to all the points raised upthread, plus some tweaks for
things I bumped into to get the semantics of the command to what looks
like the right shape.

Thanks, I was about to send a new patchset with basically the same thing. It
would be nice to know we work on the same thing concurrently in the future to
avoid duplicate efforts. I'll rebase and send the updated version for patches
0002 and 0003 of my original proposal once we reach an agreement over the
behaviour / options of pg_receivewal.

Also considering the number of different fields to be filled by the
GetSlotInformation function, my local branch groups them into a dedicated
struct which is more convenient than having X possibly null arguments.

--
Ronan Dunklau

#21Michael Paquier
michael@paquier.xyz
In reply to: Ronan Dunklau (#20)
Re: pg_receivewal starting position

On Thu, Sep 02, 2021 at 10:08:26AM +0200, Ronan Dunklau wrote:

I could see the use for sending active_pid for use within pg_basebackup: at
least we could fail early if the slot is already in use. But at the same time,
maybe it won't be in use anymore once we need it.

There is no real concurrent protection with this design. You could
read that the slot is not active during READ_REPLICATION_SLOT just to
find out after in the process of pg_basebackup streaming WAL that it
became in use in-between. And the backend-side protections would kick
in at this stage.

Hmm. The logic doing the decision-making with pg_receivewal may
become more tricky when it comes to pg_replication_slots.wal_status,
max_slot_wal_keep_size and pg_replication_slots.safe_wal_size. The
number of cases we'd like to consider impacts directly the amount of
data send through READ_REPLICATION_SLOT. That's not really different
than deciding of a failure, a success or a retry with active_pid at an
earlier or a later stage of a base backup. pg_receivewal, on the
contrary, can just rely on what the backend tells when it begins
streaming. So I'd prefer keeping things simple and limit the number
of fields a minimum for this command.
--
Michael

#22Ronan Dunklau
ronan.dunklau@aiven.io
In reply to: Michael Paquier (#21)
3 attachment(s)
Re: pg_receivewal starting position

Le jeudi 2 septembre 2021, 10:37:20 CEST Michael Paquier a écrit :

On Thu, Sep 02, 2021 at 10:08:26AM +0200, Ronan Dunklau wrote:

I could see the use for sending active_pid for use within pg_basebackup:
at
least we could fail early if the slot is already in use. But at the same
time, maybe it won't be in use anymore once we need it.

There is no real concurrent protection with this design. You could
read that the slot is not active during READ_REPLICATION_SLOT just to
find out after in the process of pg_basebackup streaming WAL that it
became in use in-between. And the backend-side protections would kick
in at this stage.

Hmm. The logic doing the decision-making with pg_receivewal may
become more tricky when it comes to pg_replication_slots.wal_status,
max_slot_wal_keep_size and pg_replication_slots.safe_wal_size. The
number of cases we'd like to consider impacts directly the amount of
data send through READ_REPLICATION_SLOT. That's not really different
than deciding of a failure, a success or a retry with active_pid at an
earlier or a later stage of a base backup. pg_receivewal, on the
contrary, can just rely on what the backend tells when it begins
streaming. So I'd prefer keeping things simple and limit the number
of fields a minimum for this command.

Ok. Please find attached new patches rebased on master.*

0001 is yours without any modification.

0002 for pg_receivewal tried to simplify the logic of information to return,
by using a dedicated struct for this. This accounts for Bahrath's demands to
return every possible field.
In particular, an is_logical field simplifies the detection of the type of slot.
In my opinion it makes sense to simplify it like this on the client side while
being more open-minded on the server side if we ever need to provide a new
type of slot. Also, GetSlotInformation now returns an enum to be able to
handle the different modes of failures, which differ between pg_receivewal and
pg_basebackup.

0003 is the pg_basebackup one, not much changed except for the concerns you
had about the log message and handling of different failure modes.

There is still the concern raised by Bharath about being able to select the
way to fetch the replication slot information for the user, and what should or
should not be included in the documentation. I am in favor of documenting the
process of selecting the wal start, and maybe include a --start-lsn option for
the user to override it, but that's maybe for another patch.

--
Ronan Dunklau

Attachments:

v5-0001-Add-READ_REPLICATION_SLOT-command.patchtext/x-patch; charset=UTF-8; name=v5-0001-Add-READ_REPLICATION_SLOT-command.patchDownload
From 60b8cedb196f5acdd75b489c1d2155c2698084a4 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Thu, 2 Sep 2021 16:25:25 +0900
Subject: [PATCH v5 1/3] Add READ_REPLICATION_SLOT command

---
 doc/src/sgml/protocol.sgml                  |  66 ++++++++++++
 src/backend/replication/repl_gram.y         |  16 ++-
 src/backend/replication/repl_scanner.l      |   1 +
 src/backend/replication/walsender.c         | 112 ++++++++++++++++++++
 src/include/nodes/nodes.h                   |   1 +
 src/include/nodes/replnodes.h               |  10 ++
 src/test/recovery/t/001_stream_rep.pl       |  47 +++++++-
 src/test/recovery/t/006_logical_decoding.pl |  15 ++-
 src/tools/pgindent/typedefs.list            |   1 +
 9 files changed, 266 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index a232546b1d..8191f17137 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2052,6 +2052,72 @@ The commands accepted in replication mode are:
     </listitem>
   </varlistentry>
 
+  <varlistentry>
+    <term><literal>READ_REPLICATION_SLOT</literal> <replaceable class="parameter">slot_name</replaceable>
+      <indexterm><primary>READ_REPLICATION_SLOT</primary></indexterm>
+    </term>
+    <listitem>
+     <para>
+      Read the information of a replication slot. Returns a tuple with
+      <literal>NULL</literal> values if the replication slot does not
+      exist.
+     </para>
+     <para>
+      In response to this command, the server will return a one-row result set,
+      containing the following fields:
+      <variablelist>
+       <varlistentry>
+        <term><literal>slot_type</literal> (<type>text</type>)</term>
+        <listitem>
+         <para>
+          The replication slot's type, either <literal>physical</literal> or
+          <literal>logical</literal>.
+         </para>
+        </listitem>
+       </varlistentry>
+
+       <varlistentry>
+        <term><literal>restart_lsn</literal> (<type>text</type>)</term>
+        <listitem>
+         <para>
+          The replication slot's <literal>restart_lsn</literal>.
+         </para>
+        </listitem>
+       </varlistentry>
+
+       <varlistentry>
+        <term><literal>confirmed_flush_lsn</literal> (<type>text</type>)</term>
+        <listitem>
+         <para>
+          The replication slot's <literal>confirmed_flush_lsn</literal>.
+         </para>
+        </listitem>
+       </varlistentry>
+
+       <varlistentry>
+        <term><literal>restart_tli</literal> (<type>int4</type>)</term>
+        <listitem>
+         <para>
+          The timeline ID for the <literal>restart_lsn</literal> position,
+          when following the current timeline history.
+         </para>
+        </listitem>
+       </varlistentry>
+
+       <varlistentry>
+        <term><literal>confirmed_flush_tli</literal> (<type>int4</type>)</term>
+        <listitem>
+         <para>
+          The timeline ID for the <literal>confirmed_flush_lsn</literal>
+          position, when following the current timeline history.
+         </para>
+        </listitem>
+       </varlistentry>
+      </variablelist>
+     </para>
+    </listitem>
+  </varlistentry>
+
   <varlistentry>
     <term><literal>START_REPLICATION</literal> [ <literal>SLOT</literal> <replaceable class="parameter">slot_name</replaceable> ] [ <literal>PHYSICAL</literal> ] <replaceable class="parameter">XXX/XXX</replaceable> [ <literal>TIMELINE</literal> <replaceable class="parameter">tli</replaceable> ]
      <indexterm><primary>START_REPLICATION</primary></indexterm>
diff --git a/src/backend/replication/repl_gram.y b/src/backend/replication/repl_gram.y
index e1e8ec29cc..d34d617045 100644
--- a/src/backend/replication/repl_gram.y
+++ b/src/backend/replication/repl_gram.y
@@ -64,6 +64,7 @@ static SQLCmd *make_sqlcmd(void);
 /* Keyword tokens. */
 %token K_BASE_BACKUP
 %token K_IDENTIFY_SYSTEM
+%token K_READ_REPLICATION_SLOT
 %token K_SHOW
 %token K_START_REPLICATION
 %token K_CREATE_REPLICATION_SLOT
@@ -94,7 +95,7 @@ static SQLCmd *make_sqlcmd(void);
 %type <node>	command
 %type <node>	base_backup start_replication start_logical_replication
 				create_replication_slot drop_replication_slot identify_system
-				timeline_history show sql_cmd
+				read_replication_slot timeline_history show sql_cmd
 %type <list>	base_backup_opt_list
 %type <defelt>	base_backup_opt
 %type <uintval>	opt_timeline
@@ -120,6 +121,7 @@ opt_semicolon:	';'
 
 command:
 			identify_system
+			| read_replication_slot
 			| base_backup
 			| start_replication
 			| start_logical_replication
@@ -140,6 +142,18 @@ identify_system:
 				}
 			;
 
+/*
+ * READ_REPLICATION_SLOT %s
+ */
+read_replication_slot:
+			K_READ_REPLICATION_SLOT var_name
+				{
+					ReadReplicationSlotCmd *n = makeNode(ReadReplicationSlotCmd);
+					n->slotname = $2;
+					$$ = (Node *) n;
+				}
+			;
+
 /*
  * SHOW setting
  */
diff --git a/src/backend/replication/repl_scanner.l b/src/backend/replication/repl_scanner.l
index c038a636c3..1b599c255e 100644
--- a/src/backend/replication/repl_scanner.l
+++ b/src/backend/replication/repl_scanner.l
@@ -85,6 +85,7 @@ identifier		{ident_start}{ident_cont}*
 BASE_BACKUP			{ return K_BASE_BACKUP; }
 FAST			{ return K_FAST; }
 IDENTIFY_SYSTEM		{ return K_IDENTIFY_SYSTEM; }
+READ_REPLICATION_SLOT	{ return K_READ_REPLICATION_SLOT; }
 SHOW		{ return K_SHOW; }
 LABEL			{ return K_LABEL; }
 NOWAIT			{ return K_NOWAIT; }
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 3ca2a11389..afcc5f6612 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -232,6 +232,7 @@ static void XLogSendLogical(void);
 static void WalSndDone(WalSndSendDataCallback send_data);
 static XLogRecPtr GetStandbyFlushRecPtr(void);
 static void IdentifySystem(void);
+static void ReadReplicationSlot(ReadReplicationSlotCmd *cmd);
 static void CreateReplicationSlot(CreateReplicationSlotCmd *cmd);
 static void DropReplicationSlot(DropReplicationSlotCmd *cmd);
 static void StartReplication(StartReplicationCmd *cmd);
@@ -457,6 +458,110 @@ IdentifySystem(void)
 	end_tup_output(tstate);
 }
 
+/* Handle READ_REPLICATION_SLOT command */
+static void
+ReadReplicationSlot(ReadReplicationSlotCmd *cmd)
+{
+#define READ_REPLICATION_SLOT_COLS 5
+	ReplicationSlot *slot;
+	DestReceiver *dest;
+	TupOutputState *tstate;
+	TupleDesc	tupdesc;
+	Datum		values[READ_REPLICATION_SLOT_COLS];
+	bool		nulls[READ_REPLICATION_SLOT_COLS];
+
+	tupdesc = CreateTemplateTupleDesc(READ_REPLICATION_SLOT_COLS);
+	TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 1, "slot_type",
+							  TEXTOID, -1, 0);
+	TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 2, "restart_lsn",
+							  TEXTOID, -1, 0);
+	TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 3, "confirmed_flush_lsn",
+							  TEXTOID, -1, 0);
+	TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 4, "restart_tli",
+							  INT4OID, -1, 0);
+	TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 5, "confirmed_flush_tli",
+							  INT4OID, -1, 0);
+
+	MemSet(nulls, true, READ_REPLICATION_SLOT_COLS * sizeof(bool));
+
+	LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
+	slot = SearchNamedReplicationSlot(cmd->slotname, false);
+	if (slot == NULL || !slot->in_use)
+	{
+		LWLockRelease(ReplicationSlotControlLock);
+	}
+	else
+	{
+		List	   *timeline_history = NIL;
+		ReplicationSlot slot_contents;
+		int			i = 0;
+		char		xloc[MAXFNAMELEN];
+		TimeLineID	slots_position_timeline;
+
+		/* Copy slot contents while holding spinlock */
+		SpinLockAcquire(&slot->mutex);
+		slot_contents = *slot;
+		SpinLockRelease(&slot->mutex);
+		LWLockRelease(ReplicationSlotControlLock);
+
+		if (OidIsValid(slot_contents.data.database))
+			values[i] = CStringGetTextDatum("logical");
+		else
+			values[i] = CStringGetTextDatum("physical");
+		nulls[i] = false;
+		i++;
+
+		if (!XLogRecPtrIsInvalid(slot_contents.data.restart_lsn))
+		{
+			snprintf(xloc, sizeof(xloc), "%X/%X",
+					 LSN_FORMAT_ARGS(slot_contents.data.restart_lsn));
+			values[i] = CStringGetTextDatum(xloc);
+			nulls[i] = false;
+		}
+		i++;
+
+		if (!XLogRecPtrIsInvalid(slot_contents.data.confirmed_flush))
+		{
+			snprintf(xloc, sizeof(xloc), "%X/%X",
+					 LSN_FORMAT_ARGS(slot_contents.data.confirmed_flush));
+			values[i] = CStringGetTextDatum(xloc);
+			nulls[i] = false;
+		}
+		i++;
+
+		/*
+		 * Now get the timeline this wal was produced on, to get to the
+		 * current timeline
+		 */
+		if (!XLogRecPtrIsInvalid(slot_contents.data.restart_lsn))
+		{
+			timeline_history = readTimeLineHistory(ThisTimeLineID);
+			slots_position_timeline = tliOfPointInHistory(slot_contents.data.restart_lsn,
+														  timeline_history);
+			values[i] = Int32GetDatum(slots_position_timeline);
+			nulls[i] = false;
+		}
+		i++;
+
+		if (!XLogRecPtrIsInvalid(slot_contents.data.confirmed_flush))
+		{
+			if (!timeline_history)
+				timeline_history = readTimeLineHistory(ThisTimeLineID);
+			slots_position_timeline = tliOfPointInHistory(slot_contents.data.confirmed_flush,
+														  timeline_history);
+			values[i] = Int32GetDatum(slots_position_timeline);
+			nulls[i] = false;
+		}
+		i++;
+		Assert(i == READ_REPLICATION_SLOT_COLS);
+	}
+
+	dest = CreateDestReceiver(DestRemoteSimple);
+	tstate = begin_tup_output_tupdesc(dest, tupdesc, &TTSOpsVirtual);
+	do_tup_output(tstate, values, nulls);
+	end_tup_output(tstate);
+}
+
 
 /*
  * Handle TIMELINE_HISTORY command.
@@ -1618,6 +1723,13 @@ exec_replication_command(const char *cmd_string)
 			EndReplicationCommand(cmdtag);
 			break;
 
+		case T_ReadReplicationSlotCmd:
+			cmdtag = "READ_REPLICATION_SLOT";
+			set_ps_display(cmdtag);
+			ReadReplicationSlot((ReadReplicationSlotCmd *) cmd_node);
+			EndReplicationCommand(cmdtag);
+			break;
+
 		case T_BaseBackupCmd:
 			cmdtag = "BASE_BACKUP";
 			set_ps_display(cmdtag);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 6a4d82f0a8..5f78bdd573 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -495,6 +495,7 @@ typedef enum NodeTag
 	 * TAGS FOR REPLICATION GRAMMAR PARSE NODES (replnodes.h)
 	 */
 	T_IdentifySystemCmd,
+	T_ReadReplicationSlotCmd,
 	T_BaseBackupCmd,
 	T_CreateReplicationSlotCmd,
 	T_DropReplicationSlotCmd,
diff --git a/src/include/nodes/replnodes.h b/src/include/nodes/replnodes.h
index faa3a251f2..46384ea074 100644
--- a/src/include/nodes/replnodes.h
+++ b/src/include/nodes/replnodes.h
@@ -33,6 +33,16 @@ typedef struct IdentifySystemCmd
 	NodeTag		type;
 } IdentifySystemCmd;
 
+/* ----------------------
+ *		READ_REPLICATION_SLOT command
+ * ----------------------
+ */
+typedef struct ReadReplicationSlotCmd
+{
+	NodeTag		type;
+	char	   *slotname;
+} ReadReplicationSlotCmd;
+
 
 /* ----------------------
  *		BASE_BACKUP command
diff --git a/src/test/recovery/t/001_stream_rep.pl b/src/test/recovery/t/001_stream_rep.pl
index ac581c1c07..cbd817cda3 100644
--- a/src/test/recovery/t/001_stream_rep.pl
+++ b/src/test/recovery/t/001_stream_rep.pl
@@ -6,7 +6,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 49;
+use Test::More tests => 55;
 
 # Initialize primary node
 my $node_primary = PostgresNode->new('primary');
@@ -252,6 +252,51 @@ ok( $ret == 0,
 	"SHOW with superuser-settable parameter, replication role and logical replication"
 );
 
+note "testing READ_REPLICATION_SLOT command";
+
+my $slotname = 'test_read_replication_slot_physical';
+
+($ret, $stdout, $stderr) = $node_primary->psql(
+	'postgres',
+	'READ_REPLICATION_SLOT non_existent_slot;',
+	extra_params => [ '-d', $connstr_rep ]);
+ok( $ret == 0,
+	"READ_REPLICATION_SLOT does not produce an error with non existent slot");
+ok($stdout eq '||||',
+	"READ_REPLICATION_SLOT returns NULL values if slot does not exist");
+
+($ret, $stdout, $stderr) = $node_primary->psql(
+	'postgres',
+	"CREATE_REPLICATION_SLOT $slotname PHYSICAL RESERVE_WAL;",
+	extra_params => [ '-d', $connstr_rep ],
+	0,
+	'physical slot created on primary');
+
+($ret, $stdout, $stderr) = $node_primary->psql(
+	'postgres',
+	"READ_REPLICATION_SLOT $slotname;",
+	extra_params => [ '-d', $connstr_rep ]);
+ok($ret == 0,
+	"READ_REPLICATION_SLOT does not produce an error with existing slot");
+ok($stdout =~ 'physical\|[^|]*\|\|1\|',
+	"READ_REPLICATION_SLOT returns tuple corresponding to the slot");
+
+$node_primary->psql(
+	'postgres',
+	"DROP_REPLICATION_SLOT $slotname;",
+	extra_params => [ '-d', $connstr_rep ],
+	0,
+	'physical slot dropped on primary');
+
+($ret, $stdout, $stderr) = $node_primary->psql(
+	'postgres',
+	"READ_REPLICATION_SLOT $slotname;",
+	extra_params => [ '-d', $connstr_rep ]);
+ok($ret == 0,
+	"READ_REPLICATION_SLOT does not produce an error with dropped slot");
+ok($stdout eq '||||',
+	"READ_REPLICATION_SLOT returns NULL values if slot has been dropped");
+
 note "switching to physical replication slot";
 
 # Switch to using a physical replication slot. We can do this without a new
diff --git a/src/test/recovery/t/006_logical_decoding.pl b/src/test/recovery/t/006_logical_decoding.pl
index cc116062c2..50e4fa1042 100644
--- a/src/test/recovery/t/006_logical_decoding.pl
+++ b/src/test/recovery/t/006_logical_decoding.pl
@@ -10,7 +10,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 14;
+use Test::More tests => 16;
 use Config;
 
 # Initialize primary node
@@ -39,6 +39,19 @@ ok( $stderr =~
 	  m/replication slot "test_slot" was not created in this database/,
 	"Logical decoding correctly fails to start");
 
+($result, $stdout, $stderr) = $node_primary->psql(
+	'template1',
+	qq[READ_REPLICATION_SLOT test_slot;],
+	replication => 'database');
+ok($stdout =~ 'logical\|[^|]*\|[^|]*\|1\|1',
+	'Logical replication slot can be read on any logical connection');
+($result, $stdout, $stderr) = $node_primary->psql(
+	'postgres',
+	qq[READ_REPLICATION_SLOT test_slot;],
+	replication => '1');
+ok($stdout =~ 'logical\|[^|]*\|[^|]*\|1\|1',
+	'Logical replication slot can be read on a physical connection');
+
 # Check case of walsender not using a database connection.  Logical
 # decoding should not be allowed.
 ($result, $stdout, $stderr) = $node_primary->psql(
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index f31a1e4e1e..b256ee3be6 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2126,6 +2126,7 @@ ReadBufferMode
 ReadBytePtrType
 ReadExtraTocPtrType
 ReadFunc
+ReadReplicationSlotCmd
 ReassignOwnedStmt
 RecheckForeignScan_function
 RecordCacheEntry
-- 
2.33.0

v5-0002-Use-READ_REPLICATION_SLOT-command-in-pg_receivewa.patchtext/x-patch; charset=UTF-8; name=v5-0002-Use-READ_REPLICATION_SLOT-command-in-pg_receivewa.patchDownload
From a1ca1f98096b05344e8b2c7025d90885ad6f8b8e Mon Sep 17 00:00:00 2001
From: Ronan Dunklau <ronan.dunklau@aiven.io>
Date: Wed, 1 Sep 2021 15:52:32 +0200
Subject: [PATCH v5 2/3] Use READ_REPLICATION_SLOT command in pg_receivewal.

Prior to this patch, when running pg_receivewal, the start LSN is determined by looking at
the WAL files currently stored on disk, then using the current flush lsn
from the server.

If for some reason the WAL files from pg_receivewal were moved, we want
to restart where we left at, which is the replication slot's restart_lsn
instead of skipping right to the current flush location.

To keep compatibility with prior server versions, we only attempt it if
the version is < 15.
---
 src/bin/pg_basebackup/pg_receivewal.c        |  53 +++++++++-
 src/bin/pg_basebackup/streamutil.c           | 100 +++++++++++++++++++
 src/bin/pg_basebackup/streamutil.h           |  16 +++
 src/bin/pg_basebackup/t/020_pg_receivewal.pl |  51 +++++++++-
 4 files changed, 216 insertions(+), 4 deletions(-)

diff --git a/src/bin/pg_basebackup/pg_receivewal.c b/src/bin/pg_basebackup/pg_receivewal.c
index 9d1843728d..9f63df120b 100644
--- a/src/bin/pg_basebackup/pg_receivewal.c
+++ b/src/bin/pg_basebackup/pg_receivewal.c
@@ -408,10 +408,59 @@ StreamLog(void)
 	stream.startpos = FindStreamingStart(&stream.timeline);
 	if (stream.startpos == InvalidXLogRecPtr)
 	{
-		stream.startpos = serverpos;
-		stream.timeline = servertli;
+		/* Try to get it from the slot if any, and the server supports it */
+		if (replication_slot)
+		{
+			SlotInformation slot_info;
+
+			switch (GetSlotInformation(conn, replication_slot, &slot_info))
+			{
+				case READ_REPLICATION_SLOT_ERROR:
+
+					/*
+					 * Error has been logged by GetSlotInformation, return and
+					 * maybe retry
+					 */
+					return;
+				case READ_REPLICATION_SLOT_NONEXISTENT:
+					pg_log_error("replication slot \"%s\" does not exist", replication_slot);
+					return;
+				case READ_REPLICATION_SLOT_UNSUPPORTED:
+
+					/*
+					 * Server doesn't support the command, so fallback to the
+					 * default case
+					 */
+					pg_log_warning("server does not support fetching the slot's position, "
+								   "resuming from current server position instead");
+					break;
+				default:
+					if (slot_info.is_logical)
+					{
+						/*
+						 * If the slot is not physical we can't expect to
+						 * recover from that
+						 */
+						pg_log_error("cannot use the slot provided, physical slot expected.");
+						exit(1);
+					}
+					stream.startpos = slot_info.restart_lsn;
+					stream.timeline = slot_info.restart_tli;
+			}
+		}
+
+		/*
+		 * If it is still unknown, use the current flush value from the server
+		 */
+		if (stream.startpos == InvalidXLogRecPtr)
+		{
+			stream.startpos = serverpos;
+			stream.timeline = servertli;
+		}
 	}
 
+	Assert(stream.startpos != InvalidXLogRecPtr);
+
 	/*
 	 * Always start streaming at the beginning of a segment
 	 */
diff --git a/src/bin/pg_basebackup/streamutil.c b/src/bin/pg_basebackup/streamutil.c
index f5b3b476e5..a4d25552b3 100644
--- a/src/bin/pg_basebackup/streamutil.c
+++ b/src/bin/pg_basebackup/streamutil.c
@@ -479,6 +479,106 @@ RunIdentifySystem(PGconn *conn, char **sysid, TimeLineID *starttli,
 	return true;
 }
 
+
+/*
+ * Returns wether a replication slot exists through a given connection,
+ * and fills in the slot_info with the results if passed by the caller.
+ */
+ReadReplicationSlotStatus
+GetSlotInformation(PGconn *conn, const char *slot_name, SlotInformation * slot_info)
+{
+	PGresult   *res;
+	PQExpBuffer query;
+
+	Assert(slot_name != NULL);
+
+	if (PQserverVersion(conn) < 150000)
+		return READ_REPLICATION_SLOT_UNSUPPORTED;
+
+
+	query = createPQExpBuffer();
+	appendPQExpBuffer(query, "READ_REPLICATION_SLOT %s", slot_name);
+	res = PQexec(conn, query->data);
+	destroyPQExpBuffer(query);
+	/* The commpand should always return precisely one tuple */
+	if (PQresultStatus(res) != PGRES_TUPLES_OK || PQntuples(res) != 1)
+	{
+		pg_log_error("could not read replication slot : %s",
+					 PQerrorMessage(conn));
+		PQclear(res);
+		return READ_REPLICATION_SLOT_ERROR;
+	}
+
+	/*
+	 * When the slot doesn't exist, the command returns an all-null tuple. The
+	 * first column (slot_type) will only be null if the slot doesn't exists.
+	 */
+	if (PQgetisnull(res, 0, 0))
+	{
+		PQclear(res);
+		return READ_REPLICATION_SLOT_NONEXISTENT;
+	}
+	if (PQntuples(res) != 1 || PQnfields(res) < 5)
+	{
+		pg_log_error("could not fetch replication slot: got %d rows and %d fields, expected %d rows and %d or more fields",
+					 PQntuples(res), PQnfields(res), 1, 5);
+		PQclear(res);
+		return READ_REPLICATION_SLOT_ERROR;
+	}
+	/* If no slotinformation has been passed, we can return immediately */
+	if (slot_info == NULL)
+	{
+		PQclear(res);
+		return READ_REPLICATION_SLOT_OK;
+	}
+
+	slot_info->is_logical = strcmp(PQgetvalue(res, 0, 0), "logical") == 0;
+	/* Restart LSN */
+	if (PQgetisnull(res, 0, 1))
+		slot_info->restart_lsn = InvalidXLogRecPtr;
+	else
+	{
+		uint32		hi,
+					lo;
+
+		if (sscanf(PQgetvalue(res, 0, 1), "%X/%X", &hi, &lo) != 2)
+		{
+			pg_log_error("could not parse slot's restart_lsn \"%s\"",
+						 PQgetvalue(res, 0, 1));
+			PQclear(res);
+			return READ_REPLICATION_SLOT_ERROR;
+		}
+		slot_info->restart_lsn = ((uint64) hi) << 32 | lo;
+	}
+	if (PQgetisnull(res, 0, 2))
+		slot_info->confirmed_flush_lsn = InvalidXLogRecPtr;
+	else
+	{
+		uint32		hi,
+					lo;
+
+		if (sscanf(PQgetvalue(res, 0, 2), "%X/%X", &hi, &lo) != 2)
+		{
+			pg_log_error("could not parse slot's confirmed_flush_lsn \"%s\"",
+						 PQgetvalue(res, 0, 2));
+			PQclear(res);
+			return READ_REPLICATION_SLOT_ERROR;
+		}
+		slot_info->confirmed_flush_lsn = ((uint64) hi) << 32 | lo;
+	}
+
+	if (PQgetisnull(res, 0, 3))
+		slot_info->restart_tli = 0;
+	else
+		slot_info->restart_tli = atoi(PQgetvalue(res, 0, 3));
+	if (PQgetisnull(res, 0, 4))
+		slot_info->confirmed_flush_tli = 0;
+	else
+		slot_info->confirmed_flush_tli = atoi(PQgetvalue(res, 0, 4));
+	PQclear(res);
+	return READ_REPLICATION_SLOT_OK;
+}
+
 /*
  * Create a replication slot for the given connection. This function
  * returns true in case of success.
diff --git a/src/bin/pg_basebackup/streamutil.h b/src/bin/pg_basebackup/streamutil.h
index 504803b976..cd3bc3229a 100644
--- a/src/bin/pg_basebackup/streamutil.h
+++ b/src/bin/pg_basebackup/streamutil.h
@@ -28,6 +28,14 @@ extern uint32 WalSegSz;
 /* Connection kept global so we can disconnect easily */
 extern PGconn *conn;
 
+typedef struct SlotInformation {
+	bool	   is_logical;
+	XLogRecPtr restart_lsn;
+	XLogRecPtr confirmed_flush_lsn;
+	TimeLineID restart_tli;
+	TimeLineID confirmed_flush_tli;
+} SlotInformation;
+
 extern PGconn *GetConnection(void);
 
 /* Replication commands */
@@ -40,6 +48,14 @@ extern bool RunIdentifySystem(PGconn *conn, char **sysid,
 							  TimeLineID *starttli,
 							  XLogRecPtr *startpos,
 							  char **db_name);
+typedef enum {
+	READ_REPLICATION_SLOT_OK,
+	READ_REPLICATION_SLOT_UNSUPPORTED,
+	READ_REPLICATION_SLOT_ERROR,
+	READ_REPLICATION_SLOT_NONEXISTENT
+} ReadReplicationSlotStatus;
+
+extern ReadReplicationSlotStatus GetSlotInformation(PGconn *conn, const char *slot_name, SlotInformation *slot_info);
 extern bool RetrieveWalSegSize(PGconn *conn);
 extern TimestampTz feGetCurrentTimestamp(void);
 extern void feTimestampDifference(TimestampTz start_time, TimestampTz stop_time,
diff --git a/src/bin/pg_basebackup/t/020_pg_receivewal.pl b/src/bin/pg_basebackup/t/020_pg_receivewal.pl
index 0b33d73900..e79fd9d7f0 100644
--- a/src/bin/pg_basebackup/t/020_pg_receivewal.pl
+++ b/src/bin/pg_basebackup/t/020_pg_receivewal.pl
@@ -5,7 +5,7 @@ use strict;
 use warnings;
 use TestLib;
 use PostgresNode;
-use Test::More tests => 27;
+use Test::More tests => 32;
 
 program_help_ok('pg_receivewal');
 program_version_ok('pg_receivewal');
@@ -49,7 +49,7 @@ is($primary->slot($slot_name)->{'slot_type'},
 
 # Generate some WAL.  Use --synchronous at the same time to add more
 # code coverage.  Switch to the next segment first so that subsequent
-# restarts of pg_receivewal will see this segment as full..
+# restarts of pg_receivewal will see this segment as full
 $primary->psql('postgres', 'CREATE TABLE test_table(x integer);');
 $primary->psql('postgres', 'SELECT pg_switch_wal();');
 my $nextlsn =
@@ -146,6 +146,53 @@ $primary->command_ok(
 $partial_wals[0] =~ s/(\.gz)?.partial//;
 ok(-e $partial_wals[0], "check that previously partial WAL is now complete");
 
+# Verify that if we use a replication slot, we resume where we left even in the
+# absence of WALs
+
+# Setup the slot, and connect to it a first time
+$primary->run_log(
+	[ 'pg_receivewal', '--slot', $slot_name, '--create-slot' ],
+	'creating a replication slot');
+$primary->psql('postgres',
+	'INSERT INTO test_table VALUES (generate_series(1,100));');
+$primary->psql('postgres', 'SELECT pg_switch_wal();');
+$nextlsn =
+  $primary->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();');
+chomp($nextlsn);
+$primary->run_log(
+	[ 'pg_receivewal', '-D', $stream_dir, '--slot', $slot_name, '--verbose', '--endpos', $nextlsn ],
+	"streaming some WAL");
+
+# Get the slot restart_lsn and make sure we retrieve the associated WAL file
+# even after deletion of the previous stored WAL files.
+$slot = $primary->slot($slot_name);
+my $restart_lsn = $slot->{'restart_lsn'};
+# Add one so that the restart_lsn doesn't correspond to the previous file.
+$restart_lsn =~ s/.$/1/;
+my $walfile_to_be_archived = $primary->safe_psql('postgres',
+	"SELECT pg_walfile_name('$restart_lsn');");
+isnt($restart_lsn, '', 'restart LSN of new slot is not null');
+
+unlink glob "'${stream_dir}/*'";
+
+$primary->psql('postgres',
+	'INSERT INTO test_table VALUES (generate_series(1,100));');
+$primary->psql('postgres', 'SELECT pg_switch_wal();');
+$nextlsn =
+  $primary->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();');
+chomp($nextlsn);
+# First verify what happens if we try with a non-existing slot.
+$primary->command_fails_like(
+  [ 'pg_receivewal', '-D', $stream_dir, '--slot', 'nonexistentslot', '-n', '--verbose', '--endpos', $nextlsn ],
+  qr/pg_receivewal: error: replication slot "nonexistentslot" does not exist/,
+  'pg_receivewal fails with a non-existing slot');
+$primary->command_ok(
+	[ 'pg_receivewal', '-D', $stream_dir, '--slot', $slot_name, '--verbose', '--endpos', $nextlsn ],
+	"Stream some wal starting from the slot's restart_lsn");
+$slot = $primary->slot($slot_name);
+my @walfiles = glob "${stream_dir}/*";
+ok(-e "$stream_dir/$walfile_to_be_archived", "WAL from the slot's restart_lsn has been archived");
+
 # Permissions on WAL files should be default
 SKIP:
 {
-- 
2.33.0

v5-0003-Check-slot-existence-in-pg_basebackup.patchtext/x-patch; charset=UTF-8; name=v5-0003-Check-slot-existence-in-pg_basebackup.patchDownload
From bbd70460e8de09da8bcacb0df3aa92e043fbff14 Mon Sep 17 00:00:00 2001
From: Ronan Dunklau <ronan.dunklau@aiven.io>
Date: Wed, 1 Sep 2021 15:52:53 +0200
Subject: [PATCH v5 3/3] Check slot existence in pg_basebackup.

Use the newly introduced READ_REPLICATION_SLOT command to check for a
slot existence in pg_basebackup. That way, we can fail early.
---
 src/bin/pg_basebackup/pg_basebackup.c        | 31 ++++++++++++++++++++
 src/bin/pg_basebackup/t/010_pg_basebackup.pl |  7 +++--
 2 files changed, 35 insertions(+), 3 deletions(-)

diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 7296eb97d0..a807d5a699 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -1867,6 +1867,37 @@ BaseBackup(void)
 	if (!RunIdentifySystem(conn, &sysidentifier, &latesttli, NULL, NULL))
 		exit(1);
 
+	/*
+	 * Check the replication slot exists if applicable
+	 */
+	if (replication_slot && !(temp_replication_slot || create_slot))
+	{
+		SlotInformation slot_info;
+
+		switch (GetSlotInformation(conn, replication_slot, &slot_info))
+		{
+			case READ_REPLICATION_SLOT_ERROR:
+
+				/*
+				 * Error has been logged by GetSlotInformation
+				 */
+				exit(1);
+			case READ_REPLICATION_SLOT_UNSUPPORTED:
+				/* We don't care about the result here */
+				break;
+			case READ_REPLICATION_SLOT_NONEXISTENT:
+				pg_log_error("replication slot \"%s\" does not exist", replication_slot);
+				exit(1);
+			case READ_REPLICATION_SLOT_OK:
+				if (slot_info.is_logical)
+				{
+					pg_log_error("cannot use the slot provided, physical slot expected.");
+					exit(1);
+				}
+				break;
+		}
+	}
+
 	/*
 	 * Start the actual backup
 	 */
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index a2cb2a7679..69fd262a97 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -10,7 +10,7 @@ use File::Path qw(rmtree);
 use Fcntl qw(:seek);
 use PostgresNode;
 use TestLib;
-use Test::More tests => 110;
+use Test::More tests => 111;
 
 program_help_ok('pg_basebackup');
 program_version_ok('pg_basebackup');
@@ -465,14 +465,15 @@ $node->command_ok(
 	'pg_basebackup -X stream runs with --no-slot');
 rmtree("$tempdir/backupnoslot");
 
-$node->command_fails(
+$node->command_fails_like(
 	[
 		'pg_basebackup',             '-D',
 		"$tempdir/backupxs_sl_fail", '-X',
 		'stream',                    '-S',
 		'slot0'
 	],
-	'pg_basebackup fails with nonexistent replication slot');
+	qr/pg_basebackup: error: replication slot "slot0" does not exist/,
+	'pg_basebackup fails early with nonexistent replication slot');
 
 $node->command_fails(
 	[ 'pg_basebackup', '-D', "$tempdir/backupxs_slot", '-C' ],
-- 
2.33.0

#23Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Ronan Dunklau (#22)
Re: pg_receivewal starting position

On Fri, Sep 3, 2021 at 3:28 PM Ronan Dunklau <ronan.dunklau@aiven.io> wrote:

There is still the concern raised by Bharath about being able to select the
way to fetch the replication slot information for the user, and what should or
should not be included in the documentation. I am in favor of documenting the
process of selecting the wal start, and maybe include a --start-lsn option for
the user to override it, but that's maybe for another patch.

Let's hear from others.

Thanks for the patches. I have some quick comments on the v5 patch-set:

0001:
1) Do you also want to MemSet values too in ReadReplicationSlot?

2) When if clause has single statement we don't generally use "{" and "}"
+ if (slot == NULL || !slot->in_use)
+ {
+ LWLockRelease(ReplicationSlotControlLock);
+ }
you can just have:
+ if (slot == NULL || !slot->in_use)
+ LWLockRelease(ReplicationSlotControlLock);
3) This is still not copying the slot contents, as I said earlier you
can just take the required info into some local variables instead of
taking the slot pointer to slot_contents pointer.
+ /* Copy slot contents while holding spinlock */
+ SpinLockAcquire(&slot->mutex);
+ slot_contents = *slot;
+ SpinLockRelease(&slot->mutex);
+ LWLockRelease(ReplicationSlotControlLock);
4) As I said earlier, why can't we have separate tli variables for
more readability instead of just one slots_position_timeline and
timeline_history variable? And you are not even resetting those 2
variables after if
(!XLogRecPtrIsInvalid(slot_contents.data.restart_lsn)), you might end
up sending the restart_lsn timelineid for confirmed_flush. So, better
use two separate variables. In fact you can use block local variables:
+ if (!XLogRecPtrIsInvalid(slot_contents.data.restart_lsn))
+ {
+ List    *tl_history= NIL;
+ TimeLineID tli;
+ tl_history= readTimeLineHistory(ThisTimeLineID);
+ tli = tliOfPointInHistory(slot_contents.data.restart_lsn,
+   tl_history);
+ values[i] = Int32GetDatum(tli);
+ nulls[i] = false;
+ }
similarly for confirmed_flush.
5) I still don't see the need for below test case:
+ "READ_REPLICATION_SLOT does not produce an error with non existent slot");
when we have
+ "READ_REPLICATION_SLOT does not produce an error with dropped slot");
Because for a user, dropped or non existent slot is same, it's just
that for dropped slot we internally don't delete its entry from the
shared memory.
0002:
1) Imagine GetSlotInformation always returns
READ_REPLICATION_SLOT_ERROR, don't you think StreamLog enters an
infinite loop there? Instead, why don't you just exit(1); instead of
return; and retry? Similarly for READ_REPLICATION_SLOT_NONEXISTENT? I
think, you can just do exit(1), no need to retry.
+ case READ_REPLICATION_SLOT_ERROR:
+
+ /*
+ * Error has been logged by GetSlotInformation, return and
+ * maybe retry
+ */
+ return;
2) Why is it returning READ_REPLICATION_SLOT_OK when slot_info isn't
passed? And why are you having this check after you connect to the
server and fetch the data?
+ /* If no slotinformation has been passed, we can return immediately */
+ if (slot_info == NULL)
+ {
+ PQclear(res);
+ return READ_REPLICATION_SLOT_OK;
+ }
Instead you can just have a single assert:

+ Assert(slot_name && slot_info );

3) How about
pg_log_error("could not read replication slot:
instead of
pg_log_error("could not fetch replication slot:

4) Why are you having the READ_REPLICATION_SLOT_OK case in default?
+ default:
+ if (slot_info.is_logical)
+ {
+ /*
+ * If the slot is not physical we can't expect to
+ * recover from that
+ */
+ pg_log_error("cannot use the slot provided, physical slot expected.");
+ exit(1);
+ }
+ stream.startpos = slot_info.restart_lsn;
+ stream.timeline = slot_info.restart_tli;
+ }
You can just have another case statement for READ_REPLICATION_SLOT_OK
and in the default you can throw an error "unknown read replication
slot status" or some other better working and exit(1);
5) Do you want initialize slot_info to 0?
+ if (replication_slot)
+ {
+ SlotInformation slot_info;
+ MemSet(slot_info, 0, sizeof(SlotInformation));

6) This isn't readable:
+ slot_info->is_logical = strcmp(PQgetvalue(res, 0, 0), "logical") == 0;
How about:
if (strcmp(PQgetvalue(res, 0, 0), "logical") == 0)
slot_info->is_logical = true;
You don't need to set it false, because you would have
MemSet(slot_info) in the caller.

7) How about
uint32 hi;
uint32 lo;
instead of
+ uint32 hi,
+ lo;

8) Move SlotInformation * slot_info) to the next line as it crosses
the 80 char limit.
+GetSlotInformation(PGconn *conn, const char *slot_name,
SlotInformation * slot_info)

9) Instead of a boolean is_logical, I would rather suggest to use an
enum or #define macros the slot types correctly, because someday we
might introduce new type slots and somebody wants is_physical kind of
variable name?
+typedef struct SlotInformation {
+ bool is_logical;

Regards,
Bharath Rupireddy.

#24Michael Paquier
michael@paquier.xyz
In reply to: Ronan Dunklau (#22)
Re: pg_receivewal starting position

On Fri, Sep 03, 2021 at 11:58:27AM +0200, Ronan Dunklau wrote:

0002 for pg_receivewal tried to simplify the logic of information to return,
by using a dedicated struct for this. This accounts for Bahrath's demands to
return every possible field.
In particular, an is_logical field simplifies the detection of the type of slot.
In my opinion it makes sense to simplify it like this on the client side while
being more open-minded on the server side if we ever need to provide a new
type of slot. Also, GetSlotInformation now returns an enum to be able to
handle the different modes of failures, which differ between pg_receivewal and
pg_basebackup.

+   if (PQserverVersion(conn) < 150000)
+       return READ_REPLICATION_SLOT_UNSUPPORTED;
[...]
+typedef enum {
+   READ_REPLICATION_SLOT_OK,
+   READ_REPLICATION_SLOT_UNSUPPORTED,
+   READ_REPLICATION_SLOT_ERROR,
+   READ_REPLICATION_SLOT_NONEXISTENT
+} ReadReplicationSlotStatus;

Do you really need this much complication? We could treat the
unsupported case and the non-existing case the same way: we don't know
so just assign InvalidXlogRecPtr or NULL to the fields of the
structure, and make GetSlotInformation() return false just on error,
with some pg_log_error() where adapted in its internals.

There is still the concern raised by Bharath about being able to select the
way to fetch the replication slot information for the user, and what should or
should not be included in the documentation. I am in favor of documenting the
process of selecting the wal start, and maybe include a --start-lsn option for
the user to override it, but that's maybe for another patch.

The behavior of pg_receivewal that you are implementing should be
documented. We don't say either how the start location is selected
when working on an existing directory, so I would recommend to add a
paragraph in the description section to detail all that, as of:
- First a scan of the existing archives is done.
- If nothing is found, and if there is a slot, request the slot
information.
- If still nothing (aka slot undefined, or command not supported), use
the last flush location.

As a whole, I am not really convinced that we need a new option for
that as long as we rely on a slot with pg_receivewal as these are used
to make sure that we avoid holes in the WAL archives.

Regarding pg_basebackup, Daniel has proposed a couple of days ago a
different solution to trap errors earlier, which would cover the case
dealt with here:
/messages/by-id/0F69E282-97F9-4DB7-8D6D-F927AA6340C8@yesql.se
We should not mimic in the frontend errors that are safely trapped in
the backend with the proper locks, in any case.

While on it, READ_REPLICATION_SLOT returns a confirmed LSN when
grabbing the data of a logical slot. We are not going to use that
with pg_recvlogical as by default START_STREAMING 0/0 would just use
the confirmed LSN. Do we have use cases where this information would
help? There is the argument of consistency with physical slots and
that this can be helpful to do sanity checks for clients, but that's
rather thin.
--
Michael

#25Ronan Dunklau
ronan.dunklau@aiven.io
In reply to: Michael Paquier (#24)
Re: pg_receivewal starting position

Le lundi 6 septembre 2021, 06:22:30 CEST Michael Paquier a écrit :

On Fri, Sep 03, 2021 at 11:58:27AM +0200, Ronan Dunklau wrote:

0002 for pg_receivewal tried to simplify the logic of information to
return, by using a dedicated struct for this. This accounts for Bahrath's
demands to return every possible field.
In particular, an is_logical field simplifies the detection of the type of
slot. In my opinion it makes sense to simplify it like this on the client
side while being more open-minded on the server side if we ever need to
provide a new type of slot. Also, GetSlotInformation now returns an enum
to be able to handle the different modes of failures, which differ
between pg_receivewal and pg_basebackup.

+   if (PQserverVersion(conn) < 150000)
+       return READ_REPLICATION_SLOT_UNSUPPORTED;
[...]
+typedef enum {
+   READ_REPLICATION_SLOT_OK,
+   READ_REPLICATION_SLOT_UNSUPPORTED,
+   READ_REPLICATION_SLOT_ERROR,
+   READ_REPLICATION_SLOT_NONEXISTENT
+} ReadReplicationSlotStatus;

Do you really need this much complication? We could treat the
unsupported case and the non-existing case the same way: we don't know
so just assign InvalidXlogRecPtr or NULL to the fields of the
structure, and make GetSlotInformation() return false just on error,
with some pg_log_error() where adapted in its internals.

I actually started with the implementation you propose, but changed my mind
while writing it because I realised it's easier to reason about like this,
instead of failing silently during READ_REPLICATION_SLOT to fail a bit later
when actually trying to start the replication slot because it doesn't exists.
Either the user expects the replication slot to exists, and in this case we
should retry the whole loop in the hope of getting an interesting LSN, or the
user doesn't and shouldn't even pass a replication_slot name to begin with.

There is still the concern raised by Bharath about being able to select
the
way to fetch the replication slot information for the user, and what
should or should not be included in the documentation. I am in favor of
documenting the process of selecting the wal start, and maybe include a
--start-lsn option for the user to override it, but that's maybe for
another patch.

The behavior of pg_receivewal that you are implementing should be
documented. We don't say either how the start location is selected
when working on an existing directory, so I would recommend to add a
paragraph in the description section to detail all that, as of:
- First a scan of the existing archives is done.
- If nothing is found, and if there is a slot, request the slot
information.
- If still nothing (aka slot undefined, or command not supported), use
the last flush location.

Sounds good, I will add another patch for the documentation of this.

As a whole, I am not really convinced that we need a new option for
that as long as we rely on a slot with pg_receivewal as these are used
to make sure that we avoid holes in the WAL archives.

Regarding pg_basebackup, Daniel has proposed a couple of days ago a
different solution to trap errors earlier, which would cover the case
dealt with here:
/messages/by-id/0F69E282-97F9-4DB7-8D6D-F927AA6340C8@y
esql.se

I will take a look.

We should not mimic in the frontend errors that are safely trapped
in the backend with the proper locks, in any case.

I don't understand what you mean by this ? My original proposal was for the
command to actually attach to the replication slot while reading it, thus
keeping a lock on it to prevent dropping or concurrent access on the server.

While on it, READ_REPLICATION_SLOT returns a confirmed LSN when
grabbing the data of a logical slot. We are not going to use that
with pg_recvlogical as by default START_STREAMING 0/0 would just use
the confirmed LSN. Do we have use cases where this information would
help? There is the argument of consistency with physical slots and
that this can be helpful to do sanity checks for clients, but that's
rather thin.

If we don't, we should rename the command to READ_PHYSICAL_REPLICATION_SLOT
and error out on the server if the slot is not actually a physical one to
spare the client from performing those checks. I still think it's better to
support both cases as opposed to having two completely different APIs
(READ_(PHYSICAL)_REPLICATION_SLOT for physical ones on a replication
connection, pg_replication_slots view for logical ones) as it would enable
more for third-party clients at a relatively low maintenance cost for us.

--
Ronan Dunklau

#26Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Ronan Dunklau (#25)
Re: pg_receivewal starting position

On Mon, Sep 6, 2021 at 12:20 PM Ronan Dunklau <ronan.dunklau@aiven.io> wrote:

Le lundi 6 septembre 2021, 06:22:30 CEST Michael Paquier a écrit :

On Fri, Sep 03, 2021 at 11:58:27AM +0200, Ronan Dunklau wrote:

0002 for pg_receivewal tried to simplify the logic of information to
return, by using a dedicated struct for this. This accounts for Bahrath's
demands to return every possible field.
In particular, an is_logical field simplifies the detection of the type of
slot. In my opinion it makes sense to simplify it like this on the client
side while being more open-minded on the server side if we ever need to
provide a new type of slot. Also, GetSlotInformation now returns an enum
to be able to handle the different modes of failures, which differ
between pg_receivewal and pg_basebackup.

+   if (PQserverVersion(conn) < 150000)
+       return READ_REPLICATION_SLOT_UNSUPPORTED;
[...]
+typedef enum {
+   READ_REPLICATION_SLOT_OK,
+   READ_REPLICATION_SLOT_UNSUPPORTED,
+   READ_REPLICATION_SLOT_ERROR,
+   READ_REPLICATION_SLOT_NONEXISTENT
+} ReadReplicationSlotStatus;

Do you really need this much complication? We could treat the
unsupported case and the non-existing case the same way: we don't know
so just assign InvalidXlogRecPtr or NULL to the fields of the
structure, and make GetSlotInformation() return false just on error,
with some pg_log_error() where adapted in its internals.

I actually started with the implementation you propose, but changed my mind
while writing it because I realised it's easier to reason about like this,
instead of failing silently during READ_REPLICATION_SLOT to fail a bit later
when actually trying to start the replication slot because it doesn't exists.
Either the user expects the replication slot to exists, and in this case we
should retry the whole loop in the hope of getting an interesting LSN, or the
user doesn't and shouldn't even pass a replication_slot name to begin with.

I don't think so we need to retry the whole loop if the
READ_REPLICATION_SLOT command fails as pg_receivewal might enter an
infinite loop there. IMO, we should just exit(1) if
READ_REPLICATION_SLOT fails.

There is still the concern raised by Bharath about being able to select
the
way to fetch the replication slot information for the user, and what
should or should not be included in the documentation. I am in favor of
documenting the process of selecting the wal start, and maybe include a
--start-lsn option for the user to override it, but that's maybe for
another patch.

The behavior of pg_receivewal that you are implementing should be
documented. We don't say either how the start location is selected
when working on an existing directory, so I would recommend to add a
paragraph in the description section to detail all that, as of:
- First a scan of the existing archives is done.
- If nothing is found, and if there is a slot, request the slot
information.
- If still nothing (aka slot undefined, or command not supported), use
the last flush location.

Sounds good, I will add another patch for the documentation of this.

+1.

While on it, READ_REPLICATION_SLOT returns a confirmed LSN when
grabbing the data of a logical slot. We are not going to use that
with pg_recvlogical as by default START_STREAMING 0/0 would just use
the confirmed LSN. Do we have use cases where this information would
help? There is the argument of consistency with physical slots and
that this can be helpful to do sanity checks for clients, but that's
rather thin.

If we don't, we should rename the command to READ_PHYSICAL_REPLICATION_SLOT
and error out on the server if the slot is not actually a physical one to
spare the client from performing those checks. I still think it's better to
support both cases as opposed to having two completely different APIs
(READ_(PHYSICAL)_REPLICATION_SLOT for physical ones on a replication
connection, pg_replication_slots view for logical ones) as it would enable
more for third-party clients at a relatively low maintenance cost for us.

-1 for READ_PHYSICAL_REPLICATION_SLOT or failing on the server. What
happens if we have another slot type "PHYSIOLOGICAL" or "FOO" or "BAR"
some other? IMO, READ_REPLICATION_SLOT should just return info of all
slots. The clients know better how to deal with the slot type.
Although, we don't have a use case for logical slots with the
READ_REPLICATION_SLOT command, let's not change it.

If others are still concerned about the unnecessary slot being
returned, you might consider passing in a parameter to
READ_REPLICATION_SLOT command, something like below. But this too
looks complex to me. I would vote for what the existing patch does
with READ_REPLICATION_SLOT.
READ_REPLICATION_SLOT 'slot_name' 'physical'; only returns physical
slot info with the given name.
READ_REPLICATION_SLOT 'slot_name' 'logical'; only returns logical slot
with the given name.

Regards,
Bharath Rupireddy.

#27Ronan Dunklau
ronan.dunklau@aiven.io
In reply to: Bharath Rupireddy (#8)
Re: pg_receivewal starting position

Le vendredi 3 septembre 2021 17:49:34 CEST, vous avez écrit :

On Fri, Sep 3, 2021 at 3:28 PM Ronan Dunklau <ronan.dunklau@aiven.io> wrote:

There is still the concern raised by Bharath about being able to select
the
way to fetch the replication slot information for the user, and what
should or should not be included in the documentation. I am in favor of
documenting the process of selecting the wal start, and maybe include a
--start-lsn option for the user to override it, but that's maybe for
another patch.

Let's hear from others.

Thanks for the patches. I have some quick comments on the v5 patch-set:

0001:
1) Do you also want to MemSet values too in ReadReplicationSlot?

2) When if clause has single statement we don't generally use "{" and "}"
+ if (slot == NULL || !slot->in_use)
+ {
+ LWLockRelease(ReplicationSlotControlLock);
+ }
you can just have:
+ if (slot == NULL || !slot->in_use)
+ LWLockRelease(ReplicationSlotControlLock);
3) This is still not copying the slot contents, as I said earlier you
can just take the required info into some local variables instead of
taking the slot pointer to slot_contents pointer.
+ /* Copy slot contents while holding spinlock */
+ SpinLockAcquire(&slot->mutex);
+ slot_contents = *slot;
+ SpinLockRelease(&slot->mutex);
+ LWLockRelease(ReplicationSlotControlLock);
4) As I said earlier, why can't we have separate tli variables for
more readability instead of just one slots_position_timeline and
timeline_history variable? And you are not even resetting those 2
variables after if
(!XLogRecPtrIsInvalid(slot_contents.data.restart_lsn)), you might end
up sending the restart_lsn timelineid for confirmed_flush. So, better
use two separate variables. In fact you can use block local variables:
+ if (!XLogRecPtrIsInvalid(slot_contents.data.restart_lsn))
+ {
+ List    *tl_history= NIL;
+ TimeLineID tli;
+ tl_history= readTimeLineHistory(ThisTimeLineID);
+ tli = tliOfPointInHistory(slot_contents.data.restart_lsn,
+   tl_history);
+ values[i] = Int32GetDatum(tli);
+ nulls[i] = false;
+ }
similarly for confirmed_flush.
5) I still don't see the need for below test case:
+ "READ_REPLICATION_SLOT does not produce an error with non existent slot");
when we have
+ "READ_REPLICATION_SLOT does not produce an error with dropped slot");
Because for a user, dropped or non existent slot is same, it's just
that for dropped slot we internally don't delete its entry from the
shared memory.

Thank you for reiterating those concerns. As I said, I haven't touched
Michael's version of the patch. I was incorporating those changes in my branch
before he sent this, so I'll probably merge both before sending an updated
patch.

0002:
1) Imagine GetSlotInformation always returns
READ_REPLICATION_SLOT_ERROR, don't you think StreamLog enters an
infinite loop there? Instead, why don't you just exit(1); instead of
return; and retry? Similarly for READ_REPLICATION_SLOT_NONEXISTENT? I
think, you can just do exit(1), no need to retry.
+ case READ_REPLICATION_SLOT_ERROR:
+
+ /*
+ * Error has been logged by GetSlotInformation, return and
+ * maybe retry
+ */
+ return;

This is the same behaviour we had before: if there is an error with
pg_receivewal we retry the command. There was no special case for the
replication slot not existing before, I don't see why we should change it now
?

Eg:

2021-09-06 09:11:07.774 CEST [95853] ERROR: replication slot "nonexistent"
does not exist
2021-09-06 09:11:07.774 CEST [95853] STATEMENT: START_REPLICATION SLOT
"nonexistent" 0/1000000 TIMELINE 1
pg_receivewal: error: could not send replication command "START_REPLICATION":
ERROR: replication slot "nonexistent" does not exist
pg_receivewal: disconnected; waiting 5 seconds to try again

Users may rely on it to keep retrying in the background until the slot has
been created for example.

2) Why is it returning READ_REPLICATION_SLOT_OK when slot_info isn't
passed? And why are you having this check after you connect to the
server and fetch the data?
+ /* If no slotinformation has been passed, we can return immediately */
+ if (slot_info == NULL)
+ {
+ PQclear(res);
+ return READ_REPLICATION_SLOT_OK;
+ }
Instead you can just have a single assert:

+ Assert(slot_name && slot_info );

At first it was so that we didn't have to fill in all required information if we
don't need too, but it turns out pg_basebackup also has to check for the
slot's type. I agree we should not support the null slot_info case anymore.

3) How about
pg_log_error("could not read replication slot:
instead of
pg_log_error("could not fetch replication slot:

Ok.

4) Why are you having the READ_REPLICATION_SLOT_OK case in default?
+ default:
+ if (slot_info.is_logical)
+ {
+ /*
+ * If the slot is not physical we can't expect to
+ * recover from that
+ */
+ pg_log_error("cannot use the slot provided, physical slot expected.");
+ exit(1);
+ }
+ stream.startpos = slot_info.restart_lsn;
+ stream.timeline = slot_info.restart_tli;
+ }
You can just have another case statement for READ_REPLICATION_SLOT_OK
and in the default you can throw an error "unknown read replication
slot status" or some other better working and exit(1);

Ok.

5) Do you want initialize slot_info to 0?
+ if (replication_slot)
+ {
+ SlotInformation slot_info;
+ MemSet(slot_info, 0, sizeof(SlotInformation));

6) This isn't readable:
+ slot_info->is_logical = strcmp(PQgetvalue(res, 0, 0), "logical") == 0;
How about:
if (strcmp(PQgetvalue(res, 0, 0), "logical") == 0)
slot_info->is_logical = true;
You don't need to set it false, because you would have
MemSet(slot_info) in the caller.

Isn't it preferable to fill in the whole struct without any regards to it's
prior state ? I will modify the is_logical assignment to make it more readable
though.

7) How about
uint32 hi;
uint32 lo;
instead of
+ uint32 hi,
+ lo;

Ok.

8) Move SlotInformation * slot_info) to the next line as it crosses
the 80 char limit.
+GetSlotInformation(PGconn *conn, const char *slot_name,
SlotInformation * slot_info)

Ok.

9) Instead of a boolean is_logical, I would rather suggest to use an
enum or #define macros the slot types correctly, because someday we
might introduce new type slots and somebody wants is_physical kind of
variable name?
+typedef struct SlotInformation {
+ bool is_logical;

That's the reason why the READ_REPLICATION_SLOT command returns a slot type as
a string, but the intermediate representation on the client doesn't really
need to be concerned about this: it's the client after all and it will be very
easy to change it locally if we need to. Thinking about it, for our use case
we should really use is_physical instead of is_logical. But I guess all of
this is moot if we end up deciding not to support the logical case anymore ?

Best regards,

--
Ronan Dunklau

#28Michael Paquier
michael@paquier.xyz
In reply to: Bharath Rupireddy (#26)
Re: pg_receivewal starting position

On Mon, Sep 06, 2021 at 12:37:29PM +0530, Bharath Rupireddy wrote:

-1 for READ_PHYSICAL_REPLICATION_SLOT or failing on the server. What
happens if we have another slot type "PHYSIOLOGICAL" or "FOO" or "BAR"
some other? IMO, READ_REPLICATION_SLOT should just return info of all
slots. The clients know better how to deal with the slot type.
Although, we don't have a use case for logical slots with the
READ_REPLICATION_SLOT command, let's not change it.

Using READ_REPLICATION_SLOT as the command name is fine, and it could
be extended with more fields if necessary, implemented now with only
what we think is useful. Returning errors on cases that are still not
supported yet is fine, say for logical slots if we decide to reject
the case for now, and testrictions can always be lifted in the
future.
--
Michael

#29Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#28)
Re: pg_receivewal starting position

On Mon, Sep 06, 2021 at 04:17:28PM +0900, Michael Paquier wrote:

Using READ_REPLICATION_SLOT as the command name is fine, and it could
be extended with more fields if necessary, implemented now with only
what we think is useful. Returning errors on cases that are still not
supported yet is fine, say for logical slots if we decide to reject
the case for now, and testrictions can always be lifted in the
future.

And marked as RwF as this was three weeks ago. Please feel free to
register a new entry if this is being worked on.
--
Michael

#30Ronan Dunklau
ronan.dunklau@aiven.io
In reply to: Michael Paquier (#29)
3 attachment(s)
Re: pg_receivewal starting position

Hello,

Following recommendations, I stripped most of the features from the patch. For
now we support only physical replication slots, and only provide the two fields
of interest (restart_lsn, restart_tli) in addition to the slot type (fixed at
physical) to not paint ourselves in a corner.

I also removed the part about pg_basebackup since other fixes have been
proposed for that case.

Le vendredi 1 octobre 2021, 09:05:18 CEST Michael Paquier a écrit :

On Mon, Sep 06, 2021 at 04:17:28PM +0900, Michael Paquier wrote:

Using READ_REPLICATION_SLOT as the command name is fine, and it could
be extended with more fields if necessary, implemented now with only
what we think is useful. Returning errors on cases that are still not
supported yet is fine, say for logical slots if we decide to reject
the case for now, and testrictions can always be lifted in the
future.

And marked as RwF as this was three weeks ago. Please feel free to
register a new entry if this is being worked on.
--
Michael

--
Ronan Dunklau

Attachments:

v6-0001-Add-READ_REPLICATION_SLOT-command.patchtext/x-patch; charset=UTF-8; name=v6-0001-Add-READ_REPLICATION_SLOT-command.patchDownload
From 4147665664164eb597fdcc86637ec9c497c36660 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Thu, 2 Sep 2021 16:25:25 +0900
Subject: [PATCH v6 1/3] Add READ_REPLICATION_SLOT command

---
 doc/src/sgml/protocol.sgml                  | 47 +++++++++++
 src/backend/replication/repl_gram.y         | 16 +++-
 src/backend/replication/repl_scanner.l      |  1 +
 src/backend/replication/walsender.c         | 87 +++++++++++++++++++++
 src/include/nodes/nodes.h                   |  1 +
 src/include/nodes/replnodes.h               | 10 +++
 src/test/recovery/t/001_stream_rep.pl       | 47 ++++++++++-
 src/test/recovery/t/006_logical_decoding.pl |  9 ++-
 src/tools/pgindent/typedefs.list            |  1 +
 9 files changed, 216 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index b95cc88599..51a15cc3da 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2067,6 +2067,53 @@ The commands accepted in replication mode are:
     </listitem>
   </varlistentry>
 
+  <varlistentry>
+    <term><literal>READ_REPLICATION_SLOT</literal> <replaceable class="parameter">slot_name</replaceable>
+      <indexterm><primary>READ_REPLICATION_SLOT</primary></indexterm>
+    </term>
+    <listitem>
+     <para>
+      Read the information of a replication slot. Returns a tuple with
+      <literal>NULL</literal> values if the replication slot does not
+      exist. This command is currently only supported for <literal>physical</literal> slots.
+     </para>
+     <para>
+      In response to this command, the server will return a one-row result set,
+      containing the following fields:
+      <variablelist>
+       <varlistentry>
+        <term><literal>slot_type</literal> (<type>text</type>)</term>
+        <listitem>
+         <para>
+          The replication slot's type, either <literal>physical</literal> or
+          <literal>NULL</literal>.
+         </para>
+        </listitem>
+       </varlistentry>
+
+       <varlistentry>
+        <term><literal>restart_lsn</literal> (<type>text</type>)</term>
+        <listitem>
+         <para>
+          The replication slot's <literal>restart_lsn</literal>.
+         </para>
+        </listitem>
+       </varlistentry>
+
+       <varlistentry>
+        <term><literal>restart_tli</literal> (<type>int4</type>)</term>
+        <listitem>
+         <para>
+          The timeline ID for the <literal>restart_lsn</literal> position,
+          when following the current timeline history.
+         </para>
+        </listitem>
+       </varlistentry>
+      </variablelist>
+     </para>
+    </listitem>
+  </varlistentry>
+
   <varlistentry>
     <term><literal>START_REPLICATION</literal> [ <literal>SLOT</literal> <replaceable class="parameter">slot_name</replaceable> ] [ <literal>PHYSICAL</literal> ] <replaceable class="parameter">XXX/XXX</replaceable> [ <literal>TIMELINE</literal> <replaceable class="parameter">tli</replaceable> ]
      <indexterm><primary>START_REPLICATION</primary></indexterm>
diff --git a/src/backend/replication/repl_gram.y b/src/backend/replication/repl_gram.y
index 126380e2df..913a99da5a 100644
--- a/src/backend/replication/repl_gram.y
+++ b/src/backend/replication/repl_gram.y
@@ -64,6 +64,7 @@ static SQLCmd *make_sqlcmd(void);
 /* Keyword tokens. */
 %token K_BASE_BACKUP
 %token K_IDENTIFY_SYSTEM
+%token K_READ_REPLICATION_SLOT
 %token K_SHOW
 %token K_START_REPLICATION
 %token K_CREATE_REPLICATION_SLOT
@@ -94,7 +95,7 @@ static SQLCmd *make_sqlcmd(void);
 %type <node>	command
 %type <node>	base_backup start_replication start_logical_replication
 				create_replication_slot drop_replication_slot identify_system
-				timeline_history show sql_cmd
+				read_replication_slot timeline_history show sql_cmd
 %type <list>	base_backup_legacy_opt_list generic_option_list
 %type <defelt>	base_backup_legacy_opt generic_option
 %type <uintval>	opt_timeline
@@ -120,6 +121,7 @@ opt_semicolon:	';'
 
 command:
 			identify_system
+			| read_replication_slot
 			| base_backup
 			| start_replication
 			| start_logical_replication
@@ -140,6 +142,18 @@ identify_system:
 				}
 			;
 
+/*
+ * READ_REPLICATION_SLOT %s
+ */
+read_replication_slot:
+			K_READ_REPLICATION_SLOT var_name
+				{
+					ReadReplicationSlotCmd *n = makeNode(ReadReplicationSlotCmd);
+					n->slotname = $2;
+					$$ = (Node *) n;
+				}
+			;
+
 /*
  * SHOW setting
  */
diff --git a/src/backend/replication/repl_scanner.l b/src/backend/replication/repl_scanner.l
index c038a636c3..1b599c255e 100644
--- a/src/backend/replication/repl_scanner.l
+++ b/src/backend/replication/repl_scanner.l
@@ -85,6 +85,7 @@ identifier		{ident_start}{ident_cont}*
 BASE_BACKUP			{ return K_BASE_BACKUP; }
 FAST			{ return K_FAST; }
 IDENTIFY_SYSTEM		{ return K_IDENTIFY_SYSTEM; }
+READ_REPLICATION_SLOT	{ return K_READ_REPLICATION_SLOT; }
 SHOW		{ return K_SHOW; }
 LABEL			{ return K_LABEL; }
 NOWAIT			{ return K_NOWAIT; }
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index b811a5c0ef..329ff97ffc 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -232,6 +232,7 @@ static void XLogSendLogical(void);
 static void WalSndDone(WalSndSendDataCallback send_data);
 static XLogRecPtr GetStandbyFlushRecPtr(void);
 static void IdentifySystem(void);
+static void ReadReplicationSlot(ReadReplicationSlotCmd *cmd);
 static void CreateReplicationSlot(CreateReplicationSlotCmd *cmd);
 static void DropReplicationSlot(DropReplicationSlotCmd *cmd);
 static void StartReplication(StartReplicationCmd *cmd);
@@ -457,6 +458,85 @@ IdentifySystem(void)
 	end_tup_output(tstate);
 }
 
+/* Handle READ_REPLICATION_SLOT command */
+static void
+ReadReplicationSlot(ReadReplicationSlotCmd *cmd)
+{
+#define READ_REPLICATION_SLOT_COLS 3
+	ReplicationSlot *slot;
+	DestReceiver *dest;
+	TupOutputState *tstate;
+	TupleDesc	tupdesc;
+	Datum		values[READ_REPLICATION_SLOT_COLS];
+	bool		nulls[READ_REPLICATION_SLOT_COLS];
+
+	tupdesc = CreateTemplateTupleDesc(READ_REPLICATION_SLOT_COLS);
+	TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 1, "slot_type",
+							  TEXTOID, -1, 0);
+	TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 2, "restart_lsn",
+							  TEXTOID, -1, 0);
+	TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 3, "restart_tli",
+							  INT4OID, -1, 0);
+
+	MemSet(nulls, true, READ_REPLICATION_SLOT_COLS * sizeof(bool));
+
+	LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
+	slot = SearchNamedReplicationSlot(cmd->slotname, false);
+	if (slot == NULL || !slot->in_use)
+	{
+		LWLockRelease(ReplicationSlotControlLock);
+	}
+	else
+	{
+		List	   *timeline_history = NIL;
+		ReplicationSlot slot_contents;
+		int			i = 0;
+		char		xloc[MAXFNAMELEN];
+		TimeLineID	slots_position_timeline;
+
+		/* Copy slot contents while holding spinlock */
+		SpinLockAcquire(&slot->mutex);
+		slot_contents = *slot;
+		SpinLockRelease(&slot->mutex);
+		LWLockRelease(ReplicationSlotControlLock);
+
+		if (OidIsValid(slot_contents.data.database))
+			elog(ERROR, "READ_REPLICATION_SLOT is only supported for physical slots");
+		values[i] = CStringGetTextDatum("physical");
+		nulls[i] = false;
+		i++;
+		if (!XLogRecPtrIsInvalid(slot_contents.data.restart_lsn))
+		{
+			snprintf(xloc, sizeof(xloc), "%X/%X",
+					 LSN_FORMAT_ARGS(slot_contents.data.restart_lsn));
+			values[i] = CStringGetTextDatum(xloc);
+			nulls[i] = false;
+		}
+		i++;
+
+		/*
+		 * Now get the timeline this wal was produced on, to get to the
+		 * current timeline
+		 */
+		if (!XLogRecPtrIsInvalid(slot_contents.data.restart_lsn))
+		{
+			timeline_history = readTimeLineHistory(ThisTimeLineID);
+			slots_position_timeline = tliOfPointInHistory(slot_contents.data.restart_lsn,
+														  timeline_history);
+			values[i] = Int32GetDatum(slots_position_timeline);
+			nulls[i] = false;
+		}
+		i++;
+
+		Assert(i == READ_REPLICATION_SLOT_COLS);
+	}
+
+	dest = CreateDestReceiver(DestRemoteSimple);
+	tstate = begin_tup_output_tupdesc(dest, tupdesc, &TTSOpsVirtual);
+	do_tup_output(tstate, values, nulls);
+	end_tup_output(tstate);
+}
+
 
 /*
  * Handle TIMELINE_HISTORY command.
@@ -1622,6 +1702,13 @@ exec_replication_command(const char *cmd_string)
 			EndReplicationCommand(cmdtag);
 			break;
 
+		case T_ReadReplicationSlotCmd:
+			cmdtag = "READ_REPLICATION_SLOT";
+			set_ps_display(cmdtag);
+			ReadReplicationSlot((ReadReplicationSlotCmd *) cmd_node);
+			EndReplicationCommand(cmdtag);
+			break;
+
 		case T_BaseBackupCmd:
 			cmdtag = "BASE_BACKUP";
 			set_ps_display(cmdtag);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index e0057daa06..6201940637 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -493,6 +493,7 @@ typedef enum NodeTag
 	 * TAGS FOR REPLICATION GRAMMAR PARSE NODES (replnodes.h)
 	 */
 	T_IdentifySystemCmd,
+	T_ReadReplicationSlotCmd,
 	T_BaseBackupCmd,
 	T_CreateReplicationSlotCmd,
 	T_DropReplicationSlotCmd,
diff --git a/src/include/nodes/replnodes.h b/src/include/nodes/replnodes.h
index faa3a251f2..46384ea074 100644
--- a/src/include/nodes/replnodes.h
+++ b/src/include/nodes/replnodes.h
@@ -33,6 +33,16 @@ typedef struct IdentifySystemCmd
 	NodeTag		type;
 } IdentifySystemCmd;
 
+/* ----------------------
+ *		READ_REPLICATION_SLOT command
+ * ----------------------
+ */
+typedef struct ReadReplicationSlotCmd
+{
+	NodeTag		type;
+	char	   *slotname;
+} ReadReplicationSlotCmd;
+
 
 /* ----------------------
  *		BASE_BACKUP command
diff --git a/src/test/recovery/t/001_stream_rep.pl b/src/test/recovery/t/001_stream_rep.pl
index 9916a36012..7d4f01d909 100644
--- a/src/test/recovery/t/001_stream_rep.pl
+++ b/src/test/recovery/t/001_stream_rep.pl
@@ -6,7 +6,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 49;
+use Test::More tests => 55;
 
 # Initialize primary node
 my $node_primary = PostgresNode->new('primary');
@@ -254,6 +254,51 @@ ok( $ret == 0,
 	"SHOW with superuser-settable parameter, replication role and logical replication"
 );
 
+note "testing READ_REPLICATION_SLOT command";
+
+my $slotname = 'test_read_replication_slot_physical';
+
+($ret, $stdout, $stderr) = $node_primary->psql(
+	'postgres',
+	'READ_REPLICATION_SLOT non_existent_slot;',
+	extra_params => [ '-d', $connstr_rep ]);
+ok( $ret == 0,
+	"READ_REPLICATION_SLOT does not produce an error with non existent slot");
+ok($stdout eq '||',
+	"READ_REPLICATION_SLOT returns NULL values if slot does not exist");
+
+($ret, $stdout, $stderr) = $node_primary->psql(
+	'postgres',
+	"CREATE_REPLICATION_SLOT $slotname PHYSICAL RESERVE_WAL;",
+	extra_params => [ '-d', $connstr_rep ],
+	0,
+	'physical slot created on primary');
+
+($ret, $stdout, $stderr) = $node_primary->psql(
+	'postgres',
+	"READ_REPLICATION_SLOT $slotname;",
+	extra_params => [ '-d', $connstr_rep ]);
+ok($ret == 0,
+	"READ_REPLICATION_SLOT does not produce an error with existing slot");
+ok($stdout =~ 'physical\|[^|]*\|1',
+	"READ_REPLICATION_SLOT returns tuple corresponding to the slot");
+
+$node_primary->psql(
+	'postgres',
+	"DROP_REPLICATION_SLOT $slotname;",
+	extra_params => [ '-d', $connstr_rep ],
+	0,
+	'physical slot dropped on primary');
+
+($ret, $stdout, $stderr) = $node_primary->psql(
+	'postgres',
+	"READ_REPLICATION_SLOT $slotname;",
+	extra_params => [ '-d', $connstr_rep ]);
+ok($ret == 0,
+	"READ_REPLICATION_SLOT does not produce an error with dropped slot");
+ok($stdout eq '||',
+	"READ_REPLICATION_SLOT returns NULL values if slot has been dropped");
+
 note "switching to physical replication slot";
 
 # Switch to using a physical replication slot. We can do this without a new
diff --git a/src/test/recovery/t/006_logical_decoding.pl b/src/test/recovery/t/006_logical_decoding.pl
index cc116062c2..802375519b 100644
--- a/src/test/recovery/t/006_logical_decoding.pl
+++ b/src/test/recovery/t/006_logical_decoding.pl
@@ -10,7 +10,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 14;
+use Test::More tests => 15;
 use Config;
 
 # Initialize primary node
@@ -39,6 +39,13 @@ ok( $stderr =~
 	  m/replication slot "test_slot" was not created in this database/,
 	"Logical decoding correctly fails to start");
 
+($result, $stdout, $stderr) = $node_primary->psql(
+	'template1',
+	qq[READ_REPLICATION_SLOT test_slot;],
+	replication => 'database');
+ok($stderr=~ 'READ_REPLICATION_SLOT is only supported for physical slots',
+	'Logical replication slot is not supported');
+
 # Check case of walsender not using a database connection.  Logical
 # decoding should not be allowed.
 ($result, $stdout, $stderr) = $node_primary->psql(
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index cb5b5ec74c..a8e4a2afd6 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2128,6 +2128,7 @@ ReadBufferMode
 ReadBytePtrType
 ReadExtraTocPtrType
 ReadFunc
+ReadReplicationSlotCmd
 ReassignOwnedStmt
 RecheckForeignScan_function
 RecordCacheEntry
-- 
2.33.0

v6-0003-Add-documentation-for-pg_receivewal.patchtext/x-patch; charset=UTF-8; name=v6-0003-Add-documentation-for-pg_receivewal.patchDownload
From 68ed56a86b0da17cadfddd939c8e2430b59c4f53 Mon Sep 17 00:00:00 2001
From: Ronan Dunklau <ronan.dunklau@aiven.io>
Date: Mon, 6 Sep 2021 11:08:54 +0200
Subject: [PATCH v6 3/3] Add documentation for pg_receivewal

---
 doc/src/sgml/ref/pg_receivewal.sgml | 33 +++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/doc/src/sgml/ref/pg_receivewal.sgml b/doc/src/sgml/ref/pg_receivewal.sgml
index 45b544cf49..46e141bc21 100644
--- a/doc/src/sgml/ref/pg_receivewal.sgml
+++ b/doc/src/sgml/ref/pg_receivewal.sgml
@@ -37,6 +37,34 @@ PostgreSQL documentation
    <xref linkend="continuous-archiving"/>).
   </para>
 
+  <para>
+    When <application>pg_receivewal</application> is launched, it tries to determine the
+    starting LSN from the following sources:
+    <orderedlist>
+      <listitem>
+      <para>
+      The target directory is scanned to determine what is the next file we
+      should archive.
+      </para>
+      </listitem>
+
+      <listitem>
+      <para>
+      If nothing is found and a slot is specified, use the
+      <command>READ_REPLICATION_SLOT</command>
+      command.
+      </para>
+      </listitem>
+
+      <listitem>
+      <para>
+      If we still don't have a valid start position, use the last flush
+      location as reported by the server.
+      </para>
+      </listitem>
+    </orderedlist>
+  </para>
+
   <para>
    <application>pg_receivewal</application> streams the write-ahead
    log in real time as it's being generated on the server, and does not wait
@@ -198,6 +226,11 @@ PostgreSQL documentation
          satisfactorily.  The option <literal>--synchronous</literal> (see
          below) must be specified in addition to make this work correctly.
         </para>
+
+        <para>
+        The slot's <literal>restart_lsn</literal> can also be used as a starting
+        point if the target directory is empty.
+        </para>
       </listitem>
      </varlistentry>
 
-- 
2.33.0

v6-0002-Use-READ_REPLICATION_SLOT-command-in-pg_receivewa.patchtext/x-patch; charset=UTF-8; name=v6-0002-Use-READ_REPLICATION_SLOT-command-in-pg_receivewa.patchDownload
From b4e3b1f4341321c3c5d013e3ff15ff4f7dab58d7 Mon Sep 17 00:00:00 2001
From: Ronan Dunklau <ronan.dunklau@aiven.io>
Date: Wed, 1 Sep 2021 15:52:32 +0200
Subject: [PATCH v6 2/3] Use READ_REPLICATION_SLOT command in pg_receivewal.

Prior to this patch, when running pg_receivewal, the start LSN is determined by looking at
the WAL files currently stored on disk, then using the current flush lsn
from the server.

If for some reason the WAL files from pg_receivewal were moved, we want
to restart where we left at, which is the replication slot's restart_lsn
instead of skipping right to the current flush location.

To keep compatibility with prior server versions, we only attempt it if
the version is < 15.
---
 src/bin/pg_basebackup/pg_receivewal.c        | 20 +++++-
 src/bin/pg_basebackup/streamutil.c           | 73 ++++++++++++++++++++
 src/bin/pg_basebackup/streamutil.h           |  1 +
 src/bin/pg_basebackup/t/020_pg_receivewal.pl | 51 +++++++++++++-
 4 files changed, 141 insertions(+), 4 deletions(-)

diff --git a/src/bin/pg_basebackup/pg_receivewal.c b/src/bin/pg_basebackup/pg_receivewal.c
index d5140a79fe..1128389839 100644
--- a/src/bin/pg_basebackup/pg_receivewal.c
+++ b/src/bin/pg_basebackup/pg_receivewal.c
@@ -409,10 +409,26 @@ StreamLog(void)
 	stream.startpos = FindStreamingStart(&stream.timeline);
 	if (stream.startpos == InvalidXLogRecPtr)
 	{
-		stream.startpos = serverpos;
-		stream.timeline = servertli;
+		/* Try to get it from the slot if any, and the server supports it */
+		if (replication_slot != NULL)
+		{
+			/* Error is logged by GetSlotInformation, so just return. */
+			if (!GetSlotInformation(conn, replication_slot, &stream.startpos, &stream.timeline))
+				return;
+		}
+
+		/*
+		 * If it is still unknown, use the current flush value from the server
+		 */
+		if (stream.startpos == InvalidXLogRecPtr)
+		{
+			stream.startpos = serverpos;
+			stream.timeline = servertli;
+		}
 	}
 
+	Assert(stream.startpos != InvalidXLogRecPtr);
+
 	/*
 	 * Always start streaming at the beginning of a segment
 	 */
diff --git a/src/bin/pg_basebackup/streamutil.c b/src/bin/pg_basebackup/streamutil.c
index a9bc1ce214..807ec5d022 100644
--- a/src/bin/pg_basebackup/streamutil.c
+++ b/src/bin/pg_basebackup/streamutil.c
@@ -479,6 +479,79 @@ RunIdentifySystem(PGconn *conn, char **sysid, TimeLineID *starttli,
 	return true;
 }
 
+
+/*
+ * Returns wether a replication slot exists through a given connection,
+ * and fills in the slot_info with the results if passed by the caller.
+ */
+bool
+GetSlotInformation(PGconn *conn, const char *slot_name, XLogRecPtr *restart_lsn, TimeLineID* restart_tli)
+{
+	PGresult   *res;
+	PQExpBuffer query;
+
+	Assert(slot_name != NULL);
+
+	if (PQserverVersion(conn) < 150000)
+		return false;
+
+
+	query = createPQExpBuffer();
+	appendPQExpBuffer(query, "READ_REPLICATION_SLOT %s", slot_name);
+	res = PQexec(conn, query->data);
+	destroyPQExpBuffer(query);
+	/* The commpand should always return precisely one tuple */
+	if (PQresultStatus(res) != PGRES_TUPLES_OK || PQntuples(res) != 1)
+	{
+		pg_log_error("could not read replication slot : %s",
+					 PQerrorMessage(conn));
+		PQclear(res);
+		return false;
+	}
+
+	/*
+	 * When the slot doesn't exist, the command returns an all-null tuple. The
+	 * first column (slot_type) will only be null if the slot doesn't exists.
+	 */
+	if (PQgetisnull(res, 0, 0))
+	{
+		PQclear(res);
+		pg_log_error("replication slot \"%s\" does not exist", slot_name);
+		return false;
+	}
+	if (PQntuples(res) != 1 || PQnfields(res) < 3)
+	{
+		pg_log_error("could not fetch replication slot: got %d rows and %d fields, expected %d rows and %d or more fields",
+					 PQntuples(res), PQnfields(res), 1, 3);
+		PQclear(res);
+		return false;
+	}
+
+	/* Restart LSN */
+	if (PQgetisnull(res, 0, 1))
+		*restart_lsn = InvalidXLogRecPtr;
+	else
+	{
+		uint32		hi,
+					lo;
+
+		if (sscanf(PQgetvalue(res, 0, 1), "%X/%X", &hi, &lo) != 2)
+		{
+			pg_log_error("could not parse slot's restart_lsn \"%s\"",
+						 PQgetvalue(res, 0, 1));
+			PQclear(res);
+			return false;
+		}
+		*restart_lsn = ((uint64) hi) << 32 | lo;
+	}
+	if (PQgetisnull(res, 0, 2))
+		*restart_tli = 0;
+	else
+		*restart_tli = atoi(PQgetvalue(res, 0, 2));
+	PQclear(res);
+	return true;
+}
+
 /*
  * Create a replication slot for the given connection. This function
  * returns true in case of success.
diff --git a/src/bin/pg_basebackup/streamutil.h b/src/bin/pg_basebackup/streamutil.h
index 65135c79e0..907457b8ae 100644
--- a/src/bin/pg_basebackup/streamutil.h
+++ b/src/bin/pg_basebackup/streamutil.h
@@ -52,6 +52,7 @@ extern void AppendIntegerCommandOption(PQExpBuffer buf,
 									   bool use_new_option_syntax,
 									   char *option_name, int32 option_value);
 
+extern bool GetSlotInformation(PGconn *conn, const char *slot_name, XLogRecPtr *restart_lsn, TimeLineID *restart_tli);
 extern bool RetrieveWalSegSize(PGconn *conn);
 extern TimestampTz feGetCurrentTimestamp(void);
 extern void feTimestampDifference(TimestampTz start_time, TimestampTz stop_time,
diff --git a/src/bin/pg_basebackup/t/020_pg_receivewal.pl b/src/bin/pg_basebackup/t/020_pg_receivewal.pl
index 0b33d73900..e79fd9d7f0 100644
--- a/src/bin/pg_basebackup/t/020_pg_receivewal.pl
+++ b/src/bin/pg_basebackup/t/020_pg_receivewal.pl
@@ -5,7 +5,7 @@ use strict;
 use warnings;
 use TestLib;
 use PostgresNode;
-use Test::More tests => 27;
+use Test::More tests => 32;
 
 program_help_ok('pg_receivewal');
 program_version_ok('pg_receivewal');
@@ -49,7 +49,7 @@ is($primary->slot($slot_name)->{'slot_type'},
 
 # Generate some WAL.  Use --synchronous at the same time to add more
 # code coverage.  Switch to the next segment first so that subsequent
-# restarts of pg_receivewal will see this segment as full..
+# restarts of pg_receivewal will see this segment as full
 $primary->psql('postgres', 'CREATE TABLE test_table(x integer);');
 $primary->psql('postgres', 'SELECT pg_switch_wal();');
 my $nextlsn =
@@ -146,6 +146,53 @@ $primary->command_ok(
 $partial_wals[0] =~ s/(\.gz)?.partial//;
 ok(-e $partial_wals[0], "check that previously partial WAL is now complete");
 
+# Verify that if we use a replication slot, we resume where we left even in the
+# absence of WALs
+
+# Setup the slot, and connect to it a first time
+$primary->run_log(
+	[ 'pg_receivewal', '--slot', $slot_name, '--create-slot' ],
+	'creating a replication slot');
+$primary->psql('postgres',
+	'INSERT INTO test_table VALUES (generate_series(1,100));');
+$primary->psql('postgres', 'SELECT pg_switch_wal();');
+$nextlsn =
+  $primary->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();');
+chomp($nextlsn);
+$primary->run_log(
+	[ 'pg_receivewal', '-D', $stream_dir, '--slot', $slot_name, '--verbose', '--endpos', $nextlsn ],
+	"streaming some WAL");
+
+# Get the slot restart_lsn and make sure we retrieve the associated WAL file
+# even after deletion of the previous stored WAL files.
+$slot = $primary->slot($slot_name);
+my $restart_lsn = $slot->{'restart_lsn'};
+# Add one so that the restart_lsn doesn't correspond to the previous file.
+$restart_lsn =~ s/.$/1/;
+my $walfile_to_be_archived = $primary->safe_psql('postgres',
+	"SELECT pg_walfile_name('$restart_lsn');");
+isnt($restart_lsn, '', 'restart LSN of new slot is not null');
+
+unlink glob "'${stream_dir}/*'";
+
+$primary->psql('postgres',
+	'INSERT INTO test_table VALUES (generate_series(1,100));');
+$primary->psql('postgres', 'SELECT pg_switch_wal();');
+$nextlsn =
+  $primary->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();');
+chomp($nextlsn);
+# First verify what happens if we try with a non-existing slot.
+$primary->command_fails_like(
+  [ 'pg_receivewal', '-D', $stream_dir, '--slot', 'nonexistentslot', '-n', '--verbose', '--endpos', $nextlsn ],
+  qr/pg_receivewal: error: replication slot "nonexistentslot" does not exist/,
+  'pg_receivewal fails with a non-existing slot');
+$primary->command_ok(
+	[ 'pg_receivewal', '-D', $stream_dir, '--slot', $slot_name, '--verbose', '--endpos', $nextlsn ],
+	"Stream some wal starting from the slot's restart_lsn");
+$slot = $primary->slot($slot_name);
+my @walfiles = glob "${stream_dir}/*";
+ok(-e "$stream_dir/$walfile_to_be_archived", "WAL from the slot's restart_lsn has been archived");
+
 # Permissions on WAL files should be default
 SKIP:
 {
-- 
2.33.0

#31Michael Paquier
michael@paquier.xyz
In reply to: Ronan Dunklau (#30)
Re: pg_receivewal starting position

On Tue, Oct 19, 2021 at 05:32:55PM +0200, Ronan Dunklau wrote:

Following recommendations, I stripped most of the features from the patch. For
now we support only physical replication slots, and only provide the two fields
of interest (restart_lsn, restart_tli) in addition to the slot type (fixed at
physical) to not paint ourselves in a corner.

I also removed the part about pg_basebackup since other fixes have been
proposed for that case.

Patch 0001 looks rather clean. I have a couple of comments.

+       if (OidIsValid(slot_contents.data.database))
+           elog(ERROR, "READ_REPLICATION_SLOT is only supported for physical slots");

elog() can only be used for internal errors. Errors that can be
triggered by a user should use ereport() instead.

+ok($stdout eq '||',
+   "READ_REPLICATION_SLOT returns NULL values if slot does not exist");
[...]
+ok($stdout =~ 'physical\|[^|]*\|1',
+   "READ_REPLICATION_SLOT returns tuple corresponding to the slot");
Isn't result pattern matching something we usually test with like()?
+($ret, $stdout, $stderr) = $node_primary->psql(
+   'postgres',
+   "READ_REPLICATION_SLOT $slotname;",
+   extra_params => [ '-d', $connstr_rep ]);
No need for extra_params in this test.  You can just pass down
"replication => 1" instead, no?
--- a/src/test/recovery/t/006_logical_decoding.pl
+++ b/src/test/recovery/t/006_logical_decoding.pl
[...]
+ok($stderr=~ 'READ_REPLICATION_SLOT is only supported for physical slots',
+   'Logical replication slot is not supported');
This one should use like().
+        <para>
+        The slot's <literal>restart_lsn</literal> can also beused as a starting
+        point if the target directory is empty.
+        </para>
I am not sure that there is a need for this addition as the same thing
is said when describing the lookup ordering.
+      If nothing is found and a slot is specified, use the
+      <command>READ_REPLICATION_SLOT</command>
+      command.
It may be clearer to say that the position is retrieved from the
command.
+bool
+GetSlotInformation(PGconn *conn, const char *slot_name, XLogRecPtr
*restart_lsn, TimeLineID* restart_tli)
+{
Could you extend that so as we still run the command but don't crash
if the caller specifies NULL for any of the result fields?  This would
be handy.
+   if (PQgetisnull(res, 0, 0))
+   {
+       PQclear(res);
+       pg_log_error("replication slot \"%s\" does not exist",
slot_name);
+       return false;
+   }
+   if (PQntuples(res) != 1 || PQnfields(res) < 3)
+   {
+       pg_log_error("could not fetch replication slot: got %d rows
and %d fields, expected %d rows and %d or more fields",
+                    PQntuples(res), PQnfields(res), 1, 3);
+       PQclear(res);
+       return false;
+   }
Wouldn't it be better to reverse the order of these two checks?

I don't mind the addition of the slot type being part of the result of
READ_REPLICATION_SLOT even if it is not mandatory (?), but at least
GetSlotInformation() should check after it.

+# Setup the slot, and connect to it a first time
+$primary->run_log(
+   [ 'pg_receivewal', '--slot', $slot_name, '--create-slot' ],
+   'creating a replication slot');
+$primary->psql('postgres',
+   'INSERT INTO test_table VALUES (generate_series(1,100));');
+$primary->psql('postgres', 'SELECT pg_switch_wal();');
+$nextlsn =
+  $primary->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();');
+chomp($nextlsn);
Wouldn't it be simpler to use CREATE_REPLICATION_SLOT with RESERVE_WAL
here, rather than going through pg_receivewal?  It seems to me that
this would be cheaper without really impacting the coverage.
--
Michael
#32Ronan Dunklau
ronan.dunklau@aiven.io
In reply to: Michael Paquier (#31)
3 attachment(s)
Re: pg_receivewal starting position

Le mercredi 20 octobre 2021, 07:13:15 CEST Michael Paquier a écrit :

On Tue, Oct 19, 2021 at 05:32:55PM +0200, Ronan Dunklau wrote:

Following recommendations, I stripped most of the features from the patch.
For now we support only physical replication slots, and only provide the
two fields of interest (restart_lsn, restart_tli) in addition to the slot
type (fixed at physical) to not paint ourselves in a corner.

I also removed the part about pg_basebackup since other fixes have been
proposed for that case.

Patch 0001 looks rather clean. I have a couple of comments.

Thank you for the quick review !

+       if (OidIsValid(slot_contents.data.database))
+           elog(ERROR, "READ_REPLICATION_SLOT is only supported for
physical slots");

elog() can only be used for internal errors. Errors that can be
triggered by a user should use ereport() instead.

Ok.

+ok($stdout eq '||',
+   "READ_REPLICATION_SLOT returns NULL values if slot does not exist");
[...]
+ok($stdout =~ 'physical\|[^|]*\|1',
+   "READ_REPLICATION_SLOT returns tuple corresponding to the slot");
Isn't result pattern matching something we usually test with like()?

Ok.

+($ret, $stdout, $stderr) = $node_primary->psql(
+   'postgres',
+   "READ_REPLICATION_SLOT $slotname;",
+   extra_params => [ '-d', $connstr_rep ]);
No need for extra_params in this test.  You can just pass down
"replication => 1" instead, no?

In that test file, every replication connection is obtained by using
connstr_rep so I thought it would be best to use the same thing.

--- a/src/test/recovery/t/006_logical_decoding.pl
+++ b/src/test/recovery/t/006_logical_decoding.pl
[...]
+ok($stderr=~ 'READ_REPLICATION_SLOT is only supported for physical slots',
+   'Logical replication slot is not supported');
This one should use like().

Ok.

+        <para>
+        The slot's <literal>restart_lsn</literal> can also beused as a
starting +        point if the target directory is empty.
+        </para>
I am not sure that there is a need for this addition as the same thing
is said when describing the lookup ordering.

Ok, removed.

+      If nothing is found and a slot is specified, use the
+      <command>READ_REPLICATION_SLOT</command>
+      command.
It may be clearer to say that the position is retrieved from the
command.

Ok, done. The doc also uses the active voice here now.

+bool
+GetSlotInformation(PGconn *conn, const char *slot_name, XLogRecPtr
*restart_lsn, TimeLineID* restart_tli)
+{
Could you extend that so as we still run the command but don't crash
if the caller specifies NULL for any of the result fields?  This would
be handy.

Done.

+   if (PQgetisnull(res, 0, 0))
+   {
+       PQclear(res);
+       pg_log_error("replication slot \"%s\" does not exist",
slot_name);
+       return false;
+   }
+   if (PQntuples(res) != 1 || PQnfields(res) < 3)
+   {
+       pg_log_error("could not fetch replication slot: got %d rows
and %d fields, expected %d rows and %d or more fields",
+                    PQntuples(res), PQnfields(res), 1, 3);
+       PQclear(res);
+       return false;
+   }
Wouldn't it be better to reverse the order of these two checks?

Yes it is, and the PQntuples condition should be removed from the first error
test.

I don't mind the addition of the slot type being part of the result of
READ_REPLICATION_SLOT even if it is not mandatory (?), but at least
GetSlotInformation() should check after it.

Ok.

+# Setup the slot, and connect to it a first time
+$primary->run_log(
+   [ 'pg_receivewal', '--slot', $slot_name, '--create-slot' ],
+   'creating a replication slot');
+$primary->psql('postgres',
+   'INSERT INTO test_table VALUES (generate_series(1,100));');
+$primary->psql('postgres', 'SELECT pg_switch_wal();');
+$nextlsn =
+  $primary->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();');
+chomp($nextlsn);
Wouldn't it be simpler to use CREATE_REPLICATION_SLOT with RESERVE_WAL
here, rather than going through pg_receivewal?  It seems to me that
this would be cheaper without really impacting the coverage.

You're right, we can skip two invocations of pg_receivewal like this (for the
slot creation + for starting the slot a first time).

--
Ronan Dunklau

Attachments:

v7-0003-Add-documentation-for-pg_receivewal.patchtext/x-patch; charset=UTF-8; name=v7-0003-Add-documentation-for-pg_receivewal.patchDownload
From 845b2fdd33318f0d7528ef0ecbb69e4aecf43c46 Mon Sep 17 00:00:00 2001
From: Ronan Dunklau <ronan.dunklau@aiven.io>
Date: Mon, 6 Sep 2021 11:08:54 +0200
Subject: [PATCH v7 3/3] Add documentation for pg_receivewal

---
 doc/src/sgml/ref/pg_receivewal.sgml | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/doc/src/sgml/ref/pg_receivewal.sgml b/doc/src/sgml/ref/pg_receivewal.sgml
index 45b544cf49..5fb2b61d34 100644
--- a/doc/src/sgml/ref/pg_receivewal.sgml
+++ b/doc/src/sgml/ref/pg_receivewal.sgml
@@ -37,6 +37,32 @@ PostgreSQL documentation
    <xref linkend="continuous-archiving"/>).
   </para>
 
+  <para>
+    When <application>pg_receivewal</application> is launched, it tries to determine the
+    starting LSN from the following sources:
+    <orderedlist>
+      <listitem>
+      <para>
+      It scans the target directory to determine the next file it should archive.
+      </para>
+      </listitem>
+
+      <listitem>
+      <para>
+      If it doesn't find that file, and a slot was specified, it uses the
+      <command>READ_REPLICATION_SLOT</command> command to retrieve the slot's <literal>restart_lsn</literal>
+      </para>
+      </listitem>
+
+      <listitem>
+      <para>
+      If it still don't find a valid start position via <command>READ_REPLICATION_SLOT</command>,
+      it uses the last flush location as reported by the server.
+      </para>
+      </listitem>
+    </orderedlist>
+  </para>
+
   <para>
    <application>pg_receivewal</application> streams the write-ahead
    log in real time as it's being generated on the server, and does not wait
-- 
2.33.0

v7-0001-Add-READ_REPLICATION_SLOT-command.patchtext/x-patch; charset=UTF-8; name=v7-0001-Add-READ_REPLICATION_SLOT-command.patchDownload
From 5a47f17a17594cc171f744ce383ba820d44b6446 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Thu, 2 Sep 2021 16:25:25 +0900
Subject: [PATCH v7 1/3] Add READ_REPLICATION_SLOT command

---
 doc/src/sgml/protocol.sgml                  | 47 +++++++++++
 src/backend/replication/repl_gram.y         | 16 +++-
 src/backend/replication/repl_scanner.l      |  1 +
 src/backend/replication/walsender.c         | 89 +++++++++++++++++++++
 src/include/nodes/nodes.h                   |  1 +
 src/include/nodes/replnodes.h               | 10 +++
 src/test/recovery/t/001_stream_rep.pl       | 47 ++++++++++-
 src/test/recovery/t/006_logical_decoding.pl |  9 ++-
 src/tools/pgindent/typedefs.list            |  1 +
 9 files changed, 218 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index b95cc88599..51a15cc3da 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2067,6 +2067,53 @@ The commands accepted in replication mode are:
     </listitem>
   </varlistentry>
 
+  <varlistentry>
+    <term><literal>READ_REPLICATION_SLOT</literal> <replaceable class="parameter">slot_name</replaceable>
+      <indexterm><primary>READ_REPLICATION_SLOT</primary></indexterm>
+    </term>
+    <listitem>
+     <para>
+      Read the information of a replication slot. Returns a tuple with
+      <literal>NULL</literal> values if the replication slot does not
+      exist. This command is currently only supported for <literal>physical</literal> slots.
+     </para>
+     <para>
+      In response to this command, the server will return a one-row result set,
+      containing the following fields:
+      <variablelist>
+       <varlistentry>
+        <term><literal>slot_type</literal> (<type>text</type>)</term>
+        <listitem>
+         <para>
+          The replication slot's type, either <literal>physical</literal> or
+          <literal>NULL</literal>.
+         </para>
+        </listitem>
+       </varlistentry>
+
+       <varlistentry>
+        <term><literal>restart_lsn</literal> (<type>text</type>)</term>
+        <listitem>
+         <para>
+          The replication slot's <literal>restart_lsn</literal>.
+         </para>
+        </listitem>
+       </varlistentry>
+
+       <varlistentry>
+        <term><literal>restart_tli</literal> (<type>int4</type>)</term>
+        <listitem>
+         <para>
+          The timeline ID for the <literal>restart_lsn</literal> position,
+          when following the current timeline history.
+         </para>
+        </listitem>
+       </varlistentry>
+      </variablelist>
+     </para>
+    </listitem>
+  </varlistentry>
+
   <varlistentry>
     <term><literal>START_REPLICATION</literal> [ <literal>SLOT</literal> <replaceable class="parameter">slot_name</replaceable> ] [ <literal>PHYSICAL</literal> ] <replaceable class="parameter">XXX/XXX</replaceable> [ <literal>TIMELINE</literal> <replaceable class="parameter">tli</replaceable> ]
      <indexterm><primary>START_REPLICATION</primary></indexterm>
diff --git a/src/backend/replication/repl_gram.y b/src/backend/replication/repl_gram.y
index 126380e2df..913a99da5a 100644
--- a/src/backend/replication/repl_gram.y
+++ b/src/backend/replication/repl_gram.y
@@ -64,6 +64,7 @@ static SQLCmd *make_sqlcmd(void);
 /* Keyword tokens. */
 %token K_BASE_BACKUP
 %token K_IDENTIFY_SYSTEM
+%token K_READ_REPLICATION_SLOT
 %token K_SHOW
 %token K_START_REPLICATION
 %token K_CREATE_REPLICATION_SLOT
@@ -94,7 +95,7 @@ static SQLCmd *make_sqlcmd(void);
 %type <node>	command
 %type <node>	base_backup start_replication start_logical_replication
 				create_replication_slot drop_replication_slot identify_system
-				timeline_history show sql_cmd
+				read_replication_slot timeline_history show sql_cmd
 %type <list>	base_backup_legacy_opt_list generic_option_list
 %type <defelt>	base_backup_legacy_opt generic_option
 %type <uintval>	opt_timeline
@@ -120,6 +121,7 @@ opt_semicolon:	';'
 
 command:
 			identify_system
+			| read_replication_slot
 			| base_backup
 			| start_replication
 			| start_logical_replication
@@ -140,6 +142,18 @@ identify_system:
 				}
 			;
 
+/*
+ * READ_REPLICATION_SLOT %s
+ */
+read_replication_slot:
+			K_READ_REPLICATION_SLOT var_name
+				{
+					ReadReplicationSlotCmd *n = makeNode(ReadReplicationSlotCmd);
+					n->slotname = $2;
+					$$ = (Node *) n;
+				}
+			;
+
 /*
  * SHOW setting
  */
diff --git a/src/backend/replication/repl_scanner.l b/src/backend/replication/repl_scanner.l
index c038a636c3..1b599c255e 100644
--- a/src/backend/replication/repl_scanner.l
+++ b/src/backend/replication/repl_scanner.l
@@ -85,6 +85,7 @@ identifier		{ident_start}{ident_cont}*
 BASE_BACKUP			{ return K_BASE_BACKUP; }
 FAST			{ return K_FAST; }
 IDENTIFY_SYSTEM		{ return K_IDENTIFY_SYSTEM; }
+READ_REPLICATION_SLOT	{ return K_READ_REPLICATION_SLOT; }
 SHOW		{ return K_SHOW; }
 LABEL			{ return K_LABEL; }
 NOWAIT			{ return K_NOWAIT; }
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index b811a5c0ef..5d68a7e66a 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -232,6 +232,7 @@ static void XLogSendLogical(void);
 static void WalSndDone(WalSndSendDataCallback send_data);
 static XLogRecPtr GetStandbyFlushRecPtr(void);
 static void IdentifySystem(void);
+static void ReadReplicationSlot(ReadReplicationSlotCmd *cmd);
 static void CreateReplicationSlot(CreateReplicationSlotCmd *cmd);
 static void DropReplicationSlot(DropReplicationSlotCmd *cmd);
 static void StartReplication(StartReplicationCmd *cmd);
@@ -457,6 +458,87 @@ IdentifySystem(void)
 	end_tup_output(tstate);
 }
 
+/* Handle READ_REPLICATION_SLOT command */
+static void
+ReadReplicationSlot(ReadReplicationSlotCmd *cmd)
+{
+#define READ_REPLICATION_SLOT_COLS 3
+	ReplicationSlot *slot;
+	DestReceiver *dest;
+	TupOutputState *tstate;
+	TupleDesc	tupdesc;
+	Datum		values[READ_REPLICATION_SLOT_COLS];
+	bool		nulls[READ_REPLICATION_SLOT_COLS];
+
+	tupdesc = CreateTemplateTupleDesc(READ_REPLICATION_SLOT_COLS);
+	TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 1, "slot_type",
+							  TEXTOID, -1, 0);
+	TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 2, "restart_lsn",
+							  TEXTOID, -1, 0);
+	TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 3, "restart_tli",
+							  INT4OID, -1, 0);
+
+	MemSet(nulls, true, READ_REPLICATION_SLOT_COLS * sizeof(bool));
+
+	LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
+	slot = SearchNamedReplicationSlot(cmd->slotname, false);
+	if (slot == NULL || !slot->in_use)
+	{
+		LWLockRelease(ReplicationSlotControlLock);
+	}
+	else
+	{
+		List	   *timeline_history = NIL;
+		ReplicationSlot slot_contents;
+		int			i = 0;
+		char		xloc[MAXFNAMELEN];
+		TimeLineID	slots_position_timeline;
+
+		/* Copy slot contents while holding spinlock */
+		SpinLockAcquire(&slot->mutex);
+		slot_contents = *slot;
+		SpinLockRelease(&slot->mutex);
+		LWLockRelease(ReplicationSlotControlLock);
+
+		if (OidIsValid(slot_contents.data.database))
+			ereport(ERROR,
+					errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+					errmsg("READ_REPLICATION_SLOT is only supported for physical slots"));
+		values[i] = CStringGetTextDatum("physical");
+		nulls[i] = false;
+		i++;
+		if (!XLogRecPtrIsInvalid(slot_contents.data.restart_lsn))
+		{
+			snprintf(xloc, sizeof(xloc), "%X/%X",
+					 LSN_FORMAT_ARGS(slot_contents.data.restart_lsn));
+			values[i] = CStringGetTextDatum(xloc);
+			nulls[i] = false;
+		}
+		i++;
+
+		/*
+		 * Now get the timeline this wal was produced on, to get to the
+		 * current timeline
+		 */
+		if (!XLogRecPtrIsInvalid(slot_contents.data.restart_lsn))
+		{
+			timeline_history = readTimeLineHistory(ThisTimeLineID);
+			slots_position_timeline = tliOfPointInHistory(slot_contents.data.restart_lsn,
+														  timeline_history);
+			values[i] = Int32GetDatum(slots_position_timeline);
+			nulls[i] = false;
+		}
+		i++;
+
+		Assert(i == READ_REPLICATION_SLOT_COLS);
+	}
+
+	dest = CreateDestReceiver(DestRemoteSimple);
+	tstate = begin_tup_output_tupdesc(dest, tupdesc, &TTSOpsVirtual);
+	do_tup_output(tstate, values, nulls);
+	end_tup_output(tstate);
+}
+
 
 /*
  * Handle TIMELINE_HISTORY command.
@@ -1622,6 +1704,13 @@ exec_replication_command(const char *cmd_string)
 			EndReplicationCommand(cmdtag);
 			break;
 
+		case T_ReadReplicationSlotCmd:
+			cmdtag = "READ_REPLICATION_SLOT";
+			set_ps_display(cmdtag);
+			ReadReplicationSlot((ReadReplicationSlotCmd *) cmd_node);
+			EndReplicationCommand(cmdtag);
+			break;
+
 		case T_BaseBackupCmd:
 			cmdtag = "BASE_BACKUP";
 			set_ps_display(cmdtag);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index e0057daa06..6201940637 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -493,6 +493,7 @@ typedef enum NodeTag
 	 * TAGS FOR REPLICATION GRAMMAR PARSE NODES (replnodes.h)
 	 */
 	T_IdentifySystemCmd,
+	T_ReadReplicationSlotCmd,
 	T_BaseBackupCmd,
 	T_CreateReplicationSlotCmd,
 	T_DropReplicationSlotCmd,
diff --git a/src/include/nodes/replnodes.h b/src/include/nodes/replnodes.h
index faa3a251f2..46384ea074 100644
--- a/src/include/nodes/replnodes.h
+++ b/src/include/nodes/replnodes.h
@@ -33,6 +33,16 @@ typedef struct IdentifySystemCmd
 	NodeTag		type;
 } IdentifySystemCmd;
 
+/* ----------------------
+ *		READ_REPLICATION_SLOT command
+ * ----------------------
+ */
+typedef struct ReadReplicationSlotCmd
+{
+	NodeTag		type;
+	char	   *slotname;
+} ReadReplicationSlotCmd;
+
 
 /* ----------------------
  *		BASE_BACKUP command
diff --git a/src/test/recovery/t/001_stream_rep.pl b/src/test/recovery/t/001_stream_rep.pl
index 9916a36012..2317f55149 100644
--- a/src/test/recovery/t/001_stream_rep.pl
+++ b/src/test/recovery/t/001_stream_rep.pl
@@ -6,7 +6,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 49;
+use Test::More tests => 55;
 
 # Initialize primary node
 my $node_primary = PostgresNode->new('primary');
@@ -254,6 +254,51 @@ ok( $ret == 0,
 	"SHOW with superuser-settable parameter, replication role and logical replication"
 );
 
+note "testing READ_REPLICATION_SLOT command";
+
+my $slotname = 'test_read_replication_slot_physical';
+
+($ret, $stdout, $stderr) = $node_primary->psql(
+	'postgres',
+	'READ_REPLICATION_SLOT non_existent_slot;',
+	extra_params => [ '-d', $connstr_rep ]);
+ok( $ret == 0,
+	"READ_REPLICATION_SLOT does not produce an error with non existent slot");
+ok($stdout eq '||',
+	"READ_REPLICATION_SLOT returns NULL values if slot does not exist");
+
+($ret, $stdout, $stderr) = $node_primary->psql(
+	'postgres',
+	"CREATE_REPLICATION_SLOT $slotname PHYSICAL RESERVE_WAL;",
+	extra_params => [ '-d', $connstr_rep ],
+	0,
+	'physical slot created on primary');
+
+($ret, $stdout, $stderr) = $node_primary->psql(
+	'postgres',
+	"READ_REPLICATION_SLOT $slotname;",
+	extra_params => [ '-d', $connstr_rep ]);
+ok($ret == 0,
+	"READ_REPLICATION_SLOT does not produce an error with existing slot");
+like($stdout, qr/^physical\|[^|]*\|1$/,
+	"READ_REPLICATION_SLOT returns tuple corresponding to the slot");
+
+$node_primary->psql(
+	'postgres',
+	"DROP_REPLICATION_SLOT $slotname;",
+	extra_params => [ '-d', $connstr_rep ],
+	0,
+	'physical slot dropped on primary');
+
+($ret, $stdout, $stderr) = $node_primary->psql(
+	'postgres',
+	"READ_REPLICATION_SLOT $slotname;",
+	extra_params => [ '-d', $connstr_rep ]);
+ok($ret == 0,
+	"READ_REPLICATION_SLOT does not produce an error with dropped slot");
+ok($stdout eq '||',
+	"READ_REPLICATION_SLOT returns NULL values if slot has been dropped");
+
 note "switching to physical replication slot";
 
 # Switch to using a physical replication slot. We can do this without a new
diff --git a/src/test/recovery/t/006_logical_decoding.pl b/src/test/recovery/t/006_logical_decoding.pl
index cc116062c2..1e08afca74 100644
--- a/src/test/recovery/t/006_logical_decoding.pl
+++ b/src/test/recovery/t/006_logical_decoding.pl
@@ -10,7 +10,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 14;
+use Test::More tests => 15;
 use Config;
 
 # Initialize primary node
@@ -39,6 +39,13 @@ ok( $stderr =~
 	  m/replication slot "test_slot" was not created in this database/,
 	"Logical decoding correctly fails to start");
 
+($result, $stdout, $stderr) = $node_primary->psql(
+	'template1',
+	qq[READ_REPLICATION_SLOT test_slot;],
+	replication => 'database');
+like($stderr, qr/READ_REPLICATION_SLOT is only supported for physical slots/,
+	'Logical replication slot is not supported');
+
 # Check case of walsender not using a database connection.  Logical
 # decoding should not be allowed.
 ($result, $stdout, $stderr) = $node_primary->psql(
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index cb5b5ec74c..a8e4a2afd6 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2128,6 +2128,7 @@ ReadBufferMode
 ReadBytePtrType
 ReadExtraTocPtrType
 ReadFunc
+ReadReplicationSlotCmd
 ReassignOwnedStmt
 RecheckForeignScan_function
 RecordCacheEntry
-- 
2.33.0

v7-0002-Use-READ_REPLICATION_SLOT-command-in-pg_receivewa.patchtext/x-patch; charset=UTF-8; name=v7-0002-Use-READ_REPLICATION_SLOT-command-in-pg_receivewa.patchDownload
From 8065443324067a4b5a5da522a1a52f25a9b5f21a Mon Sep 17 00:00:00 2001
From: Ronan Dunklau <ronan.dunklau@aiven.io>
Date: Wed, 1 Sep 2021 15:52:32 +0200
Subject: [PATCH v7 2/3] Use READ_REPLICATION_SLOT command in pg_receivewal.

Prior to this patch, when running pg_receivewal, the start LSN is determined by looking at
the WAL files currently stored on disk, then using the current flush lsn
from the server.

If for some reason the WAL files from pg_receivewal were moved, we want
to restart where we left at, which is the replication slot's restart_lsn
instead of skipping right to the current flush location.

To keep compatibility with prior server versions, we only attempt it if
the version is < 15.
---
 src/bin/pg_basebackup/pg_receivewal.c        | 20 ++++-
 src/bin/pg_basebackup/streamutil.c           | 87 ++++++++++++++++++++
 src/bin/pg_basebackup/streamutil.h           |  1 +
 src/bin/pg_basebackup/t/020_pg_receivewal.pl | 44 +++++++++-
 4 files changed, 148 insertions(+), 4 deletions(-)

diff --git a/src/bin/pg_basebackup/pg_receivewal.c b/src/bin/pg_basebackup/pg_receivewal.c
index d5140a79fe..1128389839 100644
--- a/src/bin/pg_basebackup/pg_receivewal.c
+++ b/src/bin/pg_basebackup/pg_receivewal.c
@@ -409,10 +409,26 @@ StreamLog(void)
 	stream.startpos = FindStreamingStart(&stream.timeline);
 	if (stream.startpos == InvalidXLogRecPtr)
 	{
-		stream.startpos = serverpos;
-		stream.timeline = servertli;
+		/* Try to get it from the slot if any, and the server supports it */
+		if (replication_slot != NULL)
+		{
+			/* Error is logged by GetSlotInformation, so just return. */
+			if (!GetSlotInformation(conn, replication_slot, &stream.startpos, &stream.timeline))
+				return;
+		}
+
+		/*
+		 * If it is still unknown, use the current flush value from the server
+		 */
+		if (stream.startpos == InvalidXLogRecPtr)
+		{
+			stream.startpos = serverpos;
+			stream.timeline = servertli;
+		}
 	}
 
+	Assert(stream.startpos != InvalidXLogRecPtr);
+
 	/*
 	 * Always start streaming at the beginning of a segment
 	 */
diff --git a/src/bin/pg_basebackup/streamutil.c b/src/bin/pg_basebackup/streamutil.c
index a9bc1ce214..1257c7b536 100644
--- a/src/bin/pg_basebackup/streamutil.c
+++ b/src/bin/pg_basebackup/streamutil.c
@@ -479,6 +479,93 @@ RunIdentifySystem(PGconn *conn, char **sysid, TimeLineID *starttli,
 	return true;
 }
 
+
+/*
+ * Returns wether a replication slot exists through a given connection,
+ * and fills in the slot_info with the results if passed by the caller.
+ */
+bool
+GetSlotInformation(PGconn *conn, const char *slot_name, XLogRecPtr *restart_lsn, TimeLineID* restart_tli)
+{
+	PGresult   *res;
+	PQExpBuffer query;
+
+	Assert(slot_name != NULL);
+
+	if (PQserverVersion(conn) < 150000)
+		return false;
+
+
+	query = createPQExpBuffer();
+	appendPQExpBuffer(query, "READ_REPLICATION_SLOT %s", slot_name);
+	res = PQexec(conn, query->data);
+	destroyPQExpBuffer(query);
+	/* The commpand should always return precisely one tuple */
+	if (PQresultStatus(res) != PGRES_TUPLES_OK)
+	{
+		pg_log_error("could not read replication slot : %s",
+					 PQerrorMessage(conn));
+		PQclear(res);
+		return false;
+	}
+
+	if (PQntuples(res) != 1 || PQnfields(res) < 3)
+	{
+		pg_log_error("could not fetch replication slot: got %d rows and %d fields, expected %d rows and %d or more fields",
+					 PQntuples(res), PQnfields(res), 1, 3);
+		PQclear(res);
+		return false;
+	}
+
+	/*
+	 * When the slot doesn't exist, the command returns an all-null tuple. The
+	 * first column (slot_type) will only be null if the slot doesn't exists.
+	 */
+	if (PQgetisnull(res, 0, 0))
+	{
+		PQclear(res);
+		pg_log_error("replication slot \"%s\" does not exist", slot_name);
+		return false;
+	}
+
+	/* This cannot happen for now, but future versions may accept
+	 * READ_REPLICATION_SLOT for other slot types.
+	 */
+	if (strcmp(PQgetvalue(res, 0, 0), "physical") != 0)
+		pg_log_error("expected a physical slot, got type \"%s\" instead",
+					 PQgetvalue(res, 0, 0));
+
+	/* Restart LSN */
+	if (restart_lsn)
+	{
+		if (PQgetisnull(res, 0, 1))
+			*restart_lsn = InvalidXLogRecPtr;
+		else
+		{
+			uint32		hi,
+						lo;
+
+			if (sscanf(PQgetvalue(res, 0, 1), "%X/%X", &hi, &lo) != 2)
+			{
+				pg_log_error("could not parse slot's restart_lsn \"%s\"",
+							 PQgetvalue(res, 0, 1));
+				PQclear(res);
+				return false;
+			}
+			*restart_lsn = ((uint64) hi) << 32 | lo;
+		}
+	}
+	if (restart_tli)
+	{
+		if (PQgetisnull(res, 0, 2))
+			*restart_tli = 0;
+		else
+			*restart_tli = atoi(PQgetvalue(res, 0, 2));
+	}
+	PQclear(res);
+	return true;
+}
+
 /*
  * Create a replication slot for the given connection. This function
  * returns true in case of success.
diff --git a/src/bin/pg_basebackup/streamutil.h b/src/bin/pg_basebackup/streamutil.h
index 65135c79e0..907457b8ae 100644
--- a/src/bin/pg_basebackup/streamutil.h
+++ b/src/bin/pg_basebackup/streamutil.h
@@ -52,6 +52,7 @@ extern void AppendIntegerCommandOption(PQExpBuffer buf,
 									   bool use_new_option_syntax,
 									   char *option_name, int32 option_value);
 
+extern bool GetSlotInformation(PGconn *conn, const char *slot_name, XLogRecPtr *restart_lsn, TimeLineID *restart_tli);
 extern bool RetrieveWalSegSize(PGconn *conn);
 extern TimestampTz feGetCurrentTimestamp(void);
 extern void feTimestampDifference(TimestampTz start_time, TimestampTz stop_time,
diff --git a/src/bin/pg_basebackup/t/020_pg_receivewal.pl b/src/bin/pg_basebackup/t/020_pg_receivewal.pl
index 0b33d73900..bb7cef4f72 100644
--- a/src/bin/pg_basebackup/t/020_pg_receivewal.pl
+++ b/src/bin/pg_basebackup/t/020_pg_receivewal.pl
@@ -5,7 +5,7 @@ use strict;
 use warnings;
 use TestLib;
 use PostgresNode;
-use Test::More tests => 27;
+use Test::More tests => 32;
 
 program_help_ok('pg_receivewal');
 program_version_ok('pg_receivewal');
@@ -49,7 +49,7 @@ is($primary->slot($slot_name)->{'slot_type'},
 
 # Generate some WAL.  Use --synchronous at the same time to add more
 # code coverage.  Switch to the next segment first so that subsequent
-# restarts of pg_receivewal will see this segment as full..
+# restarts of pg_receivewal will see this segment as full
 $primary->psql('postgres', 'CREATE TABLE test_table(x integer);');
 $primary->psql('postgres', 'SELECT pg_switch_wal();');
 my $nextlsn =
@@ -146,6 +146,46 @@ $primary->command_ok(
 $partial_wals[0] =~ s/(\.gz)?.partial//;
 ok(-e $partial_wals[0], "check that previously partial WAL is now complete");
 
+# Verify that if we use a replication slot, we resume where we left even in the
+# absence of WALs
+unlink glob "'${stream_dir}/*'";
+# Setup the slot
+$primary->psql('', "CREATE_REPLICATION_SLOT $slot_name PHYSICAL (RESERVE_WAL)",
+  replication => 1);
+$primary->psql('postgres',
+	'INSERT INTO test_table VALUES (generate_series(1,100));');
+$primary->psql('postgres', 'SELECT pg_switch_wal();');
+$nextlsn =
+  $primary->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();');
+chomp($nextlsn);
+
+# Get the slot restart_lsn and make sure we retrieve the associated WAL file
+$slot = $primary->slot($slot_name);
+my $restart_lsn = $slot->{'restart_lsn'};
+# Add one so that the restart_lsn doesn't correspond to the previous file.
+$restart_lsn =~ s/.$/1/;
+my $walfile_to_be_archived = $primary->safe_psql('postgres',
+	"SELECT pg_walfile_name('$restart_lsn');");
+isnt($restart_lsn, '', 'restart LSN of new slot is not null');
+
+$primary->psql('postgres',
+	'INSERT INTO test_table VALUES (generate_series(1,100));');
+$primary->psql('postgres', 'SELECT pg_switch_wal();');
+$nextlsn =
+  $primary->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();');
+chomp($nextlsn);
+# First verify what happens if we try with a non-existing slot.
+$primary->command_fails_like(
+  [ 'pg_receivewal', '-D', $stream_dir, '--slot', 'nonexistentslot', '-n', '--verbose', '--endpos', $nextlsn ],
+  qr/pg_receivewal: error: replication slot "nonexistentslot" does not exist/,
+  'pg_receivewal fails with a non-existing slot');
+$primary->command_ok(
+	[ 'pg_receivewal', '-D', $stream_dir, '--slot', $slot_name, '--verbose', '--endpos', $nextlsn ],
+	"Stream some wal starting from the slot's restart_lsn");
+$slot = $primary->slot($slot_name);
+my @walfiles = glob "${stream_dir}/*";
+ok(-e "$stream_dir/$walfile_to_be_archived", "WAL from the slot's restart_lsn has been archived");
+
 # Permissions on WAL files should be default
 SKIP:
 {
-- 
2.33.0

#33Ronan Dunklau
ronan.dunklau@aiven.io
In reply to: Ronan Dunklau (#32)
4 attachment(s)
Re: pg_receivewal starting position

Le mercredi 20 octobre 2021 11:40:18 CEST, vous avez écrit :

+# Setup the slot, and connect to it a first time
+$primary->run_log(
+   [ 'pg_receivewal', '--slot', $slot_name, '--create-slot' ],
+   'creating a replication slot');
+$primary->psql('postgres',
+   'INSERT INTO test_table VALUES (generate_series(1,100));');
+$primary->psql('postgres', 'SELECT pg_switch_wal();');
+$nextlsn =
+  $primary->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();');
+chomp($nextlsn);
Wouldn't it be simpler to use CREATE_REPLICATION_SLOT with RESERVE_WAL
here, rather than going through pg_receivewal?  It seems to me that
this would be cheaper without really impacting the coverage.

You're right, we can skip two invocations of pg_receivewal like this (for
the slot creation + for starting the slot a first time).

After sending the previous patch suite, I figured it would be worthwhile to
also have tests covering timeline switches, which was not covered before.
So please find attached a new version with an additional patch for those tests,
covering both "resume from last know archive" and "resume from the
replication slots position" cases.

--
Ronan Dunklau

Attachments:

v8-0004-Add-tests-making-sure-pg_receivewal-can-follow-a-.patchtext/x-patch; charset=UTF-8; name=v8-0004-Add-tests-making-sure-pg_receivewal-can-follow-a-.patchDownload
From f85b402491dd22ecc2521bbb7a3f75bfe47c60a8 Mon Sep 17 00:00:00 2001
From: Ronan Dunklau <ronan.dunklau@aiven.io>
Date: Wed, 20 Oct 2021 14:52:13 +0200
Subject: [PATCH v8 4/4] Add tests making sure pg_receivewal can follow a
 timeline switch.

---
 src/bin/pg_basebackup/t/020_pg_receivewal.pl | 61 +++++++++++++++++++-
 1 file changed, 60 insertions(+), 1 deletion(-)

diff --git a/src/bin/pg_basebackup/t/020_pg_receivewal.pl b/src/bin/pg_basebackup/t/020_pg_receivewal.pl
index 42749c3a40..a5e19c5616 100644
--- a/src/bin/pg_basebackup/t/020_pg_receivewal.pl
+++ b/src/bin/pg_basebackup/t/020_pg_receivewal.pl
@@ -5,7 +5,7 @@ use strict;
 use warnings;
 use TestLib;
 use PostgresNode;
-use Test::More tests => 32;
+use Test::More tests => 38;
 
 program_help_ok('pg_receivewal');
 program_version_ok('pg_receivewal');
@@ -185,6 +185,65 @@ $primary->command_ok(
 $slot = $primary->slot($slot_name);
 ok(-e "$stream_dir/$walfile_to_be_archived", "WAL from the slot's restart_lsn has been archived");
 
+# Check what happens on a timeline switch
+sub change_timeline{
+  # Setup a new standby, start pg_receivewal once, then promote it.
+  my ($primary, $backup_name, $standby_name, $slot_name, $stream_dir) = @_;
+  my $standby = PostgresNode->new($standby_name);
+  $standby->init_from_backup($primary, $backup_name, has_streaming => 1);
+  $standby->start;
+  $primary->wait_for_catchup($standby, 'replay', $primary->lsn('write'));
+  $standby->psql('', "CREATE_REPLICATION_SLOT $slot_name PHYSICAL (RESERVE_WAL)",
+                 replication => 1);
+  $primary->psql('postgres',
+	'INSERT INTO test_table VALUES (generate_series(1,100));');
+  # Get a position which can act as an end position.
+  my $nextlsn = $primary->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();');
+  $primary->safe_psql('postgres', 'SELECT pg_switch_wal();');
+  # Stream some wal, this will populate the stream directory for the
+  # resume-from-archive test case.
+  $standby->run_log(
+    [ 'pg_receivewal', '-D', $stream_dir, '--verbose', '--endpos', $nextlsn, "--slot", $slot_name],
+    "Stream some wal before promoting");
+  $standby->psql(
+	'postgres',
+	"SELECT pg_promote(wait_seconds => 300)");
+  $standby->psql('postgres',
+	'INSERT INTO test_table VALUES (generate_series(1,100));');
+  $standby->safe_psql('postgres', 'SELECT pg_switch_wal();');
+  # Get a full walfilename that will be generated.
+  my $walfilename = $standby->safe_psql('postgres', "SELECT pg_walfile_name(pg_current_wal_insert_lsn())");
+  $standby->psql('postgres',
+	'INSERT INTO test_table VALUES (generate_series(1,100));');
+  $standby->safe_psql('postgres', 'SELECT pg_switch_wal();');
+  $nextlsn = $standby->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();');
+  return $standby, $nextlsn, $walfilename;
+}
+
+my $backup_name = 'my_backup';
+$primary->backup($backup_name);
+my $standby;
+unlink glob "'${stream_dir}/*'";
+# First test: change timeline, and check we can resume from the slot.
+($standby, $nextlsn, $walfile_to_be_archived) = change_timeline($primary, 'my_backup', 'standby1', $slot_name, $stream_dir);
+# To resume from the slot, the target directory must be empty
+unlink glob "'${stream_dir}/*'";
+$standby->command_ok(
+	[ 'pg_receivewal', '-D', $stream_dir, '--verbose', '--endpos', $nextlsn, "--slot", $slot_name],
+	"Stream some wal after promoting, resuming from the slot's position");
+ok(-e "$stream_dir/$walfile_to_be_archived", "WAL from the new timeline has been archived");
+ok(-e "$stream_dir/00000002.history", "Timeline history file has been archived");
+
+# Now do the same, but resuming from the latest archived file.
+# Cleanup leftover from the first standby
+unlink glob "'${stream_dir}/*'";
+($standby, $nextlsn, $walfile_to_be_archived) = change_timeline($primary, 'my_backup', 'standby2', $slot_name, $stream_dir);
+$standby->command_ok(
+	[ 'pg_receivewal', '-D', $stream_dir, '--verbose', '--endpos', $nextlsn],
+	"Stream some wal after promoting, resuming from the latest archived file");
+ok(-e "$stream_dir/$walfile_to_be_archived", "WAL from the new timeline has been archived");
+ok(-e "$stream_dir/00000002.history", "Timeline history file has been archived");
+
 # Permissions on WAL files should be default
 SKIP:
 {
-- 
2.33.0

v8-0001-Add-READ_REPLICATION_SLOT-command.patchtext/x-patch; charset=UTF-8; name=v8-0001-Add-READ_REPLICATION_SLOT-command.patchDownload
From 5a47f17a17594cc171f744ce383ba820d44b6446 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Thu, 2 Sep 2021 16:25:25 +0900
Subject: [PATCH v8 1/4] Add READ_REPLICATION_SLOT command

---
 doc/src/sgml/protocol.sgml                  | 47 +++++++++++
 src/backend/replication/repl_gram.y         | 16 +++-
 src/backend/replication/repl_scanner.l      |  1 +
 src/backend/replication/walsender.c         | 89 +++++++++++++++++++++
 src/include/nodes/nodes.h                   |  1 +
 src/include/nodes/replnodes.h               | 10 +++
 src/test/recovery/t/001_stream_rep.pl       | 47 ++++++++++-
 src/test/recovery/t/006_logical_decoding.pl |  9 ++-
 src/tools/pgindent/typedefs.list            |  1 +
 9 files changed, 218 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index b95cc88599..51a15cc3da 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2067,6 +2067,53 @@ The commands accepted in replication mode are:
     </listitem>
   </varlistentry>
 
+  <varlistentry>
+    <term><literal>READ_REPLICATION_SLOT</literal> <replaceable class="parameter">slot_name</replaceable>
+      <indexterm><primary>READ_REPLICATION_SLOT</primary></indexterm>
+    </term>
+    <listitem>
+     <para>
+      Read the information of a replication slot. Returns a tuple with
+      <literal>NULL</literal> values if the replication slot does not
+      exist. This command is currently only supported for <literal>physical</literal> slots.
+     </para>
+     <para>
+      In response to this command, the server will return a one-row result set,
+      containing the following fields:
+      <variablelist>
+       <varlistentry>
+        <term><literal>slot_type</literal> (<type>text</type>)</term>
+        <listitem>
+         <para>
+          The replication slot's type, either <literal>physical</literal> or
+          <literal>NULL</literal>.
+         </para>
+        </listitem>
+       </varlistentry>
+
+       <varlistentry>
+        <term><literal>restart_lsn</literal> (<type>text</type>)</term>
+        <listitem>
+         <para>
+          The replication slot's <literal>restart_lsn</literal>.
+         </para>
+        </listitem>
+       </varlistentry>
+
+       <varlistentry>
+        <term><literal>restart_tli</literal> (<type>int4</type>)</term>
+        <listitem>
+         <para>
+          The timeline ID for the <literal>restart_lsn</literal> position,
+          when following the current timeline history.
+         </para>
+        </listitem>
+       </varlistentry>
+      </variablelist>
+     </para>
+    </listitem>
+  </varlistentry>
+
   <varlistentry>
     <term><literal>START_REPLICATION</literal> [ <literal>SLOT</literal> <replaceable class="parameter">slot_name</replaceable> ] [ <literal>PHYSICAL</literal> ] <replaceable class="parameter">XXX/XXX</replaceable> [ <literal>TIMELINE</literal> <replaceable class="parameter">tli</replaceable> ]
      <indexterm><primary>START_REPLICATION</primary></indexterm>
diff --git a/src/backend/replication/repl_gram.y b/src/backend/replication/repl_gram.y
index 126380e2df..913a99da5a 100644
--- a/src/backend/replication/repl_gram.y
+++ b/src/backend/replication/repl_gram.y
@@ -64,6 +64,7 @@ static SQLCmd *make_sqlcmd(void);
 /* Keyword tokens. */
 %token K_BASE_BACKUP
 %token K_IDENTIFY_SYSTEM
+%token K_READ_REPLICATION_SLOT
 %token K_SHOW
 %token K_START_REPLICATION
 %token K_CREATE_REPLICATION_SLOT
@@ -94,7 +95,7 @@ static SQLCmd *make_sqlcmd(void);
 %type <node>	command
 %type <node>	base_backup start_replication start_logical_replication
 				create_replication_slot drop_replication_slot identify_system
-				timeline_history show sql_cmd
+				read_replication_slot timeline_history show sql_cmd
 %type <list>	base_backup_legacy_opt_list generic_option_list
 %type <defelt>	base_backup_legacy_opt generic_option
 %type <uintval>	opt_timeline
@@ -120,6 +121,7 @@ opt_semicolon:	';'
 
 command:
 			identify_system
+			| read_replication_slot
 			| base_backup
 			| start_replication
 			| start_logical_replication
@@ -140,6 +142,18 @@ identify_system:
 				}
 			;
 
+/*
+ * READ_REPLICATION_SLOT %s
+ */
+read_replication_slot:
+			K_READ_REPLICATION_SLOT var_name
+				{
+					ReadReplicationSlotCmd *n = makeNode(ReadReplicationSlotCmd);
+					n->slotname = $2;
+					$$ = (Node *) n;
+				}
+			;
+
 /*
  * SHOW setting
  */
diff --git a/src/backend/replication/repl_scanner.l b/src/backend/replication/repl_scanner.l
index c038a636c3..1b599c255e 100644
--- a/src/backend/replication/repl_scanner.l
+++ b/src/backend/replication/repl_scanner.l
@@ -85,6 +85,7 @@ identifier		{ident_start}{ident_cont}*
 BASE_BACKUP			{ return K_BASE_BACKUP; }
 FAST			{ return K_FAST; }
 IDENTIFY_SYSTEM		{ return K_IDENTIFY_SYSTEM; }
+READ_REPLICATION_SLOT	{ return K_READ_REPLICATION_SLOT; }
 SHOW		{ return K_SHOW; }
 LABEL			{ return K_LABEL; }
 NOWAIT			{ return K_NOWAIT; }
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index b811a5c0ef..5d68a7e66a 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -232,6 +232,7 @@ static void XLogSendLogical(void);
 static void WalSndDone(WalSndSendDataCallback send_data);
 static XLogRecPtr GetStandbyFlushRecPtr(void);
 static void IdentifySystem(void);
+static void ReadReplicationSlot(ReadReplicationSlotCmd *cmd);
 static void CreateReplicationSlot(CreateReplicationSlotCmd *cmd);
 static void DropReplicationSlot(DropReplicationSlotCmd *cmd);
 static void StartReplication(StartReplicationCmd *cmd);
@@ -457,6 +458,87 @@ IdentifySystem(void)
 	end_tup_output(tstate);
 }
 
+/* Handle READ_REPLICATION_SLOT command */
+static void
+ReadReplicationSlot(ReadReplicationSlotCmd *cmd)
+{
+#define READ_REPLICATION_SLOT_COLS 3
+	ReplicationSlot *slot;
+	DestReceiver *dest;
+	TupOutputState *tstate;
+	TupleDesc	tupdesc;
+	Datum		values[READ_REPLICATION_SLOT_COLS];
+	bool		nulls[READ_REPLICATION_SLOT_COLS];
+
+	tupdesc = CreateTemplateTupleDesc(READ_REPLICATION_SLOT_COLS);
+	TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 1, "slot_type",
+							  TEXTOID, -1, 0);
+	TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 2, "restart_lsn",
+							  TEXTOID, -1, 0);
+	TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 3, "restart_tli",
+							  INT4OID, -1, 0);
+
+	MemSet(nulls, true, READ_REPLICATION_SLOT_COLS * sizeof(bool));
+
+	LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
+	slot = SearchNamedReplicationSlot(cmd->slotname, false);
+	if (slot == NULL || !slot->in_use)
+	{
+		LWLockRelease(ReplicationSlotControlLock);
+	}
+	else
+	{
+		List	   *timeline_history = NIL;
+		ReplicationSlot slot_contents;
+		int			i = 0;
+		char		xloc[MAXFNAMELEN];
+		TimeLineID	slots_position_timeline;
+
+		/* Copy slot contents while holding spinlock */
+		SpinLockAcquire(&slot->mutex);
+		slot_contents = *slot;
+		SpinLockRelease(&slot->mutex);
+		LWLockRelease(ReplicationSlotControlLock);
+
+		if (OidIsValid(slot_contents.data.database))
+			ereport(ERROR,
+					errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+					errmsg("READ_REPLICATION_SLOT is only supported for physical slots"));
+		values[i] = CStringGetTextDatum("physical");
+		nulls[i] = false;
+		i++;
+		if (!XLogRecPtrIsInvalid(slot_contents.data.restart_lsn))
+		{
+			snprintf(xloc, sizeof(xloc), "%X/%X",
+					 LSN_FORMAT_ARGS(slot_contents.data.restart_lsn));
+			values[i] = CStringGetTextDatum(xloc);
+			nulls[i] = false;
+		}
+		i++;
+
+		/*
+		 * Now get the timeline this wal was produced on, to get to the
+		 * current timeline
+		 */
+		if (!XLogRecPtrIsInvalid(slot_contents.data.restart_lsn))
+		{
+			timeline_history = readTimeLineHistory(ThisTimeLineID);
+			slots_position_timeline = tliOfPointInHistory(slot_contents.data.restart_lsn,
+														  timeline_history);
+			values[i] = Int32GetDatum(slots_position_timeline);
+			nulls[i] = false;
+		}
+		i++;
+
+		Assert(i == READ_REPLICATION_SLOT_COLS);
+	}
+
+	dest = CreateDestReceiver(DestRemoteSimple);
+	tstate = begin_tup_output_tupdesc(dest, tupdesc, &TTSOpsVirtual);
+	do_tup_output(tstate, values, nulls);
+	end_tup_output(tstate);
+}
+
 
 /*
  * Handle TIMELINE_HISTORY command.
@@ -1622,6 +1704,13 @@ exec_replication_command(const char *cmd_string)
 			EndReplicationCommand(cmdtag);
 			break;
 
+		case T_ReadReplicationSlotCmd:
+			cmdtag = "READ_REPLICATION_SLOT";
+			set_ps_display(cmdtag);
+			ReadReplicationSlot((ReadReplicationSlotCmd *) cmd_node);
+			EndReplicationCommand(cmdtag);
+			break;
+
 		case T_BaseBackupCmd:
 			cmdtag = "BASE_BACKUP";
 			set_ps_display(cmdtag);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index e0057daa06..6201940637 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -493,6 +493,7 @@ typedef enum NodeTag
 	 * TAGS FOR REPLICATION GRAMMAR PARSE NODES (replnodes.h)
 	 */
 	T_IdentifySystemCmd,
+	T_ReadReplicationSlotCmd,
 	T_BaseBackupCmd,
 	T_CreateReplicationSlotCmd,
 	T_DropReplicationSlotCmd,
diff --git a/src/include/nodes/replnodes.h b/src/include/nodes/replnodes.h
index faa3a251f2..46384ea074 100644
--- a/src/include/nodes/replnodes.h
+++ b/src/include/nodes/replnodes.h
@@ -33,6 +33,16 @@ typedef struct IdentifySystemCmd
 	NodeTag		type;
 } IdentifySystemCmd;
 
+/* ----------------------
+ *		READ_REPLICATION_SLOT command
+ * ----------------------
+ */
+typedef struct ReadReplicationSlotCmd
+{
+	NodeTag		type;
+	char	   *slotname;
+} ReadReplicationSlotCmd;
+
 
 /* ----------------------
  *		BASE_BACKUP command
diff --git a/src/test/recovery/t/001_stream_rep.pl b/src/test/recovery/t/001_stream_rep.pl
index 9916a36012..2317f55149 100644
--- a/src/test/recovery/t/001_stream_rep.pl
+++ b/src/test/recovery/t/001_stream_rep.pl
@@ -6,7 +6,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 49;
+use Test::More tests => 55;
 
 # Initialize primary node
 my $node_primary = PostgresNode->new('primary');
@@ -254,6 +254,51 @@ ok( $ret == 0,
 	"SHOW with superuser-settable parameter, replication role and logical replication"
 );
 
+note "testing READ_REPLICATION_SLOT command";
+
+my $slotname = 'test_read_replication_slot_physical';
+
+($ret, $stdout, $stderr) = $node_primary->psql(
+	'postgres',
+	'READ_REPLICATION_SLOT non_existent_slot;',
+	extra_params => [ '-d', $connstr_rep ]);
+ok( $ret == 0,
+	"READ_REPLICATION_SLOT does not produce an error with non existent slot");
+ok($stdout eq '||',
+	"READ_REPLICATION_SLOT returns NULL values if slot does not exist");
+
+($ret, $stdout, $stderr) = $node_primary->psql(
+	'postgres',
+	"CREATE_REPLICATION_SLOT $slotname PHYSICAL RESERVE_WAL;",
+	extra_params => [ '-d', $connstr_rep ],
+	0,
+	'physical slot created on primary');
+
+($ret, $stdout, $stderr) = $node_primary->psql(
+	'postgres',
+	"READ_REPLICATION_SLOT $slotname;",
+	extra_params => [ '-d', $connstr_rep ]);
+ok($ret == 0,
+	"READ_REPLICATION_SLOT does not produce an error with existing slot");
+like($stdout, qr/^physical\|[^|]*\|1$/,
+	"READ_REPLICATION_SLOT returns tuple corresponding to the slot");
+
+$node_primary->psql(
+	'postgres',
+	"DROP_REPLICATION_SLOT $slotname;",
+	extra_params => [ '-d', $connstr_rep ],
+	0,
+	'physical slot dropped on primary');
+
+($ret, $stdout, $stderr) = $node_primary->psql(
+	'postgres',
+	"READ_REPLICATION_SLOT $slotname;",
+	extra_params => [ '-d', $connstr_rep ]);
+ok($ret == 0,
+	"READ_REPLICATION_SLOT does not produce an error with dropped slot");
+ok($stdout eq '||',
+	"READ_REPLICATION_SLOT returns NULL values if slot has been dropped");
+
 note "switching to physical replication slot";
 
 # Switch to using a physical replication slot. We can do this without a new
diff --git a/src/test/recovery/t/006_logical_decoding.pl b/src/test/recovery/t/006_logical_decoding.pl
index cc116062c2..1e08afca74 100644
--- a/src/test/recovery/t/006_logical_decoding.pl
+++ b/src/test/recovery/t/006_logical_decoding.pl
@@ -10,7 +10,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 14;
+use Test::More tests => 15;
 use Config;
 
 # Initialize primary node
@@ -39,6 +39,13 @@ ok( $stderr =~
 	  m/replication slot "test_slot" was not created in this database/,
 	"Logical decoding correctly fails to start");
 
+($result, $stdout, $stderr) = $node_primary->psql(
+	'template1',
+	qq[READ_REPLICATION_SLOT test_slot;],
+	replication => 'database');
+like($stderr, qr/READ_REPLICATION_SLOT is only supported for physical slots/,
+	'Logical replication slot is not supported');
+
 # Check case of walsender not using a database connection.  Logical
 # decoding should not be allowed.
 ($result, $stdout, $stderr) = $node_primary->psql(
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index cb5b5ec74c..a8e4a2afd6 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2128,6 +2128,7 @@ ReadBufferMode
 ReadBytePtrType
 ReadExtraTocPtrType
 ReadFunc
+ReadReplicationSlotCmd
 ReassignOwnedStmt
 RecheckForeignScan_function
 RecordCacheEntry
-- 
2.33.0

v8-0002-Use-READ_REPLICATION_SLOT-command-in-pg_receivewa.patchtext/x-patch; charset=UTF-8; name=v8-0002-Use-READ_REPLICATION_SLOT-command-in-pg_receivewa.patchDownload
From 08718ca8563fa1de809fa6e95b65dbd54629f1e9 Mon Sep 17 00:00:00 2001
From: Ronan Dunklau <ronan.dunklau@aiven.io>
Date: Wed, 1 Sep 2021 15:52:32 +0200
Subject: [PATCH v8 2/4] Use READ_REPLICATION_SLOT command in pg_receivewal.

Prior to this patch, when running pg_receivewal, the start LSN is determined by looking at
the WAL files currently stored on disk, then using the current flush lsn
from the server.

If for some reason the WAL files from pg_receivewal were moved, we want
to restart where we left at, which is the replication slot's restart_lsn
instead of skipping right to the current flush location.

To keep compatibility with prior server versions, we only attempt it if
the version is < 15.
---
 src/bin/pg_basebackup/pg_receivewal.c        | 20 ++++-
 src/bin/pg_basebackup/streamutil.c           | 87 ++++++++++++++++++++
 src/bin/pg_basebackup/streamutil.h           |  1 +
 src/bin/pg_basebackup/t/020_pg_receivewal.pl | 43 +++++++++-
 4 files changed, 147 insertions(+), 4 deletions(-)

diff --git a/src/bin/pg_basebackup/pg_receivewal.c b/src/bin/pg_basebackup/pg_receivewal.c
index d5140a79fe..1128389839 100644
--- a/src/bin/pg_basebackup/pg_receivewal.c
+++ b/src/bin/pg_basebackup/pg_receivewal.c
@@ -409,10 +409,26 @@ StreamLog(void)
 	stream.startpos = FindStreamingStart(&stream.timeline);
 	if (stream.startpos == InvalidXLogRecPtr)
 	{
-		stream.startpos = serverpos;
-		stream.timeline = servertli;
+		/* Try to get it from the slot if any, and the server supports it */
+		if (replication_slot != NULL)
+		{
+			/* Error is logged by GetSlotInformation, so just return. */
+			if (!GetSlotInformation(conn, replication_slot, &stream.startpos, &stream.timeline))
+				return;
+		}
+
+		/*
+		 * If it is still unknown, use the current flush value from the server
+		 */
+		if (stream.startpos == InvalidXLogRecPtr)
+		{
+			stream.startpos = serverpos;
+			stream.timeline = servertli;
+		}
 	}
 
+	Assert(stream.startpos != InvalidXLogRecPtr);
+
 	/*
 	 * Always start streaming at the beginning of a segment
 	 */
diff --git a/src/bin/pg_basebackup/streamutil.c b/src/bin/pg_basebackup/streamutil.c
index a9bc1ce214..1257c7b536 100644
--- a/src/bin/pg_basebackup/streamutil.c
+++ b/src/bin/pg_basebackup/streamutil.c
@@ -479,6 +479,93 @@ RunIdentifySystem(PGconn *conn, char **sysid, TimeLineID *starttli,
 	return true;
 }
 
+
+/*
+ * Returns wether a replication slot exists through a given connection,
+ * and fills in the slot_info with the results if passed by the caller.
+ */
+bool
+GetSlotInformation(PGconn *conn, const char *slot_name, XLogRecPtr *restart_lsn, TimeLineID* restart_tli)
+{
+	PGresult   *res;
+	PQExpBuffer query;
+
+	Assert(slot_name != NULL);
+
+	if (PQserverVersion(conn) < 150000)
+		return false;
+
+
+	query = createPQExpBuffer();
+	appendPQExpBuffer(query, "READ_REPLICATION_SLOT %s", slot_name);
+	res = PQexec(conn, query->data);
+	destroyPQExpBuffer(query);
+	/* The commpand should always return precisely one tuple */
+	if (PQresultStatus(res) != PGRES_TUPLES_OK)
+	{
+		pg_log_error("could not read replication slot : %s",
+					 PQerrorMessage(conn));
+		PQclear(res);
+		return false;
+	}
+
+	if (PQntuples(res) != 1 || PQnfields(res) < 3)
+	{
+		pg_log_error("could not fetch replication slot: got %d rows and %d fields, expected %d rows and %d or more fields",
+					 PQntuples(res), PQnfields(res), 1, 3);
+		PQclear(res);
+		return false;
+	}
+
+	/*
+	 * When the slot doesn't exist, the command returns an all-null tuple. The
+	 * first column (slot_type) will only be null if the slot doesn't exists.
+	 */
+	if (PQgetisnull(res, 0, 0))
+	{
+		PQclear(res);
+		pg_log_error("replication slot \"%s\" does not exist", slot_name);
+		return false;
+	}
+
+	/* This cannot happen for now, but future versions may accept
+	 * READ_REPLICATION_SLOT for other slot types.
+	 */
+	if (strcmp(PQgetvalue(res, 0, 0), "physical") != 0)
+		pg_log_error("expected a physical slot, got type \"%s\" instead",
+					 PQgetvalue(res, 0, 0));
+
+	/* Restart LSN */
+	if (restart_lsn)
+	{
+		if (PQgetisnull(res, 0, 1))
+			*restart_lsn = InvalidXLogRecPtr;
+		else
+		{
+			uint32		hi,
+						lo;
+
+			if (sscanf(PQgetvalue(res, 0, 1), "%X/%X", &hi, &lo) != 2)
+			{
+				pg_log_error("could not parse slot's restart_lsn \"%s\"",
+							 PQgetvalue(res, 0, 1));
+				PQclear(res);
+				return false;
+			}
+			*restart_lsn = ((uint64) hi) << 32 | lo;
+		}
+	}
+	if (restart_tli)
+	{
+		if (PQgetisnull(res, 0, 2))
+			*restart_tli = 0;
+		else
+			*restart_tli = atoi(PQgetvalue(res, 0, 2));
+	}
+	PQclear(res);
+	return true;
+}
+
 /*
  * Create a replication slot for the given connection. This function
  * returns true in case of success.
diff --git a/src/bin/pg_basebackup/streamutil.h b/src/bin/pg_basebackup/streamutil.h
index 65135c79e0..907457b8ae 100644
--- a/src/bin/pg_basebackup/streamutil.h
+++ b/src/bin/pg_basebackup/streamutil.h
@@ -52,6 +52,7 @@ extern void AppendIntegerCommandOption(PQExpBuffer buf,
 									   bool use_new_option_syntax,
 									   char *option_name, int32 option_value);
 
+extern bool GetSlotInformation(PGconn *conn, const char *slot_name, XLogRecPtr *restart_lsn, TimeLineID *restart_tli);
 extern bool RetrieveWalSegSize(PGconn *conn);
 extern TimestampTz feGetCurrentTimestamp(void);
 extern void feTimestampDifference(TimestampTz start_time, TimestampTz stop_time,
diff --git a/src/bin/pg_basebackup/t/020_pg_receivewal.pl b/src/bin/pg_basebackup/t/020_pg_receivewal.pl
index 0b33d73900..42749c3a40 100644
--- a/src/bin/pg_basebackup/t/020_pg_receivewal.pl
+++ b/src/bin/pg_basebackup/t/020_pg_receivewal.pl
@@ -5,7 +5,7 @@ use strict;
 use warnings;
 use TestLib;
 use PostgresNode;
-use Test::More tests => 27;
+use Test::More tests => 32;
 
 program_help_ok('pg_receivewal');
 program_version_ok('pg_receivewal');
@@ -49,7 +49,7 @@ is($primary->slot($slot_name)->{'slot_type'},
 
 # Generate some WAL.  Use --synchronous at the same time to add more
 # code coverage.  Switch to the next segment first so that subsequent
-# restarts of pg_receivewal will see this segment as full..
+# restarts of pg_receivewal will see this segment as full
 $primary->psql('postgres', 'CREATE TABLE test_table(x integer);');
 $primary->psql('postgres', 'SELECT pg_switch_wal();');
 my $nextlsn =
@@ -146,6 +146,45 @@ $primary->command_ok(
 $partial_wals[0] =~ s/(\.gz)?.partial//;
 ok(-e $partial_wals[0], "check that previously partial WAL is now complete");
 
+# Verify that if we use a replication slot, we resume where we left even in the
+# absence of WALs
+unlink glob "'${stream_dir}/*'";
+# Setup the slot
+$primary->psql('', "CREATE_REPLICATION_SLOT $slot_name PHYSICAL (RESERVE_WAL)",
+  replication => 1);
+$primary->psql('postgres',
+	'INSERT INTO test_table VALUES (generate_series(1,100));');
+$primary->psql('postgres', 'SELECT pg_switch_wal();');
+$nextlsn =
+  $primary->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();');
+chomp($nextlsn);
+
+# Get the slot restart_lsn and make sure we retrieve the associated WAL file
+$slot = $primary->slot($slot_name);
+my $restart_lsn = $slot->{'restart_lsn'};
+# Add one so that the restart_lsn doesn't correspond to the previous file.
+$restart_lsn =~ s/.$/1/;
+my $walfile_to_be_archived = $primary->safe_psql('postgres',
+	"SELECT pg_walfile_name('$restart_lsn');");
+isnt($restart_lsn, '', 'restart LSN of new slot is not null');
+
+$primary->psql('postgres',
+	'INSERT INTO test_table VALUES (generate_series(1,100));');
+$primary->psql('postgres', 'SELECT pg_switch_wal();');
+$nextlsn =
+  $primary->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();');
+chomp($nextlsn);
+# First verify what happens if we try with a non-existing slot.
+$primary->command_fails_like(
+  [ 'pg_receivewal', '-D', $stream_dir, '--slot', 'nonexistentslot', '-n', '--verbose', '--endpos', $nextlsn ],
+  qr/pg_receivewal: error: replication slot "nonexistentslot" does not exist/,
+  'pg_receivewal fails with a non-existing slot');
+$primary->command_ok(
+	[ 'pg_receivewal', '-D', $stream_dir, '--slot', $slot_name, '--verbose', '--endpos', $nextlsn ],
+	"Stream some wal starting from the slot's restart_lsn");
+$slot = $primary->slot($slot_name);
+ok(-e "$stream_dir/$walfile_to_be_archived", "WAL from the slot's restart_lsn has been archived");
+
 # Permissions on WAL files should be default
 SKIP:
 {
-- 
2.33.0

v8-0003-Add-documentation-for-pg_receivewal.patchtext/x-patch; charset=UTF-8; name=v8-0003-Add-documentation-for-pg_receivewal.patchDownload
From bf9db289aed59b4e67e5491e08feded66638c617 Mon Sep 17 00:00:00 2001
From: Ronan Dunklau <ronan.dunklau@aiven.io>
Date: Mon, 6 Sep 2021 11:08:54 +0200
Subject: [PATCH v8 3/4] Add documentation for pg_receivewal

---
 doc/src/sgml/ref/pg_receivewal.sgml | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/doc/src/sgml/ref/pg_receivewal.sgml b/doc/src/sgml/ref/pg_receivewal.sgml
index 45b544cf49..5fb2b61d34 100644
--- a/doc/src/sgml/ref/pg_receivewal.sgml
+++ b/doc/src/sgml/ref/pg_receivewal.sgml
@@ -37,6 +37,32 @@ PostgreSQL documentation
    <xref linkend="continuous-archiving"/>).
   </para>
 
+  <para>
+    When <application>pg_receivewal</application> is launched, it tries to determine the
+    starting LSN from the following sources:
+    <orderedlist>
+      <listitem>
+      <para>
+      It scans the target directory to determine the next file it should archive.
+      </para>
+      </listitem>
+
+      <listitem>
+      <para>
+      If it doesn't find that file, and a slot was specified, it uses the
+      <command>READ_REPLICATION_SLOT</command> command to retrieve the slot's <literal>restart_lsn</literal>
+      </para>
+      </listitem>
+
+      <listitem>
+      <para>
+      If it still don't find a valid start position via <command>READ_REPLICATION_SLOT</command>,
+      it uses the last flush location as reported by the server.
+      </para>
+      </listitem>
+    </orderedlist>
+  </para>
+
   <para>
    <application>pg_receivewal</application> streams the write-ahead
    log in real time as it's being generated on the server, and does not wait
-- 
2.33.0

#34Michael Paquier
michael@paquier.xyz
In reply to: Ronan Dunklau (#33)
2 attachment(s)
Re: pg_receivewal starting position

On Wed, Oct 20, 2021 at 02:58:26PM +0200, Ronan Dunklau wrote:

After sending the previous patch suite, I figured it would be worthwhile to
also have tests covering timeline switches, which was not covered before.

That seems independent to me. I'll take a look.

So please find attached a new version with an additional patch for those tests,
covering both "resume from last know archive" and "resume from the
replication slots position" cases.

So, taking things in order, I have looked at 0003 and 0001, and
attached are refined versions for both of them.

0003 is an existing hole in the docs, which I think we had better
address first and backpatch, taking into account that the starting
point calculation considers compressed segments when looking for
completed segments.

Regarding 0001, I have found the last test to check for NULL values
returned by READ_REPLICATION_SLOT after dropping the slot overlaps
with the first test, so I have removed that. I have expanded a bit
the use of like(), and there were some confusion with
PostgresNode::psql and some extra arguments (see DROP_REPLICATION_SLOT
and CREATE_REPLICATION_SLOT, and no need for return values in the
CREATE case either). Some comments, docs and code have been slightly
tweaked.

Here are some comments about 0002.

+ /* The commpand should always return precisely one tuple */
s/commpand/command/

+       pg_log_error("could not fetch replication slot: got %d rows and %d fields, expected %d rows and %d or more fields",
+                    PQntuples(res), PQnfields(res), 1, 3);
Should this be "could not read" instead?
+       if (sscanf(PQgetvalue(res, 0, 1), "%X/%X", &hi, &lo) != 2)
+       {
+           pg_log_error("could not parse slot's restart_lsn \"%s\"",
+                        PQgetvalue(res, 0, 1));
+           PQclear(res);
+           return false;
+       }
Wouldn't it be saner to initialize *restart_lsn and *restart_tli to
some default values at the top of GetSlotInformation() instead, if
they are specified by the caller?  And I think that we should still
complain even if restart_lsn is NULL.

On a quick read of 0004, I find the split of the logic with
change_timeline() a bit hard to understand. It looks like we should
be able to make a cleaner split, but I am not sure how that would
look, though.
--
Michael

Attachments:

v9-0001-doc-Describe-calculation-method-of-streaming-star.patchtext/x-diff; charset=us-asciiDownload
From afca5e895e07279049f272396e40bdb78ae61d0e Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Thu, 21 Oct 2021 13:21:43 +0900
Subject: [PATCH v9 1/2] doc: Describe calculation method of streaming start
 for pg_receivewal

The documentation was unprecise about the fact that the current WAL
flush location is used if nothing can be found on the local archive
directory describe, independently of the compression used by each
segment (ZLIB or uncompressed).
---
 doc/src/sgml/ref/pg_receivewal.sgml | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/doc/src/sgml/ref/pg_receivewal.sgml b/doc/src/sgml/ref/pg_receivewal.sgml
index 45b544cf49..9adbddb657 100644
--- a/doc/src/sgml/ref/pg_receivewal.sgml
+++ b/doc/src/sgml/ref/pg_receivewal.sgml
@@ -75,6 +75,29 @@ PostgreSQL documentation
    one session available for the stream.
   </para>
 
+  <para>
+   The starting point of the write-ahead log streaming is calculated when
+   <application>pg_receivewal</application> starts:
+   <orderedlist>
+    <listitem>
+     <para>
+      First, scan the directory where the WAL segment files are written and
+      find the newest completed segment, using as starting point the beginning
+      of the next WAL segment file. This is calculated independently of the
+      compression method used to compress each segment.
+     </para>
+    </listitem>
+
+    <listitem>
+     <para>
+      If a starting point cannot be calculated with the previous method,
+      the latest WAL flush location is used as reported by the server from
+      a <literal>IDENTIFY_SYSTEM</literal> command.
+     </para>
+    </listitem>
+   </orderedlist>
+  </para>
+
   <para>
    If the connection is lost, or if it cannot be initially established,
    with a non-fatal error, <application>pg_receivewal</application> will
-- 
2.33.0

v9-0002-Add-READ_REPLICATION_SLOT.patchtext/x-diff; charset=us-asciiDownload
From 3ef08cc14e4365eaec320294275bee9f19c7dbef Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Thu, 21 Oct 2021 14:23:41 +0900
Subject: [PATCH v9 2/2] Add READ_REPLICATION_SLOT

---
 src/include/nodes/nodes.h                   |  1 +
 src/include/nodes/replnodes.h               | 11 +++
 src/backend/replication/repl_gram.y         | 16 +++-
 src/backend/replication/repl_scanner.l      |  1 +
 src/backend/replication/walsender.c         | 93 +++++++++++++++++++++
 src/test/recovery/t/001_stream_rep.pl       | 32 ++++++-
 src/test/recovery/t/006_logical_decoding.pl | 11 ++-
 doc/src/sgml/protocol.sgml                  | 48 +++++++++++
 src/tools/pgindent/typedefs.list            |  1 +
 9 files changed, 211 insertions(+), 3 deletions(-)

diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index e0057daa06..541e9861ba 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -496,6 +496,7 @@ typedef enum NodeTag
 	T_BaseBackupCmd,
 	T_CreateReplicationSlotCmd,
 	T_DropReplicationSlotCmd,
+	T_ReadReplicationSlotCmd,
 	T_StartReplicationCmd,
 	T_TimeLineHistoryCmd,
 	T_SQLCmd,
diff --git a/src/include/nodes/replnodes.h b/src/include/nodes/replnodes.h
index faa3a251f2..a746fafc12 100644
--- a/src/include/nodes/replnodes.h
+++ b/src/include/nodes/replnodes.h
@@ -87,6 +87,17 @@ typedef struct StartReplicationCmd
 } StartReplicationCmd;
 
 
+/* ----------------------
+ *		READ_REPLICATION_SLOT command
+ * ----------------------
+ */
+typedef struct ReadReplicationSlotCmd
+{
+	NodeTag		type;
+	char	   *slotname;
+} ReadReplicationSlotCmd;
+
+
 /* ----------------------
  *		TIMELINE_HISTORY command
  * ----------------------
diff --git a/src/backend/replication/repl_gram.y b/src/backend/replication/repl_gram.y
index 126380e2df..dcb1108579 100644
--- a/src/backend/replication/repl_gram.y
+++ b/src/backend/replication/repl_gram.y
@@ -64,6 +64,7 @@ static SQLCmd *make_sqlcmd(void);
 /* Keyword tokens. */
 %token K_BASE_BACKUP
 %token K_IDENTIFY_SYSTEM
+%token K_READ_REPLICATION_SLOT
 %token K_SHOW
 %token K_START_REPLICATION
 %token K_CREATE_REPLICATION_SLOT
@@ -94,7 +95,7 @@ static SQLCmd *make_sqlcmd(void);
 %type <node>	command
 %type <node>	base_backup start_replication start_logical_replication
 				create_replication_slot drop_replication_slot identify_system
-				timeline_history show sql_cmd
+				read_replication_slot timeline_history show sql_cmd
 %type <list>	base_backup_legacy_opt_list generic_option_list
 %type <defelt>	base_backup_legacy_opt generic_option
 %type <uintval>	opt_timeline
@@ -125,6 +126,7 @@ command:
 			| start_logical_replication
 			| create_replication_slot
 			| drop_replication_slot
+			| read_replication_slot
 			| timeline_history
 			| show
 			| sql_cmd
@@ -140,6 +142,18 @@ identify_system:
 				}
 			;
 
+/*
+ * READ_REPLICATION_SLOT %s
+ */
+read_replication_slot:
+			K_READ_REPLICATION_SLOT var_name
+				{
+					ReadReplicationSlotCmd *n = makeNode(ReadReplicationSlotCmd);
+					n->slotname = $2;
+					$$ = (Node *) n;
+				}
+			;
+
 /*
  * SHOW setting
  */
diff --git a/src/backend/replication/repl_scanner.l b/src/backend/replication/repl_scanner.l
index c038a636c3..1b599c255e 100644
--- a/src/backend/replication/repl_scanner.l
+++ b/src/backend/replication/repl_scanner.l
@@ -85,6 +85,7 @@ identifier		{ident_start}{ident_cont}*
 BASE_BACKUP			{ return K_BASE_BACKUP; }
 FAST			{ return K_FAST; }
 IDENTIFY_SYSTEM		{ return K_IDENTIFY_SYSTEM; }
+READ_REPLICATION_SLOT	{ return K_READ_REPLICATION_SLOT; }
 SHOW		{ return K_SHOW; }
 LABEL			{ return K_LABEL; }
 NOWAIT			{ return K_NOWAIT; }
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index b811a5c0ef..7b3d16f731 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -232,6 +232,7 @@ static void XLogSendLogical(void);
 static void WalSndDone(WalSndSendDataCallback send_data);
 static XLogRecPtr GetStandbyFlushRecPtr(void);
 static void IdentifySystem(void);
+static void ReadReplicationSlot(ReadReplicationSlotCmd *cmd);
 static void CreateReplicationSlot(CreateReplicationSlotCmd *cmd);
 static void DropReplicationSlot(DropReplicationSlotCmd *cmd);
 static void StartReplication(StartReplicationCmd *cmd);
@@ -457,6 +458,91 @@ IdentifySystem(void)
 	end_tup_output(tstate);
 }
 
+/* Handle READ_REPLICATION_SLOT command */
+static void
+ReadReplicationSlot(ReadReplicationSlotCmd *cmd)
+{
+#define READ_REPLICATION_SLOT_COLS 3
+	ReplicationSlot *slot;
+	DestReceiver *dest;
+	TupOutputState *tstate;
+	TupleDesc	tupdesc;
+	Datum		values[READ_REPLICATION_SLOT_COLS];
+	bool		nulls[READ_REPLICATION_SLOT_COLS];
+
+	tupdesc = CreateTemplateTupleDesc(READ_REPLICATION_SLOT_COLS);
+	TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 1, "slot_type",
+							  TEXTOID, -1, 0);
+	TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 2, "restart_lsn",
+							  TEXTOID, -1, 0);
+	TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 3, "restart_tli",
+							  INT4OID, -1, 0);
+
+	MemSet(nulls, true, READ_REPLICATION_SLOT_COLS * sizeof(bool));
+
+	LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
+	slot = SearchNamedReplicationSlot(cmd->slotname, false);
+	if (slot == NULL || !slot->in_use)
+	{
+		LWLockRelease(ReplicationSlotControlLock);
+	}
+	else
+	{
+		ReplicationSlot slot_contents;
+		int			i = 0;
+
+		/* Copy slot contents while holding spinlock */
+		SpinLockAcquire(&slot->mutex);
+		slot_contents = *slot;
+		SpinLockRelease(&slot->mutex);
+		LWLockRelease(ReplicationSlotControlLock);
+
+		if (OidIsValid(slot_contents.data.database))
+			ereport(ERROR,
+					errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					errmsg("cannot use \"%s\" with logical replication slots",
+						   "READ_REPLICATION_SLOT"));
+
+		/* slot type */
+		values[i] = CStringGetTextDatum("physical");
+		nulls[i] = false;
+		i++;
+
+		/* start LSN */
+		if (!XLogRecPtrIsInvalid(slot_contents.data.restart_lsn))
+		{
+			char		xloc[MAXFNAMELEN];
+
+			snprintf(xloc, sizeof(xloc), "%X/%X",
+					 LSN_FORMAT_ARGS(slot_contents.data.restart_lsn));
+			values[i] = CStringGetTextDatum(xloc);
+			nulls[i] = false;
+		}
+		i++;
+
+		/* timeline this WAL was produced on */
+		if (!XLogRecPtrIsInvalid(slot_contents.data.restart_lsn))
+		{
+			TimeLineID	slots_position_timeline;
+			List	   *timeline_history = NIL;
+
+			timeline_history = readTimeLineHistory(ThisTimeLineID);
+			slots_position_timeline = tliOfPointInHistory(slot_contents.data.restart_lsn,
+														  timeline_history);
+			values[i] = Int32GetDatum(slots_position_timeline);
+			nulls[i] = false;
+		}
+		i++;
+
+		Assert(i == READ_REPLICATION_SLOT_COLS);
+	}
+
+	dest = CreateDestReceiver(DestRemoteSimple);
+	tstate = begin_tup_output_tupdesc(dest, tupdesc, &TTSOpsVirtual);
+	do_tup_output(tstate, values, nulls);
+	end_tup_output(tstate);
+}
+
 
 /*
  * Handle TIMELINE_HISTORY command.
@@ -1622,6 +1708,13 @@ exec_replication_command(const char *cmd_string)
 			EndReplicationCommand(cmdtag);
 			break;
 
+		case T_ReadReplicationSlotCmd:
+			cmdtag = "READ_REPLICATION_SLOT";
+			set_ps_display(cmdtag);
+			ReadReplicationSlot((ReadReplicationSlotCmd *) cmd_node);
+			EndReplicationCommand(cmdtag);
+			break;
+
 		case T_BaseBackupCmd:
 			cmdtag = "BASE_BACKUP";
 			set_ps_display(cmdtag);
diff --git a/src/test/recovery/t/001_stream_rep.pl b/src/test/recovery/t/001_stream_rep.pl
index 9916a36012..8fb7487857 100644
--- a/src/test/recovery/t/001_stream_rep.pl
+++ b/src/test/recovery/t/001_stream_rep.pl
@@ -6,7 +6,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 49;
+use Test::More tests => 53;
 
 # Initialize primary node
 my $node_primary = PostgresNode->new('primary');
@@ -254,6 +254,36 @@ ok( $ret == 0,
 	"SHOW with superuser-settable parameter, replication role and logical replication"
 );
 
+note "testing READ_REPLICATION_SLOT command for replication connection";
+
+my $slotname = 'test_read_replication_slot_physical';
+
+($ret, $stdout, $stderr) = $node_primary->psql(
+	'postgres',
+	'READ_REPLICATION_SLOT non_existent_slot;',
+	extra_params => [ '-d', $connstr_rep ]);
+ok($ret == 0, "READ_REPLICATION_SLOT exit code 0 on success");
+like($stdout, qr/^||$/,
+	"READ_REPLICATION_SLOT returns NULL values if slot does not exist");
+
+$node_primary->psql(
+	'postgres',
+	"CREATE_REPLICATION_SLOT $slotname PHYSICAL RESERVE_WAL;",
+	extra_params => [ '-d', $connstr_rep ]);
+
+($ret, $stdout, $stderr) = $node_primary->psql(
+	'postgres',
+	"READ_REPLICATION_SLOT $slotname;",
+	extra_params => [ '-d', $connstr_rep ]);
+ok($ret == 0, "READ_REPLICATION_SLOT success with existing slot");
+like($stdout, qr/^physical\|[^|]*\|1$/,
+	"READ_REPLICATION_SLOT returns tuple with slot information");
+
+$node_primary->psql(
+	'postgres',
+	"DROP_REPLICATION_SLOT $slotname;",
+	extra_params => [ '-d', $connstr_rep ]);
+
 note "switching to physical replication slot";
 
 # Switch to using a physical replication slot. We can do this without a new
diff --git a/src/test/recovery/t/006_logical_decoding.pl b/src/test/recovery/t/006_logical_decoding.pl
index cc116062c2..1b74f38f10 100644
--- a/src/test/recovery/t/006_logical_decoding.pl
+++ b/src/test/recovery/t/006_logical_decoding.pl
@@ -10,7 +10,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 14;
+use Test::More tests => 15;
 use Config;
 
 # Initialize primary node
@@ -39,6 +39,15 @@ ok( $stderr =~
 	  m/replication slot "test_slot" was not created in this database/,
 	"Logical decoding correctly fails to start");
 
+($result, $stdout, $stderr) = $node_primary->psql(
+	'template1',
+	qq[READ_REPLICATION_SLOT test_slot;],
+	replication => 'database');
+like(
+	$stderr,
+	qr/cannot use "READ_REPLICATION_SLOT" with logical replication slots/,
+	'READ_REPLICATION_SLOT not supported for logical slots');
+
 # Check case of walsender not using a database connection.  Logical
 # decoding should not be allowed.
 ($result, $stdout, $stderr) = $node_primary->psql(
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index b95cc88599..37c26ec8ae 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2067,6 +2067,54 @@ The commands accepted in replication mode are:
     </listitem>
   </varlistentry>
 
+  <varlistentry>
+    <term><literal>READ_REPLICATION_SLOT</literal> <replaceable class="parameter">slot_name</replaceable>
+      <indexterm><primary>READ_REPLICATION_SLOT</primary></indexterm>
+    </term>
+    <listitem>
+     <para>
+      Read some information associated to a replication slot. Returns a tuple
+      with <literal>NULL</literal> values if the replication slot does not
+      exist. This command is currently only supported for physical replication
+      slots.
+     </para>
+     <para>
+      In response to this command, the server will return a one-row result set,
+      containing the following fields:
+      <variablelist>
+       <varlistentry>
+        <term><literal>slot_type</literal> (<type>text</type>)</term>
+        <listitem>
+         <para>
+          The replication slot's type, either <literal>physical</literal> or
+          <literal>NULL</literal>.
+         </para>
+        </listitem>
+       </varlistentry>
+
+       <varlistentry>
+        <term><literal>restart_lsn</literal> (<type>text</type>)</term>
+        <listitem>
+         <para>
+          The replication slot's <literal>restart_lsn</literal>.
+         </para>
+        </listitem>
+       </varlistentry>
+
+       <varlistentry>
+        <term><literal>restart_tli</literal> (<type>int4</type>)</term>
+        <listitem>
+         <para>
+          The timeline ID associated to <literal>restart_lsn</literal>,
+          following the current timeline history.
+         </para>
+        </listitem>
+       </varlistentry>
+      </variablelist>
+     </para>
+    </listitem>
+  </varlistentry>
+
   <varlistentry>
     <term><literal>START_REPLICATION</literal> [ <literal>SLOT</literal> <replaceable class="parameter">slot_name</replaceable> ] [ <literal>PHYSICAL</literal> ] <replaceable class="parameter">XXX/XXX</replaceable> [ <literal>TIMELINE</literal> <replaceable class="parameter">tli</replaceable> ]
      <indexterm><primary>START_REPLICATION</primary></indexterm>
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index cb5b5ec74c..a8e4a2afd6 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2128,6 +2128,7 @@ ReadBufferMode
 ReadBytePtrType
 ReadExtraTocPtrType
 ReadFunc
+ReadReplicationSlotCmd
 ReassignOwnedStmt
 RecheckForeignScan_function
 RecordCacheEntry
-- 
2.33.0

#35Ronan Dunklau
ronan.dunklau@aiven.io
In reply to: Michael Paquier (#34)
Re: pg_receivewal starting position

Le jeudi 21 octobre 2021, 07:35:08 CEST Michael Paquier a écrit :

On Wed, Oct 20, 2021 at 02:58:26PM +0200, Ronan Dunklau wrote:

After sending the previous patch suite, I figured it would be worthwhile
to
also have tests covering timeline switches, which was not covered before.

That seems independent to me. I'll take a look.

So please find attached a new version with an additional patch for those
tests, covering both "resume from last know archive" and "resume from
the replication slots position" cases.

So, taking things in order, I have looked at 0003 and 0001, and
attached are refined versions for both of them.

0003 is an existing hole in the docs, which I think we had better
address first and backpatch, taking into account that the starting
point calculation considers compressed segments when looking for
completed segments.

Ok, do you want me to propose a different patch for previous versions ?

Regarding 0001, I have found the last test to check for NULL values
returned by READ_REPLICATION_SLOT after dropping the slot overlaps
with the first test, so I have removed that. I have expanded a bit
the use of like(), and there were some confusion with
PostgresNode::psql and some extra arguments (see DROP_REPLICATION_SLOT
and CREATE_REPLICATION_SLOT, and no need for return values in the
CREATE case either). Some comments, docs and code have been slightly
tweaked.

Thank you for this.

Here are some comments about 0002.

+ /* The commpand should always return precisely one tuple */
s/commpand/command/

+ pg_log_error("could not fetch replication slot: got %d rows and %d
fields, expected %d rows and %d or more fields", +
PQntuples(res), PQnfields(res), 1, 3);
Should this be "could not read" instead?

+       if (sscanf(PQgetvalue(res, 0, 1), "%X/%X", &hi, &lo) != 2)
+       {
+           pg_log_error("could not parse slot's restart_lsn \"%s\"",
+                        PQgetvalue(res, 0, 1));
+           PQclear(res);
+           return false;
+       }
Wouldn't it be saner to initialize *restart_lsn and *restart_tli to
some default values at the top of GetSlotInformation() instead, if
they are specified by the caller?  

Ok.

And I think that we should still
complain even if restart_lsn is NULL.

Do you mean restart_lsn as the pointer argument to the function, or
restart_lsn as the field returned by the command ? If it's the first, I'll
change it but if it's the latter it is expected that we sometime run this on a
slot where WAL has never been reserved yet.

On a quick read of 0004, I find the split of the logic with
change_timeline() a bit hard to understand. It looks like we should
be able to make a cleaner split, but I am not sure how that would
look, though.

Thanks, at least if the proposal to test this seems sensible I can move
forward. I wanted to avoid having a lot of code duplication since the test
setup is a bit more complicated.
My first approach was to split it into two functions, setup_standby and
change_timeline, but then realized that what would happen between the two
invocations would basically be the same for the two test cases, so I ended up
with that patch. I'll try to see if I can see a better way of organizing that
code.

--
Ronan Dunklau

#36Michael Paquier
michael@paquier.xyz
In reply to: Ronan Dunklau (#35)
Re: pg_receivewal starting position

On Thu, Oct 21, 2021 at 08:29:54AM +0200, Ronan Dunklau wrote:

Ok, do you want me to propose a different patch for previous versions ?

That's not necessary. Thanks for proposing.

Do you mean restart_lsn as the pointer argument to the function, or
restart_lsn as the field returned by the command ? If it's the first, I'll
change it but if it's the latter it is expected that we sometime run this on a
slot where WAL has never been reserved yet.

restart_lsn as the pointer of the function.
--
Michael

#37Ronan Dunklau
ronan.dunklau@aiven.io
In reply to: Michael Paquier (#36)
4 attachment(s)
Re: pg_receivewal starting position

Le jeudi 21 octobre 2021, 09:21:44 CEST Michael Paquier a écrit :

On Thu, Oct 21, 2021 at 08:29:54AM +0200, Ronan Dunklau wrote:

Ok, do you want me to propose a different patch for previous versions ?

That's not necessary. Thanks for proposing.

Do you mean restart_lsn as the pointer argument to the function, or
restart_lsn as the field returned by the command ? If it's the first, I'll
change it but if it's the latter it is expected that we sometime run this
on a slot where WAL has never been reserved yet.

restart_lsn as the pointer of the function.

Done. I haven't touched the timeline switch test patch for now, but I still
include it here for completeness.

--
Ronan Dunklau

Attachments:

v10-0001-doc-Describe-calculation-method-of-streaming-sta.patchtext/x-patch; charset=UTF-8; name=v10-0001-doc-Describe-calculation-method-of-streaming-sta.patchDownload
From b3703868d230f773378742a0deb4360f6cc7343a Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Thu, 21 Oct 2021 13:21:43 +0900
Subject: [PATCH v10 1/4] doc: Describe calculation method of streaming start
 for pg_receivewal

The documentation was unprecise about the fact that the current WAL
flush location is used if nothing can be found on the local archive
directory describe, independently of the compression used by each
segment (ZLIB or uncompressed).
---
 doc/src/sgml/ref/pg_receivewal.sgml | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/doc/src/sgml/ref/pg_receivewal.sgml b/doc/src/sgml/ref/pg_receivewal.sgml
index 45b544cf49..9adbddb657 100644
--- a/doc/src/sgml/ref/pg_receivewal.sgml
+++ b/doc/src/sgml/ref/pg_receivewal.sgml
@@ -75,6 +75,29 @@ PostgreSQL documentation
    one session available for the stream.
   </para>
 
+  <para>
+   The starting point of the write-ahead log streaming is calculated when
+   <application>pg_receivewal</application> starts:
+   <orderedlist>
+    <listitem>
+     <para>
+      First, scan the directory where the WAL segment files are written and
+      find the newest completed segment, using as starting point the beginning
+      of the next WAL segment file. This is calculated independently of the
+      compression method used to compress each segment.
+     </para>
+    </listitem>
+
+    <listitem>
+     <para>
+      If a starting point cannot be calculated with the previous method,
+      the latest WAL flush location is used as reported by the server from
+      a <literal>IDENTIFY_SYSTEM</literal> command.
+     </para>
+    </listitem>
+   </orderedlist>
+  </para>
+
   <para>
    If the connection is lost, or if it cannot be initially established,
    with a non-fatal error, <application>pg_receivewal</application> will
-- 
2.33.0

v10-0002-Add-READ_REPLICATION_SLOT.patchtext/x-patch; charset=UTF-8; name=v10-0002-Add-READ_REPLICATION_SLOT.patchDownload
From 067353a88803e96b295f425167c1195ccf984352 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Thu, 21 Oct 2021 14:23:41 +0900
Subject: [PATCH v10 2/4] Add READ_REPLICATION_SLOT

---
 doc/src/sgml/protocol.sgml                  | 48 +++++++++++
 src/backend/replication/repl_gram.y         | 16 +++-
 src/backend/replication/repl_scanner.l      |  1 +
 src/backend/replication/walsender.c         | 93 +++++++++++++++++++++
 src/include/nodes/nodes.h                   |  1 +
 src/include/nodes/replnodes.h               | 11 +++
 src/test/recovery/t/001_stream_rep.pl       | 32 ++++++-
 src/test/recovery/t/006_logical_decoding.pl | 11 ++-
 src/tools/pgindent/typedefs.list            |  1 +
 9 files changed, 211 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index b95cc88599..37c26ec8ae 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2067,6 +2067,54 @@ The commands accepted in replication mode are:
     </listitem>
   </varlistentry>
 
+  <varlistentry>
+    <term><literal>READ_REPLICATION_SLOT</literal> <replaceable class="parameter">slot_name</replaceable>
+      <indexterm><primary>READ_REPLICATION_SLOT</primary></indexterm>
+    </term>
+    <listitem>
+     <para>
+      Read some information associated to a replication slot. Returns a tuple
+      with <literal>NULL</literal> values if the replication slot does not
+      exist. This command is currently only supported for physical replication
+      slots.
+     </para>
+     <para>
+      In response to this command, the server will return a one-row result set,
+      containing the following fields:
+      <variablelist>
+       <varlistentry>
+        <term><literal>slot_type</literal> (<type>text</type>)</term>
+        <listitem>
+         <para>
+          The replication slot's type, either <literal>physical</literal> or
+          <literal>NULL</literal>.
+         </para>
+        </listitem>
+       </varlistentry>
+
+       <varlistentry>
+        <term><literal>restart_lsn</literal> (<type>text</type>)</term>
+        <listitem>
+         <para>
+          The replication slot's <literal>restart_lsn</literal>.
+         </para>
+        </listitem>
+       </varlistentry>
+
+       <varlistentry>
+        <term><literal>restart_tli</literal> (<type>int4</type>)</term>
+        <listitem>
+         <para>
+          The timeline ID associated to <literal>restart_lsn</literal>,
+          following the current timeline history.
+         </para>
+        </listitem>
+       </varlistentry>
+      </variablelist>
+     </para>
+    </listitem>
+  </varlistentry>
+
   <varlistentry>
     <term><literal>START_REPLICATION</literal> [ <literal>SLOT</literal> <replaceable class="parameter">slot_name</replaceable> ] [ <literal>PHYSICAL</literal> ] <replaceable class="parameter">XXX/XXX</replaceable> [ <literal>TIMELINE</literal> <replaceable class="parameter">tli</replaceable> ]
      <indexterm><primary>START_REPLICATION</primary></indexterm>
diff --git a/src/backend/replication/repl_gram.y b/src/backend/replication/repl_gram.y
index 126380e2df..dcb1108579 100644
--- a/src/backend/replication/repl_gram.y
+++ b/src/backend/replication/repl_gram.y
@@ -64,6 +64,7 @@ static SQLCmd *make_sqlcmd(void);
 /* Keyword tokens. */
 %token K_BASE_BACKUP
 %token K_IDENTIFY_SYSTEM
+%token K_READ_REPLICATION_SLOT
 %token K_SHOW
 %token K_START_REPLICATION
 %token K_CREATE_REPLICATION_SLOT
@@ -94,7 +95,7 @@ static SQLCmd *make_sqlcmd(void);
 %type <node>	command
 %type <node>	base_backup start_replication start_logical_replication
 				create_replication_slot drop_replication_slot identify_system
-				timeline_history show sql_cmd
+				read_replication_slot timeline_history show sql_cmd
 %type <list>	base_backup_legacy_opt_list generic_option_list
 %type <defelt>	base_backup_legacy_opt generic_option
 %type <uintval>	opt_timeline
@@ -125,6 +126,7 @@ command:
 			| start_logical_replication
 			| create_replication_slot
 			| drop_replication_slot
+			| read_replication_slot
 			| timeline_history
 			| show
 			| sql_cmd
@@ -140,6 +142,18 @@ identify_system:
 				}
 			;
 
+/*
+ * READ_REPLICATION_SLOT %s
+ */
+read_replication_slot:
+			K_READ_REPLICATION_SLOT var_name
+				{
+					ReadReplicationSlotCmd *n = makeNode(ReadReplicationSlotCmd);
+					n->slotname = $2;
+					$$ = (Node *) n;
+				}
+			;
+
 /*
  * SHOW setting
  */
diff --git a/src/backend/replication/repl_scanner.l b/src/backend/replication/repl_scanner.l
index c038a636c3..1b599c255e 100644
--- a/src/backend/replication/repl_scanner.l
+++ b/src/backend/replication/repl_scanner.l
@@ -85,6 +85,7 @@ identifier		{ident_start}{ident_cont}*
 BASE_BACKUP			{ return K_BASE_BACKUP; }
 FAST			{ return K_FAST; }
 IDENTIFY_SYSTEM		{ return K_IDENTIFY_SYSTEM; }
+READ_REPLICATION_SLOT	{ return K_READ_REPLICATION_SLOT; }
 SHOW		{ return K_SHOW; }
 LABEL			{ return K_LABEL; }
 NOWAIT			{ return K_NOWAIT; }
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index b811a5c0ef..7b3d16f731 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -232,6 +232,7 @@ static void XLogSendLogical(void);
 static void WalSndDone(WalSndSendDataCallback send_data);
 static XLogRecPtr GetStandbyFlushRecPtr(void);
 static void IdentifySystem(void);
+static void ReadReplicationSlot(ReadReplicationSlotCmd *cmd);
 static void CreateReplicationSlot(CreateReplicationSlotCmd *cmd);
 static void DropReplicationSlot(DropReplicationSlotCmd *cmd);
 static void StartReplication(StartReplicationCmd *cmd);
@@ -457,6 +458,91 @@ IdentifySystem(void)
 	end_tup_output(tstate);
 }
 
+/* Handle READ_REPLICATION_SLOT command */
+static void
+ReadReplicationSlot(ReadReplicationSlotCmd *cmd)
+{
+#define READ_REPLICATION_SLOT_COLS 3
+	ReplicationSlot *slot;
+	DestReceiver *dest;
+	TupOutputState *tstate;
+	TupleDesc	tupdesc;
+	Datum		values[READ_REPLICATION_SLOT_COLS];
+	bool		nulls[READ_REPLICATION_SLOT_COLS];
+
+	tupdesc = CreateTemplateTupleDesc(READ_REPLICATION_SLOT_COLS);
+	TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 1, "slot_type",
+							  TEXTOID, -1, 0);
+	TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 2, "restart_lsn",
+							  TEXTOID, -1, 0);
+	TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 3, "restart_tli",
+							  INT4OID, -1, 0);
+
+	MemSet(nulls, true, READ_REPLICATION_SLOT_COLS * sizeof(bool));
+
+	LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
+	slot = SearchNamedReplicationSlot(cmd->slotname, false);
+	if (slot == NULL || !slot->in_use)
+	{
+		LWLockRelease(ReplicationSlotControlLock);
+	}
+	else
+	{
+		ReplicationSlot slot_contents;
+		int			i = 0;
+
+		/* Copy slot contents while holding spinlock */
+		SpinLockAcquire(&slot->mutex);
+		slot_contents = *slot;
+		SpinLockRelease(&slot->mutex);
+		LWLockRelease(ReplicationSlotControlLock);
+
+		if (OidIsValid(slot_contents.data.database))
+			ereport(ERROR,
+					errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					errmsg("cannot use \"%s\" with logical replication slots",
+						   "READ_REPLICATION_SLOT"));
+
+		/* slot type */
+		values[i] = CStringGetTextDatum("physical");
+		nulls[i] = false;
+		i++;
+
+		/* start LSN */
+		if (!XLogRecPtrIsInvalid(slot_contents.data.restart_lsn))
+		{
+			char		xloc[MAXFNAMELEN];
+
+			snprintf(xloc, sizeof(xloc), "%X/%X",
+					 LSN_FORMAT_ARGS(slot_contents.data.restart_lsn));
+			values[i] = CStringGetTextDatum(xloc);
+			nulls[i] = false;
+		}
+		i++;
+
+		/* timeline this WAL was produced on */
+		if (!XLogRecPtrIsInvalid(slot_contents.data.restart_lsn))
+		{
+			TimeLineID	slots_position_timeline;
+			List	   *timeline_history = NIL;
+
+			timeline_history = readTimeLineHistory(ThisTimeLineID);
+			slots_position_timeline = tliOfPointInHistory(slot_contents.data.restart_lsn,
+														  timeline_history);
+			values[i] = Int32GetDatum(slots_position_timeline);
+			nulls[i] = false;
+		}
+		i++;
+
+		Assert(i == READ_REPLICATION_SLOT_COLS);
+	}
+
+	dest = CreateDestReceiver(DestRemoteSimple);
+	tstate = begin_tup_output_tupdesc(dest, tupdesc, &TTSOpsVirtual);
+	do_tup_output(tstate, values, nulls);
+	end_tup_output(tstate);
+}
+
 
 /*
  * Handle TIMELINE_HISTORY command.
@@ -1622,6 +1708,13 @@ exec_replication_command(const char *cmd_string)
 			EndReplicationCommand(cmdtag);
 			break;
 
+		case T_ReadReplicationSlotCmd:
+			cmdtag = "READ_REPLICATION_SLOT";
+			set_ps_display(cmdtag);
+			ReadReplicationSlot((ReadReplicationSlotCmd *) cmd_node);
+			EndReplicationCommand(cmdtag);
+			break;
+
 		case T_BaseBackupCmd:
 			cmdtag = "BASE_BACKUP";
 			set_ps_display(cmdtag);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index e0057daa06..541e9861ba 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -496,6 +496,7 @@ typedef enum NodeTag
 	T_BaseBackupCmd,
 	T_CreateReplicationSlotCmd,
 	T_DropReplicationSlotCmd,
+	T_ReadReplicationSlotCmd,
 	T_StartReplicationCmd,
 	T_TimeLineHistoryCmd,
 	T_SQLCmd,
diff --git a/src/include/nodes/replnodes.h b/src/include/nodes/replnodes.h
index faa3a251f2..a746fafc12 100644
--- a/src/include/nodes/replnodes.h
+++ b/src/include/nodes/replnodes.h
@@ -87,6 +87,17 @@ typedef struct StartReplicationCmd
 } StartReplicationCmd;
 
 
+/* ----------------------
+ *		READ_REPLICATION_SLOT command
+ * ----------------------
+ */
+typedef struct ReadReplicationSlotCmd
+{
+	NodeTag		type;
+	char	   *slotname;
+} ReadReplicationSlotCmd;
+
+
 /* ----------------------
  *		TIMELINE_HISTORY command
  * ----------------------
diff --git a/src/test/recovery/t/001_stream_rep.pl b/src/test/recovery/t/001_stream_rep.pl
index 9916a36012..8fb7487857 100644
--- a/src/test/recovery/t/001_stream_rep.pl
+++ b/src/test/recovery/t/001_stream_rep.pl
@@ -6,7 +6,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 49;
+use Test::More tests => 53;
 
 # Initialize primary node
 my $node_primary = PostgresNode->new('primary');
@@ -254,6 +254,36 @@ ok( $ret == 0,
 	"SHOW with superuser-settable parameter, replication role and logical replication"
 );
 
+note "testing READ_REPLICATION_SLOT command for replication connection";
+
+my $slotname = 'test_read_replication_slot_physical';
+
+($ret, $stdout, $stderr) = $node_primary->psql(
+	'postgres',
+	'READ_REPLICATION_SLOT non_existent_slot;',
+	extra_params => [ '-d', $connstr_rep ]);
+ok($ret == 0, "READ_REPLICATION_SLOT exit code 0 on success");
+like($stdout, qr/^||$/,
+	"READ_REPLICATION_SLOT returns NULL values if slot does not exist");
+
+$node_primary->psql(
+	'postgres',
+	"CREATE_REPLICATION_SLOT $slotname PHYSICAL RESERVE_WAL;",
+	extra_params => [ '-d', $connstr_rep ]);
+
+($ret, $stdout, $stderr) = $node_primary->psql(
+	'postgres',
+	"READ_REPLICATION_SLOT $slotname;",
+	extra_params => [ '-d', $connstr_rep ]);
+ok($ret == 0, "READ_REPLICATION_SLOT success with existing slot");
+like($stdout, qr/^physical\|[^|]*\|1$/,
+	"READ_REPLICATION_SLOT returns tuple with slot information");
+
+$node_primary->psql(
+	'postgres',
+	"DROP_REPLICATION_SLOT $slotname;",
+	extra_params => [ '-d', $connstr_rep ]);
+
 note "switching to physical replication slot";
 
 # Switch to using a physical replication slot. We can do this without a new
diff --git a/src/test/recovery/t/006_logical_decoding.pl b/src/test/recovery/t/006_logical_decoding.pl
index cc116062c2..1b74f38f10 100644
--- a/src/test/recovery/t/006_logical_decoding.pl
+++ b/src/test/recovery/t/006_logical_decoding.pl
@@ -10,7 +10,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 14;
+use Test::More tests => 15;
 use Config;
 
 # Initialize primary node
@@ -39,6 +39,15 @@ ok( $stderr =~
 	  m/replication slot "test_slot" was not created in this database/,
 	"Logical decoding correctly fails to start");
 
+($result, $stdout, $stderr) = $node_primary->psql(
+	'template1',
+	qq[READ_REPLICATION_SLOT test_slot;],
+	replication => 'database');
+like(
+	$stderr,
+	qr/cannot use "READ_REPLICATION_SLOT" with logical replication slots/,
+	'READ_REPLICATION_SLOT not supported for logical slots');
+
 # Check case of walsender not using a database connection.  Logical
 # decoding should not be allowed.
 ($result, $stdout, $stderr) = $node_primary->psql(
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index cb5b5ec74c..a8e4a2afd6 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2128,6 +2128,7 @@ ReadBufferMode
 ReadBytePtrType
 ReadExtraTocPtrType
 ReadFunc
+ReadReplicationSlotCmd
 ReassignOwnedStmt
 RecheckForeignScan_function
 RecordCacheEntry
-- 
2.33.0

v10-0004-Add-tests-making-sure-pg_receivewal-can-follow-a.patchtext/x-patch; charset=UTF-8; name=v10-0004-Add-tests-making-sure-pg_receivewal-can-follow-a.patchDownload
From f2aaf094a33c3b62bae88ad95816377186c91fa6 Mon Sep 17 00:00:00 2001
From: Ronan Dunklau <ronan.dunklau@aiven.io>
Date: Wed, 20 Oct 2021 14:52:13 +0200
Subject: [PATCH v10 4/4] Add tests making sure pg_receivewal can follow a
 timeline switch.

---
 src/bin/pg_basebackup/t/020_pg_receivewal.pl | 61 +++++++++++++++++++-
 1 file changed, 60 insertions(+), 1 deletion(-)

diff --git a/src/bin/pg_basebackup/t/020_pg_receivewal.pl b/src/bin/pg_basebackup/t/020_pg_receivewal.pl
index 42749c3a40..a5e19c5616 100644
--- a/src/bin/pg_basebackup/t/020_pg_receivewal.pl
+++ b/src/bin/pg_basebackup/t/020_pg_receivewal.pl
@@ -5,7 +5,7 @@ use strict;
 use warnings;
 use TestLib;
 use PostgresNode;
-use Test::More tests => 32;
+use Test::More tests => 38;
 
 program_help_ok('pg_receivewal');
 program_version_ok('pg_receivewal');
@@ -185,6 +185,65 @@ $primary->command_ok(
 $slot = $primary->slot($slot_name);
 ok(-e "$stream_dir/$walfile_to_be_archived", "WAL from the slot's restart_lsn has been archived");
 
+# Check what happens on a timeline switch
+sub change_timeline{
+  # Setup a new standby, start pg_receivewal once, then promote it.
+  my ($primary, $backup_name, $standby_name, $slot_name, $stream_dir) = @_;
+  my $standby = PostgresNode->new($standby_name);
+  $standby->init_from_backup($primary, $backup_name, has_streaming => 1);
+  $standby->start;
+  $primary->wait_for_catchup($standby, 'replay', $primary->lsn('write'));
+  $standby->psql('', "CREATE_REPLICATION_SLOT $slot_name PHYSICAL (RESERVE_WAL)",
+                 replication => 1);
+  $primary->psql('postgres',
+	'INSERT INTO test_table VALUES (generate_series(1,100));');
+  # Get a position which can act as an end position.
+  my $nextlsn = $primary->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();');
+  $primary->safe_psql('postgres', 'SELECT pg_switch_wal();');
+  # Stream some wal, this will populate the stream directory for the
+  # resume-from-archive test case.
+  $standby->run_log(
+    [ 'pg_receivewal', '-D', $stream_dir, '--verbose', '--endpos', $nextlsn, "--slot", $slot_name],
+    "Stream some wal before promoting");
+  $standby->psql(
+	'postgres',
+	"SELECT pg_promote(wait_seconds => 300)");
+  $standby->psql('postgres',
+	'INSERT INTO test_table VALUES (generate_series(1,100));');
+  $standby->safe_psql('postgres', 'SELECT pg_switch_wal();');
+  # Get a full walfilename that will be generated.
+  my $walfilename = $standby->safe_psql('postgres', "SELECT pg_walfile_name(pg_current_wal_insert_lsn())");
+  $standby->psql('postgres',
+	'INSERT INTO test_table VALUES (generate_series(1,100));');
+  $standby->safe_psql('postgres', 'SELECT pg_switch_wal();');
+  $nextlsn = $standby->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();');
+  return $standby, $nextlsn, $walfilename;
+}
+
+my $backup_name = 'my_backup';
+$primary->backup($backup_name);
+my $standby;
+unlink glob "'${stream_dir}/*'";
+# First test: change timeline, and check we can resume from the slot.
+($standby, $nextlsn, $walfile_to_be_archived) = change_timeline($primary, 'my_backup', 'standby1', $slot_name, $stream_dir);
+# To resume from the slot, the target directory must be empty
+unlink glob "'${stream_dir}/*'";
+$standby->command_ok(
+	[ 'pg_receivewal', '-D', $stream_dir, '--verbose', '--endpos', $nextlsn, "--slot", $slot_name],
+	"Stream some wal after promoting, resuming from the slot's position");
+ok(-e "$stream_dir/$walfile_to_be_archived", "WAL from the new timeline has been archived");
+ok(-e "$stream_dir/00000002.history", "Timeline history file has been archived");
+
+# Now do the same, but resuming from the latest archived file.
+# Cleanup leftover from the first standby
+unlink glob "'${stream_dir}/*'";
+($standby, $nextlsn, $walfile_to_be_archived) = change_timeline($primary, 'my_backup', 'standby2', $slot_name, $stream_dir);
+$standby->command_ok(
+	[ 'pg_receivewal', '-D', $stream_dir, '--verbose', '--endpos', $nextlsn],
+	"Stream some wal after promoting, resuming from the latest archived file");
+ok(-e "$stream_dir/$walfile_to_be_archived", "WAL from the new timeline has been archived");
+ok(-e "$stream_dir/00000002.history", "Timeline history file has been archived");
+
 # Permissions on WAL files should be default
 SKIP:
 {
-- 
2.33.0

v10-0003-Use-READ_REPLICATION_SLOT-command-in-pg_receivew.patchtext/x-patch; charset=UTF-8; name=v10-0003-Use-READ_REPLICATION_SLOT-command-in-pg_receivew.patchDownload
From a455eb220e34a8fea43056cd14bbda5ea9ea1f27 Mon Sep 17 00:00:00 2001
From: Ronan Dunklau <ronan.dunklau@aiven.io>
Date: Wed, 1 Sep 2021 15:52:32 +0200
Subject: [PATCH v10 3/4] Use READ_REPLICATION_SLOT command in pg_receivewal.

Prior to this patch, when running pg_receivewal, the start LSN is determined by looking at
the WAL files currently stored on disk, then using the current flush lsn
from the server.

If for some reason the WAL files from pg_receivewal were moved, we want
to restart where we left at, which is the replication slot's restart_lsn
instead of skipping right to the current flush location.

To keep compatibility with prior server versions, we only attempt it if
the version is < 15.
---
 doc/src/sgml/ref/pg_receivewal.sgml          |  8 ++
 src/bin/pg_basebackup/pg_receivewal.c        | 20 ++++-
 src/bin/pg_basebackup/streamutil.c           | 84 ++++++++++++++++++++
 src/bin/pg_basebackup/streamutil.h           |  1 +
 src/bin/pg_basebackup/t/020_pg_receivewal.pl | 43 +++++++++-
 5 files changed, 152 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/ref/pg_receivewal.sgml b/doc/src/sgml/ref/pg_receivewal.sgml
index 9adbddb657..667efb3cb3 100644
--- a/doc/src/sgml/ref/pg_receivewal.sgml
+++ b/doc/src/sgml/ref/pg_receivewal.sgml
@@ -88,6 +88,14 @@ PostgreSQL documentation
      </para>
     </listitem>
 
+    <listitem>
+      <para>
+      If the previous method couldn't calculate a starting point, and a
+      replication slot is used, the <command>READ_REPLICATION_SLOT</command>
+      command is issued to retrieve the slot's <literal>restart_lsn</literal>.
+      </para>
+    </listitem>
+
     <listitem>
      <para>
       If a starting point cannot be calculated with the previous method,
diff --git a/src/bin/pg_basebackup/pg_receivewal.c b/src/bin/pg_basebackup/pg_receivewal.c
index d5140a79fe..1128389839 100644
--- a/src/bin/pg_basebackup/pg_receivewal.c
+++ b/src/bin/pg_basebackup/pg_receivewal.c
@@ -409,10 +409,26 @@ StreamLog(void)
 	stream.startpos = FindStreamingStart(&stream.timeline);
 	if (stream.startpos == InvalidXLogRecPtr)
 	{
-		stream.startpos = serverpos;
-		stream.timeline = servertli;
+		/* Try to get it from the slot if any, and the server supports it */
+		if (replication_slot != NULL)
+		{
+			/* Error is logged by GetSlotInformation, so just return. */
+			if (!GetSlotInformation(conn, replication_slot, &stream.startpos, &stream.timeline))
+				return;
+		}
+
+		/*
+		 * If it is still unknown, use the current flush value from the server
+		 */
+		if (stream.startpos == InvalidXLogRecPtr)
+		{
+			stream.startpos = serverpos;
+			stream.timeline = servertli;
+		}
 	}
 
+	Assert(stream.startpos != InvalidXLogRecPtr);
+
 	/*
 	 * Always start streaming at the beginning of a segment
 	 */
diff --git a/src/bin/pg_basebackup/streamutil.c b/src/bin/pg_basebackup/streamutil.c
index a9bc1ce214..f437b65529 100644
--- a/src/bin/pg_basebackup/streamutil.c
+++ b/src/bin/pg_basebackup/streamutil.c
@@ -479,6 +479,90 @@ RunIdentifySystem(PGconn *conn, char **sysid, TimeLineID *starttli,
 	return true;
 }
 
+
+/*
+ * Returns wether a replication slot exists through a given connection,
+ * and fills in the slot_info with the results if passed by the caller.
+ */
+bool
+GetSlotInformation(PGconn *conn, const char *slot_name, XLogRecPtr *restart_lsn, TimeLineID* restart_tli)
+{
+	PGresult   *res;
+	PQExpBuffer query;
+
+	Assert(slot_name != NULL);
+	Assert(restart_lsn != NULL);
+
+	if (PQserverVersion(conn) < 150000)
+		return false;
+
+	*restart_lsn = InvalidXLogRecPtr;
+
+	if (restart_tli != NULL)
+		*restart_tli = 0;
+
+	query = createPQExpBuffer();
+	appendPQExpBuffer(query, "READ_REPLICATION_SLOT %s", slot_name);
+	res = PQexec(conn, query->data);
+	destroyPQExpBuffer(query);
+	/* The command should always return precisely one tuple */
+	if (PQresultStatus(res) != PGRES_TUPLES_OK)
+	{
+		pg_log_error("could not read replication slot : %s",
+					 PQerrorMessage(conn));
+		PQclear(res);
+		return false;
+	}
+
+	if (PQntuples(res) != 1 || PQnfields(res) < 3)
+	{
+		pg_log_error("could not read replication slot: got %d rows and %d fields, expected %d rows and %d or more fields",
+					 PQntuples(res), PQnfields(res), 1, 3);
+		PQclear(res);
+		return false;
+	}
+
+	/*
+	 * When the slot doesn't exist, the command returns an all-null tuple. The
+	 * first column (slot_type) will only be null if the slot doesn't exists.
+	 */
+	if (PQgetisnull(res, 0, 0))
+	{
+		PQclear(res);
+		pg_log_error("replication slot \"%s\" does not exist", slot_name);
+		return false;
+	}
+
+	/* This cannot happen for now, but future versions may accept
+	 * READ_REPLICATION_SLOT for other slot types.
+	 */
+	if (strcmp(PQgetvalue(res, 0, 0), "physical") != 0)
+		pg_log_error("expected a physical slot, got type \"%s\" instead",
+					 PQgetvalue(res, 0, 0));
+
+	/* Restart LSN */
+	if (!PQgetisnull(res, 0, 1))
+	{
+		uint32		hi,
+					lo;
+
+		if (sscanf(PQgetvalue(res, 0, 1), "%X/%X", &hi, &lo) != 2)
+		{
+			pg_log_error("could not parse slot's restart_lsn \"%s\"",
+						 PQgetvalue(res, 0, 1));
+			PQclear(res);
+			return false;
+		}
+		*restart_lsn = ((uint64) hi) << 32 | lo;
+	}
+	if (restart_tli && !PQgetisnull(res, 0, 2))
+	{
+		*restart_tli = atoi(PQgetvalue(res, 0, 2));
+	}
+	PQclear(res);
+	return true;
+}
+
 /*
  * Create a replication slot for the given connection. This function
  * returns true in case of success.
diff --git a/src/bin/pg_basebackup/streamutil.h b/src/bin/pg_basebackup/streamutil.h
index 65135c79e0..907457b8ae 100644
--- a/src/bin/pg_basebackup/streamutil.h
+++ b/src/bin/pg_basebackup/streamutil.h
@@ -52,6 +52,7 @@ extern void AppendIntegerCommandOption(PQExpBuffer buf,
 									   bool use_new_option_syntax,
 									   char *option_name, int32 option_value);
 
+extern bool GetSlotInformation(PGconn *conn, const char *slot_name, XLogRecPtr *restart_lsn, TimeLineID *restart_tli);
 extern bool RetrieveWalSegSize(PGconn *conn);
 extern TimestampTz feGetCurrentTimestamp(void);
 extern void feTimestampDifference(TimestampTz start_time, TimestampTz stop_time,
diff --git a/src/bin/pg_basebackup/t/020_pg_receivewal.pl b/src/bin/pg_basebackup/t/020_pg_receivewal.pl
index 0b33d73900..42749c3a40 100644
--- a/src/bin/pg_basebackup/t/020_pg_receivewal.pl
+++ b/src/bin/pg_basebackup/t/020_pg_receivewal.pl
@@ -5,7 +5,7 @@ use strict;
 use warnings;
 use TestLib;
 use PostgresNode;
-use Test::More tests => 27;
+use Test::More tests => 32;
 
 program_help_ok('pg_receivewal');
 program_version_ok('pg_receivewal');
@@ -49,7 +49,7 @@ is($primary->slot($slot_name)->{'slot_type'},
 
 # Generate some WAL.  Use --synchronous at the same time to add more
 # code coverage.  Switch to the next segment first so that subsequent
-# restarts of pg_receivewal will see this segment as full..
+# restarts of pg_receivewal will see this segment as full
 $primary->psql('postgres', 'CREATE TABLE test_table(x integer);');
 $primary->psql('postgres', 'SELECT pg_switch_wal();');
 my $nextlsn =
@@ -146,6 +146,45 @@ $primary->command_ok(
 $partial_wals[0] =~ s/(\.gz)?.partial//;
 ok(-e $partial_wals[0], "check that previously partial WAL is now complete");
 
+# Verify that if we use a replication slot, we resume where we left even in the
+# absence of WALs
+unlink glob "'${stream_dir}/*'";
+# Setup the slot
+$primary->psql('', "CREATE_REPLICATION_SLOT $slot_name PHYSICAL (RESERVE_WAL)",
+  replication => 1);
+$primary->psql('postgres',
+	'INSERT INTO test_table VALUES (generate_series(1,100));');
+$primary->psql('postgres', 'SELECT pg_switch_wal();');
+$nextlsn =
+  $primary->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();');
+chomp($nextlsn);
+
+# Get the slot restart_lsn and make sure we retrieve the associated WAL file
+$slot = $primary->slot($slot_name);
+my $restart_lsn = $slot->{'restart_lsn'};
+# Add one so that the restart_lsn doesn't correspond to the previous file.
+$restart_lsn =~ s/.$/1/;
+my $walfile_to_be_archived = $primary->safe_psql('postgres',
+	"SELECT pg_walfile_name('$restart_lsn');");
+isnt($restart_lsn, '', 'restart LSN of new slot is not null');
+
+$primary->psql('postgres',
+	'INSERT INTO test_table VALUES (generate_series(1,100));');
+$primary->psql('postgres', 'SELECT pg_switch_wal();');
+$nextlsn =
+  $primary->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();');
+chomp($nextlsn);
+# First verify what happens if we try with a non-existing slot.
+$primary->command_fails_like(
+  [ 'pg_receivewal', '-D', $stream_dir, '--slot', 'nonexistentslot', '-n', '--verbose', '--endpos', $nextlsn ],
+  qr/pg_receivewal: error: replication slot "nonexistentslot" does not exist/,
+  'pg_receivewal fails with a non-existing slot');
+$primary->command_ok(
+	[ 'pg_receivewal', '-D', $stream_dir, '--slot', $slot_name, '--verbose', '--endpos', $nextlsn ],
+	"Stream some wal starting from the slot's restart_lsn");
+$slot = $primary->slot($slot_name);
+ok(-e "$stream_dir/$walfile_to_be_archived", "WAL from the slot's restart_lsn has been archived");
+
 # Permissions on WAL files should be default
 SKIP:
 {
-- 
2.33.0

#38Michael Paquier
michael@paquier.xyz
In reply to: Ronan Dunklau (#37)
1 attachment(s)
Re: pg_receivewal starting position

On Thu, Oct 21, 2021 at 10:36:42AM +0200, Ronan Dunklau wrote:

Done. I haven't touched the timeline switch test patch for now, but I still
include it here for completeness.

Thanks. I have applied and back-patched 0001, then looked again at
0002 that adds READ_REPLICATION_SLOT:
- Change the TLI to use int8 rather than int4, so as we will always be
right with TimelineID which is unsigned (this was discussed upthread
but I got back on it after more thoughts, to avoid any future
issues).
- Added an extra initialization for the set of Datum values, just as
an extra safety net.
- There was a bug with the timeline returned when executing the
command while in recovery as ThisTimeLineID is 0 in the context of a
standby, but we need to support the case of physical slots even when
streaming archives from a standby. The fix is similar to what we do
for IDENTIFY_SYSTEM, where we need to use the timeline currently
replayed from GetXLogReplayRecPtr(), before looking at the past
timeline history using restart_lsn and the replayed TLI.

With that in place, I think that we are good now for this part.
--
Michael

Attachments:

v10-0001-Add-replication-command-READ_REPLICATION_SLOT.patchtext/x-diff; charset=us-asciiDownload
From 7e20a294d18c280b8031ee6cc862ba6a661cf40f Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Sat, 23 Oct 2021 16:28:10 +0900
Subject: [PATCH v10] Add replication command READ_REPLICATION_SLOT

The command is supported for physical slots for now, and returns the
type of slot, its restart_lsn and its restart_tli.

This will be useful for an upcoming patch related to pg_receivewal, to
allow the tool to be able to stream from the position of a slot, rather
than the last WAL position flushed by the backend (as reported by
IDENTIFY_SYSTEM), if the archive directory is found as empty, which
would be an advantage in the case of switching to a different archive
location with the same slot used to avoid holes in what gets backed up.

Author: Ronan Dunklau
Reviewed-by: Kyotaro Horiguchi, Michael Paquier, Bharath Rupireddy
Discussion: https://postgr.es/m/18708360.4lzOvYHigE@aivenronan
---
 src/include/nodes/nodes.h                   |   1 +
 src/include/nodes/replnodes.h               |  11 ++
 src/backend/replication/repl_gram.y         |  16 ++-
 src/backend/replication/repl_scanner.l      |   1 +
 src/backend/replication/walsender.c         | 105 ++++++++++++++++++++
 src/test/recovery/t/001_stream_rep.pl       |  32 +++++-
 src/test/recovery/t/006_logical_decoding.pl |  11 +-
 doc/src/sgml/protocol.sgml                  |  48 +++++++++
 src/tools/pgindent/typedefs.list            |   1 +
 9 files changed, 223 insertions(+), 3 deletions(-)

diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index e0057daa06..541e9861ba 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -496,6 +496,7 @@ typedef enum NodeTag
 	T_BaseBackupCmd,
 	T_CreateReplicationSlotCmd,
 	T_DropReplicationSlotCmd,
+	T_ReadReplicationSlotCmd,
 	T_StartReplicationCmd,
 	T_TimeLineHistoryCmd,
 	T_SQLCmd,
diff --git a/src/include/nodes/replnodes.h b/src/include/nodes/replnodes.h
index faa3a251f2..a746fafc12 100644
--- a/src/include/nodes/replnodes.h
+++ b/src/include/nodes/replnodes.h
@@ -87,6 +87,17 @@ typedef struct StartReplicationCmd
 } StartReplicationCmd;
 
 
+/* ----------------------
+ *		READ_REPLICATION_SLOT command
+ * ----------------------
+ */
+typedef struct ReadReplicationSlotCmd
+{
+	NodeTag		type;
+	char	   *slotname;
+} ReadReplicationSlotCmd;
+
+
 /* ----------------------
  *		TIMELINE_HISTORY command
  * ----------------------
diff --git a/src/backend/replication/repl_gram.y b/src/backend/replication/repl_gram.y
index 126380e2df..dcb1108579 100644
--- a/src/backend/replication/repl_gram.y
+++ b/src/backend/replication/repl_gram.y
@@ -64,6 +64,7 @@ static SQLCmd *make_sqlcmd(void);
 /* Keyword tokens. */
 %token K_BASE_BACKUP
 %token K_IDENTIFY_SYSTEM
+%token K_READ_REPLICATION_SLOT
 %token K_SHOW
 %token K_START_REPLICATION
 %token K_CREATE_REPLICATION_SLOT
@@ -94,7 +95,7 @@ static SQLCmd *make_sqlcmd(void);
 %type <node>	command
 %type <node>	base_backup start_replication start_logical_replication
 				create_replication_slot drop_replication_slot identify_system
-				timeline_history show sql_cmd
+				read_replication_slot timeline_history show sql_cmd
 %type <list>	base_backup_legacy_opt_list generic_option_list
 %type <defelt>	base_backup_legacy_opt generic_option
 %type <uintval>	opt_timeline
@@ -125,6 +126,7 @@ command:
 			| start_logical_replication
 			| create_replication_slot
 			| drop_replication_slot
+			| read_replication_slot
 			| timeline_history
 			| show
 			| sql_cmd
@@ -140,6 +142,18 @@ identify_system:
 				}
 			;
 
+/*
+ * READ_REPLICATION_SLOT %s
+ */
+read_replication_slot:
+			K_READ_REPLICATION_SLOT var_name
+				{
+					ReadReplicationSlotCmd *n = makeNode(ReadReplicationSlotCmd);
+					n->slotname = $2;
+					$$ = (Node *) n;
+				}
+			;
+
 /*
  * SHOW setting
  */
diff --git a/src/backend/replication/repl_scanner.l b/src/backend/replication/repl_scanner.l
index c038a636c3..1b599c255e 100644
--- a/src/backend/replication/repl_scanner.l
+++ b/src/backend/replication/repl_scanner.l
@@ -85,6 +85,7 @@ identifier		{ident_start}{ident_cont}*
 BASE_BACKUP			{ return K_BASE_BACKUP; }
 FAST			{ return K_FAST; }
 IDENTIFY_SYSTEM		{ return K_IDENTIFY_SYSTEM; }
+READ_REPLICATION_SLOT	{ return K_READ_REPLICATION_SLOT; }
 SHOW		{ return K_SHOW; }
 LABEL			{ return K_LABEL; }
 NOWAIT			{ return K_NOWAIT; }
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index b811a5c0ef..ef3e64846d 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -232,6 +232,7 @@ static void XLogSendLogical(void);
 static void WalSndDone(WalSndSendDataCallback send_data);
 static XLogRecPtr GetStandbyFlushRecPtr(void);
 static void IdentifySystem(void);
+static void ReadReplicationSlot(ReadReplicationSlotCmd *cmd);
 static void CreateReplicationSlot(CreateReplicationSlotCmd *cmd);
 static void DropReplicationSlot(DropReplicationSlotCmd *cmd);
 static void StartReplication(StartReplicationCmd *cmd);
@@ -457,6 +458,103 @@ IdentifySystem(void)
 	end_tup_output(tstate);
 }
 
+/* Handle READ_REPLICATION_SLOT command */
+static void
+ReadReplicationSlot(ReadReplicationSlotCmd *cmd)
+{
+#define READ_REPLICATION_SLOT_COLS 3
+	ReplicationSlot *slot;
+	DestReceiver *dest;
+	TupOutputState *tstate;
+	TupleDesc	tupdesc;
+	Datum		values[READ_REPLICATION_SLOT_COLS];
+	bool		nulls[READ_REPLICATION_SLOT_COLS];
+
+	tupdesc = CreateTemplateTupleDesc(READ_REPLICATION_SLOT_COLS);
+	TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 1, "slot_type",
+							  TEXTOID, -1, 0);
+	TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 2, "restart_lsn",
+							  TEXTOID, -1, 0);
+	/* TimeLineID is unsigned, so int4 is not wide enough. */
+	TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 3, "restart_tli",
+							  INT8OID, -1, 0);
+
+	MemSet(values, 0, READ_REPLICATION_SLOT_COLS * sizeof(Datum));
+	MemSet(nulls, true, READ_REPLICATION_SLOT_COLS * sizeof(bool));
+
+	LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
+	slot = SearchNamedReplicationSlot(cmd->slotname, false);
+	if (slot == NULL || !slot->in_use)
+	{
+		LWLockRelease(ReplicationSlotControlLock);
+	}
+	else
+	{
+		ReplicationSlot slot_contents;
+		int			i = 0;
+
+		/* Copy slot contents while holding spinlock */
+		SpinLockAcquire(&slot->mutex);
+		slot_contents = *slot;
+		SpinLockRelease(&slot->mutex);
+		LWLockRelease(ReplicationSlotControlLock);
+
+		if (OidIsValid(slot_contents.data.database))
+			ereport(ERROR,
+					errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					errmsg("cannot use \"%s\" with logical replication slots",
+						   "READ_REPLICATION_SLOT"));
+
+		/* slot type */
+		values[i] = CStringGetTextDatum("physical");
+		nulls[i] = false;
+		i++;
+
+		/* start LSN */
+		if (!XLogRecPtrIsInvalid(slot_contents.data.restart_lsn))
+		{
+			char		xloc[64];
+
+			snprintf(xloc, sizeof(xloc), "%X/%X",
+					 LSN_FORMAT_ARGS(slot_contents.data.restart_lsn));
+			values[i] = CStringGetTextDatum(xloc);
+			nulls[i] = false;
+		}
+		i++;
+
+		/* timeline this WAL was produced on */
+		if (!XLogRecPtrIsInvalid(slot_contents.data.restart_lsn))
+		{
+			TimeLineID	slots_position_timeline;
+			TimeLineID	current_timeline;
+			List	   *timeline_history = NIL;
+
+			/*
+			 * While in recovery, use as timeline the currently-replaying
+			 * one to get the LSN position's history.
+			 */
+			if (RecoveryInProgress())
+				(void) GetXLogReplayRecPtr(&current_timeline);
+			else
+				current_timeline = ThisTimeLineID;
+
+			timeline_history = readTimeLineHistory(current_timeline);
+			slots_position_timeline = tliOfPointInHistory(slot_contents.data.restart_lsn,
+														  timeline_history);
+			values[i] = Int64GetDatum((int64) slots_position_timeline);
+			nulls[i] = false;
+		}
+		i++;
+
+		Assert(i == READ_REPLICATION_SLOT_COLS);
+	}
+
+	dest = CreateDestReceiver(DestRemoteSimple);
+	tstate = begin_tup_output_tupdesc(dest, tupdesc, &TTSOpsVirtual);
+	do_tup_output(tstate, values, nulls);
+	end_tup_output(tstate);
+}
+
 
 /*
  * Handle TIMELINE_HISTORY command.
@@ -1622,6 +1720,13 @@ exec_replication_command(const char *cmd_string)
 			EndReplicationCommand(cmdtag);
 			break;
 
+		case T_ReadReplicationSlotCmd:
+			cmdtag = "READ_REPLICATION_SLOT";
+			set_ps_display(cmdtag);
+			ReadReplicationSlot((ReadReplicationSlotCmd *) cmd_node);
+			EndReplicationCommand(cmdtag);
+			break;
+
 		case T_BaseBackupCmd:
 			cmdtag = "BASE_BACKUP";
 			set_ps_display(cmdtag);
diff --git a/src/test/recovery/t/001_stream_rep.pl b/src/test/recovery/t/001_stream_rep.pl
index 9916a36012..8fb7487857 100644
--- a/src/test/recovery/t/001_stream_rep.pl
+++ b/src/test/recovery/t/001_stream_rep.pl
@@ -6,7 +6,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 49;
+use Test::More tests => 53;
 
 # Initialize primary node
 my $node_primary = PostgresNode->new('primary');
@@ -254,6 +254,36 @@ ok( $ret == 0,
 	"SHOW with superuser-settable parameter, replication role and logical replication"
 );
 
+note "testing READ_REPLICATION_SLOT command for replication connection";
+
+my $slotname = 'test_read_replication_slot_physical';
+
+($ret, $stdout, $stderr) = $node_primary->psql(
+	'postgres',
+	'READ_REPLICATION_SLOT non_existent_slot;',
+	extra_params => [ '-d', $connstr_rep ]);
+ok($ret == 0, "READ_REPLICATION_SLOT exit code 0 on success");
+like($stdout, qr/^||$/,
+	"READ_REPLICATION_SLOT returns NULL values if slot does not exist");
+
+$node_primary->psql(
+	'postgres',
+	"CREATE_REPLICATION_SLOT $slotname PHYSICAL RESERVE_WAL;",
+	extra_params => [ '-d', $connstr_rep ]);
+
+($ret, $stdout, $stderr) = $node_primary->psql(
+	'postgres',
+	"READ_REPLICATION_SLOT $slotname;",
+	extra_params => [ '-d', $connstr_rep ]);
+ok($ret == 0, "READ_REPLICATION_SLOT success with existing slot");
+like($stdout, qr/^physical\|[^|]*\|1$/,
+	"READ_REPLICATION_SLOT returns tuple with slot information");
+
+$node_primary->psql(
+	'postgres',
+	"DROP_REPLICATION_SLOT $slotname;",
+	extra_params => [ '-d', $connstr_rep ]);
+
 note "switching to physical replication slot";
 
 # Switch to using a physical replication slot. We can do this without a new
diff --git a/src/test/recovery/t/006_logical_decoding.pl b/src/test/recovery/t/006_logical_decoding.pl
index cc116062c2..1b74f38f10 100644
--- a/src/test/recovery/t/006_logical_decoding.pl
+++ b/src/test/recovery/t/006_logical_decoding.pl
@@ -10,7 +10,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 14;
+use Test::More tests => 15;
 use Config;
 
 # Initialize primary node
@@ -39,6 +39,15 @@ ok( $stderr =~
 	  m/replication slot "test_slot" was not created in this database/,
 	"Logical decoding correctly fails to start");
 
+($result, $stdout, $stderr) = $node_primary->psql(
+	'template1',
+	qq[READ_REPLICATION_SLOT test_slot;],
+	replication => 'database');
+like(
+	$stderr,
+	qr/cannot use "READ_REPLICATION_SLOT" with logical replication slots/,
+	'READ_REPLICATION_SLOT not supported for logical slots');
+
 # Check case of walsender not using a database connection.  Logical
 # decoding should not be allowed.
 ($result, $stdout, $stderr) = $node_primary->psql(
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index b95cc88599..132436c6e6 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2067,6 +2067,54 @@ The commands accepted in replication mode are:
     </listitem>
   </varlistentry>
 
+  <varlistentry>
+    <term><literal>READ_REPLICATION_SLOT</literal> <replaceable class="parameter">slot_name</replaceable>
+      <indexterm><primary>READ_REPLICATION_SLOT</primary></indexterm>
+    </term>
+    <listitem>
+     <para>
+      Read some information associated to a replication slot. Returns a tuple
+      with <literal>NULL</literal> values if the replication slot does not
+      exist. This command is currently only supported for physical replication
+      slots.
+     </para>
+     <para>
+      In response to this command, the server will return a one-row result set,
+      containing the following fields:
+      <variablelist>
+       <varlistentry>
+        <term><literal>slot_type</literal> (<type>text</type>)</term>
+        <listitem>
+         <para>
+          The replication slot's type, either <literal>physical</literal> or
+          <literal>NULL</literal>.
+         </para>
+        </listitem>
+       </varlistentry>
+
+       <varlistentry>
+        <term><literal>restart_lsn</literal> (<type>text</type>)</term>
+        <listitem>
+         <para>
+          The replication slot's <literal>restart_lsn</literal>.
+         </para>
+        </listitem>
+       </varlistentry>
+
+       <varlistentry>
+        <term><literal>restart_tli</literal> (<type>int8</type>)</term>
+        <listitem>
+         <para>
+          The timeline ID associated to <literal>restart_lsn</literal>,
+          following the current timeline history.
+         </para>
+        </listitem>
+       </varlistentry>
+      </variablelist>
+     </para>
+    </listitem>
+  </varlistentry>
+
   <varlistentry>
     <term><literal>START_REPLICATION</literal> [ <literal>SLOT</literal> <replaceable class="parameter">slot_name</replaceable> ] [ <literal>PHYSICAL</literal> ] <replaceable class="parameter">XXX/XXX</replaceable> [ <literal>TIMELINE</literal> <replaceable class="parameter">tli</replaceable> ]
      <indexterm><primary>START_REPLICATION</primary></indexterm>
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index cb5b5ec74c..a8e4a2afd6 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2128,6 +2128,7 @@ ReadBufferMode
 ReadBytePtrType
 ReadExtraTocPtrType
 ReadFunc
+ReadReplicationSlotCmd
 ReassignOwnedStmt
 RecheckForeignScan_function
 RecordCacheEntry
-- 
2.33.0

#39Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Michael Paquier (#38)
Re: pg_receivewal starting position

On Sat, Oct 23, 2021 at 1:14 PM Michael Paquier <michael@paquier.xyz> wrote:

On Thu, Oct 21, 2021 at 10:36:42AM +0200, Ronan Dunklau wrote:

Done. I haven't touched the timeline switch test patch for now, but I still
include it here for completeness.

Thanks. I have applied and back-patched 0001, then looked again at
0002 that adds READ_REPLICATION_SLOT:
- Change the TLI to use int8 rather than int4, so as we will always be
right with TimelineID which is unsigned (this was discussed upthread
but I got back on it after more thoughts, to avoid any future
issues).
- Added an extra initialization for the set of Datum values, just as
an extra safety net.
- There was a bug with the timeline returned when executing the
command while in recovery as ThisTimeLineID is 0 in the context of a
standby, but we need to support the case of physical slots even when
streaming archives from a standby. The fix is similar to what we do
for IDENTIFY_SYSTEM, where we need to use the timeline currently
replayed from GetXLogReplayRecPtr(), before looking at the past
timeline history using restart_lsn and the replayed TLI.

With that in place, I think that we are good now for this part.

Thanks for the updated patch. I have following comments on v10:

1) It's better to initialize nulls with false, we can avoid setting
them to true. The instances where the columns are not nulls is going
to be more than the columns with null values, so we could avoid some
of nulls[i] = false; instructions.
+ MemSet(nulls, true, READ_REPLICATION_SLOT_COLS * sizeof(bool));
I suggest we do the following. The number of instances of hitting the
"else" parts will be less.
MemSet(nulls, false, READ_REPLICATION_SLOT_COLS * sizeof(bool));

if (slot == NULL || !slot->in_use)
{
MemSet(nulls, true, READ_REPLICATION_SLOT_COLS * sizeof(bool));
LWLockRelease(ReplicationSlotControlLock);
}

if (!XLogRecPtrIsInvalid(slot_contents.data.restart_lsn))
{
}
else
nulls[i] = true;

if (!XLogRecPtrIsInvalid(slot_contents.data.restart_lsn))
{
}
else
nulls[i] = true;

2) As I previously mentioned, we are not copying the slot contents
while holding the spinlock, it's just we are taking the memory address
and releasing the lock, so there is a chance that the memory we are
looking at can become unavailable or stale while we access
slot_contents. So, I suggest we do the memcpy of the *slot to
slot_contents. I'm sure the memcpy ing the entire ReplicationSlot
contents will be costlier, so let's just take the info that we need
(data.database, data.restart_lsn) into local variables while we hold
the spin lock
+ /* Copy slot contents while holding spinlock */
+ SpinLockAcquire(&slot->mutex);
+ slot_contents = *slot;
+ SpinLockRelease(&slot->mutex);
+ LWLockRelease(ReplicationSlotControlLock);

The code will look like following:
Oid database;
XLogRecPtr restart_lsn;

/* Take required information from slot contents while holding
spinlock */
SpinLockAcquire(&slot->mutex);
database= slot->data.database;
restart_lsn= slot->data.restart_lsn;
SpinLockRelease(&slot->mutex);
LWLockRelease(ReplicationSlotControlLock);

3) The data that the new command returns to the client can actually
become stale while it is captured and in transit to the client as we
release the spinlock and other backends can drop or alter the info.
So, it's better we talk about this in the documentation of the new
command and also in the comments saying "clients will have to deal
with it."

4) How about we be more descriptive about the error added? This will
help identify for which replication slot the command has failed from
tons of server logs which really help in debugging and analysis.
I suggest we have this:
errmsg("cannot use \"%s\" command with logical replication slot
\"%s\"", "READ_REPLICATION_SLOT", cmd->slotname);
instead of just a plain, non-informative, generic message:
errmsg("cannot use \"%s\" with logical replication slots",

Regards,
Bharath Rupireddy.

#40Michael Paquier
michael@paquier.xyz
In reply to: Bharath Rupireddy (#39)
Re: pg_receivewal starting position

On Sat, Oct 23, 2021 at 10:46:30PM +0530, Bharath Rupireddy wrote:

1) It's better to initialize nulls with false, we can avoid setting
them to true. The instances where the columns are not nulls is going
to be more than the columns with null values, so we could avoid some
of nulls[i] = false; instructions.

I don't think that this is an improvement in this case as in the
default case we'd return a tuple full of NULL values if the slot does
not exist, so the existing code is simpler when we don't look at the
slot contents.

2) As I previously mentioned, we are not copying the slot contents
while holding the spinlock, it's just we are taking the memory address
and releasing the lock, so there is a chance that the memory we are
looking at can become unavailable or stale while we access
slot_contents. So, I suggest we do the memcpy of the *slot to
slot_contents. I'm sure the memcpy ing the entire ReplicationSlot
contents will be costlier, so let's just take the info that we need
(data.database, data.restart_lsn) into local variables while we hold
the spin lock

The style of the patch is consistent with what we do in other areas
(see pg_get_replication_slots as one example).

+ /* Copy slot contents while holding spinlock */
+ SpinLockAcquire(&slot->mutex);
+ slot_contents = *slot;

And what this does is to copy the contents of the slot into a local
area (note that we use a NameData pointing to an area with
NAMEDATALEN). Aka if the contents of *slot are changed by whatever
reason (this cannot change as of the LWLock acquired), what we have
saved is unchanged as of this command's context.

3) The data that the new command returns to the client can actually
become stale while it is captured and in transit to the client as we
release the spinlock and other backends can drop or alter the info.
So, it's better we talk about this in the documentation of the new
command and also in the comments saying "clients will have to deal
with it."

The same can be said with IDENTIFY_SYSTEM when the flushed location
becomes irrelevant. I am not sure that this has any need to apply
here. We could add that this is useful to get a streaming start
position though.

4) How about we be more descriptive about the error added? This will
help identify for which replication slot the command has failed from
tons of server logs which really help in debugging and analysis.
I suggest we have this:
errmsg("cannot use \"%s\" command with logical replication slot
\"%s\"", "READ_REPLICATION_SLOT", cmd->slotname);
instead of just a plain, non-informative, generic message:
errmsg("cannot use \"%s\" with logical replication slots",

Yeah. I don't mind adding the slot name in this string.
--
Michael

#41Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Michael Paquier (#40)
Re: pg_receivewal starting position

On Sun, Oct 24, 2021 at 4:40 AM Michael Paquier <michael@paquier.xyz> wrote:

On Sat, Oct 23, 2021 at 10:46:30PM +0530, Bharath Rupireddy wrote:

2) As I previously mentioned, we are not copying the slot contents
while holding the spinlock, it's just we are taking the memory address
and releasing the lock, so there is a chance that the memory we are
looking at can become unavailable or stale while we access
slot_contents. So, I suggest we do the memcpy of the *slot to
slot_contents. I'm sure the memcpy ing the entire ReplicationSlot
contents will be costlier, so let's just take the info that we need
(data.database, data.restart_lsn) into local variables while we hold
the spin lock

The style of the patch is consistent with what we do in other areas
(see pg_get_replication_slots as one example).

+ /* Copy slot contents while holding spinlock */
+ SpinLockAcquire(&slot->mutex);
+ slot_contents = *slot;

And what this does is to copy the contents of the slot into a local
area (note that we use a NameData pointing to an area with
NAMEDATALEN). Aka if the contents of *slot are changed by whatever
reason (this cannot change as of the LWLock acquired), what we have
saved is unchanged as of this command's context.

pg_get_replication_slots holds the ReplicationSlotControlLock until
the end of the function so it can be assured that *slot contents will
not change. In ReadReplicationSlot, the ReplicationSlotControlLock is
released immediately after taking *slot pointer into slot_contents.
Isn't it better if we hold the lock until the end of the function so
that we can avoid the slot contents becoming stale problems?

Having said that, the ReplicationSlotCreate,
SearchNamedReplicationSlot (when need_lock is true) etc. release the
lock immediately. I'm not sure if I should ignore this staleness
problem in ReadReplicationSlot.

3) The data that the new command returns to the client can actually
become stale while it is captured and in transit to the client as we
release the spinlock and other backends can drop or alter the info.
So, it's better we talk about this in the documentation of the new
command and also in the comments saying "clients will have to deal
with it."

The same can be said with IDENTIFY_SYSTEM when the flushed location
becomes irrelevant. I am not sure that this has any need to apply
here. We could add that this is useful to get a streaming start
position though.

I think that's okay, let's not make any changes or add any comments in
regards to the above. The client is basically bound to get the
snapshot of the data at the time it requests the database.

4) How about we be more descriptive about the error added? This will
help identify for which replication slot the command has failed from
tons of server logs which really help in debugging and analysis.
I suggest we have this:
errmsg("cannot use \"%s\" command with logical replication slot
\"%s\"", "READ_REPLICATION_SLOT", cmd->slotname);
instead of just a plain, non-informative, generic message:
errmsg("cannot use \"%s\" with logical replication slots",

Yeah. I don't mind adding the slot name in this string.

Thanks.

Regards,
Bharath Rupireddy.

#42Michael Paquier
michael@paquier.xyz
In reply to: Bharath Rupireddy (#41)
Re: pg_receivewal starting position

On Sun, Oct 24, 2021 at 09:08:01AM +0530, Bharath Rupireddy wrote:

pg_get_replication_slots holds the ReplicationSlotControlLock until
the end of the function so it can be assured that *slot contents will
not change. In ReadReplicationSlot, the ReplicationSlotControlLock is
released immediately after taking *slot pointer into slot_contents.
Isn't it better if we hold the lock until the end of the function so
that we can avoid the slot contents becoming stale problems?

The reason is different in the case of pg_get_replication_slots(). We
have to hold ReplicationSlotControlLock for the whole duration of the
shared memory scan to return back to the user a consistent set of
information to the user, for all the slots.
--
Michael

#43Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Michael Paquier (#42)
Re: pg_receivewal starting position

On Sun, Oct 24, 2021 at 12:21 PM Michael Paquier <michael@paquier.xyz> wrote:

On Sun, Oct 24, 2021 at 09:08:01AM +0530, Bharath Rupireddy wrote:

pg_get_replication_slots holds the ReplicationSlotControlLock until
the end of the function so it can be assured that *slot contents will
not change. In ReadReplicationSlot, the ReplicationSlotControlLock is
released immediately after taking *slot pointer into slot_contents.
Isn't it better if we hold the lock until the end of the function so
that we can avoid the slot contents becoming stale problems?

The reason is different in the case of pg_get_replication_slots(). We
have to hold ReplicationSlotControlLock for the whole duration of the
shared memory scan to return back to the user a consistent set of
information to the user, for all the slots.

Thanks. I've no further comments on the v10 patch.

Regards,
Bharath Rupireddy.

#44Michael Paquier
michael@paquier.xyz
In reply to: Bharath Rupireddy (#43)
Re: pg_receivewal starting position

On Sun, Oct 24, 2021 at 07:20:57PM +0530, Bharath Rupireddy wrote:

Thanks. I've no further comments on the v10 patch.

Okay, thanks. I have applied this part, then.
--
Michael

#45Michael Paquier
michael@paquier.xyz
In reply to: Ronan Dunklau (#37)
1 attachment(s)
Re: pg_receivewal starting position

On Thu, Oct 21, 2021 at 10:36:42AM +0200, Ronan Dunklau wrote:

Done. I haven't touched the timeline switch test patch for now, but I still
include it here for completeness.

As 0001 and 0002 have been applied, I have put my hands on 0003, and
found a couple of issues upon review.

+       Assert(slot_name != NULL);
+       Assert(restart_lsn != NULL);
There is no need for those asserts, as we should support the case
where the caller gives NULL for those variables.
+       if (PQserverVersion(conn) < 150000)
+               return false;
Returning false is incorrect for older server versions as we won't
fallback to the old method when streaming from older server.  What
this needs to do is return true and set restart_lsn to
InvalidXLogRecPtr, so as pg_receivewal would just stream from the
current flush location.  "false" should just be returned on error,
with pg_log_error().
+$primary->psql('postgres',
+       'INSERT INTO test_table VALUES (generate_series(1,100));');
+$primary->psql('postgres', 'SELECT pg_switch_wal();');
+$nextlsn =
+  $primary->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();');
+chomp($nextlsn);
There is no need to switch twice to a new WAL segment as we just need
to be sure that the WAL segment of the restart_lsn is the one
archived.  Note that RESERVE_WAL uses the last redo point, so it is
better to use a checkpoint and reduce the number of logs we stream
into the new location.

Better to add some --no-sync to the new commands of pg_receivewal, to
not stress the I/O more than necessary. I have added some extra -n
while on it to avoid loops on failure.

Attached is the updated patch I am finishing with, which is rather
clean now. I have tweaked a couple of things while on it, and
documented better the new GetSlotInformation() in streamutil.c.
--
Michael

Attachments:

v11-0001-Allow-pg_receivewal-to-stream-from-a-slot-s-rest.patchtext/x-diff; charset=us-asciiDownload
From 3b9c5336866881361ecaa428ae8e57f5ba9aee90 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Mon, 25 Oct 2021 15:34:46 +0900
Subject: [PATCH v11] Allow pg_receivewal to stream from a slot's restart LSN

Prior to this patch, when running pg_receivewal, the streaming start
point would be the current location of the archives if anything is
found, and pg_receivewal would fall back to the current WAL flush
location if there are no archives, as of the result of an
IDENTIFY_SYSTEM command.

If for some reason the WAL files from pg_receivewal were moved, we want
to restart where we left at, which is the replication slot's restart_lsn
instead of skipping right to the current flush location.  So this makes
pg_receivewal use the following sequence of methods to determine the
starting streaming LSN:
- Scan the local archives.
- Use the slot's restart_lsn, if supported.
- Fallback to the current flush LSN.

To keep compatibility with prior server versions, we only attempt to use
READ_REPLICATION_SLOT if the backend version is at least 15, and
fallback to the older behavior of streaming from the current flush
LSN if the command is not supported.

Some new TAP tests are added to cover this feature.
---
 src/bin/pg_basebackup/pg_receivewal.c        |  29 +++++-
 src/bin/pg_basebackup/streamutil.c           | 104 +++++++++++++++++++
 src/bin/pg_basebackup/streamutil.h           |   1 +
 src/bin/pg_basebackup/t/020_pg_receivewal.pl |  53 +++++++++-
 doc/src/sgml/ref/pg_receivewal.sgml          |  11 ++
 5 files changed, 194 insertions(+), 4 deletions(-)

diff --git a/src/bin/pg_basebackup/pg_receivewal.c b/src/bin/pg_basebackup/pg_receivewal.c
index d5140a79fe..85f2ce53a6 100644
--- a/src/bin/pg_basebackup/pg_receivewal.c
+++ b/src/bin/pg_basebackup/pg_receivewal.c
@@ -404,15 +404,38 @@ StreamLog(void)
 		exit(1);
 
 	/*
-	 * Figure out where to start streaming.
+	 * Figure out where to start streaming.  First scan the local directory.
 	 */
 	stream.startpos = FindStreamingStart(&stream.timeline);
 	if (stream.startpos == InvalidXLogRecPtr)
 	{
-		stream.startpos = serverpos;
-		stream.timeline = servertli;
+		/*
+		 * Try to get the starting point from the slot if any, the upstream
+		 * server may not support this option.
+		 */
+		if (replication_slot != NULL)
+		{
+			if (!GetSlotInformation(conn, replication_slot, &stream.startpos,
+									&stream.timeline))
+			{
+				/* Error is logged by GetSlotInformation() */
+				return;
+			}
+		}
+
+		/*
+		 * If it the starting point is still not known, use the current WAL
+		 * flush value as last resort.
+		 */
+		if (stream.startpos == InvalidXLogRecPtr)
+		{
+			stream.startpos = serverpos;
+			stream.timeline = servertli;
+		}
 	}
 
+	Assert(stream.startpos != InvalidXLogRecPtr);
+
 	/*
 	 * Always start streaming at the beginning of a segment
 	 */
diff --git a/src/bin/pg_basebackup/streamutil.c b/src/bin/pg_basebackup/streamutil.c
index a9bc1ce214..3d3250a3f5 100644
--- a/src/bin/pg_basebackup/streamutil.c
+++ b/src/bin/pg_basebackup/streamutil.c
@@ -479,6 +479,110 @@ RunIdentifySystem(PGconn *conn, char **sysid, TimeLineID *starttli,
 	return true;
 }
 
+
+/*
+ * Run READ_REPLICATION_SLOT through a given connection and give back to
+ * caller some result information if requested for this slot:
+ * - Start LSN position, InvalidXLogRecPtr if unknown.
+ * - Current timeline ID, 0 if unknown.
+ * Returns false on failure, and true otherwise.
+ */
+bool
+GetSlotInformation(PGconn *conn, const char *slot_name, XLogRecPtr *restart_lsn, TimeLineID *restart_tli)
+{
+	PGresult   *res;
+	PQExpBuffer query;
+	XLogRecPtr	lsn_loc = InvalidXLogRecPtr;
+	TimeLineID	tli_loc = 0;
+
+	if (*restart_lsn)
+		*restart_lsn = lsn_loc;
+	if (restart_tli != NULL)
+		*restart_tli = tli_loc;
+
+	/*
+	 * If the command is not supported, letting the caller know that no
+	 * information was found.
+	 */
+	if (PQserverVersion(conn) < 150000)
+		return true;
+
+	query = createPQExpBuffer();
+	appendPQExpBuffer(query, "READ_REPLICATION_SLOT %s", slot_name);
+	res = PQexec(conn, query->data);
+	destroyPQExpBuffer(query);
+
+	if (PQresultStatus(res) != PGRES_TUPLES_OK)
+	{
+		pg_log_error("could not read replication slot : %s",
+					 PQerrorMessage(conn));
+		PQclear(res);
+		return false;
+	}
+
+	/* The command should always return precisely one tuple and three fields */
+	if (PQntuples(res) != 1 || PQnfields(res) != 3)
+	{
+		pg_log_error("could not read replication slot: got %d rows and %d fields, expected %d rows and %d or more fields",
+					 PQntuples(res), PQnfields(res), 1, 3);
+		PQclear(res);
+		return false;
+	}
+
+	/*
+	 * When the slot doesn't exist, the command returns a tuple with NULL
+	 * values.  This checks only the slot type field.
+	 */
+	if (PQgetisnull(res, 0, 0))
+	{
+		pg_log_error("replication slot \"%s\" does not exist", slot_name);
+		PQclear(res);
+		return false;
+	}
+
+	/*
+	 * Note that this cannot happen as READ_REPLICATION_SLOT supports only
+	 * physical slots, but play it safe.
+	 */
+	if (strcmp(PQgetvalue(res, 0, 0), "physical") != 0)
+	{
+		pg_log_error("expected a physical replication slot, got type \"%s\" instead",
+					 PQgetvalue(res, 0, 0));
+		PQclear(res);
+		return false;
+	}
+
+	/* restart LSN */
+	if (!PQgetisnull(res, 0, 1))
+	{
+		uint32		hi,
+					lo;
+
+		if (sscanf(PQgetvalue(res, 0, 1), "%X/%X", &hi, &lo) != 2)
+		{
+			pg_log_error("could not parse slot's restart_lsn \"%s\"",
+						 PQgetvalue(res, 0, 1));
+			PQclear(res);
+			return false;
+		}
+		lsn_loc = ((uint64) hi) << 32 | lo;
+	}
+
+	/* current TLI */
+	if (!PQgetisnull(res, 0, 2))
+		tli_loc = (TimeLineID) atol(PQgetvalue(res, 0, 2));
+
+	PQclear(res);
+
+	/* Assign results if requested */
+	if (restart_lsn)
+		*restart_lsn = lsn_loc;
+	if (restart_tli)
+		*restart_tli = tli_loc;
+
+	return true;
+}
+
 /*
  * Create a replication slot for the given connection. This function
  * returns true in case of success.
diff --git a/src/bin/pg_basebackup/streamutil.h b/src/bin/pg_basebackup/streamutil.h
index 65135c79e0..907457b8ae 100644
--- a/src/bin/pg_basebackup/streamutil.h
+++ b/src/bin/pg_basebackup/streamutil.h
@@ -52,6 +52,7 @@ extern void AppendIntegerCommandOption(PQExpBuffer buf,
 									   bool use_new_option_syntax,
 									   char *option_name, int32 option_value);
 
+extern bool GetSlotInformation(PGconn *conn, const char *slot_name, XLogRecPtr *restart_lsn, TimeLineID *restart_tli);
 extern bool RetrieveWalSegSize(PGconn *conn);
 extern TimestampTz feGetCurrentTimestamp(void);
 extern void feTimestampDifference(TimestampTz start_time, TimestampTz stop_time,
diff --git a/src/bin/pg_basebackup/t/020_pg_receivewal.pl b/src/bin/pg_basebackup/t/020_pg_receivewal.pl
index b93493b5e9..c7801f9f2f 100644
--- a/src/bin/pg_basebackup/t/020_pg_receivewal.pl
+++ b/src/bin/pg_basebackup/t/020_pg_receivewal.pl
@@ -5,7 +5,7 @@ use strict;
 use warnings;
 use PostgreSQL::Test::Utils;
 use PostgreSQL::Test::Cluster;
-use Test::More tests => 27;
+use Test::More tests => 31;
 
 program_help_ok('pg_receivewal');
 program_version_ok('pg_receivewal');
@@ -72,6 +72,8 @@ $primary->command_ok(
 my @partial_wals = glob "$stream_dir/*\.partial";
 is(scalar(@partial_wals), 1, "one partial WAL segment was created");
 
+note "Testing pg_receivewal with compression methods";
+
 # Check ZLIB compression if available.
 SKIP:
 {
@@ -155,3 +157,52 @@ SKIP:
 	ok(check_mode_recursive($stream_dir, 0700, 0600),
 		"check stream dir permissions");
 }
+
+note "Testing pg_receivewal with slot as starting streaming point";
+
+# When using a replication slot, archiving should be resumed from the slot's
+# restart LSN.  Use a new archive location and new slot for this test.
+my $slot_dir = $primary->basedir . '/slot_wal';
+mkdir($slot_dir);
+$slot_name = 'archive_slot';
+
+# Setup the slot, reserving WAL at creation (corresponding to the
+# last redo LSN here, actually).
+$primary->psql('postgres',
+	"SELECT pg_create_physical_replication_slot('$slot_name', true);");
+
+# Get the segment name associated with the slot's restart LSN, that should
+# be archived.
+my $walfile_streamed = $primary->safe_psql(
+	'postgres',
+	"SELECT pg_walfile_name(restart_lsn)
+  FROM pg_replication_slots
+  WHERE slot_name = '$slot_name';");
+
+# Switch to a new segment, to make sure that the segment retained by the
+# slot is still streamed.  This may not be necessary, but play it safe.
+$primary->psql('postgres',
+	'INSERT INTO test_table VALUES (generate_series(1,100));');
+$primary->psql('postgres', 'SELECT pg_switch_wal();');
+$nextlsn =
+  $primary->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();');
+chomp($nextlsn);
+
+# Check case where the slot does not exist.
+$primary->command_fails_like(
+	[
+		'pg_receivewal',   '-D', $slot_dir,   '--slot',
+		'nonexistentslot', '-n', '--no-sync', '--verbose',
+		'--endpos',        $nextlsn
+	],
+	qr/pg_receivewal: error: replication slot "nonexistentslot" does not exist/,
+	'pg_receivewal fails with non-existing slot');
+$primary->command_ok(
+	[
+		'pg_receivewal', '-D', $slot_dir,   '--slot',
+		$slot_name,      '-n', '--no-sync', '--verbose',
+		'--endpos',      $nextlsn
+	],
+	"WAL streamed from the slot's restart_lsn");
+ok(-e "$slot_dir/$walfile_streamed",
+	"WAL from the slot's restart_lsn has been archived");
diff --git a/doc/src/sgml/ref/pg_receivewal.sgml b/doc/src/sgml/ref/pg_receivewal.sgml
index 6da8b2be8c..5fff503fe5 100644
--- a/doc/src/sgml/ref/pg_receivewal.sgml
+++ b/doc/src/sgml/ref/pg_receivewal.sgml
@@ -88,6 +88,17 @@ PostgreSQL documentation
      </para>
     </listitem>
 
+    <listitem>
+     <para>
+      If a starting point could not be calculated with the previous method,
+      and if a replication slot is used, an extra
+      <command>READ_REPLICATION_SLOT</command> is issued to retrieve the
+      slot's <literal>restart_lsn</literal> to use as starting point.
+      This option is only available when streaming write-ahead logs from
+      <productname>PostgreSQL</productname> 15 and up.
+     </para>
+    </listitem>
+
     <listitem>
      <para>
       If a starting point cannot be calculated with the previous method,
-- 
2.33.0

#46Ronan Dunklau
ronan.dunklau@aiven.io
In reply to: Michael Paquier (#45)
Re: pg_receivewal starting position

Le lundi 25 octobre 2021, 08:51:23 CEST Michael Paquier a écrit :

On Thu, Oct 21, 2021 at 10:36:42AM +0200, Ronan Dunklau wrote:

Done. I haven't touched the timeline switch test patch for now, but I
still
include it here for completeness.

As 0001 and 0002 have been applied, I have put my hands on 0003, and
found a couple of issues upon review.

+       Assert(slot_name != NULL);
+       Assert(restart_lsn != NULL);
There is no need for those asserts, as we should support the case
where the caller gives NULL for those variables.

Does it make sense though ? The NULL slot_name case handling is pretty
straight forward has it will be handled by string formatting, but in the case
of a null restart_lsn, we have no way of knowing if the command was issued at
all.

+       if (PQserverVersion(conn) < 150000)
+               return false;
Returning false is incorrect for older server versions as we won't
fallback to the old method when streaming from older server.  What
this needs to do is return true and set restart_lsn to
InvalidXLogRecPtr, so as pg_receivewal would just stream from the
current flush location.  "false" should just be returned on error,
with pg_log_error().

Thank you, this was an oversight when moving from the more complicated error
handling code.

+$primary->psql('postgres',
+       'INSERT INTO test_table VALUES (generate_series(1,100));');
+$primary->psql('postgres', 'SELECT pg_switch_wal();');
+$nextlsn =
+  $primary->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();');
+chomp($nextlsn);
There is no need to switch twice to a new WAL segment as we just need
to be sure that the WAL segment of the restart_lsn is the one
archived.  Note that RESERVE_WAL uses the last redo point, so it is
better to use a checkpoint and reduce the number of logs we stream
into the new location.

Better to add some --no-sync to the new commands of pg_receivewal, to
not stress the I/O more than necessary. I have added some extra -n
while on it to avoid loops on failure.

Attached is the updated patch I am finishing with, which is rather
clean now. I have tweaked a couple of things while on it, and
documented better the new GetSlotInformation() in streamutil.c.
--
Michael

--
Ronan Dunklau

#47Ronan Dunklau
ronan.dunklau@aiven.io
In reply to: Michael Paquier (#44)
Re: pg_receivewal starting position

Le lundi 25 octobre 2021, 00:43:11 CEST Michael Paquier a écrit :

On Sun, Oct 24, 2021 at 07:20:57PM +0530, Bharath Rupireddy wrote:

Thanks. I've no further comments on the v10 patch.

Okay, thanks. I have applied this part, then.
--
Michael

Thank you all for your work on this patch.

--
Ronan Dunklau

#48Michael Paquier
michael@paquier.xyz
In reply to: Ronan Dunklau (#46)
Re: pg_receivewal starting position

On Mon, Oct 25, 2021 at 09:15:32AM +0200, Ronan Dunklau wrote:

Does it make sense though ? The NULL slot_name case handling is pretty
straight forward has it will be handled by string formatting, but in the case
of a null restart_lsn, we have no way of knowing if the command was issued at
all.

If I am following your point, I don't think that it matters much here,
and it seems useful to me to be able to pass NULL for both of them, so
as one can check if the slot exists or not with an API designed this
way.
--
Michael

#49Ronan Dunklau
ronan.dunklau@aiven.io
In reply to: Michael Paquier (#48)
Re: pg_receivewal starting position

Le lundi 25 octobre 2021, 09:40:10 CEST Michael Paquier a écrit :

On Mon, Oct 25, 2021 at 09:15:32AM +0200, Ronan Dunklau wrote:

Does it make sense though ? The NULL slot_name case handling is pretty
straight forward has it will be handled by string formatting, but in the
case of a null restart_lsn, we have no way of knowing if the command was
issued at all.

If I am following your point, I don't think that it matters much here,
and it seems useful to me to be able to pass NULL for both of them, so
as one can check if the slot exists or not with an API designed this
way.

You're right, but I'm afraid we would have to check the server version twice
in any case different from the basic pg_receivewal on (once in the function
itself, and one before calling it if we want a meaningful result). Maybe we
should move the version check outside the GetSlotInformation function to avoid
this, and let it fail with a syntax error when the server doesn't support it ?

--
Ronan Dunklau

#50Michael Paquier
michael@paquier.xyz
In reply to: Ronan Dunklau (#49)
Re: pg_receivewal starting position

On Mon, Oct 25, 2021 at 09:50:01AM +0200, Ronan Dunklau wrote:

Le lundi 25 octobre 2021, 09:40:10 CEST Michael Paquier a écrit :

On Mon, Oct 25, 2021 at 09:15:32AM +0200, Ronan Dunklau wrote:

Does it make sense though ? The NULL slot_name case handling is pretty
straight forward has it will be handled by string formatting, but in the
case of a null restart_lsn, we have no way of knowing if the command was
issued at all.

If I am following your point, I don't think that it matters much here,
and it seems useful to me to be able to pass NULL for both of them, so
as one can check if the slot exists or not with an API designed this
way.

You're right, but I'm afraid we would have to check the server version twice
in any case different from the basic pg_receivewal on (once in the function
itself, and one before calling it if we want a meaningful result). Maybe we
should move the version check outside the GetSlotInformation function to avoid
this, and let it fail with a syntax error when the server doesn't support it ?

With the approach taken by the patch, we fall down silently to the
previous behavior if we connect to a server <= 14, and rely on the new
behavior with a server >= 15, ensuring compatibility. Why would you
want to make sure that the command is executed when we should just
enforce that the old behavior is what happens when there is a slot
defined and a backend <= 14?
--
Michael

#51Ronan Dunklau
ronan.dunklau@aiven.io
In reply to: Michael Paquier (#50)
Re: pg_receivewal starting position

Le lundi 25 octobre 2021, 10:10:13 CEST Michael Paquier a écrit :

On Mon, Oct 25, 2021 at 09:50:01AM +0200, Ronan Dunklau wrote:

Le lundi 25 octobre 2021, 09:40:10 CEST Michael Paquier a écrit :

On Mon, Oct 25, 2021 at 09:15:32AM +0200, Ronan Dunklau wrote:

Does it make sense though ? The NULL slot_name case handling is pretty
straight forward has it will be handled by string formatting, but in the
case of a null restart_lsn, we have no way of knowing if the command was
issued at all.

If I am following your point, I don't think that it matters much here,
and it seems useful to me to be able to pass NULL for both of them, so
as one can check if the slot exists or not with an API designed this
way.

You're right, but I'm afraid we would have to check the server version
twice in any case different from the basic pg_receivewal on (once in the
function itself, and one before calling it if we want a meaningful
result). Maybe we should move the version check outside the
GetSlotInformation function to avoid this, and let it fail with a syntax
error when the server doesn't support it ?

With the approach taken by the patch, we fall down silently to the
previous behavior if we connect to a server <= 14, and rely on the new
behavior with a server >= 15, ensuring compatibility. Why would you
want to make sure that the command is executed when we should just
enforce that the old behavior is what happens when there is a slot
defined and a backend <= 14?

Sorry I haven't been clear. For the use case of this patch, the current
approach is perfect (and we supply the restart_lsn).

However, if we want to support the case of "just check if the slot exists", we
need to make sure the command is actually executed, and check the version
before calling the function, which would make the check executed twice.

What I'm proposing is just that we let the responsibility of checking
PQServerVersion() to the caller, and remove it from GetSlotInformation, ie:

-               if (replication_slot != NULL)
+               if (replication_slot != NULL && PQserverVersion(conn) >= 
150000)
                {
                        if (!GetSlotInformation(conn, replication_slot, 
&stream.startpos,
                                                                        &stream.timeline))

That way, if we introduce a caller wanting to use this function as an API to
check a slot exists, the usage of checking the server version beforehand will
be consistent.

--
Ronan Dunklau

#52Michael Paquier
michael@paquier.xyz
In reply to: Ronan Dunklau (#51)
Re: pg_receivewal starting position

On Mon, Oct 25, 2021 at 10:24:46AM +0200, Ronan Dunklau wrote:

However, if we want to support the case of "just check if the slot exists", we
need to make sure the command is actually executed, and check the version
before calling the function, which would make the check executed twice.

What I'm proposing is just that we let the responsibility of checking
PQServerVersion() to the caller, and remove it from GetSlotInformation, ie:

-               if (replication_slot != NULL)
+               if (replication_slot != NULL && PQserverVersion(conn) >= 
150000)
{
if (!GetSlotInformation(conn, replication_slot, 
&stream.startpos,
&stream.timeline))

That way, if we introduce a caller wanting to use this function as an API to
check a slot exists, the usage of checking the server version beforehand will
be consistent.

Ah, good point. My apologies for not following. Indeed, the patch
makes this part of the routine a bit blurry. It is fine by me to do
as you suggest, and let the caller do the version check as you
propose, while making the routine fail if directly called for an older
server.
--
Michael

#53Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Michael Paquier (#45)
Re: pg_receivewal starting position

On Mon, Oct 25, 2021 at 12:21 PM Michael Paquier <michael@paquier.xyz> wrote:

Attached is the updated patch I am finishing with, which is rather
clean now. I have tweaked a couple of things while on it, and
documented better the new GetSlotInformation() in streamutil.c.

Thanks for the v11 patch, here are some comments:

1) Remove the extra whitespace in between "t" and ":"
+ pg_log_error("could not read replication slot : %s",

2) I think we should tweak the below error message
+ pg_log_error("could not read replication slot \"%s\": %s",
slot_name, PQerrorMessage(conn));
to
pg_log_error("could not read replication slot \"%s\": %s",
"IDENTIFY_SYSTEM", PQerrorMessage(conn));
Having slot name in the error message helps to isolate the error
message from tons of server logs that gets generated.

3) Also for the same reasons stated as above, change the below error message
pg_log_error("could not read replication slot: got %d rows and %d
fields, expected %d rows and %d or more fields",
to
pg_log_error("could not read replication slot \"%s\": got %d rows and
%d fields, expected %d rows and %d or more fields", slot_name,....

4) Also for the same reasons, change below
+ pg_log_error("could not parse slot's restart_lsn \"%s\"",
to
pg_log_error("could not parse replicaton slot \"%s\" restart_lsn \"%s\"",
slot_name, PQgetvalue(res, 0, 1));

5) I think we should also have assertion for the timeline id:
Assert(stream.startpos != InvalidXLogRecPtr);
Assert(stream.timeline!= 0);

6) Why do we need these two assignements?
+ if (*restart_lsn)
+ *restart_lsn = lsn_loc;
+ if (restart_tli != NULL)
+ *restart_tli = tli_loc;
+ /* Assign results if requested */
+ if (restart_lsn)
+ *restart_lsn = lsn_loc;
+ if (restart_tli)
+ *restart_tli = tli_loc;

I think we can just get rid of lsn_loc and tli_loc, initlaize
*restart_lsn = InvalidXLogRecPtr and *restart_tli = 0 at the start of
the function and directly assign the requrested values to *restart_lsn
and *restart_tli, also see comment (8).

7) Let's be consistent, change the following
+
+ if (*restart_lsn)
+ *restart_lsn = lsn_loc;
+ if (restart_tli != NULL)
+ *restart_tli = tli_loc;
to
+
+ if (restart_lsn)
+ *restart_lsn = lsn_loc;
+ if (restart_tli != NULL)
+ *restart_tli = tli_loc;
8) Let's extract the values asked by the caller, change:
+ /* restart LSN */
+ if (!PQgetisnull(res, 0, 1))
+ /* current TLI */
+ if (!PQgetisnull(res, 0, 2))
+ tli_loc = (TimeLineID) atol(PQgetvalue(res, 0, 2));

to

+ /* restart LSN */
+ if (restart_lsn && !PQgetisnull(res, 0, 1))
+ /* current TLI */
+ if (restart_tli && !PQgetisnull(res, 0, 2))

9) 80char limit crossed:
+GetSlotInformation(PGconn *conn, const char *slot_name, XLogRecPtr
*restart_lsn, TimeLineID *restart_tli)

10) Missing word "command", and use "issued to the server", so change the below:
+      <command>READ_REPLICATION_SLOT</command> is issued to retrieve the
to
+      <command>READ_REPLICATION_SLOT</command> command is issued to
the server to retrieve the
11) Will replication_slot ever be NULL? If it ever be null, then we
don't reach this far right? We see the pg_log_error("%s needs a slot
to be specified using --slot". Please revmove below if condition:
+ * server may not support this option.
+ */
+ if (replication_slot != NULL)

We can just add Assert(slot_name); in GetSlotInformation().

12) How about following:
"If a starting point cannot be calculated with the previous method,
<command>READ_REPLICATION_SLOT</command> command with the provided
slot is issued to the server for retreving the slot's restart_lsn and
timelineid"
instead of
+      and if a replication slot is used, an extra
+      <command>READ_REPLICATION_SLOT</command> is issued to retrieve the
+      slot's <literal>restart_lsn</literal> to use as starting point.

The IDENTIFY_SYSTEM descritption starts with "If a starting point
cannot be calculated....":
<listitem>
<para>
If a starting point cannot be calculated with the previous method,
the latest WAL flush location is used as reported by the server from
a <literal>IDENTIFY_SYSTEM</literal> command.

Regards,
Bharath Rupireddy.

#54Michael Paquier
michael@paquier.xyz
In reply to: Bharath Rupireddy (#53)
1 attachment(s)
Re: pg_receivewal starting position

On Mon, Oct 25, 2021 at 02:40:05PM +0530, Bharath Rupireddy wrote:

2) I think we should tweak the below error message
to
pg_log_error("could not read replication slot \"%s\": %s",
"IDENTIFY_SYSTEM", PQerrorMessage(conn));
Having slot name in the error message helps to isolate the error
message from tons of server logs that gets generated.

Yes, this suggestion makes sense.

3) Also for the same reasons stated as above, change the below error message
pg_log_error("could not read replication slot: got %d rows and %d
fields, expected %d rows and %d or more fields",
to
pg_log_error("could not read replication slot \"%s\": got %d rows and
%d fields, expected %d rows and %d or more fields", slot_name,....

We can even get rid of "or more" to match the condition used.

4) Also for the same reasons, change below
+ pg_log_error("could not parse slot's restart_lsn \"%s\"",
to
pg_log_error("could not parse replicaton slot \"%s\" restart_lsn \"%s\"",
slot_name, PQgetvalue(res, 0, 1));

Appending the slot name makes sense.

5) I think we should also have assertion for the timeline id:
Assert(stream.startpos != InvalidXLogRecPtr);
Assert(stream.timeline!= 0);

Okay.

6) Why do we need these two assignements?
I think we can just get rid of lsn_loc and tli_loc, initlaize
*restart_lsn = InvalidXLogRecPtr and *restart_tli = 0 at the start of
the function and directly assign the requrested values to *restart_lsn
and *restart_tli, also see comment (8).

FWIW, I find the style of the patch easier to follow.

9) 80char limit crossed:
+GetSlotInformation(PGconn *conn, const char *slot_name, XLogRecPtr
*restart_lsn, TimeLineID *restart_tli)

pgindent says nothing.

10) Missing word "command", and use "issued to the server", so change the below:
+      <command>READ_REPLICATION_SLOT</command> is issued to retrieve the
to
+      <command>READ_REPLICATION_SLOT</command> command is issued to
the server to retrieve the

Okay.

11) Will replication_slot ever be NULL? If it ever be null, then we
don't reach this far right? We see the pg_log_error("%s needs a slot
to be specified using --slot". Please revmove below if condition:
+ * server may not support this option.

Did you notice that this applies when creating or dropping a slot, for
code paths entirely different than what we are dealing with here?
--
Michael

Attachments:

v12-0001-Allow-pg_receivewal-to-stream-from-a-slot-s-rest.patchtext/x-diff; charset=us-asciiDownload
From bc38918b44e3e41836bb075e7f915dcc40fddcb9 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Mon, 25 Oct 2021 19:48:11 +0900
Subject: [PATCH v12] Allow pg_receivewal to stream from a slot's restart LSN

Prior to this patch, when running pg_receivewal, the streaming start
point would be the current location of the archives if anything is
found, and pg_receivewal would fall back to the current WAL flush
location if there are no archives, as of the result of an
IDENTIFY_SYSTEM command.

If for some reason the WAL files from pg_receivewal were moved, we want
to restart where we left at, which is the replication slot's restart_lsn
instead of skipping right to the current flush location.  So this makes
pg_receivewal use the following sequence of methods to determine the
starting streaming LSN:
- Scan the local archives.
- Use the slot's restart_lsn, if supported.
- Fallback to the current flush LSN.

To keep compatibility with prior server versions, we only attempt to use
READ_REPLICATION_SLOT if the backend version is at least 15, and
fallback to the older behavior of streaming from the current flush
LSN if the command is not supported.

Some new TAP tests are added to cover this feature.
---
 src/bin/pg_basebackup/pg_receivewal.c        | 31 ++++++-
 src/bin/pg_basebackup/streamutil.c           | 98 ++++++++++++++++++++
 src/bin/pg_basebackup/streamutil.h           |  3 +
 src/bin/pg_basebackup/t/020_pg_receivewal.pl | 53 ++++++++++-
 doc/src/sgml/ref/pg_receivewal.sgml          | 11 +++
 5 files changed, 192 insertions(+), 4 deletions(-)

diff --git a/src/bin/pg_basebackup/pg_receivewal.c b/src/bin/pg_basebackup/pg_receivewal.c
index d5140a79fe..04ba20b197 100644
--- a/src/bin/pg_basebackup/pg_receivewal.c
+++ b/src/bin/pg_basebackup/pg_receivewal.c
@@ -404,15 +404,40 @@ StreamLog(void)
 		exit(1);
 
 	/*
-	 * Figure out where to start streaming.
+	 * Figure out where to start streaming.  First scan the local directory.
 	 */
 	stream.startpos = FindStreamingStart(&stream.timeline);
 	if (stream.startpos == InvalidXLogRecPtr)
 	{
-		stream.startpos = serverpos;
-		stream.timeline = servertli;
+		/*
+		 * Try to get the starting point from the slot if any.  This is
+		 * supported in PostgreSQL 15 and newer.
+		 */
+		if (replication_slot != NULL &&
+			PQserverVersion(conn) >= 150000)
+		{
+			if (!GetSlotInformation(conn, replication_slot, &stream.startpos,
+									&stream.timeline))
+			{
+				/* Error is logged by GetSlotInformation() */
+				return;
+			}
+		}
+
+		/*
+		 * If it the starting point is still not known, use the current WAL
+		 * flush value as last resort.
+		 */
+		if (stream.startpos == InvalidXLogRecPtr)
+		{
+			stream.startpos = serverpos;
+			stream.timeline = servertli;
+		}
 	}
 
+	Assert(stream.startpos != InvalidXLogRecPtr &&
+		   stream.timeline != 0);
+
 	/*
 	 * Always start streaming at the beginning of a segment
 	 */
diff --git a/src/bin/pg_basebackup/streamutil.c b/src/bin/pg_basebackup/streamutil.c
index a9bc1ce214..6f14ed7748 100644
--- a/src/bin/pg_basebackup/streamutil.c
+++ b/src/bin/pg_basebackup/streamutil.c
@@ -479,6 +479,104 @@ RunIdentifySystem(PGconn *conn, char **sysid, TimeLineID *starttli,
 	return true;
 }
 
+
+/*
+ * Run READ_REPLICATION_SLOT through a given connection and give back to
+ * caller some result information if requested for this slot:
+ * - Start LSN position, InvalidXLogRecPtr if unknown.
+ * - Current timeline ID, 0 if unknown.
+ * Returns false on failure, and true otherwise.
+ */
+bool
+GetSlotInformation(PGconn *conn, const char *slot_name,
+				   XLogRecPtr *restart_lsn, TimeLineID *restart_tli)
+{
+	PGresult   *res;
+	PQExpBuffer query;
+	XLogRecPtr	lsn_loc = InvalidXLogRecPtr;
+	TimeLineID	tli_loc = 0;
+
+	if (*restart_lsn)
+		*restart_lsn = lsn_loc;
+	if (restart_tli != NULL)
+		*restart_tli = tli_loc;
+
+	query = createPQExpBuffer();
+	appendPQExpBuffer(query, "READ_REPLICATION_SLOT %s", slot_name);
+	res = PQexec(conn, query->data);
+	destroyPQExpBuffer(query);
+
+	if (PQresultStatus(res) != PGRES_TUPLES_OK)
+	{
+		pg_log_error("could not send replication command \"%s\": %s",
+					 "READ_REPLICATION_SLOT", PQerrorMessage(conn));
+		PQclear(res);
+		return false;
+	}
+
+	/* The command should always return precisely one tuple and three fields */
+	if (PQntuples(res) != 1 || PQnfields(res) != 3)
+	{
+		pg_log_error("could not read replication slot \"%s\": got %d rows and %d fields, expected %d rows and %d fields",
+					 slot_name, PQntuples(res), PQnfields(res), 1, 3);
+		PQclear(res);
+		return false;
+	}
+
+	/*
+	 * When the slot doesn't exist, the command returns a tuple with NULL
+	 * values.  This checks only the slot type field.
+	 */
+	if (PQgetisnull(res, 0, 0))
+	{
+		pg_log_error("could not find replication slot \"%s\"", slot_name);
+		PQclear(res);
+		return false;
+	}
+
+	/*
+	 * Note that this cannot happen as READ_REPLICATION_SLOT supports only
+	 * physical slots, but play it safe.
+	 */
+	if (strcmp(PQgetvalue(res, 0, 0), "physical") != 0)
+	{
+		pg_log_error("expected a physical replication slot, got type \"%s\" instead",
+					 PQgetvalue(res, 0, 0));
+		PQclear(res);
+		return false;
+	}
+
+	/* restart LSN */
+	if (!PQgetisnull(res, 0, 1))
+	{
+		uint32		hi,
+					lo;
+
+		if (sscanf(PQgetvalue(res, 0, 1), "%X/%X", &hi, &lo) != 2)
+		{
+			pg_log_error("could not parse restart_lsn \"%s\" for replication slot \"%s\"",
+						 PQgetvalue(res, 0, 1), slot_name);
+			PQclear(res);
+			return false;
+		}
+		lsn_loc = ((uint64) hi) << 32 | lo;
+	}
+
+	/* current TLI */
+	if (!PQgetisnull(res, 0, 2))
+		tli_loc = (TimeLineID) atol(PQgetvalue(res, 0, 2));
+
+	PQclear(res);
+
+	/* Assign results if requested */
+	if (restart_lsn)
+		*restart_lsn = lsn_loc;
+	if (restart_tli)
+		*restart_tli = tli_loc;
+
+	return true;
+}
+
 /*
  * Create a replication slot for the given connection. This function
  * returns true in case of success.
diff --git a/src/bin/pg_basebackup/streamutil.h b/src/bin/pg_basebackup/streamutil.h
index 65135c79e0..7918935cb3 100644
--- a/src/bin/pg_basebackup/streamutil.h
+++ b/src/bin/pg_basebackup/streamutil.h
@@ -52,6 +52,9 @@ extern void AppendIntegerCommandOption(PQExpBuffer buf,
 									   bool use_new_option_syntax,
 									   char *option_name, int32 option_value);
 
+extern bool GetSlotInformation(PGconn *conn, const char *slot_name,
+							   XLogRecPtr *restart_lsn,
+							   TimeLineID *restart_tli);
 extern bool RetrieveWalSegSize(PGconn *conn);
 extern TimestampTz feGetCurrentTimestamp(void);
 extern void feTimestampDifference(TimestampTz start_time, TimestampTz stop_time,
diff --git a/src/bin/pg_basebackup/t/020_pg_receivewal.pl b/src/bin/pg_basebackup/t/020_pg_receivewal.pl
index b93493b5e9..092c9b6f25 100644
--- a/src/bin/pg_basebackup/t/020_pg_receivewal.pl
+++ b/src/bin/pg_basebackup/t/020_pg_receivewal.pl
@@ -5,7 +5,7 @@ use strict;
 use warnings;
 use PostgreSQL::Test::Utils;
 use PostgreSQL::Test::Cluster;
-use Test::More tests => 27;
+use Test::More tests => 31;
 
 program_help_ok('pg_receivewal');
 program_version_ok('pg_receivewal');
@@ -72,6 +72,8 @@ $primary->command_ok(
 my @partial_wals = glob "$stream_dir/*\.partial";
 is(scalar(@partial_wals), 1, "one partial WAL segment was created");
 
+note "Testing pg_receivewal with compression methods";
+
 # Check ZLIB compression if available.
 SKIP:
 {
@@ -155,3 +157,52 @@ SKIP:
 	ok(check_mode_recursive($stream_dir, 0700, 0600),
 		"check stream dir permissions");
 }
+
+note "Testing pg_receivewal with slot as starting streaming point";
+
+# When using a replication slot, archiving should be resumed from the slot's
+# restart LSN.  Use a new archive location and new slot for this test.
+my $slot_dir = $primary->basedir . '/slot_wal';
+mkdir($slot_dir);
+$slot_name = 'archive_slot';
+
+# Setup the slot, reserving WAL at creation (corresponding to the
+# last redo LSN here, actually).
+$primary->psql('postgres',
+	"SELECT pg_create_physical_replication_slot('$slot_name', true);");
+
+# Get the segment name associated with the slot's restart LSN, that should
+# be archived.
+my $walfile_streamed = $primary->safe_psql(
+	'postgres',
+	"SELECT pg_walfile_name(restart_lsn)
+  FROM pg_replication_slots
+  WHERE slot_name = '$slot_name';");
+
+# Switch to a new segment, to make sure that the segment retained by the
+# slot is still streamed.  This may not be necessary, but play it safe.
+$primary->psql('postgres',
+	'INSERT INTO test_table VALUES (generate_series(1,100));');
+$primary->psql('postgres', 'SELECT pg_switch_wal();');
+$nextlsn =
+  $primary->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();');
+chomp($nextlsn);
+
+# Check case where the slot does not exist.
+$primary->command_fails_like(
+	[
+		'pg_receivewal',   '-D', $slot_dir,   '--slot',
+		'nonexistentslot', '-n', '--no-sync', '--verbose',
+		'--endpos',        $nextlsn
+	],
+	qr/pg_receivewal: error: could not find replication slot "nonexistentslot"/,
+	'pg_receivewal fails with non-existing slot');
+$primary->command_ok(
+	[
+		'pg_receivewal', '-D', $slot_dir,   '--slot',
+		$slot_name,      '-n', '--no-sync', '--verbose',
+		'--endpos',      $nextlsn
+	],
+	"WAL streamed from the slot's restart_lsn");
+ok(-e "$slot_dir/$walfile_streamed",
+	"WAL from the slot's restart_lsn has been archived");
diff --git a/doc/src/sgml/ref/pg_receivewal.sgml b/doc/src/sgml/ref/pg_receivewal.sgml
index 6da8b2be8c..d3c7488293 100644
--- a/doc/src/sgml/ref/pg_receivewal.sgml
+++ b/doc/src/sgml/ref/pg_receivewal.sgml
@@ -88,6 +88,17 @@ PostgreSQL documentation
      </para>
     </listitem>
 
+    <listitem>
+     <para>
+      If a starting point cannot not be calculated with the previous method,
+      and if a replication slot is used, an extra
+      <command>READ_REPLICATION_SLOT</command> command is issued to retrieve
+      the slot's <literal>restart_lsn</literal> to use as starting point.
+      This option is only available when streaming write-ahead logs from
+      <productname>PostgreSQL</productname> 15 and up.
+     </para>
+    </listitem>
+
     <listitem>
      <para>
       If a starting point cannot be calculated with the previous method,
-- 
2.33.0

#55Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Michael Paquier (#54)
Re: pg_receivewal starting position

On Mon, Oct 25, 2021 at 4:19 PM Michael Paquier <michael@paquier.xyz> wrote:

6) Why do we need these two assignements?
I think we can just get rid of lsn_loc and tli_loc, initlaize
*restart_lsn = InvalidXLogRecPtr and *restart_tli = 0 at the start of
the function and directly assign the requrested values to *restart_lsn
and *restart_tli, also see comment (8).

FWIW, I find the style of the patch easier to follow.

Then, please change if (*restart_lsn) and if (*restart_tli) to if
(restart_lsn) and if (restart_tli), just to be consistent with the
other parts of the patch and the existing code of RunIdentifySystem():
if (*restart_lsn)
*restart_lsn = lsn_loc;
if (restart_tli != NULL)
*restart_tli = tli_loc;

11) Will replication_slot ever be NULL? If it ever be null, then we
don't reach this far right? We see the pg_log_error("%s needs a slot
to be specified using --slot". Please revmove below if condition:
+ * server may not support this option.

Did you notice that this applies when creating or dropping a slot, for
code paths entirely different than what we are dealing with here?

StreamLog() isn't reached for create and drop slot cases, see [1]/* * Drop a replication slot. */ if (do_drop_slot) { if (verbose) pg_log_info("dropping replication slot \"%s\"", replication_slot);. I
suggest to remove replication_slot != NULL and have Assert(slot_name)
in GetSlotInformation():
/*
* Try to get the starting point from the slot. This is supported in
* PostgreSQL 15 and up.
*/
if (PQserverVersion(conn) >= 150000)
{
if (!GetSlotInformation(conn, replication_slot, &stream.startpos,
&stream.timeline))
{
/* Error is logged by GetSlotInformation() */
return;
}
}

Here is another comment on the patch:
Remove the extra new line above the GetSlotInformation() definition:
return true;
}

+    -----> REMOVE THIS
+/*
+ * Run READ_REPLICATION_SLOT through a given connection and give back to

Apart from the above v12 patch LGTM.

[1]: /* * Drop a replication slot. */ if (do_drop_slot) { if (verbose) pg_log_info("dropping replication slot \"%s\"", replication_slot);
/*
* Drop a replication slot.
*/
if (do_drop_slot)
{
if (verbose)
pg_log_info("dropping replication slot \"%s\"", replication_slot);

if (!DropReplicationSlot(conn, replication_slot))
exit(1);
exit(0);
}

/* Create a replication slot */
if (do_create_slot)
{
if (verbose)
pg_log_info("creating replication slot \"%s\"", replication_slot);

if (!CreateReplicationSlot(conn, replication_slot, NULL,
false, true, false,
slot_exists_ok, false))
exit(1);
exit(0);
}

Regards,
Bharath Rupireddy.

#56Michael Paquier
michael@paquier.xyz
In reply to: Bharath Rupireddy (#55)
Re: pg_receivewal starting position

On Mon, Oct 25, 2021 at 05:46:57PM +0530, Bharath Rupireddy wrote:

StreamLog() isn't reached for create and drop slot cases, see [1]. I
suggest to remove replication_slot != NULL and have Assert(slot_name)
in GetSlotInformation():
/*
* Try to get the starting point from the slot. This is supported in
* PostgreSQL 15 and up.
*/
if (PQserverVersion(conn) >= 150000)
{
if (!GetSlotInformation(conn, replication_slot, &stream.startpos,
&stream.timeline))
{
/* Error is logged by GetSlotInformation() */
return;
}
}

Please note that it is possible to use pg_receivewal without a slot,
which is the default case, so we cannot do what you are suggesting
here. An assertion on slot_name in GetSlotInformation() is not that
helpful either in my opinion, as we would just crash a couple of lines
down the road.

I have changed the patch per Ronan's suggestion to have the version
check out of GetSlotInformation(), addressed what you have reported,
and the result looked good. So I have applied this part.

What remains on this thread is the addition of new tests to make sure
that pg_receivewal is able to follow a timeline switch. Now that we
can restart from a slot that should be a bit easier to implemented as
a test by creating a slot on a standby. Ronan, are you planning to
send a new patch for this part?
--
Michael

#57Ronan Dunklau
ronan.dunklau@aiven.io
In reply to: Michael Paquier (#56)
Re: pg_receivewal starting position

Le mardi 26 octobre 2021, 03:15:40 CEST Michael Paquier a écrit :

I have changed the patch per Ronan's suggestion to have the version
check out of GetSlotInformation(), addressed what you have reported,
and the result looked good. So I have applied this part.

Thanks !

What remains on this thread is the addition of new tests to make sure
that pg_receivewal is able to follow a timeline switch. Now that we
can restart from a slot that should be a bit easier to implemented as
a test by creating a slot on a standby. Ronan, are you planning to
send a new patch for this part?

Yes, I will try to simplify the logic of the patch I sent last week. I'll keep
you posted here soon.

--
Ronan Dunklau

#58Ronan Dunklau
ronan.dunklau@aiven.io
In reply to: Ronan Dunklau (#57)
1 attachment(s)
Re: pg_receivewal starting position

Le mardi 26 octobre 2021, 08:27:47 CEST Ronan Dunklau a écrit :

Yes, I will try to simplify the logic of the patch I sent last week. I'll
keep you posted here soon.

I was able to simplify it quite a bit, by using only one standby for both test
scenarios.

This test case verify that after a timeline switch, if we resume from a
previous state we will archive:
- segments from the old timeline
- segments from the new timeline
- the timeline history file itself.

I chose to check against a full segment from the previous timeline, but it
would have been possible to check that the latest timeline segment was
partial. I chose not not, in the unlikely event we promote at an exact segment
boundary. I don't think it matters much, since partial wal files are already
covered by other tests.

--
Ronan Dunklau

Attachments:

v13-0001-Add-a-test-for-pg_receivewal-following-timeline-.patchtext/x-patch; charset=UTF-8; name=v13-0001-Add-a-test-for-pg_receivewal-following-timeline-.patchDownload
From 0daf03e7e69aa11accb18df86d686004002115ff Mon Sep 17 00:00:00 2001
From: Ronan Dunklau <ronan.dunklau@aiven.io>
Date: Tue, 26 Oct 2021 10:54:12 +0200
Subject: [PATCH v13] Add a test for pg_receivewal following timeline switch.

pg_receivewal is able to follow a timeline switch, but this was not
tested anywher. This test case verify that it works as expected both
when resuming from a replication slot and from the archive directory,
which have different methods of retrieving the timeline it left off.
---
 src/bin/pg_basebackup/t/020_pg_receivewal.pl | 87 +++++++++++++++++++-
 1 file changed, 86 insertions(+), 1 deletion(-)

diff --git a/src/bin/pg_basebackup/t/020_pg_receivewal.pl b/src/bin/pg_basebackup/t/020_pg_receivewal.pl
index 092c9b6f25..f285e9b3f3 100644
--- a/src/bin/pg_basebackup/t/020_pg_receivewal.pl
+++ b/src/bin/pg_basebackup/t/020_pg_receivewal.pl
@@ -5,7 +5,7 @@ use strict;
 use warnings;
 use PostgreSQL::Test::Utils;
 use PostgreSQL::Test::Cluster;
-use Test::More tests => 31;
+use Test::More tests => 39;
 
 program_help_ok('pg_receivewal');
 program_version_ok('pg_receivewal');
@@ -206,3 +206,88 @@ $primary->command_ok(
 	"WAL streamed from the slot's restart_lsn");
 ok(-e "$slot_dir/$walfile_streamed",
 	"WAL from the slot's restart_lsn has been archived");
+
+# Test a timeline switch.
+# This test is split in two, using the same standby: one test check the
+# resume-from-folder case, the other the resume-from-slot one.
+
+# Setup a standby for our tests
+my $backup_name = "basebackup";
+$primary->backup($backup_name);
+my $standby = PostgreSQL::Test::Cluster->new("standby");
+$standby->init_from_backup($primary, $backup_name, has_streaming => 1);
+$standby->start;
+
+# Cleanup the previous stream directories to reuse them
+unlink glob "'${stream_dir}/*'";
+unlink glob "'${slot_dir}/*'";
+
+# Create two replication slots.
+# First one is to make sure we keep the wal for the resume-from-folder case.
+my $folder_slot = "folder_slot";
+$standby->psql('',
+  "CREATE_REPLICATION_SLOT $folder_slot PHYSICAL (RESERVE_WAL)",
+  replication => 1);
+# Second one is for testing the resume-from-slot case
+my $archive_slot = "archive_slot";
+$standby->psql('',
+  "CREATE_REPLICATION_SLOT $archive_slot PHYSICAL (RESERVE_WAL)",
+  replication => 1);
+# Get a walfilename from before the promotion to make sure it is archived
+# after promotion
+my $walfile_before_promotion = $primary->safe_psql('postgres',
+  "SELECT pg_walfile_name(pg_current_wal_insert_lsn())");
+# Switch wal to make sure it is not a partial file but a complete segment.
+$primary->psql('postgres',
+	'INSERT INTO test_table VALUES (generate_series(1,100));');
+$primary->psql('postgres', 'SELECT pg_switch_wal();');
+
+# Populate the stream_dir for the resume-from-folder case.
+$nextlsn = $primary->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();');
+chomp($nextlsn);
+$standby->run_log(
+    [ 'pg_receivewal', '-D', $stream_dir, '--verbose', '--endpos', $nextlsn,
+      '--slot', $folder_slot, '--no-sync'],
+    "Stream some wal before promoting");
+
+# Everything is setup, promote the standby to trigger a timeline switch
+$standby->psql(
+  'postgres',
+  "SELECT pg_promote(wait_seconds => 300)");
+
+# Force a wal switch to make sure at least one full WAL is archived on the new
+# timeline, and fetch this walfilename.
+my $walfile_after_promotion = $standby->safe_psql('postgres',
+  "SELECT pg_walfile_name(pg_current_wal_insert_lsn())");
+$standby->psql('postgres',
+	'INSERT INTO test_table VALUES (generate_series(1,100));');
+$standby->psql('postgres', 'SELECT pg_switch_wal();');
+$nextlsn =
+  $standby->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();');
+chomp($nextlsn);
+
+# Now, try to resume after the promotion, from the folder.
+$standby->command_ok(
+	[ 'pg_receivewal', '-D', $stream_dir, '--verbose', '--endpos', $nextlsn,
+    '--slot', $folder_slot, '--no-sync'],
+	"Stream some wal after promoting, resuming from the folder's position");
+
+ok(-e "$stream_dir/$walfile_before_promotion",
+  "WAL from the old timeline has been archived resuming from the folder");
+ok(-e "$stream_dir/$walfile_after_promotion",
+  "WAL from the new timeline has been archived resuming from the folder");
+ok(-e "$stream_dir/00000002.history",
+  "Timeline history file has been archived resuming from the folder");
+
+# Same thing, resuming from the slot
+$standby->command_ok(
+	[ 'pg_receivewal', '-D', $slot_dir, '--verbose', '--endpos', $nextlsn,
+    '--slot', $archive_slot, '--no-sync'],
+	"Stream some wal after promoting, resuming from the slot's position");
+my @walfiles = glob "$slot_dir/*";
+ok(-e "$slot_dir/$walfile_before_promotion",
+  "WAL from the old timeline has been archived resuming from the slot");
+ok(-e "$slot_dir/$walfile_after_promotion",
+  "WAL from the new timeline has been archived resuming from the slot");
+ok(-e "$slot_dir/00000002.history",
+  "Timeline history file has been archived resuming from the slot");
-- 
2.33.1

#59Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Ronan Dunklau (#58)
Re: pg_receivewal starting position

At Tue, 26 Oct 2021 11:01:46 +0200, Ronan Dunklau <ronan.dunklau@aiven.io> wrote in

Le mardi 26 octobre 2021, 08:27:47 CEST Ronan Dunklau a écrit :

Yes, I will try to simplify the logic of the patch I sent last week. I'll
keep you posted here soon.

I was able to simplify it quite a bit, by using only one standby for both test
scenarios.

This test case verify that after a timeline switch, if we resume from a
previous state we will archive:
- segments from the old timeline
- segments from the new timeline
- the timeline history file itself.

I chose to check against a full segment from the previous timeline, but it
would have been possible to check that the latest timeline segment was
partial. I chose not not, in the unlikely event we promote at an exact segment
boundary. I don't think it matters much, since partial wal files are already
covered by other tests.

+my @walfiles = glob "$slot_dir/*";

This is not used.

Each pg_receivewal run stalls for about 10 or more seconds before
finishing, which is not great from the standpoint of recently
increasing test run time.

Maybe we want to advance LSN a bit, after taking $nextlsn then pass
"-s 1" to pg_receivewal.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#60Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Kyotaro Horiguchi (#59)
Re: pg_receivewal starting position

At Wed, 27 Oct 2021 11:17:28 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in

Each pg_receivewal run stalls for about 10 or more seconds before
finishing, which is not great from the standpoint of recently
increasing test run time.

Maybe we want to advance LSN a bit, after taking $nextlsn then pass
"-s 1" to pg_receivewal.

Hmm. Sorry, my fingers slipped. We don't need '-s 1' for this purpose.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#61Ronan Dunklau
ronan.dunklau@aiven.io
In reply to: Kyotaro Horiguchi (#59)
1 attachment(s)
Re: pg_receivewal starting position

Le mercredi 27 octobre 2021, 04:17:28 CEST Kyotaro Horiguchi a écrit :

+my @walfiles = glob "$slot_dir/*";

This is not used.

Sorry, fixed in attached version.

Each pg_receivewal run stalls for about 10 or more seconds before
finishing, which is not great from the standpoint of recently
increasing test run time.

Maybe we want to advance LSN a bit, after taking $nextlsn then pass
"-s 1" to pg_receivewal.

I incorrectly assumed it was due to the promotion time without looking into
it. In fact, you're right the LSN was not incremented after we fetched the end
lsn, and thus we would wait for quite a while. I fixed that too.

Thank you for the review !

--
Ronan Dunklau

Attachments:

v14-0001-Add-a-test-for-pg_receivewal-following-timeline-.patchtext/x-patch; charset=UTF-8; name=v14-0001-Add-a-test-for-pg_receivewal-following-timeline-.patchDownload
From 04747199b7be896a62021e4f064b2342234427d5 Mon Sep 17 00:00:00 2001
From: Ronan Dunklau <ronan.dunklau@aiven.io>
Date: Tue, 26 Oct 2021 10:54:12 +0200
Subject: [PATCH v14] Add a test for pg_receivewal following timeline switch.

pg_receivewal is able to follow a timeline switch, but this was not
tested anywher. This test case verify that it works as expected both
when resuming from a replication slot and from the archive directory,
which have different methods of retrieving the timeline it left off.
---
 src/bin/pg_basebackup/t/020_pg_receivewal.pl | 86 +++++++++++++++++++-
 1 file changed, 85 insertions(+), 1 deletion(-)

diff --git a/src/bin/pg_basebackup/t/020_pg_receivewal.pl b/src/bin/pg_basebackup/t/020_pg_receivewal.pl
index 092c9b6f25..28e4e9b3a4 100644
--- a/src/bin/pg_basebackup/t/020_pg_receivewal.pl
+++ b/src/bin/pg_basebackup/t/020_pg_receivewal.pl
@@ -5,7 +5,7 @@ use strict;
 use warnings;
 use PostgreSQL::Test::Utils;
 use PostgreSQL::Test::Cluster;
-use Test::More tests => 31;
+use Test::More tests => 39;
 
 program_help_ok('pg_receivewal');
 program_version_ok('pg_receivewal');
@@ -206,3 +206,87 @@ $primary->command_ok(
 	"WAL streamed from the slot's restart_lsn");
 ok(-e "$slot_dir/$walfile_streamed",
 	"WAL from the slot's restart_lsn has been archived");
+
+# Test a timeline switch.
+# This test is split in two, using the same standby: one test check the
+# resume-from-folder case, the other the resume-from-slot one.
+
+# Setup a standby for our tests
+my $backup_name = "basebackup";
+$primary->backup($backup_name);
+my $standby = PostgreSQL::Test::Cluster->new("standby");
+$standby->init_from_backup($primary, $backup_name, has_streaming => 1);
+$standby->start;
+
+# Cleanup the previous stream directories to reuse them
+unlink glob "'${stream_dir}/*'";
+unlink glob "'${slot_dir}/*'";
+
+# Create two replication slots.
+# First one is to make sure we keep the wal for the resume-from-folder case.
+my $folder_slot = "folder_slot";
+$standby->psql('',
+  "CREATE_REPLICATION_SLOT $folder_slot PHYSICAL (RESERVE_WAL)",
+  replication => 1);
+# Second one is for testing the resume-from-slot case
+my $archive_slot = "archive_slot";
+$standby->psql('',
+  "CREATE_REPLICATION_SLOT $archive_slot PHYSICAL (RESERVE_WAL)",
+  replication => 1);
+# Get a walfilename from before the promotion to make sure it is archived
+# after promotion
+my $walfile_before_promotion = $primary->safe_psql('postgres',
+  "SELECT pg_walfile_name(pg_current_wal_insert_lsn())");
+# Switch wal to make sure it is not a partial file but a complete segment.
+$primary->psql('postgres',
+	'INSERT INTO test_table VALUES (generate_series(1,100));');
+$primary->psql('postgres', 'SELECT pg_switch_wal();');
+
+# Populate the stream_dir for the resume-from-folder case.
+$nextlsn = $primary->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();');
+chomp($nextlsn);
+$standby->run_log(
+    [ 'pg_receivewal', '-D', $stream_dir, '--verbose', '--endpos', $nextlsn,
+      '--slot', $folder_slot, '--no-sync'],
+    "Stream some wal before promoting");
+
+# Everything is setup, promote the standby to trigger a timeline switch
+$standby->psql(
+  'postgres',
+  "SELECT pg_promote(wait_seconds => 300)");
+
+# Force a wal switch to make sure at least one full WAL is archived on the new
+# timeline, and fetch this walfilename.
+my $walfile_after_promotion = $standby->safe_psql('postgres',
+  "SELECT pg_walfile_name(pg_current_wal_insert_lsn())");
+$standby->psql('postgres',
+	'INSERT INTO test_table VALUES (generate_series(1,100));');
+$standby->psql('postgres', 'SELECT pg_switch_wal();');
+$nextlsn =
+  $standby->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();');
+chomp($nextlsn);
+
+# Now, try to resume after the promotion, from the folder.
+$standby->command_ok(
+	[ 'pg_receivewal', '-D', $stream_dir, '--verbose', '--endpos', $nextlsn,
+    '--slot', $folder_slot, '--no-sync'],
+	"Stream some wal after promoting, resuming from the folder's position");
+
+ok(-e "$stream_dir/$walfile_before_promotion",
+  "WAL from the old timeline has been archived resuming from the folder");
+ok(-e "$stream_dir/$walfile_after_promotion",
+  "WAL from the new timeline has been archived resuming from the folder");
+ok(-e "$stream_dir/00000002.history",
+  "Timeline history file has been archived resuming from the folder");
+
+# Same thing, resuming from the slot
+$standby->command_ok(
+	[ 'pg_receivewal', '-D', $slot_dir, '--verbose', '--endpos', $nextlsn,
+    '--slot', $archive_slot, '--no-sync'],
+	"Stream some wal after promoting, resuming from the slot's position");
+ok(-e "$slot_dir/$walfile_before_promotion",
+  "WAL from the old timeline has been archived resuming from the slot");
+ok(-e "$slot_dir/$walfile_after_promotion",
+  "WAL from the new timeline has been archived resuming from the slot");
+ok(-e "$slot_dir/00000002.history",
+  "Timeline history file has been archived resuming from the slot");
-- 
2.33.1

#62Ronan Dunklau
ronan.dunklau@aiven.io
In reply to: Ronan Dunklau (#61)
1 attachment(s)
Re: pg_receivewal starting position

Le mercredi 27 octobre 2021, 10:00:40 CEST Ronan Dunklau a écrit :

Le mercredi 27 octobre 2021, 04:17:28 CEST Kyotaro Horiguchi a écrit :

+my @walfiles = glob "$slot_dir/*";

This is not used.

Sorry, fixed in attached version.

Each pg_receivewal run stalls for about 10 or more seconds before
finishing, which is not great from the standpoint of recently
increasing test run time.

Maybe we want to advance LSN a bit, after taking $nextlsn then pass
"-s 1" to pg_receivewal.

I incorrectly assumed it was due to the promotion time without looking into
it. In fact, you're right the LSN was not incremented after we fetched the
end lsn, and thus we would wait for quite a while. I fixed that too.

Thank you for the review !

Sorry I sent an intermediary version of the patch, here is the correct one.

--
Ronan Dunklau

Attachments:

v15-0001-Add-a-test-for-pg_receivewal-following-timeline-.patchtext/x-patch; charset=UTF-8; name=v15-0001-Add-a-test-for-pg_receivewal-following-timeline-.patchDownload
From 8fa22687e50f02539119568f98b6a11c077f1774 Mon Sep 17 00:00:00 2001
From: Ronan Dunklau <ronan.dunklau@aiven.io>
Date: Tue, 26 Oct 2021 10:54:12 +0200
Subject: [PATCH v15] Add a test for pg_receivewal following timeline switch.

pg_receivewal is able to follow a timeline switch, but this was not
tested anywher. This test case verify that it works as expected both
when resuming from a replication slot and from the archive directory,
which have different methods of retrieving the timeline it left off.
---
 src/bin/pg_basebackup/t/020_pg_receivewal.pl | 88 +++++++++++++++++++-
 1 file changed, 87 insertions(+), 1 deletion(-)

diff --git a/src/bin/pg_basebackup/t/020_pg_receivewal.pl b/src/bin/pg_basebackup/t/020_pg_receivewal.pl
index 092c9b6f25..f561aca8f8 100644
--- a/src/bin/pg_basebackup/t/020_pg_receivewal.pl
+++ b/src/bin/pg_basebackup/t/020_pg_receivewal.pl
@@ -5,7 +5,7 @@ use strict;
 use warnings;
 use PostgreSQL::Test::Utils;
 use PostgreSQL::Test::Cluster;
-use Test::More tests => 31;
+use Test::More tests => 39;
 
 program_help_ok('pg_receivewal');
 program_version_ok('pg_receivewal');
@@ -206,3 +206,89 @@ $primary->command_ok(
 	"WAL streamed from the slot's restart_lsn");
 ok(-e "$slot_dir/$walfile_streamed",
 	"WAL from the slot's restart_lsn has been archived");
+
+# Test a timeline switch.
+# This test is split in two, using the same standby: one test check the
+# resume-from-folder case, the other the resume-from-slot one.
+
+# Setup a standby for our tests
+my $backup_name = "basebackup";
+$primary->backup($backup_name);
+my $standby = PostgreSQL::Test::Cluster->new("standby");
+$standby->init_from_backup($primary, $backup_name, has_streaming => 1);
+$standby->start;
+
+# Cleanup the previous stream directories to reuse them
+unlink glob "'${stream_dir}/*'";
+unlink glob "'${slot_dir}/*'";
+
+# Create two replication slots.
+# First one is to make sure we keep the wal for the resume-from-folder case.
+my $folder_slot = "folder_slot";
+$standby->psql('',
+  "CREATE_REPLICATION_SLOT $folder_slot PHYSICAL (RESERVE_WAL)",
+  replication => 1);
+# Second one is for testing the resume-from-slot case
+my $archive_slot = "archive_slot";
+$standby->psql('',
+  "CREATE_REPLICATION_SLOT $archive_slot PHYSICAL (RESERVE_WAL)",
+  replication => 1);
+# Get a walfilename from before the promotion to make sure it is archived
+# after promotion
+my $walfile_before_promotion = $primary->safe_psql('postgres',
+  "SELECT pg_walfile_name(pg_current_wal_insert_lsn())");
+# Populate the stream_dir for the resume-from-folder case.
+$nextlsn = $primary->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();');
+chomp($nextlsn);
+# Switch wal to make sure it is not a partial file but a complete segment.
+$primary->psql('postgres',
+	'INSERT INTO test_table VALUES (generate_series(1,100));');
+$primary->psql('postgres', 'SELECT pg_switch_wal();');
+
+$standby->run_log(
+    [ 'pg_receivewal', '-D', $stream_dir, '--verbose', '--endpos', $nextlsn,
+      '--slot', $folder_slot, '--no-sync'],
+    "Stream some wal before promoting");
+
+# Everything is setup, promote the standby to trigger a timeline switch
+$standby->psql(
+  'postgres',
+  "SELECT pg_promote(wait_seconds => 300)");
+
+# Force a wal switch to make sure at least one full WAL is archived on the new
+# timeline, and fetch this walfilename.
+my $walfile_after_promotion = $standby->safe_psql('postgres',
+  "SELECT pg_walfile_name(pg_current_wal_insert_lsn())");
+$standby->psql('postgres',
+	'INSERT INTO test_table VALUES (generate_series(1,100));');
+$standby->psql('postgres', 'SELECT pg_switch_wal();');
+$nextlsn =
+  $standby->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();');
+chomp($nextlsn);
+$standby->psql('postgres',
+	'INSERT INTO test_table VALUES (generate_series(1,100));');
+
+# Now, try to resume after the promotion, from the folder.
+$standby->command_ok(
+	[ 'pg_receivewal', '-D', $stream_dir, '--verbose', '--endpos', $nextlsn,
+    '--slot', $folder_slot, '--no-sync'],
+	"Stream some wal after promoting, resuming from the folder's position");
+
+ok(-e "$stream_dir/$walfile_before_promotion",
+  "WAL from the old timeline has been archived resuming from the folder");
+ok(-e "$stream_dir/$walfile_after_promotion",
+  "WAL from the new timeline has been archived resuming from the folder");
+ok(-e "$stream_dir/00000002.history",
+  "Timeline history file has been archived resuming from the folder");
+
+# Same thing, resuming from the slot
+$standby->command_ok(
+	[ 'pg_receivewal', '-D', $slot_dir, '--verbose', '--endpos', $nextlsn,
+    '--slot', $archive_slot, '--no-sync'],
+	"Stream some wal after promoting, resuming from the slot's position");
+ok(-e "$slot_dir/$walfile_before_promotion",
+  "WAL from the old timeline has been archived resuming from the slot");
+ok(-e "$slot_dir/$walfile_after_promotion",
+  "WAL from the new timeline has been archived resuming from the slot");
+ok(-e "$slot_dir/00000002.history",
+  "Timeline history file has been archived resuming from the slot");
-- 
2.33.1

#63Michael Paquier
michael@paquier.xyz
In reply to: Ronan Dunklau (#62)
1 attachment(s)
Re: pg_receivewal starting position

On Wed, Oct 27, 2021 at 10:11:00AM +0200, Ronan Dunklau wrote:

Sorry I sent an intermediary version of the patch, here is the correct one.

While looking at this patch, I have figured out a simple way to make
the tests faster without impacting the coverage. The size of the WAL
segments archived is a bit of a bottleneck as they need to be zeroed
by pg_receivewal at creation. This finishes by being a waste of time,
even if we don't flush the data. So I'd like to switch the test so as
we use a segment size of 1MB, first.

A second thing is that we copy too many segments than really needed
when using the slot's restart_lsn as starting point as RESERVE_WAL
would use the current redo location, so it seems to me that a
checkpoint is in order before the slot creation. A third thing is
that generating some extra data after the end LSN we want to use makes
the test much faster at the end.

With those three methods combined, the test goes down from 11s to 9s
here. Attached is a patch I'd like to apply to make the test
cheaper.

I also had a look at your patch. Here are some comments.

+# Cleanup the previous stream directories to reuse them
+unlink glob "'${stream_dir}/*'";
+unlink glob "'${slot_dir}/*'";
I think that we'd better choose a different location for the
archives.  Keeping the segments of the previous tests is useful for
debugging if a previous step of the test failed.
+$standby->psql('',
+  "CREATE_REPLICATION_SLOT $folder_slot PHYSICAL (RESERVE_WAL)",
+  replication => 1);
Here as well we could use a restart point to reduce the number of
segments archived.
+# Now, try to resume after the promotion, from the folder.
+$standby->command_ok(
+   [ 'pg_receivewal', '-D', $stream_dir, '--verbose', '--endpos', $nextlsn,
+    '--slot', $folder_slot, '--no-sync'],
+   "Stream some wal after promoting, resuming from the folder's position");
What is called the "resume-from-folder" test in the patch is IMO
too costly and brings little extra value, requiring two commands of
pg_receivewal (one to populate the folder and one to test the TLI jump
from the previously-populated point) to test basically the same thing
as when the starting point is taken from the slot.  Except that
restarting from the slot is twice cheaper.  The point of the
resume-from-folder case is to make sure that we restart from the point
of the archive folder rather than the slot's restart_lsn, but your
test fails this part, in fact, because the first command populating
the archive folder also uses "--slot $folder_slot", updating the
slot's restart_lsn before the second pg_receivewal uses this slot
again.

I think, as a whole, that testing for the case where an archive folder
is populated (without a slot!), followed by a second command where we
use a slot that has a restart_lsn older than the archive's location,
to not be that interesting. If we include such a test, there is no
need to include that within the TLI jump part, in my opinion. So I
think that we had better drop this part of the patch, and keep only
the case where we resume from a slot for the TLI jump.

The commands of pg_receivewal included in the test had better use -n
so as there is no infinite loop on failure.
--
Michael

Attachments:

receivewal-tap-cheaper.patchtext/x-diff; charset=us-asciiDownload
diff --git a/src/bin/pg_basebackup/t/020_pg_receivewal.pl b/src/bin/pg_basebackup/t/020_pg_receivewal.pl
index 092c9b6f25..87edcb6200 100644
--- a/src/bin/pg_basebackup/t/020_pg_receivewal.pl
+++ b/src/bin/pg_basebackup/t/020_pg_receivewal.pl
@@ -15,7 +15,7 @@ program_options_handling_ok('pg_receivewal');
 umask(0077);
 
 my $primary = PostgreSQL::Test::Cluster->new('primary');
-$primary->init(allows_streaming => 1);
+$primary->init(allows_streaming => 1, extra => ['--wal-segsize=1']);
 $primary->start;
 
 my $stream_dir = $primary->basedir . '/archive_wal';
@@ -167,7 +167,9 @@ mkdir($slot_dir);
 $slot_name = 'archive_slot';
 
 # Setup the slot, reserving WAL at creation (corresponding to the
-# last redo LSN here, actually).
+# last redo LSN here, actually, so use a checkpoint to reduce the
+# number of segments archived).
+$primary->psql('postgres', 'checkpoint;');
 $primary->psql('postgres',
 	"SELECT pg_create_physical_replication_slot('$slot_name', true);");
 
@@ -182,12 +184,16 @@ my $walfile_streamed = $primary->safe_psql(
 # Switch to a new segment, to make sure that the segment retained by the
 # slot is still streamed.  This may not be necessary, but play it safe.
 $primary->psql('postgres',
-	'INSERT INTO test_table VALUES (generate_series(1,100));');
+	'INSERT INTO test_table VALUES (generate_series(1,10));');
 $primary->psql('postgres', 'SELECT pg_switch_wal();');
 $nextlsn =
   $primary->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();');
 chomp($nextlsn);
 
+# Add a bit more data to accelerate the end of the next commands.
+$primary->psql('postgres',
+	'INSERT INTO test_table VALUES (generate_series(1,10));');
+
 # Check case where the slot does not exist.
 $primary->command_fails_like(
 	[
#64Ronan Dunklau
ronan.dunklau@aiven.io
In reply to: Michael Paquier (#63)
2 attachment(s)
Re: pg_receivewal starting position

Le jeudi 28 octobre 2021, 14:31:36 CEST Michael Paquier a écrit :

On Wed, Oct 27, 2021 at 10:11:00AM +0200, Ronan Dunklau wrote:

Sorry I sent an intermediary version of the patch, here is the correct
one.

While looking at this patch, I have figured out a simple way to make
the tests faster without impacting the coverage. The size of the WAL
segments archived is a bit of a bottleneck as they need to be zeroed
by pg_receivewal at creation. This finishes by being a waste of time,
even if we don't flush the data. So I'd like to switch the test so as
we use a segment size of 1MB, first.

A second thing is that we copy too many segments than really needed
when using the slot's restart_lsn as starting point as RESERVE_WAL
would use the current redo location, so it seems to me that a
checkpoint is in order before the slot creation. A third thing is
that generating some extra data after the end LSN we want to use makes
the test much faster at the end.

With those three methods combined, the test goes down from 11s to 9s
here. Attached is a patch I'd like to apply to make the test
cheaper.

Interesting ideas, thanks. For the record, the time drops from ~4.5s to 3s on
average on my machine.
I think if you reduce the size of the generate_series batches, this should
probably be reduced everywhere. With what we do though, inserting a single
line should work just as well, I wonder why we insist on inserting a hundred
lines ? I updated your patch with that small modification, it also makes the
code less verbose.

I also had a look at your patch. Here are some comments.

+# Cleanup the previous stream directories to reuse them
+unlink glob "'${stream_dir}/*'";
+unlink glob "'${slot_dir}/*'";
I think that we'd better choose a different location for the
archives.  Keeping the segments of the previous tests is useful for
debugging if a previous step of the test failed.

Ok.

+$standby->psql('',
+  "CREATE_REPLICATION_SLOT $folder_slot PHYSICAL (RESERVE_WAL)",
+  replication => 1);
Here as well we could use a restart point to reduce the number of
segments archived.

The restart point should be very close, as we don't generate any activity on
the primary between the backup and the slot's creation. I'm not sure adding
the complexity of triggering a checkpoint on the primary and waiting for the
standby to catch up on it would be that useful.

+# Now, try to resume after the promotion, from the folder.
+$standby->command_ok(
+   [ 'pg_receivewal', '-D', $stream_dir, '--verbose', '--endpos', $nextlsn,
+    '--slot', $folder_slot, '--no-sync'],
+   "Stream some wal after promoting, resuming from the folder's position");
What is called the "resume-from-folder" test in the patch is IMO
too costly and brings little extra value, requiring two commands of
pg_receivewal (one to populate the folder and one to test the TLI jump
from the previously-populated point) to test basically the same thing
as when the starting point is taken from the slot.  Except that
restarting from the slot is twice cheaper.  The point of the
resume-from-folder case is to make sure that we restart from the point
of the archive folder rather than the slot's restart_lsn, but your
test fails this part, in fact, because the first command populating
the archive folder also uses "--slot $folder_slot", updating the
slot's restart_lsn before the second pg_receivewal uses this slot
again.

I think, as a whole, that testing for the case where an archive folder
is populated (without a slot!), followed by a second command where we
use a slot that has a restart_lsn older than the archive's location,
to not be that interesting. If we include such a test, there is no
need to include that within the TLI jump part, in my opinion. So I
think that we had better drop this part of the patch, and keep only
the case where we resume from a slot for the TLI jump.

You're right about the test not being that interesting in that case. I thought
it would be worthwhile to test both cases, but resume_from_folder doesn't
actually exercise any code that was not already called before (timeline
computation from segment name).

The commands of pg_receivewal included in the test had better use -n
so as there is no infinite loop on failure.

Ok.

--
Ronan Dunklau

Attachments:

v16-0002-Speed-up-pg_receivewal-tests.patchtext/x-patch; charset=UTF-8; name=v16-0002-Speed-up-pg_receivewal-tests.patchDownload
From ad09605b6f8b2d1cc91e67a255bf469363cad2b3 Mon Sep 17 00:00:00 2001
From: Ronan Dunklau <ronan.dunklau@aiven.io>
Date: Thu, 28 Oct 2021 15:12:04 +0200
Subject: [PATCH v16 2/2] Speed up pg_receivewal tests.

Author: Michael Paquier
---
 src/bin/pg_basebackup/t/020_pg_receivewal.pl | 21 ++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/src/bin/pg_basebackup/t/020_pg_receivewal.pl b/src/bin/pg_basebackup/t/020_pg_receivewal.pl
index e0422e2242..1691054f5c 100644
--- a/src/bin/pg_basebackup/t/020_pg_receivewal.pl
+++ b/src/bin/pg_basebackup/t/020_pg_receivewal.pl
@@ -15,7 +15,7 @@ program_options_handling_ok('pg_receivewal');
 umask(0077);
 
 my $primary = PostgreSQL::Test::Cluster->new('primary');
-$primary->init(allows_streaming => 1);
+$primary->init(allows_streaming => 1, extra => ['--wal-segsize=1']);
 $primary->start;
 
 my $stream_dir = $primary->basedir . '/archive_wal';
@@ -55,8 +55,7 @@ $primary->psql('postgres', 'SELECT pg_switch_wal();');
 my $nextlsn =
   $primary->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();');
 chomp($nextlsn);
-$primary->psql('postgres',
-	'INSERT INTO test_table VALUES (generate_series(1,100));');
+$primary->psql('postgres', 'INSERT INTO test_table VALUES (1);');
 
 # Stream up to the given position.  This is necessary to have a fixed
 # started point for the next commands done in this test, with or without
@@ -85,8 +84,7 @@ SKIP:
 	$nextlsn =
 	  $primary->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();');
 	chomp($nextlsn);
-	$primary->psql('postgres',
-		'INSERT INTO test_table VALUES (generate_series(100,200));');
+	$primary->psql('postgres', 'INSERT INTO test_table VALUES (1);');
 
 	# Note the trailing whitespace after the value of --compress, that is
 	# a valid value.
@@ -136,8 +134,7 @@ $primary->psql('postgres', 'SELECT pg_switch_wal();');
 $nextlsn =
   $primary->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();');
 chomp($nextlsn);
-$primary->psql('postgres',
-	'INSERT INTO test_table VALUES (generate_series(200,300));');
+$primary->psql('postgres', 'INSERT INTO test_table VALUES (1);');
 $primary->command_ok(
 	[
 		'pg_receivewal', '-D',     $stream_dir, '--verbose',
@@ -167,7 +164,9 @@ mkdir($slot_dir);
 $slot_name = 'archive_slot';
 
 # Setup the slot, reserving WAL at creation (corresponding to the
-# last redo LSN here, actually).
+# last redo LSN here, actually, so use a checkpoint to reduce the
+# number of segments archived).
+$primary->psql('postgres', 'checkpoint;');
 $primary->psql('postgres',
 	"SELECT pg_create_physical_replication_slot('$slot_name', true);");
 
@@ -181,13 +180,15 @@ my $walfile_streamed = $primary->safe_psql(
 
 # Switch to a new segment, to make sure that the segment retained by the
 # slot is still streamed.  This may not be necessary, but play it safe.
-$primary->psql('postgres',
-	'INSERT INTO test_table VALUES (generate_series(1,100));');
+$primary->psql('postgres', 'INSERT INTO test_table VALUES (1);');
 $primary->psql('postgres', 'SELECT pg_switch_wal();');
 $nextlsn =
   $primary->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();');
 chomp($nextlsn);
 
+# Add a bit more data to accelerate the end of the next commands.
+$primary->psql('postgres', 'INSERT INTO test_table VALUES (1);');
+
 # Check case where the slot does not exist.
 $primary->command_fails_like(
 	[
-- 
2.33.1

v16-0001-Add-a-test-for-pg_receivewal-following-timeline-.patchtext/x-patch; charset=UTF-8; name=v16-0001-Add-a-test-for-pg_receivewal-following-timeline-.patchDownload
From 957177881692b772a1415316bcf0babcc7958e09 Mon Sep 17 00:00:00 2001
From: Ronan Dunklau <ronan.dunklau@aiven.io>
Date: Tue, 26 Oct 2021 10:54:12 +0200
Subject: [PATCH v16 1/2] Add a test for pg_receivewal following timeline
 switch.

pg_receivewal is able to follow a timeline switch, but this was not
tested anywher. This test case verify that it works as expected both
when resuming from a replication slot and from the archive directory,
which have different methods of retrieving the timeline it left off.
---
 src/bin/pg_basebackup/t/020_pg_receivewal.pl | 58 +++++++++++++++++++-
 1 file changed, 57 insertions(+), 1 deletion(-)

diff --git a/src/bin/pg_basebackup/t/020_pg_receivewal.pl b/src/bin/pg_basebackup/t/020_pg_receivewal.pl
index 092c9b6f25..e0422e2242 100644
--- a/src/bin/pg_basebackup/t/020_pg_receivewal.pl
+++ b/src/bin/pg_basebackup/t/020_pg_receivewal.pl
@@ -5,7 +5,7 @@ use strict;
 use warnings;
 use PostgreSQL::Test::Utils;
 use PostgreSQL::Test::Cluster;
-use Test::More tests => 31;
+use Test::More tests => 35;
 
 program_help_ok('pg_receivewal');
 program_version_ok('pg_receivewal');
@@ -206,3 +206,59 @@ $primary->command_ok(
 	"WAL streamed from the slot's restart_lsn");
 ok(-e "$slot_dir/$walfile_streamed",
 	"WAL from the slot's restart_lsn has been archived");
+
+# Test a timeline switch.
+# This test is split in two, using the same standby: one test check the
+# resume-from-folder case, the other the resume-from-slot one.
+
+# Setup a standby for our tests
+my $backup_name = "basebackup";
+$primary->backup($backup_name);
+my $standby = PostgreSQL::Test::Cluster->new("standby");
+$standby->init_from_backup($primary, $backup_name, has_streaming => 1);
+$standby->start;
+
+# Create a replication slot on this new standby
+my $archive_slot = "archive_slot";
+$standby->psql('',
+  "CREATE_REPLICATION_SLOT $archive_slot PHYSICAL (RESERVE_WAL)",
+  replication => 1);
+# Get a walfilename from before the promotion to make sure it is archived
+# after promotion
+my $walfile_before_promotion = $primary->safe_psql('postgres',
+  "SELECT pg_walfile_name(pg_current_wal_insert_lsn())");
+# Switch wal to make sure it is not a partial file but a complete segment.
+$primary->psql('postgres', 'INSERT INTO test_table VALUES (1);');
+$primary->psql('postgres', 'SELECT pg_switch_wal();');
+$primary->wait_for_catchup($standby, 'replay', $primary->lsn('write'));
+
+# Everything is setup, promote the standby to trigger a timeline switch
+$standby->psql(
+  'postgres',
+  "SELECT pg_promote(wait_seconds => 300)");
+
+# Force a wal switch to make sure at least one full WAL is archived on the new
+# timeline, and fetch this walfilename.
+my $walfile_after_promotion = $standby->safe_psql('postgres',
+  "SELECT pg_walfile_name(pg_current_wal_insert_lsn())");
+$standby->psql('postgres', 'INSERT INTO test_table VALUES (1);');
+$standby->psql('postgres', 'SELECT pg_switch_wal();');
+$nextlsn =
+  $standby->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();');
+chomp($nextlsn);
+$standby->psql('postgres', 'INSERT INTO test_table VALUES (1);');
+
+# Now try to resume from the slot after the promotion.
+my $timeline_dir = $primary->basedir . '/timeline_wal';
+mkdir($timeline_dir);
+
+$standby->command_ok(
+	[ 'pg_receivewal', '-D', $timeline_dir, '--verbose', '--endpos', $nextlsn,
+    '--slot', $archive_slot, '--no-sync', '-n'],
+	"Stream some wal after promoting, resuming from the slot's position");
+ok(-e "$timeline_dir/$walfile_before_promotion",
+  "WAL from the old timeline has been archived resuming from the slot");
+ok(-e "$timeline_dir/$walfile_after_promotion",
+  "WAL from the new timeline has been archived resuming from the slot");
+ok(-e "$timeline_dir/00000002.history",
+  "Timeline history file has been archived resuming from the slot");
-- 
2.33.1

#65Michael Paquier
michael@paquier.xyz
In reply to: Ronan Dunklau (#64)
Re: pg_receivewal starting position

On Thu, Oct 28, 2021 at 03:55:12PM +0200, Ronan Dunklau wrote:

Interesting ideas, thanks. For the record, the time drops from ~4.5s to 3s on
average on my machine.
I think if you reduce the size of the generate_series batches, this should
probably be reduced everywhere. With what we do though, inserting a single
line should work just as well, I wonder why we insist on inserting a hundred
lines ? I updated your patch with that small modification, it also makes the
code less verbose.

Thanks for the extra numbers. I have added your suggestions,
switching the dummy table to use a primary key with different values,
while on it, as there is an argument that it makes debugging easier,
and applied the speedup patch.

+$standby->psql('',
+  "CREATE_REPLICATION_SLOT $folder_slot PHYSICAL (RESERVE_WAL)",
+  replication => 1);
Here as well we could use a restart point to reduce the number of
segments archived.

The restart point should be very close, as we don't generate any activity on
the primary between the backup and the slot's creation. I'm not sure adding
the complexity of triggering a checkpoint on the primary and waiting for the
standby to catch up on it would be that useful.

Yes, you are right here. The base backup taken from the primary
at this point ensures a fresh point.

+# This test is split in two, using the same standby: one test check the
+# resume-from-folder case, the other the resume-from-slot one.
This comment needs a refresh, as the resume-from-folder case is no
more.
+$standby->psql(
+  'postgres',
+  "SELECT pg_promote(wait_seconds => 300)");
This could be $standby->promote.
+# Switch wal to make sure it is not a partial file but a complete
segment.
+$primary->psql('postgres', 'INSERT INTO test_table VALUES (1);');
+$primary->psql('postgres', 'SELECT pg_switch_wal();');
+$primary->wait_for_catchup($standby, 'replay', $primary->lsn('write'));
This INSERT needs a slight change to adapt to the primary key of the
table.  This one is on me :p

Anyway, is this first segment switch really necessary? From the data
archived by pg_receivewal in the command testing the TLI jump, we
finish with the following contents (contents generated after fixing
the three INSERTs):
00000001000000000000000B
00000001000000000000000C
00000002000000000000000D
00000002000000000000000E.partial
00000002.history

So, even if we don't do the first switch, we'd still have one
completed segment on the previous timeline, before switching to the
new timeline and the next segment (pg_receivewal is a bit inconsistent
with the backend here, by the way, as the first segment on the new
timeline would map with the last segment of the old timeline, but here
we have a clean switch as of stop_streaming in pg_receivewal.c).

+# Force a wal switch to make sure at least one full WAL is archived on the new
+# timeline, and fetch this walfilename.
No arguments against the second segment switch to ensure the presence
of a full segment on the new TLI, of course.
--
Michael
#66Ronan Dunklau
ronan.dunklau@aiven.io
In reply to: Michael Paquier (#65)
1 attachment(s)
Re: pg_receivewal starting position

Le vendredi 29 octobre 2021, 04:27:51 CEST Michael Paquier a écrit :

On Thu, Oct 28, 2021 at 03:55:12PM +0200, Ronan Dunklau wrote:

Interesting ideas, thanks. For the record, the time drops from ~4.5s to 3s
on average on my machine.
I think if you reduce the size of the generate_series batches, this should
probably be reduced everywhere. With what we do though, inserting a single
line should work just as well, I wonder why we insist on inserting a
hundred lines ? I updated your patch with that small modification, it
also makes the code less verbose.

Thanks for the extra numbers. I have added your suggestions,
switching the dummy table to use a primary key with different values,
while on it, as there is an argument that it makes debugging easier,
and applied the speedup patch.

Thanks !

+$standby->psql('',
+  "CREATE_REPLICATION_SLOT $folder_slot PHYSICAL (RESERVE_WAL)",
+  replication => 1);
Here as well we could use a restart point to reduce the number of
segments archived.

The restart point should be very close, as we don't generate any activity
on the primary between the backup and the slot's creation. I'm not sure
adding the complexity of triggering a checkpoint on the primary and
waiting for the standby to catch up on it would be that useful.

Yes, you are right here. The base backup taken from the primary
at this point ensures a fresh point.

+# This test is split in two, using the same standby: one test check the
+# resume-from-folder case, the other the resume-from-slot one.
This comment needs a refresh, as the resume-from-folder case is no
more.

Done.

+$standby->psql(
+  'postgres',
+  "SELECT pg_promote(wait_seconds => 300)");
This could be $standby->promote.

Oh, didn't know about that.

+# Switch wal to make sure it is not a partial file but a complete
segment.
+$primary->psql('postgres', 'INSERT INTO test_table VALUES (1);');
+$primary->psql('postgres', 'SELECT pg_switch_wal();');
+$primary->wait_for_catchup($standby, 'replay', $primary->lsn('write'));
This INSERT needs a slight change to adapt to the primary key of the
table.  This one is on me :p

Done.

Anyway, is this first segment switch really necessary? From the data
archived by pg_receivewal in the command testing the TLI jump, we
finish with the following contents (contents generated after fixing
the three INSERTs):
00000001000000000000000B
00000001000000000000000C
00000002000000000000000D
00000002000000000000000E.partial
00000002.history

So, even if we don't do the first switch, we'd still have one
completed segment on the previous timeline, before switching to the
new timeline and the next segment (pg_receivewal is a bit inconsistent
with the backend here, by the way, as the first segment on the new
timeline would map with the last segment of the old timeline, but here
we have a clean switch as of stop_streaming in pg_receivewal.c).

The first completed segment on the previous timeline comes from the fact we
stream from the restart point. I removed the switch to use the walfilename of
the replication slot's restart point instead. This means querying both the
standby (to get the replication slot's restart_lsn) and the primary (to have
access to pg_walfile_name).

We could use a single query on the primary (using the primary's checkpoint LSN
instead) but it feels a bit convoluted just to avoid a query on the standby.

--
Ronan Dunklau

Attachments:

v17-0001-Add-a-test-for-pg_receivewal-following-timeline-.patchtext/x-patch; charset=UTF-8; name=v17-0001-Add-a-test-for-pg_receivewal-following-timeline-.patchDownload
From eb65a8b5883d903dda78bb66c2e1795b48cafc24 Mon Sep 17 00:00:00 2001
From: Ronan Dunklau <ronan.dunklau@aiven.io>
Date: Tue, 26 Oct 2021 10:54:12 +0200
Subject: [PATCH v17] Add a test for pg_receivewal following timeline switch.

pg_receivewal is able to follow a timeline switch, but this was not
tested anywher. This test case verify that it works as expected both
when resuming from a replication slot and from the archive directory,
which have different methods of retrieving the timeline it left off.
---
 src/bin/pg_basebackup/t/020_pg_receivewal.pl | 57 +++++++++++++++++++-
 1 file changed, 56 insertions(+), 1 deletion(-)

diff --git a/src/bin/pg_basebackup/t/020_pg_receivewal.pl b/src/bin/pg_basebackup/t/020_pg_receivewal.pl
index 2da200396e..1fe32758a0 100644
--- a/src/bin/pg_basebackup/t/020_pg_receivewal.pl
+++ b/src/bin/pg_basebackup/t/020_pg_receivewal.pl
@@ -5,7 +5,7 @@ use strict;
 use warnings;
 use PostgreSQL::Test::Utils;
 use PostgreSQL::Test::Cluster;
-use Test::More tests => 31;
+use Test::More tests => 35;
 
 program_help_ok('pg_receivewal');
 program_version_ok('pg_receivewal');
@@ -208,3 +208,58 @@ $primary->command_ok(
 	"WAL streamed from the slot's restart_lsn");
 ok(-e "$slot_dir/$walfile_streamed",
 	"WAL from the slot's restart_lsn has been archived");
+
+# Test a timeline switch using a replication slot
+
+# Setup a standby for our tests
+my $backup_name = "basebackup";
+$primary->backup($backup_name);
+my $standby = PostgreSQL::Test::Cluster->new("standby");
+$standby->init_from_backup($primary, $backup_name, has_streaming => 1);
+$standby->start;
+
+# Create a replication slot on this new standby
+my $archive_slot = "archive_slot";
+$standby->psql('',
+  "CREATE_REPLICATION_SLOT $archive_slot PHYSICAL (RESERVE_WAL)",
+  replication => 1);
+# Wait for standby catchup
+$primary->wait_for_catchup($standby, 'replay', $primary->lsn('write'));
+# Get a walfilename from before the promotion to make sure it is archived
+# after promotion
+my $replication_slot_lsn = $standby->safe_psql(
+    'postgres',
+    "SELECT restart_lsn
+    FROM pg_replication_slots
+    WHERE slot_name = '$archive_slot';");
+# pg_walfile_name is not supported on a standby
+my $walfile_before_promotion =
+  $primary->safe_psql('postgres', "SELECT pg_walfile_name('$replication_slot_lsn');");
+# Everything is setup, promote the standby to trigger a timeline switch
+$standby->promote;
+
+# Force a wal switch to make sure at least one full WAL is archived on the new
+# timeline, and fetch this walfilename.
+my $walfile_after_promotion = $standby->safe_psql('postgres',
+  "SELECT pg_walfile_name(pg_current_wal_insert_lsn());");
+$standby->psql('postgres', 'INSERT INTO test_table VALUES (6);');
+$standby->psql('postgres', 'SELECT pg_switch_wal();');
+$nextlsn =
+  $standby->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();');
+chomp($nextlsn);
+$standby->psql('postgres', 'INSERT INTO test_table VALUES (7);');
+
+# Now try to resume from the slot after the promotion.
+my $timeline_dir = $primary->basedir . '/timeline_wal';
+mkdir($timeline_dir);
+
+$standby->command_ok(
+	[ 'pg_receivewal', '-D', $timeline_dir, '--verbose', '--endpos', $nextlsn,
+    '--slot', $archive_slot, '--no-sync', '-n'],
+	"Stream some wal after promoting, resuming from the slot's position");
+ok(-e "$timeline_dir/$walfile_before_promotion",
+  "WAL from the old timeline has been archived resuming from the slot");
+ok(-e "$timeline_dir/$walfile_after_promotion",
+  "WAL from the new timeline has been archived resuming from the slot");
+ok(-e "$timeline_dir/00000002.history",
+  "Timeline history file has been archived resuming from the slot");
-- 
2.33.1

#67Michael Paquier
michael@paquier.xyz
In reply to: Ronan Dunklau (#66)
Re: pg_receivewal starting position

On Fri, Oct 29, 2021 at 10:13:44AM +0200, Ronan Dunklau wrote:

We could use a single query on the primary (using the primary's checkpoint LSN
instead) but it feels a bit convoluted just to avoid a query on the standby.

Cheating with pg_walfile_name() running on the primary is fine by me.
One thing that stood out while reading this patch again is that we
can use $standby->slot($archive_slot) to grab the slot's restart_lsn.
I have changed that, and applied it. So we are done with this
thread.
--
Michael