replication slot restart_lsn initialization

Started by Duran, Daniloalmost 11 years ago26 messageshackers
Jump to latest
#1Duran, Danilo
danilod@amazon.com

Hello,

I am looking to better understand the thought behind a replication slot's restart_lsn initialization. Currently in 9.4 and master, a replication slot's restart_lsn is set to InvalidXLogRecPtr and will only start tracking restart_lsn once a walreceiver has confirmed receipt of an lsn.

Was there any consideration for initializing restart_lsn to the latest WAL write pointer when a slot is created? Or for allowing an optional parameter in pg_create_(physical|logical)_replication_slot() for specifying the restart_lsn at slot creation?

I believe there are valid usage patterns where the user would like to start holding transaction logs from being removed/recycled during the time that the standby is being restored from base backup. Currently this can be worked around by using pg_receivexlog immediately after creating the replication slot but it feels kind of hacky.

It is also strange that the return type for pg_create_(physical|logical)_replication_slot includes xlog_position but as far as I can tell, it will never contain a value. Is this intended for something in the future?

Thanks,
Danilo

#2Andres Freund
andres@anarazel.de
In reply to: Duran, Danilo (#1)
Re: replication slot restart_lsn initialization

Hi,

On 2015-05-06 00:42:14 +0000, Duran, Danilo wrote:

I am looking to better understand the thought behind a replication
slot's restart_lsn initialization. Currently in 9.4 and master, a
replication slot's restart_lsn is set to InvalidXLogRecPtr and will
only start tracking restart_lsn once a walreceiver has confirmed
receipt of an lsn.

Right, for physical slots that's true.

Was there any consideration for initializing restart_lsn to the latest
WAL write pointer when a slot is created? Or for allowing an optional
parameter in pg_create_(physical|logical)_replication_slot() for
specifying the restart_lsn at slot creation?

I've been wondering about allowing for the latter alternative. I could
have used it a couple times. The former doesn't make much sense to me,
it could be too far *ahead* in many cases actually. A patch for this
would be fairly trivial.

It doesn't make sense for logical slots (as the computation of the
initial lsn is more complicated; it's also also already set when you
create one.

I believe there are valid usage patterns where the user would like to
start holding transaction logs from being removed/recycled during the
time that the standby is being restored from base backup. Currently
this can be worked around by using pg_receivexlog immediately after
creating the replication slot but it feels kind of hacky.

Yea, that's what I've done as well. I'd much rather have a proper option
for it.

It is also strange that the return type for
pg_create_(physical|logical)_replication_slot includes xlog_position
but as far as I can tell, it will never contain a value. Is this
intended for something in the future?

It'll contain something for logical slots. I wanted the physical version
to be analogous, with the thought of adding a different signature at
some point.

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Gurjeet Singh
gurjeet@singh.im
In reply to: Andres Freund (#2)
Re: replication slot restart_lsn initialization

On Tue, May 5, 2015 at 5:53 PM, Andres Freund <andres@anarazel.de> wrote:

Was there any consideration for initializing restart_lsn to the latest
WAL write pointer when a slot is created? Or for allowing an optional
parameter in pg_create_(physical|logical)_replication_slot() for
specifying the restart_lsn at slot creation?

I've been wondering about allowing for the latter alternative. I could
have used it a couple times. The former doesn't make much sense to me,
it could be too far *ahead* in many cases actually. A patch for this
would be fairly trivial.

Attached is the patch that takes the former approach (initialize
restart_lsn when the slot is created). I think it's better than the latter
approach (depend on user to specify an LSN) because the LSN user specifies
may have already been recycled. pg_create_logical_replication_slot()
prevents LSN from being recycled that by looping (worst case 2 times) until
there's no conflict with the checkpointer recycling the segment. So I have
used the same approach.

The function pg_create_physical_replication_slot() now has an additional
boolean parameter 'activate' which user can use to allocate restart_lsn as
part of the creation process.

Best regards,
--
Gurjeet Singh http://gurjeet.singh.im/

Attachments:

physical_repl_slot_activate_restart_lsn.patchapplication/octet-stream; name=physical_repl_slot_activate_restart_lsn.patchDownload+94-56
#4Andres Freund
andres@anarazel.de
In reply to: Gurjeet Singh (#3)
Re: replication slot restart_lsn initialization

On 2015-06-10 08:00:28 -0700, Gurjeet Singh wrote:

Attached is the patch that takes the former approach (initialize
restart_lsn when the slot is created).

If it's an option that's imo a sane approach.

pg_create_logical_replication_slot() prevents LSN from being
recycled that by looping (worst case 2 times) until there's no
conflict with the checkpointer recycling the segment. So I have used
the same approach.

There's no need to change anything for logical slots? Or do you just
mean that you moved the existing code?

/*
+ * Grab and save an LSN value to prevent WAL recycling past that point.
+ */
+void
+ReplicationSlotRegisterRestartLSN()
+{
+	ReplicationSlot *slot = MyReplicationSlot;
+
+	Assert(slot != NULL);
+	Assert(slot->data.restart_lsn == InvalidXLogRecPtr);
+
+	/*
+	 * The replication slot mechanism is used to prevent removal of required
+	 * WAL. As there is no interlock between this and checkpoints required, WAL
+	 * segment could be removed before ReplicationSlotsComputeRequiredLSN() has
+	 * been called to prevent that. In the very unlikely case that this happens
+	 * we'll just retry.
+	 */
+	while (true)
+	{
+		XLogSegNo	segno;
+
+		/*
+		 * Let's start with enough information if we can, so log a standby
+		 * snapshot and start decoding at exactly that position.
+		 */
+		if (!RecoveryInProgress())
+		{
+			XLogRecPtr	flushptr;
+
+			/* start at current insert position */
+			slot->data.restart_lsn = GetXLogInsertRecPtr();
+
+			/* make sure we have enough information to start */
+			flushptr = LogStandbySnapshot();
+
+			/* and make sure it's fsynced to disk */
+			XLogFlush(flushptr);
+		}
+		else
+			slot->data.restart_lsn = GetRedoRecPtr();
+
+		/* prevent WAL removal as fast as possible */
+		ReplicationSlotsComputeRequiredLSN();
+
+		/*
+		 * If all required WAL is still there, great, otherwise retry. The
+		 * slot should prevent further removal of WAL, unless there's a
+		 * concurrent ReplicationSlotsComputeRequiredLSN() after we've written
+		 * the new restart_lsn above, so normally we should never need to loop
+		 * more than twice.
+		 */
+		XLByteToSeg(slot->data.restart_lsn, segno);
+		if (XLogGetLastRemovedSegno() < segno)
+			break;
+	}
+}

That doesn't look right to me. Why is this code logging a standby
snapshot for physical slots?

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Gurjeet Singh
gurjeet@singh.im
In reply to: Andres Freund (#4)
Re: replication slot restart_lsn initialization

On Wed, Jun 10, 2015 at 8:07 AM, Andres Freund <andres@anarazel.de> wrote:

On 2015-06-10 08:00:28 -0700, Gurjeet Singh wrote:

pg_create_logical_replication_slot() prevents LSN from being
recycled that by looping (worst case 2 times) until there's no
conflict with the checkpointer recycling the segment. So I have used
the same approach.

There's no need to change anything for logical slots? Or do you just
mean that you moved the existing code?

Yes, I turned the code from logical replication into a function and used it
from logical and physical replication.

/*
+ * Grab and save an LSN value to prevent WAL recycling past that point.
+ */
+void
+ReplicationSlotRegisterRestartLSN()
+{

...

+             /*
+              * Let's start with enough information if we can, so log a

standby

+              * snapshot and start decoding at exactly that position.
+              */
+             if (!RecoveryInProgress())
+             {
+                     XLogRecPtr      flushptr;
+
+                     /* start at current insert position */
+                     slot->data.restart_lsn = GetXLogInsertRecPtr();
+
+                     /* make sure we have enough information to start */
+                     flushptr = LogStandbySnapshot();
+
+                     /* and make sure it's fsynced to disk */
+                     XLogFlush(flushptr);
+             }
+             else
+                     slot->data.restart_lsn = GetRedoRecPtr();
+

That doesn't look right to me. Why is this code logging a standby
snapshot for physical slots?

This is the new function I referred to above. The logging of the snapshot
is in 'not RecoveryInProgress()' case, meaning it's running in primary and
not in a standby.

Best regards,
--
Gurjeet Singh http://gurjeet.singh.im/

#6Andres Freund
andres@anarazel.de
In reply to: Gurjeet Singh (#5)
Re: replication slot restart_lsn initialization

On 2015-06-10 08:24:23 -0700, Gurjeet Singh wrote:

On Wed, Jun 10, 2015 at 8:07 AM, Andres Freund <andres@anarazel.de> wrote:

That doesn't look right to me. Why is this code logging a standby
snapshot for physical slots?

This is the new function I referred to above. The logging of the snapshot
is in 'not RecoveryInProgress()' case, meaning it's running in primary and
not in a standby.

It's still done uselessly for physical slots?

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Gurjeet Singh
gurjeet@singh.im
In reply to: Andres Freund (#6)
Re: replication slot restart_lsn initialization

On Wed, Jun 10, 2015 at 8:36 AM, Andres Freund <andres@anarazel.de> wrote:

On 2015-06-10 08:24:23 -0700, Gurjeet Singh wrote:

On Wed, Jun 10, 2015 at 8:07 AM, Andres Freund <andres@anarazel.de>

wrote:

That doesn't look right to me. Why is this code logging a standby
snapshot for physical slots?

This is the new function I referred to above. The logging of the snapshot
is in 'not RecoveryInProgress()' case, meaning it's running in primary

and

not in a standby.

It's still done uselessly for physical slots?

I missed the comments on LogStandbySnapshot(). Attached is the new patch
which does the snapshot logging only for a logical replication slot.

I am in the process of writing up a doc patch, and will submit that as well
in a short while.

I have removed a few #include statements which seemed unnecessary. These
changes are not relevant to the patch, so I can readily revert those parts
if deemed necessary.

Best regards,
--
Gurjeet Singh http://gurjeet.singh.im/

Attachments:

physical_repl_slot_activate_restart_lsn.v2.patchapplication/octet-stream; name=physical_repl_slot_activate_restart_lsn.v2.patchDownload+101-56
#8Gurjeet Singh
gurjeet@singh.im
In reply to: Gurjeet Singh (#7)
Re: replication slot restart_lsn initialization

On Wed, Jun 10, 2015 at 9:10 AM, Gurjeet Singh <gurjeet@singh.im> wrote:

I am in the process of writing up a doc patch, and will submit that as
well in a short while.

Please find attached the patch with the doc update.

Best regards,
--
Gurjeet Singh http://gurjeet.singh.im/

Attachments:

physical_repl_slot_activate_restart_lsn.v3.patchapplication/octet-stream; name=physical_repl_slot_activate_restart_lsn.v3.patchDownload+106-58
#9Andres Freund
andres@anarazel.de
In reply to: Gurjeet Singh (#8)
Re: replication slot restart_lsn initialization

On 2015-06-10 13:13:41 -0700, Gurjeet Singh wrote:

/*
+ * Grab and save an LSN value to prevent WAL recycling past that point.
+ */
+void
+ReplicationSlotRegisterRestartLSN()
+{
+	ReplicationSlot *slot = MyReplicationSlot;
+
+	Assert(slot != NULL);
+	Assert(slot->data.restart_lsn == InvalidXLogRecPtr);
+
+	/*
+	 * The replication slot mechanism is used to prevent removal of required
+	 * WAL. As there is no interlock between this and checkpoints required, WAL
+	 * segment could be removed before ReplicationSlotsComputeRequiredLSN() has
+	 * been called to prevent that. In the very unlikely case that this happens
+	 * we'll just retry.
+	 */
+	while (true)
+	{
+		XLogSegNo	segno;
+
+		/*
+		 * Let's start with enough information if we can, so log a standby
+		 * snapshot and start decoding at exactly that position.
+		 */
+		if (!RecoveryInProgress())
+		{
+			XLogRecPtr	flushptr;
+
+			/* start at current insert position */
+			slot->data.restart_lsn = GetXLogInsertRecPtr();
+
+			/*
+			 * Log an xid snapshot for logical replication. It's not needed for
+			 * physical slots as it is done in BGWriter on a regular basis.
+			 */
+			if (!slot->data.persistency == RS_PERSISTENT)
+			{
+				/* make sure we have enough information to start */
+				flushptr = LogStandbySnapshot();
+
+				/* and make sure it's fsynced to disk */
+				XLogFlush(flushptr);
+			}

Huh? The slot->data.persistency == RS_PERSISTENT check seems pretty much
entirely random to me. I mean physical slots can (and normally are)
persistent as well? Instead check whether it's a database specifics lot.

The reasoning why it's not helpful for physical slots is wrong. The
point is that a standby snapshot at that location will simply not help
physical replication, it's only going to read ones after the location at
which the base backup starts (i.e. the location from the backup label).

/* ----
- * Manipulation of ondisk state of replication slots
+ * Manipulation of on-disk state of replication slots
*
* NB: none of the routines below should take any notice whether a slot is the
* current one or not, that's all handled a layer above.
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 9a2793f..bd526fa 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -40,6 +40,7 @@ Datum
pg_create_physical_replication_slot(PG_FUNCTION_ARGS)
{
Name		name = PG_GETARG_NAME(0);
+	bool 		activate = PG_GETARG_BOOL(1);

Don't like 'activate' much as a name. How about 'immediately_reserve'?

Other than that it's looking good to me.

Regards,

Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Gurjeet Singh
gurjeet@singh.im
In reply to: Andres Freund (#9)
Re: replication slot restart_lsn initialization

On Tue, Jul 7, 2015 at 4:59 AM, Andres Freund <andres@anarazel.de> wrote:

On 2015-06-10 13:13:41 -0700, Gurjeet Singh wrote:

+                     /*
+                      * Log an xid snapshot for logical replication.

It's not needed for

+ * physical slots as it is done in BGWriter on a

regular basis.

+                      */
+                     if (!slot->data.persistency == RS_PERSISTENT)
+                     {
+                             /* make sure we have enough information to

start */

+                             flushptr = LogStandbySnapshot();
+
+                             /* and make sure it's fsynced to disk */
+                             XLogFlush(flushptr);
+                     }

Huh? The slot->data.persistency == RS_PERSISTENT check seems pretty much
entirely random to me.

There seems to be a misplaced not operator ! in that if statement, as
well. That sucks :( The MacOS gcc binary is actually clang, and its output
is too noisy [1]I am particularly annoyed by these:, which is probably why I might have missed warning if any.

[1]: I am particularly annoyed by these:

note: remove extraneous parentheses around the comparison to silence this
warning
note: use '=' to turn this equality comparison into an assignment

Eg. : if (((opaque)->btpo_next == 0))

I'll see what I can do about these.

I mean physical slots can (and normally are)
persistent as well? Instead check whether it's a database specifics lot.

Agreed, the slot being database-specific is the right indicator.

The reasoning why it's not helpful for physical slots is wrong. The
point is that a standby snapshot at that location will simply not help
physical replication, it's only going to read ones after the location at
which the base backup starts (i.e. the location from the backup label).

pg_create_physical_replication_slot(PG_FUNCTION_ARGS)
{
Name name = PG_GETARG_NAME(0);
+ bool activate = PG_GETARG_BOOL(1);

Don't like 'activate' much as a name. How about 'immediately_reserve'?

I still like 'activate, but okay. How about 'reserve_immediately' instead?

Also, do you want this name change just in the C code, or in the pg_proc
and docs as well?

Other than that it's looking good to me.

Will send a new patch after your feedback on the 'activate' renaming.

Best regards,
--
Gurjeet Singh http://gurjeet.singh.im/

#11Andres Freund
andres@anarazel.de
In reply to: Gurjeet Singh (#10)
Re: replication slot restart_lsn initialization

On 2015-07-07 06:41:55 -0700, Gurjeet Singh wrote:

There seems to be a misplaced not operator ! in that if statement, as
well. That sucks :( The MacOS gcc binary is actually clang, and its output
is too noisy [1], which is probably why I might have missed warning if any.

Which version of clang is it? With newer ones you can individually
disable nearly all of the warnings. I regularly use clang, and most of
the time it compiles master without warnings.

pg_create_physical_replication_slot(PG_FUNCTION_ARGS)
{
Name name = PG_GETARG_NAME(0);
+ bool activate = PG_GETARG_BOOL(1);

Don't like 'activate' much as a name. How about 'immediately_reserve'?

I still like 'activate, but okay. How about 'reserve_immediately' instead?

Activate is just utterly wrong. A slot isn't inactive before. And
'active' already is used for slots that are currently in use
(i.e. connected to).

Also, do you want this name change just in the C code, or in the pg_proc
and docs as well?

All.

- Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Gurjeet Singh
gurjeet@singh.im
In reply to: Andres Freund (#11)
Re: replication slot restart_lsn initialization

On Tue, Jul 7, 2015 at 6:49 AM, Andres Freund <andres@anarazel.de> wrote:

On 2015-07-07 06:41:55 -0700, Gurjeet Singh wrote:

There seems to be a misplaced not operator ! in that if statement, as
well. That sucks :( The MacOS gcc binary is actually clang, and its

output

is too noisy [1], which is probably why I might have missed warning if

any.

Which version of clang is it? With newer ones you can individually
disable nearly all of the warnings. I regularly use clang, and most of
the time it compiles master without warnings.

$ gcc --version

Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr
--with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 6.1.0 (clang-602.0.53) (based on LLVM 3.6.0svn)
Target: x86_64-apple-darwin14.4.0
Thread model: posix

I see that -Wno-parentheses-equality might help, but I am not looking
forward to maintaining OS specific flags for clang-that-pretends-to-be-gcc
in my shell scripts [2]https://github.com/gurjeet/pgd

[2]: https://github.com/gurjeet/pgd

Attached is a patch that introduces SlotIsPhyscial/SlotIsLogical macros.
This patch, if accepted, goes on top of the v4 patch.

pg_create_physical_replication_slot(PG_FUNCTION_ARGS)
{
Name name = PG_GETARG_NAME(0);
+ bool activate = PG_GETARG_BOOL(1);

Don't like 'activate' much as a name. How about 'immediately_reserve'?

I still like 'activate, but okay. How about 'reserve_immediately'

instead?

Activate is just utterly wrong. A slot isn't inactive before. And
'active' already is used for slots that are currently in use
(i.e. connected to).

Also, do you want this name change just in the C code, or in the pg_proc
and docs as well?

All.

Patch v4 attached.

On a side note, I see that the pg_create_*_replication_slot() functions do
not behave transactionally; that is, rolling back a transaction does not
undo the slot creation. Should we prevent these, and corresponding drop
functions, from being called inside an explicit transaction?
PreventTransactionChain() is geared towards serving just the utility
commands. Do we protect against calling such functions in a transaction
block, or from user functions? How?

Best regards,
--
Gurjeet Singh http://gurjeet.singh.im/

Attachments:

physical_repl_slot_activate_restart_lsn.v4.patchapplication/octet-stream; name=physical_repl_slot_activate_restart_lsn.v4.patchDownload+108-58
slot_is_physical_or_logical_macros.patchapplication/octet-stream; name=slot_is_physical_or_logical_macros.patchDownload+12-8
#13Andres Freund
andres@anarazel.de
In reply to: Gurjeet Singh (#12)
Re: replication slot restart_lsn initialization

On 2015-07-07 09:42:54 -0700, Gurjeet Singh wrote:

On a side note, I see that the pg_create_*_replication_slot() functions do
not behave transactionally; that is, rolling back a transaction does not
undo the slot creation.

It can't, because otherwise you couldn't run them on a standby.

Should we prevent these, and corresponding drop functions, from being
called inside an explicit transaction? PreventTransactionChain() is
geared towards serving just the utility commands. Do we protect
against calling such functions in a transaction block, or from user
functions? How?

We discussed that when slots where introduced. There seems little
benefit in doing so, and it'll make some legitimate use cases harder.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14Andres Freund
andres@anarazel.de
In reply to: Gurjeet Singh (#12)
Re: replication slot restart_lsn initialization

On 2015-07-07 09:42:54 -0700, Gurjeet Singh wrote:

--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -40,10 +40,10 @@
#include <sys/stat.h>

#include "access/transam.h"
+#include "access/xlog_internal.h"
#include "common/string.h"
#include "miscadmin.h"
#include "replication/slot.h"
-#include "storage/fd.h"
#include "storage/proc.h"
#include "storage/procarray.h"

Why did you remove fd.h? The file definitely uses infrastructure from
there. We're not terribly consistent about that but I'd rather not rely
on it being included via xlog_internal.h -> xlog.h.

/*
+ * Grab and save an LSN value to prevent WAL recycling past that point.
+ */
+void
+ReplicationSlotRegisterRestartLSN()
+{

Didn't like that description and function name very much. What does
'grabbing' mean here? Should probably mention that it works on the
currently active slot and modifies it.

It's now ReplicationSlotReserveWal()

+	ReplicationSlot *slot = MyReplicationSlot;
+
+	Assert(slot != NULL);
+	Assert(slot->data.restart_lsn == InvalidXLogRecPtr);
+
+	/*
+	 * The replication slot mechanism is used to prevent removal of required
+	 * WAL. As there is no interlock between this and checkpoints required, WAL
+	 * segment could be removed before ReplicationSlotsComputeRequiredLSN() has
+	 * been called to prevent that. In the very unlikely case that this happens
+	 * we'll just retry.
+	 */

You removed some punctuation in that sentence converting a sentence in
bad english into one without the original meaning ;). See the attached
for a new version.

+	while (true)
+	{
+		XLogSegNo	segno;
+
+		/*
+		 * Let's start with enough information if we can, so log a standby
+		 * snapshot and start logical decoding at exactly that position.
+		 */
+		if (!RecoveryInProgress())
+		{
+			XLogRecPtr	flushptr;
+
+			/* start at current insert position */
+			slot->data.restart_lsn = GetXLogInsertRecPtr();
+
+			/*
+			 * Log an xid snapshot for logical replication. This snapshot is not
+			 * needed for physical replication, as it relies on the snapshot
+			 * created by checkpoint when the base backup starts.
+			 */
+			if (slot->data.database != InvalidOid)
+			{
+				/* make sure we have enough information to start */
+				flushptr = LogStandbySnapshot();
+
+				/* and make sure it's fsynced to disk */
+				XLogFlush(flushptr);
+			}
+		}
+		else
+			slot->data.restart_lsn = GetRedoRecPtr();
+
+		/* prevent WAL removal as fast as possible */
+		ReplicationSlotsComputeRequiredLSN();
+
+		/*
+		 * If all required WAL is still there, great, otherwise retry. The
+		 * slot should prevent further removal of WAL, unless there's a
+		 * concurrent ReplicationSlotsComputeRequiredLSN() after we've written
+		 * the new restart_lsn above, so normally we should never need to loop
+		 * more than twice.
+		 */
+		XLByteToSeg(slot->data.restart_lsn, segno);
+		if (XLogGetLastRemovedSegno() < segno)
+			break;
+	}
+}

The way you added the check for logical vs. physical slots in there
looks wrong to me. For a physical slot created !InRecovy we'll possibly
return a xlog position from the future (it's the insert position *and*
not flushed to disk), which then cannot be received.

+/*
* Flush all replication slots to disk.
*
* This needn't actually be part of a checkpoint, but it's a convenient
@@ -876,7 +942,7 @@ StartupReplicationSlots(void)
}

/* ----
- * Manipulation of ondisk state of replication slots
+ * Manipulation of on-disk state of replication slots
*
* NB: none of the routines below should take any notice whether a slot is the
* current one or not, that's all handled a layer above.
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 9a2793f..01b376a 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -40,6 +40,7 @@ Datum
pg_create_physical_replication_slot(PG_FUNCTION_ARGS)
{
Name		name = PG_GETARG_NAME(0);
+	bool 		immediately_reserve = PG_GETARG_BOOL(1);
Datum		values[2];
bool		nulls[2];
TupleDesc	tupdesc;
@@ -58,10 +59,28 @@ pg_create_physical_replication_slot(PG_FUNCTION_ARGS)
/* acquire replication slot, this will check for conflicting names */
ReplicationSlotCreate(NameStr(*name), false, RS_PERSISTENT);
-	values[0] = NameGetDatum(&MyReplicationSlot->data.name);
+	if (immediately_reserve)
+	{
+		/* Allocate restart-LSN, if the user asked for it */
+		ReplicationSlotRegisterRestartLSN();
+
+		/* Write this slot to disk */
+		ReplicationSlotMarkDirty();
+		ReplicationSlotSave();
-	nulls[0] = false;
-	nulls[1] = true;
+		values[0] = NameGetDatum(&MyReplicationSlot->data.name);
+		values[1] = LSNGetDatum(MyReplicationSlot->data.restart_lsn);
+
+		nulls[0] = false;
+		nulls[1] = false;
+	}
+	else
+	{
+		values[0] = NameGetDatum(&MyReplicationSlot->data.name);
+
+		nulls[0] = false;
+		nulls[1] = true;
+	}

I moved
values[0] = NameGetDatum(&MyReplicationSlot->data.name);
nulls[0] = false;
to outside the conditional block, there seems no reason to have it in
there?

I also removed a bunch of unrelated minor cleanups that I plan to commit
& backpatch separately.

What do you think?

Andres

Attachments:

0001-Introduce-macros-determining-if-a-replication-slot-i.patchtext/x-patch; charset=us-asciiDownload+10-8
0002-Allow-pg_create_physical_replication_slot-to-reserve.patchtext/x-patch; charset=us-asciiDownload+107-53
#15Gurjeet Singh
gurjeet@singh.im
In reply to: Andres Freund (#14)
Re: replication slot restart_lsn initialization

On Mon, Aug 10, 2015 at 7:12 AM, Andres Freund <andres@anarazel.de> wrote:

On 2015-07-07 09:42:54 -0700, Gurjeet Singh wrote:

/*
+ * Grab and save an LSN value to prevent WAL recycling past that point.
+ */
+void
+ReplicationSlotRegisterRestartLSN()
+{

Didn't like that description and function name very much. What does
'grabbing' mean here? Should probably mention that it works on the
currently active slot and modifies it.

In your version, I don't see a comment that refers to the fact that it
works on the currently active (global variable) slot.

It's now ReplicationSlotReserveWal()

Okay.

+     ReplicationSlot *slot = MyReplicationSlot;
+
+     Assert(slot != NULL);
+     Assert(slot->data.restart_lsn == InvalidXLogRecPtr);
+
+     /*
+      * The replication slot mechanism is used to prevent removal of

required

+ * WAL. As there is no interlock between this and checkpoints

required, WAL

+ * segment could be removed before

ReplicationSlotsComputeRequiredLSN() has

+ * been called to prevent that. In the very unlikely case that

this happens

+ * we'll just retry.
+ */

You removed some punctuation in that sentence converting a sentence in
bad english into one without the original meaning ;). See the attached
for a new version.

Your version looks better.

+/*
* Flush all replication slots to disk.
*
* This needn't actually be part of a checkpoint, but it's a convenient
@@ -876,7 +942,7 @@ StartupReplicationSlots(void)
}

/* ----
- * Manipulation of ondisk state of replication slots
+ * Manipulation of on-disk state of replication slots
*
* NB: none of the routines below should take any notice whether a slot

is the

* current one or not, that's all handled a layer above.
diff --git a/src/backend/replication/slotfuncs.c

b/src/backend/replication/slotfuncs.c

index 9a2793f..01b376a 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -40,6 +40,7 @@ Datum
pg_create_physical_replication_slot(PG_FUNCTION_ARGS)
{
Name            name = PG_GETARG_NAME(0);
+     bool            immediately_reserve = PG_GETARG_BOOL(1);
Datum           values[2];
bool            nulls[2];
TupleDesc       tupdesc;
@@ -58,10 +59,28 @@ pg_create_physical_replication_slot(PG_FUNCTION_ARGS)
/* acquire replication slot, this will check for conflicting names

*/

ReplicationSlotCreate(NameStr(*name), false, RS_PERSISTENT);

-     values[0] = NameGetDatum(&MyReplicationSlot->data.name);
+     if (immediately_reserve)
+     {
+             /* Allocate restart-LSN, if the user asked for it */
+             ReplicationSlotRegisterRestartLSN();
+
+             /* Write this slot to disk */
+             ReplicationSlotMarkDirty();
+             ReplicationSlotSave();
-     nulls[0] = false;
-     nulls[1] = true;
+             values[0] = NameGetDatum(&MyReplicationSlot->data.name);
+             values[1] =

LSNGetDatum(MyReplicationSlot->data.restart_lsn);

+
+             nulls[0] = false;
+             nulls[1] = false;
+     }
+     else
+     {
+             values[0] = NameGetDatum(&MyReplicationSlot->data.name);
+
+             nulls[0] = false;
+             nulls[1] = true;
+     }

I moved
values[0] = NameGetDatum(&MyReplicationSlot->data.name);
nulls[0] = false;
to outside the conditional block, there seems no reason to have it in
there?

The assignment to values[0] is being done twice. We can do away with the
one in the else part of the if condition.

Also, there was a typo in my patch [1]I altered you to it in a personal email, but probably it fell through the cracks. -- Gurjeet Singh http://gurjeet.singh.im/ and it seems to have made it into
the commit: <acronym<LSN</>. Tom seems to have just fixed it in
commit 750fc78b.

Best regards,

[1]: I altered you to it in a personal email, but probably it fell through the cracks. -- Gurjeet Singh http://gurjeet.singh.im/
the cracks.
--
Gurjeet Singh http://gurjeet.singh.im/

#16Andres Freund
andres@anarazel.de
In reply to: Gurjeet Singh (#15)
Re: replication slot restart_lsn initialization

On 2015-08-11 15:59:59 -0700, Gurjeet Singh wrote:

In your version, I don't see a comment that refers to the fact that it
works on the currently active (global variable) slot.

Hm?

/*
* Reserve WAL for the currently active slot.
*
* Compute and set restart_lsn in a manner that's appropriate for the type of
* the slot and concurrency safe.
*/

I moved
values[0] = NameGetDatum(&MyReplicationSlot->data.name);
nulls[0] = false;
to outside the conditional block, there seems no reason to have it in
there?

The assignment to values[0] is being done twice. We can do away with the
one in the else part of the if condition.

Ugh, that's a mistake.

[1]: I altered you to it in a personal email, but probably it fell through
the cracks.

I usually just check the lists for newer patch versions, sorry...

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17Michael Paquier
michael@paquier.xyz
In reply to: Andres Freund (#16)
Re: replication slot restart_lsn initialization

On Wed, Aug 12, 2015 at 8:20 AM, Andres Freund <andres@anarazel.de> wrote:

On 2015-08-11 15:59:59 -0700, Gurjeet Singh wrote:

In your version, I don't see a comment that refers to the fact that it
works on the currently active (global variable) slot.

Hm?

/*
* Reserve WAL for the currently active slot.
*
* Compute and set restart_lsn in a manner that's appropriate for the type of
* the slot and concurrency safe.
*/

I moved
values[0] = NameGetDatum(&MyReplicationSlot->data.name);
nulls[0] = false;
to outside the conditional block, there seems no reason to have it in
there?

The assignment to values[0] is being done twice. We can do away with the
one in the else part of the if condition.

Ugh, that's a mistake.

[1]: I altered you to it in a personal email, but probably it fell through
the cracks.

I usually just check the lists for newer patch versions, sorry...

Commit 6fcd8851, which is the result of this thread, is not touching
the replication protocol at all. This looks like an oversight to me:
we should be a maximum consistent between the SQL interface and the
replication protocol if possible, and it looks useful to me to be able
to set restart_lsn when creating the slot as well when using a
replication connection. Most of the work has visibly been done with
the refactoring that created ReplicationSlotReserveWal, hence how
would we expose this option in CREATE_REPLICATION_SLOT?

For now we can do that:
CREATE_REPLICATION_SLOT IDENT K_PHYSICAL
We could append a keyword like RESERVE on this query. Or go through
more fancy things like (slot_options) where slot_options is a list of
option items, reserve = on/off.
Thoughts?
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18Andres Freund
andres@anarazel.de
In reply to: Michael Paquier (#17)
Re: replication slot restart_lsn initialization

On 2015-08-14 16:44:44 +0900, Michael Paquier wrote:

Commit 6fcd8851, which is the result of this thread, is not touching
the replication protocol at all. This looks like an oversight to me:
we should be a maximum consistent between the SQL interface and the
replication protocol if possible, and it looks useful to me to be able
to set restart_lsn when creating the slot as well when using a
replication connection.

It wasn't, at least not from my side. You can relatively easily do
nearly the same just by connecting to the slot and sending a feedback
message. The complaint upthread (and/or a related thread) was that it's
not possible to do the same from SQL.

It'd be a good additional to offer the same facility to the replication
protocol.

For now we can do that: CREATE_REPLICATION_SLOT IDENT K_PHYSICAL We
could append a keyword like RESERVE on this query. Or go through more
fancy things like (slot_options) where slot_options is a list of
option items, reserve = on/off. Thoughts? -- Michael

I'd name it RESERVE_WAL. My feeling is that the options for the logical
case are geared towards the output plugin, not the walsender. I think
it'd be confusing to use (slot_options) differently for physical slots.

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19Shulgin, Oleksandr
oleksandr.shulgin@zalando.de
In reply to: Andres Freund (#18)
Re: replication slot restart_lsn initialization

On Fri, Aug 14, 2015 at 9:54 AM, Andres Freund <andres@anarazel.de> wrote:

On 2015-08-14 16:44:44 +0900, Michael Paquier wrote:

Commit 6fcd8851, which is the result of this thread, is not touching
the replication protocol at all. This looks like an oversight to me:
we should be a maximum consistent between the SQL interface and the
replication protocol if possible, and it looks useful to me to be able
to set restart_lsn when creating the slot as well when using a
replication connection.

It wasn't, at least not from my side. You can relatively easily do
nearly the same just by connecting to the slot and sending a feedback
message. The complaint upthread (and/or a related thread) was that it's
not possible to do the same from SQL.

It'd be a good additional to offer the same facility to the replication
protocol.

For now we can do that: CREATE_REPLICATION_SLOT IDENT K_PHYSICAL We
could append a keyword like RESERVE on this query. Or go through more
fancy things like (slot_options) where slot_options is a list of
option items, reserve = on/off. Thoughts? -- Michael

I'd name it RESERVE_WAL. My feeling is that the options for the logical
case are geared towards the output plugin, not the walsender. I think
it'd be confusing to use (slot_options) differently for physical slots.

Yes, but the options list you pass to START_REPLICATION ... LOGICAL, not to
CREATE_REPLICATION_SLOT.

2c
--
Alex

#20Andres Freund
andres@anarazel.de
In reply to: Shulgin, Oleksandr (#19)
Re: replication slot restart_lsn initialization

On 2015-08-14 11:09:38 +0200, Shulgin, Oleksandr wrote:

Yes, but the options list you pass to START_REPLICATION ... LOGICAL, not to
CREATE_REPLICATION_SLOT.

I know, but we might want to extend that at some point.

- Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#21Michael Paquier
michael@paquier.xyz
In reply to: Andres Freund (#18)
#22Peter Eisentraut
peter_e@gmx.net
In reply to: Andres Freund (#18)
#23Andres Freund
andres@anarazel.de
In reply to: Peter Eisentraut (#22)
#24Michael Paquier
michael@paquier.xyz
In reply to: Andres Freund (#23)
#25Andres Freund
andres@anarazel.de
In reply to: Michael Paquier (#21)
#26Michael Paquier
michael@paquier.xyz
In reply to: Andres Freund (#25)