Function to move the position of a replication slot

Started by Magnus Haganderalmost 9 years ago33 messageshackers
Jump to latest
#1Magnus Hagander
magnus@hagander.net

PFA a patch that adds a new function, pg_move_replication_slot, that makes
it possible to move the location of a replication slot without actually
consuming all the WAL on it.

This can be useful for example to keep replication slots in sync between
different servers in a replication cluster.

(Obviously this is intended for 11, as we're well into the freeze for 10.
Just to be clear. so I'll go add itto the summer commitfest)

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/&gt;
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/&gt;

Attachments:

pg_move_replication_slot.patchtext/x-patch; charset=US-ASCII; name=pg_move_replication_slot.patchDownload+80-0
#2Craig Ringer
craig@2ndquadrant.com
In reply to: Magnus Hagander (#1)
Re: Function to move the position of a replication slot

On 4 May 2017 at 20:05, Magnus Hagander <magnus@hagander.net> wrote:

PFA a patch that adds a new function, pg_move_replication_slot, that makes
it possible to move the location of a replication slot without actually
consuming all the WAL on it.

This can be useful for example to keep replication slots in sync between
different servers in a replication cluster.

It needs a test to ensure it only operates on physical slots. It
should ERROR on a logical slot, since it has no way of correctly
advancing the catalog_xmin or finding a reasonable restart_lsn for
logical decoding.

I'm still fine with the name, since I plan to add that capability in
pg11 by running through logical decoding and ReorderBufferSkip()ing
each xact until we reach the target lsn.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Magnus Hagander
magnus@hagander.net
In reply to: Craig Ringer (#2)
Re: Function to move the position of a replication slot

On Thu, May 4, 2017 at 2:42 PM, Craig Ringer <craig@2ndquadrant.com> wrote:

On 4 May 2017 at 20:05, Magnus Hagander <magnus@hagander.net> wrote:

PFA a patch that adds a new function, pg_move_replication_slot, that

makes

it possible to move the location of a replication slot without actually
consuming all the WAL on it.

This can be useful for example to keep replication slots in sync between
different servers in a replication cluster.

It needs a test to ensure it only operates on physical slots. It
should ERROR on a logical slot, since it has no way of correctly
advancing the catalog_xmin or finding a reasonable restart_lsn for
logical decoding.

I guess that makes sense, yeah. I didn't try it with that :)

I'm still fine with the name, since I plan to add that capability in
pg11 by running through logical decoding and ReorderBufferSkip()ing
each xact until we reach the target lsn.

Cool.

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/&gt;
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/&gt;

#4Craig Ringer
craig@2ndquadrant.com
In reply to: Magnus Hagander (#3)
Re: Function to move the position of a replication slot

On 4 May 2017 at 20:45, Magnus Hagander <magnus@hagander.net> wrote:

On Thu, May 4, 2017 at 2:42 PM, Craig Ringer <craig@2ndquadrant.com> wrote:

On 4 May 2017 at 20:05, Magnus Hagander <magnus@hagander.net> wrote:

PFA a patch that adds a new function, pg_move_replication_slot, that
makes
it possible to move the location of a replication slot without actually
consuming all the WAL on it.

This can be useful for example to keep replication slots in sync between
different servers in a replication cluster.

It needs a test to ensure it only operates on physical slots. It
should ERROR on a logical slot, since it has no way of correctly
advancing the catalog_xmin or finding a reasonable restart_lsn for
logical decoding.

I guess that makes sense, yeah. I didn't try it with that :)

Barring that and matching docs changes, looks fine to me at first glance.

Not sure you need to acquire the spinlock on the slot, since you
acquired the slot for your backend. It won't hurt, but I don't think
it's necessary.

I'm not convinced you need a WARNING for moving the slot backwards. If
one is emitted, it should be a proper ereport with current position
and requested position in errdetail. I'm not sure it's a useful
message though.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Craig Ringer (#4)
Re: Function to move the position of a replication slot

On 4 May 2017 at 20:05, Magnus Hagander <magnus@hagander.net> wrote:

PFA a patch that adds a new function, pg_move_replication_slot, that

makes it

possible to move the location of a replication slot without actually
consuming all the WAL on it.

Just a few questions just a few questions out of curiosity:

* does it make sense to create a few tests for this function in
`contrib/test_decoding` (as shown in attachment)?

* what should happen if the second argument is `NULL`? There is a
verification
`XLogRecPtrIsInvalid(moveto)`, but it's possible to pass `NULL`, and looks
like it leads to result different from boolean:

```
SELECT pg_move_replication_slot('regression_slot_5', NULL);
pg_move_replication_slot
--------------------------

(1 row)
```

Attachments:

pg_move_replication_slot_with_tests.patchtext/x-patch; charset=US-ASCII; name=pg_move_replication_slot_with_tests.patchDownload+125-0
#6Petr Jelinek
petr@2ndquadrant.com
In reply to: Dmitry Dolgov (#5)
Re: Function to move the position of a replication slot

On 10/05/17 22:17, Dmitry Dolgov wrote:

On 4 May 2017 at 20:05, Magnus Hagander <magnus@hagander.net

<mailto:magnus@hagander.net>> wrote:

PFA a patch that adds a new function, pg_move_replication_slot, that

makes it

possible to move the location of a replication slot without actually
consuming all the WAL on it.

Just a few questions just a few questions out of curiosity:

* does it make sense to create a few tests for this function in
`contrib/test_decoding` (as shown in attachment)?

As Craig said this will not work correctly for logical slots so it
should throw error on those at the moment (at least until we write
something that works, but that's far more complex than this patch).

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Peter Eisentraut
peter_e@gmx.net
In reply to: Magnus Hagander (#1)
Re: Function to move the position of a replication slot

On 5/4/17 08:05, Magnus Hagander wrote:

PFA a patch that adds a new function, pg_move_replication_slot, that
makes it possible to move the location of a replication slot without
actually consuming all the WAL on it.

The name keeps confusing me. I understand "move" to be "rename" or
possibly "move it elsewhere", but not "move around in". I understand
that there is an analogy with FETCH/MOVE in cursors, but it's still
confusing.

I would also like to see some documentation for a use case of this.

Anyway, as discussed elsewhere in this thread, this patch needs several
changes.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Andres Freund
andres@anarazel.de
In reply to: Peter Eisentraut (#7)
Re: Function to move the position of a replication slot

On 2017-08-16 12:24:11 -0400, Peter Eisentraut wrote:

On 5/4/17 08:05, Magnus Hagander wrote:

PFA a patch that adds a new function, pg_move_replication_slot, that
makes it possible to move the location of a replication slot without
actually consuming all the WAL on it.

The name keeps confusing me. I understand "move" to be "rename" or
possibly "move it elsewhere", but not "move around in". I understand
that there is an analogy with FETCH/MOVE in cursors, but it's still
confusing.

pg_forward_replication_slot()?

- Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#8)
Re: Function to move the position of a replication slot

Andres Freund <andres@anarazel.de> writes:

On 2017-08-16 12:24:11 -0400, Peter Eisentraut wrote:

On 5/4/17 08:05, Magnus Hagander wrote:

PFA a patch that adds a new function, pg_move_replication_slot, that
makes it possible to move the location of a replication slot without
actually consuming all the WAL on it.

The name keeps confusing me. I understand "move" to be "rename" or
possibly "move it elsewhere", but not "move around in". I understand
that there is an analogy with FETCH/MOVE in cursors, but it's still
confusing.

pg_forward_replication_slot()?

If I understand what this is meant to do, maybe better
pg_move_replication_slot_lsn() or pg_change_replication_slot_lsn() ?
The point being that you're adjusting the LSN pointer contained
in the slot, which is distinct from the slot itself.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#9)
Re: Function to move the position of a replication slot

On 2017-08-16 17:06:42 -0400, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

On 2017-08-16 12:24:11 -0400, Peter Eisentraut wrote:

On 5/4/17 08:05, Magnus Hagander wrote:

PFA a patch that adds a new function, pg_move_replication_slot, that
makes it possible to move the location of a replication slot without
actually consuming all the WAL on it.

The name keeps confusing me. I understand "move" to be "rename" or
possibly "move it elsewhere", but not "move around in". I understand
that there is an analogy with FETCH/MOVE in cursors, but it's still
confusing.

pg_forward_replication_slot()?

If I understand what this is meant to do, maybe better
pg_move_replication_slot_lsn() or pg_change_replication_slot_lsn() ?
The point being that you're adjusting the LSN pointer contained
in the slot, which is distinct from the slot itself.

I think we should constrain the API to only allow later LSNs than
currently in the slot, rather than arbitrary ones. That's why I was
thinking of "forward". I'm not convinced it's a good / safe idea to
allow arbitrary values to be set.

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#10)
Re: Function to move the position of a replication slot

Andres Freund <andres@anarazel.de> writes:

On 2017-08-16 17:06:42 -0400, Tom Lane wrote:

If I understand what this is meant to do, maybe better
pg_move_replication_slot_lsn() or pg_change_replication_slot_lsn() ?
The point being that you're adjusting the LSN pointer contained
in the slot, which is distinct from the slot itself.

I think we should constrain the API to only allow later LSNs than
currently in the slot, rather than arbitrary ones. That's why I was
thinking of "forward". I'm not convinced it's a good / safe idea to
allow arbitrary values to be set.

+1 for constraining it like that, but I don't think that's an argument
against using "move" or "change" as the verb. I don't like "forward"
because that's not the right word. The only verb senses of "forward"
in my Mac's dictionary are "send a message on to a further destination"
and "help to advance or promote" (the latter usage is pretty obscure IMO).
Neither one seems applicable here.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Andres Freund (#10)
Re: Function to move the position of a replication slot

Andres Freund wrote:

On 2017-08-16 17:06:42 -0400, Tom Lane wrote:

If I understand what this is meant to do, maybe better
pg_move_replication_slot_lsn() or pg_change_replication_slot_lsn() ?
The point being that you're adjusting the LSN pointer contained
in the slot, which is distinct from the slot itself.

I think we should constrain the API to only allow later LSNs than
currently in the slot, rather than arbitrary ones. That's why I was
thinking of "forward". I'm not convinced it's a good / safe idea to
allow arbitrary values to be set.

Hmm. In terms of safety, it is safe to move the LSN backwards, as long
as the oldest LSN across all slots is not changed -- in other words, the
actual safe limit is the oldest of all slot LSNs, rather than the
current position of the slot being manipulated (which is what you're
saying).

I don't know if this is useful for the use case Magnus described; TBH I
didn't understand that use case.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#12)
Re: Function to move the position of a replication slot

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

Andres Freund wrote:

I think we should constrain the API to only allow later LSNs than
currently in the slot, rather than arbitrary ones. That's why I was
thinking of "forward". I'm not convinced it's a good / safe idea to
allow arbitrary values to be set.

Hmm. In terms of safety, it is safe to move the LSN backwards, as long
as the oldest LSN across all slots is not changed -- in other words, the
actual safe limit is the oldest of all slot LSNs, rather than the
current position of the slot being manipulated (which is what you're
saying).

True, but you'd need to take some kind of global lock to verify that,
whereas "move forward only" would only need to examine/change the one
slot.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#11)
Re: Function to move the position of a replication slot

On 2017-08-16 18:14:45 -0400, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

On 2017-08-16 17:06:42 -0400, Tom Lane wrote:

If I understand what this is meant to do, maybe better
pg_move_replication_slot_lsn() or pg_change_replication_slot_lsn() ?
The point being that you're adjusting the LSN pointer contained
in the slot, which is distinct from the slot itself.

I think we should constrain the API to only allow later LSNs than
currently in the slot, rather than arbitrary ones. That's why I was
thinking of "forward". I'm not convinced it's a good / safe idea to
allow arbitrary values to be set.

+1 for constraining it like that, but I don't think that's an argument
against using "move" or "change" as the verb. I don't like "forward"
because that's not the right word. The only verb senses of "forward"
in my Mac's dictionary are "send a message on to a further destination"
and "help to advance or promote" (the latter usage is pretty obscure IMO).
Neither one seems applicable here.

I might have thinking too much with the German "version" of the word in
mind. However looking it up now I do see "to advance or help onward;
promote" and "to advance or play a cassette, digital recording, slide
projector, etc., in the forward direction: " -
http://www.dictionary.com/browse/forward
which kind of seems to fit. don't know how authoritative which
dictionary is considered to be...

- Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#11)
Re: Function to move the position of a replication slot

On Thu, Aug 17, 2017 at 7:14 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andres Freund <andres@anarazel.de> writes:

On 2017-08-16 17:06:42 -0400, Tom Lane wrote:

If I understand what this is meant to do, maybe better
pg_move_replication_slot_lsn() or pg_change_replication_slot_lsn() ?
The point being that you're adjusting the LSN pointer contained
in the slot, which is distinct from the slot itself.

I think we should constrain the API to only allow later LSNs than
currently in the slot, rather than arbitrary ones. That's why I was
thinking of "forward". I'm not convinced it's a good / safe idea to
allow arbitrary values to be set.

+1 for constraining it like that, but I don't think that's an argument
against using "move" or "change" as the verb. I don't like "forward"
because that's not the right word. The only verb senses of "forward"
in my Mac's dictionary are "send a message on to a further destination"
and "help to advance or promote" (the latter usage is pretty obscure IMO).
Neither one seems applicable here.

Definitely agreed on that. Any move function would need to check if
the WAL position given by caller is already newer than what's
available in the local pg_wal (minimum of all other slots), with a
shared lock that would need to be taken by xlog.c when recycling past
segments. A forward function works on a single entry, which should be
disabled at the moment of the update. It looks dangerous to me to do
such an operation if there is a consumer of a slot currently on it.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16Craig Ringer
craig@2ndquadrant.com
In reply to: Michael Paquier (#15)
Re: Function to move the position of a replication slot

On 17 August 2017 at 07:30, Michael Paquier <michael.paquier@gmail.com>
wrote:

Definitely agreed on that. Any move function would need to check if
the WAL position given by caller is already newer than what's
available in the local pg_wal (minimum of all other slots), with a
shared lock that would need to be taken by xlog.c when recycling past
segments. A forward function works on a single entry, which should be
disabled at the moment of the update. It looks dangerous to me to do
such an operation if there is a consumer of a slot currently on it.

pg_advance_replication_slot(...)

ERROR's on logical slot, for now. Physical slots only.

Forward-only.

Future work to allow it to use the logical decoding infrastructure to
fast-forward a slot by reading only catalog change information and
invalidations, either via a dummy output plugin that filters out all xacts,
or by lower level use of the decoding code.

Reasonable?

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#17Michael Paquier
michael@paquier.xyz
In reply to: Craig Ringer (#16)
Re: Function to move the position of a replication slot

On Thu, Aug 17, 2017 at 9:19 AM, Craig Ringer <craig@2ndquadrant.com> wrote:

pg_advance_replication_slot(...)

ERROR's on logical slot, for now. Physical slots only.

Forward-only.

Future work to allow it to use the logical decoding infrastructure to
fast-forward a slot by reading only catalog change information and
invalidations, either via a dummy output plugin that filters out all xacts,
or by lower level use of the decoding code.

Reasonable?

Yes. I did not imply logical slots in my previous message, sorry if my
words were incomplete.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#10)
Re: Function to move the position of a replication slot

On Wed, Aug 16, 2017 at 5:55 PM, Andres Freund <andres@anarazel.de> wrote:

I think we should constrain the API to only allow later LSNs than
currently in the slot, rather than arbitrary ones. That's why I was
thinking of "forward". I'm not convinced it's a good / safe idea to
allow arbitrary values to be set.

Maybe I shouldn't play the devil's advocate here, but isn't a feature
like this by definition only for people who Know What They Are Doing?
If so, why not let them back the slot up? I'm sure that will work out
just fine. They Know What They Are Doing.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#18)
Re: Function to move the position of a replication slot

On 2017-08-16 21:25:48 -0400, Robert Haas wrote:

On Wed, Aug 16, 2017 at 5:55 PM, Andres Freund <andres@anarazel.de> wrote:

I think we should constrain the API to only allow later LSNs than
currently in the slot, rather than arbitrary ones. That's why I was
thinking of "forward". I'm not convinced it's a good / safe idea to
allow arbitrary values to be set.

Maybe I shouldn't play the devil's advocate here, but isn't a feature
like this by definition only for people who Know What They Are Doing?
If so, why not let them back the slot up? I'm sure that will work out
just fine. They Know What They Are Doing.

I have yet to hear a reason for allowing to move things backward etc. So
I'm not sure what the benefit would be. But more importantly I'd like to
make this available to non-superusers at some point, and there I think
it's more important that they can't do bad things. The reason for
allowing it for non-superusers is that I think it's quite a useful
function to be used by an automated system. E.g. to ensure enough, but
not too much, WAL is available for a tertiary standby both on the actual
primary and a failover node.

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20Craig Ringer
craig@2ndquadrant.com
In reply to: Andres Freund (#19)
Re: Function to move the position of a replication slot

On 17 August 2017 at 09:33, Andres Freund <andres@anarazel.de> wrote:

On 2017-08-16 21:25:48 -0400, Robert Haas wrote:

On Wed, Aug 16, 2017 at 5:55 PM, Andres Freund <andres@anarazel.de>

wrote:

I think we should constrain the API to only allow later LSNs than
currently in the slot, rather than arbitrary ones. That's why I was
thinking of "forward". I'm not convinced it's a good / safe idea to
allow arbitrary values to be set.

Maybe I shouldn't play the devil's advocate here, but isn't a feature
like this by definition only for people who Know What They Are Doing?
If so, why not let them back the slot up? I'm sure that will work out
just fine. They Know What They Are Doing.

I have yet to hear a reason for allowing to move things backward etc. So
I'm not sure what the benefit would be. But more importantly I'd like to
make this available to non-superusers at some point, and there I think
it's more important that they can't do bad things. The reason for
allowing it for non-superusers is that I think it's quite a useful
function to be used by an automated system. E.g. to ensure enough, but
not too much, WAL is available for a tertiary standby both on the actual
primary and a failover node.

I strongly agree.

If you really need to move a physical slot back (why?) you can do it with
an extension that uses the low level APIs. But I can't see why you would
want to.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#21Magnus Hagander
magnus@hagander.net
In reply to: Craig Ringer (#16)
#22Peter Eisentraut
peter_e@gmx.net
In reply to: Magnus Hagander (#21)
#23Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#22)
#24Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#23)
#25Robert Haas
robertmhaas@gmail.com
In reply to: Peter Eisentraut (#22)
#26Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#23)
#27Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#26)
#28Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#27)
#29Magnus Hagander
magnus@hagander.net
In reply to: Robert Haas (#25)
#30Michael Paquier
michael@paquier.xyz
In reply to: Magnus Hagander (#21)
#31Andres Freund
andres@anarazel.de
In reply to: Michael Paquier (#30)
#32Michael Paquier
michael@paquier.xyz
In reply to: Andres Freund (#31)
#33Magnus Hagander
magnus@hagander.net
In reply to: Michael Paquier (#32)