pgsql: walreceiver uses a temporary replication slot by default
walreceiver uses a temporary replication slot by default
If no permanent replication slot is configured using
primary_slot_name, the walreceiver now creates and uses a temporary
replication slot. A new setting wal_receiver_create_temp_slot can be
used to disable this behavior, for example, if the remote instance is
out of replication slots.
Reviewed-by: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
Discussion: /messages/by-id/CA+fd4k4dM0iEPLxyVyme2RAFsn8SUgrNtBJOu81YqTY4V+nqZA@mail.gmail.com
Branch
------
master
Details
-------
https://git.postgresql.org/pg/commitdiff/329730827848f61eb8d353d5addcbd885fa823da
Modified Files
--------------
doc/src/sgml/config.sgml | 20 +++++++++++
.../libpqwalreceiver/libpqwalreceiver.c | 4 +++
src/backend/replication/walreceiver.c | 41 ++++++++++++++++++++++
src/backend/utils/misc/guc.c | 9 +++++
src/backend/utils/misc/postgresql.conf.sample | 1 +
src/include/replication/walreceiver.h | 7 ++++
6 files changed, 82 insertions(+)
Hi Peter,
(Adding Andres and Sergei in CC.)
On Tue, Jan 14, 2020 at 01:57:34PM +0000, Peter Eisentraut wrote:
walreceiver uses a temporary replication slot by default
If no permanent replication slot is configured using
primary_slot_name, the walreceiver now creates and uses a temporary
replication slot. A new setting wal_receiver_create_temp_slot can be
used to disable this behavior, for example, if the remote instance is
out of replication slots.
A recent message from Sergei Kornilov has drawn my attention to
this commit:
/messages/by-id/370331579618998@vla3-6a5326aeb4ee.qloud-c.yandex.net
In the thread about switching primary_conninfo to be reloadable, we
have argued at great length that we should never have the WAL
receiver fetch by itself the GUC parameters used for the connection
with its primary. Here is the main area of the discussion:
/messages/by-id/20190217192720.qphwrraj66rht5lj@alap3.anarazel.de
The previous thread was long enough that this can easily be missed.
However, it seems to me that we may need to revisit a couple of things
for this commit. In short:
- wal_receiver_create_temp_slot should be made PGC_POSTMASTER,
similarly to primary_slot_name and primary_conninfo.
- WalReceiverMain() should not load the parameter from the GUC context
by itself.
- RequestXLogStreaming(), called by the startup process, should be in
charge of defining if a temp slot should be used or not.
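A minimal sketch of the split proposed above, assuming a simplified shared-memory struct: the startup process receives every connection setting as an argument and makes the temporary-slot decision, and the WAL receiver only reads the result. `WalRcvRequestSketch`, `request_streaming_sketch()` and `walreceiver_uses_temp_slot_sketch()` are hypothetical names standing in for the real shared state, `RequestXLogStreaming()` and `WalReceiverMain()`; the buffer sizes are arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the shared-memory area read by the WAL receiver. */
typedef struct WalRcvRequestSketch
{
	char		conninfo[1024];
	char		slotname[64];
	bool		is_temp_slot;
} WalRcvRequestSketch;

/*
 * Runs in the startup process (cf. RequestXLogStreaming()).  All settings
 * arrive as arguments, so they come from one consistent view of the
 * configuration, and the temporary-slot decision is made here.
 */
static void
request_streaming_sketch(WalRcvRequestSketch *req, const char *conninfo,
						 const char *slotname, bool create_temp_slot)
{
	assert(req != NULL);

	/* Forget whatever a previous request left behind. */
	memset(req, 0, sizeof(*req));

	snprintf(req->conninfo, sizeof(req->conninfo), "%s",
			 conninfo != NULL ? conninfo : "");

	if (slotname != NULL && slotname[0] != '\0')
	{
		/* A permanent slot is configured: use it, never a temporary one. */
		snprintf(req->slotname, sizeof(req->slotname), "%s", slotname);
		req->is_temp_slot = false;
	}
	else
	{
		/* No slot configured: the flag alone decides. */
		req->slotname[0] = '\0';
		req->is_temp_slot = create_temp_slot;
	}
}

/*
 * Runs in the WAL receiver (cf. WalReceiverMain()).  It only reads the
 * request and never consults the GUC values itself.
 */
static bool
walreceiver_uses_temp_slot_sketch(const WalRcvRequestSketch *req)
{
	return req->is_temp_slot;
}
```

Because the three values are handed over together, a WAL receiver started later cannot act on a different combination of settings than the one the startup process resolved.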
--
Michael
Hello
In short, the following things:
- wal_receiver_create_temp_slot should be made PGC_POSTMASTER,
similarly to primary_slot_name and primary_conninfo.
- WalReceiverMain() should not load the parameter from the GUC context
by itself.
- RequestXLogStreaming(), called by the startup process, should be in
charge of defining if a temp slot should be used or not.
I would like to cross-post here a patch with these changes that I posted in the "allow online change primary_conninfo" thread.
This thread is more appropriate for discussing wal_receiver_create_temp_slot.
PS: I posted this patch in both threads mostly to make cfbot happy.
regards, Sergei
Attachments:
0001_v7_move_temp_slot_logic_to_startup.patch (text/x-diff, +22/-36)
On Tue, Jan 14, 2020 at 8:57 AM Peter Eisentraut <peter@eisentraut.org> wrote:
walreceiver uses a temporary replication slot by default
If no permanent replication slot is configured using
primary_slot_name, the walreceiver now creates and uses a temporary
replication slot. A new setting wal_receiver_create_temp_slot can be
used to disable this behavior, for example, if the remote instance is
out of replication slots.
Reviewed-by: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
Discussion: /messages/by-id/CA+fd4k4dM0iEPLxyVyme2RAFsn8SUgrNtBJOu81YqTY4V+nqZA@mail.gmail.com
Neither the commit message for this patch nor any of the comments in
the patch seem to explain why this is a desirable change.
I assume that's probably discussed on the thread that is linked here,
but you shouldn't have to dig through the discussion thread to figure
out what the benefits of a change like this are.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2020-01-23 21:49, Robert Haas wrote:
On Tue, Jan 14, 2020 at 8:57 AM Peter Eisentraut <peter@eisentraut.org> wrote:
walreceiver uses a temporary replication slot by default
If no permanent replication slot is configured using
primary_slot_name, the walreceiver now creates and uses a temporary
replication slot. A new setting wal_receiver_create_temp_slot can be
used to disable this behavior, for example, if the remote instance is
out of replication slots.
Reviewed-by: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
Discussion: /messages/by-id/CA+fd4k4dM0iEPLxyVyme2RAFsn8SUgrNtBJOu81YqTY4V+nqZA@mail.gmail.com
Neither the commit message for this patch nor any of the comments in
the patch seem to explain why this is a desirable change.
I assume that's probably discussed on the thread that is linked here,
but you shouldn't have to dig through the discussion thread to figure
out what the benefits of a change like this are.
You are right, this has gotten a bit lost in the big thread.
The rationale is basically the same as why client-side tools like
pg_basebackup use a temporary slot: So that the WAL data that they are
interested in doesn't disappear while they are connected.
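What a slot protects can be modelled roughly like this: when the primary decides which old WAL segments it may recycle, it must keep every segment at or after the oldest position a slot still needs. This is a simplified model under stated assumptions, not PostgreSQL's checkpointer code: LSNs are plain 64-bit byte offsets, the segment size is the 16 MB default, settings such as wal_keep_segments and archiving are ignored, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t LsnSketch;		/* WAL position as a byte offset */

#define SEG_SIZE_SKETCH ((uint64_t) 16 * 1024 * 1024)	/* 16 MB segments */

/* Number of the segment that contains a given WAL position. */
static uint64_t
segno_of(LsnSketch lsn, uint64_t seg_size)
{
	assert(seg_size > 0);
	return lsn / seg_size;
}

/*
 * Oldest segment that must be kept.  In this model, without slots the
 * primary keeps WAL only from the last checkpoint's redo point onward;
 * each slot can pull that boundary back to the segment holding the
 * slot's restart position.
 */
static uint64_t
oldest_kept_segno(LsnSketch redo_lsn, const LsnSketch *slot_restart_lsns,
				  size_t nslots, uint64_t seg_size)
{
	uint64_t	keep = segno_of(redo_lsn, seg_size);

	for (size_t i = 0; i < nslots; i++)
	{
		uint64_t	slot_seg = segno_of(slot_restart_lsns[i], seg_size);

		if (slot_seg < keep)
			keep = slot_seg;
	}
	return keep;
}
```

For example, with the redo point in segment 40 and one slot whose restart position sits in segment 5, the primary keeps 35 extra segments (about 560 MB). That is the guarantee a connected standby relies on, and also the disk usage a lagging standby can impose on the primary.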
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2020-01-22 06:55, Michael Paquier wrote:
In the thread about switching primary_conninfo to be reloadable, we
have argued at great lengths that we should never have the WAL
receiver fetch by itself the GUC parameters used for the connection
with its primary. Here is the main area of the discussion:
/messages/by-id/20190217192720.qphwrraj66rht5lj@alap3.anarazel.de
The way I understood that discussion was that the issue is having both
the startup process and the WAL receiver having possibly inconsistent
knowledge about the current configuration. That doesn't apply in this
case, because the setting is only used by the WAL receiver. Maybe I
misunderstood.
The previous thread was long enough so it can easily be missed.
However, it seems to me that we may need to revisit a couple of things
for this commit? In short, the following things:
- wal_receiver_create_temp_slot should be made PGC_POSTMASTER,
similarly to primary_slot_name and primary_conninfo.
- WalReceiverMain() should not load the parameter from the GUC context
by itself.
- RequestXLogStreaming(), called by the startup process, should be in
charge of defining if a temp slot should be used or not.
That would be a reasonable fix if we think the above is really an issue.
--
Peter Eisentraut http://www.2ndQuadrant.com/
Hi,
On 2020-02-10 16:46:04 +0100, Peter Eisentraut wrote:
On 2020-01-22 06:55, Michael Paquier wrote:
In the thread about switching primary_conninfo to be reloadable, we
have argued at great lengths that we should never have the WAL
receiver fetch by itself the GUC parameters used for the connection
with its primary. Here is the main area of the discussion:
/messages/by-id/20190217192720.qphwrraj66rht5lj@alap3.anarazel.de
The way I understood that discussion was that the issue is having both the
startup process and the WAL receiver having possibly inconsistent knowledge
about the current configuration. That doesn't apply in this case, because
the setting is only used by the WAL receiver. Maybe I misunderstood.
Yes, that was my concern there. I do agree there's much less of an issue
here.
I still architecturally don't find it attractive that the active
configuration between walreceiver and startup process can diverge
though. Imagine if we e.g. added the ability to receive WAL over
multiple connections from one host, or from multiple hosts (e.g. to be
able to get the bulk of the WAL from a cascading node, but also to
provide syncrep acknowledgements directly to the primary), or to allow
for logical replication without needing all WAL locally on a standby
doing decoding. It seems not great if there's potentially diverging
configuration (hot standby feedback, temporary slots, ... ) between
those walreceivers, just depending on when they started. The model that
e.g. parallel workers use, which explicitly ensures that the GUC state is
the same in workers and the leader, is considerably better here, imo.
So I think adding more of these parameters affecting walreceivers
without coordination is not going quite in the right direction.
Greetings,
Andres Freund
Hello,
On Mon, 10 Feb 2020 16:37:53 +0100
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
On 2020-01-23 21:49, Robert Haas wrote:
On Tue, Jan 14, 2020 at 8:57 AM Peter Eisentraut <peter@eisentraut.org>
wrote:
walreceiver uses a temporary replication slot by default
If no permanent replication slot is configured using
primary_slot_name, the walreceiver now creates and uses a temporary
replication slot. A new setting wal_receiver_create_temp_slot can be
used to disable this behavior, for example, if the remote instance is
out of replication slots.
Reviewed-by: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
Discussion:
/messages/by-id/CA+fd4k4dM0iEPLxyVyme2RAFsn8SUgrNtBJOu81YqTY4V+nqZA@mail.gmail.com
Neither the commit message for this patch nor any of the comments in
the patch seem to explain why this is a desirable change.
I assume that's probably discussed on the thread that is linked here,
but you shouldn't have to dig through the discussion thread to figure
out what the benefits of a change like this are.
You are right, this has gotten a bit lost in the big thread.
The rationale is basically the same as why client-side tools like
pg_basebackup use a temporary slot: So that the WAL data that they are
interested in doesn't disappear while they are connected.
In my humble opinion, I prefer the previous behavior, streaming without a
temporary slot, for one reason: primary availability.
If the standby lags far behind the primary (whatever the root cause),
it gets disconnected because of missing WAL. In the worst case, we
must rebuild it, hopefully from backups. In the best case, it fetches WAL
from the PITR backup. When the latter is possible in the stack, I consider slots
a burden from an operability point of view. If standbys cannot fetch
archived WAL from PITR, then we can consider slots.
With a temporary slot created by default, if one standby lags far behind, it can make
the primary unavailable. We have nothing yet to prevent a slot from filling the
pg_wal partition. How would new users creating their first cluster react in such a
situation? I suppose the original discussion was mostly targeting them?
Recovering from this is far scarier than building a standby.
So the default behavior might not be desirable, and maybe
wal_receiver_create_temp_slot should be off by default?
Note that Kyotaro HORIGUCHI is working on a patch to restrict the maximum number of
WAL segments kept by replication slots:
/messages/by-id/20190627162256.4f4872b8@firost
Regards,
On Mon, Feb 10, 2020 at 01:46:04PM -0800, Andres Freund wrote:
I still architecturally don't find it attractive that the active
configuration between walreceiver and startup process can diverge
though. Imagine if we e.g. added the ability to receive WAL over
multiple connections from one host, or from multiple hosts (e.g. to be
able to get the bulk of the WAL from a cascading node, but also to
provide syncrep acknowledgements directly to the primary), or to allow
for logical replication without needing all WAL locally on a standby
doing decoding. It seems not great if there's potentially diverging
configuration (hot standby feedback, temporary slots, ... ) between
those walreceivers, just depending on when they started. Here the model
e.g. parallel workers use, which explicitly ensure that the GUC state is
the same in workers and the leader, is considerably better, imo.
Yes, I still think that we should fix that inconsistency, mark the new
GUC wal_receiver_create_temp_slot as PGC_POSTMASTER, and add a note at
the top of RequestXLogStreaming() and walreceiver.c about the
assumptions we'd prefer to rely on for the GUCs used to start a WAL receiver.
So I think adding more of these parameters affecting walreceivers
without coordination is not going quite in the right direction.
Indeed. Adding more comments would be one way to prevent this
situation from happening again; otherwise I fear that others may forget
this stuff in the future.
--
Michael
On 2020/02/12 7:53, Jehan-Guillaume de Rorthais wrote:
Hello,
On Mon, 10 Feb 2020 16:37:53 +0100
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
On 2020-01-23 21:49, Robert Haas wrote:
On Tue, Jan 14, 2020 at 8:57 AM Peter Eisentraut <peter@eisentraut.org>
wrote:
walreceiver uses a temporary replication slot by default
If no permanent replication slot is configured using
primary_slot_name, the walreceiver now creates and uses a temporary
replication slot. A new setting wal_receiver_create_temp_slot can be
used to disable this behavior, for example, if the remote instance is
out of replication slots.
Reviewed-by: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
Discussion:
/messages/by-id/CA+fd4k4dM0iEPLxyVyme2RAFsn8SUgrNtBJOu81YqTY4V+nqZA@mail.gmail.com
Neither the commit message for this patch nor any of the comments in
the patch seem to explain why this is a desirable change.
I assume that's probably discussed on the thread that is linked here,
but you shouldn't have to dig through the discussion thread to figure
out what the benefits of a change like this are.
You are right, this has gotten a bit lost in the big thread.
The rationale is basically the same as why client-side tools like
pg_basebackup use a temporary slot: So that the WAL data that they are
interested in doesn't disappear while they are connected.
In my humble opinion, I prefer the previous behavior, streaming without
temporary slot, for one reason: primary availability.
+1
Should the standby lag far behind the primary (no matter the root cause),
the standby was disconnected because of missing WAL. Worst case scenario, we
must rebuild it, hopefully from backups. Best case scenario, it fetches WALs
from PITR backup. As soon as the later is possible in the stack, I consider slot
like a burden from the operability point of view. If standbys can not fetch
archived WAL from PITR, then we can consider slots.
With temp slot created by default, if one standby lag far behind, it can make
the primary unavailable. We have nothing yet to forbid a slot to fill the
pg_wal partition. How new users creating their first cluster would react in such
situation? I suppose the original discussion was mostly targeting them?
Recovering from this is way more scary than building a standby.
So the default behavior might not be desirable and maybe
wal_receiver_create_temp_slot might be off by default?
Note that Kyotaro HORIGUCHI is working on a patch to restricting maximum keep
segments by repslots:
Yeah, I think it's better to disable this option until something like
Horiguchi-san's proposal has been committed, i.e., until
the upper limit on the number (or size) of WAL files retained
for slots becomes configurable.
Regards,
--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters
On Wed, Feb 12, 2020 at 06:11:06PM +0900, Fujii Masao wrote:
On 2020/02/12 7:53, Jehan-Guillaume de Rorthais wrote:
In my humble opinion, I prefer the previous behavior, streaming without
temporary slot, for one reason: primary availability.
+1
With temp slot created by default, if one standby lag far behind, it can make
the primary unavailable. We have nothing yet to forbid a slot to fill the
pg_wal partition. How new users creating their first cluster would react in such
situation? I suppose the original discussion was mostly targeting them?
Recovering from this is way more scary than building a standby.
So the default behavior might not be desirable and maybe
wal_receiver_create_temp_slot might be off by default?
Note that Kyotaro HORIGUCHI is working on a patch to restricting maximum keep
segments by repslots:
Yeah, I think it's better to disable this option until something like
Horiguchi-san's proposal will have been committed, i.e., until
the upper limit on the number (or size) of WAL files that remain
for slots become configurable.
Even with that, are we sure this extra feature would be a sufficient
reason to change the default value of this option to enabled?
I am not sure about that either. My opinion is that this option is
useful to have and that it is not really a problem if you have slot
monitoring on the primary (or a standby for cascading). And I'd like
to believe that it is a common practice lately for base backups,
archivers based on pg_receivewal or even logical decoding, but it
could be surprising for some users who do not do that yet. So
Jehan-Guillaume's arguments sound also sensible to me (he also
maintains an automatic failover solution called PAF).
From what I can see, nobody really likes the current state of things
for this option, and that does not come down only to its default
value. The default GUC value and the way the parameter is loaded by
the WAL receiver are problematic, but still easy enough to fix. How do we
move on from here? I could post a patch based on what Sergei Kornilov
has sent around [1], but that's Peter's feature. Any opinions?
[1]: /messages/by-id/20200122055510.GH174860@paquier.xyz
--
Michael
At Thu, 13 Feb 2020 16:48:21 +0900, Michael Paquier <michael@paquier.xyz> wrote in
On Wed, Feb 12, 2020 at 06:11:06PM +0900, Fujii Masao wrote:
On 2020/02/12 7:53, Jehan-Guillaume de Rorthais wrote:
In my humble opinion, I prefer the previous behavior, streaming without
temporary slot, for one reason: primary availability.
+1
With temp slot created by default, if one standby lag far behind, it can make
the primary unavailable. We have nothing yet to forbid a slot to fill the
pg_wal partition. How new users creating their first cluster would react in such
situation? I suppose the original discussion was mostly targeting them?
Recovering from this is way more scary than building a standby.
So the default behavior might not be desirable and maybe
wal_receiver_create_temp_slot might be off by default?
Note that Kyotaro HORIGUCHI is working on a patch to restricting maximum keep
segments by repslots:
Yeah, I think it's better to disable this option until something like
Horiguchi-san's proposal will have been committed, i.e., until
the upper limit on the number (or size) of WAL files that remain
for slots become configurable.
Even with that, are we sure this extra feature would be a reason
sufficient to change the default value of this option to be enabled?
I think the feature (slot limit) is not going to be a reason to
enable it (temp slot). In the first place, I think we cannot determine
a default value that works in general.
I am not sure about that either. My opinion is that this option is
useful to have and that it is not really a problem if you have slot
monitoring on the primary (or a standby for cascading). And I'd like
to believe that it is a common practice lately for base backups,
archivers based on pg_receivewal or even logical decoding, but it
could be surprising for some users who do not do that yet. So
Jehan-Guillaume's arguments sound also sensible to me (he also
maintains an automatic failover solution called PAF).
From what I can see nobody really likes the current state of things
for this option, and that does not come down only to its default
value. The default GUC value and the way the parameter is loaded by
the WAL sender are problematic, still easy enough to fix. How do we
move on from here? I could post a patch based on what Sergei Kornilov
has sent around [1], but that's Peter's feature. Any opinions?
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
On Wed, Jan 22, 2020 at 06:58:46PM +0300, Sergei Kornilov wrote:
I would like to cross-post here a patch with such changes that I posted in "allow online change primary_conninfo" thread.
This thread is more appropriate for discussion about wal_receiver_create_temp_slot.
PS: I posted this patch in both threads mostly to make cfbot happy.
Thanks for posting this patch, Sergei. Here is a review to make
things move on.
- * Create temporary replication slot if no slot name is configured or
- * the slot from the previous run was temporary, unless
- * wal_receiver_create_temp_slot is disabled. We also need to handle
- * the case where the previous run used a temporary slot but
- * wal_receiver_create_temp_slot was changed in the meantime. In that
- * case, we delete the old slot name in shared memory. (This would
+ * Create temporary replication slot if requested. In that
+ * case, we update slot name in shared memory. (This would
The set of comments you are removing from walreceiver.c to decide if a
temporary slot needs to be created or not should be moved to
walreceiverfuncs.c as you move the logic from the WAL receiver startup
phase to the moment the WAL receiver spawn is requested.
I agree with the simplifications in WalReceiverMain(): as you have
switched wal_receiver_create_temp_slot to be PGC_POSTMASTER, changes
to it while the WAL receiver is running no longer need to be handled.
It would be more consistent with primary_conninfo and
primary_slot_name if wal_receiver_create_temp_slot were passed down as
an argument of RequestXLogStreaming().
As per the discussion done on this thread, let's also switch the
parameter default to be disabled. Peter, as the committer of 3297308,
it would be good if you could chime in.
--
Michael
Hello
Thanks for posting this patch, Sergei. Here is a review to make
things move on.
Thank you, here is an updated patch.
The set of comments you are removing from walreceiver.c to decide if a
temporary slot needs to be created or not should be moved to
walreceiverfuncs.c as you move the logic from the WAL receiver startup
phase to the moment the WAL receiver spawn is requested.
I changed these comments because they describe the behavior when the value of wal_receiver_create_temp_slot changes.
But yes, I need to add some comments to RequestXLogStreaming.
It would be more consistent with primary_conninfo and
primary_slot_name if wal_receiver_create_temp_slot is passed down as
an argument of RequestXLogStreaming().
Yep, I thought about that. Changed.
As per the discussion done on this thread, let's also switch the
parameter default to be disabled.
Done (my vote is also for disabling this option by default).
regards, Sergei
Attachments:
0001-v2-change-wal_receiver_create_temp_slot.patch (text/x-diff, +37/-45)
On Mon, Feb 17, 2020 at 04:57:04PM +0300, Sergei Kornilov wrote:
Thank you, here is updated patch
Thanks
I changed this comments because they describes behavior during
change value of wal_receiver_create_temp_slot. But yes, I need to
add some comments to RequestXLogStreaming.
I have reworked that part, adding more comments about the use of GUC
parameters when establishing the connection to the primary for a WAL
receiver. And also I have added an extra comment to walreceiver.c
about the use of GUCs in general, to avoid this stuff again in the
future. There were some extra nits with the format of
postgresql.conf.sample.
As per the discussion done on this thread, let's also switch the
parameter default to be disabled.
Done (my vote is also for disabling this option by default).
We are visibly moving in this direction, at least based on our
discussion. Let's see where this leads. For now, I have registered
this patch in the next CF (https://commitfest.postgresql.org/27/2456/),
with yourself as author and myself as reviewer, and let's wait
mainly for Peter E. and others to provide more input.
--
Michael
Attachments:
0001-v3-change-wal_receiver_create_temp_slot.patch (text/x-diff, +49/-45)
Hello
I have reworked that part, adding more comments about the use of GUC
parameters when establishing the connection to the primary for a WAL
receiver. And also I have added an extra comment to walreceiver.c
about the use of GUCs in general, to avoid this stuff again in the
future. There were some extra nits with the format of
postgresql.conf.sample.
Thank you! I just noticed that you removed my proposed change to this condition in RequestXLogStreaming
- if (slotname != NULL)
+ if (slotname != NULL && slotname[0] != '\0')
We need this change to set is_temp_slot properly. The PrimarySlotName GUC is typically an empty string rather than NULL when no slot is configured, so checking only "slotname != NULL" is not enough.
I attached a patch with this change.
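To see why the extra test matters: an unset slot name arrives as an empty string, so a check on NULL alone treats "no slot configured" as "slot configured", and the temporary slot is never requested. The helper names below are hypothetical; the two predicates mirror the conditions shown in the diff above, and `wants_temp_slot()` is a simplified stand-in for how is_temp_slot gets derived.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Condition without the extra test: any non-NULL pointer, even "", counts. */
static bool
slot_configured_before(const char *slotname)
{
	return slotname != NULL;
}

/* Condition with the extra test: an empty string also means "no slot". */
static bool
slot_configured_after(const char *slotname)
{
	return slotname != NULL && slotname[0] != '\0';
}

/* A temporary slot is wanted only when no permanent slot is configured. */
static bool
wants_temp_slot(bool slot_configured, bool create_temp_slot)
{
	return !slot_configured && create_temp_slot;
}
```

With the first predicate, `wants_temp_slot(slot_configured_before(""), true)` is false even though no slot is configured; with the second it is true, while a non-empty primary_slot_name still takes priority.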
regards, Sergei
Attachments:
0001-v3-change-wal_receiver_create_temp_slot.patch (text/x-diff, +50/-46)
On Tue, Mar 17, 2020 at 11:39:11PM +0300, Sergei Kornilov wrote:
We need this change to set is_temp_slot properly. PrimarySlotName
GUC can usually be an empty string, so just "slotname != NULL" is
not enough.
Yep, otherwise a temporary slot would never be created even when no
slot is defined; the priority goes to primary_slot_name if it is set.
I attached patch with this change.
Thanks, I have added a new open item for v13 to track this effort:
https://wiki.postgresql.org/wiki/PostgreSQL_13_Open_Items
--
Michael