pg_basebackups and slots
I've started work on a patch to make pg_basebackup use the temporary slots
feature that has been committed (thanks Petr!!). The reason for this is to
avoid the cases where a burst of traffic on the master during the backup
can cause the receive log part of the basebackup to fall far enough behind
that it fails.
I have a few considerations at this point, about interaction with existing
options.
Right now, we have -S/--slot which specifies a slot name. If you want to
use that, you have to create the slot ahead of time, and it will be a
permanent slot (of course). This is primarily documented as a feature to
use for replication (to make sure xlog is kept around until the standby is
started up), but it does also solve the same problem. But to use it for
base backups today you have to manually create the slot, then base backup,
then drop the slot, which is error prone.
My thought is that if -S/--slot is not specified, but -X stream is, then we
use a temporary slot always. This obviously requires the server to be
configured with enough slots (I still think we should change the default
here, but that's a different topic), but I think that's acceptable. Then we
should add a "--no-slot" to make it revert to previous behaviour.
Does that seem reasonable? Or would people prefer it to default to off?
However, it also made me think that the interface for setting up a replica
with a *permanent* slot would be much nicer if we could just have
pg_basebackup create the slot, instead of having to create it manually.
Basically, I'm envisioning adding an IF_NOT_EXISTS argument to
CREATE_REPLICATION_SLOT, so that pg_basebackup can pass up the slot name it
wants to use and have it created if necessary. Do people think think this
is worth doing? (It would also make pg_receivexlog and pg_receivelogical
easier to use, I think, and it would obviously be implemented there as
well).
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
On Thu, Dec 15, 2016 at 10:04 AM, Magnus Hagander <magnus@hagander.net>
wrote:
I've started work on a patch to make pg_basebackup use the temporary slots
feature that has been committed (thanks Petr!!). The reason for this is to
avoid the cases where a burst of traffic on the master during the backup
can cause the receive log part of the basebackup to fall far enough behind
that it fails.I have a few considerations at this point, about interaction with existing
options.Right now, we have -S/--slot which specifies a slot name. If you want to
use that, you have to create the slot ahead of time, and it will be a
permanent slot (of course). This is primarily documented as a feature to
use for replication (to make sure xlog is kept around until the standby is
started up), but it does also solve the same problem. But to use it for
base backups today you have to manually create the slot, then base backup,
then drop the slot, which is error prone.My thought is that if -S/--slot is not specified, but -X stream is, then
we use a temporary slot always. This obviously requires the server to be
configured with enough slots (I still think we should change the default
here, but that's a different topic), but I think that's acceptable. Then we
should add a "--no-slot" to make it revert to previous behaviour.Does that seem reasonable? Or would people prefer it to default to off?
However, it also made me think that the interface for setting up a replica
with a *permanent* slot would be much nicer if we could just have
pg_basebackup create the slot, instead of having to create it manually.Basically, I'm envisioning adding an IF_NOT_EXISTS argument to
CREATE_REPLICATION_SLOT, so that pg_basebackup can pass up the slot name it
wants to use and have it created if necessary. Do people think think this
is worth doing? (It would also make pg_receivexlog and pg_receivelogical
easier to use, I think, and it would obviously be implemented there as
well).
Pah, nevermind this last question. I was looking a completely broken
branch -- we already have the if-not-exists parameter for
pg_receivexlog/pg_receivelogical. I shouldn't write emails before
thinking...
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
On Thu, Dec 15, 2016 at 10:26 AM, Magnus Hagander <magnus@hagander.net>
wrote:
On Thu, Dec 15, 2016 at 10:04 AM, Magnus Hagander <magnus@hagander.net>
wrote:I've started work on a patch to make pg_basebackup use the temporary
slots feature that has been committed (thanks Petr!!). The reason for this
is to avoid the cases where a burst of traffic on the master during the
backup can cause the receive log part of the basebackup to fall far enough
behind that it fails.I have a few considerations at this point, about interaction with
existing options.Right now, we have -S/--slot which specifies a slot name. If you want to
use that, you have to create the slot ahead of time, and it will be a
permanent slot (of course). This is primarily documented as a feature to
use for replication (to make sure xlog is kept around until the standby is
started up), but it does also solve the same problem. But to use it for
base backups today you have to manually create the slot, then base backup,
then drop the slot, which is error prone.My thought is that if -S/--slot is not specified, but -X stream is, then
we use a temporary slot always. This obviously requires the server to be
configured with enough slots (I still think we should change the default
here, but that's a different topic), but I think that's acceptable. Then we
should add a "--no-slot" to make it revert to previous behaviour.Does that seem reasonable? Or would people prefer it to default to off?
So here's a patch that does this, for discussion. It implements the
following behavior for -X:
* When used with <10.0 servers, behave just like before.
* When -S <name> is specified, behave just like before (use an existing
replication slot, fail if it does not exist)
* When used on 10.0 with no -S, create and use a temporary replication slot
while running, with name pg_basebackup_<pid>.
* When used with 10.0 with no -S but --no-slot specified, run without a
slot like before.
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Attachments:
pg_basebackup_temp_slot.patchtext/x-patch; charset=US-ASCII; name=pg_basebackup_temp_slot.patchDownload+92-7
but -X stream is, then we use a temporary slot always.
This seems even more useful with -X fetch to me, as with fetch we are sure
we
are not immediately receiving the WAL. So, I'd propose to use it whenever -X
is specified, regardless of which method is specified.
(I still think we should change the default here, but that's a different
topic)
+1
Does that seem reasonable? Or would people prefer it to default to off?
Yes. Users are specifically requesting wal using -X, so making sure that
actually happens by default would work.
On Thu, Dec 15, 2016 at 7:28 PM, Magnus Hagander <magnus@hagander.net> wrote:
So here's a patch that does this, for discussion. It implements the
following behavior for -X:* When used with <10.0 servers, behave just like before.
* When -S <name> is specified, behave just like before (use an existing
replication slot, fail if it does not exist)
* When used on 10.0 with no -S, create and use a temporary replication slot
while running, with name pg_basebackup_<pid>.
* When used with 10.0 with no -S but --no-slot specified, run without a slot
like before.
There are users using -X stream without a slot now because they don't want to
cause WAL retention in pg_xlog and are ready for retries in taking the base
backup... I am wondering if it is a good idea to change the default behavior
and not introduce a new option like --temporary-slot, or
--slot-mode=temp|persistent|none with none being the default instead. There
are users caring about pg_xlog not filling up if pg_basebackup hangs on write
for example. And it may be a good idea to be able to use --slot-mode=temp
with -S/--slot actually to allow users to monitor it. If no slot name is given
with --slot generating one looks fine to me.
Regarding the patch, it would be good to add a set of TAP tests to cover the
new features. I am not seeing anything badly wrong with it at first sight by
the way.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Dec 16, 2016 07:27, "Michael Paquier" <michael.paquier@gmail.com> wrote:
On Thu, Dec 15, 2016 at 7:28 PM, Magnus Hagander <magnus@hagander.net>
wrote:
So here's a patch that does this, for discussion. It implements the
following behavior for -X:* When used with <10.0 servers, behave just like before.
* When -S <name> is specified, behave just like before (use an existing
replication slot, fail if it does not exist)
* When used on 10.0 with no -S, create and use a temporary replication
slot
while running, with name pg_basebackup_<pid>.
* When used with 10.0 with no -S but --no-slot specified, run without a
slot
like before.
There are users using -X stream without a slot now because they don't want
to
cause WAL retention in pg_xlog and are ready for retries in taking the base
backup... I am wondering if it is a good idea to change the default behavior
and not introduce a new option like --temporary-slot, or
--slot-mode=temp|persistent|none with none being the default instead. There
are users caring about pg_xlog not filling up if pg_basebackup hangs on
write
for example. And it may be a good idea to be able to use --slot-mode=temp
with -S/--slot actually to allow users to monitor it. If no slot name is
given
with --slot generating one looks fine to me.
I really think the default should be "what most people want", and not
"whatever is compatible with a mode that was necessary because we lacked
infrastructure".
Yes there are some people who are fine with their backups failing. I'm
willing to say there are *many* more who benefit from an automatic slot.
Regarding the patch, it would be good to add a set of TAP tests to cover the
new features. I am not seeing anything badly wrong with it at first sight by
the way
I don't really know how to write a good test for it. I mean, we could run
it with the parameter, but how to we make it actually verify the slot? Make
a really big database to make it guaranteed to be slow enough that we can
notice it in a different session?
/Magnus
At 2016-12-16 07:32:24 +0100, magnus@hagander.net wrote:
I really think the default should be "what most people want", and not
"whatever is compatible with a mode that was necessary because we
lacked infrastructure".
Very much agreed.
-- Abhijit
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Dec 16, 2016 at 3:32 PM, Magnus Hagander <magnus@hagander.net> wrote:
I don't really know how to write a good test for it. I mean, we could run it
with the parameter, but how to we make it actually verify the slot? Make a
really big database to make it guaranteed to be slow enough that we can
notice it in a different session?
No, not that. I was meaning tests to trigger the error code paths with
the compatibility switches you are adding.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 16/12/16 07:32, Magnus Hagander wrote:
On Dec 16, 2016 07:27, "Michael Paquier" <michael.paquier@gmail.com
<mailto:michael.paquier@gmail.com>> wrote:On Thu, Dec 15, 2016 at 7:28 PM, Magnus Hagander
<magnus@hagander.net <mailto:magnus@hagander.net>> wrote:So here's a patch that does this, for discussion. It implements the
following behavior for -X:* When used with <10.0 servers, behave just like before.
* When -S <name> is specified, behave just like before (use anexisting
replication slot, fail if it does not exist)
* When used on 10.0 with no -S, create and use a temporaryreplication slot
while running, with name pg_basebackup_<pid>.
* When used with 10.0 with no -S but --no-slot specified, runwithout a slot
like before.
There are users using -X stream without a slot now because they
don't want to
cause WAL retention in pg_xlog and are ready for retries in taking
the base
backup... I am wondering if it is a good idea to change the default
behavior
and not introduce a new option like --temporary-slot, or
--slot-mode=temp|persistent|none with none being the default
instead. There
are users caring about pg_xlog not filling up if pg_basebackup hangs
on write
for example. And it may be a good idea to be able to use
--slot-mode=temp
with -S/--slot actually to allow users to monitor it. If no slot
name is given
with --slot generating one looks fine to me.I really think the default should be "what most people want", and not
"whatever is compatible with a mode that was necessary because we lacked
infrastructure".
+1 (btw glad you made the patch)
Yes there are some people who are fine with their backups failing. I'm
willing to say there are *many* more who benefit from an automatic slot.
I think the issue with slots taking over disk space should be fixed by
making it possible to cut off slots when necessary (ie some GUC that
would say how much behind slot can be at maximum, with something like 0
being unlimited like it's now) instead of trying to work around that in
every client.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 12/15/16 4:04 AM, Magnus Hagander wrote:
This obviously requires the server to be configured with enough slots (I
still think we should change the default here, but that's a different
topic), but I think that's acceptable.
We could implicitly reserve one replication slot per walsender. And
then max_replication_slots (possibly renamed) would just be additional
slots beyond that.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Dec 16, 2016 at 7:40 AM, Michael Paquier <michael.paquier@gmail.com>
wrote:
On Fri, Dec 16, 2016 at 3:32 PM, Magnus Hagander <magnus@hagander.net>
wrote:I don't really know how to write a good test for it. I mean, we could
run it
with the parameter, but how to we make it actually verify the slot? Make
a
really big database to make it guaranteed to be slow enough that we can
notice it in a different session?No, not that. I was meaning tests to trigger the error code paths with
the compatibility switches you are adding.
There is only one switch - --no-slot. I guess you mean rune one backup with
that one? Sure, that's easy enough to do. I'm not sure how much it really
tests, but let's do it :)
You mean something like this, right?
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Attachments:
pg_basebackup_temp_slot2.patchtext/x-patch; charset=US-ASCII; name=pg_basebackup_temp_slot2.patchDownload+96-8
On Fri, Dec 16, 2016 at 12:41 PM, Petr Jelinek <petr.jelinek@2ndquadrant.com
wrote:
On 16/12/16 07:32, Magnus Hagander wrote:
On Dec 16, 2016 07:27, "Michael Paquier" <michael.paquier@gmail.com
<mailto:michael.paquier@gmail.com>> wrote:On Thu, Dec 15, 2016 at 7:28 PM, Magnus Hagander
<magnus@hagander.net <mailto:magnus@hagander.net>> wrote:So here's a patch that does this, for discussion. It implements the
following behavior for -X:* When used with <10.0 servers, behave just like before.
* When -S <name> is specified, behave just like before (use anexisting
replication slot, fail if it does not exist)
* When used on 10.0 with no -S, create and use a temporaryreplication slot
while running, with name pg_basebackup_<pid>.
* When used with 10.0 with no -S but --no-slot specified, runwithout a slot
like before.
There are users using -X stream without a slot now because they
don't want to
cause WAL retention in pg_xlog and are ready for retries in taking
the base
backup... I am wondering if it is a good idea to change the default
behavior
and not introduce a new option like --temporary-slot, or
--slot-mode=temp|persistent|none with none being the default
instead. There
are users caring about pg_xlog not filling up if pg_basebackup hangs
on write
for example. And it may be a good idea to be able to use
--slot-mode=temp
with -S/--slot actually to allow users to monitor it. If no slot
name is given
with --slot generating one looks fine to me.I really think the default should be "what most people want", and not
"whatever is compatible with a mode that was necessary because we lacked
infrastructure".+1 (btw glad you made the patch)
I'm glad you made the temp slots, so I could abandon that branch :)
Yes there are some people who are fine with their backups failing. I'm
willing to say there are *many* more who benefit from an automatic slot.I think the issue with slots taking over disk space should be fixed by
making it possible to cut off slots when necessary (ie some GUC that
would say how much behind slot can be at maximum, with something like 0
being unlimited like it's now) instead of trying to work around that in
every client.
Agreed, that seems like a better solution.
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
On Sat, Dec 17, 2016 at 3:54 PM, Peter Eisentraut <
peter.eisentraut@2ndquadrant.com> wrote:
On 12/15/16 4:04 AM, Magnus Hagander wrote:
This obviously requires the server to be configured with enough slots (I
still think we should change the default here, but that's a different
topic), but I think that's acceptable.We could implicitly reserve one replication slot per walsender. And
then max_replication_slots (possibly renamed) would just be additional
slots beyond that.
That makes a lot of sense now that we have temporary replication slots, I
agree. And then max_replication_slots could be something like
persistent_replication_slots?
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
On 12/17/16 9:55 AM, Magnus Hagander wrote:
That makes a lot of sense now that we have temporary replication slots,
I agree. And then max_replication_slots could be something like
persistent_replication_slots?
I was thinking was more like that each walsender gets one fixed slot,
which can be temporary or persistent.
Or maybe default max_replication_slots to something like -1, which means
equal to max_wal_senders.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
This patch looks reasonable to me.
Attached is a top-up patch with a few small fixups.
I suggest to wait for the resolution of the "Replication/backup
defaults" thread. I would not want to be in a situation where users who
have not been trained to use replication slots now have yet another
restart-only parameter to set before they can take a backup.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
pg_basebackup_temp_slot2-fixups.patchtext/x-patch; name=pg_basebackup_temp_slot2-fixups.patchDownload+14-9
On Wed, Jan 4, 2017 at 3:32 PM, Peter Eisentraut <
peter.eisentraut@2ndquadrant.com> wrote:
This patch looks reasonable to me.
Attached is a top-up patch with a few small fixups.
I suggest to wait for the resolution of the "Replication/backup
defaults" thread. I would not want to be in a situation where users who
have not been trained to use replication slots now have yet another
restart-only parameter to set before they can take a backup.
Thanks for the review and the top-up patch. I've applied it and pushed.
I just realized I missed crediting you with the review and the top-up in
the commit message. My apologies for this.
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
On Mon, Jan 16, 2017 at 10:00 PM, Magnus Hagander <magnus@hagander.net> wrote:
On Wed, Jan 4, 2017 at 3:32 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:This patch looks reasonable to me.
Attached is a top-up patch with a few small fixups.
I suggest to wait for the resolution of the "Replication/backup
defaults" thread. I would not want to be in a situation where users who
have not been trained to use replication slots now have yet another
restart-only parameter to set before they can take a backup.Thanks for the review and the top-up patch. I've applied it and pushed.
- if (replication_slot && includewal != STREAM_WAL)
+ if ((replication_slot || no_slot) && includewal != STREAM_WAL)
Why do we need to check "no_slot" here? Since no_slot=true means that
no temporary replication slot is specified, it's fine even with both -X none
and fetch.
Regards,
--
Fujii Masao
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Jan 16, 2017 at 4:33 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
On Mon, Jan 16, 2017 at 10:00 PM, Magnus Hagander <magnus@hagander.net>
wrote:On Wed, Jan 4, 2017 at 3:32 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:This patch looks reasonable to me.
Attached is a top-up patch with a few small fixups.
I suggest to wait for the resolution of the "Replication/backup
defaults" thread. I would not want to be in a situation where users who
have not been trained to use replication slots now have yet another
restart-only parameter to set before they can take a backup.Thanks for the review and the top-up patch. I've applied it and pushed.
- if (replication_slot && includewal != STREAM_WAL) + if ((replication_slot || no_slot) && includewal != STREAM_WAL)Why do we need to check "no_slot" here? Since no_slot=true means that
no temporary replication slot is specified, it's fine even with both -X
none
and fetch.
Um yeah, that looks like a bad merge actually. Will fix.
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/