WIP: Failover Slots
Failover Slots
If logical decoding is taking place from a master, then after failover we
would like to continue logical decoding from the new master. To facilitate
this, we introduce the concept of “Failover Slots”: slots whose state is
written to WAL, so that after promotion any downstream standby can continue
to produce logical decoding output from the same named slot and plugin.
In the current patch, any slot defined on a master will generate WAL,
leading to a pending-slot being present on all standby nodes. When a
standby is promoted, the slot becomes usable and will have the properties
as of the last fsync on the master.
Failover slots are not accessible until promotion of the standby. Logical
decoding from a standby looks like a significant step beyond this and will
not be part of this patch.
Internal design is fairly clear, using a new rmgr. No real problems emerged
so far.
Patch is WIP, posted for comment, so you can see where I'm going.
I'm expecting to have a working version including timeline following for
9.6.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
failover_slots.v2.patch (application/octet-stream, +502 -89)
On 2 January 2016 at 08:50, Simon Riggs <simon@2ndquadrant.com> wrote:
Patch is WIP, posted for comment, so you can see where I'm going.
I've applied this on a branch of master and posted it, with some comment
editorialization, as
https://github.com/2ndQuadrant/postgres/tree/dev/failover-slots . The tree
will be subject to rebasing.
At present the patch does not appear to work. No slots are visible in the
replica's pg_replication_slots before or after promotion and no slot
information is written to the xlog according to pg_xlogdump:
$ ~/pg/96/bin/pg_xlogdump -r ReplicationSlot 000000010000000000000001 000000010000000000000003
pg_xlogdump: FATAL:  error in WAL record at 0/301DDE0: invalid record length at 0/301DE10
so it's very much a WIP. I've read through it and think the idea makes
sense, it's just still missing some pieces...
So. Initial review comments.
This looks pretty incomplete:
+ if (found_duplicate)
+ {
+ LWLockRelease(ReplicationSlotAllocationLock);
+
+ /*
+ * Do something nasty to the sinful duplicants, but
+ * take with locking.
+ */
+
+ LWLockAcquire(ReplicationSlotAllocationLock, LW_SHARED);
+ }
... and I'm not sure I understand how the possibility of a duplicate slot
can arise in the first place, since you cannot create a slot on a read
replica. This seems unnecessary.
I'm not sure I understand why, in ReplicationSlotRedoCreate, it's
especially desirable to prevent blocking iteration over
pg_replication_slots or acquiring a slot. When redoing a slot create isn't
that exactly what we should do? This looks like it's been copied & pasted
verbatim from CheckPointReplicationSlots . There it makes sense, since the
slots may be in active use. During redo it seems reasonable to just
acquire ReplicationSlotControlLock.
I'm not a fan of the ReplicationSlotInWAL typedef
for ReplicationSlotPersistentData. Especially as it's only used in the redo
functions but *not* when xlog is written out. I'd like to just replace it.
Purely for convenient testing there's a shell script in the tree -
https://github.com/2ndQuadrant/postgres/blob/dev/failover-slots/failover-slot-test.sh
.
Assuming a patched 9.6 in $HOME/pg/96 it'll do a run-through of the patch.
I'll attempt to convert this to use the new test infrastructure, but needed
something ASAP for development. Posted in case it's useful to others.
Now it's time to look into why this doesn't seem to be generating any xlog
when by rights it seems like it should. Also into at what point exactly we
purge existing slots on start / promotion of a read-replica.
TL;DR: this doesn't work yet, working on it.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 20 January 2016 at 21:02, Craig Ringer <craig@2ndquadrant.com> wrote:
TL;DR: this doesn't work yet, working on it.
Nothing is logged on slot creation because ReplicationSlot->data.failover
is never true. Once that's fixed by - for now - making all slots failover
slots, there's a crash in XLogInsert because of the use of reserved bits in
the XLogInsert info argument. Fix pushed.
I also noticed that slot drops are being logged whether or not the slot is a
failover slot. Pushed a fix for that.
The WAL writing is now working. I've made improvements to the rmgr xlogdump
support to make it clearer what's written.
Slots are still not visible on the replica so there's work to do tracing
redo, promotion, slot handling after starting from a basebackup, etc. The
patch is still very much WIP.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Jan 1, 2016 at 7:50 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
Failover Slots
In the current patch, any slot defined on a master will generate WAL,
leading to a pending-slot being present on all standby nodes. When a standby
is promoted, the slot becomes usable and will have the properties as of the
last fsync on the master.
No objection to the concept, but I think the new behavior needs to be
optional. I am guessing that you are thinking along similar lines,
but it's not explicitly called out here.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 21 January 2016 at 16:31, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Jan 1, 2016 at 7:50 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
Failover Slots
In the current patch, any slot defined on a master will generate WAL,
leading to a pending-slot being present on all standby nodes. When a standby
is promoted, the slot becomes usable and will have the properties as of the
last fsync on the master.
No objection to the concept, but I think the new behavior needs to be
optional. I am guessing that you are thinking along similar lines,
but it's not explicitly called out here.
I was unsure myself; but making them optional seems reasonable.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 22 January 2016 at 00:31, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Jan 1, 2016 at 7:50 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
Failover Slots
In the current patch, any slot defined on a master will generate WAL,
leading to a pending-slot being present on all standby nodes. When a standby
is promoted, the slot becomes usable and will have the properties as of the
last fsync on the master.
No objection to the concept, but I think the new behavior needs to be
optional. I am guessing that you are thinking along similar lines,
but it's not explicitly called out here.
Yeah, I think that's the idea too. For one thing we'll want to allow
non-failover slots to continue to be usable from a streaming replica, but
we must ERROR if anyone attempts to attach to and replay from a failover
slot via a replica since we can't write WAL there. Both kinds are needed.
There's a 'failover' bool member in the slot persistent data for that
reason. It's not (yet) exposed via the UI.
I presume we'll want to:
* add a new default-false argument is_failover_slot or similar to
pg_create_logical_replication_slot and pg_create_physical_replication_slot
* Add a new optional flag argument FAILOVER to CREATE_REPLICATION_SLOT in
both its LOGICAL and PHYSICAL forms.
... and will be adding that to this patch, barring syntax objections etc.
It's also going to be necessary to handle what happens when a new failover
slot (physical or logical) is created on the master where it conflicts with
the name of a non-failover physical slot that was created on the replica.
In this case I am inclined to terminate any walsender backend for the
replica's slot with a conflict with recovery, remove its slot and replace
it with a failover slot. The failover slot does not permit replay while in
recovery so if the booted-off client reconnects it'll get an ERROR saying
it can't replay from a failover slot. It should be pretty clear to the
admin what's happening between the conflict with recovery and the failover
slot error. There could still be an issue if the client persistently keeps
retrying and successfully reconnects after replica promotion but I don't
see that much can be done about that. The documentation will need to
address the need to try to avoid name conflicts between slots created on
replicas and failover slots on the master.
I'll be working on those docs, on the name conflict handling, and on the
syntax during my coming flight. Toddler permitting.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Jan 22, 2016 at 2:46 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
It's also going to be necessary to handle what happens when a new failover
slot (physical or logical) is created on the master where it conflicts with
the name of a non-failover physical slot that was created on the replica. In
this case I am inclined to terminate any walsender backend for the replica's
slot with a conflict with recovery, remove its slot and replace it with a
failover slot. The failover slot does not permit replay while in recovery so
if the booted-off client reconnects it'll get an ERROR saying it can't
replay from a failover slot. It should be pretty clear to the admin what's
happening between the conflict with recovery and the failover slot error.
There could still be an issue if the client persistently keeps retrying and
successfully reconnects after replica promotion but I don't see that much
can be done about that. The documentation will need to address the need to
try to avoid name conflicts between slots created on replicas and failover
slots on the master.
That's not going to win any design-of-the-year awards, but maybe it's
acceptable. It occurred to me to wonder if it might be better to
propagate logical slots partially or entirely outside the WAL stream,
because with this design you will end up with the logical slots on
every replica, including cascaded replicas, and I'm not sure that's
what we want. Then again, I'm also not sure it isn't what we want.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2016-01-22 11:40:24 -0500, Robert Haas wrote:
It occurred to me to wonder if it might be better to
propagate logical slots partially or entirely outside the WAL stream,
because with this design you will end up with the logical slots on
every replica, including cascaded replicas, and I'm not sure that's
what we want. Then again, I'm also not sure it isn't what we want.
Not propagating them through the WAL also has the rather large advantage
of not barring the way to using such slots on standbys.
I think it's technically quite possible to maintain the required
resources on multiple nodes. The question is how would you configure on
which nodes the resources need to be maintained? I can't come up with a
satisfying scheme...
On 23 January 2016 at 00:40, Robert Haas <robertmhaas@gmail.com> wrote:
It occurred to me to wonder if it might be better to
propagate logical slots partially or entirely outside the WAL stream,
because with this design you will end up with the logical slots on
every replica, including cascaded replicas, and I'm not sure that's
what we want. Then again, I'm also not sure it isn't what we want.
I think it's the most sensible default if there's only going to be one
choice to start with. It's consistent with what we do elsewhere with
replicas so there won't be any surprises. Failover slots are a fairly
simple feature that IMO just makes slots behave more like you might expect
them to do in the first place.
I'm pretty hesitant to start making cascaded replicas different to each
other just for slots. There are lots of other things where variation
between replicas would be lovely - the most obvious of which is omitting
some databases from some replicas. Right now we have a single data stream,
WAL, that goes to every replica. If we're going to change that I'd really
like to address it in a way that'll meet future needs like selective
physical replication too. I also think we'd want to deal with the problem
of identifying and labeling nodes to do a good job of selective replication
of slots.
I'd like to get failover slots in place for 9.6 since they're fairly
self-contained and meet an immediate need: allowing replication using slots
(physical or logical) to follow a failover event.
After that I want to add logical decoding support for slot
creation/drop/advance. So a logical decoding plugin can mirror logical slot
state on another node. It wouldn't be useful for physical slots, of course,
but it'd allow failover between logical replicas where we can do cool
things like replicate just a subset of DBs/tables/etc. (A mapping of
upstream to downstream LSNs would be required, but hopefully not too hard).
That's post-9.6 work and separate to failover slots, though dependent on
them for the ability to decode slot changes from WAL.
Down the track I'd very much like to be less dependent on forcing
everything though WAL with a single timeline. I agree with Andres that
being able to use failover slots on replicas would be good, and that it's
not possible when we use WAL to track the info. I just think it's a massive
increase in complexity and scope and I'd really like to be able to follow
failover the simple way first, then enhance it.
Nothing stops us changing failover slots in 9.7+ from using WAL to some
other mechanism that we design carefully at that time. We can extend the
walsender command set for physical rep at a major version bump with no
major issues, including adding to the streaming rep protocol. There's lots
to figure out though, including how to maintain slot changes in a strict
ordering with WAL, how to store and read the info, etc.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 23 January 2016 at 00:51, Andres Freund <andres@anarazel.de> wrote:
Not propagating them through the WAL also has the rather large advantage
of not barring the way to using such slots on standbys.
Yeah. So you could have a read-replica that has a slot and it has child
nodes you can fail over to, but you don't have to have the slot on the
master.
I don't personally find that to be a particularly compelling thing that
says "we must have this" ... but maybe I'm not seeing the full
significance/advantages.
I think it's technically quite possible to maintain the required
resources on multiple nodes. The question is how would you configure on
which nodes the resources need to be maintained? I can't come up with a
satisfying scheme...
That's part of it. Also the mechanism by which we actually replicate them -
protocol additions for the walsender protocol, how to reliably send
something that doesn't have an LSN, etc. It might be fairly simple, I
haven't thought about it deeply, but I'd rather not go there until the
basics are in place.
BTW, I'm keeping a working tree at
https://github.com/2ndQuadrant/postgres/tree/dev/failover-slots . Subject
to rebasing, history not clean. It has a test script in it that'll go away
before patch posting.
Current state needs work to ensure that on-disk and in-memory
representations are kept in sync, but is getting there.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi all
Here's v2 of the failover slots patch. It replicates a logical slot to a
physical streaming replica downstream, keeping the slots in sync. After the
downstream is promoted a client can replay from the logical slot.
UI to allow creation of non-failover slots is pending.
There's more testing to do to cover all the corners: drop slots, drop and
re-create, name conflicts between downstream !failover slots and upstream
failover slots, etc.
There's also a known bug where WAL isn't correctly retained for a slot that
was created before a basebackup, which I'll fix in a revision shortly.
I'm interested in ideas on how to better test this.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
0001-Implement-failover-slots.patch (text/x-patch, +610 -93)
Hi all
Here's v3 of failover slots.
It doesn't add the UI yet, but it's now functionally complete except for
timeline following for logical slots, and I have a plan for that.
Attachments:
failover-slots-v3.patch (text/x-patch, +624 -94)
On Fri, Jan 22, 2016 at 11:51 AM, Andres Freund <andres@anarazel.de> wrote:
I think it's technically quite possible to maintain the required
resources on multiple nodes. The question is how would you configure on
which nodes the resources need to be maintained? I can't come up with a
satisfying scheme...
For this to work, I feel like the nodes need names, and a directory
that tells them how to reach each other.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi all
I've attached the completed failover slots patch.
There were quite a few details to address so this is split into a three
patch series. I've also attached the test script I've been using.
We need failover slots to allow a logical decoding client to follow
physical failover, allowing HA "above" a logical decoding client.
For reviewers there's some additional explanation for why a few of the
changes are made the way they are on the wiki:
https://wiki.postgresql.org/wiki/Failover_slots . See "Patch notes".
The tagged tree for this submission is at
https://github.com/2ndQuadrant/postgres/tree/failover-slots-v4 .
I intend to backport this to 9.4 and 9.5 (though of course not for mainline
submission!).
Patch 1: add support for timeline following in logical decoding.
This is necessary to make failover slots useful. Otherwise decoding from a
slot will fail after failover because the server tries to read WAL with
ThisTimeLineID but the needed archives are on a historical timeline. While
the smallest part of the patch series this was the most complex.
Patch 2: Failover slots core
* Add WAL logging and redo for failover slots
* copy pg_replslot/ in pg_basebackup
* drop non-failover slots on archive recovery startup
* expand the amount of WAL copied by pg_basebackup so failover slots are
usable after restore
* if a failover slot is created on the primary with the same name as an
existing non-failover slot on replica(s), kill any client connected to the
replica's slot and drop the replica's slot during redo
* Adds a new backup label entry MIN FAILOVER SLOT LSN to generated
backup label files if failover slots are present. This allows utilities
like
pgbarman, omnipitr, etc to know to retain more WAL to preserve the
function of failover slots.
* Return a lower LSN from pg_start_backup() and BASE_BACKUP
if needed to ensure that tools copy the extra WAL required by failover
slots during a base backup.
Relies on timeline following for logical decoding slots to be useful.
Does not add UI (function arguments, walsender syntax, changes to views,
etc) to expose failover slots to users. They can only be used by extensions
that call ReplicationSlotCreate directly.
Patch 3: User interface for failover slots
The 3rd patch adds the UI to expose failover slots to the user:
- A 'failover' boolean argument, default false, to
pg_create_physical_replication_slot(...) and
pg_create_logical_replication_slot(...)
- A new FAILOVER option to PG_CREATE_REPLICATION_SLOT on the walsender
protocol
- A new 'failover' boolean column in pg_catalog.pg_replication_slots
- SGML documentation changes for the new options and for failover slots in
general
Limited tests are also added in this patch since not much of this is really
testable by pg_regress. I've attached my local test script in case it's of
interest/use to anyone.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
0001-Allow-logical-slots-to-follow-timeline-switches.patch (text/x-patch, +295 -36)
0002-Allow-replication-slots-to-follow-failover.patch (text/x-patch, +620 -132)
0003-Add-the-UI-and-documentation-for-failover-slots.patch (text/x-patch, +237 -50)
Hi,
here is my code level review:
0001:
This one looks ok except for broken indentation in the new notes in
xlogreader.c and .h. It's maybe slightly overdocumented but given the
complicated way the timeline reading works it's probably warranted.
0002:
+ /*
+ * No way to wait for the process since it's not a child
+ * of ours and there's no latch to set, so poll.
+ *
+ * We're checking this without any locks held, but
+ * we'll recheck when we attempt to drop the slot.
+ */
+ while (slot->in_use && slot->active_pid == active_pid
+ && max_sleep_micros > 0)
+ {
+ usleep(micros_per_sleep);
+ max_sleep_micros -= micros_per_sleep;
+ }
Not sure I buy this, what about postmaster crashes and fast shutdown
requests etc. Also you do usleep for 10s which is quite long. I'd
prefer the classic wait for latch with timeout and pg crash check
here. And even if we go with usleep, then it should be 1s not 10s and
pg_usleep instead of usleep.
0003:
There is a lot of documentation improvements here that are not related
to failover slots or timeline following, it might be good idea to
split those into separate patch as they are separately useful IMHO.
Other than that it looks good to me.
About other things discussed in this thread. Yes it makes sense in
certain situations to handle this outside of WAL and that does require
notions of nodes, etc. That being said, the timeline following is
needed even if this is handled outside of WAL. And once timeline
following is in, the slots can be handled by the replication solution
itself which is good. But I think the failover slots are still a good
thing to have - it provides HA solution for anything that uses slots,
and that includes physical replication as well. If the specific
logical replication solution wants to handle it for some reason itself
outside of WAL, it can create non-failover slot so in my opinion we
ideally need both types of slots (and that's exactly what this patch
gives us).
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 16 February 2016 at 01:23, Petr Jelinek <petr@2ndquadrant.com> wrote:
Hi,
here is my code level review:
0001:
This one looks ok except for broken indentation in the new notes in
xlogreader.c and .h.
I don't see the broken indentation. Not sure what you mean.
+ while (slot->in_use && slot->active_pid == active_pid
+        && max_sleep_micros > 0)
+ {
+     usleep(micros_per_sleep);
+     max_sleep_micros -= micros_per_sleep;
+ }

Not sure I buy this, what about postmaster crashes and fast shutdown
requests etc.
Yeah. I was thinking - incorrectly - that I couldn't use a latch during
recovery.
Revision attached. There was a file missing from the patch too.
0003:
There is a lot of documentation improvements here that are not related
to failover slots or timeline following, it might be good idea to
split those into separate patch as they are separately useful IMHO.
Yeah, probably worth doing. We'll see how this patch goes.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
0001-Allow-logical-slots-to-follow-timeline-switches.patch (text/x-patch, +295 -36)
0002-Allow-replication-slots-to-follow-failover.patch (text/x-patch, +798 -132)
0003-Add-the-UI-and-documentation-for-failover-slots.patch (text/x-patch, +236 -49)
Hi,
On 16 Feb 2016, at 09:11, Craig Ringer <craig@2ndquadrant.com> wrote:
Revision attached. There was a file missing from the patch too.
All attached patches apply normally. I only took a look at the first two, but I also tried to run Patroni with the modified version to check whether basic replication works.
What it’s doing is calling pg_basebackup first to initialize the replica, and that actually failed with:

pg_basebackup: unexpected termination of replication stream: ERROR: requested WAL segment 000000010000000000000000 has already been removed

The segment name definitely looks bogus to me.
The actual command causing the failure was an attempt to clone the replica using pg_basebackup, turning on xlog streaming:
pg_basebackup --pgdata data/postgres1 --xlog-method=stream --dbname="host=localhost port=5432 user=replicator"
I checked the same command against the git master without the patches applied and could not reproduce this problem there.
On the code level, I have no comments on 0001; it’s well documented and I have no questions about the approach, although I might not be knowledgeable enough to judge the specifics of the implementation.
On the 0002, there are a few rough edges:
slots.c:294
elog(LOG, "persistency is %i", (int)slot->data.persistency);
Should be changed to DEBUG?
slot.c:468
Why did you drop "void" as a parameter type of ReplicationSlotDropAcquired?
walsender.c: 1509 at PhysicalConfirmReceivedLocation
I’ve noticed a comment stating that we don’t need to call ReplicationSlotSave(), but that pre-dated the WAL-logging of replication slot changes. Don’t we need to call it now, the same way it’s done for the logical slots in logical.c:at LogicalConfirmReceivedLocation?
Kind regards,
--
Oleksii
On 22 February 2016 at 23:39, Oleksii Kliukin <alexk@hintbits.com> wrote:
What it’s doing is calling pg_basebackup first to initialize the replica,
and that actually failed with:

pg_basebackup: unexpected termination of replication stream: ERROR: requested WAL segment 000000010000000000000000 has already been removed

The segment name definitely looks bogus to me.
The actual command causing the failure was an attempt to clone the replica
using pg_basebackup, turning on xlog streaming:

pg_basebackup --pgdata data/postgres1 --xlog-method=stream --dbname="host=localhost port=5432 user=replicator"

I checked the same command against the git master without the patches
applied and could not reproduce this problem there.
That's a bug. In testing whether we need to return a lower LSN for minimum
WAL for BASE_BACKUP, it failed to properly test for InvalidXLogRecPtr. Good
catch.
On the code level, I have no comments on 0001, it’s well documented and I
have no questions about the approach, although I might be not too
knowledgable to judge the specifics of the implementation.
The first patch is the most important IMO, and the one I think needs the
most thought since it's ... well, timelines aren't simple.
slots.c:294
elog(LOG, "persistency is %i", (int)slot->data.persistency);

Should be changed to DEBUG?
That's an escapee log statement I thought I'd already rebased out. Well
spotted, fixed.
slot.c:468
Why did you drop “void" as a parameter type of ReplicationSlotDropAcquired?
That's an editing error on my part that I'll reverse. Since the prototype
declares (void) it doesn't matter, but it's a pointless change. Fixed.
walsender.c: 1509 at PhysicalConfirmReceivedLocation
I’ve noticed a comment stating that we don’t need to call
ReplicationSlotSave(), but that pre-dated the WAL-logging of replication
slot changes. Don’t we need to call it now, the same way it’s done for
the logical slots in logical.c:at LogicalConfirmReceivedLocation?
No, it's safe here. All we must ensure is that a slot is advanced on the
replica when it's advanced on the master. For physical slots even that's a
weak requirement, we just have to stop them from falling *too* far behind
and causing too much xlog retention. For logical slots we should ensure we
advance the slot on the replica before any vacuum activity that might
remove catalog tuples still needed by that slot gets replayed. Basically
the difference is that logical slots keep track of the catalog xmin too, so
they have (slightly) stricter requirements.
This patch doesn't touch either of those functions except for
renaming ReplicationSlotsComputeRequiredLSN
to ReplicationSlotsUpdateRequiredLSN . Which, by the way, I really don't
like doing, but I couldn't figure out a name to give the function that
computes-and-returns the required LSN that wouldn't be even more confusing
in the face of having a ReplicationSlotsComputeRequiredLSN function as
well. Ideas welcome.
Updated patch
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Updated patch
... attached
I've split it up a bit more too, so it's easier to tell what change is for
what and fixed the issues mentioned by Oleksii. I've also removed some
unrelated documentation changes.
Patch 0001, timeline switches for logical decoding, is unchanged since the
last post.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
0003-Retain-extra-WAL-for-failover-slots-in-base-backups.patch (text/x-patch, +40 -1)
0004-Add-the-UI-and-for-failover-slots.patch (text/x-patch, +38 -10)
0005-Document-failover-slots.patch (text/x-patch, +70 -8)
0006-Add-failover-to-pg_replication_slots.patch (text/x-patch, +64 -10)
0007-not-for-inclusion-Test-script-for-failover-slots.patch (text/x-patch, +264 -1)
0001-Allow-logical-slots-to-follow-timeline-switches.patch (text/x-patch, +295 -36)
0002-Allow-replication-slots-to-follow-failover.patch (text/x-patch, +755 -130)
On 23 Feb 2016, at 11:30, Craig Ringer <craig@2ndquadrant.com> wrote:
Updated patch
... attached
I've split it up a bit more too, so it's easier to tell what change is for what and fixed the issues mentioned by Oleksii. I've also removed some unrelated documentation changes.
Patch 0001, timeline switches for logical decoding, is unchanged since the last post.
Thank you, I read the user-interface part now, looks good to me.
I found the following issue when shutting down a master with a connected replica that uses a physical failover slot:
2016-02-23 20:33:42.546 CET,,,54998,,56ccb3f3.d6d6,3,,2016-02-23 20:33:07 CET,,0,DEBUG,00000,"performing replication slot checkpoint",,,,,,,,,""
2016-02-23 20:33:42.594 CET,,,55002,,56ccb3f3.d6da,4,,2016-02-23 20:33:07 CET,,0,DEBUG,00000,"archived transaction log file ""000000010000000000000003""",,,,,,,,,""
2016-02-23 20:33:42.601 CET,,,54998,,56ccb3f3.d6d6,4,,2016-02-23 20:33:07 CET,,0,PANIC,XX000,"concurrent transaction log activity while database system is shutting down",,,,,,,,,""
2016-02-23 20:33:43.537 CET,,,54995,,56ccb3f3.d6d3,5,,2016-02-23 20:33:07 CET,,0,LOG,00000,"checkpointer process (PID 54998) was terminated by signal 6: Abort trap",,,,,,,,,""
2016-02-23 20:33:43.537 CET,,,54995,,56ccb3f3.d6d3,6,,2016-02-23 20:33:07 CET,,0,LOG,00000,"terminating any other active server processes",,,,,,,,,
Basically, the issue is that CreateCheckPoint calls CheckpointReplicationSlots, which currently produces WAL, and this violates the assumption at line xlog.c:8492
if (shutdown && checkPoint.redo != ProcLastRecPtr)
ereport(PANIC,
(errmsg("concurrent transaction log activity while database system is shutting down")));
There are a couple of incorrect comments:
logical.c: 90
There's some things missing to allow this: I think it should be “There are some things missing to allow this:”
logical.c:93
"we need we would need”
slot.c:889
"and there's no latch to set, so poll” - clearly there is a latch used in the code below.
Also, slot.c:301 emits an error message for an attempt to create a failover slot on the replica after acquiring and releasing the locks and getting the shared memory slot, even though all the data to check for this condition is available right at the beginning of the function. Shouldn’t it avoid the extra work if it’s not needed?
Kind regards,
--
Oleksii