Synchronizing slots from primary to standby
I want to reactivate $subject. I took Petr Jelinek's patch from [0],
rebased it, added a bit of testing. It basically works, but as
mentioned in [0], there are various issues to work out.
The idea is that the standby runs a background worker to periodically
fetch replication slot information from the primary. On failover, a
logical subscriber would then ideally find up-to-date replication slots
on the new publisher and can just continue normally.
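As a rough illustration of the polling idea, here is a toy model in Python. It is not code from the patch: the `Slot` class, its fields, and `sync_slots` are hypothetical names, and the rule that positions only move forward on the standby is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    """Minimal stand-in for a logical replication slot (fields are illustrative)."""
    name: str
    restart_lsn: int
    confirmed_flush_lsn: int


def sync_slots(primary_slots, standby_slots):
    """One polling cycle: copy each primary slot's position to the standby.

    Assumption of this sketch: positions only move forward on the standby,
    so a stale answer cannot move a synced slot backwards.
    """
    for name, src in primary_slots.items():
        dst = standby_slots.get(name)
        if dst is None:
            # Slot not known on the standby yet: create a local copy.
            standby_slots[name] = Slot(name, src.restart_lsn, src.confirmed_flush_lsn)
        else:
            dst.restart_lsn = max(dst.restart_lsn, src.restart_lsn)
            dst.confirmed_flush_lsn = max(dst.confirmed_flush_lsn,
                                          src.confirmed_flush_lsn)
    return standby_slots


primary = {"sub1": Slot("sub1", restart_lsn=100, confirmed_flush_lsn=120)}
standby = {}

sync_slots(primary, standby)                # first cycle creates the slot copy
primary["sub1"].confirmed_flush_lsn = 180   # subscriber confirms more changes
sync_slots(primary, standby)                # next cycle advances the copy
```

After a failover, a subscriber reconnecting to the promoted standby would find `sub1` at the position recorded by the last completed cycle.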
The previous thread didn't have a lot of discussion, but I have gathered
from off-line conversations that there is a wider agreement on this
approach. So the next steps would be to make it more robust and
configurable and documented. As I said, I added a small test case to
show that it works at all, but I think a lot more tests should be added.
I have also found that this breaks some seemingly unrelated tests in
the recovery test suite. I have disabled these here. I'm not sure if
the patch actually breaks anything or if these are just differences in
timing or implementation dependencies. This patch adds a LIST_SLOTS
replication command, but I think this could be replaced with just a
SELECT FROM pg_replication_slots query now. (This patch is originally
older than when you could run SELECT queries over the replication protocol.)
So, again, this isn't anywhere near ready, but there is already a lot
here to gather feedback about how it works, how it should work, how to
configure it, and how it fits into an overall replication and HA
architecture.
[0]: /messages/by-id/3095349b-44d4-bf11-1b33-7eefb585d578@2ndquadrant.com
Attachments:
v1-0001-Synchronize-logical-replication-slots-from-primar.patch (text/plain, +819 -73)
On Sun, Oct 31, 2021 at 7:08 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
I want to reactivate $subject. I took Petr Jelinek's patch from [0],
rebased it, added a bit of testing. It basically works, but as
mentioned in [0], there are various issues to work out.
Thank you for working on this feature!
The idea is that the standby runs a background worker to periodically
fetch replication slot information from the primary. On failover, a
logical subscriber would then ideally find up-to-date replication slots
on the new publisher and can just continue normally.
I have also found that this breaks some seemingly unrelated tests in
the recovery test suite. I have disabled these here. I'm not sure if
the patch actually breaks anything or if these are just differences in
timing or implementation dependencies.
I haven’t looked at the patch deeply but regarding 007_sync_rep.pl,
the tests seem to fail since the tests rely on the order of the wal
sender array on the shared memory. Since a background worker for
synchronizing replication slots periodically connects to the walsender
on the primary and disconnects, it breaks the assumption of the order.
Regarding 010_logical_decoding_timelines.pl, I guess that the patch
breaks the test because the background worker for synchronizing slots
on the replica periodically advances the replica's slot. I think we
need to have a way to disable the slot synchronization or to specify
the slot name to sync with the primary. I'm not sure we already
discussed this topic but I think we need it at least for testing
purposes.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
Hi all,
Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
I want to reactivate $subject. I took Petr Jelinek's patch from [0],
rebased it, added a bit of testing. It basically works, but as
mentioned in [0], there are various issues to work out.
Thanks for working on that topic, I believe it's an important part of
Postgres' HA story.
The idea is that the standby runs a background worker to periodically fetch
replication slot information from the primary. On failover, a logical
subscriber would then ideally find up-to-date replication slots on the new
publisher and can just continue normally.
Is there a case to be made about doing the same thing for physical
replication slots too?
That's what pg_auto_failover [1] does by default: it creates replication
slots on every node for every other node, in a way that a standby
Postgres instance now maintains a replication slot for the primary. This
ensures that after a promotion, the standby knows to retain any and all
WAL segments that the primary might need when rejoining, at pg_rewind
time.
The previous thread didn't have a lot of discussion, but I have gathered
from off-line conversations that there is a wider agreement on this
approach. So the next steps would be to make it more robust and
configurable and documented.
I suppose part of the configuration would then include taking care of
physical slots. Some people might want to turn that off and use the
Postgres 13+ ability to use the remote primary restore_command to fetch
missing WAL files, instead. Well, people who have set up an archiving
system, anyway.
As I said, I added a small test case to
show that it works at all, but I think a lot more tests should be added. I
have also found that this breaks some seemingly unrelated tests in the
recovery test suite. I have disabled these here. I'm not sure if the patch
actually breaks anything or if these are just differences in timing or
implementation dependencies. This patch adds a LIST_SLOTS replication
command, but I think this could be replaced with just a SELECT FROM
pg_replication_slots query now. (This patch is originally older than when
you could run SELECT queries over the replication protocol.)
Given the admitted state of the patch, I didn't focus on tests. I could
successfully apply the patch on-top of current master's branch, and
cleanly compile and `make check`.
Then I also updated pg_auto_failover to support Postgres 15devel [2] so
that I could then `make NODES=3 cluster` there and play with the new
replication command:
$ psql -d "port=5501 replication=1" -c "LIST_SLOTS;"
psql:/Users/dim/.psqlrc:24: ERROR: XX000: cannot execute SQL commands in WAL sender for physical replication
LOCATION: exec_replication_command, walsender.c:1830
...
I'm not too sure about this idea of running SQL in a replication
protocol connection that you're mentioning, but I suppose that's just me
needing to brush up on the topic.
So, again, this isn't anywhere near ready, but there is already a lot here
to gather feedback about how it works, how it should work, how to configure
it, and how it fits into an overall replication and HA architecture.
Maybe the first question about configuration would be about selecting
which slots a standby should maintain from the primary. Is it all of the
slots that exist on both nodes, or a sublist of that?
Is it possible to have a slot with the same name on a primary and a
standby node, in a way that the standby's slot would be a completely
separate entity from the primary's slot? If yes (I just don't know at
the moment), well then, should we continue to allow that?
Other aspects of the configuration might include a list of databases in
which to make the new background worker active, and the polling delay,
etc.
Also, do we want to even consider having the slot management on a
primary node depend on the ability to sync the advancing on one or more
standby nodes? I'm not sure I see that one as a good idea, but maybe we
want to kill it publicly very early then ;-)
Regards,
--
dim
Author of “The Art of PostgreSQL”, see https://theartofpostgresql.com
[1]: https://github.com/citusdata/pg_auto_failover
[2]: https://github.com/citusdata/pg_auto_failover/pull/838
On Sun, Oct 31, 2021 at 3:38 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
I want to reactivate $subject. I took Petr Jelinek's patch from [0],
rebased it, added a bit of testing. It basically works, but as
mentioned in [0], there are various issues to work out.
The idea is that the standby runs a background worker to periodically
fetch replication slot information from the primary. On failover, a
logical subscriber would then ideally find up-to-date replication slots
on the new publisher and can just continue normally.
The previous thread didn't have a lot of discussion, but I have gathered
from off-line conversations that there is a wider agreement on this
approach. So the next steps would be to make it more robust and
configurable and documented. As I said, I added a small test case to
show that it works at all, but I think a lot more tests should be added.
I have also found that this breaks some seemingly unrelated tests in
the recovery test suite. I have disabled these here. I'm not sure if
the patch actually breaks anything or if these are just differences in
timing or implementation dependencies. This patch adds a LIST_SLOTS
replication command, but I think this could be replaced with just a
SELECT FROM pg_replication_slots query now. (This patch is originally
older than when you could run SELECT queries over the replication protocol.)
So, again, this isn't anywhere near ready, but there is already a lot
here to gather feedback about how it works, how it should work, how to
configure it, and how it fits into an overall replication and HA
architecture.
[0]: /messages/by-id/3095349b-44d4-bf11-1b33-7eefb585d578@2ndquadrant.com
Thanks for working on this patch. This feature will be useful as it
avoids manual intervention during the failover.
Here are some thoughts:
1) Instead of a new LIST_SLOTS command, can't we use
READ_REPLICATION_SLOT (slight modifications would be needed to make
it support logical replication slots and to get more information from
the subscriber)?
2) How frequently the new bg worker is going to sync the slot info?
How can it ensure that the latest information exists say when the
subscriber is down/crashed before it picks up the latest slot
information?
3) Instead of the subscriber pulling the slot info, why can't the
publisher (via the walsender or a new bg worker maybe?) push the
latest slot info? I'm not sure we want to add more functionality to
the walsender, if yes, isn't it going to be much simpler?
4) IIUC, the proposal works only for logical replication slots but do
you also see the need for supporting some kind of synchronization of
physical replication slots as well? IMO, we need a better and
consistent way for both types of replication slots. If the walsender
can somehow push the slot info from the primary (for physical
replication slots)/publisher (for logical replication slots) to the
standby/subscribers, this will be a more consistent and simplistic
design. However, I'm not sure if this design is doable at all.
Regards,
Bharath Rupireddy.
3) Instead of the subscriber pulling the slot info, why can't the
publisher (via the walsender or a new bg worker maybe?) push the
latest slot info? I'm not sure we want to add more functionality to
the walsender, if yes, isn't it going to be much simpler?
Standby pulling the information or at least making a first attempt to
connect to the primary is a better design as primary doesn't need to spend
its cycles repeatedly connecting to an unreachable standby. In fact,
primary wouldn't even need to know the followers, for example followers /
log shipping standbys
On Mon, Nov 29, 2021 at 1:48 AM SATYANARAYANA NARLAPURAM
<satyanarlapuram@gmail.com> wrote:
3) Instead of the subscriber pulling the slot info, why can't the
publisher (via the walsender or a new bg worker maybe?) push the
latest slot info? I'm not sure we want to add more functionality to
the walsender, if yes, isn't it going to be much simpler?
Standby pulling the information or at least making a first attempt to
connect to the primary is a better design as primary doesn't need to spend
its cycles repeatedly connecting to an unreachable standby. In fact,
primary wouldn't even need to know the followers, for example followers /
log shipping standbys
My idea was to let the existing walsender from the primary/publisher
to send the slot info (both logical and physical replication slots) to
the standby/subscriber, probably by piggybacking the slot info with
the WAL currently it sends. Having said that, I don't know the
feasibility of it. Anyways, I'm not in favour of having a new bg
worker to just ship the slot info. The standby/subscriber, while
making connection to primary/publisher, can choose to get the
replication slot info.
As I said upthread, the problem I see with standby/subscriber pulling
the info is that: how frequently the standby/subscriber is going to
sync the slot info from primary/publisher? How can it ensure that the
latest information exists say when the subscriber is down/crashed
before it picks up the latest slot information?
IIUC, the initial idea proposed in this patch deals with only logical
replication slots not the physical replication slots, what I'm
thinking is to have a generic way to deal with both of them.
Note: In the above description, I used primary-standby and
publisher-subscriber to represent the physical and logical replication
slots respectively.
Regards,
Bharath Rupireddy.
On Mon, Nov 29, 2021 at 9:40 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
On Mon, Nov 29, 2021 at 1:48 AM SATYANARAYANA NARLAPURAM
<satyanarlapuram@gmail.com> wrote:
3) Instead of the subscriber pulling the slot info, why can't the
publisher (via the walsender or a new bg worker maybe?) push the
latest slot info? I'm not sure we want to add more functionality to
the walsender, if yes, isn't it going to be much simpler?
Standby pulling the information or at least making a first attempt to
connect to the primary is a better design as primary doesn't need to spend
its cycles repeatedly connecting to an unreachable standby. In fact,
primary wouldn't even need to know the followers, for example followers /
log shipping standbys
My idea was to let the existing walsender from the primary/publisher
to send the slot info (both logical and physical replication slots) to
the standby/subscriber, probably by piggybacking the slot info with
the WAL currently it sends. Having said that, I don't know the
feasibility of it. Anyways, I'm not in favour of having a new bg
worker to just ship the slot info. The standby/subscriber, while
making connection to primary/publisher, can choose to get the
replication slot info.
I think it is possible that the standby is restoring the WAL directly
from the archive location and there might not be any wal sender at
time. So I think the idea of standby pulling the WAL looks better to
me.
As I said upthread, the problem I see with standby/subscriber pulling
the info is that: how frequently the standby/subscriber is going to
sync the slot info from primary/publisher? How can it ensure that the
latest information exists say when the subscriber is down/crashed
before it picks up the latest slot information?
Yeah, that is a good question: how frequently should the subscriber
fetch the slot information? I think that should be a configurable
value. The longer the delay, the greater the chance of losing the
latest slot information.
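To put rough numbers on that trade-off, here is a small back-of-the-envelope model in Python. It assumes the standby polls at fixed multiples of the interval and that the slot on the primary advances at known times; `advances_seen_by_standby` and its parameters are made up for illustration and do not come from the patch.

```python
def advances_seen_by_standby(advance_times, sync_interval, failover_time):
    """Count the slot advances the standby has copied by failover_time.

    The standby polls at sync_interval, 2 * sync_interval, and so on, so it
    only knows about advances made up to its last poll before the failover.
    """
    last_poll = (failover_time // sync_interval) * sync_interval
    return sum(1 for t in advance_times if t <= last_poll)


# The slot on the primary advances once per second, from t=1 to t=59,
# and the primary fails at t=59.
advance_times = list(range(1, 60))
failover_time = 59

missed_with_5s = len(advance_times) - advances_seen_by_standby(
    advance_times, 5, failover_time)
missed_with_30s = len(advance_times) - advances_seen_by_standby(
    advance_times, 30, failover_time)
```

In this model a 5 second interval loses the last 4 advances, while a 30 second interval loses the last 29. The subscriber would then have to cope with a slot on the promoted standby that is that far behind.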
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Mon, Nov 29, 2021 at 11:14 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Mon, Nov 29, 2021 at 9:40 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
On Mon, Nov 29, 2021 at 1:48 AM SATYANARAYANA NARLAPURAM
<satyanarlapuram@gmail.com> wrote:
3) Instead of the subscriber pulling the slot info, why can't the
publisher (via the walsender or a new bg worker maybe?) push the
latest slot info? I'm not sure we want to add more functionality to
the walsender, if yes, isn't it going to be much simpler?
Standby pulling the information or at least making a first attempt to
connect to the primary is a better design as primary doesn't need to spend
its cycles repeatedly connecting to an unreachable standby. In fact,
primary wouldn't even need to know the followers, for example followers /
log shipping standbys
My idea was to let the existing walsender from the primary/publisher
to send the slot info (both logical and physical replication slots) to
the standby/subscriber, probably by piggybacking the slot info with
the WAL currently it sends. Having said that, I don't know the
feasibility of it. Anyways, I'm not in favour of having a new bg
worker to just ship the slot info. The standby/subscriber, while
making connection to primary/publisher, can choose to get the
replication slot info.
I think it is possible that the standby is restoring the WAL directly
from the archive location and there might not be any wal sender at
time. So I think the idea of standby pulling the WAL looks better to
me.
My point was that why can't we let the walreceiver (of course users
can configure it on the standby/subscriber) to choose whether or not
to receive the replication (both physical and logical) slot info from
the primary/publisher and if yes, the walsender(on the
primary/publisher) sending it probably as a new WAL record or just
piggybacking the replication slot info with any of the existing WAL
records.
Or simply a common bg worker (as opposed to the bg worker proposed
originally in this thread which, IIUC, works for logical replication)
running on standby/subscriber for getting both the physical and
logical replication slots info.
As I said upthread, the problem I see with standby/subscriber pulling
the info is that: how frequently the standby/subscriber is going to
sync the slot info from primary/publisher? How can it ensure that the
latest information exists say when the subscriber is down/crashed
before it picks up the latest slot information?
Yeah, that is a good question: how frequently should the subscriber
fetch the slot information? I think that should be a configurable
value. The longer the delay, the greater the chance of losing the
latest slot information.
I agree that it should be configurable. Even if the primary/publisher
is down/crashed, one can still compare the latest slot info from both
the primary/publisher and standby/subscriber using a new tool
pg_replslotdata proposed at [1] and see how far and which slots missed
the latest replication slot info and probably drop those alone to
recreate and retain other slots as is.
[1]: /messages/by-id/CALj2ACW0rV5gWK8A3m6_X62qH+Vfaq5hznC=i0R5Wojt5+yhyw@mail.gmail.com
Regards,
Bharath Rupireddy.
On Mon, Nov 29, 2021 at 12:19 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
On Mon, Nov 29, 2021 at 11:14 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Mon, Nov 29, 2021 at 9:40 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
On Mon, Nov 29, 2021 at 1:48 AM SATYANARAYANA NARLAPURAM
<satyanarlapuram@gmail.com> wrote:
3) Instead of the subscriber pulling the slot info, why can't the
publisher (via the walsender or a new bg worker maybe?) push the
latest slot info? I'm not sure we want to add more functionality to
the walsender, if yes, isn't it going to be much simpler?
Standby pulling the information or at least making a first attempt to
connect to the primary is a better design as primary doesn't need to spend
its cycles repeatedly connecting to an unreachable standby. In fact,
primary wouldn't even need to know the followers, for example followers /
log shipping standbys
My idea was to let the existing walsender from the primary/publisher
to send the slot info (both logical and physical replication slots) to
the standby/subscriber, probably by piggybacking the slot info with
the WAL currently it sends. Having said that, I don't know the
feasibility of it. Anyways, I'm not in favour of having a new bg
worker to just ship the slot info. The standby/subscriber, while
making connection to primary/publisher, can choose to get the
replication slot info.
I think it is possible that the standby is restoring the WAL directly
from the archive location and there might not be any wal sender at
time. So I think the idea of standby pulling the WAL looks better to
me.
My point was that why can't we let the walreceiver (of course users
can configure it on the standby/subscriber) to choose whether or not
to receive the replication (both physical and logical) slot info from
the primary/publisher and if yes, the walsender(on the
primary/publisher) sending it probably as a new WAL record or just
piggybacking the replication slot info with any of the existing WAL
records.
Okay, I thought your point was that the primary pushing is better than
the standby pulling the slot info, but now it seems that you also agree
that standby pulling is better, right? Now it appears your point is
about whether we use the same connection for pulling the slot
information that we are using for streaming the data, or a separate
connection. In this patch we are also creating a replication
connection and pulling the slot information over it; the only difference
is that we start a separate worker for pulling the slot information,
and I think that approach is better as it will not impact the
performance of the other replication connection, which we are using for
communicating the data.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Mon, Nov 29, 2021 at 1:10 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Mon, Nov 29, 2021 at 12:19 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
On Mon, Nov 29, 2021 at 11:14 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Mon, Nov 29, 2021 at 9:40 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
On Mon, Nov 29, 2021 at 1:48 AM SATYANARAYANA NARLAPURAM
<satyanarlapuram@gmail.com> wrote:
3) Instead of the subscriber pulling the slot info, why can't the
publisher (via the walsender or a new bg worker maybe?) push the
latest slot info? I'm not sure we want to add more functionality to
the walsender, if yes, isn't it going to be much simpler?
Standby pulling the information or at least making a first attempt to
connect to the primary is a better design as primary doesn't need to spend
its cycles repeatedly connecting to an unreachable standby. In fact,
primary wouldn't even need to know the followers, for example followers /
log shipping standbys
My idea was to let the existing walsender from the primary/publisher
to send the slot info (both logical and physical replication slots) to
the standby/subscriber, probably by piggybacking the slot info with
the WAL currently it sends. Having said that, I don't know the
feasibility of it. Anyways, I'm not in favour of having a new bg
worker to just ship the slot info. The standby/subscriber, while
making connection to primary/publisher, can choose to get the
replication slot info.
I think it is possible that the standby is restoring the WAL directly
from the archive location and there might not be any wal sender at
time. So I think the idea of standby pulling the WAL looks better to
me.
My point was that why can't we let the walreceiver (of course users
can configure it on the standby/subscriber) to choose whether or not
to receive the replication (both physical and logical) slot info from
the primary/publisher and if yes, the walsender(on the
primary/publisher) sending it probably as a new WAL record or just
piggybacking the replication slot info with any of the existing WAL
records.
Okay, I thought your point was that the primary pushing is better than
the standby pulling the slot info, but now it seems that you also agree
that standby pulling is better, right? Now it appears your point is
about whether we use the same connection for pulling the slot
information that we are using for streaming the data, or a separate
connection. In this patch we are also creating a replication
connection and pulling the slot information over it; the only difference
is that we start a separate worker for pulling the slot information,
and I think that approach is better as it will not impact the
performance of the other replication connection, which we are using for
communicating the data.
The easiest way to implement this feature, so far, is to use a common
bg worker (as opposed to the bg worker proposed originally in this
thread which, IIUC, works only for logical replication) running on the
standby (in case of streaming replication with physical replication
slots) or subscriber (in case of logical replication with logical
replication slots) for getting both the physical and logical
replication slot info from the primary or publisher. This bg worker
requires at least two GUCs: 1) one to enable/disable the worker, and
2) one to define the slot sync interval (the bg worker fetches the slot
info once every sync interval).
Thoughts?
Regards,
Bharath Rupireddy.
On 31.10.21 11:08, Peter Eisentraut wrote:
I want to reactivate $subject. I took Petr Jelinek's patch from [0],
rebased it, added a bit of testing. It basically works, but as
mentioned in [0], there are various issues to work out.
The idea is that the standby runs a background worker to periodically
fetch replication slot information from the primary. On failover, a
logical subscriber would then ideally find up-to-date replication slots
on the new publisher and can just continue normally.
So, again, this isn't anywhere near ready, but there is already a lot
here to gather feedback about how it works, how it should work, how to
configure it, and how it fits into an overall replication and HA
architecture.
Here is an updated patch. The main changes are that I added two
configuration parameters. The first, synchronize_slot_names, is set on
the physical standby to specify which slots to sync from the primary.
By default, it is empty. (This also fixes the recovery test failures
that I had to disable in the previous patch version.) The second,
standby_slot_names, is set on the primary. It holds back logical
replication until the listed physical standbys have caught up. That
way, when failover is necessary, the promoted standby is not behind the
logical replication consumers.
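The effect of standby_slot_names can be pictured with a small Python model. This is only a sketch of the behaviour described above, not the patch's code: `logical_send_limit`, the dictionary of flushed positions, and the treatment of a missing slot as position 0 are all assumptions made for illustration.

```python
def logical_send_limit(wal_end_lsn, slot_flush_lsn, standby_slot_names):
    """Highest LSN a logical walsender may send to its consumers.

    wal_end_lsn: how far WAL has been written on the primary.
    slot_flush_lsn: flushed position per physical standby slot.
    standby_slot_names: slots that must catch up first (may be empty).
    """
    if not standby_slot_names:
        # Nothing configured: logical replication is not held back.
        return wal_end_lsn
    # Assumption: a listed slot that does not exist counts as position 0,
    # that is, it holds logical replication back completely.
    oldest = min(slot_flush_lsn.get(name, 0) for name in standby_slot_names)
    return min(wal_end_lsn, oldest)


flushed = {"standby1": 500, "standby2": 450}

both = logical_send_limit(600, flushed, ["standby1", "standby2"])
unset = logical_send_limit(600, flushed, [])
missing = logical_send_limit(600, flushed, ["standby1", "no_such_slot"])
```

With both standbys listed, logical consumers are held at the position of the slowest one, so whichever standby is promoted is not behind them. The last call also shows why a wrong entry in the list is dangerous: it stalls logical replication entirely in this model.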
In principle, this works now, I think. I haven't made much progress in
creating more test cases for this; that's something that needs more
attention.
It's worth pondering what the configuration language for
standby_slot_names should be. Right now, it's just a list of slots that
all need to be caught up. More complicated setups are conceivable.
Maybe you have standbys S1 and S2 that are potential failover targets
for logical replication consumers L1 and L2, and also standbys S3 and S4
that are potential failover targets for logical replication consumers L3
and L4. Viewed like that, this setting could be a replication slot
setting. The setting might also have some relationship with
synchronous_standby_names. Like, if you have synchronous_standby_names
set, then that's a pretty good indication that you also want some or all
of those standbys in standby_slot_names. (But note that one is slots
and one is application names.) So there are a variety of possibilities.
Attachments:
v2-0001-Synchronize-logical-replication-slots-from-primar.patch (text/plain, +1148 -71)
On 24.11.21 07:11, Masahiko Sawada wrote:
I haven’t looked at the patch deeply but regarding 007_sync_rep.pl,
the tests seem to fail since the tests rely on the order of the wal
sender array on the shared memory. Since a background worker for
synchronizing replication slots periodically connects to the walsender
on the primary and disconnects, it breaks the assumption of the order.
Regarding 010_logical_decoding_timelines.pl, I guess that the patch
breaks the test because the background worker for synchronizing slots
on the replica periodically advances the replica's slot. I think we
need to have a way to disable the slot synchronization or to specify
the slot name to sync with the primary. I'm not sure we already
discussed this topic but I think we need it at least for testing
purposes.
This has been addressed by patch v2 that adds such a setting.
On 24.11.21 17:25, Dimitri Fontaine wrote:
Is there a case to be made about doing the same thing for physical
replication slots too?
It has been considered. At the moment, I'm not doing it, because it
would add more code and complexity and it's not that important. But it
could be added in the future.
Given the admitted state of the patch, I didn't focus on tests. I could
successfully apply the patch on-top of current master's branch, and
cleanly compile and `make check`.
Then I also updated pg_auto_failover to support Postgres 15devel [2] so
that I could then `make NODES=3 cluster` there and play with the new
replication command:
$ psql -d "port=5501 replication=1" -c "LIST_SLOTS;"
psql:/Users/dim/.psqlrc:24: ERROR: XX000: cannot execute SQL commands in WAL sender for physical replication
LOCATION: exec_replication_command, walsender.c:1830
...
I'm not too sure about this idea of running SQL in a replication
protocol connection that you're mentioning, but I suppose that's just me
needing to brush up on the topic.
FWIW, the way the replication command parser works, if there is a parse
error, it tries to interpret the command as a plain SQL command. But
that only works for logical replication connections. So in physical
replication, if you try to run anything that does not parse, you will
get this error. But that has nothing to do with this feature. The
above command works for me, so maybe something else went wrong in your
situation.
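The fallback described here can be modelled in a few lines of Python. This is a deliberately simplified sketch: a keyword lookup stands in for the real replication grammar, the command set is an illustrative subset, and the function signature is hypothetical. Only the error text is taken from the output quoted above.

```python
# Illustrative subset of replication commands; LIST_SLOTS is the command
# added by the patch under discussion.
REPLICATION_COMMANDS = {"IDENTIFY_SYSTEM", "START_REPLICATION", "LIST_SLOTS"}


def exec_replication_command(command, logical_connection):
    """Classify a command the way the fallback logic is described.

    A keyword lookup stands in for the replication command parser here.
    """
    words = command.strip().rstrip(";").split()
    keyword = words[0].upper() if words else ""
    if keyword in REPLICATION_COMMANDS:
        return "replication command"
    # "Parse error": fall back to treating the input as plain SQL, which
    # only a logical replication connection can execute.
    if logical_connection:
        return "plain SQL"
    raise RuntimeError(
        "cannot execute SQL commands in WAL sender for physical replication"
    )


print(exec_replication_command("LIST_SLOTS;", logical_connection=False))
print(exec_replication_command("SELECT 1;", logical_connection=True))
```

In this model, anything that is not recognized as a replication command and arrives over a physical replication connection ends in the error shown in the report, regardless of whether LIST_SLOTS itself is supported.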
Maybe the first question about configuration would be about selecting
which slots a standby should maintain from the primary. Is it all of the
slots that exist on both nodes, or a sublist of that?
Is it possible to have a slot with the same name on a primary and a
standby node, in a way that the standby's slot would be a completely
separate entity from the primary's slot? If yes (I just don't know at
the moment), well then, should we continue to allow that?
This has been added in v2.
Also, do we want to even consider having the slot management on a
primary node depend on the ability to sync the advancing on one or more
standby nodes? I'm not sure I see that one as a good idea, but maybe we
want to kill it publicly very early then ;-)
I don't know what you mean by this.
On 28.11.21 07:52, Bharath Rupireddy wrote:
1) Instead of a new LIST_SLOTS command, can't we use
READ_REPLICATION_SLOT (slight modifications would be needed to make
it support logical replication slots and to get more information from
the subscriber)?
I looked at that but didn't see an obvious way to consolidate them.
This is something we could look at again later.
2) How frequently the new bg worker is going to sync the slot info?
How can it ensure that the latest information exists say when the
subscriber is down/crashed before it picks up the latest slot
information?
The interval is currently hardcoded, but could be a configuration
setting. In the v2 patch, there is a new setting that orders physical
replication before logical so that the logical subscribers cannot get
ahead of the physical standby.
3) Instead of the subscriber pulling the slot info, why can't the
publisher (via the walsender or a new bg worker maybe?) push the
latest slot info? I'm not sure we want to add more functionality to
the walsender, if yes, isn't it going to be much simpler?
This sounds like the failover slot feature, which was rejected.
Hello,
I started taking a brief look at the v2 patch, and it does appear to work for the basic case. The logical slot is synchronized across, and I can connect to the promoted standby and stream changes afterwards.
It's not clear to me what the correct behavior is when a logical slot has been synced to the replica and then gets deleted on the writer. Would we expect this to be propagated, or leave it up to the end user to manage?
+ rawname = pstrdup(standby_slot_names);
+ SplitIdentifierString(rawname, ',', &namelist);
+
+ while (true)
+ {
+     int wait_slots_remaining;
+     XLogRecPtr oldest_flush_pos = InvalidXLogRecPtr;
+     int rc;
+
+     wait_slots_remaining = list_length(namelist);
+
+     LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
+     for (int i = 0; i < max_replication_slots; i++)
+     {
Even though standby_slot_names is PGC_SIGHUP, we never reload/re-process the value. If we have a wrong entry in there, the backend becomes stuck until we re-establish the logical connection. Adding "postmaster/interrupt.h" with ConfigReloadPending / ProcessConfigFile does seem to work.
Another thing I noticed is that once it starts waiting in this block, Ctrl+C doesn't seem to terminate the backend?
pg_recvlogical -d postgres -p 5432 --slot regression_slot --start -f -
..
^Cpg_recvlogical: error: unexpected termination of replication stream:
The logical backend connection is still present:
ps aux | grep 51263
hsuchen 51263 80.7 0.0 320180 14304 ? Rs 01:11 3:04 postgres: walsender hsuchen [local] START_REPLICATION
pstack 51263
#0 0x00007ffee99e79a5 in clock_gettime ()
#1 0x00007f8705e88246 in clock_gettime () from /lib64/libc.so.6
#2 0x000000000075f141 in WaitEventSetWait ()
#3 0x000000000075f565 in WaitLatch ()
#4 0x0000000000720aea in ReorderBufferProcessTXN ()
#5 0x00000000007142a6 in DecodeXactOp ()
#6 0x000000000071460f in LogicalDecodingProcessRecord ()
It can be terminated with a pg_terminate_backend though.
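The two observations above (a stale standby_slot_names and a loop that ignores Ctrl+C) can be illustrated with a small standalone model of such a wait loop. This is only a sketch and not code from the patch: reload_pending and cancel_pending are stand-ins for ConfigReloadPending and for the state that CHECK_FOR_INTERRUPTS() tests in the backend, configured_names stands in for the GUC, and wait_sketch and fake_caught_up are hypothetical names made up for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Stand-ins for backend state; all of these are assumptions of the sketch. */
static volatile int reload_pending;        /* models ConfigReloadPending */
static volatile int cancel_pending;        /* models a pending cancel/terminate */
static const char *configured_names = "";  /* models the standby_slot_names GUC */

typedef enum { WAIT_CAUGHT_UP, WAIT_CANCELLED, WAIT_TIMED_OUT } WaitResult;

static WaitResult
wait_sketch(bool (*caught_up)(const char *names), int max_polls)
{
    char names[128];

    /* Read the setting once before the loop, as the quoted hunk does. */
    snprintf(names, sizeof(names), "%s", configured_names);

    for (int poll = 0; poll < max_polls; poll++)
    {
        /* React to cancel requests on every iteration. */
        if (cancel_pending)
            return WAIT_CANCELLED;

        /* After a reload request, re-read the setting instead of
         * continuing with the stale copy. */
        if (reload_pending)
        {
            reload_pending = 0;
            snprintf(names, sizeof(names), "%s", configured_names);
        }

        if (caught_up(names))
            return WAIT_CAUGHT_UP;

        /* A real backend would sleep here, e.g. with WaitLatch(). */
    }
    return WAIT_TIMED_OUT;
}

/* Test double: only the list "standby1" is ever satisfied; on the second
 * poll it simulates an operator fixing the setting and sending SIGHUP. */
static int polls_seen;

static bool
fake_caught_up(const char *names)
{
    if (++polls_seen == 2)
    {
        configured_names = "standby1";
        reload_pending = 1;
    }
    return strcmp(names, "standby1") == 0;
}
```

With configured_names initially set to a wrong entry, the loop only makes progress because it re-reads the setting once the reload flag is raised; without that branch it would spin until max_polls.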
If we have a physical slot named foo on the standby, and a logical slot is then created on the writer with the same slot_name, it does error out on the replica. It also prevents other slots from being synchronized, which is probably fine.
2021-12-16 02:10:29.709 UTC [73788] LOG: replication slot synchronization worker for database "postgres" has started
2021-12-16 02:10:29.713 UTC [73788] ERROR: cannot use physical replication slot for logical decoding
2021-12-16 02:10:29.714 UTC [73037] DEBUG: unregistering background worker "replication slot synchronization worker"
Here is an updated patch to fix some build failures. No feature changes.
On 14.12.21 23:12, Peter Eisentraut wrote:
On 31.10.21 11:08, Peter Eisentraut wrote:
I want to reactivate $subject. I took Petr Jelinek's patch from [0],
rebased it, added a bit of testing. It basically works, but as
mentioned in [0], there are various issues to work out.
The idea is that the standby runs a background worker to periodically
fetch replication slot information from the primary. On failover, a
logical subscriber would then ideally find up-to-date replication
slots on the new publisher and can just continue normally.
So, again, this isn't anywhere near ready, but there is already a lot
here to gather feedback about how it works, how it should work, how to
configure it, and how it fits into an overall replication and HA
architecture.
Here is an updated patch. The main changes are that I added two
configuration parameters. The first, synchronize_slot_names, is set on
the physical standby to specify which slots to sync from the primary. By
default, it is empty. (This also fixes the recovery test failures that
I had to disable in the previous patch version.) The second,
standby_slot_names, is set on the primary. It holds back logical
replication until the listed physical standbys have caught up. That
way, when failover is necessary, the promoted standby is not behind the
logical replication consumers.
In principle, this works now, I think. I haven't made much progress in
creating more test cases for this; that's something that needs more
attention.
It's worth pondering what the configuration language for
standby_slot_names should be. Right now, it's just a list of slots that
all need to be caught up. More complicated setups are conceivable.
Maybe you have standbys S1 and S2 that are potential failover targets
for logical replication consumers L1 and L2, and also standbys S3 and S4
that are potential failover targets for logical replication consumers L3
and L4. Viewed like that, this setting could be a replication slot
setting. The setting might also have some relationship with
synchronous_standby_names. Like, if you have synchronous_standby_names
set, then that's a pretty good indication that you also want some or all
of those standbys in standby_slot_names. (But note that one is slots
and one is application names.) So there are a variety of possibilities.
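As a rough illustration of the "list of slots that all need to be caught up" semantics described above, the check could be modeled as follows. This is a simplified standalone sketch, not the patch's implementation: Lsn stands in for XLogRecPtr, and SlotInfo, its flushed_upto field, standbys_caught_up and the example arrays are hypothetical names introduced only for this example. It assumes slot names are unique in both lists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t Lsn;            /* stand-in for XLogRecPtr */

typedef struct SlotInfo
{
    const char *name;
    Lsn         flushed_upto;    /* hypothetical: how far this standby has confirmed */
} SlotInfo;

/* Returns true only if every name in wanted[] matches a slot that has
 * confirmed at least up to target. A name with no matching slot keeps the
 * result false forever, so logical replication would stay held back. */
static bool
standbys_caught_up(const SlotInfo *slots, size_t nslots,
                   const char *const *wanted, size_t nwanted,
                   Lsn target)
{
    size_t remaining = nwanted;

    for (size_t i = 0; i < nslots; i++)
        for (size_t j = 0; j < nwanted; j++)
            if (strcmp(slots[i].name, wanted[j]) == 0 &&
                slots[i].flushed_upto >= target)
                remaining--;

    return remaining == 0;
}

/* Example data for trying the function out. */
static const SlotInfo example_slots[] = {
    {"s1", 500}, {"s2", 300}, {"other", 100}
};
static const char *const example_wanted[] = {"s1", "s2"};
static const char *const example_missing[] = {"s1", "no_such_slot"};
```

A more expressive configuration language, such as the per-consumer grouping mentioned above, would replace the flat wanted[] list with some richer structure; the sketch only covers the all-must-confirm case.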
Attachments:
v3-0001-Synchronize-logical-replication-slots-from-primar.patch (text/plain; charset=UTF-8; +1146 -71)
On Wed, Dec 15, 2021 at 7:13 AM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
On 31.10.21 11:08, Peter Eisentraut wrote:
I want to reactivate $subject. I took Petr Jelinek's patch from [0],
rebased it, added a bit of testing. It basically works, but as
mentioned in [0], there are various issues to work out.
The idea is that the standby runs a background worker to periodically
fetch replication slot information from the primary. On failover, a
logical subscriber would then ideally find up-to-date replication slots
on the new publisher and can just continue normally.
So, again, this isn't anywhere near ready, but there is already a lot
here to gather feedback about how it works, how it should work, how to
configure it, and how it fits into an overall replication and HA
architecture.
The second,
standby_slot_names, is set on the primary. It holds back logical
replication until the listed physical standbys have caught up. That
way, when failover is necessary, the promoted standby is not behind the
logical replication consumers.
I might be missing something but isn’t it okay even if the new primary
server is behind the subscribers? IOW, even if two slot's LSNs (i.e.,
restart_lsn and confirm_flush_lsn) are behind the subscriber's remote
LSN (i.e., pg_replication_origin.remote_lsn), the primary sends only
transactions that were committed after the remote_lsn. So the
subscriber can resume logical replication with the new primary without
any data loss.
The new primary should not be ahead of the subscribers because it
forwards the logical replication start LSN to the slot’s
confirm_flush_lsn in this case. But it cannot happen since the remote
LSN of the subscriber’s origin is always updated first, then the
confirm_flush_lsn of the slot on the primary is updated, and then the
confirm_flush_lsn of the slot on the standby is synchronized.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
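The argument in the message above rests on how the streaming start position is chosen. A simplified model of that rule, written as a standalone sketch rather than taken from the server source, is shown below; Lsn, effective_start_lsn and would_skip_changes are hypothetical names, and the LSNs are treated as plain integers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t Lsn;   /* stand-in for XLogRecPtr */

/* The subscriber asks to start at its origin's remote LSN. If the slot's
 * confirmed flush position is already past that point, streaming starts
 * from the slot's position instead (the start LSN is forwarded). */
static Lsn
effective_start_lsn(Lsn subscriber_remote_lsn, Lsn slot_confirmed_flush)
{
    return subscriber_remote_lsn < slot_confirmed_flush
        ? slot_confirmed_flush
        : subscriber_remote_lsn;
}

/* Changes between the subscriber's position and the slot's position would
 * be skipped only if the slot were ahead of the subscriber. */
static bool
would_skip_changes(Lsn subscriber_remote_lsn, Lsn slot_confirmed_flush)
{
    return slot_confirmed_flush > subscriber_remote_lsn;
}
```

In this model a slot that lags behind the subscriber (for example slot at 80, subscriber at 100) is harmless, because streaming resumes at 100. Only a slot that is ahead of the subscriber would lose changes, which is the case the message argues cannot arise given the order in which the positions are updated.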
I might be missing something but isn’t it okay even if the new primary
server is behind the subscribers? IOW, even if two slot's LSNs (i.e.,
restart_lsn and confirm_flush_lsn) are behind the subscriber's remote
LSN (i.e., pg_replication_origin.remote_lsn), the primary sends only
transactions that were committed after the remote_lsn. So the
subscriber can resume logical replication with the new primary without
any data loss.
Maybe I'm misreading, but I thought the purpose of this is to make
sure that the logical subscriber does not have data that has not been
replicated to the new primary. The use-case I can think of would be
if synchronous_commit were enabled and fail-over occurs. If
we didn't have this set, isn't it possible that this logical subscriber
has extra commits that aren't present on the newly promoted primary?
And sorry I accidentally started a new thread in my last reply.
Re-pasting some of my previous questions/comments:
wait_for_standby_confirmation does not update standby_slot_names once it's
in a loop and can't be fixed with SIGHUP. Similarly, synchronize_slot_names
isn't updated once the worker is launched.
If a logical slot was dropped on the writer, should the worker drop logical
slots that it was previously synchronizing but are no longer present? Or
should we leave that to the user to manage? I'm trying to think why users
would want to sync logical slots to a reader but not have that be dropped
as well if it's no longer present.
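If the answer to the question above were "yes, the worker should drop them", the core of that step would be a set difference between the slots the standby has synced and the slots the primary still reports. The following standalone sketch shows only that comparison; it is not from the patch, and find_stale_slots and the example arrays are hypothetical names. Whether and how the worker should then actually drop the slots is left open, as in the question.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts the names in local[] that do not appear in remote[]. The first
 * stale_cap of them are stored in stale[] as candidates for dropping. */
static size_t
find_stale_slots(const char *const *local, size_t nlocal,
                 const char *const *remote, size_t nremote,
                 const char **stale, size_t stale_cap)
{
    size_t nstale = 0;

    for (size_t i = 0; i < nlocal; i++)
    {
        int found = 0;

        for (size_t j = 0; j < nremote; j++)
        {
            if (strcmp(local[i], remote[j]) == 0)
            {
                found = 1;
                break;
            }
        }

        if (!found)
        {
            if (nstale < stale_cap)
                stale[nstale] = local[i];
            nstale++;
        }
    }
    return nstale;
}

/* Example data: sub_b was synced earlier but is gone on the primary. */
static const char *const synced_on_standby[] = {"sub_a", "sub_b", "sub_c"};
static const char *const listed_by_primary[] = {"sub_a", "sub_c"};
static const char *stale_out[3];
```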
Is there a reason we're deciding to use one-worker syncing per database
instead of one general worker that syncs across all the databases?
I imagine I'm missing something obvious here.
As for how standby_slot_names should be configured, I'd prefer the
flexibility similar to what we have for synchronous_standby_names since
that seems the most analogous. It'd provide flexibility for failovers,
which I imagine is the most common use-case.
On Sat, Jan 22, 2022 at 4:33 AM Hsu, John <hsuchen@amazon.com> wrote:
I might be missing something but isn’t it okay even if the new primary
server is behind the subscribers? IOW, even if two slot's LSNs (i.e.,
restart_lsn and confirm_flush_lsn) are behind the subscriber's remote
LSN (i.e., pg_replication_origin.remote_lsn), the primary sends only
transactions that were committed after the remote_lsn. So the
subscriber can resume logical replication with the new primary without
any data loss.
Maybe I'm misreading, but I thought the purpose of this is to make
sure that the logical subscriber does not have data that has not been
replicated to the new primary. The use-case I can think of would be
if synchronous_commit were enabled and fail-over occurs. If
we didn't have this set, isn't it possible that this logical subscriber
has extra commits that aren't present on the newly promoted primary?
This is very much possible if the new primary used to be an asynchronous
standby. But, it seems like the current patch is trying to hold the
logical replication until the data has been replicated to the physical
standby when standby_slot_names is set. This will ensure that the
logical subscriber is never ahead of the new primary. However, AFAIU
that's not the primary use-case of this patch; instead this is to
ensure that the logical subscribers continue getting data from the new
primary when the failover occurs.
If a logical slot was dropped on the writer, should the worker drop logical
slots that it was previously synchronizing but are no longer present? Or
should we leave that to the user to manage? I'm trying to think why users
would want to sync logical slots to a reader but not have that be dropped
as well if it's no longer present.
AFAIU this should be taken care of by the background worker used to
synchronize the replication slot.
--
With Regards,
Ashutosh Sharma.
Hi,
On 2022-01-03 14:46:52 +0100, Peter Eisentraut wrote:
From ec00dc6ab8bafefc00e9b1c78ac9348b643b8a87 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Mon, 3 Jan 2022 14:43:36 +0100
Subject: [PATCH v3] Synchronize logical replication slots from primary to
standby
I've just skimmed the patch and the related threads. As far as I can tell this
cannot be safely used without the conflict handling in [1], is that correct?
Greetings,
Andres Freund
[1]: /messages/by-id/CA+TgmoZd-JqNL1-R3RJ0jQRD+-dc94X0nPJgh+dwdDF0rFuE3g@mail.gmail.com