Doc: fix the note related to the GUC "synchronized_standby_slots"
Hi,
When I read the following documentation related to "synchronized_standby_slots", I understood it to mean that data loss cannot occur with synchronous physical replication. However, that understanding is incorrect (see reproduce.txt).
Note that in the case of asynchronous replication, there remains a risk of data loss for transactions committed on the former primary server but have yet to be replicated to the new primary server.
https://www.postgresql.org/docs/17/logical-replication-failover.html
Am I missing something? If my understanding is correct, could you change the documentation as suggested in the attached patch? I also believe it would be better to move the sentence to the next paragraph, because the note is related to "synchronized_standby_slots".
Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION
Attachments:
v1-0001-fix-documentation-related-to-synchronized_standby.patch (application/octet-stream)
From b48c68914b687c150447e8e2c382374d754a20b5 Mon Sep 17 00:00:00 2001
From: Masahiro Ikeda <Masahiro.Ikeda@nttdata.com>
Date: Mon, 26 Aug 2024 16:42:40 +0900
Subject: [PATCH v1] fix documentation related to synchronized_standby_slots
---
doc/src/sgml/logical-replication.sgml | 17 ++++++++---------
1 file changed, 8 insertions(+), 9 deletions(-)
diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index bee7e02983b..a355ad34275 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -701,18 +701,17 @@ ALTER SUBSCRIPTION
<link linkend="sql-createsubscription-params-with-failover"><literal>failover</literal></link>
parameter ensures a seamless transition of those subscriptions after the
standby is promoted. They can continue subscribing to publications on the
- new primary server without losing data. Note that in the case of
- asynchronous replication, there remains a risk of data loss for transactions
- committed on the former primary server but have yet to be replicated to the new
- primary server.
+ new primary server without losing data.
</para>
<para>
- Because the slot synchronization logic copies asynchronously, it is
- necessary to confirm that replication slots have been synced to the standby
- server before the failover happens. To ensure a successful failover, the
- standby server must be ahead of the subscriber. This can be achieved by
- configuring
+ Note that there remains a risk of data loss for transactions committed on the
+ former primary server but have yet to be replicated to the new primary server even
+ in the case of synchronous physical replication. Because the slot synchronization
+ logic copies asynchronously, it is necessary to confirm that replication slots
+ have been synced to the standby server before the failover happens. To ensure a
+ successful failover, the standby server must be ahead of the subscriber. This
+ can be achieved by configuring
<link linkend="guc-synchronized-standby-slots"><varname>synchronized_standby_slots</varname></link>.
</para>
--
2.34.1
On Mon, Aug 26, 2024 at 1:30 PM <Masahiro.Ikeda@nttdata.com> wrote:
When I read the following documentation related to the "synchronized_standby_slots", I misunderstood that data loss would not occur in the case of synchronous physical replication. However, this is incorrect (see reproduce.txt).
Note that in the case of asynchronous replication, there remains a risk of data loss for transactions committed on the former primary server but have yet to be replicated to the new primary server.
https://www.postgresql.org/docs/17/logical-replication-failover.html
Am I missing something?
It seems part of the paragraph: "Note that in the case of asynchronous
replication, there remains a risk of data loss for transactions
committed on the former primary server but have yet to be replicated
to the new primary server." is a bit confusing. Will it make things
clear to me if we remove that part?
I am keeping a few others involved in this feature development in Cc.
--
With Regards,
Amit Kapila.
On Monday, August 26, 2024 5:37 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Aug 26, 2024 at 1:30 PM <Masahiro.Ikeda@nttdata.com> wrote:
When I read the following documentation related to the
"synchronized_standby_slots", I misunderstood that data loss would not occur
in the case of synchronous physical replication. However, this is incorrect (see
reproduce.txt).
Note that in the case of asynchronous replication, there remains a risk of
data loss for transactions committed on the former primary server but have yet
to be replicated to the new primary server.
https://www.postgresql.org/docs/17/logical-replication-failover.html
Am I missing something?
It seems part of the paragraph: "Note that in the case of asynchronous
replication, there remains a risk of data loss for transactions committed on the
former primary server but have yet to be replicated to the new primary server." is
a bit confusing. Will it make things clear to me if we remove that part?
I think the intention is to address a complaint [1] that data inserted on the
primary after the primary disconnects from the standby is still lost after
failover. But after rethinking, maybe it doesn't directly belong in
the logical failover section, because it's a general fact about async replication.
If we think it matters, maybe we can remove this part and slightly modify
another part:
parameter ensures a seamless transition of those subscriptions after the
standby is promoted. They can continue subscribing to publications on the
- new primary server without losing data.
+ new primary server without losing data that has already been replicated and
+ flushed on the standby server.
[1]: /messages/by-id/ZfRe2+OxMS0kvNvx@ip-10-97-1-34.eu-west-3.compute.internal
Best Regards,
Hou zj
On Mon, Aug 26, 2024 at 1:30 PM <Masahiro.Ikeda@nttdata.com> wrote:
When I read the following documentation related to the "synchronized_standby_slots", I misunderstood that data loss would not occur in the case of synchronous physical replication. However, this is incorrect (see reproduce.txt).
I think you see such a behavior because you have disabled
'synchronized_standby_slots' in your script (# disable
"synchronized_standby_slots"). You need to enable that to avoid data
loss. Considering that, I don't think your proposed text is an
improvement.
--
With Regards,
Amit Kapila.
On Mon, Aug 26, 2024 at 6:38 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:
On Monday, August 26, 2024 5:37 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Aug 26, 2024 at 1:30 PM <Masahiro.Ikeda@nttdata.com> wrote:
When I read the following documentation related to the
"synchronized_standby_slots", I misunderstood that data loss would not occur
in the case of synchronous physical replication. However, this is incorrect (see
reproduce.txt).
Note that in the case of asynchronous replication, there remains a risk of
data loss for transactions committed on the former primary server but have yet
to be replicated to the new primary server.
https://www.postgresql.org/docs/17/logical-replication-failover.html
Am I missing something?
It seems part of the paragraph: "Note that in the case of asynchronous
replication, there remains a risk of data loss for transactions committed on the
former primary server but have yet to be replicated to the new primary server." is
a bit confusing. Will it make things clear to me if we remove that part?
I think the intention is to address a complaint [1] that data inserted on the
primary after the primary disconnects from the standby is still lost after
failover. But after rethinking, maybe it doesn't directly belong in
the logical failover section, because it's a general fact about async replication.
If we think it matters, maybe we can remove this part and slightly modify
another part:
parameter ensures a seamless transition of those subscriptions after the
standby is promoted. They can continue subscribing to publications on the
- new primary server without losing data.
+ new primary server without losing data that has already been replicated and
+ flushed on the standby server.
Yeah, we can change that way but not sure if that satisfies the OP's
concern. I am waiting for his response.
--
With Regards,
Amit Kapila.
On Monday, August 26, 2024, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Aug 26, 2024 at 6:38 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:
On Monday, August 26, 2024 5:37 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Aug 26, 2024 at 1:30 PM <Masahiro.Ikeda@nttdata.com> wrote:
When I read the following documentation related to the
"synchronized_standby_slots", I misunderstood that data loss would not occur
in the case of synchronous physical replication. However, this is incorrect (see
reproduce.txt).
Note that in the case of asynchronous replication, there remains a risk of
data loss for transactions committed on the former primary server but have yet
to be replicated to the new primary server.
https://www.postgresql.org/docs/17/logical-replication-failover.html
Am I missing something?
It seems part of the paragraph: "Note that in the case of asynchronous
replication, there remains a risk of data loss for transactions committed on the
former primary server but have yet to be replicated to the new primary server." is
a bit confusing. Will it make things clear to me if we remove that part?
I think the intention is to address a complaint [1] that data inserted on the
primary after the primary disconnects from the standby is still lost after
failover. But after rethinking, maybe it doesn't directly belong in
the logical failover section, because it's a general fact about async replication.
If we think it matters, maybe we can remove this part and slightly modify
another part:
parameter ensures a seamless transition of those subscriptions after the
standby is promoted. They can continue subscribing to publications on the
- new primary server without losing data.
+ new primary server without losing data that has already been replicated and
+ flushed on the standby server.
Yeah, we can change that way but not sure if that satisfies the OP's
concern. I am waiting for his response.
I’d suggest getting rid of all mention of “without losing data” and just
emphasize the fact that the subscribers can operate in a hot-standby
publishing environment in an automated fashion by connecting using
“failover” enabled slots, assuming the publishing group prevents any
changes from propagating to any logical subscriber until all standbys in
the group have been updated. Whether or not the primary-standby group is
resilient in the face of failure during internal group synchronization is
out of the hands of logical subscribers - rather they are only guaranteed
to see a consistent linear history of activity coming out of the publishing
group. Specifically, if the group synchronizes asynchronously there is no
guarantee that every committed transaction on the primary makes its way
through to the logical subscriber if a slot failover happens. But at the
same time its view of the world will be consistent with the newly chosen
primary.
David J.
Thanks for your responses.
I think you see such a behavior because you have disabled 'synchronized_standby_slots'
in your script (# disable "synchronized_standby_slots"). You need to enable that to
avoid data loss. Considering that, I don't think your proposed text is an improvement.
Yes, I know.
As David said, "without losing data" is confusing to me, because there are at least three patterns
in which users may consider data to have been lost:
Pattern 1. Data for which clients received a commit response from the old primary, but which the new primary doesn't have (asynchronous replication).
-> We can avoid this with synchronous replication. This is not relevant to the failover feature.
Pattern 2. Data which the new primary has, but the subscribers don't have.
-> We can avoid this with the failover feature.
Pattern 3. Data which the subscribers have, but the new primary doesn't have.
-> We can avoid this with the 'synchronized_standby_slots' parameter.
Currently, my understanding of the following documentation is:
* the failover feature lets subscriptions continue without losing pattern 2 data.
* pattern 1 data may be lost if you use asynchronous replication.
* it doesn't mention pattern 3 at all, which is the point I misunderstood.
They can continue subscribing to publications on the new primary server without losing data.
Note that in the case of asynchronous replication, there remains a risk of data loss for transactions
committed on the former primary server but have yet to be replicated to the new primary server
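
As an illustration only, the three patterns listed above can be written as set differences. This is a minimal sketch, not PostgreSQL code: the function name classify_loss, the transaction labels, and the idea of comparing final states as Python sets are all assumptions made for the example. The sets are meant to describe the state after failover, once the subscriber has received everything it is going to receive.

```python
def classify_loss(acked_by_old_primary, on_new_primary, on_subscriber):
    """Group transactions by the three loss patterns (hypothetical model).

    Each argument is an iterable of transaction labels describing the
    final state after failover.
    """
    acked = set(acked_by_old_primary)
    primary = set(on_new_primary)
    subscriber = set(on_subscriber)
    return {
        # Pattern 1: clients saw a commit, but the new primary lacks it.
        "pattern1": acked - primary,
        # Pattern 2: the new primary has it, but the subscriber does not.
        "pattern2": primary - subscriber,
        # Pattern 3: the subscriber has it, but the new primary does not.
        "pattern3": subscriber - primary,
    }


# Hypothetical failover: the subscriber had already received t2,
# which never reached the standby that was promoted.
result = classify_loss(
    acked_by_old_primary={"t1", "t2", "t3"},
    on_new_primary={"t1"},
    on_subscriber={"t1", "t2"},
)
print(result["pattern3"])
```

In this made-up scenario, t2 falls under pattern 3, which is the case that 'synchronized_standby_slots' is intended to prevent.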
Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION
On Tue, Aug 27, 2024 at 10:18 AM <Masahiro.Ikeda@nttdata.com> wrote:
I think you see such a behavior because you have disabled 'synchronized_standby_slots'
in your script (# disable "synchronized_standby_slots"). You need to enable that to
avoid data loss. Considering that, I don't think your proposed text is an improvement.
Yes, I know.
As David said, "without losing data" is confusing to me, because there are at least three patterns
in which users may consider data to have been lost.
So, will it be okay if we just remove ".. without losing data" from
the sentence? Will that avoid the confusion you have?
With Regards,
Amit Kapila.
So, will it be okay if we just remove ".. without losing data" from the sentence? Will that
avoid the confusion you have?
Yes. Additionally, it would be better to add a note about data consistency after failover, for example:
Note that data consistency after failover can vary depending on the configurations. If
"synchronized_standby_slots" is not configured, there may be data that only the subscribers hold,
even though the new primary does not. Additionally, in the case of asynchronous physical replication,
there remains a risk of data loss for transactions committed on the former primary server
but have yet to be replicated to the new primary server.
Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION
On Tue, Aug 27, 2024 at 3:05 PM <Masahiro.Ikeda@nttdata.com> wrote:
So, will it be okay if we just remove ".. without losing data" from the sentence? Will that
avoid the confusion you have?
Yes. Additionally, it would be better to add a note about data consistency after failover, for example:
Note that data consistency after failover can vary depending on the configurations. If
"synchronized_standby_slots" is not configured, there may be data that only the subscribers hold,
even though the new primary does not.
This part can be inferred from the description of
synchronized_standby_slots [1] (See: This guarantees that logical
replication failover slots do not consume changes until those changes
are received and flushed to corresponding physical standbys. If a
logical replication connection is meant to switch to a physical
standby after the standby is promoted, the physical replication slot
for the standby should be listed here.)
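
The guarantee quoted above can be pictured with a small model. This is a hedged sketch under stated assumptions, not how the server is implemented: LSNs are plain integers, the function name max_sendable_lsn and the slot names are hypothetical, and the real server waits for the listed standbys rather than computing a minimum.

```python
def max_sendable_lsn(primary_flush_lsn, standby_flush_lsns):
    """Highest LSN a failover-enabled logical slot may send (toy model).

    standby_flush_lsns maps each physical slot named in
    synchronized_standby_slots to the LSN its standby has flushed.
    """
    if not standby_flush_lsns:
        # Nothing is listed, so nothing gates the logical slot and a
        # subscriber may get ahead of a standby.
        return primary_flush_lsn
    # Changes are held back until every listed standby has flushed them.
    return min(primary_flush_lsn, min(standby_flush_lsns.values()))


gated = max_sendable_lsn(500, {"sb1_slot": 300, "sb2_slot": 450})
ungated = max_sendable_lsn(500, {})
print(gated, ungated)
```

With the slots listed, the subscriber in this model never receives changes beyond what the slowest listed standby has flushed, so a promoted standby is not behind the subscriber.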
Additionally, in the case of asynchronous physical replication,
there remains a risk of data loss for transactions committed on the former primary server
but have yet to be replicated to the new primary server.
This has nothing to do with failover slots. This is a known behavior
of asynchronous replication, so adding here doesn't make much sense.
In general, adding more information unrelated to failover slots can
confuse users.
[1]: https://www.postgresql.org/docs/17/runtime-config-replication.html#GUC-SYNCHRONIZED-STANDBY-SLOTS
--
With Regards,
Amit Kapila.
So, will it be okay if we just remove ".. without losing data" from
the sentence? Will that avoid the confusion you have?
Yes. Additionally, it would be better to add a note about data
consistency after failover, for example:
Note that data consistency after failover can vary depending on the
configurations. If "synchronized_standby_slots" is not configured,
there may be data that only the subscribers hold, even though the new primary does not.
This part can be inferred from the description of synchronized_standby_slots [1] (See:
This guarantees that logical replication failover slots do not consume changes until those
changes are received and flushed to corresponding physical standbys. If a logical
replication connection is meant to switch to a physical standby after the standby is
promoted, the physical replication slot for the standby should be listed here.)
OK, it is enough for me to just remove ".. without losing data".
Additionally, in the case of asynchronous physical replication,
there remains a risk of data loss for transactions committed on the
former primary server but have yet to be replicated to the new primary server.
This has nothing to do with failover slots. This is a known behavior of asynchronous
replication, so adding here doesn't make much sense.
In general, adding more information unrelated to failover slots can confuse users.
OK, I agree to remove the sentence.
Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION
On Wed, Aug 28, 2024 at 6:16 AM <Masahiro.Ikeda@nttdata.com> wrote:
So, will it be okay if we just remove ".. without losing data" from
the sentence? Will that avoid the confusion you have?
Yes. Additionally, it would be better to add a note about data
consistency after failover, for example:
Note that data consistency after failover can vary depending on the
configurations. If "synchronized_standby_slots" is not configured,
there may be data that only the subscribers hold, even though the new primary does not.
This part can be inferred from the description of synchronized_standby_slots [1] (See:
This guarantees that logical replication failover slots do not consume changes until those
changes are received and flushed to corresponding physical standbys. If a logical
replication connection is meant to switch to a physical standby after the standby is
promoted, the physical replication slot for the standby should be listed here.)
OK, it is enough for me to just remove ".. without losing data".
The next line related to asynchronous replication is also not
required. See attached.
--
With Regards,
Amit Kapila.
Attachments:
fix_doc_1.patch (application/octet-stream)
diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index bee7e02983..94c3ad7376 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -701,10 +701,7 @@ ALTER SUBSCRIPTION
<link linkend="sql-createsubscription-params-with-failover"><literal>failover</literal></link>
parameter ensures a seamless transition of those subscriptions after the
standby is promoted. They can continue subscribing to publications on the
- new primary server without losing data. Note that in the case of
- asynchronous replication, there remains a risk of data loss for transactions
- committed on the former primary server but have yet to be replicated to the new
- primary server.
+ new primary server.
</para>
<para>
So, will it be okay if we just remove ".. without losing data"
from the sentence? Will that avoid the confusion you have?
Yes. Additionally, it would be better to add a note about data
consistency after failover, for example:
Note that data consistency after failover can vary depending on
the configurations. If "synchronized_standby_slots" is not
configured, there may be data that only the subscribers hold, even
though the new primary does not.
This part can be inferred from the description of synchronized_standby_slots [1]
(See:
This guarantees that logical replication failover slots do not
consume changes until those changes are received and flushed to
corresponding physical standbys. If a logical replication connection
is meant to switch to a physical standby after the standby is
promoted, the physical replication slot for the standby should be
listed here.)
OK, it is enough for me to just remove ".. without losing data".
The next line related to asynchronous replication is also not required. See attached.
Thanks, I found another occurrence of ".. without losing data".
Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION
Attachments:
fix_doc_2.patch (application/octet-stream)
diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index bee7e02983b..bc095d01c00 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -701,10 +701,7 @@ ALTER SUBSCRIPTION
<link linkend="sql-createsubscription-params-with-failover"><literal>failover</literal></link>
parameter ensures a seamless transition of those subscriptions after the
standby is promoted. They can continue subscribing to publications on the
- new primary server without losing data. Note that in the case of
- asynchronous replication, there remains a risk of data loss for transactions
- committed on the former primary server but have yet to be replicated to the new
- primary server.
+ new primary server.
</para>
<para>
@@ -791,7 +788,7 @@ test_standby=# SELECT slot_name, (synced AND NOT temporary AND NOT conflicting)
If all the slots are present on the standby server and the result
(<literal>failover_ready</literal>) of the above SQL query is true, then
existing subscriptions can continue subscribing to publications now on the
- new primary server without losing data.
+ new primary server.
</para>
</sect1>
On Wed, Aug 28, 2024 at 3:02 PM <Masahiro.Ikeda@nttdata.com> wrote:
The next line related to asynchronous replication is also not required. See attached.
Thanks, I found another ".. without losing data".
I'll push this tomorrow unless there are any other suggestions on this patch.
--
With Regards,
Amit Kapila.