Allow logical failover slots to wait on synchronous replication

Started by John H · almost 2 years ago · 40 messages · pgsql-hackers

#1 John H
johnhyvr@gmail.com

Hi hackers,

Building on bf279ddd1c, this patch introduces a GUC
'standby_slot_names_from_syncrep' which allows logical failover slots
to wait for changes to have been synchronously replicated before sending
the decoded changes to logical subscribers.

The existing 'standby_slot_names' isn't great for users who are running
clusters with quorum-based synchronous replicas. For instance, if
the user has synchronous_standby_names = 'ANY 3 (A,B,C,D,E)' it's a
bit tedious to have to reconfigure the standby_slot_names to set it to
the most updated 3 sync replicas whenever different sync replicas start
lagging. In the event that both GUCs are set, 'standby_slot_names' takes
precedence.

I did some very brief pgbench runs to compare the latency. Client instance
was running pgbench and 10 logical clients while the Postgres box hosted
the writer and 5 synchronous replicas.

There's a hit to TPS, which I'm thinking is due to more contention on the
SyncRepLock, and that scales with the number of logical walsenders. I'm
guessing we can avoid this if we introduce another set of
lsn[NUM_SYNC_REP_WAIT_MODE] and have the logical walsenders check
and wait on that instead but I wasn't sure if that's the right approach.
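The proposed second set of cached LSNs can be sketched as follows. This is a minimal standalone sketch with stand-in types: synced_past, shared_lsn, and cached_lsn are hypothetical names, and the real walsender code would read the shared values under SyncRepLock rather than from a plain array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;
#define NUM_SYNC_REP_WAIT_MODE 3

/* Stand-in for the shared LSN array normally read under SyncRepLock. */
static XLogRecPtr shared_lsn[NUM_SYNC_REP_WAIT_MODE];

/* Per-walsender cache: checked first so the lock is only taken on a miss. */
static XLogRecPtr cached_lsn[NUM_SYNC_REP_WAIT_MODE];

static bool
synced_past(int mode, XLogRecPtr wait_for_lsn)
{
    /* Fast path: a previously cached value may already satisfy the caller. */
    if (cached_lsn[mode] >= wait_for_lsn)
        return true;

    /* Slow path: refresh the cache (the real code holds SyncRepLock here). */
    for (int i = 0; i < NUM_SYNC_REP_WAIT_MODE; i++)
        cached_lsn[i] = shared_lsn[i];

    return cached_lsn[mode] >= wait_for_lsn;
}
```

The point of the design is that a logical walsender that has fallen behind keeps hitting the fast path and never contends on SyncRepLock.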

pgbench numbers:

// Empty standby_slot_names_from_syncrep
query mode: simple
number of clients: 8
number of threads: 8
maximum number of tries: 1
duration: 1800 s
number of transactions actually processed: 1720173
number of failed transactions: 0 (0.000%)
latency average = 8.371 ms
initial connection time = 7.963 ms
tps = 955.651025 (without initial connection time)

// standby_slot_names_from_syncrep = 'true'
scaling factor: 200
query mode: simple
number of clients: 8
number of threads: 8
maximum number of tries: 1
duration: 1800 s
number of transactions actually processed: 1630105
number of failed transactions: 0 (0.000%)
latency average = 8.834 ms
initial connection time = 7.670 ms
tps = 905.610230 (without initial connection time)

Thanks,

--
John Hsu - Amazon Web Services

Attachments:

0001-Introduce-a-new-guc-standby_slot_names_from_syncrep.patch (application/octet-stream, +313/-114)
#2 Nathan Bossart
nathandbossart@gmail.com
In reply to: John H (#1)
Re: Allow logical failover slots to wait on synchronous replication

On Mon, Jun 10, 2024 at 03:51:05PM -0700, John H wrote:

The existing 'standby_slot_names' isn't great for users who are running
clusters with quorum-based synchronous replicas. For instance, if
the user has synchronous_standby_names = 'ANY 3 (A,B,C,D,E)' it's a
bit tedious to have to reconfigure the standby_slot_names to set it to
the most updated 3 sync replicas whenever different sync replicas start
lagging. In the event that both GUCs are set, 'standby_slot_names' takes
precedence.

Hm. IIUC you'd essentially need to set standby_slot_names to "A,B,C,D,E"
to get the desired behavior today. That might ordinarily be okay, but it
could cause logical replication to be held back unnecessarily if one of the
replicas falls behind for whatever reason. A way to tie standby_slot_names
to synchronous replication instead does seem like it would be useful in
this case.

--
nathan

#3 Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Nathan Bossart (#2)
Re: Allow logical failover slots to wait on synchronous replication

Hi,

On Mon, Jun 10, 2024 at 09:25:10PM -0500, Nathan Bossart wrote:

On Mon, Jun 10, 2024 at 03:51:05PM -0700, John H wrote:

The existing 'standby_slot_names' isn't great for users who are running
clusters with quorum-based synchronous replicas. For instance, if
the user has synchronous_standby_names = 'ANY 3 (A,B,C,D,E)' it's a
bit tedious to have to reconfigure the standby_slot_names to set it to
the most updated 3 sync replicas whenever different sync replicas start
lagging. In the event that both GUCs are set, 'standby_slot_names' takes
precedence.

Hm. IIUC you'd essentially need to set standby_slot_names to "A,B,C,D,E"
to get the desired behavior today. That might ordinarily be okay, but it
could cause logical replication to be held back unnecessarily if one of the
replicas falls behind for whatever reason. A way to tie standby_slot_names
to synchronous replication instead does seem like it would be useful in
this case.

FWIW, I have the same understanding and also think your proposal would be
useful in this case.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#4 Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Bertrand Drouvot (#3)
Re: Allow logical failover slots to wait on synchronous replication

Hi,

On Tue, Jun 11, 2024 at 10:00:46AM +0000, Bertrand Drouvot wrote:

Hi,

On Mon, Jun 10, 2024 at 09:25:10PM -0500, Nathan Bossart wrote:

On Mon, Jun 10, 2024 at 03:51:05PM -0700, John H wrote:

The existing 'standby_slot_names' isn't great for users who are running
clusters with quorum-based synchronous replicas. For instance, if
the user has synchronous_standby_names = 'ANY 3 (A,B,C,D,E)' it's a
bit tedious to have to reconfigure the standby_slot_names to set it to
the most updated 3 sync replicas whenever different sync replicas start
lagging. In the event that both GUCs are set, 'standby_slot_names' takes
precedence.

Hm. IIUC you'd essentially need to set standby_slot_names to "A,B,C,D,E"
to get the desired behavior today. That might ordinarily be okay, but it
could cause logical replication to be held back unnecessarily if one of the
replicas falls behind for whatever reason. A way to tie standby_slot_names
to synchronous replication instead does seem like it would be useful in
this case.

FWIW, I have the same understanding and also think your proposal would be
useful in this case.

A few random comments about v1:

1 ====

+ int mode = SyncRepWaitMode;

It's set to SyncRepWaitMode and then never changed. Worth getting rid of "mode"?

2 ====

+ static XLogRecPtr lsn[NUM_SYNC_REP_WAIT_MODE] = {InvalidXLogRecPtr};

I did some testing and saw that the lsn[] values were not always set to
InvalidXLogRecPtr right after. It looks like that, in that case, we should
avoid setting the lsn[] values at compile time. Then, what about?

1. remove the "static".

or

2. keep the static but set the lsn[] values after its declaration.

3 ====

-       /*
-        * Return false if not all the standbys have caught up to the specified
-        * WAL location.
-        */
-       if (caught_up_slot_num != standby_slot_names_config->nslotnames)
-               return false;
+               if (!XLogRecPtrIsInvalid(lsn[mode]) && lsn[mode] >= wait_for_lsn)
+                       return true;

lsn[] values are/(should have been, see 2 above) just been initialized to
InvalidXLogRecPtr so that XLogRecPtrIsInvalid(lsn[mode]) will always return
true. I think this check is not needed then.

4 ====

I did some very brief pgbench runs to compare the latency. Client instance
was running pgbench and 10 logical clients while the Postgres box hosted
the writer and 5 synchronous replicas.

There's a hit to TPS

Out of curiosity, did you compare with standby_slot_names_from_syncrep set to off
and standby_slot_names not empty?

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#5 Amit Kapila
amit.kapila16@gmail.com
In reply to: John H (#1)
Re: Allow logical failover slots to wait on synchronous replication

On Tue, Jun 11, 2024 at 4:21 AM John H <johnhyvr@gmail.com> wrote:

Building on bf279ddd1c, this patch introduces a GUC
'standby_slot_names_from_syncrep' which allows logical failover slots
to wait for changes to have been synchronously replicated before sending
the decoded changes to logical subscribers.

The existing 'standby_slot_names' isn't great for users who are running
clusters with quorum-based synchronous replicas. For instance, if
the user has synchronous_standby_names = 'ANY 3 (A,B,C,D,E)' it's a
bit tedious to have to reconfigure the standby_slot_names to set it to
the most updated 3 sync replicas whenever different sync replicas start
lagging. In the event that both GUCs are set, 'standby_slot_names' takes
precedence.

I did some very brief pgbench runs to compare the latency. Client instance
was running pgbench and 10 logical clients while the Postgres box hosted
the writer and 5 synchronous replicas.

There's a hit to TPS, which I'm thinking is due to more contention on the
SyncRepLock, and that scales with the number of logical walsenders. I'm
guessing we can avoid this if we introduce another set of
lsn[NUM_SYNC_REP_WAIT_MODE] and have the logical walsenders check
and wait on that instead but I wasn't sure if that's the right approach.

pgbench numbers:

// Empty standby_slot_names_from_syncrep
query mode: simple

..

latency average = 8.371 ms
initial connection time = 7.963 ms
tps = 955.651025 (without initial connection time)

// standby_slot_names_from_syncrep = 'true'
scaling factor: 200

...

latency average = 8.834 ms
initial connection time = 7.670 ms
tps = 905.610230 (without initial connection time)

The readings indicate that when you set 'standby_slot_names_from_syncrep',
TPS is reduced as compared to when it is not set. It would be better
to see data comparing 'standby_slot_names_from_syncrep' with the
existing parameter 'standby_slot_names'.

I see the value in your idea but was wondering if we can do something
without introducing a new GUC for this. Can we make it a default
behavior that logical slots marked with a failover option will wait
for 'synchronous_standby_names' as per your patch's idea unless
'standby_slot_names' is specified? I don't know if there is any value
in setting the 'failover' option for a slot without specifying
'standby_slot_names', so I was wondering if we can additionally tie it
to 'synchronous_standby_names'. Any better ideas?

--
With Regards,
Amit Kapila.

#6 John H
johnhyvr@gmail.com
In reply to: Bertrand Drouvot (#4)
Re: Allow logical failover slots to wait on synchronous replication

Hi,

Thanks Bertrand for taking a look at the patch.

On Mon, Jun 17, 2024 at 8:19 AM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:

+ int mode = SyncRepWaitMode;

It's set to SyncRepWaitMode and then never change. Worth to get rid of "mode"?

I took a deeper look at this with GDB and I think it's necessary to
cache the value of "mode".
We first check:

if (mode == SYNC_REP_NO_WAIT)
return true;

However after this check it's possible to receive a SIGHUP changing
SyncRepWaitMode
to SYNC_REP_NO_WAIT (e.g. synchronous_commit = 'on' -> 'off'), leading
to lsn[-1].
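The race above can be shown in a minimal sketch (stand-in types; caught_up is a hypothetical reduction of StandbySlotsHaveCaughtup, and the global stands in for the GUC-derived SyncRepWaitMode):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;
#define NUM_SYNC_REP_WAIT_MODE 3
#define SYNC_REP_NO_WAIT (-1)

/* Stand-in for the GUC-backed global that a SIGHUP reload can change. */
static int SyncRepWaitMode = SYNC_REP_NO_WAIT;

static XLogRecPtr lsn[NUM_SYNC_REP_WAIT_MODE];

static bool
caught_up(XLogRecPtr wait_for_lsn)
{
    /*
     * Read the global exactly once. If we re-read SyncRepWaitMode after
     * the SYNC_REP_NO_WAIT check below, a config reload in between could
     * hand us -1 and we would index lsn[-1].
     */
    int mode = SyncRepWaitMode;

    if (mode == SYNC_REP_NO_WAIT)
        return true;

    return lsn[mode] >= wait_for_lsn;
}
```

Using `lsn[SyncRepWaitMode]` directly instead of `lsn[mode]` is the buggy variant the local copy guards against.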

2 ====

+ static XLogRecPtr lsn[NUM_SYNC_REP_WAIT_MODE] = {InvalidXLogRecPtr};

I did some testing and saw that the lsn[] values were not always set to
InvalidXLogRecPtr right after. It looks like that, in that case, we should
avoid setting the lsn[] values at compile time. Then, what about?

1. remove the "static".

or

2. keep the static but set the lsn[] values after its declaration.

I'd prefer to keep the static because it reduces unnecessary
contention on SyncRepLock if logical client has fallen behind.
I'll add a change with your second suggestion.

3 ====

-       /*
-        * Return false if not all the standbys have caught up to the specified
-        * WAL location.
-        */
-       if (caught_up_slot_num != standby_slot_names_config->nslotnames)
-               return false;
+               if (!XLogRecPtrIsInvalid(lsn[mode]) && lsn[mode] >= wait_for_lsn)
+                       return true;

lsn[] values are/(should have been, see 2 above) just been initialized to
InvalidXLogRecPtr so that XLogRecPtrIsInvalid(lsn[mode]) will always return
true. I think this check is not needed then.

Removed.

Out of curiosity, did you compare with standby_slot_names_from_syncrep set to off
and standby_slot_names not empty?

I didn't think 'standby_slot_names' would impact TPS as much since
it's not grabbing the SyncRepLock but here's a quick test.
Writer with 5 synchronous replicas, 10 pg_recvlogical clients and
pgbench all running from the same server.

Command: pgbench -c 4 -j 4 -T 600 -U "ec2-user" -d postgres -r -P 5

Result with: standby_slot_names =
'replica_1,replica_2,replica_3,replica_4,replica_5'

latency average = 5.600 ms
latency stddev = 2.854 ms
initial connection time = 5.503 ms
tps = 714.148263 (without initial connection time)

Result with: standby_slot_names_from_syncrep = 'true',
synchronous_standby_names = 'ANY 3 (A,B,C,D,E)'

latency average = 5.740 ms
latency stddev = 2.543 ms
initial connection time = 4.093 ms
tps = 696.776249 (without initial connection time)

Result with nothing set:

latency average = 5.090 ms
latency stddev = 3.467 ms
initial connection time = 4.989 ms
tps = 785.665963 (without initial connection time)

Again I think it's possible to improve the synchronous numbers if we
cache but I'll try that out in a bit.

--
John Hsu - Amazon Web Services

#7 John H
johnhyvr@gmail.com
In reply to: Amit Kapila (#5)
Re: Allow logical failover slots to wait on synchronous replication

Hi Amit,

Thanks for taking a look.

On Tue, Jun 18, 2024 at 10:34 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

The reading indicates when you set 'standby_slot_names_from_syncrep',
the TPS reduces as compared to when it is not set. It would be better
to see the data comparing 'standby_slot_names_from_syncrep' and the
existing parameter 'standby_slot_names'.

I added new benchmark numbers in the reply to Bertrand, but I'll
include in this thread for posterity.

Writer with 5 synchronous replicas, 10 pg_recvlogical clients and
pgbench all running from the same server.
Command: pgbench -c 4 -j 4 -T 600 -U "ec2-user" -d postgres -r -P 5

Result with: standby_slot_names =
'replica_1,replica_2,replica_3,replica_4,replica_5'

latency average = 5.600 ms
latency stddev = 2.854 ms
initial connection time = 5.503 ms
tps = 714.148263 (without initial connection time)

Result with: standby_slot_names_from_syncrep = 'true',
synchronous_standby_names = 'ANY 3 (A,B,C,D,E)'

latency average = 5.740 ms
latency stddev = 2.543 ms
initial connection time = 4.093 ms
tps = 696.776249 (without initial connection time)

Result with nothing set:

latency average = 5.090 ms
latency stddev = 3.467 ms
initial connection time = 4.989 ms
tps = 785.665963 (without initial connection time)

Can we make it a default
behavior that logical slots marked with a failover option will wait
for 'synchronous_standby_names' as per your patch's idea unless
'standby_slot_names' is specified? I don't know if there is any value
in setting the 'failover' option for a slot without specifying
'standby_slot_names', so was wondering if we can additionally tie it
to 'synchronous_standby_names'. Any better ideas?

No better ideas from me; I think that works pretty cleanly, actually.
Reserving some special keyword, which is the only other thing I can
think of, isn't great. I've updated the patch and tests to reflect that.

Attached the patch that addresses these changes.

--
John Hsu - Amazon Web Services

Attachments:

0002-Wait-on-synchronous-replication-by-default-for-logic.patch (application/octet-stream, +321/-116)
#8 Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: John H (#6)
Re: Allow logical failover slots to wait on synchronous replication

Hi,

On Mon, Jul 08, 2024 at 12:08:58PM -0700, John H wrote:

I took a deeper look at this with GDB and I think it's necessary to
cache the value of "mode".
We first check:

if (mode == SYNC_REP_NO_WAIT)
return true;

However after this check it's possible to receive a SIGHUP changing
SyncRepWaitMode
to SYNC_REP_NO_WAIT (e.g. synchronous_commit = 'on' -> 'off'), leading
to lsn[-1].

What about adding "SyncRepWaitMode" as a third StandbySlotsHaveCaughtup()
parameter then? (so that the function will used whatever value was passed during
the call).

2 ====

+ static XLogRecPtr lsn[NUM_SYNC_REP_WAIT_MODE] = {InvalidXLogRecPtr};

I did some testing and saw that the lsn[] values were not always set to
InvalidXLogRecPtr right after. It looks like that, in that case, we should
avoid setting the lsn[] values at compile time. Then, what about?

1. remove the "static".

or

2. keep the static but set the lsn[] values after its declaration.

I'd prefer to keep the static because it reduces unnecessary
contention on SyncRepLock if logical client has fallen behind.
I'll add a change with your second suggestion.

Got it, you want lsn[] to be initialized only one time and that each call to
StandbySlotsHaveCaughtup() relies on the values that were previously stored in
lsn[] and then return if lsn[mode] >= wait_for_lsn.

Then I think that:

1 ===

That's worth additional comments in the code.

2 ===

+               if (!initialized)
+               {
+                       for (i = 0; i < NUM_SYNC_REP_WAIT_MODE; i++)
+                       {
+                               lsn[i] = InvalidXLogRecPtr;
+                       }
+               }

Looks like setting initialized to true is missing once done.
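With that assignment added, the one-time initialization would look roughly like this (stand-in types; ensure_lsn_initialized is a hypothetical wrapper around the patch's inline code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)
#define NUM_SYNC_REP_WAIT_MODE 3

static int init_runs = 0;       /* instrumentation for this sketch only */

static void
ensure_lsn_initialized(XLogRecPtr lsn[NUM_SYNC_REP_WAIT_MODE])
{
    static bool initialized = false;

    if (!initialized)
    {
        for (int i = 0; i < NUM_SYNC_REP_WAIT_MODE; i++)
            lsn[i] = InvalidXLogRecPtr;
        initialized = true;     /* the assignment the review found missing */
        init_runs++;
    }
}
```

Without `initialized = true;`, every call would wipe the cached values and defeat the purpose of keeping the array static.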

Also,

3 ===

+               /* Cache values to reduce contention on lock */
+               for (i = 0; i < NUM_SYNC_REP_WAIT_MODE; i++)
+               {
+                       lsn[i] = walsndctl->lsn[i];
+               }

NUM_SYNC_REP_WAIT_MODE is small, but as the goal is to keep the lock time as
short as possible, I wonder if it wouldn't be better to use memcpy() here
instead of this for loop.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#9 shveta malik
shveta.malik@gmail.com
In reply to: John H (#7)
Re: Allow logical failover slots to wait on synchronous replication

On Tue, Jul 9, 2024 at 12:42 AM John H <johnhyvr@gmail.com> wrote:

Can we make it a default
behavior that logical slots marked with a failover option will wait
for 'synchronous_standby_names' as per your patch's idea unless
'standby_slot_names' is specified? I don't know if there is any value
in setting the 'failover' option for a slot without specifying
'standby_slot_names', so was wondering if we can additionally tie it
to 'synchronous_standby_names'. Any better ideas?

No, I think that works pretty cleanly actually. Reserving some special
keyword isn't great
which is the only other thing I can think of. I've updated the patch
and tests to reflect that.

Attached the patch that addresses these changes.

Thank you for the patch. I like the overall idea; it is a useful one
and will make the user experience better. Please find a few comments.

1)
I agree with the idea that instead of introducing a new GUC, we can make
failover logical slots wait on 'synchronous_standby_names', but this
will leave the user no option to unlink failover-enabled logical
subscribers from the wait on synchronous standbys. We provide the user a
way to switch off automatic slot-sync by disabling
'sync_replication_slots', we also provide a way to manually sync the
slots using the function 'pg_sync_replication_slots()', and nowhere do
we make it mandatory to set 'synchronized_standby_slots'; in fact, in
the docs it is a recommended setting and not a mandatory one. The user
can always create failover slots, switch off automatic slot sync, and
disable the wait on standbys by not specifying
'synchronized_standby_slots', doing the slot-sync and consistency
checks manually when needed. I feel, for worst-case scenarios, we
should provide the user an option to delink failover-enabled logical
subscriptions from 'synchronous_standby_names'.

We can have 'synchronized_standby_slots' (standby_slot_names) accept
one such option, say 'SAME_AS_SYNCREP_STANDBYS' or 'DEFAULT'. So when
'synchronous_standby_names' is a comma-separated list, we pick those
slots; if it is empty, then no wait on standbys; and if its value is
'DEFAULT' as configured by the user, then go with
'synchronous_standby_names'. Thoughts?
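As a configuration sketch of that proposal (the 'DEFAULT' special value is only proposed here, not an existing setting):

```
# Explicit list: failover slots wait for these physical slots to confirm.
synchronized_standby_slots = 'replica_1,replica_2,replica_3'

# Empty: failover slots do not wait on any physical standby.
synchronized_standby_slots = ''

# Proposed special value: wait until changes satisfy the quorum in
# synchronous_standby_names instead of a fixed slot list.
synchronized_standby_slots = 'DEFAULT'
synchronous_standby_names = 'ANY 3 (A,B,C,D,E)'
```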

2)
When 'synchronized_standby_slots' is configured but a standby named in
it is down, blocking logical replication, we get a WARNING in the
subscriber's log file:

WARNING: replication slot "standby_2" specified in parameter
synchronized_standby_slots does not have active_pid.
DETAIL: Logical replication is waiting on the standby associated with
"standby_2".
HINT: Consider starting standby associated with "standby_2" or amend
parameter synchronized_standby_slots.

But OTOH, when 'synchronous_standby_names' is configured instead of
'synchronized_standby_slots' and any of the listed standbys is down,
blocking logical replication, we do not get any sort of warning. This is
inconsistent behavior, and the user might be left clueless about why
subscribers are not getting changes.

thanks
Shveta

#10 shveta malik
shveta.malik@gmail.com
In reply to: shveta malik (#9)
Re: Allow logical failover slots to wait on synchronous replication

On Thu, Jul 18, 2024 at 9:25 AM shveta malik <shveta.malik@gmail.com> wrote:

On Tue, Jul 9, 2024 at 12:42 AM John H <johnhyvr@gmail.com> wrote:

Can we make it a default
behavior that logical slots marked with a failover option will wait
for 'synchronous_standby_names' as per your patch's idea unless
'standby_slot_names' is specified? I don't know if there is any value
in setting the 'failover' option for a slot without specifying
'standby_slot_names', so was wondering if we can additionally tie it
to 'synchronous_standby_names'. Any better ideas?

No, I think that works pretty cleanly actually. Reserving some special
keyword isn't great
which is the only other thing I can think of. I've updated the patch
and tests to reflect that.

Attached the patch that addresses these changes.

Thank You for the patch. I like the overall idea, it is a useful one
and will make user experience better. Please find few comments.

1)
I agree with the idea that instead of introducing new GUC, we can make
failover logical slots wait on 'synchronous_standby_names', but this
will leave user no option to unlink failover-enabled logical
subscribers from the wait on synchronous standbys. We provide user a
way to switch off automatic slot-sync by disabling
'sync_replication_slots' and we also provide user a way to manually
sync the slots using function 'pg_sync_replication_slots()' and
nowhere we make it mandatory to set 'synchronized_standby_slots'; in
fact in docs, it is a recommended setting and not a mandatory one.
User can always create failover slots, switch off automatic slot sync
and disable wait on standbys by not specifying
'synchronized_standby_slots' and do the slot-sync and consistency
checks manually when needed. I feel, for worst case scenarios, we
should provide user an option to delink failover-enabled logical
subscriptions from 'synchronous_standby_names'.

We can have 'synchronized_standby_slots' (standby_slot_names) to
accept one such option as in 'SAME_AS_SYNCREP_STANDBYS' or say
'DEFAULT'. So when 'synchronous_standby_names' is comma separated
list, we pick those slots; if it is empty, then no wait on standbys,
and if its value is 'DEFAULT' as configured by the user, then go with
'synchronous_standby_names'. Thoughts?

One correction here
('synchronous_standby_names-->synchronized_standby_slots). Corrected
version:

So when 'synchronized_standby_slots' is comma separated list, we pick
those slots; if it is empty, then no wait on standbys, and if its
value is 'DEFAULT' as configured by user, then go with
'synchronous_standby_names'. Thoughts?

2)
When 'synchronized_standby_slots' is configured but standby named in
it is down blocking logical replication, then we get a WARNING in
subscriber's log file:

WARNING: replication slot "standby_2" specified in parameter
synchronized_standby_slots does not have active_pid.
DETAIL: Logical replication is waiting on the standby associated with
"standby_2".
HINT: Consider starting standby associated with "standby_2" or amend
parameter synchronized_standby_slots.

But OTOH, when 'synchronous_standby_names' is configured instead of
'synchronized_standby_slots' and any of the standbys listed is down
blocking logical replication, we do not get any sort of warning. It is
inconsistent behavior. Also user might be left clueless on why
subscribers are not getting changes.

thanks
Shveta

#11 John H
johnhyvr@gmail.com
In reply to: shveta malik (#10)
Re: Allow logical failover slots to wait on synchronous replication

Hi Shveta,

Thanks for taking a look at the patch.

will leave user no option to unlink failover-enabled logical
subscribers from the wait on synchronous standbys.

That's a good point. I'm a bit biased in that I don't think there's a
great reason why someone would want to replicate logical changes out of
the synchronous cluster without them having been synchronously
replicated, but yes, this would be different behavior compared to the
strictly slot-based one.

...
So when 'synchronized_standby_slots' is comma separated list, we pick
those slots; if it is empty, then no wait on standbys, and if its
value is 'DEFAULT' as configured by user, then go with
'synchronous_standby_names'. Thoughts?

I think I'd prefer having a separate GUC if the alternative is to reserve
special keywords in 'synchronized_standby_slots' but I'm not sure if I
feel strongly about that.

2)
When 'synchronized_standby_slots' is configured but standby named in
it is down blocking logical replication, then we get a WARNING in
subscriber's log file:

WARNING: replication slot "standby_2" specified in parameter
synchronized_standby_slots does not have active_pid.
DETAIL: Logical replication is waiting on the standby associated with
"standby_2".
HINT: Consider starting standby associated with "standby_2" or amend
parameter synchronized_standby_slots.

But OTOH, when 'synchronous_standby_names' is configured instead of
'synchronized_standby_slots' and any of the standbys listed is down
blocking logical replication, we do not get any sort of warning. It is
inconsistent behavior. Also user might be left clueless on why
subscribers are not getting changes.

Ah, that's a gap. Let me add some logging/warning in a similar fashion,
although I think I'd have the warning be relatively generic (e.g.,
changes are blocked because they're not synchronously committed).

Thanks,

--
John Hsu - Amazon Web Services

#12 John H
johnhyvr@gmail.com
In reply to: Bertrand Drouvot (#8)
Re: Allow logical failover slots to wait on synchronous replication

Hi Bertrand,

1 ===
...
That's worth additional comments in the code.

There's already this comment about caching the value; not sure if you
prefer something more?

/* Cache values to reduce contention on lock */

2 ===
...
Looks like setting initialized to true is missing once done.

Thanks, will update.

3 ===
...
NUM_SYNC_REP_WAIT_MODE is small but as the goal is the keep the lock time as
short as possible I wonder if it wouldn't be better to use memcpy() here instead
of this for loop.

It results in a "-Wdiscarded-qualifiers" warning; the copy is safe given
we take the lock, but the warning adds noise. What do you think?

"slot.c:2756:46: warning: passing argument 2 of ‘memcpy’ discards
‘volatile’ qualifier from pointer target type
[-Wdiscarded-qualifiers]"
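One way to keep memcpy() and silence the warning is an explicit cast that drops the volatile qualifier, which is only sound because SyncRepLock is held during the copy. A standalone sketch with stand-in types (copy_lsns and this trimmed WalSndCtlData are hypothetical):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t XLogRecPtr;
#define NUM_SYNC_REP_WAIT_MODE 3

/* Trimmed stand-in for WalSndCtlData: shared memory declared volatile. */
typedef struct
{
    volatile XLogRecPtr lsn[NUM_SYNC_REP_WAIT_MODE];
} WalSndCtlData;

static void
copy_lsns(XLogRecPtr *dst, WalSndCtlData *walsndctl)
{
    /*
     * The explicit cast drops the volatile qualifier, avoiding
     * -Wdiscarded-qualifiers. This is only sound because the caller holds
     * SyncRepLock, so no concurrent writer can touch the array mid-copy.
     */
    memcpy(dst, (XLogRecPtr *) walsndctl->lsn,
           sizeof(XLogRecPtr) * NUM_SYNC_REP_WAIT_MODE);
}
```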

Thanks,

--
John Hsu - Amazon Web Services

#13 shveta malik
shveta.malik@gmail.com
In reply to: John H (#11)
Re: Allow logical failover slots to wait on synchronous replication

On Fri, Jul 19, 2024 at 2:52 AM John H <johnhyvr@gmail.com> wrote:

Hi Shveta,

Thanks for taking a look at the patch.

will leave user no option to unlink failover-enabled logical
subscribers from the wait on synchronous standbys.

That's a good point. I'm a bit biased in that I don't think there's a
great reason why someone would
want to replicate logical changes out of the synchronous cluster
without it having been synchronously replicated
but yes this would be different behavior compared to strictly the slot one.

...
So when 'synchronized_standby_slots' is comma separated list, we pick
those slots; if it is empty, then no wait on standbys, and if its
value is 'DEFAULT' as configured by user, then go with
'synchronous_standby_names'. Thoughts?

I think I'd prefer having a separate GUC if the alternative is to reserve
special keywords in 'synchronized_standby_slots' but I'm not sure if I
feel strongly about that.

My only concern is that earlier we provided a way to set the failover
property of slots even without mandatorily waiting on physical standbys,
but now we will be changing this behaviour. Okay, let's see what others
comment. If we plan to go this way, we can change the docs to clearly
mention this.

2)
When 'synchronized_standby_slots' is configured but standby named in
it is down blocking logical replication, then we get a WARNING in
subscriber's log file:

WARNING: replication slot "standby_2" specified in parameter
synchronized_standby_slots does not have active_pid.
DETAIL: Logical replication is waiting on the standby associated with
"standby_2".
HINT: Consider starting standby associated with "standby_2" or amend
parameter synchronized_standby_slots.

But OTOH, when 'synchronous_standby_names' is configured instead of
'synchronized_standby_slots' and any of the standbys listed is down
blocking logical replication, we do not get any sort of warning. It is
inconsistent behavior. Also user might be left clueless on why
subscribers are not getting changes.

Ah that's a gap. Let me add some logging/warning in a similar fashion.
Although I think I'd have the warning be relatively generic (e.g.
changes are blocked because
they're not synchronously committed)

okay, sounds good.

thanks
Shveta

#14 Amit Kapila
amit.kapila16@gmail.com
In reply to: shveta malik (#13)
Re: Allow logical failover slots to wait on synchronous replication

On Mon, Jul 22, 2024 at 9:12 AM shveta malik <shveta.malik@gmail.com> wrote:

On Fri, Jul 19, 2024 at 2:52 AM John H <johnhyvr@gmail.com> wrote:

Hi Shveta,

Thanks for taking a look at the patch.

will leave user no option to unlink failover-enabled logical
subscribers from the wait on synchronous standbys.

That's a good point. I'm a bit biased in that I don't think there's a
great reason why someone would
want to replicate logical changes out of the synchronous cluster
without it having been synchronously replicated
but yes this would be different behavior compared to strictly the slot one.

...
So when 'synchronized_standby_slots' is comma separated list, we pick
those slots; if it is empty, then no wait on standbys, and if its
value is 'DEFAULT' as configured by user, then go with
'synchronous_standby_names'. Thoughts?

I think I'd prefer having a separate GUC if the alternative is to reserve
special keywords in 'synchronized_standby_slots' but I'm not sure if I
feel strongly about that.

My only concern is, earlier we provided a way to set the failover
property of slots even without mandatorily wait on physical standbys.
But now we will be changing this behaviour.

Adding a new GUC, as John suggests, to address this concern is one way
to go, but we should think some more before adding a new GUC. The second
option, adding a special value for the GUC synchronized_standby_slots as
you propose, also has a downside in that it creates a dependency between
two GUCs (synchronized_standby_slots and synchronous_standby_names),
which can also make the code complex.

Yet another possibility is to have a slot-level parameter (something
like failover_wait_for) which can be used to decide the GUC preference
for failover-enabled slots.

As this is a new feature and we haven't got much feedback from users,
like John I am also not very sure how much merit there is in retaining
the old behavior where failover slots don't need to wait for any of the
standbys. But anyway, we have at least some escape route where logical
subscribers keep waiting for some physical standby that is down to come
back, and one may want to use that in some situations, so there is
clearly some value in retaining the old behavior.

--
With Regards,
Amit Kapila.

#15 Amit Kapila
amit.kapila16@gmail.com
In reply to: John H (#6)
Re: Allow logical failover slots to wait on synchronous replication

On Tue, Jul 9, 2024 at 12:39 AM John H <johnhyvr@gmail.com> wrote:

Out of curiosity, did you compare with standby_slot_names_from_syncrep set to off
and standby_slot_names not empty?

I didn't think 'standby_slot_names' would impact TPS as much since
it's not grabbing the SyncRepLock, but here's a quick test: writer with
5 synchronous replicas, 10 pg_recvlogical clients, and pgbench all
running from the same server.

Command: pgbench -c 4 -j 4 -T 600 -U "ec2-user" -d postgres -r -P 5

Result with: standby_slot_names =
'replica_1,replica_2,replica_3,replica_4,replica_5'

latency average = 5.600 ms
latency stddev = 2.854 ms
initial connection time = 5.503 ms
tps = 714.148263 (without initial connection time)

Result with: standby_slot_names_from_syncrep = 'true',
synchronous_standby_names = 'ANY 3 (A,B,C,D,E)'

latency average = 5.740 ms
latency stddev = 2.543 ms
initial connection time = 4.093 ms
tps = 696.776249 (without initial connection time)

Result with nothing set:

latency average = 5.090 ms
latency stddev = 3.467 ms
initial connection time = 4.989 ms
tps = 785.665963 (without initial connection time)

Again, I think it's possible to improve the synchronous numbers if we
cache, but I'll try that out in a bit.

Okay, so the tests done till now suggest that we won't get a
benefit by using 'standby_slot_names_from_syncrep'. Now, if we
increase the number of standbys in both lists and still keep ANY 3 in
synchronous_standby_names, then the results may vary. We should try to
find out whether there is a performance benefit with the use of
synchronous_standby_names in normal configurations like the one
you used in the above tests, to prove the value of this patch.

--
With Regards,
Amit Kapila.

#16 shveta malik
shveta.malik@gmail.com
In reply to: Amit Kapila (#15)
Re: Allow logical failover slots to wait on synchronous replication

On Tue, Jul 23, 2024 at 10:35 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

[John H's question and pgbench results trimmed; see #15 above.]

Okay, so the tests done till now suggest that we won't get a
benefit by using 'standby_slot_names_from_syncrep'. Now, if we
increase the number of standbys in both lists and still keep ANY 3 in
synchronous_standby_names, then the results may vary. We should try to
find out whether there is a performance benefit with the use of
synchronous_standby_names in normal configurations like the one
you used in the above tests, to prove the value of this patch.

I didn't fully understand the parameters mentioned above, specifically
what 'latency stddev' and 'latency average' represent. But shouldn't
we see the benefit/value of this patch by having a setup where a
particular standby is slow in sending the response back to the primary
(could be due to network lag or other reasons) and then measuring the
latency in receiving changes on failover-enabled logical subscribers?
We can perform this test with both of the below settings and, say, make
D and E slow in sending responses:
1) synchronous_standby_names = 'ANY 3 (A,B,C,D,E)'
2) standby_slot_names = A_slot, B_slot, C_slot, D_slot, E_slot.
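Spelled out as configuration, the two setups being compared would be
something like the following (standby and slot names are placeholders
from the thread's examples):

```
# Setup 1: quorum-based sync rep; failover slots wait on the quorum
synchronous_standby_names = 'ANY 3 (A,B,C,D,E)'
standby_slot_names_from_syncrep = 'true'

# Setup 2: explicit per-slot wait list
standby_slot_names = 'A_slot,B_slot,C_slot,D_slot,E_slot'
```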

thanks
Shveta

#17 Amit Kapila
amit.kapila16@gmail.com
In reply to: shveta malik (#16)
Re: Allow logical failover slots to wait on synchronous replication

On Fri, Jul 26, 2024 at 3:28 PM shveta malik <shveta.malik@gmail.com> wrote:

On Tue, Jul 23, 2024 at 10:35 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

[Nested quotes trimmed; see #15 and #16 above.]

I didn't fully understand the parameters mentioned above, specifically
what 'latency stddev' and 'latency average' represent. But shouldn't
we see the benefit/value of this patch by having a setup where a
particular standby is slow in sending the response back to the primary
(could be due to network lag or other reasons) and then measuring the
latency in receiving changes on failover-enabled logical subscribers?
We can perform this test with both of the below settings and, say, make
D and E slow in sending responses:
1) synchronous_standby_names = 'ANY 3 (A,B,C,D,E)'
2) standby_slot_names = A_slot, B_slot, C_slot, D_slot, E_slot.

Yes, I also expect the patch to perform better in such a scenario,
but it is better to test it. Also, irrespective of that, we should
investigate why the reported case is slower with
synchronous_standby_names and see if we can improve it.

BTW, for 2), I think you wanted to say synchronized_standby_slots,
not standby_slot_names. We have recently changed the GUC name.

--
With Regards,
Amit Kapila.

#18 shveta malik
shveta.malik@gmail.com
In reply to: Amit Kapila (#17)
Re: Allow logical failover slots to wait on synchronous replication

On Fri, Jul 26, 2024 at 5:11 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Jul 26, 2024 at 3:28 PM shveta malik <shveta.malik@gmail.com> wrote:

[Nested quotes trimmed; see #15 through #17 above.]

Yes, I also expect the patch to perform better in such a scenario,
but it is better to test it. Also, irrespective of that, we should
investigate why the reported case is slower with
synchronous_standby_names and see if we can improve it.

+1

BTW, for 2), I think you wanted to say synchronized_standby_slots,
not standby_slot_names. We have recently changed the GUC name.

yes, sorry, synchronized_standby_slots it is.

thanks
Shveta

#19 Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: John H (#12)
Re: Allow logical failover slots to wait on synchronous replication

Hi John,

On Thu, Jul 18, 2024 at 02:22:08PM -0700, John H wrote:

Hi Bertrand,

1 ===
...
That's worth additional comments in the code.

There's already this comment about caching the value; not sure
if you'd prefer something more?

/* Cache values to reduce contention on lock */

Yeah, at the same place as the static lsn[] declaration, something like:

static XLogRecPtr lsn[NUM_SYNC_REP_WAIT_MODE]; /* cached LSNs */

but that may just be a matter of taste.

3 ===
...
NUM_SYNC_REP_WAIT_MODE is small, but as the goal is to keep the lock time as
short as possible, I wonder if it wouldn't be better to use memcpy() here instead
of this for loop.

It results in a -Wdiscarded-qualifiers warning, which is safe given we
take the lock, but adds noise?
What do you think?

"slot.c:2756:46: warning: passing argument 2 of ‘memcpy’ discards
‘volatile’ qualifier from pointer target type
[-Wdiscarded-qualifiers]"

Right, we may want to cast it then, but given that the for loop is "small", I
think it's also fine to keep the for loop.
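As a standalone sketch of the two copy strategies being discussed (this
is not the patch's actual code: XLogRecPtr, NUM_SYNC_REP_WAIT_MODE, and
the array values here are stand-ins, and the SyncRepLock acquisition
around the copy is elided):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins for the PostgreSQL definitions under discussion */
typedef uint64_t XLogRecPtr;
#define NUM_SYNC_REP_WAIT_MODE 3

/* Shared-memory style source array; volatile, as in the walsender's
 * shared state (values here are arbitrary for illustration) */
static volatile XLogRecPtr shared_lsn[NUM_SYNC_REP_WAIT_MODE] = {100, 200, 300};

/* Option 1: plain for loop -- no qualifier warning, and cheap since
 * NUM_SYNC_REP_WAIT_MODE is small */
static void
copy_with_loop(XLogRecPtr *dst)
{
    for (int i = 0; i < NUM_SYNC_REP_WAIT_MODE; i++)
        dst[i] = shared_lsn[i];
}

/* Option 2: memcpy with an explicit cast that drops the volatile
 * qualifier (silencing -Wdiscarded-qualifiers); only safe because the
 * caller is assumed to hold the lock, so the array cannot change
 * mid-copy */
static void
copy_with_memcpy(XLogRecPtr *dst)
{
    memcpy(dst, (const XLogRecPtr *) shared_lsn,
           sizeof(XLogRecPtr) * NUM_SYNC_REP_WAIT_MODE);
}
```

Both produce the same snapshot under the lock; the cast is what makes
the memcpy variant compile warning-free.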

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#20 John H
johnhyvr@gmail.com
In reply to: shveta malik (#13)
Re: Allow logical failover slots to wait on synchronous replication

Hi Shveta,

On Sun, Jul 21, 2024 at 8:42 PM shveta malik <shveta.malik@gmail.com> wrote:

Ah, that's a gap. Let me add some logging/warning in a similar fashion,
although I think I'd have the warning be relatively generic (e.g.
changes are blocked because they're not synchronously committed).

okay, sounds good.

thanks
Shveta

I took a look at adding warnings similar to those the existing
'synchronized_standby_slots' feature has, and I don't think it's
particularly feasible.

The checks/warnings for 'synchronized_standby_slots' are intended to
protect against misconfiguration. They consist of slot validation
(valid slot_name, not a logical slot, slot has not been invalidated)
and a check of whether or not the slot is active.

I don't think there's a "misconfiguration" equivalent for waiting on
synchronous commit. With the current proposal, once you have
(synchronous_commit enabled && failover slots), logical decoding
depends on whether the writes have been replicated to a synchronous
replica. If there is no data being replicated out of the logical slot,
it is because, from the perspective of the database, no writes have
been committed yet. I don't think it would make sense to add
logging/warnings about why a transaction is still not committed (e.g.
which potential replica is the one lagging). There isn't a nice way to
determine why synchronous commit is waiting without being particularly
invasive, and even then it could be particularly noisy (e.g. reporting
all the application_names that we are potentially waiting on).

Thanks,

--
John Hsu - Amazon Web Services

#21 John H
johnhyvr@gmail.com
In reply to: shveta malik (#18)
#22 John H
johnhyvr@gmail.com
In reply to: Bertrand Drouvot (#19)
#23 Amit Kapila
amit.kapila16@gmail.com
In reply to: John H (#21)
#24 shveta malik
shveta.malik@gmail.com
In reply to: John H (#20)
#25 John H
johnhyvr@gmail.com
In reply to: Amit Kapila (#23)
#26 shveta malik
shveta.malik@gmail.com
In reply to: John H (#25)
#27 John H
johnhyvr@gmail.com
In reply to: shveta malik (#26)
#28 shveta malik
shveta.malik@gmail.com
In reply to: John H (#27)
#29 John H
johnhyvr@gmail.com
In reply to: shveta malik (#28)
#30 shveta malik
shveta.malik@gmail.com
In reply to: John H (#29)
#31 shveta malik
shveta.malik@gmail.com
In reply to: shveta malik (#30)
#32 Amit Kapila
amit.kapila16@gmail.com
In reply to: shveta malik (#31)
#33 shveta malik
shveta.malik@gmail.com
In reply to: Amit Kapila (#32)
#34 Amit Kapila
amit.kapila16@gmail.com
In reply to: shveta malik (#33)
#35 shveta malik
shveta.malik@gmail.com
In reply to: Amit Kapila (#34)
#36 Amit Kapila
amit.kapila16@gmail.com
In reply to: shveta malik (#35)
#37 shveta malik
shveta.malik@gmail.com
In reply to: Amit Kapila (#36)
#38 John H
johnhyvr@gmail.com
In reply to: shveta malik (#33)
#39 John H
johnhyvr@gmail.com
In reply to: shveta malik (#37)
#40 Amit Kapila
amit.kapila16@gmail.com
In reply to: John H (#39)