synchronized standby: committed local and waiting for remote ack

Started by qihua wu, about 3 years ago, 3 messages, general
#1 qihua wu
staywithpin@gmail.com

We are using Patroni to set up 1 primary and 5 standbys, with ANY 2 (*) so
that a transaction commits once any 2 standbys have received the WAL. If
there is a network partition between the primary and the standbys, the
commit hangs from the user's perspective, but the commit has actually been
done locally and is just waiting for a remote ack, which cannot arrive
because of the network split. And if Patroni then promotes a standby to
primary, we will lose data. What do you think of a different design: first
wait for the remote ACK, and then commit locally? This would only fail if
the local commit failed, but a local commit failure is much rarer than a
network partition in a cloud environment: it only happens on an IO issue or
a full disk. So I am thinking about the possibility of switching the order:
first wait for the remote ACK, and then commit locally.
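
The current ordering described above can be sketched as a toy model. This is
only an illustration under stated assumptions, not PostgreSQL or Patroni
code: the Node class, the reachable flag, and commit_local_then_wait are
hypothetical names, and the real server blocks where this sketch returns a
status string.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a server; `wal` is its list of commit records."""
    name: str
    reachable: bool = True  # False models a network partition from the primary
    wal: list = field(default_factory=list)


def commit_local_then_wait(primary, standbys, record, quorum=2):
    """Toy model of the current ordering: commit locally, then wait for acks."""
    primary.wal.append(record)  # the commit is recorded on the primary first
    acks = 0
    for standby in standbys:
        if standby.reachable:
            standby.wal.append(record)
            acks += 1
    # With fewer than `quorum` acks the real server would keep waiting;
    # the sketch reports that state instead of blocking.
    return "acknowledged" if acks >= quorum else "waiting for remote ack"


primary = Node("primary")
standbys = [Node(f"standby{i}", reachable=False) for i in range(1, 6)]
status = commit_local_then_wait(primary, standbys, "tx1")
print(status, primary.wal, [s.wal for s in standbys])
```

With all five standbys unreachable, the record is on the primary only while
the client is still waiting, which is the situation where promoting a
standby loses the transaction.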

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: qihua wu (#1)
Re: synchronized standby: committed local and waiting for remote ack

qihua wu <staywithpin@gmail.com> writes:

We are using Patroni to set up 1 primary and 5 standbys, with ANY 2 (*) so
that a transaction commits once any 2 standbys have received the WAL. If
there is a network partition between the primary and the standbys, the
commit hangs from the user's perspective, but the commit has actually been
done locally and is just waiting for a remote ack, which cannot arrive
because of the network split. And if Patroni then promotes a standby to
primary, we will lose data. What do you think of a different design: first
wait for the remote ACK, and then commit locally? This would only fail if
the local commit failed, but a local commit failure is much rarer than a
network partition in a cloud environment: it only happens on an IO issue or
a full disk. So I am thinking about the possibility of switching the order:
first wait for the remote ACK, and then commit locally.

That just gives you a different set of failure modes. It'd be
particularly bad if you have more than one standby, because you could
easily get into a situation where *none* of the nodes represent truth.

regards, tom lane
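
One possible reading of the failure mode mentioned in this reply, sketched
as a toy model. This is an interpretation under stated assumptions, not
something spelled out in the thread and not PostgreSQL code: the function
wait_then_commit_local and the dictionary layout are hypothetical, and the
partition is modeled as a set of standby names that can still be reached.

```python
def wait_then_commit_local(primary_wal, standby_wals, reachable, record, quorum=2):
    """Toy model of the proposed ordering: ship first, commit locally on quorum."""
    acks = 0
    for name, wal in standby_wals.items():
        if name in reachable:
            wal.append(record)  # this standby now holds the commit record
            acks += 1
    if acks >= quorum:
        primary_wal.append(record)
        return True
    return False  # quorum not reached, so the primary never commits


primary_wal = []
standby_wals = {f"standby{i}": [] for i in range(1, 6)}
# Assumption: the partition leaves only standby1 reachable from the primary.
committed = wait_then_commit_local(primary_wal, standby_wals, {"standby1"}, "tx1")
holders = sorted(name for name, wal in standby_wals.items() if "tx1" in wal)
print(committed, primary_wal, holders)
```

In this run the primary and four standbys do not have tx1 while standby1
does. If standby1 were later promoted, a transaction the primary never
committed would become visible, so no single node can be taken as the
authoritative record.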

#3 qihua wu
staywithpin@gmail.com
In reply to: Tom Lane (#2)
Re: synchronized standby: committed local and waiting for remote ack

How should I understand "because you could easily get into a situation
where *none* of the nodes represent truth"?

In the current design, when a user commits, the transaction is first
committed on the primary and then waits for the standby ack. If the
standbys and the primary are separated by a network partition, the user's
commit command hangs forever. The client side usually has a timeout
setting, and when it times out, users will THINK the commit failed, but it
has actually been committed on the primary, so the primary doesn't reflect
the truth. If the user commit instead first waits for the standby ack and
then does the local commit, then when a network partition happens it
commits on neither the primary nor the standbys, which is good. And when no
network partition happens, it will most likely commit on both the standbys
and the primary, since the local commit is less likely to error out.

On Sat, Jan 14, 2023 at 1:31 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

That just gives you a different set of failure modes. It'd be
particularly bad if you have more than one standby, because you could
easily get into a situation where *none* of the nodes represent truth.

regards, tom lane