asynchronous commit & synchronous replication

Started by Konstantin Knizhnik over 1 year ago, 2 messages, hackers
#1 Konstantin Knizhnik
k.knizhnik@postgrespro.ru

Hi hackers,

By default, the logical replication apply worker runs with
synchronous_commit = off, i.e. it uses asynchronous commit. To quote the
documentation of the subscription parameters:

```

synchronous_commit (enum)
<https://www.postgresql.org/docs/devel/sql-createsubscription.html#SQL-CREATESUBSCRIPTION-PARAMS-WITH-SYNCHRONOUS-COMMIT>

The value of this parameter overrides the synchronous_commit
<https://www.postgresql.org/docs/devel/runtime-config-wal.html#GUC-SYNCHRONOUS-COMMIT> setting
within this subscription's apply worker processes. The default value
is off.
It is safe to use off for logical replication: If the subscriber
loses transactions because of missing synchronization, the data will
be sent again from the publisher.

```

So the subscriber can confirm transactions that are not yet persisted. But
consider a PostgreSQL HA setup with:

* primary node
* (cold) standby node streaming WAL from the primary
* synchronous replication enabled, so that you get zero data loss if
the primary dies
* the primary/standby cluster is a subscriber to a remote PostgreSQL
server

It can happen that:

* the primary streams some transactions from the remote PostgreSQL,
with logical replication
* the primary crashes. Failover to the standby happens
* the standby, now promoted, tries to stream the transactions from the
remote PostgreSQL server (the publisher). But some transactions are
missed, because the primary had already reported a higher flush LSN.
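
The sequence above can be illustrated with a small toy model. This is only a sketch under simplifying assumptions: the `Publisher` and `Node` classes, the integer LSNs, and the transaction names are all hypothetical, and the publisher is assumed never to resend anything at or below the flush LSN that the subscriber has confirmed.

```python
from dataclasses import dataclass, field


@dataclass
class Publisher:
    """Remote server; resends only changes above the confirmed flush LSN."""
    changes: dict  # LSN -> payload (toy integer LSNs)
    confirmed_flush_lsn: int = 0

    def stream_from(self, lsn):
        # Assumption: a start position below the confirmed one is moved forward.
        start = max(lsn, self.confirmed_flush_lsn)
        return {l: p for l, p in sorted(self.changes.items()) if l > start}


@dataclass
class Node:
    """A member of the subscriber cluster (primary or standby)."""
    applied: dict = field(default_factory=dict)

    def last_lsn(self):
        return max(self.applied, default=0)


def simulate():
    publisher = Publisher(changes={10: "tx A", 20: "tx B", 30: "tx C"})
    primary, standby = Node(), Node()

    # The primary applies all three transactions with asynchronous commit.
    primary.applied.update(publisher.stream_from(0))

    # Only "tx A" has reached the standby when the primary crashes.
    standby.applied[10] = primary.applied[10]

    # The apply worker has already reported LSN 30 as flushed.
    publisher.confirmed_flush_lsn = primary.last_lsn()

    # Failover: the promoted standby resumes streaming from its own position.
    standby.applied.update(publisher.stream_from(standby.last_lsn()))

    # Transactions the publisher has but the new primary never received.
    return sorted(set(publisher.changes) - set(standby.applied))


if __name__ == "__main__":
    print("LSNs missing after failover:", simulate())
```

In this model the promoted standby asks for everything after LSN 10, but the publisher starts at the already confirmed LSN 30, so the transactions at LSN 20 and 30 are never resent.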

I wonder whether the community considers this scenario "expected
behavior" or a "bug".
It seems to be quite easy to fix (see the attached patch).

So should we take into account sync replication in LR apply worker or not?
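
To make the question concrete, here is one possible reading of "taking sync replication into account", written as a toy function. It is not the attached patch, whose contents are not reproduced here; the function name, its parameters, and the single-standby assumption are all hypothetical.

```python
def lsn_to_report(local_flush_lsn, standby_flush_lsn=None):
    """Return the flush LSN the apply worker would report to the publisher.

    Assumption: with a synchronous standby configured, a transaction counts
    as durable only once the standby has flushed it too, so the reported
    position must not run ahead of the standby.
    """
    if standby_flush_lsn is None:
        # No synchronous standby: local durability is all there is.
        return local_flush_lsn
    return min(local_flush_lsn, standby_flush_lsn)


if __name__ == "__main__":
    # Primary flushed up to 30, standby only up to 10: report 10.
    print(lsn_to_report(30, 10))
```

With such a rule the publisher would keep everything above the standby's position, so a promoted standby could still fetch it after failover.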

Thanks to Heikki Linnakangas <hlinnaka@iki.fi> for describing this
scenario and Arseny Sher <ars@neon.tech> for providing the patch.

Attachments:

sync_replication.patch (text/plain; charset=UTF-8; +12 -0)
#2 Andrey Borodin
amborodin@acm.org
In reply to: Konstantin Knizhnik (#1)
Re: asynchronous commit & synchronous replication

On 10 Aug 2024, at 17:25, Konstantin Knizhnik <knizhnik@garret.ru> wrote:

> So should we take into account sync replication in LR apply worker or not?

There was some relevant discussion of this topic at the PGCon 2020 Unconference [0].
My recollection is that it would be nice to have an LR slot setting akin to synchronous_standby_names that describes what kind of durability guarantees the streamed data must meet.

Best regards, Andrey Borodin.

[0]: https://wiki.postgresql.org/wiki/PgCon_2020_Developer_Unconference/Edge_cases_of_synchronous_replication_in_HA_solutions