Urgent :: Postgresql streaming replication issue - sync mode

Started by Shital Aover 6 years ago7 messagesgeneral
Jump to latest
#1Shital A
brightuser2019@gmail.com

Hello,

We are seeing a strange issue with postgresql streaming application in sync
mode.

We are using postgresql 9.6. Old version because of some specific
requirements. We have setup cluster with master-standby using pacemaker.

When we kill master using killall -9 postgres. The failed primary has few
records extra than standby node. We have done setup with synchronous_commit
= remote_apply and have set synchronous_standby_names=server_name.

As the failed primary is having more data, How is it possible that primary
is committing transaction before they were applied on standby with
synchronous_commit=remote_apply?

Please share if you have any thoughts. Are we missing any config ?

Thanks !

#2Ravi Krishna
srkrishna@vivaldi.net
In reply to: Shital A (#1)
Re: Urgent :: Postgresql streaming replication issue - sync mode

As the failed primary is having more data, How is it possible that primary is committing transaction before they were applied on standby with synchronous_commit=remote_apply?

If I am not mistaken remote_apply is only from ver 11.

#3Shital A
brightuser2019@gmail.com
In reply to: Ravi Krishna (#2)
Re: Urgent :: Postgresql streaming replication issue - sync mode

On Thu, 3 Oct 2019, 00:08 Ravi Krishna, <srkrishna@vivaldi.net> wrote:

As the failed primary is having more data, How is it possible that

primary is committing transaction before they were applied on standby with
synchronous_commit=remote_apply?

If I am not mistaken remote_apply is only from ver 11.

Hi Ravi,

Thanks for your reply.

This property/feature is available in 9.6.
https://www.postgresql.org/docs/9.6/runtime-config-wal.html

Thanks!

Show quoted text
#4Shital A
brightuser2019@gmail.com
In reply to: Shital A (#1)
Re: Urgent :: Postgresql streaming replication issue - sync mode

On Thu, 3 Oct 2019, 03:10 Jason Wang, <jasonwang.public@gmail.com> wrote:

I think when you use kill -9 it wouldn't give any chance for postgres to
do what it normally does. So in your case, the db was killed with no chance
to apply to remote then it would be up to the recovery to decide how to
handle the extra data at the master. I'm not sure what would happen but
killall in general is a dangerous command.

On Thu, 3 Oct 2019, 7:00 am Shital A, <brightuser2019@gmail.com> wrote:

On Thu, 3 Oct 2019, 00:08 Ravi Krishna, <srkrishna@vivaldi.net> wrote:

As the failed primary is having more data, How is it possible that

primary is committing transaction before they were applied on standby with
synchronous_commit=remote_apply?

If I am not mistaken remote_apply is only from ver 11.

Hi Ravi,

Thanks for your reply.

This property/feature is available in 9.6.
https://www.postgresql.org/docs/9.6/runtime-config-wal.html

Thanks!

Thanks Jason.

Using killall -9 we are trying to simulate the situation where primary is
stopped unexpectedly/crashed.

So in this case there is data loss because when the broken primary later
comes in sync with new primary it copies data from new primary and the data
records that were extra in old primary are lost. Can this data loss be
prevented in anyway in postgres 9.6 ? Please suggest.

Thanks!

Show quoted text
#5Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Shital A (#1)
Re: Urgent :: Postgresql streaming replication issue - sync mode

On Wed, 2019-10-02 at 23:58 +0530, Shital A wrote:

We are seeing a strange issue with postgresql streaming application
in sync mode.

We are using postgresql 9.6. Old version because of some specific
requirements. We have setup cluster with master-standby using
pacemaker.

When we kill master using killall -9 postgres. The failed primary has
few records extra than standby node. We have done setup with
synchronous_commit = remote_apply and have set
synchronous_standby_names=server_name.

As the failed primary is having more data, How is it possible that
primary is committing transaction before they were applied on standby
with synchronous_commit=remote_apply?

Please share if you have any thoughts. Are we missing any config ?

This is to be expected.

The transaction will be committed on the primary, then on the standby,
and COMMIT will only return once the standby reports success.

But the transacaction still has to be committed on the primary first.

If the standby sis promoted while COMMIT is waiting for the standby,
you can end up with the transaction committed on the primary,
but not yet committed on the standby.

You should use "pg_rewind" on the failed primary if you want to use
it as new standby for the promoted server.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com

#6Jason Wang
jasonwang.public@gmail.com
In reply to: Laurenz Albe (#5)
Re: Urgent :: Postgresql streaming replication issue - sync mode

I read this
https://www.2ndquadrant.com/en/blog/evolution-fault-tolerance-postgresql-synchronous-commit/

But don't see why your primary would have more records than the standby?

If killall was issued before commit returned, that means the transaction
wasn't completed so yes you would lose records after last commit but that's
expected; if commit was returned both primary and standby should have the
transaction.

Are you sure in your case you end up with primary and standby with
different records from a single transaction?

On Thu, 3 Oct 2019, 9:41 pm Laurenz Albe, <laurenz.albe@cybertec.at> wrote:

Show quoted text

On Wed, 2019-10-02 at 23:58 +0530, Shital A wrote:

We are seeing a strange issue with postgresql streaming application
in sync mode.

We are using postgresql 9.6. Old version because of some specific
requirements. We have setup cluster with master-standby using
pacemaker.

When we kill master using killall -9 postgres. The failed primary has
few records extra than standby node. We have done setup with
synchronous_commit = remote_apply and have set
synchronous_standby_names=server_name.

As the failed primary is having more data, How is it possible that
primary is committing transaction before they were applied on standby
with synchronous_commit=remote_apply?

Please share if you have any thoughts. Are we missing any config ?

This is to be expected.

The transaction will be committed on the primary, then on the standby,
and COMMIT will only return once the standby reports success.

But the transacaction still has to be committed on the primary first.

If the standby sis promoted while COMMIT is waiting for the standby,
you can end up with the transaction committed on the primary,
but not yet committed on the standby.

You should use "pg_rewind" on the failed primary if you want to use
it as new standby for the promoted server.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com

#7Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Jason Wang (#6)
Re: Urgent :: Postgresql streaming replication issue - sync mode

On Fri, 2019-10-04 at 00:34 +1000, Jason Wang wrote:

I read this
https://www.2ndquadrant.com/en/blog/evolution-fault-tolerance-postgresql-synchronous-commit/

But don't see why your primary would have more records than the
standby?

If killall was issued before commit returned, that means the
transaction wasn't completed so yes you would lose records after last
commit but that's expected; if commit was returned both primary and
standby should have the transaction.

Are you sure in your case you end up with primary and standby with
different records from a single transaction?

PostgreSQL synchronous streaming replicatoin doesn't use anything like
two-phase commit.

1. It commits the transaction locally first, which generates WAL.
2. The WAL gets replicated.
3. As soon as the standby reports success, COMMIT returns.

If there is a failure after the first step completed, the
transaction will be committed locally, but not on the standby.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com